A package to process XML documents



> Fred,
>
> Thanks for the info about your XML package.
> I'd like to know more about the capabilities of this new package
> compared to the existing two XML packages in Maxima.
> I'd be interested to replace one or the other in favor of
> the new package if we know that the new package is a
> superset (not just a different set) of the capabilities of the
> other package or packages.

I suppose you mean the two MathML packages. The package I wrote is
more general because it conforms to the XML 1.1 W3C Recommendation
( http://www.w3.org/TR/xml11/ ), except that it does not handle
DOCTYPE declarations. It is mainly useful for XML input, which is harder
to do than output. However, it currently has no MathML capability:
it only builds a LISP list that represents the XML document. The
idea is then to modify Paul Wang's MathML import package so that it
works from this list instead of from the XML file.
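
To give a rough idea of what "a list that represents the XML document"
means, here is a minimal sketch in Python rather than LISP. It is not the
package's code: the helper name to_nested_list and the
[tag, attributes, children...] layout are assumptions made for
illustration, and the actual list built by the package may be shaped
differently.

```python
import xml.etree.ElementTree as ET


def to_nested_list(element):
    """Return [tag, attributes, text-or-child, ...] for one element.

    Hypothetical layout, chosen only to illustrate the idea.
    """
    node = [element.tag, dict(element.attrib)]
    if element.text and element.text.strip():
        node.append(element.text.strip())
    for child in element:
        node.append(to_nested_list(child))
        # Text that follows a child element belongs to the parent.
        if child.tail and child.tail.strip():
            node.append(child.tail.strip())
    return node


doc = '<math display="block"><mi>x</mi><mo>+</mo><mn>1</mn></math>'
tree = to_nested_list(ET.fromstring(doc))
print(tree)
```

A MathML importer could then walk such a list without touching the XML
file again.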

> I wonder if you can create a collection of test cases to
> help us compare the new XML package and the old ones.

mathml.xml is a good test case: it contains the basic XML constructs
<?xml?>, <!DOCTYPE>, <!-- comment --> and <math
xmlns="http://www.w3.org/1998/Math/MathML">, which Paul's package cannot
process. It also contains several ways to encode a character: entity
names and the Unicode character "integral" (see below). Other constructs
that must be accepted are attributes, CDATA sections and processing
instructions. I'll see whether I need to add other test cases once you
are able to use the package.
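
As an illustration of the constructs listed above, the sketch below feeds
a small stand-in document to Python's standard XML parser. The SAMPLE text
is made up for this example and is not the content of mathml.xml. It uses
a numeric character reference instead of an entity name, because this
parser rejects entity names that no DTD defines.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in document: declaration, DOCTYPE, comment,
# processing instruction, namespace, attribute, character reference, CDATA.
SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE math>
<!-- a comment -->
<?demo some instruction?>
<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
  <mo>&#x222B;</mo>
  <mtext><![CDATA[a < b]]></mtext>
</math>"""

MATHML_NS = "{http://www.w3.org/1998/Math/MathML}"

root = ET.fromstring(SAMPLE)
print(root.tag)            # namespace-qualified name of the root element
print(root.get("display")) # attribute value
print(root[0].text)        # character reference resolved to U+222B
print(root[1].text)        # CDATA section delivered as plain text
```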

> I tried to run the example you gave but I ran into errors.
> First MERROR doesn't seem to recognize the ~D format
> directive. (I get this same error in SBCL, Clisp, and GCL
> on Linux.) I replaced all the ~D with ~S to resolve that.

The ~D directive prints an integer in decimal (
http://www.cs.queensu.ca/software_docs/gnudev/gcl-ansi/gcl_1266.html ).
I believe I used ~S at first, but it did not report the correct line.
One possibility would be to keep ~S but convert the integer to a string
first.

> Then I get
> Loading the XML document...
> xml error: character not allowed here [line 41]
> Digging into this it appears the character in question
> fails the test (not (is_restricted_char c)) in xml_import.lisp.
> I don't see anything odd about line 41 in mathml.xml.
> Anyway maybe that's something you can look into.

As I wrote in the known issues of xml_import.lisp, "read-char seems to
handle well UTF-8 characters but this could depend on the LISP
implementation...". I suspect the error is actually on line 21, but
because you changed the ~D, the reported line number is wrong. Try replacing
the integral symbol with the entity "&int;" so that every character is
encoded in one byte.
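
One plausible explanation, sketched in Python below, is that the reader
sees the integral symbol as three separate bytes instead of one character.
The sketch assumes the character is U+222B INTEGRAL and re-implements the
RestrictedChar ranges of the XML 1.1 Recommendation; the Python function
only borrows the name is_restricted_char and is not the LISP code from
xml_import.lisp.

```python
def is_restricted_char(code):
    """RestrictedChar ranges from the XML 1.1 Recommendation."""
    return (0x01 <= code <= 0x08 or 0x0B <= code <= 0x0C
            or 0x0E <= code <= 0x1F or 0x7F <= code <= 0x84
            or 0x86 <= code <= 0x9F)


integral = "\u222b"  # assumed to be the character that triggers the error
entity = "&int;"

raw_bytes = integral.encode("utf-8")
entity_bytes = entity.encode("utf-8")

# Read as one character, the integral symbol is allowed.
print(is_restricted_char(ord(integral)))

# Read byte by byte, one of its three UTF-8 bytes is restricted.
restricted = [hex(b) for b in raw_bytes if is_restricted_char(b)]
print(len(raw_bytes), restricted)

# The entity form uses only single-byte ASCII characters.
print(len(entity_bytes), all(b < 0x80 for b in entity_bytes))
```

If the implementation decodes UTF-8 correctly, the first check is the one
that applies and no error should be raised.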

> All the best,
> Robert Dodier