Loading info files



I've spent more time than is probably healthy watching Maxima
compilations over the last few days and one thing that was driving me up
the wall was the 10 seconds or so my machine took to build
maxima-index.lisp in the documentation directory.

Looking more closely, I saw that this took so long because of an
inefficiency in the Perl script (it reads the files in loads of times
instead of keeping a line-number <-> byte-offset list). But I thought we
were "doing it wrong". After all, reading in the offsets for a couple of
info files should be quick, right? And in that case, surely we should do
it at run time? (Also, my perl-fu is severely lacking, so this gave me
an excuse to tackle the problem in a language I speak more fluently.)

A few hundred lines of lisp later... and I have a prototype for people
to throw things at. I'm pushing it to Sourceforge as the branch
"parse-info" (since that's what it does). Basically, the code runs the
first time someone uses "?" or "??" and grabs all the offsets from the
relevant info files then. On my machine, the slowest implementation is
ECL, which then takes about 0.6-0.7 seconds to load everything in. I
hope this is an acceptable wait... For boring performance details, see
the commit log for the last patch in the series.
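
To make "runs the first time someone uses ?" concrete, here is a
minimal sketch of the load-on-first-use idea. It is not the code on the
branch: every name in it (*info-index*, build-info-index, lookup-topic)
is made up for illustration, and the table contents are a placeholder
standing in for the real scan of the info files.

```lisp
;; Hypothetical names throughout; not the code on the parse-info branch.
(defvar *info-index* nil
  "Hash table from topic name to offset, or NIL if not built yet.")

(defvar *build-count* 0
  "How many times the index has been built (only here to show laziness).")

(defun build-info-index ()
  "Stand-in for the expensive scan of the info files."
  (incf *build-count*)
  (let ((table (make-hash-table :test #'equal)))
    ;; Placeholder entry; a real scan would record one offset per topic.
    (setf (gethash "expand" table) 1234)
    table))

(defun info-index ()
  "Return the index, building it on the first call only."
  (or *info-index*
      (setf *info-index* (build-info-index))))

(defun lookup-topic (name)
  "Look NAME up in the index, triggering the build if needed."
  (gethash name (info-index)))
```

Nothing is scanned at build or start-up time; the first lookup pays the
cost once and later lookups reuse the cached table.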

This is sort of orthogonal to Robert's work from last December to allow
multiple info directories, and I think it should work properly with
it. I have tested pretty exhaustively with the English documentation
(comparing tables of offsets with those generated by the Perl script,
etc.) and I'm reasonably confident that the code is correct. (Realism
interjects: my money is on someone finding a bug before I get up
tomorrow morning.)

I haven't carefully tested the documentation in other languages
yet. That's a job for the weekend or early next week, I guess, but I
thought I'd see if someone would kick the tyres for me...

A note about byte offsets vs. char offsets: The Perl script was very
careful to compute and store byte offsets. We then used "file-position"
on the Lisp side (with a character stream) to get to the relevant point
in the file. I haven't carefully checked, but I presume that there are
Lisps where this did the wrong thing. For example, on SBCL it seems that
file-position counts characters rather than bytes. With the new code, we
record the value that file-position returns while scanning and later
pass that same value back to file-position (on the same
implementation!), so we needn't care about what it represents, just that
it's a monotonic function of, well, the file position(!). I think this
is probably an improvement.
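
A minimal sketch of that round trip, in case it helps. The function
names are hypothetical and this is not the code from the branch; it
also assumes that the implementation accepts back a position it
returned earlier for the same file opened in the same way.

```lisp
;; Hypothetical helpers illustrating the record-then-seek round trip.
(defun line-start-positions (path)
  "Return the FILE-POSITION value seen at the start of each line of PATH."
  (with-open-file (in path :direction :input)
    (loop for pos = (file-position in)      ; position before reading
          for line = (read-line in nil nil) ; NIL at end of file
          while line
          collect pos)))

(defun read-line-at (path pos)
  "Seek to POS, a value previously returned by FILE-POSITION, and read a line."
  (with-open-file (in path :direction :input)
    (file-position in pos)
    (read-line in nil nil)))
```

Calling read-line-at with any element of the list returned by
line-start-positions should give back the corresponding line, whether
the numbers happen to be bytes or characters on that Lisp.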

Comments anyone? I didn't think I should commit this without both
testing the non-English documentation and getting at least some
feedback. I can do the first, but talking to myself would probably be a
bad sign, so I'd appreciate help with the second! :-)


Rupert


PS  For lisp gurus (or at least semi-gurus):

    If you look at the last patch in the series, you'll see that it
    changes something looking like

       (find-if (lambda (line-num) (<= start line-num end)) ses :key #'car)

    to the dolist form:

       (dolist (se ses nil) (when (<= start (car se) end) (return se)))

    This gives about a factor of 10 performance improvement on GCL, so
    was kind of needed. But, in between muttering imprecations about
    certain lisp implementations, I noticed that this also actually sped
    up the code on at least SBCL as well. Is there a reason that a
    compiler has to emit slower code for the find-if version?