Loading info files



I haven't looked at your changes yet, but I just wanted to let you know
that someone else has already done some work on parsing the info files.  I
can't remember who it was.  It isn't in the current git repo, only in some
clone.  There should be messages about it.  IIRC, it was very fast, but it
used cl-ppcre, which didn't run with gcl.

Also, historically, gcl parsed the info files directly using its built-in
regex engine.  That was specific to gcl and didn't work for any other
lisp.  I think it was replaced with a very simple equivalent using the
portable nregex package, which was somewhat slow.  Then Robert replaced
that with the current perl script.


On Thu, Mar 21, 2013 at 5:59 PM, Rupert Swarbrick <rswarbrick at gmail.com> wrote:

> I've spent more time than is probably healthy watching Maxima
> compilations over the last few days and one thing that was driving me up
> the wall was the 10 seconds or so my machine took to build
> maxima-index.lisp in the documentation directory.
>
> Looking more closely, I saw that this took so long because of an
> inefficiency in the perl script (reading in files loads of times instead
> of keeping a line-number <-> byte offset list). But I thought we were
> "doing it wrong". After all, reading in the offsets for a couple of info
> files should be quick, right? And, in that case, surely we should do it
> at run time? (Also my perl-fu is severely lacking, so this gave me an
> excuse to tackle the problem in a language I speak more fluently)
>
> A few hundred lines of lisp later... and I have a prototype for people
> to throw things at. I'm pushing it to Sourceforge as the branch
> "parse-info" (since that's what it does). Basically, the code runs the
> first time someone uses "?" or "??" and grabs all the offsets from the
> relevant info files then. On my machine, the slowest implementation is
> ECL, which then takes about 0.6-0.7 seconds to load everything in. I
> hope this is an acceptable wait... For boring performance details, see
> the commit log for the last patch in the series.
>
> This is sort of orthogonal to Robert's work from last December to allow
> multiple info directories, and I think it should work properly with
> it. I have tested pretty exhaustively with the English documentation,
> comparing tables of offsets with those generated by the perl script
> etc. and I'm reasonably confident that the code is correct. (Realism
> interjects: My money is on someone finding a bug before I get up
> tomorrow morning)
>
> I haven't carefully tested the documents with other languages
> yet. That's a job for the weekend or early next week, I guess, but I
> thought I'd see if someone would kick the tyres for me...
>
> A note about byte offsets vs. char offsets: The perl script was very
> careful to compute and store byte offsets. We then used "file-position"
> on the lisp side (with a character stream) to get to the relevant point
> in the file. I haven't carefully checked, but I presume that there are
> lisps where this did the wrong thing. For example, on SBCL it seems that
> file-position counts characters rather than bytes. With the new code, we
> compute file-position and then use file-position (on the same
> implementation!), so we needn't care about what it represents, just that
> it's a monotonic function of, well, the file position(!). I think this
> is probably an improvement.
>
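
The round trip described in the quoted paragraph above can be sketched in
a few lines of Common Lisp.  This is only an illustration under stated
assumptions, not the code on the parse-info branch: the names
index-line-positions and line-at are made up here, and the file is assumed
to be opened with the same (default) external format both times.

```lisp
;; Hypothetical helpers for illustration; not taken from the parse-info branch.

(defun index-line-positions (path)
  "Return a vector whose Nth element is the FILE-POSITION at the start of
line N (0-based) of the file at PATH, gathered in a single pass."
  (with-open-file (in path :direction :input)
    (coerce (loop for pos = (file-position in)   ; position before the line
                  for line = (read-line in nil nil)
                  while line                     ; stop at end of file
                  collect pos)
            'vector)))

(defun line-at (path position)
  "Seek to POSITION, as recorded by INDEX-LINE-POSITIONS in this same Lisp,
and return the line that starts there."
  (with-open-file (in path :direction :input)
    (file-position in position)
    (read-line in nil nil)))
```

Because the number passed to file-position is one that file-position
itself returned earlier in the same implementation, the sketch never needs
to know whether it counts bytes or characters.
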
> Comments anyone? I didn't think I should commit this without both
> testing the non-English documentation and getting at least some
> feedback. I can do the first, but talking to myself would probably be a
> bad sign, so I'd appreciate help with the second! :-)
>
>
> Rupert
>
>
> PS  For lisp gurus (or at least semi-gurus):
>
>     If you look at the last patch in the series, you'll see that it
>     changes something like
>
>        (find-if (lambda (line-num) (<= start line-num end)) ses :key #'car)
>
>     to the dolist form:
>
>        (dolist (se ses nil) (when (<= start (car se) end) (return se)))
>
>     This gives about a factor of 10 performance improvement on GCL, so
>     was kind of needed. But, in between muttering imprecations about
>     certain lisp implementations, I noticed that this also actually sped
>     up the code on at least SBCL as well. Is there a reason that a
>     compiler has to emit slower code for the find-if version?
>
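
For anyone who wants to reproduce the comparison in the PS, here is a
self-contained sketch that wraps both quoted forms in functions and lets
you check that they agree.  The function names and the sample data are
invented for illustration, and ses is assumed to be a list of conses whose
car is a line number, as the quoted forms suggest.

```lisp
;; Hypothetical wrappers around the two forms quoted in the PS.

(defun find-entry-find-if (start end ses)
  "First entry of SES whose car lies in [START, END], using FIND-IF."
  (find-if (lambda (line-num) (<= start line-num end)) ses :key #'car))

(defun find-entry-dolist (start end ses)
  "Same lookup written as an explicit DOLIST loop."
  (dolist (se ses nil)
    (when (<= start (car se) end)
      (return se))))

;; Made-up sample data: entries (0 . 0), (10 . 1), (20 . 2), ...
(defparameter *sample-ses*
  (loop for i from 0 below 1000 collect (cons (* 10 i) i)))
```

Wrapping repeated calls to each function in (time ...) on a given
implementation should show whether the difference described in the PS
appears there; no timings are claimed here.
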