On 12/6/10 6:29 PM, Leo Butler wrote:
>
>
> On Mon, 6 Dec 2010, Raymond Toy wrote:
>
> < On 12/6/10 11:37 AM, Leo Butler wrote:
> < >
> < > On Mon, 6 Dec 2010, Raymond Toy wrote:
> < >
> < > < On 12/6/10 1:10 AM, Robert Dodier wrote:
> < > < > Yeah, I see the problem with the incorrect indexing too.
> < > < > Could be looking in the correct file at the incorrect offset,
> < > < > or the incorrect file at the correct offset, or
> < > < > both the file and offset are incorrect. I didn't
> < > < > look at it carefully.
> < < I don't read Perl very well, but could the problem be that
> < < build_index.pl is reading the info files with a UTF-8 encoding? That is
> < < the right encoding, but won't it totally mess up the index in
> < < maxima-index.lisp? I'm pretty sure the offsets in maxima-index.lisp are
> < < octet offsets, not character offsets.
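Here is a small illustration of why the two kinds of offset differ. It is in Python, not Maxima code, and the sample text is made up and assumed to be UTF-8. Each non-ASCII character before an entry adds extra octets. An index built from one kind of offset therefore can't be used to seek by the other.

```python
def offsets(text: str, needle: str, encoding: str = "utf-8") -> tuple[int, int]:
    """Return (character offset, octet offset) of the first occurrence of needle."""
    char_offset = text.index(needle)
    # Octet offset = number of bytes needed to encode everything before the match.
    octet_offset = len(text[:char_offset].encode(encoding))
    return char_offset, octet_offset


# Made-up sample: three umlauts (two octets each in UTF-8) precede the entry.
sample = "Funktion: expand (ä, ö, ü)\n -- Funktion: expandwrt"
print(offsets(sample, "-- Funktion: expandwrt"))  # (28, 31)
```

The gap is one extra octet per umlaut. In German text the two offsets stay close but are never exact.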
> < >
> < > I was inclined to believe this, but I don't think the problem is here.
> < > I rewrote build_index.pl to use the right encoding (and to speed it
> < > up), but this doesn't affect the problem.
> < >
> < > Indeed, if you open maxima.info-1 in an emacs buffer, put point at
> < > (point-min) and (goto-char 288618), you will arrive in the middle
> < > of the `expand' documentation. So the char vs. byte counts are quite
> < > close. Accessing online help for `expand'
> < > puts you in the midst of the docstring for `example'.
> < But, from looking at read-info-text in cl-info.lisp, the octet count has
> < to be exact because read-info-text moves to the exact offset in the file
> < and reads some number of octets. So, close isn't enough. From tracing
> < read-info-text on "? expand", the offset is 33623, but the documentation
> < for expand starts at offset 288346.
> <
> < (33623 was obtained from maxima-index.lisp.)
>
> Lest there be confusion, the offsets I quote above are for the info
> files in de/, determined by the modified build_index.pl. I believe that
> these are correct, based on the above experiment.
I'm using whatever is in CVS. If that's not what you're using, then
we're talking about different things. Where is this modified
build_index.pl?
> The offset determined by the original perl script,
> 336623, puts you past EOF, as the documentation for expand
How did you get 336623? The CVS version produces a maxima-index.lisp
that says 33623.
> Let me note that the re-written script reproduces the default
> maxima-index.lisp produced by the unmodified build_index.pl.
> This makes me more confident that its offsets for the de
> documentation are correct.
What? Your rewritten script produces the same values as build_index.pl
in CVS? I don't seem to be getting the same answers as you. Something
is messed up. Maybe I'm not building things correctly?
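For comparison, here is a minimal sketch of what an octet-based indexer should record. It is in Python rather than Perl. It assumes that definition lines have the shape Texinfo @deffn produces, " -- <Category>: <name> ...". The category word differs between the English and de/ manuals, so only that shape is matched. `index_definitions` is a hypothetical name, not part of build_index.pl, and the sample text is made up.

```python
import re

# Assumption: definition lines look like " -- Function: expand (<expr>)";
# only the " -- <Category>: " prefix is matched, so translated category
# words (e.g. "Funktion") work too.
DEFN = re.compile(r"^ -- [^:]+: (\S+)")


def index_definitions(data: bytes, encoding: str = "utf-8") -> dict[str, int]:
    """Hypothetical indexer: map each defined name to the octet offset of its line."""
    index: dict[str, int] = {}
    pos = 0  # running position in octets, never in characters
    # Line terminators are ASCII, so splitting the raw octets never cuts a
    # UTF-8 character in half; keepends=True keeps byte counts exact.
    for raw in data.splitlines(keepends=True):
        m = DEFN.match(raw.decode(encoding))
        if m and m.group(1) not in index:
            index[m.group(1)] = pos
        pos += len(raw)
    return index


sample = (
    " -- Funktion: expand (<expr>)\n"
    "     Erläuterung für expand.\n"
    " -- Funktion: expandwrt (<expr>, <x_1>)\n"
).encode("utf-8")
print(index_definitions(sample))  # {'expand': 0, 'expandwrt': 61}
```

Counting characters instead would put `expandwrt` at 59, because the two umlauts on the line above each take two octets.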
>
> <
> < So calling read-info-text with the correct offset produces the correct
> < documentation (more or less).
> < > Even more peculiarly, ? expandwrt displays the same string as ? expand,
> < > but the offsets differ.
> < Because maxima-index.lisp says the offsets are the same.
>
> Yes, maybe for the old perl script, but not for the re-written one. The
> offsets are genuinely different.
>
> < >
> < > Based on all this, I tend to think the problem lies in the lisp
> < > function reading the info files.
> < You are also correct about this. read-info-text opens the file with
> < some default encoding. I'm not exactly sure what file-position does in
> < various lisps for encoded files. If file-position moves to the
> < specified octet, then that's ok. But then we use read-sequence.
> < Read-sequence itself takes no encoding argument. It just reads characters
> < in the stream's default encoding and counts characters rather than
> < octets, so the returned string will probably be messed up.
> <
> < I think what we need to do here is open the file as a binary file of
> < octets, move to the correct offset and read in the desired number of
> < octets into an array. Then this array needs to be converted to a string
> < using the correct encoding. (Most lisps have some kind of
> < octets-to-string function.)
>
> Yes, this is more or less the conclusion that I had stumbled upon.
> The HyperSpec explains that file-position treats the file as if
> it were opened with :element-type '(unsigned-byte 8) (or ignores
> how it was opened, take your pick).
>
> http://www.lispworks.com/documentation/HyperSpec/Body/f_file_p.htm
That's not my reading of the spec. It just says read-char may increment
file-position by more than one, and that each read-byte (which may
actually read more than one octet!) increments the file-position by one.
>
> So, to summarise: build_index.pl needs correcting (and I think I
> have corrected it), and so does read-info-text.
Can't say anything about the modified build_index.pl since I don't seem to
have that, but I do agree that read-info-text needs to be modified to
open the file with element-type unsigned-byte 8 and the result of
read-sequence needs to be converted from an array of octets to a string
using the correct encoding. All lisps have some way of doing this, but
I'm not sure maxima currently knows what the encoding of the file is.
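Here is a minimal sketch of that approach, written in Python rather than Lisp for brevity. `read_info_text` below is a hypothetical stand-in, not the cl-info.lisp function. UTF-8 is an assumption, since Maxima may not know the file's encoding. In Lisp the steps would be:

1. Open the file with :element-type '(unsigned-byte 8).
2. Move to the offset with file-position.
3. Use read-sequence to read into an octet vector.
4. Decode the vector with an implementation-specific function, such as SBCL's sb-ext:octets-to-string.

```python
import os
import tempfile


def read_info_text(path: str, offset: int, length: int, encoding: str = "utf-8") -> str:
    """Hypothetical helper: read `length` octets starting at octet `offset`, then decode."""
    with open(path, "rb") as f:  # binary mode: positions and counts are octets
        f.seek(offset)           # jump to the exact octet offset from the index
        data = f.read(length)    # raw octets, no decoding yet
    # Strict decoding raises UnicodeDecodeError if the span splits a
    # multi-octet character, which is a useful sign that an offset is wrong.
    return data.decode(encoding)


# Demo: a throwaway UTF-8 file standing in for an info file.
content = "Funktion: expand (ä, ö, ü)\n -- Funktion: expandwrt\n"
entry = "-- Funktion: expandwrt"
with tempfile.NamedTemporaryFile("wb", suffix=".info", delete=False) as tmp:
    tmp.write(content.encode("utf-8"))
    path = tmp.name
try:
    start = len(content[: content.index(entry)].encode("utf-8"))  # octet offset
    result = read_info_text(path, start, len(entry.encode("utf-8")))
    print(result)  # -- Funktion: expandwrt
finally:
    os.remove(path)
```

Passing the character offset (28) instead would start three octets early, at the ")" before the entry.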
Ray