On 12/6/10 11:37 AM, Leo Butler wrote:
>
> On Mon, 6 Dec 2010, Raymond Toy wrote:
>
> < On 12/6/10 1:10 AM, Robert Dodier wrote:
> < > Yeah, I see the problem with the incorrect indexing too.
> < > Could be looking in the correct file at the incorrect offset,
> < > or the incorrect file at the correct offset, or
> < > both the file and offset are incorrect. I didn't
> < > look at it carefully.
> < I don't read perl very well, but could the problem be that
> < build-index.pl is reading the info files with a utf-8 encoding? This is
> < the right encoding, but won't that totally mess up the index in
> < maxima-index.lisp? I'm pretty sure the indices in maxima-index.lisp are
> < octet offsets, not character offsets.
>
> I was inclined to believe this, but I don't think the problem is here.
> I re-wrote the build_index.pl to use the right encoding (and speed it
> up), but this doesn't affect the problem.
>
> Indeed, if you open maxima.info-1 in an emacs buffer, put point at
> (point-min) and (goto-char 288618), you will arrive in the middle
> of the `expand' documentation. So the char vs. byte counts are quite
> close. Accessing online help for `expand'
> puts you in the midst of the docstring for `example'.
But, from looking at read-info-text in cl-info.lisp, the octet count has
to be exact because read-info-text moves to the exact offset in the file
and reads some number of octets. So, close isn't enough. From tracing
read-info-text on "? expand", the offset is 33623, but the documentation
for expand starts at offset 288346.
(33623 was obtained from maxima-index.lisp.)
So calling read-info-text with the correct offset produces the correct
documentation (more or less).
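
To make the "close isn't enough" point concrete, here is a minimal sketch in Python (chosen only for brevity; the sample string is made up and is not taken from maxima.info). It shows how a character offset and an octet offset into the same UTF-8 text drift apart as soon as one multi-octet character precedes the target:

```python
# Illustration only: hypothetical sample text, not real info-file content.
text = "café: expand\n"

# Offset of "expand" counted in characters.
char_offset = text.index("expand")

# Offset of "expand" counted in octets after UTF-8 encoding.
# "é" occupies two octets, so every later offset is shifted by one.
byte_offset = text.encode("utf-8").index(b"expand")

print(char_offset, byte_offset)
```

Each additional non-ASCII character before the target widens the gap, so a reader that seeks by octet needs an index built from octet counts.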
> Even more peculiarly, ? expandwrt displays the same string as ? expand,
> but the offsets differ.
Because maxima-index.lisp says the offsets are the same.
>
> Based on all this, I tend to think the problem lies in the lisp
> function reading the info files.
You are also correct about this. read-info-text opens the file with
some default encoding. I'm not exactly sure what file-position does in
various lisps for encoded files. If file-position moves to the
specified octet, then that's ok. But then we use read-sequence.
Read-sequence doesn't support any kind of encoding, so the returned
string will probably be messed up.
I think what we need to do here is open the file as a binary file of
octets, move to the correct offset and read in the desired number of
octets into an array. Then this array needs to be converted to a string
using the correct encoding. (Most lisps have some kind of
octets-to-string function.)
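
A rough sketch of that approach, written in Python rather than Lisp because the name of the octets-to-string function differs between Lisps. The function name `read_text_at`, the temporary file, and its contents are all assumptions for illustration; this is not the actual read-info-text code:

```python
import os
import tempfile

def read_text_at(path, offset, length, encoding="utf-8"):
    """Read `length` octets starting at octet `offset`, then decode them."""
    # Open in binary mode so that seek() and read() count octets,
    # independent of any text encoding.
    with open(path, "rb") as f:
        f.seek(offset)
        octets = f.read(length)
    # Decode only after the exact octet range has been read.
    return octets.decode(encoding)

# Hypothetical stand-in for an info file: a non-ASCII character
# appears before the entry we want.
data = "é\nexpand (expr)\n".encode("utf-8")
offset = data.index(b"expand")       # octet offset, as an index would store it
length = len(b"expand (expr)")       # octet count of the entry

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    tmp_path = tmp.name

try:
    result = read_text_at(tmp_path, offset, length)
finally:
    os.remove(tmp_path)

print(result)
```

One caveat with this design: the offset and length have to fall on character boundaries, otherwise the decoding step will fail or produce a damaged character at the edge.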
Does this make sense to you?
Ray