When I wrote the Perl script to compute the offsets,
there was some Lisp-inspired craziness about byte offsets vs character offsets.
I seem to recall that the offsets are character offsets,
but there is something else which is a byte count ... maybe the
amount of stuff to read is a number of bytes, not a number of characters.
I'm pretty sure I did try reading some files with multibyte characters
(Spanish & Portuguese, I guess) so I wouldn't throw out the
existing offset/count stuff just yet. Unfortunately I won't
have time to investigate for a few days, maybe someone else
can take a look at it.
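
To make the byte vs character distinction concrete, here is a minimal Python sketch. It is not taken from the Perl script or from cl-info.lisp; the sample text and the helper name `offsets` are made up for illustration, and UTF-8 is assumed as the file encoding. It shows how the two kinds of offset drift apart as soon as a multibyte character precedes the target.

```python
# Hypothetical sample standing in for a translated info file (assumption).
text = "funci\u00f3n: expand\n"      # "\u00f3" is o-acute, 2 bytes in UTF-8
data = text.encode("utf-8")          # the octets as they would sit on disk

def offsets(text, needle):
    """Return (character offset, byte offset) of needle, assuming UTF-8."""
    char_off = text.index(needle)
    # The byte offset is the encoded length of everything before the match.
    byte_off = len(text[:char_off].encode("utf-8"))
    return char_off, byte_off

char_off, byte_off = offsets(text, "expand")
print(char_off, byte_off)

# Seeking by octets with a character offset lands one byte too early here.
print(data[char_off:char_off + 6])
print(data[byte_off:byte_off + 6])
```

Each multibyte character before the target adds to the gap, so an index built from one kind of offset and consumed as the other would be off by a small amount that grows through the file.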
best,

Robert Dodier

On 12/6/10, Raymond Toy <toy.raymond at gmail.com> wrote:
> On 12/6/10 11:37 AM, Leo Butler wrote:
>>
>> On Mon, 6 Dec 2010, Raymond Toy wrote:
>>
>> < On 12/6/10 1:10 AM, Robert Dodier wrote:
>> < > Yeah, I see the problem with the incorrect indexing too.
>> < > Could be looking in the correct file at the incorrect offset,
>> < > or the incorrect file at the correct offset, or
>> < > both the file and offset are incorrect. I didn't
>> < > look at it carefully.
>> < I don't read Perl very well, but could the problem be that
>> < build_index.pl is reading the info files with a utf-8 encoding? This is
>> < the right encoding, but won't that totally mess up the index in
>> < maxima-index.lisp? I'm pretty sure the indices in maxima-index.lisp are
>> < octet offsets, not character offsets.
>>
>> I was inclined to believe this, but I don't think the problem is here.
>> I re-wrote the build_index.pl to use the right encoding (and speed it
>> up), but this doesn't affect the problem.
>>
>> Indeed, if you open maxima.info-1 in an emacs buffer, put point at
>> (point-min) and (goto-char 288618), you will arrive in the middle
>> of the `expand' documentation. So the char vs. byte counts are quite
>> close. Accessing online help for `expand'
>> puts you in the midst of the docstring for `example'.
> But, from looking at read-info-text in cl-info.lisp, the octet count has
> to be exact because read-info-text moves to the exact offset in the file
> and reads some number of octets. So, close isn't enough. From tracing
> read-info-text on "? expand", the offset is 33623, but the documentation
> for expand starts at offset 288346.
>
> (33623 was obtained from maxima-index.lisp.)
>
> So calling read-info-text with the correct offset produces the correct
> documentation (more or less).
>> Even more peculiarly, ? expandwrt displays the same string as ? expand,
>> but the offsets differ.
> Because maxima-index.lisp says the offsets are the same.
>>
>> Based on all this, I tend to think the problem lies in the lisp
>> function reading the info files.
> You are also correct about this. read-info-text opens the file with
> some default encoding. I'm not exactly sure what file-position does in
> various Lisps for encoded files. If file-position moves to the
> specified octet, then that's ok. But then we use read-sequence.
> Read-sequence doesn't support any kind of encoding, so the returned
> string will probably be messed up.
>
> I think what we need to do here is open the file as a binary file of
> octets, move to the correct offset and read in the desired number of
> octets into an array. Then this array needs to be converted to a string
> using the correct encoding. (Most lisps have some kind of
> octets-to-string function.)
>
> Does this make sense to you?
>
> Ray
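
The approach Ray describes above (open the file as octets, seek to the offset, read a fixed number of octets, then decode) might look roughly like the following. This is only a sketch in Python, not the actual code in cl-info.lisp: the function here is a hypothetical stand-in that borrows the name read-info-text, the sample file contents are invented, and UTF-8 is assumed as the encoding.

```python
import os
import tempfile

def read_info_text(path, byte_offset, byte_count, encoding="utf-8"):
    """Read byte_count octets starting at byte_offset, then decode them.

    Hypothetical stand-in for the Lisp function of the same name.
    """
    with open(path, "rb") as f:      # binary mode, so seek() counts octets
        f.seek(byte_offset)
        octets = f.read(byte_count)
    # Decoding happens only after the exact octet range has been read.
    return octets.decode(encoding)

# Invented sample contents with one multibyte character before the target.
data = "funci\u00f3n\nexpand (expr)\n".encode("utf-8")
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    path = tmp.name

start = data.index(b"expand")        # an octet offset, as the index would store
result = read_info_text(path, start, len(b"expand (expr)"))
os.remove(path)
print(result)
```

Both the offset and the count have to be in octets for this to work; if either one lands in the middle of a multibyte character, the decode step raises an error instead of silently returning the wrong text.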