error trying to build the documentation



Here's some commentary from doc/info/build_index.pl.
Make of it what you will.

# (1.1a) Scan the *.info-* files for unit separator characters;
#        those mark the start of each texinfo node.
#        Build a hash table which associates the node name with the filename
#        and byte offset (NOT character offset) of the unit separator.
#
#        Do NOT use the indirect table + tag table (generated by makeinfo),
#        because those tables give character offsets; we want byte offsets.
#        It is easier to construct a byte offset table by hand,
#        rather than attempting to fix up the character offsets.
#        (Which are strange anyway.)

best, Robert Dodier
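
[Editor's note: a minimal Common Lisp sketch of the scan that comment describes: find every #x1F unit separator byte, pull the node name out of the header line that follows it, and record the filename and BYTE offset of the separator. This is an illustration only -- the real indexer is the Perl script quoted above, the node-name parsing here is simplified, and the babel library (e.g. via Quicklisp) is assumed just to decode the header line.]

;; Sketch of the byte-offset scan described in the build_index.pl comment.
;; The file is read as raw octets, so every recorded offset is a byte
;; offset, never a character offset.
(defun scan-info-file (filename &optional (table (make-hash-table :test #'equal)))
  (with-open-file (in filename :element-type '(unsigned-byte 8))
    (let ((octets (make-array (file-length in) :element-type '(unsigned-byte 8))))
      (read-sequence octets in)
      (loop for pos = (position #x1F octets)
              then (position #x1F octets :start (1+ pos))
            while pos
            do (let* ((line-start (1+ (or (position #x0A octets :start pos) pos)))
                      (line-end   (or (position #x0A octets :start line-start)
                                      (length octets)))
                      ;; Decode only the header line, just to read the node name.
                      (header (babel:octets-to-string octets :start line-start
                                                             :end line-end
                                                             :encoding :utf-8))
                      (name-start (search "Node: " header)))
                 (when name-start
                   (let* ((start (+ name-start (length "Node: ")))
                          (end (or (position #\, header :start start)
                                   (length header))))
                     (setf (gethash (string-trim " " (subseq header start end)) table)
                           (list filename pos))))))))
  table)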

On 12/6/10, Robert Dodier <robert.dodier at gmail.com> wrote:
> When I wrote the perl script to compute the offsets,
> there was some Lisp-inspired craziness about byte offsets vs character
> offsets.
> I seem to recall that the offsets are character offsets,
> but there is something else which is a byte count ... maybe the
> amount of stuff to read is a number of bytes, not a number of characters.
>
> I'm pretty sure I did try reading some files with multibyte characters
> (Spanish & Portuguese, I guess) so I wouldn't throw out the
> existing offset/count stuff just yet. Unfortunately I won't
> have time to investigate for a few days, maybe someone else
> can take a look at it.
>
> best, Robert Dodier
>
> On 12/6/10, Raymond Toy <toy.raymond at gmail.com> wrote:
>> On 12/6/10 11:37 AM, Leo Butler wrote:
>>>
>>> On Mon, 6 Dec 2010, Raymond Toy wrote:
>>>
>>> < On 12/6/10 1:10 AM, Robert Dodier wrote:
>>> < > Yeah, I see the problem with the incorrect indexing too.
>>> < > Could be looking in the correct file at the incorrect offset,
>>> < > or the incorrect file at the correct offset, or
>>> < > both the file and offset are incorrect. I didn't
>>> < > look at it carefully.
>>> < I don't read perl very well, but could the problem be that
>>> < build_index.pl is reading the info files with a utf-8 encoding?  This is
>>> < the right encoding, but won't that totally mess up the index in
>>> < maxima-index.lisp?  I'm pretty sure the indices in maxima-index.lisp are
>>> < octet offsets, not character offsets.
>>>
>>>  I was inclined to believe this, but I don't think the problem is here.
>>>  I re-wrote build_index.pl to use the right encoding (and to speed it
>>>  up), but this doesn't affect the problem.
>>>
>>>  Indeed, if you open maxima.info-1 in an emacs buffer, put point at
>>>  (point-min) and (goto-char 288618), you will arrive in the middle
>>>  of the `expand' documentation. So the char vs. byte counts are quite
>>>  close. Accessing online help for `expand'
>>>  puts you in the midst of the docstring for `example'.
>> But, from looking at read-info-text in cl-info.lisp, the octet count has
>> to be exact because read-info-text moves to the exact offset in the file
>> and reads some number of octets.  So, close isn't enough.  From tracing
>> read-info-text on "? expand", the offset is 33623, but the documentation
>> for expand starts at offset 288346.
>>
>> (33623 was obtained from maxima-index.lisp.)
>>
>> So calling read-info-text with the correct offset produces the correct
>> documentation (more or less).
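
[Editor's note: the disagreement above is about whether the stored numbers count characters or octets. A tiny illustration of how the two counts diverge in UTF-8 (my example, not from the thread; babel assumed for the encoding):]

;; Illustration only: character count vs UTF-8 octet count.  Every
;; non-ASCII character occupies at least two octets, so character
;; offsets and byte offsets drift further apart through the file.
(let ((s "Introducción a Maxima"))
  (list :characters (length s)                                          ; => 21
        :octets (length (babel:string-to-octets s :encoding :utf-8))))  ; => 22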
>>>  Even more peculiarly, ? expandwrt displays the same string as ? expand,
>>>  but the offsets differ.
>> Because maxima-index.lisp says the offsets are the same.
>>>
>>>  Based on all this, I tend to think the problem lies in the lisp
>>>  function reading the info files.
>> You are also correct about this.  read-info-text opens the file with
>> some default encoding.   I'm not exactly sure what file-position does in
>> various lisps for encoded files.  If file-position moves to the
>> specified octet, then that's ok.  But then we use read-sequence.
>> Read-sequence doesn't support any kind of encoding, so the returned
>> string will probably be messed up.
>>
>> I think what we need to do here is open the file as a binary file of
>> octets, move to the correct offset and read in the desired number of
>> octets into an array.  Then this array needs to be converted to a string
>> using the correct encoding.  (Most lisps have some kind of
>> octets-to-string function.)
>>
>> Does this make sense to you?
>>
>> Ray
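
[Editor's note: a minimal Common Lisp sketch of the approach Ray proposes, for whoever picks this up: open the file as a binary stream of (unsigned-byte 8), file-position to the stored octet offset, read-sequence the stored number of octets into an array, and only then decode that array to a string. The function name read-info-octets, the hard-coded UTF-8 encoding, and the use of babel for decoding are my assumptions for illustration -- this is not the actual read-info-text from cl-info.lisp.]

(defun read-info-octets (filename offset count)
  "Read COUNT octets starting at octet OFFSET of FILENAME and decode
them as UTF-8.  Seeking and reading happen on raw octets, so multibyte
characters earlier in the file cannot shift the position."
  (with-open-file (in filename :element-type '(unsigned-byte 8))
    (file-position in offset)              ; octet-exact seek on a binary stream
    (let ((buffer (make-array count :element-type '(unsigned-byte 8))))
      (read-sequence buffer in)
      ;; babel is assumed here as one portable decoder; many lisps have a
      ;; built-in equivalent (e.g. sb-ext:octets-to-string in SBCL).
      (babel:octets-to-string buffer :encoding :utf-8))))

[Called with an offset taken from maxima-index.lisp and the corresponding octet count (the 500 below is made up for illustration), something like (read-info-octets "maxima.info-1" 288346 500) should then return the start of the expand entry no matter how many multibyte characters precede it.]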
>>
>> _______________________________________________
>> Maxima mailing list
>> Maxima at math.utexas.edu
>> http://www.math.utexas.edu/mailman/listinfo/maxima
>>
>