error trying to build de documentation




On Mon, 6 Dec 2010, Raymond Toy wrote:

< On 12/6/10 11:37 AM, Leo Butler wrote:
< >
< > On Mon, 6 Dec 2010, Raymond Toy wrote:
< >
< > < On 12/6/10 1:10 AM, Robert Dodier wrote:
< > < > Yeah, I see the problem with the incorrect indexing too.
< > < > Could be looking in the correct file at the incorrect offset,
< > < > or the incorrect file at the correct offset, or
< > < > both the file and offset are incorrect. I didn't
< > < > look at it carefully.
< > < I don't read perl very well, but could the problem be that
< > < build_index.pl is reading the info files with a utf-8 encoding?  This is
< > < the right encoding, but won't that totally mess up the index in
< > < maxima-index.lisp?  I'm pretty sure the indices in maxima-index.lisp are
< > < octet offsets, not character offsets.
< >  
< >  I was inclined to believe this, but I don't think the problem is here.
< >  I re-wrote the build_index.pl to use the right encoding (and speed it
< >  up), but this doesn't affect the problem.
< >
< >  Indeed, if you open maxima.info-1 in an emacs buffer, put point at
< >  (point-min) and (goto-char 288618), you will arrive in the middle
< >  of the `expand' documentation. So the char vs. byte counts are quite
< >  close. Accessing online help for `expand'
< >  puts you in the midst of the docstring for `example'.
< But, from looking at read-info-text in cl-info.lisp, the octet count has
< to be exact because read-info-text moves to the exact offset in the file
< and reads some number of octets.  So, close isn't enough.  From tracing
< read-info-text on "? expand", the offset is 33623, but the documentation
< for expand starts at offset 288346.
< 
< (33623 was obtained from maxima-index.lisp.)

Lest there be confusion, the offsets I quote above are for the info
files in de/, determined by the modified build_index.pl. I believe that
these are correct, based on the above experiment. 
The offset determined by the original Perl script,
336623, puts you past EOF, since the documentation for expand
begins about 95% of the way through the file. I expect that the
behaviour of file-position is then implementation-dependent.
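
To make the octet vs. character distinction concrete, here is a small
Python sketch. It is not Maxima code, and the sample string is made up
for illustration. It only shows that the two kinds of offset drift apart
by one for every two-byte UTF-8 character that precedes the target.

```python
# Hypothetical sample text containing non-ASCII characters,
# standing in for a fragment of a UTF-8 encoded info file.
text = "Größe: Die Funktion expand multipliziert Produkte aus."
needle = "expand"

# Offset counted in characters (what a decoded string reports).
char_offset = text.index(needle)

# Offset counted in octets (what a byte-oriented seek needs).
byte_offset = text.encode("utf-8").index(needle.encode("utf-8"))

# Each of the two non-ASCII characters before "expand" takes two octets.
drift = byte_offset - char_offset

print(char_offset, byte_offset, drift)
```

In a file with only a few non-ASCII characters the two counts stay close,
which is consistent with the emacs experiment above landing in the right
docstring even though emacs counts characters.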

Let me note that the re-written script reproduces the default
maxima-index.lisp produced by the unmodified build_index.pl.
This makes me more confident that its offsets for the de
documentation are correct.

< 
< So calling read-info-text with the correct offset produces the correct
< documentation (more or less).
< >  Even more peculiarly, ? expandwrt displays the same string as ? expand,
< >  but the offsets differ. 
< Because maxima-index.lisp says the offsets are the same.

Yes, maybe for the old perl script, but not for the re-written one. The
offsets are genuinely different. 

< >  
< >  Based on all this, I tend to think the problem lies in the lisp
< >  function reading the info files.
< You are also correct about this.  read-info-text opens the file with 
< some default encoding.   I'm not exactly sure what file-position does in
< various lisps for encoded files.  If file-position moves to the
< specified octet, then that's ok.  But then we use read-sequence.  
< Read-sequence doesn't support any kind of encoding, so the returned
< string will probably be messed up.
< 
< I think what we need to do here is open the file as a binary file of
< octets, move to the correct offset and read in the desired number of
< octets into an array.  Then this array needs to be converted to a string
< using the correct encoding.  (Most lisps have some kind of
< octets-to-string function.)

Yes, this is more or less the conclusion that I had stumbled upon.
The HyperSpec explains that file-position treats the file as if
it were opened with :element-type '(unsigned-byte 8) (or ignores
how it was opened, take your pick).

http://www.lispworks.com/documentation/HyperSpec/Body/f_file_p.htm
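
The approach Raymond describes (open as octets, seek, read, then decode)
can be sketched as follows. This is a Python illustration only, not the
Lisp in cl-info.lisp. The function name read_entry, the temporary file
and the sample text are all assumptions made for the example.

```python
import os
import tempfile


def read_entry(path, offset, length, encoding="utf-8"):
    """Read `length` octets starting at octet `offset`, then decode them.

    Hypothetical helper: the file is opened in binary mode, so both the
    seek position and the read count are measured in octets.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        octets = f.read(length)
    # Decode only after the exact octet range has been read.
    return octets.decode(encoding)


# Build a small UTF-8 file to stand in for an info file (made-up content).
data = "Größe\nexpand: multipliziert Produkte aus.\n".encode("utf-8")
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    demo_path = tmp.name

start = data.index(b"expand")  # octet offset, as an index would store it
entry = read_entry(demo_path, start, len(data) - start)
os.remove(demo_path)

print(entry)
```

Note that the decode step would fail if the offset or the length split a
multi-byte character, so the index has to store exact octet boundaries.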

So, to summarise: build_index.pl needs correcting (and I think I
have corrected it), and so does read-info-text.

Leo
