more about build-index+cl-ppcre branch & encodings



Hi Ray,

Yes, you are right: I was not using the Unicode version, which explains it.  I guess it's just an implementation decision and not 'broken'.
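
For reference, here is a minimal sketch of the 'read-char loop workaround mentioned in the quoted text below. The name read-chars-at is made up for illustration and is not anything in Maxima. The :external-format :utf-8 argument is an assumption: the standard only guarantees :default, though several implementations accept this keyword. The function skips characters one at a time, so it never depends on whether 'file-position counts octets or characters.

```lisp
;; Hypothetical helper, not part of Maxima: return COUNT characters
;; starting at character offset CHAR-OFFSET in the file PATH.
(defun read-chars-at (path char-offset count)
  ;; :utf-8 is an assumed external-format name; adjust per implementation.
  (with-open-file (in path :direction :input :external-format :utf-8)
    ;; Skip CHAR-OFFSET characters by reading them, so the position is
    ;; counted in characters regardless of what FILE-POSITION reports.
    ;; READ-CHAR signals END-OF-FILE if the offset is past the end.
    (loop repeat char-offset do (read-char in))
    (let ((buf (make-string count)))
      ;; READ-SEQUENCE returns the index of the first element it did not
      ;; fill, so a short read near end of file yields a shorter string.
      (subseq buf 0 (read-sequence buf in)))))
```

This costs time proportional to the offset on every lookup, which is the trade-off against reading the whole file into a string once and indexing into that.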

Douglas

On 03/03/11 13:52, Raymond Toy wrote:
> On 3/2/11 5:59 PM, Douglas Crosher wrote:
>>
>> Using a byte offset to position in a character file, exploiting broken
>> implementations of 'file-position, does not seem a good approach.  At
>> any time the broken implementations could correct 'file-position and
>> Maxima would then look up the wrong location.
>>
>> The SCL, and it would also seem CMUCL, do position correctly in a
>> character file, so they currently return text from the wrong location.
> I'm surprised this works in cmucl.  FILE-POSITION in cmucl is the octet
> position, not the character position.  For variable length encodings,
> how do you position by character other than by, more or less, reading
> every character?
>
> Could be a bug in CMUCL.
>>
>> It is frustrating that 'file-position is inconsistent with the number
>> of characters read in some CL implementations.  It would seem like a
>> bug, but is easy to work around.  Leo's code reads the entire file
>> into a string and then extracts characters from the string using a
>> character offset, avoiding 'file-position.  Alternatively, a
>> 'read-char loop could be used for broken implementations.
> I have a vague memory that the original info system did read in the
> entire file.  This was the version before Robert created the current system.
>
> Ray
>
> _______________________________________________
> Maxima mailing list
> Maxima at math.utexas.edu
> http://www.math.utexas.edu/mailman/listinfo/maxima
>