more about build-index+cl-ppcre branch & encodings
Subject: more about build-index+cl-ppcre branch & encodings
From: Raymond Toy
Date: Wed, 02 Mar 2011 22:08:52 -0500
On 3/2/11 5:59 PM, Douglas Crosher wrote:
>
> Using a byte offset to position in a character file, exploiting broken
> implementations of 'file-offset, does not seem a good approach. At
> any time the broken implementations could correct 'file-offset and
> Maxima would then look up the wrong location.
>
> The SCL, and it would also seem CMUCL, do correctly position in a
> character file so currently return text from the wrong location.
Are you using the 8-bit version of CMUCL or the Unicode version? The
Unicode version fails your test, which is what I was expecting. (It took
me a while to test this because I hadn't built the es.utf8 files, and
then there were a couple of bugs in the build system itself that I had
to fix.)
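A minimal Python sketch of why a byte offset picks the wrong spot once the file is read as decoded characters. The sample text is hypothetical (it is not taken from the real es.utf8 files). The only assumption is that the file is UTF-8, so a single non-ASCII character such as "ó" occupies two bytes.

```python
# Hypothetical Spanish doc snippet; the real es.utf8 files are not reproduced here.
text = "Función: sin (x)\nDevuelve el seno de x.\n"
data = text.encode("utf-8")  # what is actually stored on disk

target = "Devuelve"
char_index = text.index(target)                   # position counted in characters
byte_offset = data.index(target.encode("utf-8"))  # position counted in bytes

print(char_index, byte_offset)  # 17 18: "ó" is two bytes in UTF-8

# Using the byte offset as a character position lands one character late.
print(repr(text[byte_offset:byte_offset + len(target)]))  # 'evuelve '
```

Every multibyte character before the target widens the gap by one or more, so entries further into a file drift further from their recorded offsets.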
I'm not sure what the 8-bit version will do with UTF-8 encoded characters
that don't fit in an 8-bit character.
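A sketch of one plausible behaviour, assuming (not confirmed above) that an 8-bit build maps each byte to one Latin-1 character. Under that assumption a multibyte UTF-8 sequence shows up as several garbled characters. Character positions then coincide with byte offsets, which is why byte-offset lookups appear to work there.

```python
# Same hypothetical snippet as before, stored as UTF-8.
data = "Función: sin (x)\nDevuelve el seno de x.\n".encode("utf-8")

# Assumption: the 8-bit build reads one byte as one character (Latin-1 style).
as_8bit = data.decode("latin-1")

print(as_8bit.splitlines()[0])  # FunciÃ³n: sin (x)
print(as_8bit.index("Devuelve"), data.index(b"Devuelve"))  # 18 18
```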
Ray