more about build-index+cl-ppcre branch & encodings
Subject: more about build-index+cl-ppcre branch & encodings
From: Robert Dodier
Date: Wed, 2 Mar 2011 14:29:12 -0700
OK, when I launch xterm with
LC_ALL=foo LANG=foo xterm
and then run Maxima 5.21.1 in that, describe text
(titles and content) is displayed correctly in both
ISO-8859 and UTF-8 locales.
What was Ray's original proposal? I don't remember.
At any rate, it occurs to me now that it seems possible to use
CL-PPCRE to construct the index, but use the existing
code to display stuff. The one wrinkle is that the existing
index has a byte offset + character length (i.e. neither both byte counts
nor both character counts). That's to accommodate Lisp -- FILE-POSITION wants
a byte count, and READ wants a character count.
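
To make the byte offset + character length point concrete, here is a
rough sketch in Python (not Lisp, and not the code in src/cl-info.lisp).
The names build_index and fetch, the sample entries, and the choice of
UTF-8 are all assumptions for illustration only. It just shows why the
two numbers in an index entry end up in different units when the
text contains multi-byte characters.

```python
import io

def build_index(entries, encoding="utf-8"):
    """Hypothetical index builder: concatenate the entry texts and record,
    for each key, a (byte offset, character length) pair."""
    data = bytearray()
    index = {}
    for key, text in entries:
        # Offset is counted in bytes written so far; length is in characters.
        index[key] = (len(data), len(text))
        data += text.encode(encoding)
    return bytes(data), index

def fetch(data, index, key, encoding="utf-8"):
    """Seek by bytes, then read by characters (the mixed-unit lookup)."""
    byte_offset, char_length = index[key]
    raw = io.BytesIO(data)
    raw.seek(byte_offset)  # positioning is done in bytes
    # newline="" disables newline translation so character counts stay exact.
    reader = io.TextIOWrapper(raw, encoding=encoding, newline="")
    return reader.read(char_length)  # reading is done in characters

# Made-up sample entries; the first one contains multi-byte characters.
entries = [("first", "café ∫ déjà\n"), ("second", "plain text\n")]
data, index = build_index(entries)
second_text = fetch(data, index, "second")
```

In this sketch the first entry is 12 characters but 17 bytes in UTF-8,
so the second entry has to be located by the byte count; seeking by the
character count would land in the middle of the first entry.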
FWIW
Robert Dodier
On 3/2/11, Leo Butler <l.butler at ed.ac.uk> wrote:
>
>
> On Wed, 2 Mar 2011, Robert Dodier wrote:
>
> < I've updated my sandbox to revision 9c49048 and built Maxima.
> < I'm seeing the same behavior today as I did a day or two ago;
> < titles & content are displayed correctly in ISO-8859 locales;
> < in UTF-8 locales, titles are correct and content is messed up.
> <
> < I guess that the encoding for the content is set incorrectly.
> < I don't know how the encoding for the titles could be correct
> < and the content incorrect.
>
> Because they use different functions to write their output.
> The output to *standard-output* is being written with the
> wrong encoding for you (but not me). Could you try Ray's
> cmucl fix, please?
>
> <
> < As it happens, the code for the existing describe system
> < in src/cl-info.lisp doesn't bother with encodings at all;
> < it falls on the Lisp implementation to figure out the encoding.
> < That scheme displays titles & content correctly in ISO-8859
> < and UTF-8 locales so far as I know.
>
> It would be nice if you would test this supposition, so we
> can know for certain.
>
> < That suggests that the encoding stuff in src/build-index.lisp
> < could be simplified. Just a guess.
>
> And now we go full circle back to Ray's initial idea.
>
> Leo
>