utf8 + maxima



   Date: Wed, 22 May 2013 20:39:26 +0000
   From: Leo Butler <l_butler at users.sourceforge.net>
   CC: <maxima at math.utexas.edu>

      From: Robert Dodier <robert.dodier at gmail.com>
      Date: Wed, 22 May 2013 18:32:53 +0000

      On 2013-05-22, Leo Butler <l_butler at users.sourceforge.net> wrote:

      > But both ecl and gcl choke,

      Well, ECL should be able to process UTF-8 characters. How did you launch
      it? I'm pretty sure I've tried it with ECL by launching a UTF-8 xterm
      and then executing Maxima + ECL in that and it works fine. Also
      something like 'LANG=foo.UTF-8 maxima -l ecl'.

   That does not work for me with ecl 11.1.1 from the debian testing
   repo.  The issue appears to be with this version of ecl, because, if
   the encoding is set on the command line, ecl barfs.

      > Maxima 5.30.0 http://maxima.sourceforge.net
      > using Lisp GNU Common Lisp (GCL) GCL 2.6.7 (a.k.a. GCL)
      > Distributed under the GNU Public License. See the file COPYING.
      > Dedicated to the memory of William Schelter.
      > The function bug_report() provides bug reporting information.
      > (%i1) ρ:1;
      > incorrect syntax: \201 is not an infix operator
      > \317\201
      > ^

      Well, this is understandable -- GCL doesn't see the whole UTF-8
      character, instead a sequence of 2 characters \317 and \201. \317 is
      nonalphabetic according to ALPHA-CHAR-P, therefore it's treated as a
      separate token from the next one (\201), then the parser barfs on \201
      since it's not an operator.

   Ok, the error message explains as much. The point is that the GCL
   reader happily interns a symbol whose symbol-name consists of 2
   characters \317\201:

   >(coerce (symbol-name 'ρ) 'list)

   (#\\317 #\\201)

   So I could hack a "utf8-enabled" Maxima parser by redefining alphabetp
   or *alphabet*, like so

   (%i1) :lisp (setf *alphabet* (append '(#\\317 #\\201) *alphabet*))

   (\317 \201 _ %)
   (%i1) ρ : 1;
   (%o1)                                  1
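
For reference, here is a minimal sketch of where the pair \317 \201
comes from: it is the two-octet UTF-8 encoding of code point U+03C1
(Greek small letter rho), written in octal. The sketch is not part of
utf8-hack, and the function name utf8-octets is made up for
illustration. It assumes only portable Common Lisp and handles code
points below #x10000.

```lisp
;; Hypothetical helper, for illustration only; not taken from utf8-hack.
(defun utf8-octets (code)
  "Return the UTF-8 octets of code point CODE (below #x10000) as a list."
  (cond ((< code #x80)                      ; 1 octet:  0xxxxxxx
         (list code))
        ((< code #x800)                     ; 2 octets: 110xxxxx 10xxxxxx
         (list (logior #xC0 (ash code -6))
               (logior #x80 (logand code #x3F))))
        (t                                  ; 3 octets: 1110xxxx 10xxxxxx 10xxxxxx
         (list (logior #xE0 (ash code -12))
               (logior #x80 (logand (ash code -6) #x3F))
               (logior #x80 (logand code #x3F))))))

;; Print the octets of U+03C1 in octal, the way GCL shows them.
(format t "~{\\~o~^ ~}~%" (utf8-octets #x3C1))   ; prints \317 \201
```

In a Lisp with 8-bit characters such as GCL, I would expect
(mapcar #'code-char (utf8-octets #x3C1)) to give the two characters
that were added to *alphabet* above. That is an assumption, and a
Unicode-aware Lisp will not reproduce it.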


As I suggested, I have written a hack to enable (selected) wide
characters in Lisps that are not UTF-8 aware. I put it on GitHub; I'm
not sure whether it merits being put in Maxima's contrib directory.

git clone git://github.com/leo-butler/utf8-hack.git

There is a README with examples. It has worked fine for me with both
gcl and ecl. Unfortunately, the GitHub web server does not do justice
to the README file, so it is best read off-line.

Leo