
91.3 Characters

Characters are strings of length 1.

Function: adjust_external_format ()

Prints information about the current external format of the Lisp reader. If the external format encoding differs from the encoding of the application which runs Maxima, adjust_external_format tries to adjust the encoding or prints some help or instructions. adjust_external_format returns true when the external format has been changed and false otherwise.

Functions like cint, unicode, octets_to_string and string_to_octets need UTF-8 as the external format of the Lisp reader to work properly over the full range of Unicode characters.

Examples (Maxima on Windows, March 2016): Using adjust_external_format when the default external format is not equal to the encoding provided by the application.

1. Command line Maxima

In case a terminal session is preferred it is recommended to use Maxima compiled with SBCL. Here Unicode support is provided by default and calls to adjust_external_format are unnecessary.

If Maxima is compiled with CLISP or GCL it is recommended to change the terminal encoding from CP850 to CP1252. adjust_external_format prints some help.

CCL reads UTF-8 while the terminal input is CP850 by default. CP1252 is not supported by CCL. adjust_external_format prints instructions for changing both the terminal encoding and the external format to iso-8859-1.

2. wxMaxima

In wxMaxima SBCL reads CP1252 by default but the input from the application is UTF-8 encoded. Adjustment is needed.

Calling adjust_external_format and restarting Maxima permanently changes the default external format to UTF-8.

(%i1) adjust_external_format();
The line
(setf sb-impl::*default-external-format* :utf-8)
has been appended to the init file
C:/Users/Username/.sbclrc
Please restart Maxima to set the external format to UTF-8.
(%o1) false

Restarting Maxima.

(%i1) adjust_external_format();
The external format is currently UTF-8
and has not been changed.
(%o1) false

Categories: Package stringproc

Function: alphacharp (char)

Returns true if char is an alphabetic character.

To identify a non-US-ASCII character as an alphabetic character the underlying Lisp must provide full Unicode support. E.g. a German umlaut is detected as an alphabetic character with SBCL in GNU/Linux but not with GCL. (On Windows, when Maxima is compiled with SBCL, the external format must be set to UTF-8. See adjust_external_format for more.)

Example: Examination of non-US-ASCII characters.

The underlying Lisp (SBCL, GNU/Linux) is able to convert the typed character into a Lisp character and to examine it.

(%i1) alphacharp("ü");
(%o1)                          true

In GCL this is not possible. An error break occurs.

(%i1) alphacharp("u");
(%o1)                          true
(%i2) alphacharp("ü");

package stringproc: ü cannot be converted into a Lisp character.
 -- an error.

Function: alphanumericp (char)

Returns true if char is an alphabetic character or a digit (only corresponding US-ASCII characters are regarded as digits).

Note: See remarks on alphacharp.

Function: ascii (int)

Returns the US-ASCII character corresponding to the integer int which has to be less than 128.

See unicode for converting code points larger than 127.

Example: Printing the alphabetic characters among the US-ASCII characters.

(%i1) for n from 0 thru 127 do ( 
        ch: ascii(n), 
        if alphacharp(ch) then sprint(ch),
        if n = 96 then newline() )$
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 
a b c d e f g h i j k l m n o p q r s t u v w x y z

Categories: Package stringproc

Function: cequal (char_1, char_2)

Returns true if char_1 and char_2 are the same character.

Function: cequalignore (char_1, char_2)

Like cequal but ignores case. For non-US-ASCII characters this is only possible when the underlying Lisp is able to recognize the character as an alphabetic character. See remarks on alphacharp.
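
The difference between the two predicates can be checked with a pair of US-ASCII letters. This is a minimal sketch, assuming the stringproc functions are available in the session:

```maxima
cequal("a", "a");         /* true: same character */
cequal("a", "A");         /* false: the code points differ */
cequalignore("a", "A");   /* true: case is ignored */
```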

Function: cgreaterp (char_1, char_2)

Returns true if the code point of char_1 is greater than the code point of char_2.

Function: cgreaterpignore (char_1, char_2)

Like cgreaterp but ignores case. For non-US-ASCII characters this is only possible when the underlying Lisp is able to recognize the character as an alphabetic character. See remarks on alphacharp.
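
Because the comparison is made on code points, every uppercase US-ASCII letter compares as smaller than every lowercase one. The following sketch, restricted to US-ASCII input and assuming the stringproc functions are available, shows how the two variants can disagree:

```maxima
cint("B");                   /* 66 */
cint("a");                   /* 97 */
cgreaterp("B", "a");         /* false: 66 is not greater than 97 */
cgreaterpignore("B", "a");   /* true: compared without regard to case */
```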

Function: charp (obj)

Returns true if obj is a Maxima character, i.e. a string of length 1. See the introduction for an example.
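
As a small illustration of the length-1 rule, restricted to US-ASCII input and assuming the stringproc functions are available:

```maxima
charp("a");    /* true: a string of length 1 */
charp("ab");   /* false: a string, but of length 2 */
```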

Function: cint (char)

Returns the Unicode code point of char which must be a Maxima character, i.e. a string of length 1.

Examples: The hexadecimal code point of some characters (Maxima with SBCL on GNU/Linux).

(%i1) obase: 16.$
(%i2) map(cint, ["$","£","€"]);
(%o2)                           [24, 0A3, 20AC]

Warning: In wxMaxima with SBCL on Windows it is not possible to enter characters whose code points need more than 16 bits unless the external format has been set to UTF-8. See adjust_external_format.

CMUCL doesn’t process these characters as one character, and cint then returns false. Converting a character to a code point via UTF-8 octets may serve as a workaround:

utf8_to_unicode(string_to_octets(character));

See utf8_to_unicode, string_to_octets.

Categories: Package stringproc

Function: clessp (char_1, char_2)

Returns true if the code point of char_1 is less than the code point of char_2.

Function: clesspignore (char_1, char_2)

Like clessp but ignores case. For non-US-ASCII characters this is only possible when the underlying Lisp is able to recognize the character as an alphabetic character. See remarks on alphacharp.
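
One possible use of clessp is as the ordering predicate for sort. This is only a sketch, restricted to US-ASCII input; the variable name chars is chosen for illustration, and charlist and sort are assumed to be available as documented elsewhere in the manual:

```maxima
chars: charlist("maxima")$   /* split the string into its characters */
sort(chars, clessp);         /* ascending code point order: a, a, i, m, m, x */
```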

Function: constituent (char)

Returns true if char is a graphic character but not a space character. A graphic character is a character one can see, plus the space character. (constituent is defined by Paul Graham. See Paul Graham, ANSI Common Lisp, 1996, page 67.)

(%i1) for n from 0 thru 127 do (
        tmp: ascii(n),
        if constituent(tmp) then sprint(tmp) )$
! " #  %  ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B
C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c
d e f g h i j k l m n o p q r s t u v w x y z { | } ~

Function: digitcharp (char)

Returns true if char is a digit. Only the corresponding US-ASCII characters are regarded as digits.
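
A predicate like digitcharp can be combined with charlist and sublist to pick the digits out of a string. The following is a sketch under the assumption that these functions are available in the session; the variable names s and digits are illustrative only:

```maxima
s: "a1b22c"$
digits: sublist(charlist(s), digitcharp);   /* keeps the characters 1, 2, 2 */
simplode(digits);                           /* joins them into the string 122 */
```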

Function: lowercasep (char)

Returns true if char is a lowercase character.

Note: See remarks on alphacharp.

Variable: newline

The newline character (ASCII-character 10).

Variable: space

The space character.

Variable: tab

The tab character.
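
The three variables hold ordinary Maxima characters, so their code points can be inspected with cint. A minimal sketch, assuming the default decimal obase:

```maxima
map(cint, [tab, newline, space]);   /* [9, 10, 32] */
```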

Function: unicode (arg)

Returns the character defined by arg, which may be a Unicode code point or, if the underlying Lisp provides full Unicode support, a name string.

Example: Characters defined by hexadecimal code points (Maxima with SBCL on GNU/Linux).

(%i1) ibase: 16.$
(%i2) map(unicode, [24, 0A3, 20AC]);
(%o2)                            [$, £, €]

Warning: In wxMaxima with SBCL on Windows it is not possible to convert code points that need more than 16 bits to characters unless the external format has been set to UTF-8. See adjust_external_format for more information.

CMUCL doesn’t process code points that need more than 16 bits. In these cases unicode returns false. Converting a code point to a character via UTF-8 octets may serve as a workaround:

octets_to_string(unicode_to_utf8(code_point));

See octets_to_string, unicode_to_utf8.

If the underlying Lisp provides full Unicode support, the character may be specified by its name. This is possible in ECL, CLISP and SBCL, where in SBCL on Windows the external format has to be set to UTF-8. unicode(name) is supported by CMUCL too, but again limited to characters that fit in 16 bits.

The string argument to unicode is basically the same string returned by printf using the "~@c" specifier. But as shown below the prefix "#\" must be omitted. Underscores may be replaced by spaces and uppercase letters by lowercase ones.

Example (continued): Characters defined by names (Maxima with SBCL on GNU/Linux).

(%i3) printf(false, "~@c", unicode(0DF));
(%o3)                    #\LATIN_SMALL_LETTER_SHARP_S
(%i4) unicode("LATIN_SMALL_LETTER_SHARP_S");
(%o4)                                  ß
(%i5) unicode("Latin small letter sharp s");
(%o5)                                  ß

Categories: Package stringproc

Function: unicode_to_utf8 (code_point)

Returns a list containing the UTF-8 code corresponding to the Unicode code_point.

Examples: Converting Unicode code points to UTF-8 and vice versa.

(%i1) ibase: obase: 16.$
(%i2) map(cint, ["$","£","€"]);
(%o2)                           [24, 0A3, 20AC]
(%i3) map(unicode_to_utf8, %);
(%o3)                 [[24], [0C2, 0A3], [0E2, 82, 0AC]]
(%i4) map(utf8_to_unicode, %);
(%o4)                           [24, 0A3, 20AC]
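
To make the octets in the example above less opaque, the standard UTF-8 bit layout can be written out by hand. The function my_unicode_to_utf8 below is a hypothetical helper introduced only for illustration; it is not part of stringproc and not necessarily how unicode_to_utf8 is implemented. It assumes a fresh session with the default decimal ibase and obase, and it does not check that its argument is a valid code point.

```maxima
/* Hypothetical helper: encode one code point cp as a list of UTF-8 octets. */
my_unicode_to_utf8(cp) :=
  if cp < 128 then
    [cp]                                   /* 1 octet:  0xxxxxxx */
  elseif cp < 2048 then
    [192 + quotient(cp, 64),               /* 2 octets: 110xxxxx */
     128 + mod(cp, 64)]                    /*           10xxxxxx */
  elseif cp < 65536 then
    [224 + quotient(cp, 4096),             /* 3 octets: 1110xxxx */
     128 + mod(quotient(cp, 64), 64),
     128 + mod(cp, 64)]
  else
    [240 + quotient(cp, 262144),           /* 4 octets: 11110xxx */
     128 + mod(quotient(cp, 4096), 64),
     128 + mod(quotient(cp, 64), 64),
     128 + mod(cp, 64)]$

my_unicode_to_utf8(8364);   /* [226, 130, 172], i.e. 0E2, 82, 0AC for the Euro sign */
```

In this layout the first octet announces the length of the sequence, and each following octet carries six further bits of the code point.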

Categories: Package stringproc

Function: uppercasep (char)

Returns true if char is an uppercase character.

Note: See remarks on alphacharp.
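
Together with lowercasep, this predicate can be used to split the characters of a string by case. A sketch restricted to US-ASCII input, assuming charlist and sublist are available; the variable name chars is illustrative only:

```maxima
chars: charlist("Maxima CAS")$
sublist(chars, uppercasep);   /* M, C, A, S */
sublist(chars, lowercasep);   /* a, x, i, m, a */
```

The space character is neither uppercase nor lowercase, so it appears in neither list.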

Variable: us_ascii_only

This option variable affects Maxima when the character encoding provided by the application which runs Maxima is UTF-8 but the external format of the Lisp reader is not equal to UTF-8.

This is the case on GNU/Linux when Maxima is built with GCL, and on Windows in wxMaxima with GCL and SBCL builds. With SBCL it is recommended to change the external format to UTF-8. Setting us_ascii_only is then unnecessary. See adjust_external_format for details.

us_ascii_only is false by default. In the situation described above, Maxima itself then parses the UTF-8 encoding.

When us_ascii_only is set to true it is assumed that all strings used as arguments to string processing functions do not contain non-US-ASCII characters. Given that promise, Maxima avoids parsing UTF-8 and strings can be processed more efficiently.
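
A typical usage pattern might look as follows. This is only a sketch: the variable matters only in the GCL and wxMaxima situations described above, and slength is used here merely as an example of a string processing function.

```maxima
us_ascii_only: true$      /* promise: only US-ASCII strings from here on */
slength("plain ASCII");   /* 11 */
us_ascii_only: false$     /* restore the default before handling non-US-ASCII text */
```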

Function: utf8_to_unicode (list)

Returns a Unicode code point corresponding to the list which must contain the UTF-8 encoding of a single character.

Examples: See unicode_to_utf8.

Categories: Package stringproc
