Notes on implementation style



As Fateman points out, the attitude towards efficiency was very
different when the Mathlab/Macsyma project started than it is now.  I've
written up a few notes which may help explain some of the features of
the code.

Macsyma was originally implemented on the time-shared DEC PDP-6/10 at
the Artificial Intelligence Lab at M.I.T.  This machine has 36-bit words
and 18-bit word addresses and runs at roughly 500,000
instructions/second.  Starting in 1968 (?), it had 256kWords of memory
(approx 1MB) starting in 1968(?).  (I won't go into more detail on the
hardware and OS here.)  I don't remember how much disk space there was,
but not much; through the 1960's, people were still storing their
personal files on DECtape (300kB/tape).  Core memory was scarce enough
that files were usually edited sequentially in Teco, one page at a time.

The AI PDP-10 supported about 5-15 *active* users at a time (though
20-50 might be logged in).  A Lisp process with Macsyma loaded took a
minimum of about 70kW to run, and slowed down everyone else on the
system, so Macsyma was not run very often.  The core modules of Macsyma
(evaluator, general simplifier, rational expressions, indefinite and
definite integration, differentiation, parser, 2D display, user
programming language, matrices, solve, limit, etc.) were all originally
developed under this exiguous system.

In 1972 (?), the Macsyma group moved to its own PDP-10 (MIT-ML) with
256kW and then 512kW, and later (1975?) to MIT-MC, which was faster and
bigger, but still limited to 18-bit addressing.  The OS was enhanced to
support sharing of read-only pages (code and data), and it became
feasible to run 3-5 simultaneous Macsyma processes (on ML) then perhaps
8-10 (on MC).

The underlying MacLisp system was written in PDP-10 assembly code.
Standard Lisp utility functions (APPEND, REVERSE, etc.) were all
hand-coded, and carefully bummed.  We were pleased, for example, to get
the inner loop of NREVERSE down to 3 instructions from 6.

Data representations were very compact.  A Lisp pointer was an 18-bit
word address, and a CONS cell was a word of two pointers.  Integer
(fixnum) and floating-point (flonum, 27-bit mantissa) numbers took one
word (except for small fixnums...).  Early on, a symbol was equivalent
to a plist (including value, function, and print-name parts); later, it
became a more compact and efficient block of words.  The bignum and
string datatypes came a bit later (1974?), which explains why some code
uses lists of letters rather than strings.

Compiled code was also compact and fast.  A zero-argument function
call/return was two instructions (PUSHJ / POPJ).
CAR/CDR/RPLACA/RPLACD/NULL were one instruction each.  Though common
subexpressions were in general not optimized, they were for CAR/CDR
chains.  The compiler made careful use of registers.  When the number
compiler was introduced, fixnum and flonum code was comparable to PDP-10
Fortran, though array handling was never anywhere near Fortran levels.

Lisp programmers were careful about saving space, by using short symbol
names, sharing symbols for more than one purpose (both value and
function), making sure to share list structure as much as possible,
reusing temporary variable names, etc.  This often makes code hard to
understand.  Even so, because of address space limitations, it was never
possible to load the entire Macsyma system into memory at one time on
the PDP-10, and many calculations failed because they ran out of memory.
So the emphasis on small code was not just aesthetic.

Of course, the address restrictions disappeared on Multics Maclisp,
Franz Lisp, and the Lisp Machine Lisp, but our Maxima code base (DOE
Macsyma) was almost all developed under PDP-10 Maclisp, so with intense
awareness of address space limitations.

For your reading pleasure, I've included a simple Lisp function with its
compiled code in MacLisp (PDP-10 assembler), Emacs Lisp (bytecodes), and
GCL (C and x86 assembler). The GCL code seems awfully long....

       -s

;;; An iterative Copy function
;;;
;;; The recursive version is about as fast, but could run out of stack
space.

;;; This is a bummed version (for sport)

(defun copyi (l)
  (if l
      (let* ((result (list (car l))))
        (prog1 result
          (while (setq l (cdr l))
            (setq result (rplacd result (cons (car l) nil))))))))


;;; MacLisp on PDP-10 circa 1981
;;; (compiled on its.svensson.com, an emulated ITS system)
;;; In Lisp Assembly Program (LAP) format

(LAP COPYI SUBR)
(ARGS COPYI (()  . 1))
        (PUSH P 1)
        (JUMPE 1 G0005)
        (HLRZ 1 0 1)
        (JSP T %NCONS)
        (PUSH P 1)
        (PUSH P 1)
G0003   (HRRZ 1 @ -2 P)
        (MOVEM 1 -2 P)
        (JUMPE 1 G0009)
        (HLRZ 1 0 1)
        (JSP T %NCONS)
        (HRRM 1 @ -1 P)
        (MOVE 5 -1 P)
        (MOVEM 5 -1 P)
        (JRST 0 G0003)
G0009   (POP P 1)
        (SUB P (% 0 0 1 1))
G0005   (SUB P (% 0 0 1 1))
        (POPJ P)
        ()

;;; Same thing, in standard PDP-10 assembler format

COPYI:  PUSH P,1
        JUMPE 1,G0005
        HLRZ 1,(1)
        JSP T,%NCONS
        PUSH P,1
        PUSH P,1
G0003:  HRRZ 1,@-2(P)
        MOVEM 1,-2(P)
        JUMPE 1,G0009
        HLRZ 1,(1)
        JSP T,%NCONS
        HRRM 1,@-1(P)
        MOVE 5,-1(P)
        MOVEM 5,-1(P)
        JRST G0003
G0009:  POP P,1
        SUB P,[1,,1]
G0005:  SUB P,[1,,1]
        POPJ P,

;;; Emacs Lisp circa 2000
byte code for copyi:
  args: (l)
0       varref    l
1       goto-if-nil-else-pop 3
4       varref    l
5       car
6       list1
7       dup
8       varbind   result
9:1     varref    l
10      cdr
11      dup
12      varset    l
13      goto-if-nil 2
16      varref    result
17      varref    l
18      car
19      constant  nil
20      cons
21      setcdr
22      varset    result
23      goto      1
26:2    unbind    1
27:3    return

;;; GNU Common Lisp, circa 2002

;;; Header file

static L1();
#define VC1 object  V4;
#define VM1 4
static char * VVi[2]={
#define Cdata VV[1]
(char *)(L1)
};
#define VV ((object *)VVi)
static  LnkT0() ;
static  (*Lnk0)() = LnkT0;

;;; Code file

/*      function definition for COPYI   */

static L1()
{
  register object *base=vs_base;
  register object *sup=base+VM1;
  VC1 vs_check;
  {
    register object V1;
    V1=(base[0]);
    vs_top=sup;
  TTL:
    if(V1==Cnil) { goto T2; }
    {
      object V2;
      V2= make_cons(CMPcar(V1),Cnil);
      {
        object V3;
        V3= V2;
        V1= CMPcdr(V1);
        base[2]= V1;
        V4= make_cons(CMPcar(V1),Cnil);
        (V2)->c.c_cdr = /* INLINE-ARGS */ V4;
        base[3]= V2;
        vs_top=(vs_base=base+2)+2;
        (void) (*Lnk0)();
        vs_top=sup;
        base[2]= (V3);
        vs_top=(vs_base=base+2)+1;
        return;
      }
    }
  T2:
    base[1]= Cnil;
    vs_top=(vs_base=base+1)+1;
    return;
  }
}

static LnkT0()
{
  call_or_link(VV[0],&Lnk0);
} /* WHILE */

;;; Assembler -S -O4

.file   "gazonk0.c"

gcc2_compiled.:
___gnu_compiled_c:

.data
        .align 4

_VVi:   .long _L1
        .space 4
        .align 4
_Lnk0:  .long _LnkT0

.text
        .align 4
        .def    _L1;    .scl    3;      .type   32;     .endef

_L1:    pushl %ebp
        movl %esp,%ebp
        subl $28,%esp
        pushl %edi
        pushl %esi
        pushl %ebx
        movl _vs_base,%esi
        leal 16(%esi),%eax
        movl %eax,-4(%ebp)
        movl _vs_limit,%eax
        cmpl %eax,_vs_top
        jb L12
        call _vs_overflow
L12:    movl (%esi),%ebx
        movl -4(%ebp),%eax
        movl %eax,_vs_top
        cmpl $_Cnil_body,%ebx
        je L15
        addl $-8,%esp
        pushl $_Cnil_body
        movl 8(%ebx),%eax
        pushl %eax
        call _make_cons
        movl 4(%ebx),%ebx
        addl $-8,%esp
        movl %ebx,8(%esi)
        movl %eax,%edi
        pushl $_Cnil_body
        movl 8(%ebx),%eax
        pushl %eax
        leal 8(%esi),%ebx
        call _make_cons
        movl %eax,4(%edi)
        addl $32,%esp
        movl %edi,12(%esi)
        movl %ebx,_vs_base
        movl -4(%ebp),%eax
        movl %eax,_vs_top
        movl _Lnk0,%eax
        call *%eax
        movl -4(%ebp),%eax
        movl %eax,_vs_top
        movl %edi,8(%esi)
        addl $12,%esi
        movl %ebx,_vs_base
        jmp L17
        .align 4
L15:    movl %ebx,4(%esi)
        leal 4(%esi),%eax
        addl $8,%esi
        movl %eax,_vs_base
L17:    movl %esi,_vs_top
        leal -40(%ebp),%esp
        popl %ebx
        popl %esi
        popl %edi
        movl %ebp,%esp
        popl %ebp
        ret
        .align 4
        .def    _LnkT0; .scl    3;      .type   32;     .endef
_LnkT0: pushl %ebp
        movl %esp,%ebp
        subl $8,%esp
        addl $-8,%esp
        pushl $_Lnk0
        movl _VVi,%eax
        pushl %eax
        call _call_or_link
        movl %ebp,%esp
        popl %ebp
        ret

        .align 4
.globl _init_code
        .def    _init_code;     .scl    2;      .type   32;     .endef

_init_code:
        pushl %ebp
        movl %esp,%ebp
        subl $8,%esp
        addl $-12,%esp
        pushl $_VVi
        call _do_init
        movl %ebp,%esp
        popl %ebp
        ret

        .def    _call_or_link;  .scl    2;      .type   32;     .endef
        .def    _make_cons;     .scl    2;      .type   32;     .endef
        .def    _vs_overflow;   .scl    2;      .type   32;     .endef
        .def    _do_init;       .scl    2;      .type   32;     .endef