Subject: Initial- and boundary-value problems in Maxima.
From: Richard Fateman
Date: Thu, 15 Sep 2011 10:25:57 -0700
There are a few issues here.
1. Given that fortran does not do garbage collection at all in colnew,
one wonders why lisp needs to do this.
Probably because intermediate data/ floats are allocated and returned.
This GC itself is a waste of time but more importantly it suggests that
memory is unnecessarily being allocated and freed. Perhaps a good
compiler can change this so all computations are done in registers.
That is, the fact that there is gc suggests a lot of other time could be
saved.
2. for CMUCL and SBCL it may be very critical to have the lisp
declarations correct. In my experience (with Allegro CL) there are
sometimes very subtle improvements in declarations that make the program
run much faster. Looking at the diagnostics from the compiler to see
what types are unknown and what is open-coded can help. I even sometimes
look at the assembly code that is generated.
3. If 50% of the time is spent in colnew, then we can't speed up the
program by more than 2X, even if colnew is free. If the current program
is 100X slower, we would only get it to be 50X slower. We have to
address the maxima/lisp/setup code to get better than 50X.
4. I think that writing the linpack programs in idiomatic efficient
common lisp may require a higher-level look at the code. For example,
instead of doing fancy indexing, it may be faster to copy a submatrix to a
freshly-allocated data structure. For DAXPY, which does essentially
y[i]= a*x[i]+y[i],
maybe not. Guessing at what a lisp compiler might do, it is hard to say
what is optimal, but
(defun daxpy(n da dx incx dy incy)
(declare ....) ;; important
(if (= incx incy 1) (loop for k from 1 to n do (incf (aref dy k)(* da
(aref dx k))) ;; common case
;; do something slightly fancier here if increment steps are not both 1
))
This code above, or something like it, would work if x,y are simple
arrays. If they are cross-sections of a 2-d array, it would be
trickier. One issue is whether "incf" computes the location of y[i]
once or twice.
Anyway, a version of daxpy in lisp should be about 5 lines of code.
RJF