Initial- and boundary-value problems in Maxima.



There are a few issues here.

1. Given that fortran does not do garbage collection at all in colnew, 
one wonders why lisp needs to do this.
Probably because intermediate data/ floats are allocated and returned. 
This GC itself is a waste of time  but more importantly it suggests that 
memory is unnecessarily being allocated and freed. Perhaps a good 
compiler can change this so all computations are done in registers.
That is, the fact that there is gc suggests a lot of other time could be 
saved.

2. for CMUCL and SBCL it may be very critical to have the lisp 
declarations correct. In my experience (with Allegro CL) there are 
sometimes very subtle improvements in declarations that make the program 
run much faster. Looking at the diagnostics from the compiler to see 
what types are unknown and what is open-coded can help. I even sometimes 
look at the assembly code that is generated.

3. If 50% of the time is spent in colnew, then we can't speed up the 
program by more than 2X, even if colnew is free. If the current program 
is 100X slower, we would only get it to be 50X slower. We have to 
address the maxima/lisp/setup code to get better than 50X.

4. I think that writing the linpack programs in idiomatic efficient 
common lisp may require a higher-level look at the code.  For example, 
instead of doing fancy indexing, it may be faster to copy a submatrix to a
freshly-allocated data structure.  For DAXPY, which does essentially   
y[i]= a*x[i]+y[i],
maybe not.  Guessing at what a lisp compiler might do, it is hard to say 
what is optimal, but

(defun daxpy(n da dx incx dy incy)
(declare ....) ;;  important

  (if (= incx incy 1) (loop for k from 1 to n do (incf (aref dy k)(* da 
(aref dx k))) ;; common case

     ;; do something slightly fancier here if increment steps are not both 1
))


This code above, or something like it,  would work if x,y are simple 
arrays.  If they are cross-sections of a 2-d array, it would be 
trickier.  One issue is whether "incf" computes the location of y[i]  
once or twice.

Anyway, a version of daxpy in lisp should be about 5 lines of code.

RJF