Subject: Initial- and boundary-value problems in Maxima.
From: Raymond Toy
Date: Thu, 15 Sep 2011 11:24:34 -0700
On Thu, Sep 15, 2011 at 10:25 AM, Richard Fateman <fateman at eecs.berkeley.edu
> wrote:
> There are a few issues here.
>
> 1. Given that fortran does not do garbage collection at all in colnew, one
> wonders why lisp needs to do this.
> Probably because intermediate data/ floats are allocated and returned. This
> GC itself is a waste of time but more importantly it suggests that memory
> is unnecessarily being allocated and freed. Perhaps a good compiler can
> change this so all computations are done in registers.
> That is, the fact that there is gc suggests a lot of other time could be
> saved.
>
There should be enough declarations in the code so that boxing withing the
function is not done. However, boxing is needed if the function calls
another function. That's unavoidable.
>
>
> 4. I think that writing the linpack programs in idiomatic efficient common
> lisp may require a higher-level look at the code. For example, instead of
> doing fancy indexing, it may be faster to copy a submatrix to a
> freshly-allocated data structure.
>
That's a possibility. Of course, two copies need to be done. Hard to tell
what would be more beneficial unless we know the sizes of the arrays.
>
> (defun daxpy(n da dx incx dy incy)
> (declare ....) ;; important
>
> (if (= incx incy 1) (loop for k from 1 to n do (incf (aref dy k)(* da
> (aref dx k))) ;; common case
>
> ;; do something slightly fancier here if increment steps are not both 1
> ))
>
>
> This code above, or something like it, would work if x,y are simple
> arrays. If they are cross-sections of a 2-d array, it would be trickier.
> One issue is whether "incf" computes the location of y[i] once or twice.
>
I think given current architectures, the cost of computing the address of
y[i] is very small compared to actually getting the value. But that is
highly dependent on the cache.
>
> Anyway, a version of daxpy in lisp should be about 5 lines of code.
>
Yes, but it probably won't be fast in colnew. :-( From a quick grep of the
code, daxpy is called in 3 different places, and each place calls daxpy with
a slice of a larger array. F2cl tries to be smart about this and figures
out the underlying array. That could be expensive, depending on how many
intervening slices have been done. It might be cheaper to just use the
normal displaced-array access.
Ray