Subject: Initial- and boundary-value problems in Maxima.
From: talon at lpthe.jussieu.fr
Date: Fri, 16 Sep 2011 00:18:27 +0200
Raymond Toy wrote:
> On Thu, Sep 15, 2011 at 11:42 AM, Richard Fateman
>
> Anyway, here is a partial profile, including maxima and colnew functions
> from running prob4.mac using cmucl:
........
>
> Hardly seems worth optimizing colnew since it represents just 10% of the
> time.
This is not at all what I see in my profiles, where colnew-related
functions take 50% of the time and Maxima code the other 50%. I don't know
why there is this discrepancy, but I have two profiles, from CMUCL and SBCL,
giving essentially the same result. If they are correct, there is hardly any
hope of getting good performance by tweaking the declarations in the Maxima
files prob[34].mac. In the profile I posted here, out of a total of 61
seconds, the LINPACK routine dgesl *alone* takes 10 s, and dgefa 4 s more
(these figures include daxpy, which was not profiled). This is totally
incompatible with colnew accounting for only 10% of the total. Moreover,
these functions are pure Lisp, without *any* Maxima involvement anywhere,
and in the profile of the Fortran version they take a small fraction of a
second (the whole program runs in less than a second). So we keep coming
back to this huge factor between the Lisp translation and the Fortran
version.
I agree completely with Professor Fateman when he says that even if we
chose to link the original Fortran colnew into Lisp in some way, the gain
would not be huge: it would only divide the time by 2. This is basically
what happens in the Scilab version of colnew, where the same computation
takes a non-negligible time because, although colnew itself is the Fortran
version, each evaluation of the differential equation requires an
excursion into the Scilab interpreter, exactly as here, where we have to
enter Maxima.
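The factor of 2 is Amdahl's law at work: if a fraction p of the running
time is made s times faster, the whole run becomes 1/((1 - p) + p/s) times
faster, which can never exceed 1/(1 - p). A minimal Python sketch, assuming
the roughly 50/50 split between colnew and Maxima described above; the
function name is purely illustrative.

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the run is accelerated by s."""
    return 1.0 / ((1.0 - p) + p / s)


# Assumption: colnew is about half of the run (p = 0.5).
p = 0.5
for s in (10.0, 60.0, float("inf")):
    # Even an infinitely fast colnew leaves the Maxima half untouched.
    print(f"colnew {s:>5} times faster -> overall {amdahl_speedup(p, s):.3f}x")
```

With p = 0.5 the overall speedup approaches but never passes 2, however
fast the Fortran colnew is; only speeding up the evaluation of the
differential equations can do better.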
In the Python version, they introduced several threads that evaluate the
differential equations at the mesh points simultaneously, so that the slow
part benefits from a multiprocessor machine. There, colnew is the Fortran
version, compiled and linked into Python, and they modified LSYSLV so that
it accepts values at several mesh points at once instead of one after
another.
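A rough Python illustration of the difference between evaluating the
right-hand side point by point and handing a whole batch of mesh points to
a pool of worker threads. The fsub below is a hypothetical stand-in loosely
modeled on colnew's fsub (u'' = -u written as a first-order system), and
eval_serial/eval_batched are illustrative names, not the API of the Python
colnew wrapper.

```python
from concurrent.futures import ThreadPoolExecutor


def fsub(x, z):
    """Hypothetical right-hand side for u'' = -u, with z = (u, u')."""
    u, du = z
    return (du, -u)


def eval_serial(f, xs, zs):
    # One call per mesh point, one after another.
    return [f(x, z) for x, z in zip(xs, zs)]


def eval_batched(f, xs, zs, workers=4):
    # Submit all mesh points at once; Executor.map keeps the input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(f, xs, zs))


xs = [i / 10 for i in range(11)]    # mesh points
zs = [(1.0 - x, -1.0) for x in xs]  # current solution values at those points
assert eval_batched(fsub, xs, zs) == eval_serial(fsub, xs, zs)
```

With the standard CPython interpreter, pure-Python calls like this fsub do
not actually run in parallel because of the global interpreter lock; the
threaded version pays off when the per-point work happens in code that
releases the lock, such as compiled extensions.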
Here is that profile again:
 seconds |   gc  |     consed    |  calls  | sec/call | name
------------------------------------------------------------------------
  31.360 | 0.928 |   836,818,968 |   1,064 | 0.029473 | COLNEW::LSYSLV
  16.420 | 0.225 |   349,003,072 |  23,178 | 0.000708 | COLNEW::VWBLOK
   9.829 | 0.937 |   800,942,728 |  52,124 | 0.000189 | COLNEW::DGESL
   4.254 | 0.481 |   318,090,424 |   7,726 | 0.000551 | COLNEW::DGEFA
------------------------------------------------------------------------
  61.862 | 2.571 | 2,304,855,192 |  84,092 |          | Total
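The percentages discussed in this thread can be rechecked directly from the
table. A small Python sketch with the seconds and call counts copied from
the profile above (the dictionary layout is just for illustration):

```python
# (seconds, calls) per function, copied from the CMUCL profile above.
profile = {
    "COLNEW::LSYSLV": (31.360, 1_064),
    "COLNEW::VWBLOK": (16.420, 23_178),
    "COLNEW::DGESL": (9.829, 52_124),
    "COLNEW::DGEFA": (4.254, 7_726),
}
TOTAL_SECONDS = 61.862  # "Total" row
TOTAL_GC = 2.571        # total of the gc column


def share(name):
    """Fraction of the total profiled time spent in one function."""
    return profile[name][0] / TOTAL_SECONDS


linpack = share("COLNEW::DGESL") + share("COLNEW::DGEFA")

for name, (sec, calls) in profile.items():
    print(f"{name:15s} {100 * share(name):5.1f}%  {sec / calls:.6f} s/call")
print(f"dgesl + dgefa: {100 * linpack:.1f}% of the run")
print(f"garbage collection: {100 * TOTAL_GC / TOTAL_SECONDS:.1f}% of the run")
```

The LINPACK pair alone comes to about 23% of the run, more than twice the
10% figure quoted for colnew as a whole, while garbage collection is about
4%.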
Since fsub etc. are not profiled, their time is counted in LSYSLV, which is
consistent with their taking about 30 s, half of the total. Of course, this
reasoning rests on the fact that the equations are assembled in LSYSLV
(which requires evaluating the differential equations at the mesh points)
and solved in VWBLOK (which, after some preliminary manipulations, calls
the LINPACK routines dgefa and dgesl).
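For readers less familiar with LINPACK: dgefa factors a general matrix by
Gaussian elimination with partial pivoting, and dgesl uses that
factorization to solve the system for a given right-hand side, so one
factorization can serve several solves. The call counts in the profile
(52,124 dgesl against 7,726 dgefa, roughly 6.7 to 1) are consistent with
such reuse. Below is a textbook pure-Python sketch of the same division of
labor; lu_factor and lu_solve are hypothetical names, and the pivot
bookkeeping differs in detail from the actual LINPACK code.

```python
def lu_factor(a):
    """Gaussian elimination with partial pivoting, in place (dgefa's job).

    Overwrites `a` with L (below the diagonal, unit diagonal implied) and
    U (on and above the diagonal); returns the pivot row chosen at each step.
    """
    n = len(a)
    piv = []
    for k in range(n - 1):
        # Pivot on the row with the largest |a[i][k]| at or below row k.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        piv.append(p)
        a[k], a[p] = a[p], a[k]
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            a[i][k] = m  # store the multiplier where the zero would go
            for j in range(k + 1, n):
                a[i][j] -= m * a[k][j]
    if a[n - 1][n - 1] == 0.0:
        raise ValueError("matrix is singular")
    return piv


def lu_solve(a, piv, b):
    """Solve A x = b from the output of lu_factor (dgesl's job)."""
    n = len(a)
    x = list(b)
    for k, p in enumerate(piv):  # apply the same row interchanges to b
        x[k], x[p] = x[p], x[k]
    for i in range(n):  # forward substitution with the unit lower L
        x[i] -= sum(a[i][j] * x[j] for j in range(i))
    for i in reversed(range(n)):  # back substitution with U
        x[i] = (x[i] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x


# Factor once, then solve for several right-hand sides.
A = [[2.0, 1.0], [4.0, 3.0]]
piv = lu_factor(A)  # A now holds the L and U factors
for b in ([3.0, 7.0], [4.0, 10.0]):
    print(lu_solve(A, piv, b))  # prints [1.0, 1.0] then [1.0, 2.0]
```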
Professor Fateman remarks that there is garbage collection in these LINPACK
routines, which appears both true and puzzling. In the Fortran program all
arrays are statically allocated, and the same storage is reused throughout
the computation; apparently the Lisp translation does not do this. That
said, the total cost of garbage collection, about 2.6 s, is negligible
compared with the 60 s total, but not compared with the total running time
of the Fortran version (< 1 s).
>
> Ray
--
Michel Talon