Subject: Initial- and boundary-value problems in Maxima.
From: Michel Talon
Date: Tue, 20 Sep 2011 11:50:53 +0200
Raymond Toy wrote:
>
> Corresponding results for cmucl:
>
> Consed | Calls | Secs | Sec/Call | Bytes/C. | Name:
> -----------------------------------------------------------------------
> 8,995,512 | 2,782 | 0.464 | 0.00017 | 3,233 | COLNEW::DGESL
> 15,297,232 | 4,632 | 0.391 | 0.00008 | 3,303 | FSUB
> 12,451,000 | 1,584 | 0.257 | 0.00016 | 7,860 | DFSUB
> 3,271,472 | 396 | 0.179 | 0.00045 | 8,261 | COLNEW::DGEFA
> 3,527,200 | 74 | 0.150 | 0.00203 | 47,665 | COLNEW::LSYSLV
> 2,074,056 | 5,872 | 0.128 | 0.00002 | 353 | COLNEW::APPROX
> 7,418,448 | 1 | 0.100 | 0.10000 | 7,418,448 | COLNEW::CONTRL
> 1,310,232 | 1,594 | 0.067 | 0.00004 | 822 | COLNEW::GBLOCK
> 1,076,160 | 1,584 | 0.047 | 0.00003 | 679 | COLNEW::VWBLOK
> ...
> -------------------------------------------------------------------
> 60,178,656 | 81,952 | 1.903 | | | Total
>
> I think this just basically confirms that cmucl compiles things
> differently
What is strange is that there is a large penalty on dgesl compared to
the compilation with sbcl. One may wonder what may cost so much time
in this basic routine - this may be a hint about the inefficiencies of
the lisp version. Anyways i have checked in the profilings i had done on
prob3, i see the same pattern:
sbcl:
seconds |gc | consed | calls | sec/call | name
------------------------------------------------------------
31.360 | 0.928 | 836,818,968 | 1,064 | 0.029473 | COLNEW::LSYSLV
16.420 | 0.225 | 349,003,072 | 23,178 | 0.000708 | COLNEW::VWBLOK
9.829 | 0.937 | 800,942,728 | 52,124 | 0.000189 | COLNEW::DGESL
4.254 | 0.481 | 318,090,424 | 7,726 | 0.000551 | COLNEW::DGEFA
------------------------------------------------------------
61.862 | 2.571 | 2,304,855,192 | 84,092 | | Total
cmucl:
Consed | Calls | Secs | Sec/Call | Bytes/C. | Name:
-----------------------------------------------------------------------
436,497,560 | 1,064 | 31.526 | 0.02963 | 410,242 | COLNEW::LSYSLV
366,043,528 | 23,178 | 16.477 | 0.00071 | 15,793 | COLNEW::VWBLOK
198,947,816 | 52,124 | 12.692 | 0.00024 | 3,817 | COLNEW::DGESL
86,848,840 | 7,726 | 7.929 | 0.00103 | 11,241 | COLNEW::DGEFA
-------------------------------------------------------------------
1,088,337,744 | 84,092 | 68.624 | | | Total
Here fsub, dfsub are hidden in lsyslv and take roughly the same time in
sbcl and cmucl. But dgesl dgefa are much more expensive in cmucl than in
sbcl, and this accounts for the difference in total time.
> from sbcl. But it's clear the fsub and dfsub take a large amount of time.
> Roughly 34% of the time. dgesl and lsyslv take another 32%.
>
> Making lsyslv faster would be a nice improvement.
The problem i see is that lsyslv is a "do nothing" routine which contents
itself with assembling computations done by other routines, approx, vwblock
etc. and of course fsub dfsub. It is in these routines that real
computations are done, for example approx will yield the values at
collocation points to be used in the equations, then one gets a linear
system to be inverted (one step of the Newton method). First the system is
massaged to a convenient form, vwblock, etc. and then finally solved by
dgesl dgefa. Eventually another step of Newton method is applied, perhaps
the mesh is refined etc. In this game lsyslv is only a driver. I had
obtained a graphviz representation of the call graph in colnew which
shows the driver routines at top, and the working routines at bottom
(as always ...) it is in:
http://www.lpthe.jussieu.fr/~talon/colnew.pdf
It has been obtained by first getting a vcg graph with ftnchek -vcg,
and then converting it to graphviz format by a script.
>
> I also wanted to mention that colnew makes use of a lot of common blocks.
> In the current conversion, the common blocks are represented as one giant
> array instead of a structure of smaller arrays/scalars. To access a
> particular part of a common block, an array slice is taken. I don't know
> how much additional time this takes, but it could be reduced if the parts
> of
> the common blocks were left as arrays. The naming used in colnew prevents
> f2cl from doing this. (Because in some parts of colnew we have a common
> block with an array named kdum, but in other parts it's named k, and that
> difference confuses f2cl.)
I must say i understand next to nothing to these common blocks, but i see a
line where both K and KDUM appear, they may not be the same:
COMMON /COLORD/ K, NCDUM, MSTAR, KDUM, MMAX, M(20)
The way in which the variables are setup in colnew is particularly obscure,
and this doesn't help to understand the program!
>
> Ray
--
Michel Talon