Initial- and boundary-value problems in Maxima.



Raymond Toy wrote:


> 
> Corresponding results for cmucl:
> 
>      Consed |   Calls |  Secs | Sec/Call |  Bytes/C. | Name:
> -----------------------------------------------------------------------
>  8,995,512 |   2,782 | 0.464 |  0.00017 |     3,233 | COLNEW::DGESL
> 15,297,232 |   4,632 | 0.391 |  0.00008 |     3,303 | FSUB
> 12,451,000 |   1,584 | 0.257 |  0.00016 |     7,860 | DFSUB
>  3,271,472 |     396 | 0.179 |  0.00045 |     8,261 | COLNEW::DGEFA
>  3,527,200 |      74 | 0.150 |  0.00203 |    47,665 | COLNEW::LSYSLV
>  2,074,056 |   5,872 | 0.128 |  0.00002 |       353 | COLNEW::APPROX
>  7,418,448 |       1 | 0.100 |  0.10000 | 7,418,448 | COLNEW::CONTRL
>  1,310,232 |   1,594 | 0.067 |  0.00004 |       822 | COLNEW::GBLOCK
>  1,076,160 |   1,584 | 0.047 |  0.00003 |       679 | COLNEW::VWBLOK
> ...
> -------------------------------------------------------------------
> 60,178,656 |  81,952 | 1.903 |          |           | Total
> 
> I think this just basically confirms that cmucl compiles things
> differently


What is strange is that there is a large penalty on dgesl compared to
the compilation with sbcl. One may wonder what may cost so much time
in this basic routine - this may be a hint about the inefficiencies of
the lisp version. Anyways i have checked in the profilings i had done on 
prob3, i see the same pattern:

sbcl:

  seconds  |gc     |     consed    |  calls |  sec/call  |  name                              
------------------------------------------------------------                                       
    31.360 | 0.928 |   836,818,968 |  1,064 |   0.029473 | COLNEW::LSYSLV                     
    16.420 | 0.225 |   349,003,072 | 23,178 |   0.000708 | COLNEW::VWBLOK                     
     9.829 | 0.937 |   800,942,728 | 52,124 |   0.000189 | COLNEW::DGESL                      
     4.254 | 0.481 |   318,090,424 |  7,726 |   0.000551 | COLNEW::DGEFA                      
------------------------------------------------------------                                       
    61.862 |      2.571 | 2,304,855,192 | 84,092 |       | Total                              

cmucl:

       Consed |   Calls |   Secs | Sec/Call | Bytes/C. | Name:                                     
-----------------------------------------------------------------------                            
  436,497,560 |   1,064 | 31.526 |  0.02963 |  410,242 | COLNEW::LSYSLV                            
  366,043,528 |  23,178 | 16.477 |  0.00071 |   15,793 | COLNEW::VWBLOK                            
  198,947,816 |  52,124 | 12.692 |  0.00024 |    3,817 | COLNEW::DGESL                             
   86,848,840 |   7,726 |  7.929 |  0.00103 |   11,241 | COLNEW::DGEFA                             
-------------------------------------------------------------------                                
1,088,337,744 |  84,092 | 68.624 |          |          | Total                  

Here fsub, dfsub are hidden in lsyslv and take roughly the same time in
sbcl and cmucl. But dgesl dgefa are much more expensive in cmucl than in 
sbcl, and this accounts for the difference in total time.



> from sbcl.  But it's clear the fsub and dfsub take a large amount of time.
> Roughly 34% of the time.  dgesl and lsyslv take another 32%.
> 
> Making lsyslv faster would be a nice improvement.

The problem i see is that lsyslv is a "do nothing" routine which contents 
itself with assembling computations done by other routines, approx, vwblock
etc. and of course fsub dfsub. It is in these routines that real 
computations are done, for example approx will yield the values at 
collocation points to be used in the equations, then one gets a linear 
system to be inverted (one step of the Newton method). First the system is 
massaged to a convenient form, vwblock, etc. and then finally solved by 
dgesl dgefa. Eventually another step of Newton method is applied, perhaps 
the mesh is refined etc. In this game lsyslv is only a driver. I had 
obtained a graphviz representation of the call graph in colnew which 
shows the driver routines at top, and the working routines at bottom
(as always ...) it is in:
http://www.lpthe.jussieu.fr/~talon/colnew.pdf
It has been obtained by first getting a vcg graph with ftnchek -vcg,
and then converting it to graphviz format by a script.

> 
> I also wanted to mention  that colnew makes use of a lot of common blocks.
> In the current conversion, the common blocks are represented as one giant
> array instead of a structure of smaller arrays/scalars.   To access a
> particular part of a common block, an array slice is taken.  I don't know
> how much additional time this takes, but it could be reduced if the parts
> of
> the common blocks were left as arrays.  The naming used in colnew prevents
> f2cl from doing this.  (Because in some parts of colnew we have a common
> block with an array named kdum, but in other parts it's named k, and that
> difference confuses f2cl.)

I must say i understand next to nothing to these common blocks, but i see a 
line where both K and KDUM appear, they may not be the same:

      COMMON /COLORD/ K, NCDUM, MSTAR, KDUM, MMAX, M(20)

The way in which the variables are setup in colnew is particularly obscure, 
and this doesn't help to understand the program!

> 
> Ray

-- 

Michel Talon