Subject: Initial- and boundary-value problems in Maxima.
From: Michel Talon
Date: Thu, 15 Sep 2011 10:50:07 +0200
Raymond Toy wrote:
> On Sat, Sep 10, 2011 at 1:39 AM, <talon at lpthe.jussieu.fr> wrote:
>
>>
>> Performancewise, unfortunately, maxima colnew is *much* slower than
>> fortran colnew. One can expect a factor of 100 slower. Raymond and i have
>> measured the time spent in various parts of the program. The principal
>> part is spent in the evaluation of the differential equation at the mesh
>> points. Since it is given in maxima this requires an excursion in maxima
>> land thus is slow.
>>
>
> Perhaps compiling the maxima code will help somewhat. Unfortunately, the
> maxima compiler is a bit buggy.
>
When we looked at these numbers we tried to compile the formulas. It made no
difference. This is not suprising. When you write x+y in maxima, the + is
not an ordinary plus, it is an MPLUS, and this simple formula goes through
several steps before being recognized as the addition of two numbers.
Compiling the formula doesn't change that.
>
> Yes, this is a disappointment to me as well. I think part of the problem
> is that the Fortran code depends on being able to pass in parts of arrays
> other
> functions. In Fortran, there is virtually no cost to this; you just pass
> in
> the address of the part you want. In lisp, you can't do this. F2cl
> creates a displaced array for this (fast), but access to displaced arrays
> is slow. F2cl tries to reduce this cost by figuring out the parent array
> with the
> corresponding offset. The parent array should be a specialized array, so
> access is fast. Except that indexing of the array has an additional cost
> because we need the desired index to be added to the offset. This will
> slow down access by at least a factor of 2 or 3.
>
I have here the profiling we did using sbcl. One of the routines which is
most used and takes time is COLNEW::DAXPY
This routine is surprisingly simple, it just does linear combinations of 2
columns of a matrix. It is 48 lines of fortran with comments! One of the
comments is
c constant times a vector plus a vector.
c uses unrolled loops for increments equal to one.
Perhaps this loop unrolling is beneficial in fortran but bad in lisp?
Anyways this is a place where access to matrix elements plays a significant
role, as you say.
The statistical profiling shows that mosts samples live in maxima code,
things like SIMPLIFYA MEVAL1, etc. This should correspond to the above
discussion of evaluations of the differential equation at mesh points.
Other frequant calls are to sbcl functions i suppose. In fact here is the
beginning:
Self Total Cumul
Nr Count % Count % Count % Calls Function
------------------------------------------------------------------------
1 490 5.8 1138 13.5 490 5.8 - REMOVE-IF
2 467 5.6 467 5.6 957 11.4 - "foreign function
sigprocmask"
3 438 5.2 438 5.2 1395 16.6 - SB-EXT:WEAK-POINTER-
VALUE
4 260 3.1 260 3.1 1655 19.7 - SB-C::COMPACT-INFO-
LOOKUP
5 235 2.8 235 2.8 1890 22.5 - SB-VM::ALLOC-SIGNED-
BIGNUM-IN-EAX
6 211 2.5 1725 20.5 2101 25.0 - SIMPLIFYA
7 210 2.5 210 2.5 2311 27.5 - SB-IMPL::GET3
8 182 2.2 8345 99.3 2493 29.7 - MEVAL1
9 137 1.6 137 1.6 2630 31.3 - (LAMBDA (SB-
IMPL::VALUE))
10 134 1.6 139 1.7 2764 32.9 - (LABELS SB-
IMPL::EQUAL-AUX)
11 134 1.6 134 1.6 2898 34.5 - SB-KERNEL:%MEMBER-EQ
12 128 1.5 193 2.3 3026 36.0 - ALIKE1
13 124 1.5 131 1.6 3150 37.5 - LENGTH
14 116 1.4 247 2.9 3266 38.9 - SB-KERNEL:VALUES-
SPECIFIER-TYPE
15 100 1.2 615 7.3 3366 40.1 - (FLET SB-C::LOOKUP)
16 97 1.2 97 1.2 3463 41.2 - SB-C::VOLATILE-INFO-
LOOKUP
17 93 1.1 335 4.0 3556 42.3 - SB-KERNEL:SPECIFIER-
TYPE
18 91 1.1 1318 15.7 3647 43.4 - SUBST1
19 91 1.1 233 2.8 3738 44.5 - EQUAL
20 90 1.1 2498 29.7 3828 45.6 - MAKE-ARRAY
21 88 1.0 88 1.0 3916 46.6 - SB-IMPL::GET2
22 84 1.0 109 1.3 4000 47.6 - SB-KERNEL:CSUBTYPEP
23 81 1.0 163 1.9 4081 48.6 - GETL
24 80 1.0 165 2.0 4161 49.5 - ALIKE
25 68 0.8 100 1.2 4229 50.3 76618 COLNEW::APPROX
26 65 0.8 65 0.8 4294 51.1 - KEYWORDP
27 65 0.8 65 0.8 4359 51.9 - (LABELS SB-
IMPL::SXHASH-RECURSE)
28 63 0.7 390 4.6 4422 52.6 - PLS
29 63 0.7 254 3.0 4485 53.4 - TIMESIN
30 59 0.7 121 1.4 4544 54.1 - EQTEST
31 55 0.7 3194 38.0 4599 54.7 23178 COLNEW::VWBLOK
The first explicit colnew functions are approx and vwblock.
Approx evaluates numerically the differential equation, this is compute
intensive, and vwblock solves a big linear system, which is also compute
intensive.
By contrast DAXPY is very simple, but called very much:
63 28 0.3 49 0.6 5793 68.9 1164244 COLNEW::DAXPY
--
Michel Talon