Multi-threaded Maxima



From: Stavros Macrakis
Sent: Monday, December 27, 2010 7:07 PM
To: Richard Fateman
Cc: Richard Hennessy ; Maxima List
Subject: Re: [Maxima] Multi-threaded Maxima
Hennessy speculated:


        ...Maxima could be as much as 3 ? 5 times faster if it used all 
[cores]


        Fateman replied:




        Not if it is waiting for memory. ...a number of basic algorithms 
that could be made faster,... if memory bandwidth were not an issue.


        I suppose it all depends on the memory usage patterns (in 
particular, the cache footprint) of the particular application.

        I ran some simple tests on my 2-core (E8400) machine:


        randpol(deg,coef):=sum((random(coef*2)-coef)*x^i,i,0,deg)$
        test(d, c) := block([p1, p2], p1 : randpol(d, c), p2 : randpol(d, 
c), factor(rat(p1*p2)))$
        for i thru 10 do test(100,10^80)$

        in two Maxima processes simultaneously (started by hand) vs. in one 
Maxima process sequentially.

        With two processes, the elapsed times were 54.8+53.5 = 108.3 (with 
Windows showing 100 % CPU usage) and with one process,  93.4 (50% CPU 
usage).  I tried other parameters -- different sized polynomials presumably 
require different amounts of memory for the factor calculation --  
and saw similar results.

I did the same thing sequentially and I got 45.54+40.11=85.65 seconds.  CPU 
was at 13%
In two hand-started processes simultaneously I got max(44.22, 48.21) = 48.21 
seconds.  CPU was at 25%.
In three hand-started processes simultaneously I got max(54.70, 52.81,54.64) 
= 54.70 seconds.  CPU was at 35%.



    I also tried a simple rational matrix inversion:

    for i thru 30 do genmatrix(lambda([i,j],random(100)-50),50,50)^^-1$

    and got 20.8+21.3 = 42.1 vs. 39.4.

For i = 50 in two processes I got.
for i thru 50 do genmatrix(lambda([i,j],random(100)-50),50,50)^^-1$
I got max(31.76, 32.49) with CPU at 25%

For i = 100 in one process I got
for i thru 100 do genmatrix(lambda([i,j],random(100)-50),50,50)^^-1$
60.83 seconds.

I cannot interpret these results since I don't know what was going on in 
these tests. I guess I have a fast machine.




    So the advantage in these cases are minimal, as Fateman predicts.

Not for me.
Rich

    There may well be symbolic Maxima applications with a small-enough 
memory footprint that memory access isn't the bottleneck, but there's no 
guarantee of that.

                  -s