Subject: [Ecls-list] segmentation fault with Maxima/ECL
From: Juan Jose Garcia-Ripoll
Date: Thu, 25 Sep 2008 20:38:42 +0200
On Thu, Sep 25, 2008 at 4:57 AM, Robert Dodier <robert.dodier at gmail.com> wrote:
> On Wed, Sep 24, 2008 at 2:29 PM, Juan Jose Garcia-Ripoll
> <juanjose.garciaripoll at googlemail.com> wrote:
>
>> Robert, I do not get anything like your backtrace. I ran that program
>> within GDB using the latest ECL and the C stack simply overflows. The
>> GDB backtrace is 6065 calls deep!
>
> The Maxima call stack itself is a small multiple of 248 calls deep;
> it's somewhat annoying that the depth of the C call stack, which has
> triggered this error, is an artifact of the Lisp implementation.
Please, I will work on this issue, but there is no need to repeat the
routine of nagging about how bad an implementation ECL is. We all know
that the Maxima developers do not like our implementation. But before
moving on, let us be precise. If I look at the GDB backtrace, I see the
following:
$ wc -l gdb.log
6066 gdb.log
$ egrep "in (cl_|ecl_)" gdb.log|wc -l
1310
$ egrep "meval" gdb.log|wc -l
2944
ECL accounts for only about 1/5th of the call frames (1310 of 6066),
while about 50% (2944 of 6066) is due to the Maxima evaluator functions
calling each other, as here:
#5781 0x0026ff98 in ecl_apply_from_stack_frame (frame=0xbfff9254,
x=0x1481940) at eval.d:74
#5782 0x00270472 in cl_apply (narg=2, fun=0x1481940,
lastarg=0x1481940) at eval.d:233
#5783 0x016bba48 in L14meval1 (V1=<value temporarily unavailable, due
to optimizations>) at
/Users/jjgarcia/src/ecl/build/bin/binary-ecl/mlisp.c:1198
#5784 0x016bbcda in L11meval (V1=0x1485828) at
/Users/jjgarcia/src/ecl/build/bin/binary-ecl/mlisp.c:607
#5785 0x016c347e in L7mlambda (V1=0x15dcd31, V2=0x12cecd9,
V3=0x15dcd31, V4=0x2f6058, V5=0x12cecf1) at
/Users/jjgarcia/src/ecl/build/bin/binary-ecl/mlisp.c:487
#5786 0x016c3879 in L1mapply1 (V1=0x15dcd31, V2=0x12cecf9,
V3=0x15dcd31, V4=0x12cecf1) at
/Users/jjgarcia/src/ecl/build/bin/binary-ecl/mlisp.c:97
#5787 0x016c3cd0 in L109arrfuncall (V1=0x15dccc9, V2=0x12cecf9,
V3=0x12cecf1) at
/Users/jjgarcia/src/ecl/build/bin/binary-ecl/mlisp.c:7937
#5788 0x016b9ffe in L102harrfind (V1=0x12cecf1) at
/Users/jjgarcia/src/ecl/build/bin/binary-ecl/mlisp.c:6939
#5789 0x016bae68 in L14meval1 (V1=0x15d1251) at
/Users/jjgarcia/src/ecl/build/bin/binary-ecl/mlisp.c:1072
#5790 0x016bbcda in L11meval (V1=0x1485828) at
/Users/jjgarcia/src/ecl/build/bin/binary-ecl/mlisp.c:607
#5791 0x016bdd90 in L3mevalargs (V1=0x15d1201) at
/Users/jjgarcia/src/ecl/build/bin/binary-ecl/mlisp.c:227
#5792 0x016bb459 in L14meval1 (V1=0x15d1181) at
/Users/jjgarcia/src/ecl/build/bin/binary-ecl/mlisp.c:1276
#5793 0x016bbcda in L11meval (V1=0x1485828) at
/Users/jjgarcia/src/ecl/build/bin/binary-ecl/mlisp.c:607
#5794 0x016bdd90 in L3mevalargs (V1=0x15d1619) at
/Users/jjgarcia/src/ecl/build/bin/binary-ecl/mlisp.c:227
#5795 0x016bb459 in L14meval1 (V1=0x15d1169) at
/Users/jjgarcia/src/ecl/build/bin/binary-ecl/mlisp.c:1276
#5796 0x016bbcda in L11meval (V1=0x1485828) at
/Users/jjgarcia/src/ecl/build/bin/binary-ecl/mlisp.c:607
In a future release I will simplify the code for calling functions
whose signature, name or number of arguments is unknown. That will
reduce the overhead of a function call from 2 frames to 1 frame per
call, thus lowering the ECL count from 1310 to about 600. But the
number of function calls that the Maxima evaluator uses is out of my
control, and it definitely has nothing to do with the way we implement
things.
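The proportions quoted above can be checked with a few lines of C. This
is only a sketch of the arithmetic: the three counts come from the
wc/egrep output, the function names are made up for illustration, and
the halving step assumes that every ECL frame belongs to a two-frame
call pair, which is a simplification.

```c
#include <stdio.h>

/* Counts copied from the wc/egrep output on gdb.log. */
enum { TOTAL_LINES = 6066, ECL_FRAMES = 1310, MEVAL_FRAMES = 2944 };

/* Percentage that `part` represents of `whole`. */
double share_percent(int part, int whole)
{
    return 100.0 * (double)part / (double)whole;
}

/* ECL frames left if every call costs one frame instead of two
   (assumption: all ECL frames come in pairs). */
int ecl_frames_after_halving(int ecl_frames)
{
    return ecl_frames / 2;
}

/* Print the shares of ECL and meval frames in the backtrace. */
void print_frame_summary(void)
{
    printf("ECL frames:   %.1f%%\n", share_percent(ECL_FRAMES, TOTAL_LINES));
    printf("meval frames: %.1f%%\n", share_percent(MEVAL_FRAMES, TOTAL_LINES));
    printf("ECL frames after halving: %d\n",
           ecl_frames_after_halving(ECL_FRAMES));
}
```

Under that assumption, halving 1310 gives 655 frames, in line with the
rough estimate of about 600.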
> By the way, since the depth of the C stack is greatly increased
> because of the way ECL is implemented, maybe ECL should
> arrange for the default stack size to be larger. It won't fix the
> underlying problem but it would make it less frequently encountered.
I have been tracking this issue further on different machines. On
those which have large enough C stacks, there is a binding stack
overflow due to the many special variables that are bound through
different function calls. I have already implemented, but not yet
uploaded, code that signals restartable conditions so that the stack
can grow on demand. This works for all functions I have defined, but
it still breaks with Maxima because Maxima ignores the condition and,
furthermore, prevents the debugger and the stack growth code from
being executed. Hence, the only sensible solution for you will be to
use the new routines to enlarge the binding and frame stacks _before_
executing Maxima.
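To illustrate the difference between growing on demand and enlarging
beforehand, here is a minimal sketch in C. It is not ECL's code: the
type grow_stack, the handler type overflow_handler and all function
names are hypothetical. The handler stands in for the restartable
condition: if it agrees, the stack doubles; if the condition is
ignored (handler returns false), the push fails.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical overflow handler: return true to allow the stack to grow. */
typedef bool (*overflow_handler)(size_t current_capacity);

typedef struct {
    void **slots;
    size_t size;
    size_t capacity;
    overflow_handler on_overflow;
} grow_stack;

/* Allocate a stack with room for `capacity` entries (must be > 0). */
bool grow_stack_init(grow_stack *s, size_t capacity, overflow_handler h)
{
    s->size = 0;
    s->capacity = 0;
    s->on_overflow = h;
    s->slots = capacity ? malloc(capacity * sizeof *s->slots) : NULL;
    if (s->slots == NULL)
        return false;
    s->capacity = capacity;
    return true;
}

/* Enlarge the stack up front, before running the demanding code. */
bool grow_stack_reserve(grow_stack *s, size_t new_capacity)
{
    if (new_capacity <= s->capacity)
        return true;
    void **p = realloc(s->slots, new_capacity * sizeof *p);
    if (p == NULL)
        return false;
    s->slots = p;
    s->capacity = new_capacity;
    return true;
}

/* Push one entry; on overflow, ask the handler whether to grow. */
bool grow_stack_push(grow_stack *s, void *value)
{
    if (s->size == s->capacity) {
        if (s->on_overflow == NULL || !s->on_overflow(s->capacity))
            return false;               /* condition ignored: overflow */
        if (!grow_stack_reserve(s, s->capacity * 2))
            return false;               /* out of memory */
    }
    s->slots[s->size++] = value;
    return true;
}

void grow_stack_free(grow_stack *s)
{
    free(s->slots);
    s->slots = NULL;
    s->size = s->capacity = 0;
}

/* Two example handlers: one accepts growth, one refuses it. */
bool always_grow(size_t capacity) { (void)capacity; return true; }
bool never_grow(size_t capacity)  { (void)capacity; return false; }
```

With never_grow as the handler, the only way to avoid the failure is to
call grow_stack_reserve with a large enough capacity before the first
push, which mirrors the advice above.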
> Oh, I don't know. Maybe manage the call stack within ECL.
This is not our role. On Linux and other Unices, the stack size is
set from the command line, and programs do not have privileges to
enlarge it, other than sometimes allocating a completely new stack and
setting up a thread on it, AFAIK. But the Linux box I tried had a
large enough C stack; it was the binding stack that failed.
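For reference, a program can at least read the limit it was started
with. The sketch below uses the POSIX getrlimit call with
RLIMIT_STACK; the wrapper names query_stack_limit and
print_stack_limit are made up for illustration and are not part of
ECL.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Store the soft and hard stack limits in bytes.
   Returns 0 on success, -1 on failure.
   A value of RLIM_INFINITY means "unlimited". */
int query_stack_limit(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    *soft = rl.rlim_cur;
    *hard = rl.rlim_max;
    return 0;
}

/* Print the soft stack limit of the current process. */
void print_stack_limit(void)
{
    rlim_t soft, hard;
    if (query_stack_limit(&soft, &hard) != 0) {
        perror("getrlimit");
        return;
    }
    if (soft == RLIM_INFINITY)
        printf("soft stack limit: unlimited\n");
    else
        printf("soft stack limit: %llu bytes\n", (unsigned long long)soft);
}
```

The soft limit is the one that applies to the running process; the
hard limit is the ceiling up to which an unprivileged process may
raise it.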
> Or devise methods to detect C stack overflows on a best-effort basis.
> (If you have some methods that work on the most popular platforms,
> that's better than nothing.)
As I said, the binding and frame stacks are now checked (not yet
uploaded) and can grow. The C stack... I do not have enough low-level
knowledge to hack the system calls which might enlarge the stack.
C in general provides no useful way to control the stack size, other
than inspecting the address of automatic variables and guessing the
consumed stack. However, this does not _enlarge_ it; it just gives you
a clue that you are approaching some limit. What limit? Well,
setting up such a check is stupid if we use a limit which is much
larger than the usual stack size limits. Perhaps this could be guessed
at configuration time and later on left as a configurable parameter.
On another level of discussion, checking the size of the stack on
every function call seems like overkill and will definitely slow
down any program.
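The trick with the address of automatic variables can be sketched as
follows. This is only an illustration, not what ECL does: all names
are hypothetical, the result is an estimate, and converting pointers
to integers to compare them relies on implementation-defined behaviour
that happens to work on common platforms with a contiguous stack.

```c
#include <stddef.h>
#include <stdint.h>

/* Address recorded near the outermost frame of the program. */
static uintptr_t stack_base;

/* Call once, as early as possible (e.g. at the start of main). */
void stack_mark_base(void)
{
    volatile char marker = 0;
    stack_base = (uintptr_t)&marker;
}

/* Estimate of the bytes of C stack consumed since stack_mark_base().
   The absolute difference works whichever way the stack grows. */
size_t stack_used_estimate(void)
{
    volatile char marker = 0;
    uintptr_t here = (uintptr_t)&marker;
    return (size_t)(here > stack_base ? here - stack_base
                                      : stack_base - here);
}

/* Nonzero when the estimate has reached `limit` bytes. The limit is a
   guess, e.g. chosen at configuration time. */
int stack_near_limit(size_t limit)
{
    return stack_used_estimate() >= limit;
}

/* Demonstration helper: recurse `depth` levels and report the estimate
   at the deepest point. The padding gives each frame a visible size. */
size_t probe_at_depth(unsigned depth)
{
    volatile char pad[256];
    pad[0] = (char)depth;
    if (depth == 0)
        return stack_used_estimate();
    size_t used = probe_at_depth(depth - 1);
    /* Reading pad after the call keeps the frame alive; this adds 0. */
    return used + (size_t)(pad[0] != (char)depth);
}
```

After calling stack_mark_base(), probe_at_depth(100) should report a
larger value than probe_at_depth(0). Note that the check only warns;
it does nothing to enlarge the stack.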
Juanjo
--
Instituto de Física Fundamental, CSIC
c/ Serrano, 113b, Madrid 28009 (Spain)
http://juanjose.garciaripoll.googlepages.com