[jared@cs.utexas.edu: Re: [fateman@cs.berkeley.edu: RE: gcl inlining policy]]



Thanks so much Matt and Jared for the very helpful feedback.  My
comments on GCL at the moment below:

Matt Kaufmann <kaufmann at cs.utexas.edu> writes:

> Hi, Camm --
> 
> I forwarded some of your inlining thread to a couple of people who I
> thought would have useful comments, and I got an interesting reply
> saying "Feel free to forward this email to anyone".  (Thanks, Jared.)
> So I'm forwarding it to you below, and letting you decide if you want
> to forward it to some or all in the thread.
> 
> -- Matt
> From: "Jared C. Davis" <jared at cs.utexas.edu>
> Subject: Re: [fateman at cs.berkeley.edu: RE: [Maxima] gcl inlining policy]
> To: "Matt Kaufmann" <kaufmann at cs.utexas.edu>
> Cc: robert.sumners at amd.com
> Date: Fri, 6 Jul 2007 21:37:19 -0500
> 
> Hi,
> 
> Feel free to forward this email to anyone, but it's probably not
> particularly useful.
> 
> As an ACL2 user, I feel it's particularly important to set up "proper"
> abstractions.  Since there is no "shell principle" and I have no way
> to define new objects or structures, I end up using something like a
> tuple/alist/custom-cons-structure for this purpose.  To develop good
> rewrite rules, it's crucial to define proper recognizers, accessors,
> and constructors.  This leads me to have several aliases for "first",
> "second", "third", etc.  These aliases are particularly useful when
> combined with forcing, and they're excellent candidates to be inlined.
>  This would probably be a useful technique for any ACL2 user.  (Also,
> the ACL2 system code often uses macros instead of inline functions,
> and this is probably unfortunate.)

All of these includeing car et. al. are now inlined from source, which
is adjusted for the ambient safety level thus:

safety 3: all checks, all variables of generic type, no src inlining,
          no branch elimination
safety 2: src inlining with checks, branch elimination, full variable type propagation
safety 1: only check on function entry, src inlines without checks,
          reverse type propagation
safety 0: check-type at body head turned into declare, no checking
          unless user makes same explicit.

> 
> One other really crucial use of inlining, for me, is that all the base
> functions in Milawa are total with :guard t.  For example, MILAWA::CAR
> is defined as (if (consp x) (COMMON-LISP::car x) nil), and my
> arithmetic operations will nfix their arguments and results if needed.
>  Inlining these primitives allows smart Lisp compilers to see and
> optimize-away a lot of common subexpressions; for example, I often
> write code like the following:
> 
>    (let ((foo (first x))
>          (bar (second x))
>          (baz (third x)))
>      (do-stuff foo bar baz))
> 

This is on now at safety 0,1 in GCL:

>(disassemble '(lambda (x) (let ((foo (first x)) (bar (second x)) (baz (third x))) (do-stuff foo bar baz))) nil)

;; Compiling /tmp/gazonk_14987_0.lsp.
;; End of Pass 1.  
;; End of Pass 2.  
;; OPTIMIZE levels: Safety=0 (No runtime error checking), Space=0, Speed=3, (Debug quality ignored)
;; Finished compiling /tmp/gazonk_14987_0.o.

#include "gazonk_14987_0.h"
void init_code(){do_init((void *)VV);}
/*	local entry for function CMP-ANON	*/

static object LI1__CMP_ANON___gazonk_14987_0(V3,V4)

fixnum V3;object V4;
{	 VMB1 VMS1 VMV1
	goto TTL;
TTL:;
	{object V5;
	object V6;
	object V7;
	/*(FIRST X)*/
	{object V8;
	V8= (V4);
	/*(CAR X)*/
	{object V9;
	V9= (V8);
	V5= ((V9))->c.c_car;}
	/* END (CAR X)*/}
	/* END (FIRST X)*/
	/*(SECOND X)*/
	{object V10;
	V10= (V4);
	/*(CADR X)*/
	{object V11;
	V11= (V10);
	/*(CAR (CDR X))*/
	{object V12;
	{object V13;
	/*(CDR X)*/
	{object V14;
	V14= (V11);
	V13= ((V14))->c.c_cdr;}
	/* END (CDR X)*/
	V12=V13;}
	V6= ((V12))->c.c_car;}
	/* END (CAR (CDR X))*/}
	/* END (CADR X)*/}
	/* END (SECOND X)*/
	/*(THIRD X)*/
	{object V15;
	V15= (V4);
	/*(CADDR X)*/
	{object V16;
	V16= (V15);
	/*(CAR (CDDR X))*/
	{object V12;
	{object V17;
	/*(CDDR X)*/
	{object V18;
	V18= (V16);
	/*(CDR (CDR X))*/
	{object V19;
	{object V20;
	/*(CDR X)*/
	{object V14;
	V14= (V18);
	V20= ((V14))->c.c_cdr;}
	/* END (CDR X)*/
	V19=V20;}
	V17= ((V19))->c.c_cdr;}
	/* END (CDR (CDR X))*/}
	/* END (CDDR X)*/
	V12=V17;}
	V7= ((V12))->c.c_car;}
	/* END (CAR (CDDR X))*/}
	/* END (CADDR X)*/}
	/* END (THIRD X)*/
	{object V21;
	if (V3) {
	V21= (VFUN_NARGS=3,(/* DO-STUFF */(*LnkLI0)(V3,(V5),(V6),(V7))));
	} else {
	V21= (VFUN_NARGS=3,(/* DO-STUFF */(*LnkLI0)((fixnum)0,(V5),(V6),(V7))));
	vs_top=base;
	}
	return(V21);}}
	base[0]=base[0];
	return Cnil;
}

>(disassemble '(lambda (x) (declare (optimize (safety 2)))(let ((foo (first x)) (bar (second x)) (baz (third x))) (do-stuff foo bar baz))) nil)

;; Compiling /tmp/gazonk_14987_0.lsp.
;; End of Pass 1.  
;; End of Pass 2.  
;; OPTIMIZE levels: Safety=0 (No runtime error checking), Space=0, Speed=3, (Debug quality ignored)
;; Finished compiling /tmp/gazonk_14987_0.o.

#include "gazonk_14987_0.h"
void init_code(){do_init((void *)VV);}
/*	local entry for function CMP-ANON	*/

static object LI1__CMP_ANON___gazonk_14987_0(V3,V4)

fixnum V3;object V4;
{	 VMB1 VMS1 VMV1
	goto TTL;
TTL:;
	{object V5;
	object V6;
	object V7;
	/*(FIRST X)*/
	{register object V8;
	V8= (V4);
	if(!(listp((V8)))){
	goto T6;}
	goto T4;
	goto T6;
T6:;
	V8= (VFUN_NARGS=4,(/* CHECK-TYPE-SYMBOL */(*LnkLI2)(((object)VV[0]),(V8),((object)VV[1]),Cnil)));
	goto T4;
T4:;
	{object V9;
	V9= (V8);
	/*(CAR X)*/
	{register object V10;
	V10= (V9);
	{object V11;
	V11= (V10);
	V5= ((V11))->c.c_car;}}
	/* END (CAR X)*/}}
	/* END (FIRST X)*/
	/*(SECOND X)*/
	{register object V12;
	V12= (V4);
	if(!(listp((V12)))){
	goto T17;}
	goto T15;
	goto T17;
T17:;
	V12= (VFUN_NARGS=4,(/* CHECK-TYPE-SYMBOL */(*LnkLI2)(((object)VV[0]),(V12),((object)VV[1]),Cnil)));
	goto T15;
T15:;
	{object V13;
	V13= (V12);
	/*(CADR X)*/
	{register object V14;
	V14= (V13);
	{object V15;
	V15= (V14);
	/*(CAR (CDR X))*/
	{register object V16;
	{object V17;
	/*(CDR X)*/
	{register object V18;
	V18= (V15);
	{object V19;
	V19= (V18);
	V17= ((V19))->c.c_cdr;}}
	/* END (CDR X)*/
	V16=V17;}
	if(!(listp((V16)))){
	goto T32;}
	goto T30;
	goto T32;
T32:;
	V16= (VFUN_NARGS=4,(/* CHECK-TYPE-SYMBOL */(*LnkLI2)(((object)VV[0]),(V16),((object)VV[1]),Cnil)));
	goto T30;
T30:;
	{object V20;
	V20= (V16);
	V6= ((V20))->c.c_car;}}
	/* END (CAR (CDR X))*/}}
	/* END (CADR X)*/}}
	/* END (SECOND X)*/
	/*(THIRD X)*/
	{register object V21;
	V21= (V4);
	if(!(listp((V21)))){
	goto T39;}
	goto T37;
	goto T39;
T39:;
	V21= (VFUN_NARGS=4,(/* CHECK-TYPE-SYMBOL */(*LnkLI2)(((object)VV[0]),(V21),((object)VV[1]),Cnil)));
	goto T37;
T37:;
	{object V22;
	V22= (V21);
	/*(CADDR X)*/
	{register object V23;
	V23= (V22);
	{object V24;
	V24= (V23);
	/*(CAR (CDDR X))*/
	{register object V25;
	{object V26;
	/*(CDDR X)*/
	{register object V27;
	V27= (V24);
	{object V28;
	V28= (V27);
	/*(CDR (CDR X))*/
	{register object V29;
	{object V30;
	/*(CDR X)*/
	{register object V18;
	V18= (V28);
	{object V19;
	V19= (V18);
	V30= ((V19))->c.c_cdr;}}
	/* END (CDR X)*/
	V29=V30;}
	if(!(listp((V29)))){
	goto T59;}
	goto T57;
	goto T59;
T59:;
	V29= (VFUN_NARGS=4,(/* CHECK-TYPE-SYMBOL */(*LnkLI2)(((object)VV[0]),(V29),((object)VV[1]),Cnil)));
	goto T57;
T57:;
	{object V31;
	V31= (V29);
	V26= ((V31))->c.c_cdr;}}
	/* END (CDR (CDR X))*/}}
	/* END (CDDR X)*/
	V25=V26;}
	if(!(listp((V25)))){
	goto T65;}
	goto T63;
	goto T65;
T65:;
	V25= (VFUN_NARGS=4,(/* CHECK-TYPE-SYMBOL */(*LnkLI2)(((object)VV[0]),(V25),((object)VV[1]),Cnil)));
	goto T63;
T63:;
	{object V32;
	V32= (V25);
	V7= ((V32))->c.c_car;}}
	/* END (CAR (CDDR X))*/}}
	/* END (CADDR X)*/}}
	/* END (THIRD X)*/
	{object V33;
	if (V3) {
	V33= (VFUN_NARGS=3,(/* DO-STUFF */(*LnkLI3)(V3,(V5),(V6),(V7))));
	} else {
	V33= (VFUN_NARGS=3,(/* DO-STUFF */(*LnkLI3)((fixnum)0,(V5),(V6),(V7))));
	vs_top=base;
	}
	return(V33);}}
	base[0]=base[0];
	return Cnil;
}

>

Given the tentative redefinition of the safety settings, checks on
each function call are only enabled at safety >=2.  GCL binds the
lambda list source via a 'let, so the check-type works on the original
source variable, not the surrounding variable.  (The check-type does an
implicit setq for the type, which does automatically propagate to
references further down in the source body.)  GCL has support for
reverse type propagation, in which the binding mentioned above
triggers an implicit setq for the surrounding variable, but it appears
hard to justify turning this on at safety 2.  Currently it is turned
on at safety 0,1.  The idea being that at safety 2 one gets the full
checking effect as if the inlining was not performed -- the code above
would trigger three errors if passed a non-list which the user then
reset in the interactive check-type code and each call was a genuine
function call.

I'm writing this too quickly, so the above might sound opaque.  I can
clarify if there is interest.  I just bring it up to poll users'
opinions of what behavior would be useful at various safety settings.

Here is the reverse type propagation in action:

(disassemble '(lambda (x) (first x) (listp x)) nil)

;; Compiling /tmp/gazonk_14987_0.lsp.
;; End of Pass 1.  
;; End of Pass 2.  
;; OPTIMIZE levels: Safety=0 (No runtime error checking), Space=0, Speed=3, (Debug quality ignored)
;; Finished compiling /tmp/gazonk_14987_0.o.

#include "gazonk_14987_0.h"
void init_code(){do_init((void *)VV);}
/*	local entry for function CMP-ANON	*/

static object LI1__CMP_ANON___gazonk_14987_0(V2)

object V2;
{	 VMB1 VMS1 VMV1
	goto TTL;
TTL:;
	/*(FIRST X)*/
	{object V3;
	V3= (V2);
	/*(CAR X)*/
	{object V4;
	V4= (V3);
	(void)(((V4))->c.c_car);}
	/* END (CAR X)*/}
	/* END (FIRST X)*/
	{object V5 = Ct;VMR1
	(V5);}
	return Cnil;
}

>(disassemble '(lambda (x) (declare (optimize (safety 1)))(first x) (listp x)) nil)

;; Compiling /tmp/gazonk_14987_0.lsp.
;; End of Pass 1.  
;; End of Pass 2.  
;; OPTIMIZE levels: Safety=0 (No runtime error checking), Space=0, Speed=3, (Debug quality ignored)
;; Finished compiling /tmp/gazonk_14987_0.o.

#include "gazonk_14987_0.h"
void init_code(){do_init((void *)VV);}
/*	local entry for function CMP-ANON	*/

static object LI1__CMP_ANON___gazonk_14987_0(V2)

object V2;
{	 VMB1 VMS1 VMV1
	goto TTL;
TTL:;
	/*(FIRST X)*/
	{object V3;
	V3= (V2);
	/*(CAR X)*/
	{object V4;
	V4= (V3);
	(void)(((V4))->c.c_car);}
	/* END (CAR X)*/}
	/* END (FIRST X)*/
	{object V5 = Ct;VMR1
	(V5);}
	return Cnil;
}

>(disassemble '(lambda (x) (declare (optimize (safety 2)))(first x) (listp x)) nil)

;; Compiling /tmp/gazonk_14987_0.lsp.
;; End of Pass 1.  
;; End of Pass 2.  
;; OPTIMIZE levels: Safety=0 (No runtime error checking), Space=0, Speed=3, (Debug quality ignored)
;; Finished compiling /tmp/gazonk_14987_0.o.

#include "gazonk_14987_0.h"
void init_code(){do_init((void *)VV);}
/*	local entry for function CMP-ANON	*/

static object LI1__CMP_ANON___gazonk_14987_0(V2)

object V2;
{	 VMB1 VMS1 VMV1
	goto TTL;
TTL:;
	/*(FIRST X)*/
	{register object V3;
	V3= (V2);
	if(!(listp((V3)))){
	goto T6;}
	goto T4;
	goto T6;
T6:;
	V3= (VFUN_NARGS=4,(/* CHECK-TYPE-SYMBOL */(*LnkLI2)(((object)VV[0]),(V3),((object)VV[1]),Cnil)));
	goto T4;
T4:;
	{object V4;
	V4= (V3);
	/*(CAR X)*/
	{register object V5;
	V5= (V4);
	{object V6;
	V6= (V5);
	(void)(((V6))->c.c_car);}}
	/* END (CAR X)*/}}
	/* END (FIRST X)*/
	{object V7 = (listp((V2))?Ct:Cnil);VMR1
	(V7);}
	return Cnil;
}

>


> Where first, second, and third are macros expanding to the appropriate
> nest of MILAWA::car and MILAWA::cdr's.  Inlining my car and cdr here
> will expose the (consp x) test in each binding, and the compiler can
> then get rid of these redundnat checks.
> 

I hope the above indicates we are competitive here.

> I believe that on CMUCL (and I think also SBCL), these two kinds of
> inlining give me almost a 50% speedup when running in raw lisp to
> check proofs.  And, the last time I compared these Lisps against GCL
> for my program (which was a long time ago), CMUCL was only taking 4
> seconds, SBCL was taking about 6 seconds, and GCL 2.6.7 was taking 11
> seconds to check some proofs with my system.
> 

I'd love to see some simple code I could benchmark.

> I think OpenMCL also has some support for inlinining, but I found it
> buggy in the last pre-release and as a result it was unusable.  I
> reported this to Gary Byers, and I think he's fixed it in CVS, and
> perhaps it's available in some release I'm not using yet.
> 
> I don't have a lot of intuition about what ought to be inlined by
> default.  Certainly anything relatively "small" and nonrecursive would
> make a fine candidate.  I also had an idle thought awhile back that

What really interests me is that the inlining could provide for some
loop unrolling of tail recursive functions.  GCL carries list length
information around causing same to terminate in some cases.  So far we
punt if the recursion is > 2 deep.  This is to support the likes of
mapl:

>(si::function-src 'mapl)

(LAMBDA (SYSTEM::F SYSTEM::L1 &REST SYSTEM::L)
  (DECLARE (OPTIMIZE (SAFETY 2)) (:DYNAMIC-EXTENT SYSTEM::L))
  (CHECK-TYPE SYSTEM::L1 SYSTEM:PROPER-LIST)
  (BLOCK MAPL
    (BLOCK ()
      (LET ((SYSTEM::R SYSTEM::L1) (SYSTEM::L1 SYSTEM::L1)
            (SYSTEM::L SYSTEM::L))
        (TAGBODY
          #:G4392
          (IF (NOT (OR (ENDP SYSTEM::L1) (MEMBER-IF 'ENDP SYSTEM::L)))
              (PROGN
                (TAGBODY (APPLY SYSTEM::F SYSTEM::L1 SYSTEM::L))
                (LET* ((#:G4393 (CDR SYSTEM::L1))
                       (#:G4394 (MAPL (LAMBDA (SYSTEM::X)
                                        (LET*
                                         ((#:G4395 SYSTEM::X)
                                          (#:G4396 (CDAR SYSTEM::X)))
                                          (PROGN
                                            (RPLACA #:G4395 #:G4396)
                                            #:G4396)))
                                      SYSTEM::L)))
                  (SETQ SYSTEM::L1 #:G4393)
                  (SETQ SYSTEM::L #:G4394)
                  NIL)
                (GO #:G4392))
              (RETURN-FROM () (PROGN SYSTEM::R))))))))

>(disassemble '(lambda (x) (mapl 'cdr x)) nil)

;; Compiling /tmp/gazonk_14987_0.lsp.
;; End of Pass 1.  
;; End of Pass 2.  
;; OPTIMIZE levels: Safety=0 (No runtime error checking), Space=0, Speed=3, (Debug quality ignored)
;; Finished compiling /tmp/gazonk_14987_0.o.

#include "gazonk_14987_0.h"
void init_code(){do_init((void *)VV);}
/*	local entry for function CMP-ANON	*/

static object LI1__CMP_ANON___gazonk_14987_0(V2)

object V2;
{	 VMB1 VMS1 VMV1
	goto TTL;
TTL:;
	/*(MAPL 'CDR X)*/
	{object V3;
	object V4;
	V3= (V2);
	V4= Cnil;
	{register object V5;
	register object V6;
	register object V7;
	V5= (V3);
	V6= (V3);
	V7= Cnil;
	goto T3;
T3:;
	/*(ENDP L1)*/
	{register object V8;
	V8= (V6);
	if(((V8))==Cnil){
	goto T6;}}
	/* END (ENDP L1)*/
	/*(MEMBER-IF 'ENDP L)*/
	/*(MEMBER ITEM LIST TEST 'FUNCALL KEY KEY)*/
	/*(ENDP LIST)*/
	/* END (ENDP LIST)*/
	goto T9;
	/* END (MEMBER ITEM LIST TEST 'FUNCALL KEY KEY)*/
	/* END (MEMBER-IF 'ENDP L)*/
	goto T9;
T9:;
	{register object V9;
	V9= (V6);
	/*(CDR G2507)*/
	{register object V10;
	V10= (V9);
	(void)(((V10))->c.c_cdr);}
	/* END (CDR G2507)*/}
	{register object V11;
	register object V12;
	{object V13;
	/*(CDR L1)*/
	{register object V10;
	V10= (V6);
	V13= ((V10))->c.c_cdr;}
	/* END (CDR L1)*/
	V11=V13;}
	{object V14;
	/*(MAPL (LAMBDA (X)
        (LET* ((G4395 X) (G4396 (CDAR X)))
          (PROGN (RPLACA G4395 G4396) G4396)))
      L)*/
	{register object V15;
	V15= Cnil;
	goto T26;
	goto T24;
	goto T26;
T26:;
	/*(ENDP L1)*/
	/* END (ENDP L1)*/
	goto T24;
T24:;
	V14= Cnil;
	goto T20;
	V14= Cnil;}
	/* END (MAPL (LAMBDA (X)
        (LET* ((G4395 X) (G4396 (CDAR X)))
          (PROGN (RPLACA G4395 G4396) G4396)))
      L)*/
	goto T20;
T20:;
	V12=V14;}
	V6= (V11);
	V7= Cnil;}
	goto T3;
	goto T6;
T6:;
	{object V16 = (V5);VMR1
	(V16);}
	{object V17 = Cnil;VMR1
	(V17);}}}
	/* END (MAPL 'CDR X)*/
	return Cnil;
}

>

> mutually recursive functions might be unwound to expose tail
> recursion, e.g., think of datatype such as
>   TERM ::= ATOM | (FN TERM1 ... TERMN)

Here is something I really like -- hope this could be useful:

>(defun foo (x &optional (y 0)) (if x (bar (cdr x) (1+ y)) y))

FOO

>(defun bar (x &optional (y 0)) (if x (foo (cdr x) (1+ y)) y))

BAR

>(foo '(1 2 3))

3

>(compile 'foo)

;; Compiling /tmp/gazonk_14987_0.lsp.
;; End of Pass 1.  
;; End of Pass 2.  
;; OPTIMIZE levels: Safety=0 (No runtime error checking), Space=0, Speed=3, (Debug quality ignored)
;; Finished compiling /tmp/gazonk_14987_0.o.
;; Loading /tmp/gazonk_14987_0.o
 ;; start address -T 0x107e7f8 ;; Finished loading /tmp/gazonk_14987_0.o
#<compiled-function FOO>
NIL
NIL

>(compile 'bar)

;; Compiling /tmp/gazonk_14987_0.lsp.
;; End of Pass 1.  
;; End of Pass 2.  
;; OPTIMIZE levels: Safety=0 (No runtime error checking), Space=0, Speed=3, (Debug quality ignored)
;; Finished compiling /tmp/gazonk_14987_0.o.
;; Loading /tmp/gazonk_14987_0.o
Pass1 signature discovery on 1 functions ...
Mutual recursion detected: (FOO BAR), recompiling BAR2512
Pass1 signature discovery on 1 functions ...
Pass1 signature discovery on 2 functions ...
Compiling and loading new source in #<output stream "/tmp/gazonk_14987_X0QEXb.lsp">
;; Compiling /tmp/gazonk_14987_X0QEXb.lsp.
;; End of Pass 1.  
;; End of Pass 2.  
;; OPTIMIZE levels: Safety=0 (No runtime error checking), Space=0, Speed=3, (Debug quality ignored)
;; Finished compiling /tmp/gazonk_14987_X0QEXb.o.
;; Loading /tmp/gazonk_14987_X0QEXb.o
 ;; start address -T 0x10ab828 ;; Finished loading /tmp/gazonk_14987_X0QEXb.o
 ;; start address -T 0x10db668 ;; Finished loading /tmp/gazonk_14987_0.o
#<compiled-function BAR>
NIL
NIL

>(disassemble 'BAR2512 nil)

;; Compiling /tmp/gazonk_14987_0.lsp.
;; End of Pass 1.  
;; End of Pass 2.  
;; OPTIMIZE levels: Safety=0 (No runtime error checking), Space=0, Speed=3, (Debug quality ignored)
;; Finished compiling /tmp/gazonk_14987_0.o.

#include "gazonk_14987_0.h"
void init_code(){do_init((void *)VV);}
/*	local entry for function BAR2512	*/

static object LI1__BAR2512___gazonk_14987_0(object V2,object V1,object first,...)
{	
	va_list ap;
	int narg = VFUN_NARGS; VMB1 VMS1 VMV1
	{fixnum V3;
	register object V4;
	register object V5;
	va_start(ap,first);
	V3= fix(V2);
	V4= V1;
	narg -= 2;
	if (narg <= 0) goto T1;
	else {
	V5= first;}
	--narg; goto T2;
	goto T1;
T1:;
	V5= Cnil;
	goto T2;
T2:;
	goto TTL;
TTL:;switch(V3){
	case 0:
	goto T5;
T5:;
	{register object V6;
	register object V7;
	V6= (V4);
	V7= (V5);
	if(((V6))==Cnil){
	goto T8;}
	V3= (fixnum)1;
	/*(CDR X)*/
	{register object V8;
	V8= (V6);
	V4= ((V8))->c.c_cdr;}
	/* END (CDR X)*/
	V5= immnum_plus((V7),make_fixnum(1));
	goto TTL;
	goto T8;
T8:;
	{object V9 = (V7);VMR1
	(V9);}}
	default:
	goto T6;
T6:;
	{register object V10;
	register object V11;
	V10= (V4);
	V11= (V5);
	if(((V10))==Cnil){
	goto T16;}
	V3= (fixnum)0;
	/*(CDR X)*/
	{register object V8;
	V8= (V10);
	V4= ((V8))->c.c_cdr;}
	/* END (CDR X)*/
	V5= immnum_plus((V11),make_fixnum(1));
	goto TTL;
	goto T16;
T16:;
	{object V12 = (V11);VMR1
	(V12);}}
	{object V13 = Cnil;VMR1
	(V13);}}
	{object V14 = Cnil;VMR1
	(V14);}
	va_end(ap);
	return Cnil;}
	}
>(disassemble 'foo nil)

;; Compiling /tmp/gazonk_14987_0.lsp.
;; End of Pass 1.  
;; End of Pass 2.  
;; OPTIMIZE levels: Safety=0 (No runtime error checking), Space=0, Speed=3, (Debug quality ignored)
;; Finished compiling /tmp/gazonk_14987_0.o.

#include "gazonk_14987_0.h"
void init_code(){do_init((void *)VV);}
/*	local entry for function FOO	*/

static object LI1__FOO___gazonk_14987_0(object V1,object first,...)
{	
	va_list ap;
	int narg = VFUN_NARGS; VMB1 VMS1 VMV1
	{object V2;
	object V3;
	va_start(ap,first);
	V2= V1;
	narg -= 1;
	if (narg <= 0) goto T1;
	else {
	V3= first;}
	--narg; goto T2;
	goto T1;
T1:;
	V3= make_fixnum(0);
	goto T2;
T2:;
	goto TTL;
TTL:;
	{object V4 = (VFUN_NARGS=3,(/* BAR2512 */(*LnkLI0)(make_fixnum(0),(V2),(V3))));VMR1
	(V4);}
	va_end(ap);
	return Cnil;}
	}
>(disassemble 'bar nil)

;; Compiling /tmp/gazonk_14987_0.lsp.
;; End of Pass 1.  
;; End of Pass 2.  
;; OPTIMIZE levels: Safety=0 (No runtime error checking), Space=0, Speed=3, (Debug quality ignored)
;; Finished compiling /tmp/gazonk_14987_0.o.

#include "gazonk_14987_0.h"
void init_code(){do_init((void *)VV);}
/*	local entry for function BAR	*/

static object LI1__BAR___gazonk_14987_0(object V1,object first,...)
{	
	va_list ap;
	int narg = VFUN_NARGS; VMB1 VMS1 VMV1
	{object V2;
	object V3;
	va_start(ap,first);
	V2= V1;
	narg -= 1;
	if (narg <= 0) goto T1;
	else {
	V3= first;}
	--narg; goto T2;
	goto T1;
T1:;
	V3= make_fixnum(0);
	goto T2;
T2:;
	goto TTL;
TTL:;
	{object V4 = (VFUN_NARGS=3,(/* BAR2512 */(*LnkLI0)(make_fixnum(1),(V2),(V3))));VMR1
	(V4);}
	va_end(ap);
	return Cnil;}
	}


> If you were to define a recognizer for such a function, it would
> probably take the form of a mutual-recursion with a TERM recognizer
> and a TERM LIST recognizer.  If you just inline the term recognizer,
> you'll find the resulting term-list recognizer to be entirely tail
> recursive, which is potentially a big win.  But I'm using flag
> functions instead now, so I don't know if this actually works in these
> other Lisps.

Don't know if you are familiar with the takr benchmark, but this
obviously really helps here.  How useful it is in real code is still
unclear to me.  It is a little bit of scheme that GCL has brought into
CL. 

> 
> One other idea is, I think, called "fusion" in the literature.  It's
> not quite the same as inlining, but it's sort of similar in spirit.
> Imagine expresions such as:
>    power4 x = map square (map square x)
> one way to compile power4 would be to have it create an intermediate
> list of squares, then a result list of power4's.  A smarter way would
> be to basically simplify the definition to "map (square square) x", so
> that consing together the intermediate list can be avoided.  I really
> like to write ACL2 code that operates by filtering, mapping, and
> folding operations over lists, but sometimes I have to "manually"
> perform the folding and prove that my folded function is the same as
> some combination of these higher order functions for efficiency's
> sake.  Introducing accumulators as appropriate would seem to fall into
> this category.  If GCL could automatically handle this kind of thing,
> it would be great, but this is probably all much further down the road
> and much harder to do in an impure language like Common Lisp.  That
> is, how do we know "square" is side-effect free?
> 

I am taking a stab at propagating a side-effect-free property
automatically.  But I don't yet see how to implement something like
the above ....

> Anyway, I'm just excited for GCL to support function inlining via
> declaims.  I don't really care what it does automatically, because I
> don't use much Lisp beyond whatever is provided by ACL2.  Maybe
> functions like "length" could be implemented as inlined wrappers of
> the form:
>   (if (stringp x) (nasty_c_code::string-length x) (nasty_c_code::list-length x))
> but probably all of you understand these kinds of details much better than me.
> 

Yep:

>(disassemble 'length nil)

;; Compiling /tmp/gazonk_14987_0.lsp.
;; End of Pass 1.  
;; End of Pass 2.  
;; OPTIMIZE levels: Safety=0 (No runtime error checking), Space=0, Speed=3, (Debug quality ignored)
;; Finished compiling /tmp/gazonk_14987_0.o.

#include "gazonk_14987_0.h"
void init_code(){do_init((void *)VV);}
/*	local entry for function LENGTH	*/

static fixnum LI1__LENGTH___gazonk_14987_0(V2)

register object V2;
{	 VMB1 VMS1 VMV1
	goto TTL;
TTL:;
	if(((/* PROPER-SEQUENCEP */(*LnkLI3)((V2))))==Cnil){
	goto T5;}
	goto T3;
	goto T5;
T5:;
	V2= (VFUN_NARGS=4,(/* CHECK-TYPE-SYMBOL */(*LnkLI4)(((object)VV[0]),(V2),((object)VV[1]),Cnil)));
	goto T3;
T3:;
	{register object V3;
	V3= (V2);
	if(!(listp((V3)))){
	goto T8;}
	/*(ENDP X)*/
	{register object V4;
	V4= (V3);
	{register object V5;
	V5= (V4);
	if(!(((V5))==Cnil)){
	goto T11;}}}
	/* END (ENDP X)*/
	{fixnum V6 = (fixnum)0;VMR1
	(V6);}
	goto T11;
T11:;
	/*(ENDP (SETQ X (CDR X)))*/
	{register object V7;
	/*(CDR X)*/
	{register object V8;
	V8= (V3);
	{register object V9;
	V9= (V8);
	V3= ((V9))->c.c_cdr;}}
	/* END (CDR X)*/
	V7= (V3);
	{object V10;
	V10= (V7);
	if(!(((V10))==Cnil)){
	goto T18;}}}
	/* END (ENDP (SETQ X (CDR X)))*/
	{fixnum V11 = (fixnum)1;VMR1
	(V11);}
	goto T18;
T18:;
	/*(ENDP (SETQ X (CDR X)))*/
	{register object V7;
	/*(CDR X)*/
	{register object V8;
	V8= (V3);
	{register object V9;
	V9= (V8);
	V3= ((V9))->c.c_cdr;}}
	/* END (CDR X)*/
	V7= (V3);
	{object V10;
	V10= (V7);
	if(!(((V10))==Cnil)){
	goto T31;}}}
	/* END (ENDP (SETQ X (CDR X)))*/
	{fixnum V12 = (fixnum)2;VMR1
	(V12);}
	goto T31;
T31:;
	/*(ENDP (SETQ X (CDR X)))*/
	{register object V7;
	/*(CDR X)*/
	{register object V8;
	V8= (V3);
	{register object V9;
	V9= (V8);
	V3= ((V9))->c.c_cdr;}}
	/* END (CDR X)*/
	V7= (V3);
	{object V10;
	V10= (V7);
	if(!(((V10))==Cnil)){
	goto T44;}}}
	/* END (ENDP (SETQ X (CDR X)))*/
	{fixnum V13 = (fixnum)3;VMR1
	(V13);}
	goto T44;
T44:;
	/*(ENDP (SETQ X (CDR X)))*/
	{register object V7;
	/*(CDR X)*/
	{register object V8;
	V8= (V3);
	{register object V9;
	V9= (V8);
	V3= ((V9))->c.c_cdr;}}
	/* END (CDR X)*/
	V7= (V3);
	{object V10;
	V10= (V7);
	if(!(((V10))==Cnil)){
	goto T57;}}}
	/* END (ENDP (SETQ X (CDR X)))*/
	{fixnum V14 = (fixnum)4;VMR1
	(V14);}
	goto T57;
T57:;
	{register fixnum V15;
	register object V16;
	V15= (fixnum)5;
	/*(CDR X)*/
	{register object V8;
	V8= (V3);
	{register object V9;
	V9= (V8);
	V16= ((V9))->c.c_cdr;}}
	/* END (CDR X)*/
	goto T76;
T76:;
	/*(ENDP X)*/
	{register object V4;
	V4= (V16);
	{register object V5;
	V5= (V4);
	if(((V5))==Cnil){
	goto T79;}}}
	/* END (ENDP X)*/
	{register fixnum V17;
	register object V18;
	V19 = V15;
	V17= (fixnum)(V19)+((fixnum)1);
	{object V20;
	/*(CDR X)*/
	{register object V8;
	V8= (V16);
	{register object V9;
	V9= (V8);
	V20= ((V9))->c.c_cdr;}}
	/* END (CDR X)*/
	V18=V20;}
	V15= V17;
	V16= (V18);}
	goto T76;
	goto T79;
T79:;
	{fixnum V21 = V15;VMR1
	(V21);}
	{fixnum V22 = fixint(Cnil);VMR1
	(V22);}}
	goto T8;
T8:;
	{fixnum V23 = (fixnum)((V3))->v.v_fillp;VMR1
	(V23);}}
	base[0]=base[0];
}


GCL unrolls the first few iterations to count short list lengths, here
<= 5. 

> I guess as a parting thought, it's not really the size of the code
> that you care about.  I mean, you really want to inline if:
>    (1) it allows you to avoid the overhead of many function calls, or

Absolutely critical in lisp it seems.

>    (2) it allows you to see the sort of "repeated subterm"
> optimizations I mentioned in the MILAWA::CAR example.
> 

So far we have no automatic repeated subexpression simplifier, but I
have discussed this with Boyer in the past.

> The only downside of inlining is the disk space needed for the
> executable, and really who cares?  I mean, gosh, I bought a 250
> GIGABYTE hard drive awhile ago for something like $70.  If disk space
> is like 30 cents per GB (and dropping), then I'm happy for GCL to use
> several GB of disk space if it wants to.
> 

Thanks for the data point ...

Quite surprising to me -- code is often smaller when inlined.

> Maybe as a first cut, you could try to heuristically guess at these
> things.  We could imagine labelling a function as BIG if it is
> recursive or calls any BIG function on a non-constant value.  Non-big
> functions could be called SMALL.  A simple heuristic would be to
> always inline SMALL functions.  Maybe this isn't good enough, maybe
> you want to also have REALLYBIG functions that recursively call a BIG
> function.  Then you could unwind BIG functions up to, say, 5 times,
> depending on the sheer code size, and never unwind REALLYBIG
> functions.

Here is the current heuristic based on *space*

(defun acceptable-inline (h form tpis)
  (let* ((c1 (car h))
	 (sz (cadr h))
	 (d (and c1
		 (inline-possible (car form))
		 (or (< sz (* 1000 (- 3 (max 0 *space*))))
		     (and (< *space* 3) (member-if (lambda (x) (and (atomic-tp (car x)) (functionp (cadar x)))) tpis))))))

(sz is rougly the number of conses in the pass1 output)
(if an explicit function is passed as an arg, inline regardless of *space*)


> 
> But by now I'm talking nonsense.  Maybe the only good advice is to
> keep the autoamatic-inlining policy somehow flexible, so it can be
> changed after testing shows how we failed to handle some case well.
> We could also ask the SBCL or CMUCL lists, or Gary Byers, about their
> strategy for inlining and what experience led them to that strategy.

I imagine one wants to turn it off without going to safety 3.  Right
now, *speed* 0 will do it too, but this is really awful....

Take care,

> 
> Cheers,
>     Jared
> 
> On 7/6/07, Matt Kaufmann <kaufmann at cs.utexas.edu> wrote:
> > Hi, Rob and Jared --
> >
> > Rob, didn't you once tell me that Allegro ignores function inlining
> > proclamations?  Jared, didn't you once tell me that CMUCL does really
> > well with function inlining?  Below are relevant emails (starting with
> > the one that started this thread) in case either of you is interested.
> >
> > -- Matt
> > ============================================================
> >
> > To: gcl-devel at gnu.org
> > cc: boyer at cs.utexas.edu, Matt Kaufmann <kaufmann at cs.utexas.edu>,
> >         daly at axiom-developer.org, axiom-developer at nongnu.org,
> >         "Robert Dodier" <robert.dodier at gmail.com>,
> >         Maxima mailing list <Maxima at math.utexas.edu>
> > Subject: gcl inlining policy
> > From: Camm Maguire <camm at enhanced.com>
> > Date: Fri, 06 Jul 2007 12:58:02 -0400
> > X-SpamAssassin-Status: No, hits=-2.5 required=5.0
> > X-UTCS-Spam-Status: No, hits=-280 required=200
> >
> > Greetings!  If you would be so kind, I'd appreciate some direction on
> > the desired inlining policy fro gcl 2.7.0.
> >
> > GCL has always inlined certain functions, e.g. mapcar, member, etc.
> > Without this, performance would be noticeably worse.
> >
> > Traditionally, this has been accomplished though ad-hoc 'c1 functions
> > in the compiler, or C-snippet inline attributes in cmpopt.lsp.  Thus
> > one had not only to write the function, but its inline support in the
> > compiler as well.  Support for these remain, but thanks to Boyer's
> > suggestion to carry the original source around in the image, automatic
> > inlining is now possible.  Support for this is in place in 2.7 with
> > notable improvements in performance.
> >
> > The question is, what functions should be inlined by default?
> > Currently, symbols external to the lisp package are automatically
> > considered. Explicitly declared or declaimed functions are also.
> > inlining may decline if the inline is too large, using a heuristic
> > function using the *space* setting as input, though in many cases,
> > inlined code is smaller.  The real cost of inlining is compiler speed,
> > and tracing support.
> >
> > Boyer recently posted a benchmark which would benefit greatly from
> > automatic inlining of user functions.
> >
> > Separately, should inlining be allowed at safety 3?
> >
> > Take care,
> > --
> > Camm Maguire                                            camm at enhanced.com
> > ==========================================================================
> > "The earth is but one country, and mankind its citizens."  --  Baha'u'llah
> >
> > ------- Start of forwarded message -------
> > X-YMail-OSG: dfiyZd8VM1nP5PL6C0Yh5UuU_.FW78sSgGm09IOVpI8fTvWYGSsFiTgB9EbYmCi0WHN3WTCiw6gYbouFtYpFf8mA4RHd7wQsTusEKt1fU6GgHDK96gV4wXdJLWta
> > Reply-To: <fateman at cs.berkeley.edu>
> > From: "Richard Fateman" <fateman at cs.berkeley.edu>
> > To: "'Raymond Toy'" <raymond.toy at ericsson.com>, <daly at axiom-developer.org>
> > Cc: <boyer at cs.utexas.edu>, "'Camm Maguire'" <camm at debian.org>,
> >         "'Robert Dodier'" <robert.dodier at gmail.com>,
> >         "'Matt Kaufmann'" <kaufmann at cs.utexas.edu>, <Maxima at math.utexas.edu>,
> >         <axiom-developer at nongnu.org>
> > Subject: RE: [Maxima] gcl inlining policy
> > Date: Fri, 6 Jul 2007 14:21:44 -0700
> > Organization: UC Berkeley
> > X-MIMEOLE: Produced By Microsoft MimeOLE V6.00.2900.3138
> > Thread-index: AcfAEG3zIg6uXqNEThWEWqdh2j5r/gAAi7gg
> > In-reply-to: <468EAC40.2070903 at ericsson.com>
> > X-SpamAssassin-Status: No, hits=-2.0 required=5.0
> > X-UTCS-Spam-Status: No, hits=-198 required=200
> >
> > It seems to me that in-lining a function of any significant length is very
> > likely to be a mistake, though I suppose it depends on the memory cache
> > management.  If the in-lining is done to remove consing, then perhaps the
> > solution is to address the consing itself.  I don't know the exact
> > circumstances of the needed optimization, but if it is arithmetic-related,
> > perhaps this can be done by stack allocation of temporary numbers.  I
> > haven't tried timing of ACL2 in Allegro Common Lisp, but it might be
> > revealing to do so.  It is my understanding that Allegro mostly, if not
> > entirely, ignores inline declarations of user-defined functions.
> > I have been told that this was a deliberate decision after a careful
> > consideration of the semantic consequences of inlining to the CL standard
> > versus the expected gain in performance.
> >
> > RJF
> >
> >
> > > -----Original Message-----
> > > From: maxima-bounces at math.utexas.edu
> > > [mailto:maxima-bounces at math.utexas.edu] On Behalf Of Raymond Toy
> > > Sent: Friday, July 06, 2007 1:55 PM
> > > To: daly at axiom-developer.org
> > > Cc: boyer at cs.utexas.edu; Camm Maguire; Robert Dodier; Matt
> > > Kaufmann; Maxima at math.utexas.edu; axiom-developer at nongnu.org
> > > Subject: Re: [Maxima] gcl inlining policy
> > >
> > > daly at axiom-developer.org wrote:
> > > >> As for space optimization I'm not sure it matters. I can't
> > > think of a
> > > > case where inlining will happen so many times that space is
> > > an issue.
> > > > Who writes a function with 50 mapcars? For space
> > > optimization it might
> > > > be better to throw away the cached sources.
> > >
> > > Here is an example where there is LOTS of inlining.  CMUCL supports
> > > double-double floats and has support for unboxed
> > > double-double floats.
> > > However, to make things fast, the code that implements the basic
> > > arithmetic operations is inlined.  So the routine that
> > > calculates log(x)
> > > via rational approximation has inlined lots of routines,
> > > perhaps 50 or
> > > more arithmetic operations.  A double-double add takes about
> > > 200 bytes.
> > >   The space used is a lot.
> > >
> > > I also have a quad-double implementation.  That is even
> > > larger because a
> > > quad-double operation is some 1K bytes long.  But if you don't inline
> > > the basic arithmetic function and all the internal functions that are
> > > needed to implement it, the consing slows down the operation by a
> > > significant fraction (half or more?).  To make these inline, I had to
> > > increase CMUCL's inlining threshold to 1600 (the  upper limit on the
> > > number of inline function calls that will be expanded in any
> > > given code
> > > object).
> > >
> > > Anyway, this is really way off topic for maxima, so I will
> > > refrain from
> > > saying more on this list.
> > >
> > > Ray
> > > _______________________________________________
> > > Maxima mailing list
> > > Maxima at math.utexas.edu
> > > http://www.math.utexas.edu/mailman/listinfo/maxima
> > >
> > ------- End of forwarded message -------
> >
> 
> 
> - -- 
> Jared C. Davis <jared at cs.utexas.edu>
> 3600 Greystone Drive #604
> Austin, TX 78731
> http://www.cs.utexas.edu/users/jared/
> ----------
> 
> 
> 
> 

-- 
Camm Maguire			     			camm at enhanced.com
==========================================================================
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah