property list hack, profiling, compiler optimization, looking at code
Subject: property list hack, profiling, compiler optimization, looking at code
From: Richard Fateman
Date: Sat, 11 Aug 2012 10:53:18 -0700
This is not part of the Common Lisp standard, but here is an example
(edited somewhat for brevity)
to show how profiling can work. The function f was previously defined.
f(n):=block([],mode_declare([i,n],integer),
for i thru n do (sin(i),cos(i),log(i),tan(i),atan2(1,i)))$
I compiled it in Allegro CL 8.2 (express edition [free])
(%i51) f(10^5);
Evaluation took 11.1720 seconds (11.3120 elapsed) including GC time
0.016 s, using 10500037 cons-cells and 120 other bytes.
(%o51) done
(%i52) to_lisp();
Type (to-maxima) to restart, ($quit) to quit Maxima.
maxima> (symbol-plist '%sin)
(tex-rbp 130 texsym ("\\sin ") texword "\\sin " tex tex-prefix
commutes-with-conjugate t
sp2
((%sum) ...... etc. ;; the operators property was way down
on the prop list
maxima> (load "c:/lisp/ops.lisp")
; Loading c:\lisp\ops.lisp ;; this puts the operators simplifier at
the top of the prop list.
t
maxima> (continue)
(%i53) f(10^5);
Evaluation took 11.0000 seconds (11.3130 elapsed) using 10500031
cons-cells and 152 other bytes.
(%o53) done
(%i54) 0.172/11.172;
Evaluation took 0.0000 seconds (0.0000 elapsed) using 9 cons-cells and
144 other bytes.
(%o54) .01539563193698532
;;; the speedup for Allegro CL is only about 1.5%. Still, essentially
no cost to do this.
(%i55) :lisp (symbol-plist '%sin)
(operators simp-%sin tex-rbp 130 texsym (\sin ) texword \sin tex tex-prefix
... just checking to see where the operators property is ..
(%i55) :lisp (prof:start-profiler) ;;; THIS IS THE ALLEGRO PROFILER.
SBCL HAS ONE, GCL maybe not..
sampling
(%i55) f(10^5);
Evaluation took 10.9840 seconds (10.9840 elapsed) including GC time
0.016 s, using 10500031 cons-cells and 216 other bytes.
(%o55) done
(%i56) :lisp (prof:show-flat-profile)
Sampling stopped after 3735 samples taken for the time profiler.
Sample represents 11.0 seconds of processor time (out of a total of 11.0)
Times below 1.0% will be suppressed.
% % self total self total Function
Time Cum. secs secs calls ms/call ms/call name
7.8 7.8 0.9 0.9 "qcons"
6.7 14.5 0.7 10.7 ... simplifya
5.7 20.2 0.6 0.6 excl::get_2op
4.3 24.4 0.5 0.5 "bind_one_spec"
3.6 28.0 0.4 0.4 "qcar"
3.2 31.2 0.4 0.5 ... zerop1
2.4 33.7 0.3 0.4 eqtest
2.2 35.9 0.2 0.4 ...
excl::gethash_2op_1ret
2.2 38.1 0.2 1.5 ... tms
2.1 40.2 0.2 0.5 risplit
2.0 42.2 0.2 0.2 mop
2.0 44.1 0.2 0.2 $bfloatp
1.7 45.9 0.2 2.1 ... simptimes
1.7 47.6 0.2 1.0 ... trisplit
1.7 49.3 0.2 1.3 ... flonum-eval
1.6 50.9 0.2 0.2 "qcdr"
1.5 52.4 0.2 0.5 ... complex-number-p
1.5 53.9 0.2 0.2 "restify2"
1.4 55.3 0.2 0.2 ... alike1
1.4 56.7 0.2 0.2 "unbind_count"
1.4 58.1 0.1 0.6 ... timesin
1.3 59.4 0.1 0.1 floatp
1.3 60.7 0.1 0.1 "unbind_one_spec"
1.2 61.9 0.1 1.1 ... big-float-eval
1.1 63.0 0.1 4.4 ... simpatan2
1.1 64.1 0.1 1.7 ... simp-%tan
1.0 65.1 0.1 2.7 ... simp-%atan
1.0 66.1 0.1 0.2 ... distribute-over
(%i56) (%o50) done
Looks like there are a bunch of places to make this program run faster.
zerop1 is near the top. risplit (real-imaginary-split ???) is there too.
I assume that programs like "qcons" in quotes are part of the lisp system.
In my experience this profiler works for foreign function library programs
too. That is, loading MPFR, written in C, and running the profiler reports
times for calls of C programs to other C programs within MPFR. (subject
to availability of external names, etc.)
If someone wants to speed up the test suite start with a profile.
for sbcl, see http://www.sbcl.org/manual/Statistical-Profiler.html
This may be a poor place to start (like counting cup-holders :) but
it could be done.
Some speedups are useful only relative to particular underlying lisps.
For example, I found major speedups available in simplification using
hashtables in Allegro CL, but for GCL, not much. Also there are
issues of compiler optimization vs. debugging. I assume there is a
modest factor improvement available in Allegro and SBCL by setting
the compiler optimization level. In selected circumstances with
declarations for numerics the speed-up can be quite substantial
(10X or even 100X), but I assume it would be much less if one re-compiles
"all of maxima" and runs mostly symbolic code.
It is possible to look at the "maxima" version of $f; it is here:
(symbol-plist '$f)
(excl::.args. nil function-mode $any translated t once-translated
"I was once translated" mprops
(nil mexpr
((lambda) ((mlist) $n)
((mprog) ((mlist)) (($modedeclare) ((mlist) $i $n) $integer)
((mdo) $i 1 nil nil $n nil
((mprogn) ((%sin) $i) ((%cos) $i) ((%log) $i) ((%tan) $i)
(($atan2) 1 $i)))))))
It is possible to look at the "lisp" version of $f by tracing
translate-function
(defmtrfun ($f $any mdefine nil nil) ($n) (declare (special $n)
(fixnum $n))
((lambda nil
nil
nil
(do (($i 1 (+ 1 $i)))
((> $i $n) '$done)
(declare (special $i))
(simplify (list '(%sin) $i))
(simplify (list '(%cos) $i))
(simplify (list '(%log) $i))
(simplify (list '(%tan) $i))
(simplify (list '($atan2) 1 $i)))))))
It is possible to look at the assembler code for $f ...
maxima> (disassemble '$f)
;; disassembly of #<Function $f>
;; formals: $n
;; constant vector:
0: $n
1: $i
2: $done
3: (%sin)
4: simplify
5: (%cos)
6: (%log)
7: (%tan)
8: ($atan2)
;; code start: #x21ac1af4:
0: 55 pushl ebp
1: 8b ec movl ebp,esp
3: 83 ec 78 subl esp,$120
6: 89 75 fc movl [ebp-4],esi
9: 89 5d e4 movl [ebp-28],ebx
12: 39 63 1a cmpl [ebx+26],esp
15: 76 03 jbe 20
17: ff 57 43 call *[edi+67] ; sys::trap-stack-ovfl
20: 8b 9b e6 00 movl ebx,[ebx+230]
00 00
.... etc etc.