More info loading fun

Subject: More info loading fun
From: Rupert Swarbrick
Date: Wed, 03 Apr 2013 13:54:45 +0100
Raymond Toy <toy.raymond at gmail.com> writes:
> I tried this out on an old 750 MHz sparc with cmucl, the oldest and
> slowest machine I have access to.  From the maxima prompt until "?
> run_testsuite" returns, it takes about 7 sec.  A tad slow, but not
> terribly slow.  About 3 sec for the next invocation.
>
> For completeness, here are the results from the commit message:
>
> ; Evaluation took:
> ;   5.56f0 seconds of real time
> ;   5.2f0 seconds of user run time
> ;   0.27f0 seconds of system run time
> ;   4,173,048,959 CPU cycles
> ;   0 page faults and
> ;   21,354,312 bytes consed.

Thank you very much for trying this out! To be honest, that's quite a
lot slower than I was expecting. I spent some time trying to see whether
we could be faster on CMUCL and it looks like there's something weird
buried in the regex engine. More below.

> Works ok with cmucl on a Mac.  (But the 2013-03a snapshot has a bug in
> file-position that causes the text to be wrong.)

Oh dear. Is that something we can / should work around? What propertion
of cmucl users will be running the 2013-03a snapshot?

About the regex stuff:

TLDR:
=====

  (1) I've just pushed a change that makes "??" significantly faster on
      CMUCL and should improve it on other lisps too. No change to "?",
      I'm afraid.

  (2) Ray, is there any way you can easily work out what the sparc is
      spending all it's time on? (with the new code) Running the first
      code snippet below with profile:profile-all would probably tell me
      a lot. If you can, could you use (topic-match "foo" t) and
      (topic-match "foo" nil) and let me know the difference?

Gory details:
=============

Firstly, I used the deterministic profilers on SBCL and CMUCL, running

  (progn
    (setf *documents* nil)
    (pushnew (info-pathname nil) *deferred-documents*)
    (topic-match "Number Theory" t))

ten times each. The top of the results:

SBCL:
  CL-INFO> (sb-profile:report)
  measuring PROFILE overhead..done
    seconds  |     gc     |   consed   |  calls  |  sec/call  |  name  
  ----------------------------------------------------------
       0.233 |      0.000 | 30,715,992 |       5 |   0.046609 | SCAN-INFO-FILE
       0.150 |      0.016 | 14,676,784 |       5 |   0.030064 | ALL-DOC-REGEX-MATCHES
       0.093 |      0.000 | 16,384,464 |       5 |   0.018546 | READ-INFO-INDEX
       0.053 |      0.012 |  9,333,656 |  13,840 |   0.000004 | STRIP-SECTION-TITLE
       0.019 |      0.000 |          0 |  12,130 |   0.000002 | INFO-INDEX-STRIPPED-NODE

CMUCL:
  CL-INFO> (profile:report-time)

      Consed |   Calls |  Secs | Sec/Call |  Bytes/C. | Name:
  -----------------------------------------------------------------------
  14,060,392 |      10 | 0.380 |  0.03800 | 1,406,039 | DOCUMENTATION-MATCHING-TOPICS
  11,936,072 |       5 | 0.210 |  0.04200 | 2,387,214 | SCAN-INFO-FILE
  10,054,392 |       5 | 0.180 |  0.03600 | 2,010,878 | READ-INFO-INDEX
   6,806,208 |  13,840 | 0.082 |  0.00001 |       492 | STRIP-SECTION-TITLE
   2,069,040 |       5 | 0.060 |  0.01200 |   413,808 | INFO-READ-TAG-TABLE

The CMUCL results are a bit surprising, since
DOCUMENTATION-MATCHING-TOPICS is just a search through stuff we've
already parsed (and should be quick!). Narrowing down further, if I do
the following:

  (defvar *regex*
    (coerce
     (maxima-nregex::regex-compile
      "Number Theory" :case-sensitive nil) 'function))

  (defvar *titles*
    (mapcar #'doc-topic-name (info-doc-index (cdar *documents*))))

  (time (dotimes (i 100) (mapc *regex* *titles*)))

then on SBCL, I get:
  Evaluation took:
    0.027 seconds of real time
    0.028000 seconds of total run time (0.028000 user, 0.000000 system)
    103.70% CPU
    63,855,669 processor cycles
    3,878,904 bytes consed

and on CMUCL:
  ; Evaluation took:
  ;   2.5f0 seconds of real time
  ;   2.444f0 seconds of user run time
  ;   0.052f0 seconds of system run time
  ;   5,734,503,475 CPU cycles
  ;   [Run times include 0.09f0 seconds GC run time]
  ;   0 page faults and
  ;   111,166,664 bytes consed.

!!! Looks like I've found something CMUCL is doing really badly

I'm not going to work on MAXIMA-NREGEX today. Indeed, maybe the right
thing to do is to replace it with CL-PPCRE or similar? Anyway, the good
news is that "??" doesn't actually need a regular expression search,
since it just works by case-folded substrings. So we can do that
instead. I've pushed a version based on SEARCH instead, where the
"case-folding" works by discarding everything except spaces and
alphanumerics then upcasing the lot. Maybe this isn't quite what we
want, but I like the fact that "?? eulers" returns %e ("Euler's number").

Timings afterwards are much faster for ?? than for ?. On CMUCL, the
difference is pronounced. The following test is basically loading up the
document 50 times and calling "? Number Theory" or "?? Number Theory"
each time.

  CL-INFO> (time
            (dotimes (n 50)
              (progn
                (setf *documents* nil)
                (deferred-register-document (info-pathname nil))
                (topic-match "Number Theory" t))))

  ; Evaluation took:
  ;   8.02f0 seconds of real time
  ;   7.6f0 seconds of user run time
  ;   0.408f0 seconds of system run time
  ;   18,400,213,418 CPU cycles
  ;   [Run times include 0.68f0 seconds GC run time]
  ;   0 page faults and
  ;   498,037,184 bytes consed.
  ; 
  NIL
  CL-INFO> (time
            (dotimes (n 50)
              (progn
                (setf *documents* nil)
                (deferred-register-document (info-pathname nil))
                (topic-match "Number Theory" nil))))

  ; Evaluation took:
  ;   5.11f0 seconds of real time
  ;   4.764f0 seconds of user run time
  ;   0.336f0 seconds of system run time
  ;   11,722,715,612 CPU cycles
  ;   [Run times include 0.52f0 seconds GC run time]
  ;   0 page faults and
  ;   374,522,656 bytes consed.
  ; 
  NIL

The corresponding times before this change are 8.05 seconds and 12.9
seconds, respectively. Woo! (In case anyone is wondering, on SBCL with
the new code the timings are 5.4 seconds, 3.9 seconds)

If anyone's still reading here, I'm astonished...

Rupert
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 315 bytes
Desc: not available
URL: <http://www.math.utexas.edu/pipermail/maxima/attachments/20130403/d8da553e/attachment.pgp>;
Prev by Date: maxima 5.30.0 error when starting
Next by Date: maxima 5.30.0 error when starting
Previous by thread: More info loading fun
Next by thread: More info loading fun
Index(es):
- Date
- Thread