From: jaffer@zurich.ai.mit.edu (Aubrey Jaffer)
Newsgroups: comp.lang.scheme
Subject: Benchmarking Scheme
Date: 1 May 91 20:38:05 GMT
Distribution: comp.lang.scheme
Organization: M.I.T. Artificial Intelligence Lab.

At the end of this message is benchmark code in C and in Scheme.  My
original hope was that computing the ratios of execution time for the
2 programs would allow me to roughly measure speed of an
implementation independent of the hardware.  There are problems with
this approach.

The benchmark programs take an argument and compute that many digits
of pi.  The programs take time proportional to the square of the
number of digits.  This allows one to compensate for widely differing
times of computation.

When I measure the times on my 10Mhz 68000 I get:
(pi 100)       takes 33.9 secs (scm2d)
time pi 200    takes 7.1 secs
33.9/(7.1/4) ==> 19 times slower than C

But on kleph.ai.mit.edu (which I think has a 68020) I get:
(pi 100)       takes 6.6 secs (scm2d)
time pi 800    takes 7.6 secs
6.6/(7.6/64) ==> 56 times slower that C ??!

What is going on here?  J. Finger suggests that kleph has an
instruction cache.  That would mean that the C inner loop stays in the
cache while the scm interpreter doesn't.

So what are the prospects for machine independent performance
measures?  An algorithm with a more complicated inner loop would keep
the C program out of cache.  But such algorithms tend either to be
artifically lengthened or lack the nice scaling properties of the pi
programs.  Part of the appeal to me of the pi program is that it does
an actual difficult computation with the best code I can devise.
Previously I used an intentionally less that optimal fibonacci.

An argument can also be made that the cached figures are more
realistic for everyday programming problems.  This has an interesting
consequence in that the penalty for using interpreters on newer
machines is more severe than on old ones.

So if anyone can suggest solitions to these problems I would be very
interested.  By the way, on kleph MITScheme takes 22 secs for (pi 100)
interpreted (185 times slower) and about 8 secs for (pi 100) compiled
(about 67 times slower) so I guess my interpreter is not doing too
badly.

/*  'Spigot' algorithm origionally due to Stanly Rabinowitz */
/*
main(c,v)
int c;char **v;{
  int n=200,j,m,b=2,k,t,r=1e5;
  long q;
  short *a;
  if(c==2)n=atoi(v[1])/5+1;
  k=m=3.322*n*5;
  a=calloc(1+m,2);
  while(k)a[--k]=2;
  for(a[m]=4;j<n;b=q%r){
    q=0;
    for(k=m;k;){
      q+=a[k]*r;
      t=(2*k+1);
      a[k]=q%t;
      q=q/t;
      q*=k--;}
    printf("%05d%s",b+q/r,++j%10?"  ":"\n");}
  puts("");}
*/

                       main
 (c,v)int c;char**v;{int n=200,j,m,
b=2,k,t,r=1e5;long q;short*a;if(c==2
 )n=atoi(v[1])/5+1;k=m=3.322*n*5;a=
       calloc(       1+m,2)
       ;while        (k)a[
       --k]=2        ;for(a
       [m]=4;        j<n;b=
       q%r){q        =0;for
       (k=m;k        ;){q+=
       a[k]*r        ;t=(2*
       k+1);         a[k]=q
       %t;q=         q/t;q*
       =k--;}        printf(
      "%05d%s"       ,b+q/r,
      ++j%10?        " ":"\n"
      );}puts         ("");}
  
===========================================================

(define r 100000)
(define (pi n)
  (let* ((n (+ (quotient n 5) 1))
	 (m (quotient (* n 5 3322) 1000))
	 (a (make-vector (+ 1 m) 2)))
    (vector-set! a m 4)
    (do ((j 1 (+ 1 j))
	 (q 0 0)
	 (b 2 (remainder q r)))
	((> j n))
      (do ((k m (- k 1)))
	  ((zero? k))
	(set! q (+ q (* (vector-ref a k) r)))
	(let ((t (+ 1 (* 2 k))))
	  (vector-set! a k (remainder q t))
	  (set! q (* k (quotient q t)))))
      (let ((s (number->string (+ b (quotient q r)))))
	(do ((l (string-length s) (+ 1 l)))
	    ((>= l 5) (display s))
	  (display #\0)))
      (display (if (zero? (modulo j 10)) #\newline #\ )))
    (newline)))