[Prev][Next][Index][Thread]

Re: Fun-O Basic Edition Compiler



In article <3C28F100.9478041B@mediaone.net>, Carl Gay 
<carlgay@mediaone.net> wrote:

> Carl Gay wrote:
> > 
> > I get 2.8x speedup in production mode over development mode.
> > Unfortunately, the lispworks version with no type declarations
> > runs 1.6x faster than that.
> 
> Even more unfortunately, a java version runs 1.6x faster than
> the Lispworks version and 2.5x faster than FunDev.
> 
> Anyone want to do the comparison on Gwydion?

Sure.

Using this code:

----------------------------------------------------------
module: tak

define function tak( x :: <integer>, y :: <integer>, z :: <integer>)
 => (result :: <integer>)
  if (y >= x)
    z
  else
    tak(tak(x - 1, y, z), tak(y - 1, z, x), tak(z - 1, x, y));
  end if;
end function tak;

define method tak-times(n :: <integer>)
  let result = 0;
  for (index from 1 to n)
    result := tak(18, 12, 6);
  end for;
  result;
end method tak-times;

format-out("tak(18, 12, 6) = %d\n", tak-times(2000));
----------------------------------------------------------

I get 6.0 seconds on a G4/867 Mac with OS X, d2c 2.3.6.  With "define 
method" as you had, it took 6.6 seconds.  On an Athlon/700 with Linux I 
get 3.8 seconds and 4.3 seconds.

The C code generated by d2c is (edited to remove some blank and comment 
lines):

----------------------------------------------------------
long takZtakZtak_FUN(descriptor_t *orig_sp, long A_x /* x */, long A_y 
/* y */, long A_z /* z */)
{
    descriptor_t *cluster_0_top;
    long L_arg0; /* arg0 */
    long L_arg1; /* arg1 */
    long L_arg2; /* arg2 */
    long L_result; /* result */
    if ((A_y < A_x)) {
        L_arg0 = takZtakZtak_FUN(orig_sp, (A_x - 1), A_y, A_z);
        L_arg1 = takZtakZtak_FUN(orig_sp, (A_y - 1), A_z, A_x);
        L_arg2 = takZtakZtak_FUN(orig_sp, (A_z - 1), A_x, A_y);
        L_result = takZtakZtak_FUN(orig_sp, L_arg0, L_arg1, L_arg2);
    }
    else {
        L_result = A_z;
    }
    return L_result;
}
----------------------------------------------------------

This is for the "function" version.  As a "method" there is an 
additional pointer for the "next-method" list (which is always the empty 
list in this case).

Other than getting rid of "orig_sp", you couldn't do any better than 
this in C.  "orig_sp" is a pointer to a stack area used by d2c as a more 
efficient substitute to C's "varargs" to pass arguments to unknown 
functions and receive (possibly multiple) results back from them.


The only way you'll get better than this, in *ANY* language, is to use 
partial inlining.

Trying the following on CMUCL on the same Athlon gives 10.9 seconds.

----------------------------------------------------------
(defun tak (x y z)
  (if (>= y x) z
    (tak (tak (1- x) y z)
         (tak (1- y) z x)
         (tak (1- z) x y))))

(defun tak-times (n)
  (dotimes (i n) (tak 18 12 6)))

(compile 'tak)
(compile 'tak-times)

(time (tak-times 2000))
----------------------------------------------------------

Adding (declare (fixnum x y z)) drops this to 4.7 seconds.  Adding 
"unsafe" flags to this might slightly speed up CMUCL, but 1) the 
recursive calls within tak are already well optimized and don't check 
argument types, and 2) it's against the philosophy of Dylan to ever have 
an "unsafe" mode, just as it is with Java.

CMUCL also tail-call optimizes the final call within tak, which d2c 
doesn't manage to do.


On the same Athlon, the following Java took 25.6 seconds using jdk 
1.1.8.  C code took 3.1.

The Mac took 3.9 seconds (jdk 1.3.1).  C code took 4.2.

----------------------------------------------------------
class tak {
    static int tak(int x, int y, int z){
        if (y >= x){
            return z;
        } else {
            return tak(tak(x-1, y, z),
                       tak(y-1, z, x),
                       tak(z-1, x, y));
        }
    }
    public static void main(String argv[]){
        int result = 0;
        for(int i=0; i<2000; ++i){
            result = tak(18,12,6);
        }
        System.out.println("tak(18,12,6) = " + result + "\n");
    }
}
----------------------------------------------------------


Overall:


       G4/867   K7/700
d2c      6.0      3.8
Lisp              4.7
Java     3.9     25.6
C        4.2      3.1


It's all pretty consistent.  The Mac comes with hotspot, Linux doesn't.  

The Dylan time looks slow on the Mac but this benchmark is *really* 
sensitive to very small differences.  I tried modifying the C code to 
see what diference it makes in time.  An extra argument costs about 0.5 
seconds.  Reversing the order of the branches of the "if" makes up most 
of the rest of the difference (d2c just happens to always canonicalize 
comparisons to be "<"ande that's the wrong thing in this case), and the 
forced sequencing of using named intermediate results adds a little bit 
as well.

The PowerPC looks slow compared to the Athlon.  Interestingly, most 
other Dylan code I've tried -- such as d2c itself -- runs considerably 
faster on the PowerPC than on the Athlon.


To illustrate how these results can be improved, consider this:

----------------------------------------------------------
module: tak

define function tak(x :: <integer>, y :: <integer>, z :: <integer>)
 => (result :: <integer>)
  if (y >= x)
    z
  else
    tak2(tak2(x - 1, y, z), tak2(y - 1, z, x), tak2(z - 1, x, y));
  end if;
end function tak;

define inline function tak2(x :: <integer>, y :: <integer>, z :: 
<integer>)
 => (result :: <integer>)
  if (y >= x)
    z
  else
    tak(tak(x - 1, y, z), tak(y - 1, z, x), tak(z - 1, x, y));
  end if;
end function tak2;

define method tak-times(n :: <integer>)
  let result = 0;
  for (index from 1 to n)
    result := tak(18, 12, 6);
  end for;
  result;
end method tak-times;

format-out("tak(8, 12, 6) = %d\n", tak-times(2000));
----------------------------------------------------------

3.77 seconds on the Mac (beating Java), 3.82 seconds on Linux.

Adding a 3rd level drops the time just a tiny bit more.

-- Bruce



Follow-Ups: References: