Re: Simple Gwydion question



In article <bruce-103AB8.11061913072002@copper.ipg.tsnz.net>,
 Bruce Hoult <bruce@hoult.org> wrote:

> Unfortunately, all three of the operations here are going through full 
> generic function dispatch.  You've done everything right, but we could 
> do a bit more work on this part of d2c.  We only got support for limited 
> vectors into the compiler at all a couple of versions ago and at the 
> moment it works, but is only optimized for some data types.

I've got some good news for you.  I've just checked some modifications 
to the compiler into CVS which implement unboxed vectors of doubles.  
It took five lines of code :-)


With one simple change your program now runs in 0.02 seconds instead of 
1.23 seconds.  After increasing the number of iterations from 100 to 
10,000 it takes 0.76 seconds, which is about 0.0076 seconds per 100 
iterations.  So Dylan just got about 160 times faster (1.23 / 0.0076) 
for this type of program.


Here is the Dylan and generated x86 machine code for vector-foo():

define function vector-foo(dvec :: <my-double-vector>) => ();
   let n :: <integer> = dvec.size;
   for (i :: <integer> from 0 below n)
     dvec[i] := dvec[i] + 1.64;
   end for;
end vector-foo;

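0x8049440 <vector_foo_FUN>:     mov    0x8(%esp,1),%edx          # edx = dvec (argument on the stack)
0x8049444 <vector_foo_FUN+4>:   xor    %eax,%eax                 # i = 0
0x8049446 <vector_foo_FUN+6>:   fldl   0x812eaa8                 # push the double constant (presumably 1.64)
0x804944c <vector_foo_FUN+12>:  mov    0x8(%edx),%ecx            # n = dvec.size
0x804944f <vector_foo_FUN+15>:  nop                              # padding to align the loop head
0x8049450 <vector_foo_FUN+16>:  cmp    %ecx,%eax                 # loop head: compare i with n
0x8049452 <vector_foo_FUN+18>:  jge    0x8049461 <vector_foo_FUN+33>   # exit when i >= n
0x8049454 <vector_foo_FUN+20>:  fldl   0x10(%edx,%eax,8)         # push dvec[i], an unboxed 8-byte double
0x8049458 <vector_foo_FUN+24>:  fadd   %st(1),%st                # add the constant
0x804945a <vector_foo_FUN+26>:  fstpl  0x10(%edx,%eax,8)         # store back into dvec[i] and pop
0x804945e <vector_foo_FUN+30>:  inc    %eax                      # i := i + 1
0x804945f <vector_foo_FUN+31>:  jmp    0x8049450 <vector_foo_FUN+16>   # back to the loop head
0x8049461 <vector_foo_FUN+33>:  fstp   %st(0)                    # pop the constant off the FPU stack
0x8049463 <vector_foo_FUN+35>:  ret
0x8049464 <vector_foo_FUN+36>:  lea    0x0(%esi),%esi            # padding after the function
0x804946a <vector_foo_FUN+42>:  lea    0x0(%edi),%edi            # padding after the function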


I suspect this now beats CMUCL :-)  You're welcome.


The change is this:

   define constant <my-double-vector>
      = limited(<simple-vector>, of: <double-float>);

If you declare it as a limited version of <vector> then the compiler 
doesn't know whether vector-foo() is going to receive a fixed size or 
stretchy vector and so you get non-optimal code.  So we make sure it 
knows that it's going to get a fixed-size vector...
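
For comparison, here is a minimal sketch of the declaration to avoid, 
plus one way to create and use the fixed-size kind.  The names 
<any-double-vector> and $dvec and the size of 1000 are made up for 
illustration, and I'm assuming make() on the limited type accepts the 
usual size: and fill: keywords, so treat this as untested:

```dylan
// Hypothetical name.  A limited <vector> may turn out to be fixed-size
// or stretchy, so the accesses in vector-foo() can't be resolved at
// compile time if its parameter is declared with this type.
define constant <any-double-vector>
  = limited(<vector>, of: <double-float>);

// <my-double-vector> is the limited(<simple-vector>, ...) type defined
// above.  The size is illustrative; fill: is a double literal so that
// every element already satisfies the <double-float> element type.
define constant $dvec :: <my-double-vector>
  = make(<my-double-vector>, size: 1000, fill: 0.0d0);

vector-foo($dvec);
```

The explicit fill: matters in this sketch because a default fill value 
that is not a <double-float> would not fit the element type.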

-- Bruce