Re: Simple Gwydion question
In article <bruce-103AB8.11061913072002@copper.ipg.tsnz.net>,
Bruce Hoult <bruce@hoult.org> wrote:
> Unfortunately, all three of the operations here are going through full
> generic function dispatch. You've done everything right, but we could
> do a bit more work on this part of d2c. We only got support for limited
> vectors into the compiler at all a couple of versions ago and at the
> moment it works, but is only optimized for some data types.
I've got some good news for you. I've just checked some modifications
into CVS for the compiler which implement unboxed vectors of doubles.
It took five lines of code :-)
With one simple change your program now runs in 0.02 seconds instead of
1.23 seconds. After increasing the number of iterations from 100 to
10,000 it takes 0.76 seconds, where the old code would have needed about
123 seconds (100 x 1.23). So Dylan just got about 160 times faster for
this type of program.
Here are the Dylan source and the generated x86 machine code for vector-foo():
define function vector-foo(dvec :: <my-double-vector>) => ();
  let n :: <integer> = dvec.size;
  for (i :: <integer> from 0 below n)
    dvec[i] := dvec[i] + 1.64;
  end for;
end vector-foo;
0x8049440 <vector_foo_FUN>: mov 0x8(%esp,1),%edx
0x8049444 <vector_foo_FUN+4>: xor %eax,%eax
0x8049446 <vector_foo_FUN+6>: fldl 0x812eaa8
0x804944c <vector_foo_FUN+12>: mov 0x8(%edx),%ecx
0x804944f <vector_foo_FUN+15>: nop
0x8049450 <vector_foo_FUN+16>: cmp %ecx,%eax
0x8049452 <vector_foo_FUN+18>: jge 0x8049461 <vector_foo_FUN+33>
0x8049454 <vector_foo_FUN+20>: fldl 0x10(%edx,%eax,8)
0x8049458 <vector_foo_FUN+24>: fadd %st(1),%st
0x804945a <vector_foo_FUN+26>: fstpl 0x10(%edx,%eax,8)
0x804945e <vector_foo_FUN+30>: inc %eax
0x804945f <vector_foo_FUN+31>: jmp 0x8049450 <vector_foo_FUN+16>
0x8049461 <vector_foo_FUN+33>: fstp %st(0)
0x8049463 <vector_foo_FUN+35>: ret
0x8049464 <vector_foo_FUN+36>: lea 0x0(%esi),%esi
0x804946a <vector_foo_FUN+42>: lea 0x0(%edi),%edi
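For anyone who reads C more easily than AT&T-syntax x86, the loop in the listing above corresponds roughly to the C below. This is only a sketch under stated assumptions: the C function name and the plain `double *` plus length arguments are stand-ins, since the real Dylan object carries a header in front of its elements (in the listing the size is loaded from offset 0x8 and the element data starts at offset 0x10).

```c
#include <stddef.h>

/* Hypothetical C counterpart of vector-foo(): add 1.64 in place to every
   element of an unboxed array of doubles. The names are illustrative and
   do not come from d2c's output. */
void vector_foo(double *dvec, size_t n)
{
    const double k = 1.64;            /* constant loaded once, before the loop */
    for (size_t i = 0; i < n; i++)    /* compare, exit test, increment, jump back */
        dvec[i] = dvec[i] + k;        /* load element, add, store element */
}
```

The point of the comparison is that nothing in the loop body calls out to a function: each iteration is a load, an add and a store on raw 8-byte doubles.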
I suspect this now beats CMUCL :-) You're welcome.
The change is this:
define constant <my-double-vector>
  = limited(<simple-vector>, of: <double-float>);
If you declare it as a limited version of <vector> instead, the compiler
doesn't know whether vector-foo() is going to receive a fixed-size or a
stretchy vector, and so you get non-optimal code. Declaring it as a limited
<simple-vector> makes sure it knows that it's going to get a fixed-size
vector...
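A loose C analogy may help show why the declared type matters. It is not how d2c's runtime is actually laid out, and every name in it is made up for illustration. When the concrete class is unknown, each element read and write has to go through a call selected at run time (modelled here with function pointers). When the vector is known to be a fixed-size vector of doubles, the same access becomes a direct indexed load or store.

```c
#include <stddef.h>

/* Analogy only: a vector whose element access is reached through function
   pointers, standing in for run-time dispatch on an unknown concrete class. */
struct generic_vec {
    double (*get)(const struct generic_vec *self, size_t i);
    void   (*set)(struct generic_vec *self, size_t i, double x);
    size_t  size;
    double *data;
};

/* One possible concrete implementation of the two accessors. */
double simple_get(const struct generic_vec *v, size_t i) { return v->data[i]; }
void   simple_set(struct generic_vec *v, size_t i, double x) { v->data[i] = x; }

/* Concrete type unknown: two indirect calls per element. */
void foo_generic(struct generic_vec *v)
{
    for (size_t i = 0; i < v->size; i++)
        v->set(v, i, v->get(v, i) + 1.64);
}

/* Concrete type known to be a fixed-size array of doubles: direct access. */
void foo_simple(double *data, size_t n)
{
    for (size_t i = 0; i < n; i++)
        data[i] += 1.64;
}
```

Both functions compute the same result; the difference is only in how much work each element access costs, which is the gap the type declaration closes.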
-- Bruce