Here's a patch to get GDC SVN from 04.05.08 (roughly) to support GCC 4.2.3, as well as automatic translation of loop statements into SSE-optimized assembly.
The basic procedure goes as follows: download GDC from SVN, copy it into the GCC folder as per the installation procedure, then edit setup-gcc.sh to replace the following line:
> elif grep -q '^4\.1\.' gcc/BASE-VER; then
with
> elif grep -q '^4\.[12]\.' gcc/BASE-VER; then
then try to run it.
It will, predictably, fail.
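If you'd rather script the setup-gcc.sh edit than do it by hand, here's a minimal sketch. The function names patch_version_check and version_supported are made up for illustration and are not part of GDC; it assumes the version check line looks exactly like the one quoted above. It keeps a backup as setup-gcc.sh.orig and writes the result back into the original file, so the file mode stays as it was.

```shell
#!/bin/sh
# Sketch only: widen the version check in setup-gcc.sh from 4.1.x to 4.1.x/4.2.x.
# Both helper names below are hypothetical, not part of GDC or GCC.

# $1: path to setup-gcc.sh (wherever it lives in your tree)
patch_version_check() {
    # Keep a backup, then rewrite from the backup into the original file.
    cp "$1" "$1.orig" || return 1
    # Matches the literal text ^4\.1\. and replaces it with ^4\.[12]\.
    sed 's/\^4\\\.1\\\./^4\\.[12]\\./' "$1.orig" > "$1"
}

# $1: a version string as found in gcc/BASE-VER
# Succeeds only for 4.1.x and 4.2.x, same as the patched check.
version_supported() {
    printf '%s\n' "$1" | grep -q '^4\.[12]\.'
}
```

Call it as `patch_version_check path/to/setup-gcc.sh`, then `version_supported "$(cat gcc/BASE-VER)"` should succeed for a 4.2.3 tree.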
_Now_ apply the attached patch to the so-prepared GDC directory.
Configure and build as normal (pass --disable-bootstrap if you don't like waiting for hours).
BEWARE! Not really being a GCC dev, I've had to make some _very_ weird _pure guesses_ during the change to 4.2.3, so the resulting code, while it appears to work correctly
(http://demented.no-ip.org/~root/results.html ; dstress backs me up ... the numbers are in line with official GDC results), might, in fact, break C compatibility, break D compatibility,
or eat and/or abuse (and/or sexually), without discrimination, small objects, household pets and family members, INCLUDING YOU.
Don't say I didn't warn you.
That being said, have fun with it!
--downs
PS: here's a demo of the autovectorizer at work:
gentoo-pc ~ $ cat test.d && gdc test.d -o test -O2 -msse -ftree-vectorize -ftree-vectorizer-verbose=5 -g && ./test && objdump -d test |grep addps -C10
module test; import std.stdio;
void main() {
  float[4] a = [1f, 2, 3, 4];
  float[4] b = [4f, 3, 2, 1];
  float[4] c;
  for (int i = 0; i < 4; ++i) c[i] = a[i] + b[i];
  writefln(c);
}
test.d:5: note: not vectorized: too many BBs in loop.
test.d:6: note: LOOP VECTORIZED.
test.d:2: note: vectorized 1 loops in function.
[5,5,5,5]
804a228: 89 74 24 14 mov %esi,0x14(%esp)
804a22c: 89 44 24 08 mov %eax,0x8(%esp)
804a230: 89 54 24 0c mov %edx,0xc(%esp)
804a234: 89 3c 24 mov %edi,(%esp)
804a237: c7 44 24 04 04 00 00 movl $0x4,0x4(%esp)
804a23e: 00
804a23f: e8 fc 26 00 00 call 804c940 <_d_arraycopy>
804a244: b8 00 00 c0 7f mov $0x7fc00000,%eax
804a249: 0f 28 45 c8 movaps -0x38(%ebp),%xmm0
804a24d: 89 45 bc mov %eax,-0x44(%ebp)
804a250: 0f 58 45 d8 addps -0x28(%ebp),%xmm0
804a254: 89 45 c0 mov %eax,-0x40(%ebp)
804a257: 89 45 c4 mov %eax,-0x3c(%ebp)
804a25a: 8d 45 b8 lea -0x48(%ebp),%eax
804a25d: 83 ec 04 sub $0x4,%esp
804a260: 0f 29 45 b8 movaps %xmm0,-0x48(%ebp)
804a264: 89 44 24 08 mov %eax,0x8(%esp)
804a268: c7 44 24 04 04 00 00 movl $0x4,0x4(%esp)
804a26f: 00
804a270: c7 04 24 34 a2 06 08 movl $0x806a234,(%esp)
804a277: e8 84 9b 00 00 call 8053e00 <_D3std5stdio8writeflnFYv>