GDC on 4.2.3 with autovectorization
Apr 06, 2008
downs
Patch: http://demented.no-ip.org/~root/dgcc_to_4_2_3_and_autovec.patch
Apr 06, 2008
downs
Apr 07, 2008
Jérôme M. Berger
April 06, 2008
Here's a patch to get GDC SVN from 04.05.08 (roughly) to support GCC 4.2.3, as well as automatic translation of loop statements into SSE-optimized assembly.

The basic procedure goes as follows: download GDC from SVN, copy it into the GCC source folder as per the installation procedure, then edit setup-gcc.sh and replace the following line:

> elif grep -q '^4\.1\.' gcc/BASE-VER; then

with

> elif grep -q '^4\.[12]\.' gcc/BASE-VER; then

then try to run it.

It will, predictably, fail.
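If you'd rather script the setup-gcc.sh edit described above than do it by hand, here is a minimal sketch. It assumes GNU sed (for in-place -i editing) and that the script sits at gcc/d/setup-gcc.sh once GDC has been copied into the GCC tree; widen_version_check is a hypothetical helper name, not something that ships with GDC.

```shell
#!/bin/sh
# Hypothetical helper: widen the GCC version check in GDC's setup-gcc.sh
# so that 4.2.x trees take the same branch as 4.1.x.
# Assumes GNU sed (-i edits the file in place).
widen_version_check() {
    # Only touch the "elif grep -q ... BASE-VER" line(s).
    # The pattern 4\\\.1\\\. matches the literal text 4\.1\.
    # The replacement 4\\.[12]\\. writes the literal text 4\.[12]\.
    sed -i '/elif grep -q .*BASE-VER/s/4\\\.1\\\./4\\.[12]\\./' "$1"
}

# Usage, from the top of the GCC source tree (script path is an assumption):
#   widen_version_check gcc/d/setup-gcc.sh
#   ./gcc/d/setup-gcc.sh
```

Restricting the substitution to the elif line keeps any other version checks in the script untouched.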

_Now_ apply the attached patch to the so-prepared GDC directory.

Configure and build as normal (add --disable-bootstrap if you don't like waiting for hours).
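For the patch-and-build step, a rough sketch might look like this. Several details are assumptions rather than anything stated in the post: applying the patch at the top of the GCC tree with -p1, GNU patch (for --dry-run), a separate build directory next to the sources, and the install prefix. Limiting --enable-languages to c,d follows Jérôme's report later in this thread that other front ends no longer build from a patched tree. run and build_gdc are hypothetical helpers; setting DRY_RUN=1 only prints the commands.

```shell
#!/bin/sh
# Hypothetical build sketch for a GCC 4.2.3 tree that already went through
# setup-gcc.sh. Set DRY_RUN=1 to print the commands without running them.

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"   # dry run: just show the command
    else
        "$@"
    fi
}

# build_gdc SRC_DIR PATCH_FILE PREFIX   (PREFIX must be an absolute path)
# The body runs in a subshell, so "exit" only leaves the function and the
# cd into the build directory does not leak into the calling shell.
build_gdc() (
    src=$(cd "$1" && pwd) || exit 1   # absolute path to the GCC source tree
    # patch -d changes directory first, so the patch path must be absolute too.
    patchfile=$(cd "$(dirname "$2")" && pwd)/$(basename "$2") || exit 1
    prefix=$3
    builddir="$src-build"             # build outside the source tree

    # -p1 is a guess; check the paths in the patch header first.
    run patch -d "$src" -p1 --dry-run -i "$patchfile" || exit 1
    run patch -d "$src" -p1 -i "$patchfile" || exit 1

    run mkdir -p "$builddir" || exit 1
    run cd "$builddir" || exit 1
    # C and D only; --disable-bootstrap skips the multi-stage rebuild.
    run "$src/configure" --prefix="$prefix" \
        --enable-languages=c,d --disable-bootstrap || exit 1
    run make || exit 1
    run make install
)

# Example (paths are placeholders):
#   DRY_RUN=1 build_gdc gcc-4.2.3 dgcc_to_4_2_3_and_autovec.patch "$HOME/gdc"
```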

BEWARE! Not really being a GCC dev, I've had to make some _very_ weird _pure guesses_ while porting to 4.2.3, so the resulting code, while it appears to work correctly
(see http://demented.no-ip.org/~root/results.html ; dstress backs me up, and the numbers are in line with official GDC results), might, in fact, break C compatibility, break D compatibility,
or eat and/or abuse (and/or sexually), without discrimination, small objects, household pets and family members, INCLUDING YOU.

Don't say I didn't warn you.

That being said, have fun with it!

 --downs

PS: here's a demo of the autovectorizer at work:

gentoo-pc ~ $ cat test.d && gdc test.d -o test -O2 -msse -ftree-vectorize -ftree-vectorizer-verbose=5 -g && ./test && objdump -d test |grep addps -C10
module test; import std.stdio;
void main() {
  float[4] a = [1f, 2, 3, 4];
  float[4] b = [4f, 3, 2, 1];
  float[4] c;
  for (int i = 0; i < 4; ++i) c[i] = a[i] + b[i];
  writefln(c);
}

test.d:5: note: not vectorized: too many BBs in loop.
test.d:6: note: LOOP VECTORIZED.
test.d:2: note: vectorized 1 loops in function.
[5,5,5,5]
 804a228:       89 74 24 14             mov    %esi,0x14(%esp)
 804a22c:       89 44 24 08             mov    %eax,0x8(%esp)
 804a230:       89 54 24 0c             mov    %edx,0xc(%esp)
 804a234:       89 3c 24                mov    %edi,(%esp)
 804a237:       c7 44 24 04 04 00 00    movl   $0x4,0x4(%esp)
 804a23e:       00
 804a23f:       e8 fc 26 00 00          call   804c940 <_d_arraycopy>
 804a244:       b8 00 00 c0 7f          mov    $0x7fc00000,%eax
 804a249:       0f 28 45 c8             movaps -0x38(%ebp),%xmm0
 804a24d:       89 45 bc                mov    %eax,-0x44(%ebp)
 804a250:       0f 58 45 d8             addps  -0x28(%ebp),%xmm0
 804a254:       89 45 c0                mov    %eax,-0x40(%ebp)
 804a257:       89 45 c4                mov    %eax,-0x3c(%ebp)
 804a25a:       8d 45 b8                lea    -0x48(%ebp),%eax
 804a25d:       83 ec 04                sub    $0x4,%esp
 804a260:       0f 29 45 b8             movaps %xmm0,-0x48(%ebp)
 804a264:       89 44 24 08             mov    %eax,0x8(%esp)
 804a268:       c7 44 24 04 04 00 00    movl   $0x4,0x4(%esp)
 804a26f:       00
 804a270:       c7 04 24 34 a2 06 08    movl   $0x806a234,(%esp)
 804a277:       e8 84 9b 00 00          call   8053e00 <_D3std5stdio8writeflnFYv>
April 06, 2008
Here it is. Sorry.
April 07, 2008
downs wrote:
> Here's a patch to get GDC SVN from 04.05.08 (roughly) to support 4.2.3, as well as automatic translation of loop statements into SSE optimized assembly.
> 
	Thank you. I don't know which platform you developed it on, but it
appears to work fine here on 64-bit Linux. I only encountered one
minor issue: once your patch is applied to a GCC source tree, it
becomes impossible to build compilers for languages other than C and
D from that tree (not important since I use the system compiler for
everything else, but I thought I'd point it out since it took me a
bit of time to figure out what was wrong).

		Jerome
-- 
+------------------------- Jerome M. BERGER ---------------------+
|    mailto:jeberger@free.fr      | ICQ:    238062172            |
|    http://jeberger.free.fr/     | Jabber: jeberger@jabber.fr   |
+---------------------------------+------------------------------+