Thread overview
How to set struct alignment on the stack?
Feb 08, 2005
Brian Chapman
Feb 09, 2005
Craig Black
Feb 09, 2005
Brian Chapman
February 08, 2005
Was going to optimize my vector functions for SSE capable CPUs but I ran into a problem. How does one set the alignment for a struct? Not the byte packing alignment for the member data, but how the struct gets aligned on the stack? This is very important for SIMD operations. In the code that follows, there are two main functions. The first one will crash. The second one works, but is not optimal and sucks. Am I doing something lame? Thanks for any time you take to reply! - Brian

version = ia32simd; // this is the version being tested.
/********************************************************/
align (16) struct vector
{
	float x,y,z,w;
	void set (float a, float b, float c) {x=a;y=b;z=c;w=1;}
	void print () {printf ("[ %g, %g, %g, %g ]\n",x,y,z,w);}
}

void add (inout vector result, inout vector a, inout vector b)
{
	version (ia32simd) asm
	{
		mov ESI,a;
		mov EDI,b;
		movaps XMM0,[ESI];
		addps XMM0,[EDI];
		mov ESI,result;
		movaps [ESI],XMM0;
	}
	else
	{
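		// Write through the `result` parameter (there is no `c` in this scope).
		result.x = a.x + b.x;
		result.y = a.y + b.y;
		result.z = a.z + b.z;
		result.w = a.w + b.w; // the SIMD path adds all four lanes, so match it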
	}
}

/********************************************************/
/* This Main Doesn't Work */

static assert (vector.sizeof == 16);
//static assert (vector.alignof == 16); // FAILS! ???

void main1 ()
{
	vector a,b,c;
	//assert ((cast(int)(&a) & 0b1111) == 0); // FAILS!
	//assert ((cast(int)(&b) & 0b1111) == 0); // FAILS!
	//assert ((cast(int)(&c) & 0b1111) == 0); // FAILS!
	a.set (1,2,3);
	b.set (4,5,6);
	add (c,a,b); // Error: Win32 Exception !!!
	c.print();
}

/********************************************************/
/* This Main Works, but SUCKS! */

vector *alloc16aligned ()
{
	/* allocate a vector off the heap 16 bytes aligned */
	byte *p = new byte [vector.sizeof+0b1111];
	return cast(vector*)(((cast(int)(p))+0b1111)&~0b1111);
}

void main2 ()
{
	vector *a = alloc16aligned();
	vector *b = alloc16aligned();
	vector *c = alloc16aligned();
	a.set (1,2,3);
	b.set (4,5,6);
	add (*c,*a,*b);
	c.print();
	assert (c.x == 5);
	assert (c.y == 7);
	assert (c.z == 9);
	assert (c.w == 2);
}
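
One possible stopgap that keeps the data on the stack, sketched here under assumptions rather than taken from the thread: apply the same round-up trick as alloc16aligned to an oversized local buffer instead of a heap block. The helper name alignUp16 is made up for illustration, and the code is written for a current D compiler with Phobos (std.stdio.writefln) rather than the 2005 toolchain.

```d
import std.stdio : writefln;

struct vector
{
	float x, y, z, w;
}

// Hypothetical helper: round an address up to the next multiple of 16.
size_t alignUp16 (size_t addr)
{
	return (addr + 15) & ~cast(size_t) 15;
}

void main ()
{
	// Reserve room for one vector plus 15 bytes of slack on the stack,
	// then pick the first 16-byte-aligned address inside that buffer.
	ubyte[vector.sizeof + 15] raw;
	vector* v = cast(vector*) alignUp16 (cast(size_t) raw.ptr);

	assert ((cast(size_t) v & 15) == 0);

	v.x = 1; v.y = 2; v.z = 3; v.w = 1;
	writefln ("[ %g, %g, %g, %g ]", v.x, v.y, v.z, v.w);
}
```

This wastes up to 15 bytes per vector and still goes through a pointer, so it is a workaround for the missing stack alignment, not a replacement for it.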

February 08, 2005
Brian Chapman wrote:

> Was going to optimize my vector functions for SSE capable CPUs but I ran into a problem. How does one set the alignment for a struct? Not the byte packing alignment for the member data, but how the struct gets aligned on the stack? This is very important for SIMD operations.

It's just as important for AltiVec as it is for SSE.

It would be nice to avoid having to use assembler*, but then
D would have to have the same vector extensions that C has...
http://developer.apple.com/hardware/ve/model.html

And since the PowerPC G4+ has 32 vector registers, in addition
to the 32 integer and the 32 floating-point registers, passing
vector data on the stack does suck in comparison with registers.

But a first small step is aligning the thing to 16-byte boundaries.
Otherwise one would have to permute all loads, and that sucks worse.
http://developer.apple.com/hardware/ve/alignment.html

--anders

* not that GDC supports any inline assembler yet anyway, but...
February 09, 2005
SIMD extensions for D would be really cool.

"Anders F Björklund" <afb@algonet.se> wrote in message news:cubf02$s48$1@digitaldaemon.com...
> Brian Chapman wrote:
>
>> Was going to optimize my vector functions for SSE capable CPUs but I ran into a problem. How does one set the alignment for a struct? Not the byte packing alignment for the member data, but how the struct gets aligned on the stack? This is very important for SIMD operations.
>
> It's just as important for AltiVec as it is for SSE.
>
> It would be nice to avoid having to use assembler*, but then D would have to have the same vector extensions that C has... http://developer.apple.com/hardware/ve/model.html
>
> And since the PowerPC G4+ has 32 vector registers, in addition to the 32 integer and the 32 floating-point registers, passing vector data on the stack does suck in comparison with registers.
>
> But a first small step is aligning the thing to 16-byte boundaries. Otherwise one would have to permute all loads, and that sucks worse. http://developer.apple.com/hardware/ve/alignment.html
>
> --anders
>
> * not that GDC supports any inline assembler yet anyway, but...


February 09, 2005
On 2005-02-08 16:37:54 -0600, Anders F Björklund <afb@algonet.se> said:

> It's just as important for AltiVec as it is for SSE.

Yeah, I was wanting to do some AltiVec too, but that's going to require an external asm file since, as you mentioned, GDC doesn't support inline asm. That means it's only worthwhile for longer operations on more data (like matrices). But since it's external, I may as well do it in C and use the compiler intrinsics, as you also mentioned. But then suddenly I'm back to using C again, which I was wanting to get away from. *sigh*


> It would be nice to avoid having to use assembler*, but then
> D would have to have the same vector extensions that C has...
> http://developer.apple.com/hardware/ve/model.html
> 
> And since the PowerPC G4+ has 32 vector registers, in addition
> to the 32 integer and the 32 floating-point registers, passing
> vector data on the stack does suck in comparison with registers.
> 
> But a first small step is aligning the thing to 16-byte boundaries.
> Otherwise one would have to permute all loads, and that sucks worse.
> http://developer.apple.com/hardware/ve/alignment.html
> 
> --anders
> 
> * not that GDC supports any inline assembler yet anyway, but...

It would be nice if at the very least there was a way, perhaps via the command line, to globally set the data alignment to an arbitrary value (in this case 16 bytes).

February 09, 2005
Brian Chapman wrote:

> Yeah, I was wanting to do some AltiVec too, but that's going to require an external asm file since, as you mentioned, GDC doesn't support inline asm. That means it's only worthwhile for longer operations on more data (like matrices). But since it's external, I may as well do it in C and use the compiler intrinsics, as you also mentioned. But then suddenly I'm back to using C again, which I was wanting to get away from. *sigh*

D doesn't let you get away from C. It lets you get away from *C++* :-)

AltiVec works fine if you compile it with /usr/bin/gcc and then link
the resulting objects into the D program? (It'll require a PPC G4/G5, of course.)
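
As a rough sketch of that split (not code from the thread): the vector routine lives in a separately compiled C file, and the D side only declares it with extern (C) and passes pointers. The name vec_add4 and its signature are hypothetical, and a scalar D body stands in for the C object here so the example builds on its own with a current D compiler.

```d
import std.stdio : writefln;

// Hypothetical C-side routine. In the real setup this would be only a
// declaration, with the AltiVec/SSE body in a C file built by gcc and
// linked in. The scalar body below is a stand-in so the sketch links.
extern (C) void vec_add4 (float* result, const(float)* a, const(float)* b)
{
	foreach (i; 0 .. 4)
		result[i] = a[i] + b[i];
}

void main ()
{
	float[4] a = [1, 2, 3, 1];
	float[4] b = [4, 5, 6, 1];
	float[4] c;

	// The D caller only sees a C function taking three float pointers.
	vec_add4 (c.ptr, a.ptr, b.ptr);

	assert (c == [5f, 7f, 9f, 2f]);
	writefln ("[ %g, %g, %g, %g ]", c[0], c[1], c[2], c[3]);
}
```

Note that the caller is still responsible for handing over 16-byte-aligned pointers if the C side uses aligned loads, so this does not remove the alignment problem on the D side.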

It might be possible (with a few months or something of work) to get
the AltiVec patches and the D patches to co-exist in the GCC 3.3 base...

See this changelog for all the patches that are being applied to it:
http://www.opensource.apple.com/darwinsource/DevToolsAug2004/gcc-1762/CHANGES.Apple

(some examples)
> Owner     Status     Name of change
> -----     ------     --------------
> zlaski    local      -Wno-altivec-long-deprecated
> shebs     mixed      AltiVec
> shebs     unknown    Altivec related
> shebs     unknown    darwin native, AltiVec
> shebs     local      disable generic AltiVec patterns

And a ton of other patches, mostly related to 1) Objective-C
2) Objective-C++ 3) Macintosh legacy 4) Fat i386/ppc builds
(the sources are modified, so you need to use "diff" a lot)

To my local GCC/GDC copy, I have applied the Apple framework patches
(so that "#include <Carbon/Carbon.h>" and -framework Carbon works)
as well as the -mcpu patches so that G3, G4 and G5 are recognized.
http://dstress.kuehne.cn/raw_results/mac-OS-X-10.3.7_gdc-0.10-patch/

But perhaps a worthier effort would be to port GDC to GCC 4.0?

--anders