Thread overview
align doesn't work
Apr 07, 2003
Sean L. Palmer
Apr 07, 2003
Helmut Leitner
Apr 07, 2003
Sean L. Palmer
April 07, 2003
Locals should be able to be aligned to the specified requirements.  This is vital once we start dealing with types that have hard alignment requirements (such as structs that contain 128-bit xmmwords that must be 16-byte aligned so that inline asm that references them won't get alignment faults).

That cent/ucent type would sure be handy too.   ;)

Sean



align(16) struct foo { uint x,y; };

void main ()
{
    foo f;
    uint x;
    foo f2;

    printf("foo.y.offset = %d, foo.size = %d\n", foo.y.offset, foo.size); // this is good
    printf("f = %p, f2 = %p\n", &f, &f2); // these should both be aligned to 16 bytes

    // align(16) foo f3;  // syntax error, I don't understand the reasoning why.
}
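
Until locals honor align(), one common workaround is to over-allocate a raw buffer and round its address up by hand. The following is only a minimal sketch, assuming a current D compiler (which spells the size property `.sizeof`); `Foo`, `alignUp`, and the buffer layout are hypothetical names chosen for illustration, not part of the original post.

```d
import core.stdc.stdio : printf;

struct Foo { uint x, y; }

// Round addr up to the next multiple of alignment.
// Assumption: alignment is a power of two.
size_t alignUp(size_t addr, size_t alignment)
{
    return (addr + alignment - 1) & ~(alignment - 1);
}

void main()
{
    // Over-allocate by alignment - 1 bytes so an aligned slot always fits.
    ubyte[Foo.sizeof + 15] raw;

    // Carve a 16-byte-aligned Foo out of the raw stack buffer.
    Foo* f = cast(Foo*) alignUp(cast(size_t) raw.ptr, 16);
    f.x = 1;
    f.y = 2;

    // The remainder printed here should be 0 if the rounding worked.
    printf("f = %p, f %% 16 = %u\n", f, cast(uint)(cast(size_t) f % 16));
}
```

This costs up to 15 wasted bytes per object and loses type safety, which is part of why a working align() on locals would be preferable.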




April 07, 2003

"Sean L. Palmer" wrote:
> 
> Locals should be able to be aligned to the specified requirements.  This is vital once we start dealing with types that have hard alignment requirements (such as structs that contain 128-bit xmmwords that must be 16-byte aligned so that inline asm that references them won't get alignment faults).
> 
> That cent/ucent type would sure be handy too.   ;)
> 
> Sean
> 
> align(16) struct foo { uint x,y; };
>
> void main ()
> {
>     foo f;
>     uint x;
>     foo f2;
>
>     printf("foo.y.offset = %d, foo.size = %d\n", foo.y.offset, foo.size); // this is good
>     printf("f = %p, f2 = %p\n", &f, &f2); // these should both be aligned to 16 bytes
>
>     // align(16) foo f3;  // syntax error, I don't understand the reasoning why.
> }

I'll add some weird facts on the topic of alignment.

While doing precision benchmarks I found that delegates and functions are extremely sensitive to alignment. Identical functions such as

void TestLoop1000A ()
{
    for(int i=0; i<1000; i++) {
        // empty
    }
}

void TestLoop1000B ()
{
    for(int i=0; i<1000; i++) {
        // empty
    }
}

will perform quite differently (by about 20%) depending on their starting
offset within a 16-byte frame (at least that is what the benchmarks seem to show).
The measurement error, judged by reproducibility, is below 1%.

I don't understand it. I'm not a hardware man. It may be CPU-dependent (I used an Athlon 750 for this).

Exactly the same effect can be seen when benchmarking the same code by using closures.
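
A quick way to check the offset hypothesis is to print where each function actually starts. This is only a minimal sketch, assuming a current D compiler; `offsetIn16` is a hypothetical helper and not part of the benchmark described above, and the addresses will differ between compilers, flags, and builds.

```d
import core.stdc.stdio : printf;

void TestLoop1000A()
{
    for (int i = 0; i < 1000; i++) { }
}

void TestLoop1000B()
{
    for (int i = 0; i < 1000; i++) { }
}

// Offset of an address within its 16-byte frame (0 to 15).
size_t offsetIn16(size_t addr)
{
    return addr & 15;
}

void main()
{
    // Take each function's entry address as an integer.
    auto a = cast(size_t) cast(void*) &TestLoop1000A;
    auto b = cast(size_t) cast(void*) &TestLoop1000B;

    printf("TestLoop1000A: %p, offset %u\n", cast(void*) a, cast(uint) offsetIn16(a));
    printf("TestLoop1000B: %p, offset %u\n", cast(void*) b, cast(uint) offsetIn16(b));
}
```

If the two offsets differ and the timings track them across rebuilds, that would support the alignment explanation; it would not prove it.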


-- 
Helmut Leitner    leitner@hls.via.at
Graz, Austria   www.hls-software.com
April 07, 2003
On the x86 architecture, branch targets do considerably better when aligned to at least 4 bytes (8 is better for modern CPUs, I think).

It's even better to have your entire inner loop fit into as few cache lines as possible.

This is something the compiler should deal with internally when you specify -O;  the programmer should not have to concern themselves with such petty implementation details.  It's part of the standard size vs. speed tradeoff.

Or were you driving at the need for some directive to control code alignment manually?

Sean

"Helmut Leitner" <leitner@hls.via.at> wrote in message news:3E913C6A.CBF417D1@hls.via.at...

> I'll add some weird facts on the topic of alignment.
>
> While doing precision benchmarks I found that delegates and functions are extremely sensitive to alignment. Identical functions such as
>
> void TestLoop1000A ()
> {
>     for(int i=0; i<1000; i++) {
>         // empty
>     }
> }
>
> void TestLoop1000B ()
> {
>     for(int i=0; i<1000; i++) {
>         // empty
>     }
> }
>
> will perform quite differently (by about 20%) depending on their starting
> offset within a 16-byte frame (at least that is what the benchmarks seem to show).
> The measurement error, judged by reproducibility, is below 1%.
>
> I don't understand it. I'm not a hardware man.
> It may be CPU-dependent (I used an Athlon 750 for this).
>
> Exactly the same effect can be seen when benchmarking the same code by using closures.
>
>
> --
> Helmut Leitner    leitner@hls.via.at
> Graz, Austria   www.hls-software.com