December 19, 2008
Sergey Gromov wrote:
> Thu, 18 Dec 2008 17:08:12 +0100, Don wrote:
> 
>>>>> On Thu, 18 Dec 2008 00:12:18 +0300, Sergey Gromov 
>>>>>> Tue, 16 Dec 2008 10:09:41 +0100, Don wrote:
>>>>>>> struct Foo { double, int } // 12 byte size, wants 8-byte alignment.
>>>>>> Foo.sizeof is 16.
>> Sergey is correct to the extent that in DMD, x.alignof <= x.sizeof for any x, and also x.sizeof is always an integer multiple of x.alignof.
>> When you assume that DMD is correct, then the simple ordering by .alignof indeed gives an optimal solution.
> 
> I think this is not something you can assume or not assume.  The set of
> conditions you mention is required and sufficient to respect alignment
> requirements of array elements while obeying the rules of pointer math.
> You cannot change them until you drop either pointer math or alignment
> guarantees.
> 
>> But these values *don't* reflect the machine behaviour.
>>
>> real.alignof is 2. But it's significantly faster (factor of 2) on most x86 machines to have real aligned to 16.
> 
> If you want size, you can use align():
> 
> struct ReallySmall
> {
>   int flags;
>   align(2) real value;
>   byte flags2;
>   align(2) real value2;
> }
> 
> Now ReallySmall.value.alignof == 2, and sorting will give you
> 
> struct ReallySmallSorted
> {
>   int flags;
>   align(2) real value;
>   align(2) real value2;
>   byte flags2;
> }
> 
> with ReallySmallSorted.sizeof == 28 and .alignof == 4 on Windows.
> 
>> These effects are not captured by the .alignof property for structs. To capture it in the Foo example, you'd need (for example) a Foo.memberalignof property, and a Foo.memberalignoffsetof property.
> 
> I don't understand this part.  Do you say that fields in different
> elements of the same array should be aligned differently?

No. It only applies in this situation where you have a struct which contains another struct which is align(1).

Given

align(1):
struct Foo {
  int a;
  double b;
}

(A)
struct Bar {
  int c;
  Foo d;
}

(B)
struct Bar {
  Foo d;
  int c;
}

(A) is better than (B) because it aligns Bar.d.b correctly. It's a trick, really -- it's using the other elements of Bar as padding for Foo.
I can't think of any other cases where this trick applies. As you say, an array of Foo[] cannot have all of its elements aligned.
So probably, the 'ideal solution' for the proposed template would be probe the insides of each struct to see if the trick can be used.

real80 is another case where the trick can be applied.

float[4] may also benefit from being aligned to a 16-byte boundary. With Intel's AVX instructions, something as large as float[16] might like to be on a 256-byte boundary, but all such cases are easy, since they involve x.sizeof == pow(2,k). align(1) structs and real80 are the only non-trivial ones.
January 08, 2009
Andrei Alexandrescu wrote:
> Don wrote:
>> Denis Koroskin wrote:
>>> On Thu, 18 Dec 2008 01:38:32 +0300, Sergey Gromov
>>>> Thu, 18 Dec 2008 00:21:58 +0300, Denis Koroskin wrote:
>>>>> On Thu, 18 Dec 2008 00:12:18 +0300, Sergey Gromov
>>>>>> Tue, 16 Dec 2008 10:09:41 +0100, Don wrote:
>>>>>>> struct Foo { double, int } // 12 byte size, wants 8-byte alignment.
>>>>>> Foo.sizeof is 16.
>>>>> It is 12 with align(1) (I see no problem with it).
>>>> With align(1) the Foo.alignof is 1 so there is no problem to talk about.
>>> No, there is. You should have structure as follows:
>>>
>>> align(1):
>>> struct Foo
>>> {
>>>     char c;
>>>     double d;
>>>     short s;
>>>     bool b;
>>> }
>>
>> [snip]
>>
>>> so that Foo.d.alignof == 8. Align(1) is still necessary so that Foo.sizeof == 12.
>>
>> Sergey is correct to the extent that in DMD, x.alignof <= x.sizeof for any x, and also x.sizeof is always an integer multiple of x.alignof.
>> When you assume that DMD is correct, then the simple ordering by .alignof indeed gives an optimal solution.
>> But these values *don't* reflect the machine behaviour.
>>
>> real.alignof is 2. But it's significantly faster (factor of 2) on most x86 machines to have real aligned to 16.
>>
>> These effects are not captured by the .alignof property for structs. To capture it in the Foo example, you'd need (for example) a Foo.memberalignof property, and a Foo.memberalignoffsetof property.
>> But you could reasonably argue that if you're specifying align(1) then alignment is not important.
>> The simple algorithm may well be good enough.
>> (And note that you can use radix sort -- alignof can only be 1, 2, 4, 8, or 16 -- so it's in O(n) not O(n lg n)).
>>
>> Interestingly, this 'internal alignment' applies to function alignment, as well. D allows you to specify that a function be aligned to (for example) a 16-byte boundary. This is hardly ever useful; usually what you want is for the innermost loop to be aligned. And instead of a pile of nops before the loop, you'd like the function start address to be adjusted.
> 
> That's cool! I think the way to the templates AlignForSpeed!(Foo) and AlignForSize!(Foo) is paved. Don, now all you have to do is a simple matter of coding :o).
> 
> Andrei

Here's a very simple implementation that should solve the problem described in the original post. It doesn't deal with the obscure and difficult cases of align(1) structs, or 80-bit reals.
Implementation includes workarounds for FOUR compiler bugs.
Works without modification on both D1 and D2.

----
/// Order the provided members to minimize size while preserving alignment
/// BUG: bugzilla 2029 prevents the signature from being (string[] names...),
/// so we need to use an ugly array literal instead.
char [] alignForSize(E...)(string[E.length] names)
{
    // Sort all of the members by .alignof.
    // BUG: Alignment is not always optimal for align(1) structs
    // or 80-bit reals.
    // TRICK: Use the fact that .alignof is always a power of 2,
    // and maximum 16 on extant systems. Thus, we can perform
    // a very limited radix sort.
    int [][] alignlist; // workaround for bugzilla 2569
    // alignof = 1, 2, 4, 8, 16, 32, 64
    alignlist = [ [],[],[],[],[],[],[]]; // workaround for bugzilla 2562
    char[][] declaration;
    foreach(int i_bug,T; E) {
        int i = i_bug; // workaround for bugzilla 2564 (D2 only)
        declaration ~= T.stringof ~ " " ~ names[i].dup ~ ";\n";
        int a = T.alignof;
        int k = a==2? 1 : a==4? 2 : a==8? 3 : a==16? 4 : a==32? 5 : a>=64? 6 : 0;
        alignlist[k]~=i;
    }
    char [] s;
    foreach_reverse(q; alignlist) {
      foreach(int i; q) {
        s ~=  declaration[i];
      }
    }
    return s;
}

// Example:

struct Foo {
    mixin(alignForSize!(int[], char[13], short, double[5])(["x", "y","z", "w"]));
}

is equivalent to:

struct Foo{
 double[5u] w; // aligned to 8 bytes
 int[] x;      // aligned to 4
 short z;      // aligned to 2
 char[13u] y;   // aligned to 1
}

which will have a size of 63, followed by 1 byte of padding.
January 08, 2009
Don wrote:
> Andrei Alexandrescu wrote:
>> Don wrote:
>>> Denis Koroskin wrote:
>>>> On Thu, 18 Dec 2008 01:38:32 +0300, Sergey Gromov
>>>>> Thu, 18 Dec 2008 00:21:58 +0300, Denis Koroskin wrote:
>>>>>> On Thu, 18 Dec 2008 00:12:18 +0300, Sergey Gromov
>>>>>>> Tue, 16 Dec 2008 10:09:41 +0100, Don wrote:
>>>>>>>> struct Foo { double, int } // 12 byte size, wants 8-byte alignment.
>>>>>>> Foo.sizeof is 16.
>>>>>> It is 12 with align(1) (I see no problem with it).
>>>>> With align(1) the Foo.alignof is 1 so there is no problem to talk about.
>>>> No, there is. You should have structure as follows:
>>>>
>>>> align(1):
>>>> struct Foo
>>>> {
>>>>     char c;
>>>>     double d;
>>>>     short s;
>>>>     bool b;
>>>> }
>>>
>>> [snip]
>>>
>>>> so that Foo.d.alignof == 8. Align(1) is still necessary so that Foo.sizeof == 12.
>>>
>>> Sergey is correct to the extent that in DMD, x.alignof <= x.sizeof for any x, and also x.sizeof is always an integer multiple of x.alignof.
>>> When you assume that DMD is correct, then the simple ordering by .alignof indeed gives an optimal solution.
>>> But these values *don't* reflect the machine behaviour.
>>>
>>> real.alignof is 2. But it's significantly faster (factor of 2) on most x86 machines to have real aligned to 16.
>>>
>>> These effects are not captured by the .alignof property for structs. To capture it in the Foo example, you'd need (for example) a Foo.memberalignof property, and a Foo.memberalignoffsetof property.
>>> But you could reasonably argue that if you're specifying align(1) then alignment is not important.
>>> The simple algorithm may well be good enough.
>>> (And note that you can use radix sort -- alignof can only be 1, 2, 4, 8, or 16 -- so it's in O(n) not O(n lg n)).
>>>
>>> Interestingly, this 'internal alignment' applies to function alignment, as well. D allows you to specify that a function be aligned to (for example) a 16-byte boundary. This is hardly ever useful; usually what you want is for the innermost loop to be aligned. And instead of a pile of nops before the loop, you'd like the function start address to be adjusted.
>>
>> That's cool! I think the way to the templates AlignForSpeed!(Foo) and AlignForSize!(Foo) is paved. Don, now all you have to do is a simple matter of coding :o).
>>
>> Andrei
> 
> Here's a very simple implementation that should solve the problem described in the original post. It doesn't deal with the obscure and difficult cases of align(1) structs, or 80-bit reals.
> Implementation includes workarounds for FOUR compiler bugs.
> Works without modification on both D1 and D2.
> 
> ----
> /// Order the provided members to minimize size while preserving alignment
> /// BUG: bugzilla 2029 prevents the signature from being (string[] names...),
> /// so we need to use an ugly array literal instead.
> char [] alignForSize(E...)(string[E.length] names)
> {
>     // Sort all of the members by .alignof.
>     // BUG: Alignment is not always optimal for align(1) structs
>     // or 80-bit reals.
>     // TRICK: Use the fact that .alignof is always a power of 2,
>     // and maximum 16 on extant systems. Thus, we can perform
>     // a very limited radix sort.
>     int [][] alignlist; // workaround for bugzilla 2569
>     // alignof = 1, 2, 4, 8, 16, 32, 64
>     alignlist = [ [],[],[],[],[],[],[]]; // workaround for bugzilla 2562
>     char[][] declaration;
>     foreach(int i_bug,T; E) {
>         int i = i_bug; // workaround for bugzilla 2564 (D2 only)
>         declaration ~= T.stringof ~ " " ~ names[i].dup ~ ";\n";
>         int a = T.alignof;
>         int k = a==2? 1 : a==4? 2 : a==8? 3 : a==16? 4 : a==32? 5 : a>=64? 6 : 0;
>         alignlist[k]~=i;
>     }
>     char [] s;
>     foreach_reverse(q; alignlist) {
>       foreach(int i; q) {
>         s ~=  declaration[i];
>       }
>     }
>     return s;
> }
> 
> // Example:
> 
> struct Foo {
>     mixin(alignForSize!(int[], char[13], short, double[5])(["x", "y","z", "w"]));
> }
> 
> is equivalent to:
> 
> struct Foo{
>  double[5u] w; // aligned to 8 bytes
>  int[] x;      // aligned to 4
>  short z;      // aligned to 2
>  char[13u] y;   // aligned to 1
> }
> 
> which will have a size of 63, followed by 1 byte of padding

Wow; thanks; that's awesome.
January 08, 2009
Don wrote:
> Andrei Alexandrescu wrote:
>> Don wrote:
>>> Denis Koroskin wrote:
>>>> On Thu, 18 Dec 2008 01:38:32 +0300, Sergey Gromov
>>>>> Thu, 18 Dec 2008 00:21:58 +0300, Denis Koroskin wrote:
>>>>>> On Thu, 18 Dec 2008 00:12:18 +0300, Sergey Gromov
>>>>>>> Tue, 16 Dec 2008 10:09:41 +0100, Don wrote:
>>>>>>>> struct Foo { double, int } // 12 byte size, wants 8-byte alignment.
>>>>>>> Foo.sizeof is 16.
>>>>>> It is 12 with align(1) (I see no problem with it).
>>>>> With align(1) the Foo.alignof is 1 so there is no problem to talk about.
>>>> No, there is. You should have structure as follows:
>>>>
>>>> align(1):
>>>> struct Foo
>>>> {
>>>>     char c;
>>>>     double d;
>>>>     short s;
>>>>     bool b;
>>>> }
>>>
>>> [snip]
>>>
>>>> so that Foo.d.alignof == 8. Align(1) is still necessary so that Foo.sizeof == 12.
>>>
>>> Sergey is correct to the extent that in DMD, x.alignof <= x.sizeof for any x, and also x.sizeof is always an integer multiple of x.alignof.
>>> When you assume that DMD is correct, then the simple ordering by .alignof indeed gives an optimal solution.
>>> But these values *don't* reflect the machine behaviour.
>>>
>>> real.alignof is 2. But it's significantly faster (factor of 2) on most x86 machines to have real aligned to 16.
>>>
>>> These effects are not captured by the .alignof property for structs. To capture it in the Foo example, you'd need (for example) a Foo.memberalignof property, and a Foo.memberalignoffsetof property.
>>> But you could reasonably argue that if you're specifying align(1) then alignment is not important.
>>> The simple algorithm may well be good enough.
>>> (And note that you can use radix sort -- alignof can only be 1, 2, 4, 8, or 16 -- so it's in O(n) not O(n lg n)).
>>>
>>> Interestingly, this 'internal alignment' applies to function alignment, as well. D allows you to specify that a function be aligned to (for example) a 16-byte boundary. This is hardly ever useful; usually what you want is for the innermost loop to be aligned. And instead of a pile of nops before the loop, you'd like the function start address to be adjusted.
>>
>> That's cool! I think the way to the templates AlignForSpeed!(Foo) and AlignForSize!(Foo) is paved. Don, now all you have to do is a simple matter of coding :o).
>>
>> Andrei
> 
> Here's a very simple implementation that should solve the problem described in the original post. It doesn't deal with the obscure and difficult cases of align(1) structs, or 80-bit reals.
> Implementation includes workarounds for FOUR compiler bugs.
> Works without modification on both D1 and D2.
> 
> ----
> /// Order the provided members to minimize size while preserving alignment
> /// BUG: bugzilla 2029 prevents the signature from being (string[] names...),
> /// so we need to use an ugly array literal instead.
> char [] alignForSize(E...)(string[E.length] names)
> {
>     // Sort all of the members by .alignof.
>     // BUG: Alignment is not always optimal for align(1) structs
>     // or 80-bit reals.
>     // TRICK: Use the fact that .alignof is always a power of 2,
>     // and maximum 16 on extant systems. Thus, we can perform
>     // a very limited radix sort.
>     int [][] alignlist; // workaround for bugzilla 2569
>     // alignof = 1, 2, 4, 8, 16, 32, 64
>     alignlist = [ [],[],[],[],[],[],[]]; // workaround for bugzilla 2562
>     char[][] declaration;
>     foreach(int i_bug,T; E) {
>         int i = i_bug; // workaround for bugzilla 2564 (D2 only)
>         declaration ~= T.stringof ~ " " ~ names[i].dup ~ ";\n";
>         int a = T.alignof;
>         int k = a==2? 1 : a==4? 2 : a==8? 3 : a==16? 4 : a==32? 5 : a>=64? 6 : 0;
>         alignlist[k]~=i;
>     }
>     char [] s;
>     foreach_reverse(q; alignlist) {
>       foreach(int i; q) {
>         s ~=  declaration[i];
>       }
>     }
>     return s;
> }
> 
> // Example:
> 
> struct Foo {
>     mixin(alignForSize!(int[], char[13], short, double[5])(["x", "y","z", "w"]));
> }
> 
> is equivalent to:
> 
> struct Foo{
>  double[5u] w; // aligned to 8 bytes
>  int[] x;      // aligned to 4
>  short z;      // aligned to 2
>  char[13u] y;   // aligned to 1
> }
> 
> which will have a size of 63, followed by 1 byte of padding.

andrei:~$ grep unittest donscode.d
andrei:~$ echo $?
1
andrei:~$ _


Other than that, I'd be glad if you included it in Phobos :o).


Andrei
January 09, 2009
Andrei Alexandrescu wrote:
> Don wrote:
>> Andrei Alexandrescu wrote:
>>> Don wrote:
>>>> Denis Koroskin wrote:
>>>>> On Thu, 18 Dec 2008 01:38:32 +0300, Sergey Gromov
>>>>>> Thu, 18 Dec 2008 00:21:58 +0300, Denis Koroskin wrote:
>>>>>>> On Thu, 18 Dec 2008 00:12:18 +0300, Sergey Gromov
>>>>>>>> Tue, 16 Dec 2008 10:09:41 +0100, Don wrote:
>>>>>>>>> struct Foo { double, int } // 12 byte size, wants 8-byte alignment.
>>>>>>>> Foo.sizeof is 16.
>>>>>>> It is 12 with align(1) (I see no problem with it).
>>>>>> With align(1) the Foo.alignof is 1 so there is no problem to talk about.
>>>>> No, there is. You should have structure as follows:
>>>>>
>>>>> align(1):
>>>>> struct Foo
>>>>> {
>>>>>     char c;
>>>>>     double d;
>>>>>     short s;
>>>>>     bool b;
>>>>> }
>>>>
>>>> [snip]
>>>>
>>>>> so that Foo.d.alignof == 8. Align(1) is still necessary so that Foo.sizeof == 12.
>>>>
>>>> Sergey is correct to the extent that in DMD, x.alignof <= x.sizeof for any x, and also x.sizeof is always an integer multiple of x.alignof.
>>>> When you assume that DMD is correct, then the simple ordering by .alignof indeed gives an optimal solution.
>>>> But these values *don't* reflect the machine behaviour.
>>>>
>>>> real.alignof is 2. But it's significantly faster (factor of 2) on most x86 machines to have real aligned to 16.
>>>>
>>>> These effects are not captured by the .alignof property for structs. To capture it in the Foo example, you'd need (for example) a Foo.memberalignof property, and a Foo.memberalignoffsetof property.
>>>> But you could reasonably argue that if you're specifying align(1) then alignment is not important.
>>>> The simple algorithm may well be good enough.
>>>> (And note that you can use radix sort -- alignof can only be 1, 2, 4, 8, or 16 -- so it's in O(n) not O(n lg n)).
>>>>
>>>> Interestingly, this 'internal alignment' applies to function alignment, as well. D allows you to specify that a function be aligned to (for example) a 16-byte boundary. This is hardly ever useful; usually what you want is for the innermost loop to be aligned. And instead of a pile of nops before the loop, you'd like the function start address to be adjusted.
>>>
>>> That's cool! I think the way to the templates AlignForSpeed!(Foo) and AlignForSize!(Foo) is paved. Don, now all you have to do is a simple matter of coding :o).
>>>
>>> Andrei
>>
>> Here's a very simple implementation that should solve the problem described in the original post. It doesn't deal with the obscure and difficult cases of align(1) structs, or 80-bit reals.
>> Implementation includes workarounds for FOUR compiler bugs.
>> Works without modification on both D1 and D2.
>>
>> ----
>> /// Order the provided members to minimize size while preserving alignment
>> /// BUG: bugzilla 2029 prevents the signature from being (string[] names...),
>> /// so we need to use an ugly array literal instead.
>> char [] alignForSize(E...)(string[E.length] names)
>> {
>>     // Sort all of the members by .alignof.
>>     // BUG: Alignment is not always optimal for align(1) structs
>>     // or 80-bit reals.
>>     // TRICK: Use the fact that .alignof is always a power of 2,
>>     // and maximum 16 on extant systems. Thus, we can perform
>>     // a very limited radix sort.
>>     int [][] alignlist; // workaround for bugzilla 2569
>>     // alignof = 1, 2, 4, 8, 16, 32, 64
>>     alignlist = [ [],[],[],[],[],[],[]]; // workaround for bugzilla 2562
>>     char[][] declaration;
>>     foreach(int i_bug,T; E) {
>>         int i = i_bug; // workaround for bugzilla 2564 (D2 only)
>>         declaration ~= T.stringof ~ " " ~ names[i].dup ~ ";\n";
>>         int a = T.alignof;
>>         int k = a==2? 1 : a==4? 2 : a==8? 3 : a==16? 4 : a==32? 5 : a>=64? 6 : 0;
>>         alignlist[k]~=i;
>>     }
>>     char [] s;
>>     foreach_reverse(q; alignlist) {
>>       foreach(int i; q) {
>>         s ~=  declaration[i];
>>       }
>>     }
>>     return s;
>> }
>>
>> // Example:
>>
>> struct Foo {
>>     mixin(alignForSize!(int[], char[13], short, double[5])(["x", "y","z", "w"]));
>> }
>>
>> is equivalent to:
>>
>> struct Foo{
>>  double[5u] w; // aligned to 8 bytes
>>  int[] x;      // aligned to 4
>>  short z;      // aligned to 2
>>  char[13u] y;   // aligned to 1
>> }
>>
>> which will have a size of 63, followed by 1 byte of padding.
> 
> andrei:~$ grep unittest donscode.d
> andrei:~$ echo $?
> 1
> andrei:~$ _
> 
> 
> Other than that, I'd be glad if you included it in Phobos :o).
> 
> 
> Andrei

I'd really like to see bug 2029 fixed before standardizing it; I hate those ugly [] in the parameter list which will appear all over user code. It's a trivial change to the library code though.
Where should it go? In typecons.d ?
January 09, 2009
Andrei Alexandrescu wrote:
> Don wrote:
>> Andrei Alexandrescu wrote:
>>> Don wrote:
>>>> Denis Koroskin wrote:
>>>>> On Thu, 18 Dec 2008 01:38:32 +0300, Sergey Gromov
>>>>>> Thu, 18 Dec 2008 00:21:58 +0300, Denis Koroskin wrote:
>>>>>>> On Thu, 18 Dec 2008 00:12:18 +0300, Sergey Gromov
>>>>>>>> Tue, 16 Dec 2008 10:09:41 +0100, Don wrote:
>>>>>>>>> struct Foo { double, int } // 12 byte size, wants 8-byte alignment.
>>>>>>>> Foo.sizeof is 16.
>>>>>>> It is 12 with align(1) (I see no problem with it).
>>>>>> With align(1) the Foo.alignof is 1 so there is no problem to talk about.
>>>>> No, there is. You should have structure as follows:
>>>>>
>>>>> align(1):
>>>>> struct Foo
>>>>> {
>>>>>     char c;
>>>>>     double d;
>>>>>     short s;
>>>>>     bool b;
>>>>> }
>>>>
>>>> [snip]
>>>>
>>>>> so that Foo.d.alignof == 8. Align(1) is still necessary so that Foo.sizeof == 12.
>>>>
>>>> Sergey is correct to the extent that in DMD, x.alignof <= x.sizeof for any x, and also x.sizeof is always an integer multiple of x.alignof.
>>>> When you assume that DMD is correct, then the simple ordering by .alignof indeed gives an optimal solution.
>>>> But these values *don't* reflect the machine behaviour.
>>>>
>>>> real.alignof is 2. But it's significantly faster (factor of 2) on most x86 machines to have real aligned to 16.
>>>>
>>>> These effects are not captured by the .alignof property for structs. To capture it in the Foo example, you'd need (for example) a Foo.memberalignof property, and a Foo.memberalignoffsetof property.
>>>> But you could reasonably argue that if you're specifying align(1) then alignment is not important.
>>>> The simple algorithm may well be good enough.
>>>> (And note that you can use radix sort -- alignof can only be 1, 2, 4, 8, or 16 -- so it's in O(n) not O(n lg n)).
>>>>
>>>> Interestingly, this 'internal alignment' applies to function alignment, as well. D allows you to specify that a function be aligned to (for example) a 16-byte boundary. This is hardly ever useful; usually what you want is for the innermost loop to be aligned. And instead of a pile of nops before the loop, you'd like the function start address to be adjusted.
>>>
>>> That's cool! I think the way to the templates AlignForSpeed!(Foo) and AlignForSize!(Foo) is paved. Don, now all you have to do is a simple matter of coding :o).
>>>
>>> Andrei
>>
>> Here's a very simple implementation that should solve the problem described in the original post. It doesn't deal with the obscure and difficult cases of align(1) structs, or 80-bit reals.
>> Implementation includes workarounds for FOUR compiler bugs.
>> Works without modification on both D1 and D2.
>>
>> ----
>> /// Order the provided members to minimize size while preserving alignment
>> /// BUG: bugzilla 2029 prevents the signature from being (string[] names...),
>> /// so we need to use an ugly array literal instead.
>> char [] alignForSize(E...)(string[E.length] names)
>> {
>>     // Sort all of the members by .alignof.
>>     // BUG: Alignment is not always optimal for align(1) structs
>>     // or 80-bit reals.
>>     // TRICK: Use the fact that .alignof is always a power of 2,
>>     // and maximum 16 on extant systems. Thus, we can perform
>>     // a very limited radix sort.
>>     int [][] alignlist; // workaround for bugzilla 2569
>>     // alignof = 1, 2, 4, 8, 16, 32, 64
>>     alignlist = [ [],[],[],[],[],[],[]]; // workaround for bugzilla 2562
>>     char[][] declaration;
>>     foreach(int i_bug,T; E) {
>>         int i = i_bug; // workaround for bugzilla 2564 (D2 only)
>>         declaration ~= T.stringof ~ " " ~ names[i].dup ~ ";\n";
>>         int a = T.alignof;
>>         int k = a==2? 1 : a==4? 2 : a==8? 3 : a==16? 4 : a==32? 5 : a>=64? 6 : 0;
>>         alignlist[k]~=i;
>>     }
>>     char [] s;
>>     foreach_reverse(q; alignlist) {
>>       foreach(int i; q) {
>>         s ~=  declaration[i];
>>       }
>>     }
>>     return s;
>> }
>>
>> // Example:
>>
>> struct Foo {
>>     mixin(alignForSize!(int[], char[13], short, double[5])(["x", "y","z", "w"]));
>> }
>>
>> is equivalent to:
>>
>> struct Foo{
>>  double[5u] w; // aligned to 8 bytes
>>  int[] x;      // aligned to 4
>>  short z;      // aligned to 2
>>  char[13u] y;   // aligned to 1
>> }
>>
>> which will have a size of 63, followed by 1 byte of padding.
> 
> andrei:~$ grep unittest donscode.d
> andrei:~$ echo $?
> 1
> andrei:~$ _
> 
> 
> Other than that, I'd be glad if you included it in Phobos :o).

But this looks like something that really should be done by the compiler. This is in the compiler's area of responsibility, and using Don's template isn't very elegant either. Just look how the types and field names are separate, unlike in normal struct declarations. And the names are string literals.

Instead, this should be implemented directly in the compiler. Some syntax is needed to mark structs, that should use an optimal layout. What about:

align("optimal")
struct Foo {
	int[] x; char[13] y; short z; double[5] w;
}

For classes, the compiler can optimize the field layout by default.
January 09, 2009
grauzone wrote:
> Andrei Alexandrescu wrote:
>> Don wrote:
>>> Andrei Alexandrescu wrote:
>>>> Don wrote:
>>>>> Denis Koroskin wrote:
>>>>>> On Thu, 18 Dec 2008 01:38:32 +0300, Sergey Gromov
>>>>>>> Thu, 18 Dec 2008 00:21:58 +0300, Denis Koroskin wrote:
>>>>>>>> On Thu, 18 Dec 2008 00:12:18 +0300, Sergey Gromov
>>>>>>>>> Tue, 16 Dec 2008 10:09:41 +0100, Don wrote:
>>>>>>>>>> struct Foo { double, int } // 12 byte size, wants 8-byte alignment.
>>>>>>>>> Foo.sizeof is 16.
>>>>>>>> It is 12 with align(1) (I see no problem with it).
>>>>>>> With align(1) the Foo.alignof is 1 so there is no problem to talk about.
>>>>>> No, there is. You should have structure as follows:
>>>>>>
>>>>>> align(1):
>>>>>> struct Foo
>>>>>> {
>>>>>>     char c;
>>>>>>     double d;
>>>>>>     short s;
>>>>>>     bool b;
>>>>>> }
>>>>>
>>>>> [snip]
>>>>>
>>>>>> so that Foo.d.alignof == 8. Align(1) is still necessary so that Foo.sizeof == 12.
>>>>>
>>>>> Sergey is correct to the extent that in DMD, x.alignof <= x.sizeof for any x, and also x.sizeof is always an integer multiple of x.alignof.
>>>>> When you assume that DMD is correct, then the simple ordering by .alignof indeed gives an optimal solution.
>>>>> But these values *don't* reflect the machine behaviour.
>>>>>
>>>>> real.alignof is 2. But it's significantly faster (factor of 2) on most x86 machines to have real aligned to 16.
>>>>>
>>>>> These effects are not captured by the .alignof property for structs. To capture it in the Foo example, you'd need (for example) a Foo.memberalignof property, and a Foo.memberalignoffsetof property.
>>>>> But you could reasonably argue that if you're specifying align(1) then alignment is not important.
>>>>> The simple algorithm may well be good enough.
>>>>> (And note that you can use radix sort -- alignof can only be 1, 2, 4, 8, or 16 -- so it's in O(n) not O(n lg n)).
>>>>>
>>>>> Interestingly, this 'internal alignment' applies to function alignment, as well. D allows you to specify that a function be aligned to (for example) a 16-byte boundary. This is hardly ever useful; usually what you want is for the innermost loop to be aligned. And instead of a pile of nops before the loop, you'd like the function start address to be adjusted.
>>>>
>>>> That's cool! I think the way to the templates AlignForSpeed!(Foo) and AlignForSize!(Foo) is paved. Don, now all you have to do is a simple matter of coding :o).
>>>>
>>>> Andrei
>>>
>>> Here's a very simple implementation that should solve the problem described in the original post. It doesn't deal with the obscure and difficult cases of align(1) structs, or 80-bit reals.
>>> Implementation includes workarounds for FOUR compiler bugs.
>>> Works without modification on both D1 and D2.
>>>
>>> ----
>>> /// Order the provided members to minimize size while preserving alignment
>>> /// BUG: bugzilla 2029 prevents the signature from being (string[] names...),
>>> /// so we need to use an ugly array literal instead.
>>> char [] alignForSize(E...)(string[E.length] names)
>>> {
>>>     // Sort all of the members by .alignof.
>>>     // BUG: Alignment is not always optimal for align(1) structs
>>>     // or 80-bit reals.
>>>     // TRICK: Use the fact that .alignof is always a power of 2,
>>>     // and maximum 16 on extant systems. Thus, we can perform
>>>     // a very limited radix sort.
>>>     int [][] alignlist; // workaround for bugzilla 2569
>>>     // alignof = 1, 2, 4, 8, 16, 32, 64
>>>     alignlist = [ [],[],[],[],[],[],[]]; // workaround for bugzilla 2562
>>>     char[][] declaration;
>>>     foreach(int i_bug,T; E) {
>>>         int i = i_bug; // workaround for bugzilla 2564 (D2 only)
>>>         declaration ~= T.stringof ~ " " ~ names[i].dup ~ ";\n";
>>>         int a = T.alignof;
>>>         int k = a==2? 1 : a==4? 2 : a==8? 3 : a==16? 4 : a==32? 5 : a>=64? 6 : 0;
>>>         alignlist[k]~=i;
>>>     }
>>>     char [] s;
>>>     foreach_reverse(q; alignlist) {
>>>       foreach(int i; q) {
>>>         s ~=  declaration[i];
>>>       }
>>>     }
>>>     return s;
>>> }
>>>
>>> // Example:
>>>
>>> struct Foo {
>>>     mixin(alignForSize!(int[], char[13], short, double[5])(["x", "y","z", "w"]));
>>> }
>>>
>>> is equivalent to:
>>>
>>> struct Foo{
>>>  double[5u] w; // aligned to 8 bytes
>>>  int[] x;      // aligned to 4
>>>  short z;      // aligned to 2
>>>  char[13u] y;   // aligned to 1
>>> }
>>>
>>> which will have a size of 63, followed by 1 byte of padding.
>>
>> andrei:~$ grep unittest donscode.d
>> andrei:~$ echo $?
>> 1
>> andrei:~$ _
>>
>>
>> Other than that, I'd be glad if you included it in Phobos :o).
> 
> But this looks like something that really should be done by the compiler. This is in the compiler's area of responsibility, and using Don's template isn't very elegant either. Just look how the types and field names are separate, unlike in normal struct declarations. And the names are string literals.
> 
> Instead, this should be implemented directly in the compiler. Some syntax is needed to mark structs, that should use an optimal layout. What about:
> 
> align("optimal")
> struct Foo {
>     int[] x; char[13] y; short z; double[5] w;
> }
> 
> For classes, the compiler can optimize the field layout by default.

Well, it's very obscure, really. It only makes a difference when you want to reduce the size of the struct, since DMD already assigns padding to give optimal alignment. And you'd only use it in cases when you care about the size AND you don't think it's important enough to order the fields yourself (or where you can't, because you don't know the types) AND where you don't care that the compiler will add a few bytes of padding at the end.

So I'm pretty much certain the compiler shouldn't include support for limited-interest stuff like this. I'm not even sure that it's useful enough to be in a standard library. It was a fun exercise though.

January 09, 2009
== Quote from Don (nospam@nospam.com)'s article
> > For classes, the compiler can optimize the field layout by default.
> Well, it's very obscure, really. It only makes a difference when you
> want to reduce the size of the struct, since DMD already assigns padding
> to give optimal alignment. And you'd only use it in cases when you care
> about the size AND you don't think it's important enough to order the
> fields yourself (or where you can't, because you don't know the types)
> AND where you don't care that the compiler will add a few bytes of
> padding at the end.
> So I'm pretty much certain the compiler shouldn't include support for
> limited-interest stuff like this. I'm not even sure that it's useful
> enough to be in a standard library. It was a fun exercise though.

The thing is, those few bytes of padding really matter when you're working with large arrays of templated structs, and they can turn into tens of megabytes, or possibly even if they can just turn into kilobytes and kill cache performance.  A perfect use for it (and the one I had in mind) was to build more space-efficient generic hash tables.
January 09, 2009
Don wrote:
> Andrei Alexandrescu wrote:
>> Don wrote:
>>> Andrei Alexandrescu wrote:
>>>> Don wrote:
>>>>> Denis Koroskin wrote:
>>>>>> On Thu, 18 Dec 2008 01:38:32 +0300, Sergey Gromov
>>>>>>> Thu, 18 Dec 2008 00:21:58 +0300, Denis Koroskin wrote:
>>>>>>>> On Thu, 18 Dec 2008 00:12:18 +0300, Sergey Gromov
>>>>>>>>> Tue, 16 Dec 2008 10:09:41 +0100, Don wrote:
>>>>>>>>>> struct Foo { double, int } // 12 byte size, wants 8-byte alignment.
>>>>>>>>> Foo.sizeof is 16.
>>>>>>>> It is 12 with align(1) (I see no problem with it).
>>>>>>> With align(1) the Foo.alignof is 1 so there is no problem to talk about.
>>>>>> No, there is. You should have structure as follows:
>>>>>>
>>>>>> align(1):
>>>>>> struct Foo
>>>>>> {
>>>>>>     char c;
>>>>>>     double d;
>>>>>>     short s;
>>>>>>     bool b;
>>>>>> }
>>>>>
>>>>> [snip]
>>>>>
>>>>>> so that Foo.d.alignof == 8. Align(1) is still necessary so that Foo.sizeof == 12.
>>>>>
>>>>> Sergey is correct to the extent that in DMD, x.alignof <= x.sizeof for any x, and also x.sizeof is always an integer multiple of x.alignof.
>>>>> When you assume that DMD is correct, then the simple ordering by .alignof indeed gives an optimal solution.
>>>>> But these values *don't* reflect the machine behaviour.
>>>>>
>>>>> real.alignof is 2. But it's significantly faster (factor of 2) on most x86 machines to have real aligned to 16.
>>>>>
>>>>> These effects are not captured by the .alignof property for structs. To capture it in the Foo example, you'd need (for example) a Foo.memberalignof property, and a Foo.memberalignoffsetof property.
>>>>> But you could reasonably argue that if you're specifying align(1) then alignment is not important.
>>>>> The simple algorithm may well be good enough.
>>>>> (And note that you can use radix sort -- alignof can only be 1, 2, 4, 8, or 16 -- so it's in O(n) not O(n lg n)).
>>>>>
>>>>> Interestingly, this 'internal alignment' applies to function alignment, as well. D allows you to specify that a function be aligned to (for example) a 16-byte boundary. This is hardly ever useful; usually what you want is for the innermost loop to be aligned. And instead of a pile of nops before the loop, you'd like the function start address to be adjusted.
>>>>
>>>> That's cool! I think the way to the templates AlignForSpeed!(Foo) and AlignForSize!(Foo) is paved. Don, now all you have to do is a simple matter of coding :o).
>>>>
>>>> Andrei
>>>
>>> Here's a very simple implementation that should solve the problem described in the original post. It doesn't deal with the obscure and difficult cases of align(1) structs, or 80-bit reals.
>>> Implementation includes workarounds for FOUR compiler bugs.
>>> Works without modification on both D1 and D2.
>>>
>>> ----
>>> /// Order the provided members to minimize size while preserving alignment
>>> /// BUG: bugzilla 2029 prevents the signature from being (string[] names...),
>>> /// so we need to use an ugly array literal instead.
>>> char [] alignForSize(E...)(string[E.length] names)
>>> {
>>>     // Sort all of the members by .alignof.
>>>     // BUG: Alignment is not always optimal for align(1) structs
>>>     // or 80-bit reals.
>>>     // TRICK: Use the fact that .alignof is always a power of 2,
>>>     // and maximum 16 on extant systems. Thus, we can perform
>>>     // a very limited radix sort.
>>>     int [][] alignlist; // workaround for bugzilla 2569
>>>     // alignof = 1, 2, 4, 8, 16, 32, 64
>>>     alignlist = [ [],[],[],[],[],[],[]]; // workaround for bugzilla 2562
>>>     char[][] declaration;
>>>     foreach(int i_bug,T; E) {
>>>         int i = i_bug; // workaround for bugzilla 2564 (D2 only)
>>>         declaration ~= T.stringof ~ " " ~ names[i].dup ~ ";\n";
>>>         int a = T.alignof;
>>>         int k = a==2? 1 : a==4? 2 : a==8? 3 : a==16? 4 : a==32? 5 : a>=64? 6 : 0;
>>>         alignlist[k]~=i;
>>>     }
>>>     char [] s;
>>>     foreach_reverse(q; alignlist) {
>>>       foreach(int i; q) {
>>>         s ~=  declaration[i];
>>>       }
>>>     }
>>>     return s;
>>> }
>>>
>>> // Example:
>>>
>>> struct Foo {
>>>     mixin(alignForSize!(int[], char[13], short, double[5])(["x", "y","z", "w"]));
>>> }
>>>
>>> is equivalent to:
>>>
>>> struct Foo{
>>>  double[5u] w; // aligned to 8 bytes
>>>  int[] x;      // aligned to 4
>>>  short z;      // aligned to 2
>>>  char[13u] y;   // aligned to 1
>>> }
>>>
>>> which will have a size of 63, followed by 1 byte of padding.
>>
>> andrei:~$ grep unittest donscode.d
>> andrei:~$ echo $?
>> 1
>> andrei:~$ _
>>
>>
>> Other than that, I'd be glad if you included it in Phobos :o).
>>
>>
>> Andrei
> 
> I'd really like to see bug 2029 fixed before standardizing it; I hate those ugly [] in the parameter list which will appear all over user code. It's a trivial change to the library code though.
> Where should it go? In typecons.d ?

Yah, that's the place. That module quickly becomes one of my faves.

Andrei
1 2 3
Next ›   Last »