December 16, 2008
Andrei Alexandrescu wrote:
> Sergey Gromov wrote:
>> Sun, 14 Dec 2008 13:17:45 -0800, Andrei Alexandrescu wrote:
>>
>>> BCS wrote:
>>>> Reply to dsimcha,
>>>>
>>>>> According to the spec, data stored in structs is guaranteed to be laid
>>>>> out in the order specified in the source code.  While this is
>>>>> necessary in some low-level code, I have a use case where I need to
>>>>> pack structs as efficiently as possible.  These structs are to be
>>>>> stored in an array, and for efficiency reasons I want to store them
>>>>> directly rather than storing pointers, so using classes is out of the
>>>>> question.  Does anyone know how to write a template that, given a
>>>>> tuple of types, will reorder the tuple so that it will produce the
>>>>> optimal struct layout?  I don't care, at least for now, if it assumes
>>>>> x86-32/DMD instead of handling the more general case.
>>>>>
>>>> would "align 1" work?
>>> That could make things excruciatingly slow. What's really needed is (in first approximation) to sort the members in decreasing order of size. Odd-sized arrays of char foil this plan to some extent, which is where the fun begins.
>>
>> I would say in decreasing order of field/array element alignment
>> requirement.
> 
> Sounds indeed right. Even simpler than I thought!

Close, but doesn't work in all cases. You sometimes need to insert elements of smaller size.

Example 1:
Given real, double, double, short, int:
real (10 bytes, wants 16 byte alignment) --(only on Windows, 12 byte size on Linux32, 16-byte size on Linux64)
double (8 bytes, wants 8 byte alignment)
short (2 bytes, wants 2 byte alignment)
int (4 bytes, wants 4 byte alignment)

There are only two solutions:
{ real, short, int, double, double }
{ double, double, real, short, int }

Example 2:
struct Foo { double, int } // 12 byte size, wants 8-byte alignment.

given Foo, Foo, int, int:
solution is
{ Foo, int, Foo, int }







> 
> Andrei
December 16, 2008
Don wrote:
> Andrei Alexandrescu wrote:
>> Sergey Gromov wrote:
>>> Sun, 14 Dec 2008 13:17:45 -0800, Andrei Alexandrescu wrote:
>>>
>>>> BCS wrote:
>>>>> Reply to dsimcha,
>>>>>
>>>>>> According to the spec, data stored in structs is guaranteed to be laid
>>>>>> out in the order specified in the source code.  While this is
>>>>>> necessary in some low-level code, I have a use case where I need to
>>>>>> pack structs as efficiently as possible.  These structs are to be
>>>>>> stored in an array, and for efficiency reasons I want to store them
>>>>>> directly rather than storing pointers, so using classes is out of the
>>>>>> question.  Does anyone know how to write a template that, given a
>>>>>> tuple of types, will reorder the tuple so that it will produce the
>>>>>> optimal struct layout?  I don't care, at least for now, if it assumes
>>>>>> x86-32/DMD instead of handling the more general case.
>>>>>>
>>>>> would "align 1" work?
>>>> That could make things excruciatingly slow. What's really needed is (in first approximation) to sort the members in decreasing order of size. Odd-sized arrays of char foil this plan to some extent, which is where the fun begins.
>>>
>>> I would say in decreasing order of field/array element alignment
>>> requirement.
>>
>> Sounds indeed right. Even simpler than I thought!
> 
> Close, but doesn't work in all cases. You sometimes need to insert elements of smaller size.
> 
> Example 1:
> Given real, double, double, short, int:
> real (10 bytes, wants 16 byte alignment) --(only on Windows, 12 byte size on Linux32, 16-byte size on Linux64)
> double (8 bytes, wants 8 byte alignment)
> short (2 bytes, wants 2 byte alignment)
> int (4 bytes, wants 4 byte alignment)
> 
> There are only two solutions:
> { real, short, int, double, double }
> { double, double, real, short, int }
> 
> Example 2:
> struct Foo { double, int } // 12 byte size, wants 8-byte alignment.
> 
> given Foo, Foo, int, int:
> solution is
> { Foo, int, Foo, int }
> 

Greedy algorithm?
December 17, 2008
Tue, 16 Dec 2008 10:09:41 +0100, Don wrote:

> Example 2:
> struct Foo { double, int } // 12 byte size, wants 8-byte alignment.

Foo.sizeof is 16.

The thing is, array of Foo's must be properly aligned, and &foo[1] must be cast(byte*) &foo[0] + Foo.sizeof.  This means that Foo.sizeof must be itself 8-aligned.  The solution is { Foo, Foo, int, int }.

The same with your other example.  Sizeof cannot be less than alignment. Therefore you can never pack real and short in 16 bytes.
December 17, 2008
On Thu, 18 Dec 2008 00:12:18 +0300, Sergey Gromov <snake.scaly@gmail.com> wrote:

> Tue, 16 Dec 2008 10:09:41 +0100, Don wrote:
>
>> Example 2:
>> struct Foo { double, int } // 12 byte size, wants 8-byte alignment.
>
> Foo.sizeof is 16.
>

It is 12 with align(1) (I see no problem with it).

> The thing is, array of Foo's must be properly aligned, and &foo[1] must
> be cast(byte*) &foo[0] + Foo.sizeof.  This means that Foo.sizeof must be
> itself 8-aligned.  The solution is { Foo, Foo, int, int }.
>

Stop right there. You can't access &foo[1] no matter what align or structure layout is used. Example:

align(default):
struct Data
{
    int i1;
    short s2;
    char c1;
    int 12;
}

What is &i1[1]?
December 17, 2008
Thu, 18 Dec 2008 00:21:58 +0300, Denis Koroskin wrote:

> On Thu, 18 Dec 2008 00:12:18 +0300, Sergey Gromov <snake.scaly@gmail.com> wrote:
> 
>> Tue, 16 Dec 2008 10:09:41 +0100, Don wrote:
>>
>>> Example 2:
>>> struct Foo { double, int } // 12 byte size, wants 8-byte alignment.
>>
>> Foo.sizeof is 16.
>>
> 
> It is 12 with align(1) (I see no problem with it).

With align(1) the Foo.alignof is 1 so there is no problem to talk about. The problem is when you have elements with non-trivial alignment requirements and want to pack them as tightly as possible.

>> The thing is, array of Foo's must be properly aligned, and &foo[1] must be cast(byte*) &foo[0] + Foo.sizeof.  This means that Foo.sizeof must be itself 8-aligned.  The solution is { Foo, Foo, int, int }.
> 
> Stop right there. You can't access &foo[1] no matter what align or structure layout is used. Example:
> 
> align(default):
> struct Data
> {
>      int i1;
>      short s2;
>      char c1;
>      int 12;
> }
> 
> What is &i1[1]?

i1 is not an array.

Sorry I wasn't clear enough.  I was talking about an imaginary array of Foo's:

  Foo[] foo;

You must admit that foo[1] and &foo[1] makes perfect sense for such construct.  And that

  cast(byte*) &foo[1] - cast(byte*) &foo[0] == Foo.sizeof

must hold.
December 18, 2008
On Thu, 18 Dec 2008 01:38:32 +0300, Sergey Gromov <snake.scaly@gmail.com> wrote:

> Thu, 18 Dec 2008 00:21:58 +0300, Denis Koroskin wrote:
>
>> On Thu, 18 Dec 2008 00:12:18 +0300, Sergey Gromov <snake.scaly@gmail.com>
>> wrote:
>>
>>> Tue, 16 Dec 2008 10:09:41 +0100, Don wrote:
>>>
>>>> Example 2:
>>>> struct Foo { double, int } // 12 byte size, wants 8-byte alignment.
>>>
>>> Foo.sizeof is 16.
>>>
>>
>> It is 12 with align(1) (I see no problem with it).
>
> With align(1) the Foo.alignof is 1 so there is no problem to talk about.
> The problem is when you have elements with non-trivial alignment
> requirements and want to pack them as tightly as possible.
>

No, there is. You should have structure as follows:

align(1):
struct Foo
{
    char c;
    double d;
    short s;
    bool b;
}

it should be reordered to (one of the solutions):

align(1):
struct Foo
{
    char c;
    bool b;
    short s;
    double d;
}

so that Foo.d.alignof == 8. Align(1) is still necessary so that Foo.sizeof == 12.

>>> The thing is, array of Foo's must be properly aligned, and &foo[1] must
>>> be cast(byte*) &foo[0] + Foo.sizeof.  This means that Foo.sizeof must be
>>> itself 8-aligned.  The solution is { Foo, Foo, int, int }.
>>
>> Stop right there. You can't access &foo[1] no matter what align or
>> structure layout is used. Example:
>>
>> align(default):
>> struct Data
>> {
>>      int i1;
>>      short s2;
>>      char c1;
>>      int 12;
>> }
>>
>> What is &i1[1]?
>
> i1 is not an array.
>
> Sorry I wasn't clear enough.  I was talking about an imaginary array of
> Foo's:
>
>   Foo[] foo;
>
> You must admit that foo[1] and &foo[1] makes perfect sense for such
> construct.  And that
>
>   cast(byte*) &foo[1] - cast(byte*) &foo[0] == Foo.sizeof
>
> must hold.

This will *always* be true no matter what align/data layout is used.
December 18, 2008
Denis Koroskin wrote:
> On Thu, 18 Dec 2008 01:38:32 +0300, Sergey Gromov 
>> Thu, 18 Dec 2008 00:21:58 +0300, Denis Koroskin wrote:
>>> On Thu, 18 Dec 2008 00:12:18 +0300, Sergey Gromov 
>>>> Tue, 16 Dec 2008 10:09:41 +0100, Don wrote:
>>>>> struct Foo { double, int } // 12 byte size, wants 8-byte alignment.
>>>> Foo.sizeof is 16.
>>> It is 12 with align(1) (I see no problem with it).
>> With align(1) the Foo.alignof is 1 so there is no problem to talk about.
> No, there is. You should have structure as follows:
> 
> align(1):
> struct Foo
> {
>     char c;
>     double d;
>     short s;
>     bool b;
> }

[snip]

> so that Foo.d.alignof == 8. Align(1) is still necessary so that Foo.sizeof == 12.

Sergey is correct to the extent that in DMD, x.alignof <= x.sizeof for any x, and also x.sizeof is always an integer multiple of x.alignof.
When you assume that DMD is correct, then the simple ordering by .alignof indeed gives an optimal solution.
But these values *don't* reflect the machine behaviour.

real.alignof is 2. But it's significantly faster (factor of 2) on most x86 machines to have real aligned to 16.

These effects are not captured by the .alignof property for structs. To capture it in the Foo example, you'd need (for example) a Foo.memberalignof property, and a Foo.memberalignoffsetof property.
But you could reasonably argue that if you're specifying align(1) then alignment is not important.
The simple algorithm may well be good enough.
(And note that you can use radix sort -- alignof can only be 1, 2, 4, 8, or 16 -- so it's in O(n) not O(n lg n)).

Interestingly, this 'internal alignment' applies to function alignment, as well. D allows you to specify that a function be aligned to (for example) a 16-byte boundary. This is hardly ever useful; usually what you want is for the innermost loop to be aligned. And instead of a pile of nops before the loop, you'd like the function start address to be adjusted.
December 18, 2008
Don wrote:
> Denis Koroskin wrote:
>> On Thu, 18 Dec 2008 01:38:32 +0300, Sergey Gromov
>>> Thu, 18 Dec 2008 00:21:58 +0300, Denis Koroskin wrote:
>>>> On Thu, 18 Dec 2008 00:12:18 +0300, Sergey Gromov
>>>>> Tue, 16 Dec 2008 10:09:41 +0100, Don wrote:
>>>>>> struct Foo { double, int } // 12 byte size, wants 8-byte alignment.
>>>>> Foo.sizeof is 16.
>>>> It is 12 with align(1) (I see no problem with it).
>>> With align(1) the Foo.alignof is 1 so there is no problem to talk about.
>> No, there is. You should have structure as follows:
>>
>> align(1):
>> struct Foo
>> {
>>     char c;
>>     double d;
>>     short s;
>>     bool b;
>> }
> 
> [snip]
> 
>> so that Foo.d.alignof == 8. Align(1) is still necessary so that Foo.sizeof == 12.
> 
> Sergey is correct to the extent that in DMD, x.alignof <= x.sizeof for any x, and also x.sizeof is always an integer multiple of x.alignof.
> When you assume that DMD is correct, then the simple ordering by .alignof indeed gives an optimal solution.
> But these values *don't* reflect the machine behaviour.
> 
> real.alignof is 2. But it's significantly faster (factor of 2) on most x86 machines to have real aligned to 16.
> 
> These effects are not captured by the .alignof property for structs. To capture it in the Foo example, you'd need (for example) a Foo.memberalignof property, and a Foo.memberalignoffsetof property.
> But you could reasonably argue that if you're specifying align(1) then alignment is not important.
> The simple algorithm may well be good enough.
> (And note that you can use radix sort -- alignof can only be 1, 2, 4, 8, or 16 -- so it's in O(n) not O(n lg n)).
> 
> Interestingly, this 'internal alignment' applies to function alignment, as well. D allows you to specify that a function be aligned to (for example) a 16-byte boundary. This is hardly ever useful; usually what you want is for the innermost loop to be aligned. And instead of a pile of nops before the loop, you'd like the function start address to be adjusted.

That's cool! I think the way to the templates AlignForSpeed!(Foo) and AlignForSize!(Foo) is paved. Don, now all you have to do is a simple matter of coding :o).

Andrei
December 18, 2008
Thu, 18 Dec 2008 17:08:12 +0100, Don wrote:

>>>> On Thu, 18 Dec 2008 00:12:18 +0300, Sergey Gromov
>>>>> Tue, 16 Dec 2008 10:09:41 +0100, Don wrote:
>>>>>> struct Foo { double, int } // 12 byte size, wants 8-byte alignment.
>>>>> Foo.sizeof is 16.
> 
> Sergey is correct to the extent that in DMD, x.alignof <= x.sizeof for any x, and also x.sizeof is always an integer multiple of x.alignof. When you assume that DMD is correct, then the simple ordering by .alignof indeed gives an optimal solution.

I think this is not something you can assume or not assume.  The set of conditions you mention is required and sufficient to respect alignment requirements of array elements while obeying the rules of pointer math. You cannot change them until you drop either pointer math or alignment guarantees.

> But these values *don't* reflect the machine behaviour.
> 
> real.alignof is 2. But it's significantly faster (factor of 2) on most x86 machines to have real aligned to 16.

If you want size, you can use align():

struct ReallySmall
{
  int flags;
  align(2) real value;
  byte flags2;
  align(2) real value2;
}

Now ReallySmall.value.alignof == 2, and sorting will give you

struct ReallySmallSorted
{
  int flags;
  align(2) real value;
  align(2) real value2;
  byte flags2;
}

with ReallySmallSorted.sizeof == 28 and .alignof == 4 on Windows.

> These effects are not captured by the .alignof property for structs. To capture it in the Foo example, you'd need (for example) a Foo.memberalignof property, and a Foo.memberalignoffsetof property.

I don't understand this part.  Do you say that fields in different elements of the same array should be aligned differently?
December 19, 2008
Sergey Gromov wrote:
> I think this is not something you can assume or not assume.  The set of
> conditions you mention is required and sufficient to respect alignment
> requirements of array elements while obeying the rules of pointer math.
> You cannot change them until you drop either pointer math or alignment
> guarantees.

You could split the sizeof property into two separate:
  real.alignof = 16
  real.unpaddedsizeof = 12
  real.paddedsizeof = 16

  T.paddedsizeof = (T.unpaddedsizeof + T.alignof - 1) & ~T.alignof;

Arrays would always use paddedsizeof to calculate addresses.  Structs could use unpaddedsizeof and squeeze one or more additional variables into the padding space.  The same applies to structs:

  struct C { int a; byte b; }
  C.alignof = 4
  C.unpaddedsizeof = 5
  C.paddedsizeof = 8

There is one problem with this technique.  Because the padding space of a variable may contain another variable, when the variable is copied, only unpaddedsizeof bytes may be copied or the other variable will be clobbered.  I expect that this partially negates the advantage of properly aligning the variable.


-- 
Rainer Deyke