View mode: basic / threaded / horizontal-split · Log in · Help
December 16, 2008
Re: Optimal struct layout template?
Andrei Alexandrescu wrote:
> Sergey Gromov wrote:
>> Sun, 14 Dec 2008 13:17:45 -0800, Andrei Alexandrescu wrote:
>>
>>> BCS wrote:
>>>> Reply to dsimcha,
>>>>
>>>>> According to the spec, data stored in structs is guaranteed to be laid
>>>>> out in the order specified in the source code.  While this is
>>>>> necessary in some low-level code, I have a use case where I need to
>>>>> pack structs as efficiently as possible.  These structs are to be
>>>>> stored in an array, and for efficiency reasons I want to store them
>>>>> directly rather than storing pointers, so using classes is out of the
>>>>> question.  Does anyone know how to write a template that, given a
>>>>> tuple of types, will reorder the tuple so that it will produce the
>>>>> optimal struct layout?  I don't care, at least for now, if it assumes
>>>>> x86-32/DMD instead of handling the more general case.
>>>>>
>>>> would "align 1" work?
>>> That could make things excruciatingly slow. What's really needed is 
>>> (in first approximation) to sort the members in decreasing order of 
>>> size. Odd-sized arrays of char foil this plan to some extent, which 
>>> is where the fun begins.
>>
>> I would say in decreasing order of field/array element alignment
>> requirement.
> 
> Sounds indeed right. Even simpler than I thought!

Close, but doesn't work in all cases. You sometimes need to insert 
elements of smaller size.

Example 1:
Given real, double, double, short, int:
real (10 bytes, wants 16 byte alignment) --(only on Windows, 12 byte 
size on Linux32, 16-byte size on Linux64)
double (8 bytes, wants 8 byte alignment)
short (2 bytes, wants 2 byte alignment)
int (4 bytes, wants 4 byte alignment)

There are only two solutions:
{ real, short, int, double, double }
{ double, double, real, short, int }

Example 2:
struct Foo { double, int } // 12 byte size, wants 8-byte alignment.

given Foo, Foo, int, int:
solution is
{ Foo, int, Foo, int }







> 
> Andrei
December 16, 2008
Re: Optimal struct layout template?
Don wrote:
> Andrei Alexandrescu wrote:
>> Sergey Gromov wrote:
>>> Sun, 14 Dec 2008 13:17:45 -0800, Andrei Alexandrescu wrote:
>>>
>>>> BCS wrote:
>>>>> Reply to dsimcha,
>>>>>
>>>>>> According to the spec, data stored in structs is guaranteed to be 
>>>>>> laid
>>>>>> out in the order specified in the source code.  While this is
>>>>>> necessary in some low-level code, I have a use case where I need to
>>>>>> pack structs as efficiently as possible.  These structs are to be
>>>>>> stored in an array, and for efficiency reasons I want to store them
>>>>>> directly rather than storing pointers, so using classes is out of the
>>>>>> question.  Does anyone know how to write a template that, given a
>>>>>> tuple of types, will reorder the tuple so that it will produce the
>>>>>> optimal struct layout?  I don't care, at least for now, if it assumes
>>>>>> x86-32/DMD instead of handling the more general case.
>>>>>>
>>>>> would "align 1" work?
>>>> That could make things excruciatingly slow. What's really needed is 
>>>> (in first approximation) to sort the members in decreasing order of 
>>>> size. Odd-sized arrays of char foil this plan to some extent, which 
>>>> is where the fun begins.
>>>
>>> I would say in decreasing order of field/array element alignment
>>> requirement.
>>
>> Sounds indeed right. Even simpler than I thought!
> 
> Close, but doesn't work in all cases. You sometimes need to insert 
> elements of smaller size.
> 
> Example 1:
> Given real, double, double, short, int:
> real (10 bytes, wants 16 byte alignment) --(only on Windows, 12 byte 
> size on Linux32, 16-byte size on Linux64)
> double (8 bytes, wants 8 byte alignment)
> short (2 bytes, wants 2 byte alignment)
> int (4 bytes, wants 4 byte alignment)
> 
> There are only two solutions:
> { real, short, int, double, double }
> { double, double, real, short, int }
> 
> Example 2:
> struct Foo { double, int } // 12 byte size, wants 8-byte alignment.
> 
> given Foo, Foo, int, int:
> solution is
> { Foo, int, Foo, int }
> 

Greedy algorithm?
December 17, 2008
Re: Optimal struct layout template?
Tue, 16 Dec 2008 10:09:41 +0100, Don wrote:

> Example 2:
> struct Foo { double, int } // 12 byte size, wants 8-byte alignment.

Foo.sizeof is 16.

The thing is, array of Foo's must be properly aligned, and &foo[1] must
be cast(byte*) &foo[0] + Foo.sizeof.  This means that Foo.sizeof must be
itself 8-aligned.  The solution is { Foo, Foo, int, int }.

The same with your other example.  Sizeof cannot be less than alignment.
Therefore you can never pack real and short in 16 bytes.
December 17, 2008
Re: Optimal struct layout template?
On Thu, 18 Dec 2008 00:12:18 +0300, Sergey Gromov <snake.scaly@gmail.com>  
wrote:

> Tue, 16 Dec 2008 10:09:41 +0100, Don wrote:
>
>> Example 2:
>> struct Foo { double, int } // 12 byte size, wants 8-byte alignment.
>
> Foo.sizeof is 16.
>

It is 12 with align(1) (I see no problem with it).

> The thing is, array of Foo's must be properly aligned, and &foo[1] must
> be cast(byte*) &foo[0] + Foo.sizeof.  This means that Foo.sizeof must be
> itself 8-aligned.  The solution is { Foo, Foo, int, int }.
>

Stop right there. You can't access &foo[1] no matter what align or  
structure layout is used. Example:

align(default):
struct Data
{
    int i1;
    short s2;
    char c1;
    int 12;
}

What is &i1[1]?
December 17, 2008
Re: Optimal struct layout template?
Thu, 18 Dec 2008 00:21:58 +0300, Denis Koroskin wrote:

> On Thu, 18 Dec 2008 00:12:18 +0300, Sergey Gromov <snake.scaly@gmail.com>  
> wrote:
> 
>> Tue, 16 Dec 2008 10:09:41 +0100, Don wrote:
>>
>>> Example 2:
>>> struct Foo { double, int } // 12 byte size, wants 8-byte alignment.
>>
>> Foo.sizeof is 16.
>>
> 
> It is 12 with align(1) (I see no problem with it).

With align(1) the Foo.alignof is 1 so there is no problem to talk about.
The problem is when you have elements with non-trivial alignment
requirements and want to pack them as tightly as possible.

>> The thing is, array of Foo's must be properly aligned, and &foo[1] must
>> be cast(byte*) &foo[0] + Foo.sizeof.  This means that Foo.sizeof must be
>> itself 8-aligned.  The solution is { Foo, Foo, int, int }.
> 
> Stop right there. You can't access &foo[1] no matter what align or  
> structure layout is used. Example:
> 
> align(default):
> struct Data
> {
>      int i1;
>      short s2;
>      char c1;
>      int 12;
> }
> 
> What is &i1[1]?

i1 is not an array.

Sorry I wasn't clear enough.  I was talking about an imaginary array of
Foo's:

 Foo[] foo;

You must admit that foo[1] and &foo[1] makes perfect sense for such
construct.  And that

 cast(byte*) &foo[1] - cast(byte*) &foo[0] == Foo.sizeof

must hold.
December 18, 2008
Re: Optimal struct layout template?
On Thu, 18 Dec 2008 01:38:32 +0300, Sergey Gromov <snake.scaly@gmail.com>  
wrote:

> Thu, 18 Dec 2008 00:21:58 +0300, Denis Koroskin wrote:
>
>> On Thu, 18 Dec 2008 00:12:18 +0300, Sergey Gromov  
>> <snake.scaly@gmail.com>
>> wrote:
>>
>>> Tue, 16 Dec 2008 10:09:41 +0100, Don wrote:
>>>
>>>> Example 2:
>>>> struct Foo { double, int } // 12 byte size, wants 8-byte alignment.
>>>
>>> Foo.sizeof is 16.
>>>
>>
>> It is 12 with align(1) (I see no problem with it).
>
> With align(1) the Foo.alignof is 1 so there is no problem to talk about.
> The problem is when you have elements with non-trivial alignment
> requirements and want to pack them as tightly as possible.
>

No, there is. You should have structure as follows:

align(1):
struct Foo
{
    char c;
    double d;
    short s;
    bool b;
}

it should be reordered to (one of the solutions):

align(1):
struct Foo
{
    char c;
    bool b;
    short s;
    double d;
}

so that Foo.d.alignof == 8. Align(1) is still necessary so that Foo.sizeof  
== 12.

>>> The thing is, array of Foo's must be properly aligned, and &foo[1] must
>>> be cast(byte*) &foo[0] + Foo.sizeof.  This means that Foo.sizeof must  
>>> be
>>> itself 8-aligned.  The solution is { Foo, Foo, int, int }.
>>
>> Stop right there. You can't access &foo[1] no matter what align or
>> structure layout is used. Example:
>>
>> align(default):
>> struct Data
>> {
>>      int i1;
>>      short s2;
>>      char c1;
>>      int 12;
>> }
>>
>> What is &i1[1]?
>
> i1 is not an array.
>
> Sorry I wasn't clear enough.  I was talking about an imaginary array of
> Foo's:
>
>   Foo[] foo;
>
> You must admit that foo[1] and &foo[1] makes perfect sense for such
> construct.  And that
>
>   cast(byte*) &foo[1] - cast(byte*) &foo[0] == Foo.sizeof
>
> must hold.

This will *always* be true no matter what align/data layout is used.
December 18, 2008
Re: Optimal struct layout template?
Denis Koroskin wrote:
> On Thu, 18 Dec 2008 01:38:32 +0300, Sergey Gromov 
>> Thu, 18 Dec 2008 00:21:58 +0300, Denis Koroskin wrote:
>>> On Thu, 18 Dec 2008 00:12:18 +0300, Sergey Gromov 
>>>> Tue, 16 Dec 2008 10:09:41 +0100, Don wrote:
>>>>> struct Foo { double, int } // 12 byte size, wants 8-byte alignment.
>>>> Foo.sizeof is 16.
>>> It is 12 with align(1) (I see no problem with it).
>> With align(1) the Foo.alignof is 1 so there is no problem to talk about.
> No, there is. You should have structure as follows:
> 
> align(1):
> struct Foo
> {
>     char c;
>     double d;
>     short s;
>     bool b;
> }

[snip]

> so that Foo.d.alignof == 8. Align(1) is still necessary so that 
> Foo.sizeof == 12.

Sergey is correct to the extent that in DMD, x.alignof <= x.sizeof for 
any x, and also x.sizeof is always an integer multiple of x.alignof.
When you assume that DMD is correct, then the simple ordering by 
.alignof indeed gives an optimal solution.
But these values *don't* reflect the machine behaviour.

real.alignof is 2. But it's significantly faster (factor of 2) on most 
x86 machines to have real aligned to 16.

These effects are not captured by the .alignof property for structs. To 
capture it in the Foo example, you'd need (for example) a 
Foo.memberalignof property, and a Foo.memberalignoffsetof property.
But you could reasonably argue that if you're specifying align(1) then 
alignment is not important.
The simple algorithm may well be good enough.
(And note that you can use radix sort -- alignof can only be 1, 2, 4, 8, 
or 16 -- so it's in O(n) not O(n lg n)).

Interestingly, this 'internal alignment' applies to function alignment, 
as well. D allows you to specify that a function be aligned to (for 
example) a 16-byte boundary. This is hardly ever useful; usually what 
you want is for the innermost loop to be aligned. And instead of a pile 
of nops before the loop, you'd like the function start address to be 
adjusted.
December 18, 2008
Re: Optimal struct layout template?
Don wrote:
> Denis Koroskin wrote:
>> On Thu, 18 Dec 2008 01:38:32 +0300, Sergey Gromov
>>> Thu, 18 Dec 2008 00:21:58 +0300, Denis Koroskin wrote:
>>>> On Thu, 18 Dec 2008 00:12:18 +0300, Sergey Gromov
>>>>> Tue, 16 Dec 2008 10:09:41 +0100, Don wrote:
>>>>>> struct Foo { double, int } // 12 byte size, wants 8-byte alignment.
>>>>> Foo.sizeof is 16.
>>>> It is 12 with align(1) (I see no problem with it).
>>> With align(1) the Foo.alignof is 1 so there is no problem to talk about.
>> No, there is. You should have structure as follows:
>>
>> align(1):
>> struct Foo
>> {
>>     char c;
>>     double d;
>>     short s;
>>     bool b;
>> }
> 
> [snip]
> 
>> so that Foo.d.alignof == 8. Align(1) is still necessary so that 
>> Foo.sizeof == 12.
> 
> Sergey is correct to the extent that in DMD, x.alignof <= x.sizeof for 
> any x, and also x.sizeof is always an integer multiple of x.alignof.
> When you assume that DMD is correct, then the simple ordering by 
> .alignof indeed gives an optimal solution.
> But these values *don't* reflect the machine behaviour.
> 
> real.alignof is 2. But it's significantly faster (factor of 2) on most 
> x86 machines to have real aligned to 16.
> 
> These effects are not captured by the .alignof property for structs. To 
> capture it in the Foo example, you'd need (for example) a 
> Foo.memberalignof property, and a Foo.memberalignoffsetof property.
> But you could reasonably argue that if you're specifying align(1) then 
> alignment is not important.
> The simple algorithm may well be good enough.
> (And note that you can use radix sort -- alignof can only be 1, 2, 4, 8, 
> or 16 -- so it's in O(n) not O(n lg n)).
> 
> Interestingly, this 'internal alignment' applies to function alignment, 
> as well. D allows you to specify that a function be aligned to (for 
> example) a 16-byte boundary. This is hardly ever useful; usually what 
> you want is for the innermost loop to be aligned. And instead of a pile 
> of nops before the loop, you'd like the function start address to be 
> adjusted.

That's cool! I think the way to the templates AlignForSpeed!(Foo) and 
AlignForSize!(Foo) is paved. Don, now all you have to do is a simple 
matter of coding :o).

Andrei
December 18, 2008
Re: Optimal struct layout template?
Thu, 18 Dec 2008 17:08:12 +0100, Don wrote:

>>>> On Thu, 18 Dec 2008 00:12:18 +0300, Sergey Gromov 
>>>>> Tue, 16 Dec 2008 10:09:41 +0100, Don wrote:
>>>>>> struct Foo { double, int } // 12 byte size, wants 8-byte alignment.
>>>>> Foo.sizeof is 16.
> 
> Sergey is correct to the extent that in DMD, x.alignof <= x.sizeof for 
> any x, and also x.sizeof is always an integer multiple of x.alignof.
> When you assume that DMD is correct, then the simple ordering by 
> .alignof indeed gives an optimal solution.

I think this is not something you can assume or not assume.  The set of
conditions you mention is required and sufficient to respect alignment
requirements of array elements while obeying the rules of pointer math.
You cannot change them until you drop either pointer math or alignment
guarantees.

> But these values *don't* reflect the machine behaviour.
> 
> real.alignof is 2. But it's significantly faster (factor of 2) on most 
> x86 machines to have real aligned to 16.

If you want size, you can use align():

struct ReallySmall
{
 int flags;
 align(2) real value;
 byte flags2;
 align(2) real value2;
}

Now ReallySmall.value.alignof == 2, and sorting will give you

struct ReallySmallSorted
{
 int flags;
 align(2) real value;
 align(2) real value2;
 byte flags2;
}

with ReallySmallSorted.sizeof == 28 and .alignof == 4 on Windows.

> These effects are not captured by the .alignof property for structs. To 
> capture it in the Foo example, you'd need (for example) a 
> Foo.memberalignof property, and a Foo.memberalignoffsetof property.

I don't understand this part.  Do you say that fields in different
elements of the same array should be aligned differently?
December 19, 2008
Re: Optimal struct layout template?
Sergey Gromov wrote:
> I think this is not something you can assume or not assume.  The set of
> conditions you mention is required and sufficient to respect alignment
> requirements of array elements while obeying the rules of pointer math.
> You cannot change them until you drop either pointer math or alignment
> guarantees.

You could split the sizeof property into two separate:
  real.alignof = 16
  real.unpaddedsizeof = 12
  real.paddedsizeof = 16

  T.paddedsizeof = (T.unpaddedsizeof + T.alignof - 1) & ~T.alignof;

Arrays would always use paddedsizeof to calculate addresses.  Structs 
could use unpaddedsizeof and squeeze one or more additional variables 
into the padding space.  The same applies to structs:

  struct C { int a; byte b; }
  C.alignof = 4
  C.unpaddedsizeof = 5
  C.paddedsizeof = 8

There is one problem with this technique.  Because the padding space of 
a variable may contain another variable, when the variable is copied, 
only unpaddedsizeof bytes may be copied or the other variable will be 
clobbered.  I expect that this partially negates the advantage of 
properly aligning the variable.


-- 
Rainer Deyke
1 2 3
Top | Discussion index | About this forum | D home