[phobos] byte alignment for arrays
Jun 28, 2010
Jason Spencer
Jun 28, 2010
Jason Spencer
Jun 28, 2010
Jason Spencer
Jun 28, 2010
Jason Spencer
Jun 29, 2010
Sean Kelly
Jun 29, 2010
Sean Kelly
Jun 28, 2010
Sean Kelly
June 28, 2010
Recently, this bug has surfaced: http://d.puremagic.com/issues/show_bug.cgi?id=4400

In a nutshell, sometimes the byte alignment of arrays is 8 bytes instead of 16 bytes.

This was caused by my array-append patch: in large arrays, I store the length at the front of the array.  When I asked around before creating the patch, I was told that 8-byte alignment was fine.  However, the alignment is easy to change, since only a couple of specific functions determine the padding and alignment.  So changing to 16 bytes is not a problem technically, and functionally it only affects PAGE-sized arrays and larger, so using 16 bytes instead of 8 isn't likely to cause problems.
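To make the padding issue concrete, here is a minimal C sketch. It is an illustration, not druntime code. It assumes that large blocks start on a 4 KiB page boundary and models addresses as plain integers; `PAGE_SIZE`, `element0_address` and `alignment_of` are hypothetical names. With an 8-byte pad in front of the data, element 0 is only 8-byte aligned. With a 16-byte pad, it is 16-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE ((uintptr_t)4096) /* assumption: 4 KiB pages */

/* Hypothetical model: the GC returns a page-aligned block for a large
   array, the length is stored at the front, and element 0 starts
   `pad` bytes into the block. */
static uintptr_t element0_address(uintptr_t block_start, size_t pad) {
    assert(block_start % PAGE_SIZE == 0); /* large blocks start on a page */
    return block_start + pad;
}

/* Largest power of two dividing addr (0 if addr is 0). */
static uintptr_t alignment_of(uintptr_t addr) {
    return addr & (~addr + 1); /* isolate the lowest set bit */
}
```

For example, `alignment_of(element0_address(0x10000, 8))` is 8, while a 16-byte pad yields 16.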

Bearophile's main argument stems from this.  I am not a processor or assembly expert, so I have no idea about this at all:

-----------------
The 16-byte alignment was introduced because instructions like the SSE2 movapd
need 16-byte alignment:
http://en.wikipedia.org/wiki/MOVAPD

I have recently used it here:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=112670

Some other SSE* instructions work with 8-byte alignment too, but they are
slower (future CPUs may remove such alignment requirements, and some have
already been removed, so in that future the GC could go back to giving
8-byte aligned memory).
-----------------

So should I change it?

-Steve




June 28, 2010
I would recommend changing it to 16-byte aligned.  Lots of SSE instructions won't work or won't work efficiently at 8-byte aligned addresses.  Even without SSE, this makes array access more cache-friendly, and is likely to help.  If we're only talking about arrays larger than page-size, then it's not too much memory overhead.

For less-than-page-sized arrays (or for performance-critical code if you DON'T make the change), you'd have to use something like std.c.stdlib or std.<system> _aligned_alloc() to get around this.  It might be worth verifying that this is actually available and works (maybe with a unit test?).
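Jason's suggested unit test could look roughly like the following C sketch. It swaps in C11's standard `aligned_alloc` from `<stdlib.h>`. This is an assumption: the exact D binding named in the thread is not confirmed here. `alloc_doubles16` is a hypothetical helper. C11 requires the requested size to be a multiple of the alignment, so the byte count is rounded up.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate n doubles whose first element is 16-byte aligned.
   C11 aligned_alloc wants the size to be a multiple of the
   alignment, so round the byte count up. Release with free(). */
static double *alloc_doubles16(size_t n) {
    const size_t align = 16;
    assert(n <= (SIZE_MAX - align) / sizeof(double)); /* no overflow */
    size_t bytes = n * sizeof(double);
    bytes = (bytes + align - 1) / align * align;
    if (bytes == 0)
        bytes = align; /* avoid a zero-size request */
    return aligned_alloc(align, bytes);
}
```

On POSIX systems, `posix_memalign` is an alternative. MSVC provides `_aligned_malloc`/`_aligned_free` instead.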

Jason



----- Original Message ----
> From: Steve Schveighoffer <schveiguy at yahoo.com>
> To: Phobos <phobos at puremagic.com>
> Sent: Mon, June 28, 2010 11:36:25 AM
> Subject: [phobos] byte alignment for arrays
_______________________________________________
phobos mailing list
phobos at puremagic.com
http://lists.puremagic.com/mailman/listinfo/phobos
June 28, 2010
Yes. I think OSX may actually require 16-byte alignment in some cases.  SSE seems like a major issue as well.

Sent from my iPhone

On Jun 28, 2010, at 11:36 AM, Steve Schveighoffer <schveiguy at yahoo.com> wrote:

June 28, 2010
This only affects arrays that are PAGE-sized or larger.

A question then -- let's say you have an array of doubles, which are 8 bytes wide, and you want to use these SSE instructions.  Even if the first one is aligned on a 16-byte boundary, wouldn't every other double be misaligned?

-Steve



----- Original Message ----
> From: Jason Spencer <spencer8 at sbcglobal.net>
> To: Discuss the phobos library for D <phobos at puremagic.com>
> Sent: Mon, June 28, 2010 3:00:10 PM
> Subject: Re: [phobos] byte alignment for arrays



June 28, 2010
Yes.  What you really want is to know that the address of element 0 is 16-byte aligned.  So what I assumed you were proposing is to always allocate the array storage as 16-byte aligned, then use 16 bytes for the size (and any other housekeeping you need).  Then element 0 will be 16 bytes past a 16-byte aligned address, so you're still good.  That's the memory cost of this change: you'll burn (16 - (size of your size storage)) bytes at the beginning.
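Jason's layout can be expressed as a C struct. This is a sketch under assumptions: `array_header` and `ARRAY_DATA_ALIGN` are hypothetical names, not druntime's. The header holds the length and is padded to exactly 16 bytes, so if the block itself is 16-byte aligned, the data that follows is too. The unused part is the (16 - size of the length) bytes he mentions: 12 bytes on a 32-bit target and 8 bytes on a 64-bit one.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_DATA_ALIGN 16 /* hypothetical: required alignment of element 0 */

/* Hypothetical header in front of a large array: the stored length,
   padded so the elements start exactly ARRAY_DATA_ALIGN bytes in. */
typedef struct {
    size_t length;
    unsigned char pad[ARRAY_DATA_ALIGN - sizeof(size_t)];
} array_header;

_Static_assert(sizeof(array_header) == ARRAY_DATA_ALIGN,
               "header must be exactly 16 bytes");

/* Bytes reserved in the header but not used for the length. */
static size_t wasted_header_bytes(void) {
    return sizeof(array_header) - sizeof(size_t);
}
```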

It's not an easy choice.  But if this is limited to arrays over page size, then I'm assuming there won't be thousands of them, and the cost is at most a few hundred bytes for the whole program.

Jason


----- Original Message ----
> From: Steve Schveighoffer <schveiguy at yahoo.com>
> To: Discuss the phobos library for D <phobos at puremagic.com>
> Sent: Mon, June 28, 2010 12:29:17 PM
> Subject: Re: [phobos] byte alignment for arrays
June 28, 2010
Sorry, I forgot to address the every-other-one concern.

The MMX registers are 64 bits wide, so you can only do one double at a time.  Those instructions only require 8-byte aligned memory.  The SSE instructions use 128-bit registers, so they take two doubles at a time.  As long as the first one is 16-byte aligned, you can iterate through in 16-byte (128-bit) chunks and you'll be fine.  That's why element 0 should be 128-bit (16-byte) aligned.

If it's not, the processor will either raise an alignment fault (if the instruction requires alignment) or do a lot of split loads across cache lines, which kills performance.
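Here is a small C sketch of iterating in 16-byte chunks. It uses SSE2 intrinsics from `<emmintrin.h>` when the compiler defines `__SSE2__` (an assumption about the toolchain; elsewhere it falls back to a plain scalar loop). `_mm_load_pd` corresponds to movapd and needs a 16-byte aligned address. `_mm_loadu_pd` (movupd) does not. If element 0 is 16-byte aligned, every even-indexed double is too, so each pair can use the aligned load. `sum_doubles` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Sum n doubles two at a time. The aligned load (movapd) is used only
   when p is 16-byte aligned; otherwise the unaligned load (movupd). */
static double sum_doubles(const double *p, size_t n) {
    assert(p != NULL || n == 0);
    double total = 0.0;
    size_t i = 0;
#if defined(__SSE2__)
    __m128d acc = _mm_setzero_pd();
    if ((uintptr_t)p % 16 == 0) {
        for (; i + 2 <= n; i += 2)
            acc = _mm_add_pd(acc, _mm_load_pd(p + i));  /* p + i stays 16-aligned */
    } else {
        for (; i + 2 <= n; i += 2)
            acc = _mm_add_pd(acc, _mm_loadu_pd(p + i)); /* split loads possible */
    }
    double lanes[2];
    _mm_storeu_pd(lanes, acc);
    total = lanes[0] + lanes[1];
#endif
    for (; i < n; ++i) /* odd tail, or everything without SSE2 */
        total += p[i];
    return total;
}
```

Passing `a + 1` for a 16-byte aligned array `a` puts element 0 at an 8-byte boundary and forces the slower unaligned path.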

One other thought: if you wanted to be tricky, you could do a general allocation with only 4-byte alignment and, based on the address you get, set your storage pointer to the next 128-bit-aligned address.  But you're offloading lots of housekeeping to run time.  Again, that may be tolerable for just these large arrays, but it starts to add a lot of corner cases.  Walter might have some good suggestions here.
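The "tricky" approach is the classic over-allocate-and-round-up technique, swapped in here as a standard pattern rather than code from the thread. In this C sketch, `malloc_aligned16` and `free_aligned16` are hypothetical names. It asks `malloc` for extra bytes, rounds the address up to the next 16-byte boundary, and stashes the original pointer just below the aligned address so it can be freed later. That stash is exactly the kind of run-time housekeeping Jason warns about.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Over-allocate, then round up to a 16-byte boundary. The raw malloc
   pointer is stored in the bytes just before the aligned address. */
static void *malloc_aligned16(size_t size) {
    const uintptr_t align = 16;
    void *raw = malloc(size + (align - 1) + sizeof(void *));
    if (raw == NULL)
        return NULL;
    /* Leave room for the stashed pointer, then round up. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + align - 1) & ~(align - 1);
    memcpy((void *)(aligned - sizeof(void *)), &raw, sizeof raw);
    return (void *)aligned;
}

/* Recover the raw pointer from the stash and release it. */
static void free_aligned16(void *p) {
    if (p == NULL)
        return;
    void *raw;
    memcpy(&raw, (unsigned char *)p - sizeof(void *), sizeof raw);
    free(raw);
}
```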

Jason




June 28, 2010
Thanks, this information helps a lot!

I will make the change to 16-byte alignment.  I'm already using 8 bytes for a 4-byte length.  Using 16 bytes isn't much different, especially when the block size is 4096+ bytes.

One final question -- I currently use sizeof(size_t) * 2, which could now be sizeof(size_t) * 4, but of course, this changes to 32 bytes on 64-bit dmd.  Would it make sense to just use 16 instead of some multiple of size_t?

-Steve


----- Original Message ----
> From: Jason Spencer <spencer8 at sbcglobal.net>
> To: Discuss the phobos library for D <phobos at puremagic.com>
> Sent: Mon, June 28, 2010 4:09:01 PM
> Subject: Re: [phobos] byte alignment for arrays



June 28, 2010
Hmmm.  The natural thing would be to have some type to describe these 128-bit values (akin to __m128 in gcc, Intel and MS compilers) and use sizeof on that.  I don't see that D has any MMX/SSE intrinsics, so I don't know if there is a standard type.  If you don't have such a thing defined by the compiler, I'd be tempted to define it, based on which version of the compiler will compile this code (i.e. 32- or 64-bit dmd).  Then you can use that in your sizeof.  Maybe you'll get lucky, and that will become standard :)
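Jason's type-based idea can be sketched in C with a portable stand-in. `simd128` and `LARGE_ARRAY_PAD` are hypothetical names. With GCC, Clang or MSVC on x86, the real `__m128` from `<xmmintrin.h>` could play the same role. Because the pad is derived from a 16-byte, 16-byte-aligned type rather than from a multiple of `sizeof(size_t)`, it stays 16 on both 32- and 64-bit targets.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a 128-bit SIMD value, in the spirit of
   __m128: 16 bytes with 16-byte alignment on any target. */
typedef struct {
    _Alignas(16) unsigned char bytes[16];
} simd128;

_Static_assert(sizeof(simd128) == 16, "simd128 must be 16 bytes");
_Static_assert(_Alignof(simd128) == 16, "simd128 must be 16-byte aligned");

/* Pad in front of a large array, taken from the type's size instead
   of sizeof(size_t) * k. */
enum { LARGE_ARRAY_PAD = sizeof(simd128) };
```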

Jason



----- Original Message ----
> From: Steve Schveighoffer <schveiguy at yahoo.com>
> To: Discuss the phobos library for D <phobos at puremagic.com>
> Sent: Mon, June 28, 2010 1:35:59 PM
> Subject: Re: [phobos] byte alignment for arrays
June 28, 2010
All,

Should there be something in the runtime that defines the minimum alignment for things like memory blocks?  That might make this easier to deal with from a design perspective...
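A single runtime-wide constant plus a round-up helper is one way this could look. The following C sketch is an assumption, not an existing druntime API: `MIN_BLOCK_ALIGN`, `align_up` and `large_array_pad` are hypothetical names. The pad in front of a large array is the stored length rounded up to the minimum alignment, which gives 16 on both 32- and 64-bit targets. That answers the earlier `sizeof(size_t) * 4` concern.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical runtime-wide constant: minimum alignment guaranteed
   for element 0 of any array. Not an existing druntime symbol. */
#define MIN_BLOCK_ALIGN ((size_t)16)

/* Round n up to a multiple of align; align must be a power of two. */
static size_t align_up(size_t n, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Room for the stored length, rounded up so element 0 keeps
   MIN_BLOCK_ALIGN. */
static size_t large_array_pad(void) {
    return align_up(sizeof(size_t), MIN_BLOCK_ALIGN);
}
```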

-Steve







June 29, 2010
I think it should be enough just to make it a documented requirement for allocators.  It's not like it will ever change, right?
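Even if the guarantee is only documented, it can still be checked cheaply in debug builds. In this C sketch, `DOCUMENTED_MIN_ALIGN`, `allocator_fn`, `is_aligned` and `checked_alloc` are all hypothetical names. It wraps any allocator with a `void *(size_t)` signature and asserts the documented contract.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical documented guarantee for every block allocator. */
#define DOCUMENTED_MIN_ALIGN ((size_t)16)

/* True if p is a multiple of align (align must be a power of two). */
static int is_aligned(const void *p, size_t align) {
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical allocator signature, used only for this sketch. */
typedef void *(*allocator_fn)(size_t size);

/* Call an allocator and, in debug builds, assert the documented
   contract instead of silently handing out misaligned memory. */
static void *checked_alloc(allocator_fn alloc, size_t size) {
    void *p = alloc(size);
    assert(p == NULL || is_aligned(p, DOCUMENTED_MIN_ALIGN));
    return p;
}
```

For instance, `checked_alloc(malloc, n)` could flag a platform whose `malloc` only guarantees 8-byte alignment.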

On Jun 28, 2010, at 2:07 PM, Steve Schveighoffer wrote:

> All,
> 
> Should there be something in the runtime that defines the minimum align size for things like memory blocks?  That might make this easier to deal with from a design perspective...
> 
> ----- Original Message ----
>> From: Jason Spencer <spencer8 at sbcglobal.net>
>> 
>> Hmmm.  The natural thing would be to have some type to describe these 128-bit values (akin to __m128 in gcc, Intel and MS compilers) and use sizeof on that.  I don't see that D has any MMX/SSE intrinsics, so I don't know if there is a standard type.  If you don't have such a thing defined by the compiler, I'd be tempted to define it, based on which version of the compiler will compile this code (i.e. 32- or 64-bit dmd).  Then you can use that in your sizeof.  Maybe you'll get lucky, and that will become standard :)
> 
> 
> 
> 
> _______________________________________________
> phobos mailing list
> phobos at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/phobos
