DMD 1.034 and 2.018 releases

Re: DMD 1.034 and 2.018 releases

Aug 11, 2008

Pete

Aug 11, 2008

Aug 13, 2008

Aug 13, 2008

Aug 14, 2008

Aug 14, 2008

Walter Bright Wrote:
> This one has (finally) got array operations implemented. For those who want to show off their leet assembler skills, the initial assembler implementation code is in phobos/internal/array*.d. Burton Radons wrote the assembler. Can you make it faster?
>
> http://www.digitalmars.com/d/1.0/changelog.html
> http://ftp.digitalmars.com/dmd.1.034.zip
>
> http://www.digitalmars.com/d/2.0/changelog.html
> http://ftp.digitalmars.com/dmd.2.018.zip

Not sure if someone else has already mentioned this but would it be possible for the compiler to align these arrays on 16 byte boundaries in order to maximise any possible vector efficiency. AFAIK you can't actually specify align anything higher than align 8 at the moment which is a bit of a problem.

Regards,
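For anyone who has not tried the feature under discussion, here is a minimal sketch of a D array operation. The function name `addArrays` is made up for illustration; the `c[] = a[] + b[]` statement is the part the runtime routines in phobos/internal/array*.d would service.

```d
/// Element-wise sum of two equal-length arrays using an array operation.
/// `addArrays` is a hypothetical helper name, not part of Phobos.
int[] addArrays(const(int)[] a, const(int)[] b)
{
    assert(a.length == b.length);
    auto c = new int[a.length];
    c[] = a[] + b[];          // one statement instead of an explicit loop
    return c;
}

void main()
{
    auto c = addArrays([1, 2, 3, 4], [10, 20, 30, 40]);
    assert(c == [11, 22, 33, 44]);

    c[] *= 2;                 // in-place array operation with a scalar
    assert(c == [22, 44, 66, 88]);
}
```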

Pete wrote:
> Not sure if someone else has already mentioned this but would it be
> possible for the compiler to align these arrays on 16 byte boundaries
> in order to maximise any possible vector efficiency. AFAIK you can't
> actually specify align anything higher than align 8 at the moment
> which is a bit of a problem.

Anything allocated with new will be aligned on 16 byte boundaries.

On Mon, 11 Aug 2008 09:55:26 -0400, Pete wrote:
> Walter Bright Wrote:
>> This one has (finally) got array operations implemented. For those who want to show off their leet assembler skills, the initial assembler implementation code is in phobos/internal/array*.d. Burton Radons wrote the assembler. Can you make it faster?
>
> Not sure if someone else has already mentioned this but would it be possible for the compiler to align these arrays on 16 byte boundaries in order to maximise any possible vector efficiency. AFAIK you can't actually specify align anything higher than align 8 at the moment which is a bit of a problem.

From a short look at the array*.d source code, it would be better to check if source and destination have the same alignment, i.e.:

a = 0xf00d0013 (3 mod 16)
b = 0xdeaffff3 (3 mod 16)

In that case, the first 16-3 = 13 bytes can be handled using regular D code, and the aligned SSE version can be used for the rest.

This would also work for slices, at least when both slices have the same alignment remainder. I'm just not sure what overhead such a solution would impose for small arrays.

Georg
-- 
|| http://op-co.de ++ GCS/CM d? s: a-- C+++ UL+++ !P L+++ E--- W++ ++
|| gpg: 0x962FD2DE || N++ o? K- w---() O M V? PS+ PE-- Y+ PGP++ t* ||
|| Ge0rG: euIRCnet || 5 X+ R tv b+(+++) DI+(+++) D+ G e* h! r* !y+ ||
++ IRCnet OFTC OPN ||________________________________________________||
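The arithmetic in that example can be checked with a few lines of D. This is only a sketch; `headBytes` and `sameAlignment` are hypothetical helper names, and the two addresses are the ones from the post above.

```d
/// Number of bytes to handle with regular D code before `addr`
/// reaches the next 16-byte boundary (0 if it is already aligned).
size_t headBytes(size_t addr)
{
    return (16 - (addr & 15)) & 15;
}

/// True when both addresses have the same remainder mod 16.
bool sameAlignment(size_t a, size_t b)
{
    return (a & 15) == (b & 15);
}

void main()
{
    size_t a = 0xf00d0013;
    size_t b = 0xdeaffff3;

    assert((a & 15) == 3 && (b & 15) == 3);
    assert(sameAlignment(a, b));
    assert(headBytes(a) == 13);   // 16 - 3, as in the example
    assert(headBytes(a + 13) == 0);
}
```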

Georg Lukas wrote:
> On Mon, 11 Aug 2008 09:55:26 -0400, Pete wrote:
>> Walter Bright Wrote:
>>> This one has (finally) got array operations implemented. For those who
>>> want to show off their leet assembler skills, the initial assembler
>>> implementation code is in phobos/internal/array*.d. Burton Radons wrote
>>> the assembler. Can you make it faster?
>> Not sure if someone else has already mentioned this but would it be
>> possible for the compiler to align these arrays on 16 byte boundaries in
>> order to maximise any possible vector efficiency. AFAIK you can't
>> actually specify align anything higher than align 8 at the moment which
>> is a bit of a problem.
>
> From a short look at the array*.d source code, it would be better to check if source and destination have the same alignment, i.e.:
>
> a = 0xf00d0013 (3 mod 16)
> b = 0xdeaffff3 (3 mod 16)
>
> In that case, the first 16-3 = 13 bytes can be handled using regular D code, and the aligned SSE version can be used for the rest.
>
> This would also work for slices, at least when both slices have the same alignment remainder. I'm just not sure what overhead such a solution would impose for small arrays.

Just begin with a check for minimal size. If less than that size, don't use SSE at all.

>
> Georg

August 14, 2008

Re: DMD 1.034 and 2.018 releases

Posted by Dave
in reply to Don

"Don" <nospam@nospam.com.au> wrote in message news:g7u36h$20j0$1@digitalmars.com...
> Georg Lukas wrote:
>> On Mon, 11 Aug 2008 09:55:26 -0400, Pete wrote:
>>> Walter Bright Wrote:
>>>> This one has (finally) got array operations implemented. For those who
>>>> want to show off their leet assembler skills, the initial assembler
>>>> implementation code is in phobos/internal/array*.d. Burton Radons wrote
>>>> the assembler. Can you make it faster?
>>> Not sure if someone else has already mentioned this but would it be
>>> possible for the compiler to align these arrays on 16 byte boundaries in
>>> order to maximise any possible vector efficiency. AFAIK you can't
>>> actually specify align anything higher than align 8 at the moment which
>>> is a bit of a problem.
>>
>> From a short look at the array*.d source code, it would be better to check if source and destination have the same alignment, i.e.:
>>
>> a = 0xf00d0013 (3 mod 16)
>> b = 0xdeaffff3 (3 mod 16)
>>
>> In that case, the first 16-3 = 13 bytes can be handled using regular D code, and the aligned SSE version can be used for the rest.

Good idea. Right now that code (usually) has two separate cases, one for unaligned and one for aligned data.

It typically goes like this:

if(cpu_has_sse2 && a.length > min_size)
{
    if(((cast(size_t) aptr | cast(size_t) bptr | cast(size_t) cptr) & 15) != 0)
    {   // Unaligned case
        asm
        {
            ...
            movdqu  XMM0, [EAX]
            ...
        }
    }
    else
    {   // Aligned case
        asm
        {
            ...
            movdqa  XMM0, [EAX]
            ...
        }
    }
}

The two blocks of asm code are basically identical except for the un/aligned SSE opcodes.

With your idea, one could get rid of the test for alignment, probably some bloat and a whole lot of duplication. I guess the question is whether the overhead of your idea would be less than that of the current design.

- Dave

>>
>> This would also work for slices, at least when both slices have the same alignment remainder. I'm just not sure what overhead such a solution would impose for small arrays.
>
> Just begin with a check for minimal size. If less than that size, don't use SSE at all.
>
>>
>> Georg
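
Putting Georg's idea and Don's minimal-size check together, the control flow might look roughly like the sketch below. It is not the Phobos code: `addBytes` and `minSize` are hypothetical names, the threshold value is a guess that would need benchmarking, and the inner 16-byte loop is plain D standing in for the aligned (movdqa) asm block.

```d
enum size_t minSize = 64;   // hypothetical threshold, not a measured value

/// c[i] = a[i] + b[i] (wrapping), split into a scalar head, a
/// 16-byte-aligned body and a scalar tail.
void addBytes(const(ubyte)[] a, const(ubyte)[] b, ubyte[] c)
{
    assert(a.length == b.length && b.length == c.length);
    immutable n = c.length;
    size_t i = 0;

    // The split only helps if all three pointers share one remainder mod 16.
    immutable ra = (cast(size_t) a.ptr) & 15;
    immutable sameRem = ra == ((cast(size_t) b.ptr) & 15)
                     && ra == ((cast(size_t) c.ptr) & 15);

    if (n >= minSize && sameRem)
    {
        // Head: regular D code up to the next 16-byte boundary.
        immutable head = (16 - ra) & 15;
        for (; i < head; ++i)
            c[i] = cast(ubyte)(a[i] + b[i]);

        // Body: every pointer is now 16-byte aligned.
        assert((cast(size_t)(a.ptr + i) & 15) == 0);
        immutable bodyEnd = i + ((n - i) & ~cast(size_t) 15);
        for (; i < bodyEnd; i += 16)
            foreach (j; 0 .. 16)   // stand-in for one aligned SSE step
                c[i + j] = cast(ubyte)(a[i + j] + b[i + j]);
    }

    // Tail, or the whole array when it is small or the remainders differ.
    for (; i < n; ++i)
        c[i] = cast(ubyte)(a[i] + b[i]);
}

void main()
{
    auto a = new ubyte[100];
    auto b = new ubyte[100];
    auto c = new ubyte[100];
    foreach (i; 0 .. 100)
    {
        a[i] = cast(ubyte) i;
        b[i] = cast(ubyte)(2 * i);
    }

    // Slices taken at the same offset keep a common remainder
    // whenever the underlying arrays do.
    addBytes(a[3 .. $], b[3 .. $], c[3 .. $]);
    foreach (i; 3 .. 100)
        assert(c[i] == cast(ubyte)(3 * i));
}
```

When the remainders differ, this sketch simply falls back to the scalar loop; a real implementation would presumably keep the unaligned (movdqu) path for that case.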

Georg Lukas wrote:
> On Mon, 11 Aug 2008 09:55:26 -0400, Pete wrote:
>> Walter Bright Wrote:
>>> This one has (finally) got array operations implemented. For those who
>>> want to show off their leet assembler skills, the initial assembler
>>> implementation code is in phobos/internal/array*.d. Burton Radons wrote
>>> the assembler. Can you make it faster?
>> Not sure if someone else has already mentioned this but would it be
>> possible for the compiler to align these arrays on 16 byte boundaries in
>> order to maximise any possible vector efficiency. AFAIK you can't
>> actually specify align anything higher than align 8 at the moment which
>> is a bit of a problem.
>
> From a short look at the array*.d source code, it would be better to check if source and destination have the same alignment, i.e.:
>
> a = 0xf00d0013 (3 mod 16)
> b = 0xdeaffff3 (3 mod 16)
>
> In that case, the first 16-3 = 13 bytes can be handled using regular D code, and the aligned SSE version can be used for the rest.
>
> This would also work for slices, at least when both slices have the same alignment remainder. I'm just not sure what overhead such a solution would impose for small arrays.

There would be some overhead for small arrays; however, as I said in my previous email, if you're using a small array then it's likely that you're not doing much. If it is a performance issue, you should switch to a larger array (by grouping all your smaller ones together). Of course, there's the edge case where someone actually needs to do a g-billion operations on exactly the same small array.

>
> Georg

-Joel
