April 23, 2007 Re: summing large arrays - or - performance of tight loops
Posted in reply to Dave

Dave wrote:
> Craig Black wrote:
>>> I think the "trust the compiler" philosophy is for stability (20 years old), not optimizations.
>>
>> Nope. Read the documentation on foreach. It says specifically that foreach is preferred over other looping mechanisms because the compiler will optimize it.
>>
>> -Craig
>
> Sorry - I misunderstood you, and hadn't seen that documentation.

I still haven't seen that documentation 'cause I can't find it in the foreach section. Could you point it out?

Thanks,

- Dave
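For reference, these are the two loop forms being compared, as a minimal D sketch (array size and variable names are made up here; the disputed documentation passage is about whether the compiler generates better code for the foreach form):

```d
import std.stdio;

void main()
{
    auto data = new double[](1024 * 1024);
    data[] = 1.0;

    // The foreach form the documentation reportedly prefers.
    double sumForeach = 0.0;
    foreach (x; data)
        sumForeach += x;

    // The explicit index loop it is being compared against.
    double sumFor = 0.0;
    for (size_t i = 0; i < data.length; ++i)
        sumFor += data[i];

    writefln("foreach: %s  for: %s", sumForeach, sumFor);
}
```

Whether the compiler actually optimizes the first form better is exactly what the cited documentation would settle.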
April 23, 2007 Re: summing large arrays - or - performance of tight loops
Posted in reply to Dan

Dan wrote on 2007-04-23:
>> > Using a 1024 x 1024 x 64 array, I got:
>> >
>> > P4: 97% (linux32 FC5)
>> > AMD64: 92% (WinXP32)
>> >
>> > So, the array size seems to make some difference, at least on AMD machines.
>>
>> The results strongly depend on the memory architecture and, to a lesser extent, on the element values. I've put an updated version online that contains results for byte, short, int, long, float and double.
>
> Actually, the size of the data type doesn't matter at all for a properly implemented algorithm - as a general rule, you implement a Duff's device to align and then use the largest-sized instruction you can fit. Right now the SSE instruction "movaps" is quite effective for copying memory.
That's what I thought too, but while my SSE versions for float and double didn't have the worst performance, they were by no means the fastest.
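Schematically, the pattern Dan describes looks something like this (a simplified sketch in plain D: a plain byte loop stands in for the Duff's-device prologue, and 8-byte moves stand in for the 16-byte movaps copies, which would need inline asm or intrinsics):

```d
// Align the destination first, then move the bulk of the data in the
// widest units available, then finish the remainder a byte at a time.
void wideCopy(ubyte* dst, const(ubyte)* src, size_t n)
{
    // Prologue: copy single bytes until dst is 8-byte aligned.
    while (n > 0 && (cast(size_t) dst & 7) != 0)
    {
        *dst++ = *src++;
        --n;
    }

    // Bulk: move 8 bytes per iteration - the "largest instruction you
    // can fit" step. Unaligned loads from src are fine on x86.
    while (n >= 8)
    {
        *cast(ulong*) dst = *cast(const(ulong)*) src;
        dst += 8;
        src += 8;
        n -= 8;
    }

    // Epilogue: remaining tail bytes.
    while (n > 0)
    {
        *dst++ = *src++;
        --n;
    }
}
```

The point of the pattern is that once the destination is aligned, the element type no longer matters; only the total byte count does.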
Thomas