std.math performance (SSE vs. real) - D Programming Language Discussion Forum

June 27, 2014

std.math performance (SSE vs. real)

Posted by David Nadlinger

Hi all,

right now, the use of std.math over core.stdc.math can cause a huge performance problem in typical floating point graphics code. An instance of this has recently been discussed here in the "Perlin noise benchmark speed" thread [1], where even LDC, which already beat DMD by a factor of two, generated code more than twice as slow as that by Clang and GCC. Here, the use of floor() causes trouble. [2]

Besides the somewhat slow pure D implementations in std.math, the biggest problem is the fact that std.math almost exclusively uses reals in its API. When working with single- or double-precision floating point numbers, this is not only more data to shuffle around than necessary, but on x86_64 requires the caller to transfer the arguments from the SSE registers onto the x87 stack and then convert the result back again. Needless to say, this is a serious performance hazard. In fact, this accounts for a 1.9x slowdown in the above benchmark with LDC.

Because of this, I propose to add float and double overloads (at the very least the double ones) for all of the commonly used functions in std.math. This is unlikely to break much code, but:
 a) Somebody could rely on the fact that the calls effectively widen the calculation to 80 bits on x86 when using type deduction.
 b) Additional overloads make e.g. "&floor" ambiguous without context, of course.

What do you think?

Cheers,
David


[1] http://forum.dlang.org/thread/lo19l7$n2a$1@digitalmars.com
[2] Fun fact: As the program happens to deal only with positive numbers, the author could have just inserted an int-to-float cast, sidestepping the issue altogether. All the other language implementations have the floor() call too, though, so it doesn't matter for this discussion.
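
To make the proposal concrete, here is a minimal sketch of what per-type overloads could look like. The name `myFloor` is invented for this illustration and the bodies simply forward to the C library via core.stdc.math; this is an assumption for the sake of the example, not the actual Phobos change. It shows the result type following the argument type, and how point (b) can be handled by giving the function pointer an explicit type.

```d
import core.stdc.math : cfloor = floor, cfloorf = floorf, cfloorl = floorl;

// Hypothetical overload set; `myFloor` is a made-up name for this sketch.
real   myFloor(real x)   { return cfloorl(x); }
double myFloor(double x) { return cfloor(x); }
float  myFloor(float x)  { return cfloorf(x); }

unittest
{
    // With overloads, the result type follows the argument type,
    // so float and double callers never widen to real.
    static assert(is(typeof(myFloor(1.5f)) == float));
    static assert(is(typeof(myFloor(1.5))  == double));
    static assert(is(typeof(myFloor(1.5L)) == real));

    // Point (b): a bare `&myFloor` no longer names a single function.
    // Spelling out the pointer type selects the intended overload.
    double function(double) fp = &myFloor;
    assert(fp(-0.5) == -1.0);
}
```

Code that relied on the implicit widening described in point (a) could keep the old behaviour by passing a `real` explicitly, for example `myFloor(cast(real) x)`.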

June 27, 2014

Re: std.math performance (SSE vs. real)

Posted by Tofu Ninja
in reply to David Nadlinger

On Friday, 27 June 2014 at 01:31:17 UTC, David Nadlinger wrote:
> Hi all,
>
> right now, the use of std.math over core.stdc.math can cause a huge performance problem in typical floating point graphics code. An instance of this has recently been discussed here in the "Perlin noise benchmark speed" thread [1], where even LDC, which already beat DMD by a factor of two, generated code more than twice as slow as that by Clang and GCC. Here, the use of floor() causes trouble. [2]
>
> Besides the somewhat slow pure D implementations in std.math, the biggest problem is the fact that std.math almost exclusively uses reals in its API. When working with single- or double-precision floating point numbers, this is not only more data to shuffle around than necessary, but on x86_64 requires the caller to transfer the arguments from the SSE registers onto the x87 stack and then convert the result back again. Needless to say, this is a serious performance hazard. In fact, this accounts for an 1.9x slowdown in the above benchmark with LDC.
>
> Because of this, I propose to add float and double overloads (at the very least the double ones) for all of the commonly used functions in std.math. This is unlikely to break much code, but:
>  a) Somebody could rely on the fact that the calls effectively widen the calculation to 80 bits on x86 when using type deduction.
>  b) Additional overloads make e.g. "&floor" ambiguous without context, of course.
>
> What do you think?
>
> Cheers,
> David
>
>
> [1] http://forum.dlang.org/thread/lo19l7$n2a$1@digitalmars.com
> [2] Fun fact: As the program happens only deal with positive numbers, the author could have just inserted an int-to-float cast, sidestepping the issue altogether. All the other language implementations have the floor() call too, though, so it doesn't matter for this discussion.

I honestly always thought that it was a little odd that it forced conversion to real. Personally I support this. It would also make generic code that calls math functions simpler, as it wouldn't require casts back.

June 27, 2014

Re: std.math performance (SSE vs. real)

Posted by H. S. Teoh
in reply to Tofu Ninja

On Fri, Jun 27, 2014 at 02:09:59AM +0000, Tofu Ninja via Digitalmars-d wrote:
> On Friday, 27 June 2014 at 01:31:17 UTC, David Nadlinger wrote:
[...]
> >Because of this, I propose to add float and double overloads (at the very least the double ones) for all of the commonly used functions in std.math.  This is unlikely to break much code, but:
> > a) Somebody could rely on the fact that the calls effectively widen
> > the calculation to 80 bits on x86 when using type deduction.
> > b) Additional overloads make e.g. "&floor" ambiguous without
> > context, of course.
> >
> >What do you think?
[...]
> I honestly alway thought that it was a little odd that it forced conversion to real. Personally I support this. It would also make generic code that calls math functions more simple as it wouldn't require casts back.

I support this too.


T

-- 
It is impossible to make anything foolproof because fools are so ingenious. -- Sammy

June 27, 2014

Re: std.math performance (SSE vs. real)

Posted by Jerry
in reply to H. S. Teoh

"H. S. Teoh via Digitalmars-d" <digitalmars-d@puremagic.com> writes:

> On Fri, Jun 27, 2014 at 02:09:59AM +0000, Tofu Ninja via Digitalmars-d wrote:
>> On Friday, 27 June 2014 at 01:31:17 UTC, David Nadlinger wrote:
> [...]
>> >Because of this, I propose to add float and double overloads (at the very least the double ones) for all of the commonly used functions in std.math.  This is unlikely to break much code, but:
>> > a) Somebody could rely on the fact that the calls effectively widen
>> > the calculation to 80 bits on x86 when using type deduction.
>> > b) Additional overloads make e.g. "&floor" ambiguous without
>> > context, of course.
>> >
>> >What do you think?
> [...]
>> I honestly alway thought that it was a little odd that it forced conversion to real. Personally I support this. It would also make generic code that calls math functions more simple as it wouldn't require casts back.
>
> I support this too.

Me three.  This seems like an unnecessary pessimisation and it would be irritating for D to become associated with slow fp math.

June 27, 2014

Re: std.math performance (SSE vs. real)

Posted by Russel Winder
in reply to Jerry

On Thu, 2014-06-26 at 23:28 -0400, Jerry via Digitalmars-d wrote:
> "H. S. Teoh via Digitalmars-d" <digitalmars-d@puremagic.com> writes:
> 
> > On Fri, Jun 27, 2014 at 02:09:59AM +0000, Tofu Ninja via Digitalmars-d wrote:
> >> On Friday, 27 June 2014 at 01:31:17 UTC, David Nadlinger wrote:
> > [...]
> >> >Because of this, I propose to add float and double overloads (at the very least the double ones) for all of the commonly used functions in std.math.  This is unlikely to break much code, but:
> >> > a) Somebody could rely on the fact that the calls effectively widen
> >> > the calculation to 80 bits on x86 when using type deduction.
> >> > b) Additional overloads make e.g. "&floor" ambiguous without
> >> > context, of course.
> >> >
> >> >What do you think?
> > [...]
> >> I honestly alway thought that it was a little odd that it forced conversion to real. Personally I support this. It would also make generic code that calls math functions more simple as it wouldn't require casts back.
> >
> > I support this too.
> 
> Me three.  This seems like an unnecessary pessimisation and it would be irritating for D to become associated with slow fp math.

So has anyone got a pull request ready?

-- 
Russel. ============================================================================= Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder@ekiga.net 41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel@winder.org.uk London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder

June 27, 2014

Re: std.math performance (SSE vs. real)

Posted by Iain Buclaw
in reply to David Nadlinger

On 27 June 2014 02:31, David Nadlinger via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> Hi all,
>
> right now, the use of std.math over core.stdc.math can cause a huge performance problem in typical floating point graphics code. An instance of this has recently been discussed here in the "Perlin noise benchmark speed" thread [1], where even LDC, which already beat DMD by a factor of two, generated code more than twice as slow as that by Clang and GCC. Here, the use of floor() causes trouble. [2]
>
> Besides the somewhat slow pure D implementations in std.math, the biggest problem is the fact that std.math almost exclusively uses reals in its API. When working with single- or double-precision floating point numbers, this is not only more data to shuffle around than necessary, but on x86_64 requires the caller to transfer the arguments from the SSE registers onto the x87 stack and then convert the result back again. Needless to say, this is a serious performance hazard. In fact, this accounts for an 1.9x slowdown in the above benchmark with LDC.
>
> Because of this, I propose to add float and double overloads (at the very
> least the double ones) for all of the commonly used functions in std.math.
> This is unlikely to break much code, but:
>  a) Somebody could rely on the fact that the calls effectively widen the
> calculation to 80 bits on x86 when using type deduction.
>  b) Additional overloads make e.g. "&floor" ambiguous without context, of
> course.
>
> What do you think?
>
> Cheers,
> David
>

This is the reason why floor is slow: it has an array copy operation.

---
  auto vu = *cast(ushort[real.sizeof/2]*)(&x);
---

I didn't like it at the time I wrote it, but at least it prevented the compiler (gdc) from removing all bit operations that followed.

If there is an alternative to the above, then I'd imagine that would speed up floor by tenfold.

Regards
Iain
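
For readers wondering what an alternative to the array-copying cast might look like, one common option is a union that overlays the floating point value with an integer. The sketch below applies the idea to `double` rather than `real` to keep the bit layout simple. The union and helper names are hypothetical, and this is not claimed to be what the Phobos pull request does.

```d
// Hypothetical names for illustration; not the code from Phobos.
union DoubleBits
{
    double value; // the floating point view
    ulong  bits;  // the same 8 bytes viewed as an integer
}

// True if the IEEE 754 sign bit (bit 63) is set.
bool signBitOf(double x)
{
    DoubleBits b;
    b.value = x;
    return (b.bits >> 63) != 0;
}

// The 11-bit biased exponent field (bits 52 to 62).
int biasedExponentOf(double x)
{
    DoubleBits b;
    b.value = x;
    return cast(int)((b.bits >> 52) & 0x7FF);
}

unittest
{
    assert(signBitOf(-0.0) && !signBitOf(0.0));
    assert(biasedExponentOf(1.0) == 1023); // 1.0 = 2^0, bias is 1023
}
```

Because the bits are read through an integer member rather than a copied `ushort` array, the helpers do not depend on byte order, and the whole value fits in one general purpose register.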

June 27, 2014

Re: std.math performance (SSE vs. real)

Posted by Iain Buclaw

On 27 June 2014 07:14, Iain Buclaw <ibuclaw@gdcproject.org> wrote:
> On 27 June 2014 02:31, David Nadlinger via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>> Hi all,
>>
>> right now, the use of std.math over core.stdc.math can cause a huge performance problem in typical floating point graphics code. An instance of this has recently been discussed here in the "Perlin noise benchmark speed" thread [1], where even LDC, which already beat DMD by a factor of two, generated code more than twice as slow as that by Clang and GCC. Here, the use of floor() causes trouble. [2]
>>
>> Besides the somewhat slow pure D implementations in std.math, the biggest problem is the fact that std.math almost exclusively uses reals in its API. When working with single- or double-precision floating point numbers, this is not only more data to shuffle around than necessary, but on x86_64 requires the caller to transfer the arguments from the SSE registers onto the x87 stack and then convert the result back again. Needless to say, this is a serious performance hazard. In fact, this accounts for an 1.9x slowdown in the above benchmark with LDC.
>>
>> Because of this, I propose to add float and double overloads (at the very
>> least the double ones) for all of the commonly used functions in std.math.
>> This is unlikely to break much code, but:
>>  a) Somebody could rely on the fact that the calls effectively widen the
>> calculation to 80 bits on x86 when using type deduction.
>>  b) Additional overloads make e.g. "&floor" ambiguous without context, of
>> course.
>>
>> What do you think?
>>
>> Cheers,
>> David
>>
>
> This is the reason why floor is slow, it has an array copy operation.
>
> ---
>   auto vu = *cast(ushort[real.sizeof/2]*)(&x);
> ---
>
> I didn't like it at the time I wrote, but at least it prevented the compiler (gdc) from removing all bit operations that followed.
>
> If there is an alternative to the above, then I'd imagine that would speed up floor by tenfold.
>

Can you test with this?

https://github.com/D-Programming-Language/phobos/pull/2274

Float and double implementations of floor/ceil are trivial, and I can add them later.

June 27, 2014

Re: std.math performance (SSE vs. real)

Posted by hane
in reply to Iain Buclaw

On Friday, 27 June 2014 at 06:48:44 UTC, Iain Buclaw via Digitalmars-d wrote:
> On 27 June 2014 07:14, Iain Buclaw <ibuclaw@gdcproject.org> wrote:
>> On 27 June 2014 02:31, David Nadlinger via Digitalmars-d
>> <digitalmars-d@puremagic.com> wrote:
>>> Hi all,
>>>
>>> right now, the use of std.math over core.stdc.math can cause a huge
>>> performance problem in typical floating point graphics code. An instance of
>>> this has recently been discussed here in the "Perlin noise benchmark speed"
>>> thread [1], where even LDC, which already beat DMD by a factor of two,
>>> generated code more than twice as slow as that by Clang and GCC. Here, the
>>> use of floor() causes trouble. [2]
>>>
>>> Besides the somewhat slow pure D implementations in std.math, the biggest
>>> problem is the fact that std.math almost exclusively uses reals in its API.
>>> When working with single- or double-precision floating point numbers, this
>>> is not only more data to shuffle around than necessary, but on x86_64
>>> requires the caller to transfer the arguments from the SSE registers onto
>>> the x87 stack and then convert the result back again. Needless to say, this
>>> is a serious performance hazard. In fact, this accounts for an 1.9x slowdown
>>> in the above benchmark with LDC.
>>>
>>> Because of this, I propose to add float and double overloads (at the very
>>> least the double ones) for all of the commonly used functions in std.math.
>>> This is unlikely to break much code, but:
>>>  a) Somebody could rely on the fact that the calls effectively widen the
>>> calculation to 80 bits on x86 when using type deduction.
>>>  b) Additional overloads make e.g. "&floor" ambiguous without context, of
>>> course.
>>>
>>> What do you think?
>>>
>>> Cheers,
>>> David
>>>
>>
>> This is the reason why floor is slow, it has an array copy operation.
>>
>> ---
>>   auto vu = *cast(ushort[real.sizeof/2]*)(&x);
>> ---
>>
>> I didn't like it at the time I wrote, but at least it prevented the
>> compiler (gdc) from removing all bit operations that followed.
>>
>> If there is an alternative to the above, then I'd imagine that would
>> speed up floor by tenfold.
>>
>
> Can you test with this?
>
> https://github.com/D-Programming-Language/phobos/pull/2274
>
> Float and Double implementations of floor/ceil are trivial and I can add later.

Nice! I tested with the Perlin noise benchmark, and it got faster (in my environment, 1.030s -> 0.848s).
But floor still consumes almost half of the execution time.

June 27, 2014

Re: std.math performance (SSE vs. real)

Posted by David Nadlinger
in reply to hane

On Friday, 27 June 2014 at 09:37:54 UTC, hane wrote:
> On Friday, 27 June 2014 at 06:48:44 UTC, Iain Buclaw via Digitalmars-d wrote:
>> Can you test with this?
>>
>> https://github.com/D-Programming-Language/phobos/pull/2274
>>
>> Float and Double implementations of floor/ceil are trivial and I can add later.
>
> Nice! I tested with the Perlin noise benchmark, and it got faster(in my environment, 1.030s -> 0.848s).
> But floor still consumes almost half of the execution time.

Wait, so DMD and GDC did actually emit a memcpy/… here? LDC doesn't, and the change didn't have much of an impact on performance.

What _does_ have a significant impact, however, is that the whole of floor() for doubles can be optimized down to
    roundsd <…>,<…>,0x1
when targeting SSE 4.1, or
    vroundsd <…>,<…>,<…>,0x1
when targeting AVX.

This is why std.math will need to build on top of compiler-recognizable primitives. Iain, Don, how do you think we should handle this? One option would be to build std.math based on an extended core.math with functions that are recognized as intrinsics or suitably implemented in the compiler-specific runtimes. The other option would be for me to submit LDC-specific implementations to Phobos.

Cheers,
David
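
As a point of comparison for targets without a rounding instruction, floor on doubles can also be written as a truncating integer conversion plus a correction, which stays in SSE registers on x86_64. This is a different technique from the `roundsd` lowering described above, shown only as a sketch. The name `truncFloor` is hypothetical, and the function assumes a finite input whose magnitude fits in a `long`.

```d
// Hypothetical helper; assumes x is finite and fits in a long.
// NaN, infinities and very large magnitudes are not handled.
double truncFloor(double x)
{
    // Converting to long truncates toward zero.
    immutable double t = cast(double) cast(long) x;
    // For negative non-integers, truncation rounded up, so step down by one.
    return (t > x) ? t - 1.0 : t;
}

unittest
{
    assert(truncFloor(2.7) == 2.0);
    assert(truncFloor(-2.7) == -3.0);
    assert(truncFloor(-2.0) == -2.0);
}
```

For inputs known to be non-negative, the correction step is never taken, which is the shortcut mentioned in footnote [2] of the original post.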

June 27, 2014

Re: std.math performance (SSE vs. real)

Posted by David Nadlinger
in reply to hane

On Friday, 27 June 2014 at 09:37:54 UTC, hane wrote:
> Nice! I tested with the Perlin noise benchmark, and it got faster(in my environment, 1.030s -> 0.848s).
> But floor still consumes almost half of the execution time.

Oh, and by the way, my optimized version (simply replace floor() in perlin_noise.d with a call to llvm_floor() from ldc.intrinsics) is 2.8x faster than the original one on my machine (both with -mcpu=native).

David
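
The substitution described above could be wrapped so that the same source still builds with other compilers. The following is only a sketch: `floorFast` is a hypothetical name, `llvm_floor` is assumed to accept a `double` as the post implies, and the fallback to the C library's `floor` is an assumption made for this example.

```d
version (LDC)
{
    // LDC only: maps to the LLVM floor intrinsic mentioned in the post.
    import ldc.intrinsics : llvm_floor;

    double floorFast(double x) { return llvm_floor(x); }
}
else
{
    // Assumed fallback for other compilers: the C library's double floor.
    import core.stdc.math : cfloor = floor;

    double floorFast(double x) { return cfloor(x); }
}

unittest
{
    assert(floorFast(2.7) == 2.0);
    assert(floorFast(-2.7) == -3.0);
}
```

In the benchmark, the call to `floor()` in perlin_noise.d would then become a call to `floorFast()`. Whether the intrinsic is lowered to a single instruction still depends on the target options, such as the `-mcpu=native` setting used in the post.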
