June 30, 2014
On Monday, 30 June 2014 at 22:44:44 UTC, H. S. Teoh via Digitalmars-d wrote:
> Iain's PR to provide overloads of std.math functions for float/double
> has already been merged, so all that remains […]

Plus all the other functions in std.math, plus a way to provide efficient implementations depending on the target instruction set (compiler intrinsics).
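Roughly this shape, I'd imagine (a sketch only, not Iain's actual PR; mySqrt is a made-up name, and core.math.sqrt is the druntime intrinsic the compiler recognizes):

    import core.math : intrinsicSqrt = sqrt; // compiler-recognized intrinsic

    // One overload per precision, so a float argument never pays for a
    // real-precision computation when the target has a native instruction.
    float  mySqrt(float x)  { return intrinsicSqrt(x); } // e.g. SQRTSS on SSE
    double mySqrt(double x) { return intrinsicSqrt(x); } // e.g. SQRTSD on SSE2
    real   mySqrt(real x)   { return intrinsicSqrt(x); } // e.g. x87 fsqrt

    void main()
    {
        import std.stdio : writefln;
        // Different precision, different cost:
        writefln("%.8f %.17f", mySqrt(2.0f), mySqrt(2.0));
    }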

David
June 30, 2014
On 6/30/2014 3:43 PM, H. S. Teoh via Digitalmars-d wrote:
> Iain's PR to provide overloads of std.math functions for float/double
> has already been merged, so all that remains is for float literals to
> default to double unless suffixed with L,

They already do.

> or contain too many digits to accurately represent in double.

This won't work predictably. Heck, 0.3 is not accurately representable as a double.
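For instance (a minimal D check; the printed digits assume IEEE doubles):

    import std.stdio;

    void main()
    {
        // 0.3 has no exact binary representation at any width; the
        // nearest double is slightly below it.
        writefln("%.20f", 0.3);   // prints 0.29999999999999998890
        writefln("%.20f", 0.3L);  // nearest 80-bit real: closer, still not exact
    }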

June 30, 2014
On 6/30/2014 3:14 PM, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang@gmail.com> wrote:
> On Monday, 30 June 2014 at 16:54:55 UTC, Walter Bright wrote:
>> On 6/30/2014 4:25 AM, "Ola Fosheim Grøstad"
>> <ola.fosheim.grostad+dlang@gmail.com> wrote:
>>>
>>> AFAIK they break compliance all the time.
>>
>> Examples, please.
>
> Cell:
>
> http://publib.boulder.ibm.com/infocenter/cellcomp/v101v121/index.jsp?topic=/com.ibm.xlcpp101.cell.doc/proguide/spu_sp_diffs.html

Wow. Fortunately, there's a switch (see http://publib.boulder.ibm.com/infocenter/cellcomp/v101v121/index.jsp?topic=/com.ibm.xlcpp101.cell.doc/proguide/spu_sp_diffs.html) so it'll work correctly.

> Intel:
>
> http://www.velocityreviews.com/threads/intel-fp-h-w-non-compliance-to-ieee754.746517/

That one looks like a bug.

July 01, 2014
On 01.07.2014 00:18, Andrei Alexandrescu wrote:
> On 6/30/14, 2:20 AM, Don wrote:
>> For me, a stronger argument is that you can get *higher* precision using
>> doubles, in many cases. The reason is that FMA gives you an intermediate
>> value with 128 bits of precision; it's available in SIMD but not on x87.
>>
>> So, if we want to use the highest precision supported by the hardware,
>> that does *not* mean we should always use 80 bits.
>>
>> I've experienced this in CTFE, where the calculations are currently done
>> in 80 bits, and I've seen cases where the 64-bit runtime results were more
>> accurate, because of those 128 bit FMA temporaries. 80 bits are not
>> enough!!
>
> Interesting. Maybe we should follow a simple principle - define
> overloads and intrinsic operations such that real is only used if (a)
> requested explicitly, or (b) it brings about an actual advantage.

gcc seems to use GMP for (all) its compile-time calculations - is this for consistent results when cross-compiling, or just for better accuracy in general - or both?

July 01, 2014
On Monday, 30 June 2014 at 22:58:48 UTC, Walter Bright wrote:
> Wow. Fortunately, there's a switch (see http://publib.boulder.ibm.com/infocenter/cellcomp/v101v121/index.jsp?topic=/com.ibm.xlcpp101.cell.doc/proguide/spu_sp_diffs.html) so it'll work correctly.

That's the same link I provided, but I presume the compiler switch kills performance? You have the same issue with ARM processors: NEON (SIMD) instructions are not IEEE754 compliant, and VFP is almost compliant but flushes subnormal numbers to zero, which can be a disaster…
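To make that concrete (a minimal D sketch; the flush-to-zero outcome is what the NEON/SPU docs describe, you won't reproduce it on IEEE-conformant x86):

    import std.stdio;

    void main()
    {
        float a = float.min_normal * 1.5f;
        float b = float.min_normal;
        // a - b is subnormal. IEEE 754 with gradual underflow guarantees
        // x - y == 0 only when x == y; under flush-to-zero the subnormal
        // difference is flushed and that guarantee silently disappears.
        writeln(a != b);      // true
        writeln(a - b != 0);  // true with subnormals, false under FTZ
    }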

So basically, floating point is not portable unless you give up performance or check all expressions with worst-case analysis based on the deficiencies of all current platforms.
July 01, 2014
On 1 July 2014 06:34, dennis luehring via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On 01.07.2014 00:18, Andrei Alexandrescu wrote:
>
>> On 6/30/14, 2:20 AM, Don wrote:
>>>
>>> For me, a stronger argument is that you can get *higher* precision using doubles, in many cases. The reason is that FMA gives you an intermediate value with 128 bits of precision; it's available in SIMD but not on x87.
>>>
>>> So, if we want to use the highest precision supported by the hardware, that does *not* mean we should always use 80 bits.
>>>
>>> I've experienced this in CTFE, where the calculations are currently done in 80 bits, and I've seen cases where the 64-bit runtime results were more accurate, because of those 128 bit FMA temporaries. 80 bits are not enough!!
>>
>>
>> Interesting. Maybe we should follow a simple principle - define
>> overloads and intrinsic operations such that real is only used if (a)
>> requested explicitly, or (b) it brings about an actual advantage.
>
>
> gcc seems to use GMP for (all) its compile-time calculations - is this for consistent results when cross-compiling, or just for better accuracy in general - or both?
>

More the cross-compilation case, where the host has less precision than the target. But at the same time, they wouldn't use GMP if it didn't produce accurate results. :)
July 01, 2014
"dennis luehring"  wrote in message news:loth9o$2arl$1@digitalmars.com... 

> gcc seems to use GMP for (all) its compile-time calculations - is this for consistent results when cross-compiling, or just for better accuracy in general - or both?

To make the gcc build process more complicated.
July 01, 2014
On Monday, 30 June 2014 at 16:54:17 UTC, Walter Bright wrote:
> On 6/30/2014 12:20 AM, Don wrote:
>> What I think is highly likely is that it will only have legacy support, with
>> such awful performance that it never makes sense to use them. For example, the
>> speed of 80-bit and 64-bit calculations in x87 used to be identical. But on
>> recent Intel CPUs, the 80-bit operations run at half the speed of the 64 bit
>> operations. They are already partially microcoded.
>>
>> For me, a stronger argument is that you can get *higher* precision using
>> doubles, in many cases. The reason is that FMA gives you an intermediate value
>> with 128 bits of precision; it's available in SIMD but not on x87.
>>
>> So, if we want to use the highest precision supported by the hardware, that does
>> *not* mean we should always use 80 bits.
>>
>> I've experienced this in CTFE, where the calculations are currently done in 80
>> bits, and I've seen cases where the 64-bit runtime results were more accurate,
>> because of those 128 bit FMA temporaries. 80 bits are not enough!!
>
> I did not know this. It certainly adds another layer of nuance - as the higher level of precision will only apply as long as one can keep the value in a register.

Yes, it's complicated. The interesting thing is that there are no 128 bit registers. The temporaries exist only while the FMA operation is in progress. You cannot even preserve them between consecutive FMA operations.
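You can still observe them indirectly (a sketch; it assumes std.math.fma lowers to a hardware fused multiply-add; if it's compiled as a plain x*y + z at double or real precision instead, err comes out 0):

    import std.math : fma;
    import std.stdio;

    void main()
    {
        double a = 1 + double.epsilon;  // 1 + 2^-52
        double p = a * a;               // exact product needs 105 bits; rounded to 53
        double err = fma(a, a, -p);     // a fused op recovers the discarded 2^-104
        writefln("%a", err);            // 0x1p-104 with hardware FMA
    }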

An important consequence is that allowing intermediate calculations to be performed at higher precision than the operands is crucial, and applies outside of x86. This is something we've got right.

But it's not possible to say that "the intermediate calculations are done at the precision of 'real'". These are the semantics I think we currently have wrong. Our model is too simplistic.

On modern x86, calculations on float operands may have intermediate calculations done at only 32 bits (if using straight SSE), 80 bits (if using x87), or 64 bits (if using float FMA). And for double operands, they may be 64 bits, 80 bits, or 128 bits.
Yet, in the FMA case, non-FMA operations will be performed at lower precision.
It's entirely possible for all three intermediate precisions to be active at the same time!

I'm not sure that we need to change anything WRT code generation. But I think our style recommendations aren't quite right. And we have at least one missing primitive operation (discard all excess precision).
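For concreteness, that primitive could look something like this (a hypothetical sketch; discardExcess is a made-up name, and routing the value through a non-inlined call is best-effort on today's compilers, which is exactly why a real primitive is needed):

    // Round to exactly double precision, dropping any excess x87 bits.
    // Crossing a non-inlined function boundary forces the value through
    // a 64-bit slot on common ABIs; an optimizer is still free to defeat it.
    pragma(inline, false)
    double discardExcess(double x)
    {
        return x;
    }

so that, e.g., discardExcess(a + b) == c compares the rounded double rather than an 80-bit temporary.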


July 01, 2014
On 6/30/2014 11:58 PM, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang@gmail.com> wrote:
> On Monday, 30 June 2014 at 22:58:48 UTC, Walter Bright wrote:
>> Wow. Fortunately, there's a switch (see
>> http://publib.boulder.ibm.com/infocenter/cellcomp/v101v121/index.jsp?topic=/com.ibm.xlcpp101.cell.doc/proguide/spu_sp_diffs.html)
>> so it'll work correctly.
>
> That's the same link I provided, but I presume the compiler switch kills
> performance?

Click on "compiling for strict IEEE conformance"


> You have the same issue with ARM processors: NEON (SIMD) instructions are
> not IEEE754 compliant, and VFP is almost compliant but flushes subnormal
> numbers to zero, which can be a disaster…

It wouldn't be any different if the D spec said "floating point is, ya know, whatevah". You can't fix stuff like this in the spec.


> So basically, floating point is not portable unless you give up performance or
> check all expressions with worst-case analysis based on the deficiencies of all
> current platforms.

As I've posted before, nobody's FP code is going to work on such platforms out of the box even if the spec accommodates it. The whole point of IEEE 754 is to make portable FP code possible.

Besides, Java and Javascript, for example, both require IEEE conformance.
July 01, 2014
On 7/1/2014 3:26 AM, Don wrote:
> Yes, it's complicated. The interesting thing is that there are no 128 bit
> registers. The temporaries exist only while the FMA operation is in progress.
> You cannot even preserve them between consecutive FMA operations.
>
> An important consequence is that allowing intermediate calculations to be
> performed at higher precision than the operands is crucial, and applies outside
> of x86. This is something we've got right.
>
> But it's not possible to say that "the intermediate calculations are done at the
> precision of 'real'". These are the semantics I think we currently have wrong.
> Our model is too simplistic.
>
> On modern x86, calculations on float operands may have intermediate calculations
> done at only 32 bits (if using straight SSE), 80 bits (if using x87), or 64 bits
> (if using float FMA). And for double operands, they may be 64 bits, 80 bits, or
> 128 bits.
> Yet, in the FMA case, non-FMA operations will be performed at lower precision.
> It's entirely possible for all three intermediate precisions to be active at the
> same time!
>
> I'm not sure that we need to change anything WRT code generation. But I think
> our style recommendations aren't quite right. And we have at least one missing
> primitive operation (discard all excess precision).

What do you recommend?