std.math performance (SSE vs. real) (page 11) - D Programming Language Discussion Forum

June 30, 2014

Re: std.math performance (SSE vs. real)

Posted by Walter Bright
in reply to Manu

On 6/29/2014 9:38 PM, Manu via Digitalmars-d wrote:
>> It's news to me that x86, x86-64, etc., are deprecated, despite being used
>> to run pretty much all desktops and laptops and even servers. The 80 bit
>> reals are also part of the C ABI for Linux, OSX, and FreeBSD, 32 and 64 bit.
>
> x86_64 and x86 are different architectures, and they have very different ABI's.
> Nobody is manufacturing x86 (exclusive) cpu's.
> Current x86_64 cpu's maintain a backwards compatibility mode, but
> that's not a part of the x86-64 spec, and may go away when x86_64 is
> deemed sufficiently pervasive and x86 sufficiently redundant.

It's still part of the C ABI for both 32 and 64 bit code.

> Correct, so they are deliberately treated separately.
> I argued for strong separation between simd and float, and you agreed.

floats & doubles are implemented using SIMD instructions.

>> Since they are part of the 64 bit C ABI, that would seem to be in the
>> category of "nevah hoppen".
> Not in windows.

Correct.

> You say they are in linux? I don't know.

Since you don't believe me, I recommend looking it up. The document is entitled "System V Application Binary Interface AMD64 Architecture Processor Supplement".

Furthermore, just try using "long double" on Linux, and disassemble the resulting code.

> How do you distinguish x87 double and xmm double in C?

You don't. It's a compiler implementation issue.

> The only way I know to access x87 is with inline asm.

I suggest using "long double" on Linux and looking at the compiler output. You don't have to believe me - use gcc or clang.

> Huh? I think you misunderstood my point. I'm saying that fpu/simd
> units are distinct, and they are distanced by the type system in order
> to respect that separation.

I think that's not correct. There's nothing in the C Standard that says you have to implement float semantics with any particular FPU, nor does the C type system distinguish FPUs.
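
The point that the choice of FPU is a compiler implementation matter can be checked from C itself. The sketch below is only an illustration: `divide_ld`, `divide_d` and `eval_method` are made-up names, and the instruction sets mentioned in the comments are what gcc and clang are expected to pick on x86-64 System V targets, not something the C Standard promises.

```c
#include <float.h>

/* Division in long double. On x86-64 System V targets, gcc and clang are
   expected to compile this to x87 instructions (an assumption; confirm it
   by disassembling the output). */
long double divide_ld(long double a, long double b)
{
    return a / b;
}

/* The same operation in double, typically compiled to SSE2 on x86-64. */
double divide_d(double a, double b)
{
    return a / b;
}

/* FLT_EVAL_METHOD (C99, <float.h>) reports how float and double
   expressions are evaluated: 0 means in their own type, 2 means in
   long double (classic x87 code generation), -1 means indeterminable. */
int eval_method(void)
{
    return FLT_EVAL_METHOD;
}
```

Compiling this with `gcc -O2 -S` or `clang -O2 -S` and reading the assembly shows which unit each function ends up on; the source never names one.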

June 30, 2014

Re: std.math performance (SSE vs. real)

Posted by Don
in reply to Walter Bright

On Monday, 30 June 2014 at 04:15:46 UTC, Walter Bright wrote:
> On 6/29/2014 8:22 PM, Manu via Digitalmars-d wrote:
>> Well, here's the thing then. Consider that 'real' is only actually
>> supported on only a single (long deprecated!) architecture.

>> In x64's case, it is deprecated for over a decade now, and may be
>> removed from the hardware at some unknown time. The moment that x64
>> processors decide to stop supporting 32bit code, the x87 will go away,
>> and those opcodes will likely be emulated or microcoded.
>> Interacting real<->float/double means register swapping through
>> memory. It should be treated the same as float<->simd; they are
>> distinct (on most arch's).
>
> Since they are part of the 64 bit C ABI, that would seem to be in the category of "nevah hoppen".

What I think is highly likely is that it will only have legacy support, with such awful performance that it never makes sense to use them. For example, the speed of 80-bit and 64-bit calculations in x87 used to be identical. But on recent Intel CPUs, the 80-bit operations run at half the speed of the 64 bit operations. They are already partially microcoded.

For me, a stronger argument is that you can get *higher* precision using doubles, in many cases. The reason is that FMA gives you an intermediate value with 128 bits of precision; it's available in SIMD but not on x87.

So, if we want to use the highest precision supported by the hardware, that does *not* mean we should always use 80 bits.

I've experienced this in CTFE, where the calculations are currently done in 80 bits. I've seen cases where the 64-bit runtime results were more accurate, because of those 128 bit FMA temporaries. 80 bits are not enough!!

June 30, 2014

Re: std.math performance (SSE vs. real)

Posted by ed
in reply to Walter Bright

On Monday, 30 June 2014 at 06:21:49 UTC, Walter Bright wrote:

When precision is an issue we always choose a software solution. This has been my experience in both geophysics and medical device development. It is cheaper, faster (dev. time), and better tested than anything we would develop within a release time frame.

But D "real" is a winner IMO. At my last workplace we ported some geophysics C++ apps to D for fun. The apps required more precision than double could offer and relied on GMP/MPFR. It was a nice surprise when we found the extra bits in D's real were enough for some of these apps to be correct without GMP/MPFR and gave a real performance boost (pun intended!).

We targeted x86/x86_64 desktops and clusters running Linux (Windows and Mac on desktops as well).

We did not consider the lack of IBM 360 support to be an issue when porting to D :-P


Cheers,
ed

June 30, 2014

Re: std.math performance (SSE vs. real)

Posted by ed
in reply to Don

On Monday, 30 June 2014 at 07:21:00 UTC, Don wrote:
> On Monday, 30 June 2014 at 04:15:46 UTC, Walter Bright wrote:
>> On 6/29/2014 8:22 PM, Manu via Digitalmars-d wrote:
>>> Well, here's the thing then. Consider that 'real' is only actually
>>> supported on only a single (long deprecated!) architecture.
>
>>> In x64's case, it is deprecated for over a decade now, and may be
>>> removed from the hardware at some unknown time. The moment that x64
>>> processors decide to stop supporting 32bit code, the x87 will go away,
>>> and those opcodes will likely be emulated or microcoded.
>>> Interacting real<->float/double means register swapping through
>>> memory. It should be treated the same as float<->simd; they are
>>> distinct (on most arch's).
>>
>> Since they are part of the 64 bit C ABI, that would seem to be in the category of "nevah hoppen".
>
> What I think is highly likely is that it will only have legacy support, with such awful performance that it never makes sense to use them. For example, the speed of 80-bit and 64-bit calculations in x87 used to be identical. But on recent Intel CPUs, the 80-bit operations run at half the speed of the 64 bit operations. They are already partially microcoded.
>
> For me, a stronger argument is that you can get *higher* precision using doubles, in many cases. The reason is that FMA gives you an intermediate value with 128 bits of precision; it's available in SIMD but not on x87.
>
> So, if we want to use the highest precision supported by the hardware, that does *not* mean we should always use 80 bits.
>
> I've experienced this in CTFE, where the calculations are currently done in 80 bits. I've seen cases where the 64-bit runtime results were more accurate, because of those 128 bit FMA temporaries. 80 bits are not enough!!

This is correct, and we now use this for some time-critical code that requires high precision.

But for anything that is not time-critical (~80%-85% of our code) we simply use a software solution when precision becomes an issue. It is here that I think the extra bits in D real can be enough to get a performance gain.

But I won't argue if you think I'm wrong. I'm only basing this on anecdotal evidence of what I saw from 5-6 apps ported from C++ to D :-)

Cheers,
ed

June 30, 2014

Re: std.math performance (SSE vs. real)

Posted by Ola Fosheim Grøstad
in reply to Walter Bright

On Sunday, 29 June 2014 at 22:49:44 UTC, Walter Bright wrote:
>> I guess we just hope that all future hardware is IEEE754 compliant.
>
> I'm not concerned about it. No CPU maker in their right head would do something different.

AFAIK they break compliance all the time.

June 30, 2014

Re: std.math performance (SSE vs. real)

Posted by Don
in reply to Russel Winder

On Sunday, 29 June 2014 at 18:13:59 UTC, Russel Winder via Digitalmars-d wrote:
> On Sun, 2014-06-29 at 07:59 -0700, Andrei Alexandrescu via Digitalmars-d wrote:
> […]
>
>> A friend who works at a hedge fund (after making the rounds to the NYC large financial companies) told me that's a myth. Any nontrivial calculation involving money (interest, fixed income, derivatives, ...) needs floating point. He never needed more than double.
>
> Very definitely so. Fixed point or integer arithmetic for simple
> "household" finance fair enough, but for "finance house" calculations
> you generally need 22+ significant denary digits to meet with compliance
> requirements.

Many people seem to have the bizarre idea that floating point is less accurate than integer arithmetic, as if storing a value into a double makes it instantly "fuzzy" or something.
In fact, provided that the precision is large enough, every operation that is exact in integers is exact in floating point as well.
And if you perform a division using integers, you've silently lost precision.
So I'm not sure what benefit you'd gain by eschewing floating point.
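
A small sketch of both halves of that claim, assuming IEEE 754 doubles with a 53-bit significand; all function names are hypothetical. Integer results stay exact in double as long as they fit in 53 bits, while integer division drops the remainder without any warning.

```c
#include <stdint.h>

/* Returns 1 if n survives a round trip through double unchanged.
   Intended for |n| well below 2^63, so the cast back is defined. */
int fits_exactly(int64_t n)
{
    return (int64_t)(double)n == n;
}

/* Multiplies two integers in double and compares with the integer
   product. Assumes a*b does not overflow int64_t. */
int product_is_exact(int64_t a, int64_t b)
{
    double p = (double)a * (double)b;
    return fits_exactly(a) && fits_exactly(b) && (int64_t)p == a * b;
}

/* Integer division truncates: the remainder is silently dropped. */
int int_half(int n)
{
    return n / 2;
}

/* The same division in double keeps the fractional part. */
double double_half(int n)
{
    return n / 2.0;
}
```

The "precision is large enough" condition is where it stops: 2^53 round-trips through a double, 2^53 + 1 does not.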

June 30, 2014

Re: std.math performance (SSE vs. real)

Posted by Ola Fosheim Grøstad
in reply to Don

On Monday, 30 June 2014 at 11:57:04 UTC, Don wrote:
> And if you perform a division using integers, you've silently lost precision.

2 ints of arbitrary length = rational numbers => no loss of precision for divs.

CPU vendors make arbitrary decisions about FP and break compliance with no remorse for single precision float vector operations in order to reduce die size / increase throughput. That includes not having NaN/Inf, reducing mantissa precision for some operations, etc.

FP is not portable out-of-the-box.

I think the main advantage of integers is that you control the precision and can repeat the calculation with the same result. It is easier to avoid stability issues with integers too. And it is portable.
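
To illustrate the "two ints = rational number" idea, here is a minimal sketch with hypothetical names (`rational`, `rational_make`, `rational_div`). It uses fixed 64-bit integers as a stand-in for arbitrary-length ones, so overflow is possible and is not checked; a real implementation would sit on a bignum type.

```c
#include <stdint.h>

typedef struct {
    int64_t num;
    int64_t den;
} rational;

static int64_t gcd64(int64_t a, int64_t b)
{
    if (a < 0) a = -a;
    if (b < 0) b = -b;
    while (b != 0) {
        int64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Builds num/den in lowest terms with a positive denominator.
   den must be nonzero. */
rational rational_make(int64_t num, int64_t den)
{
    int64_t g = gcd64(num, den);
    if (den < 0) g = -g;          /* move the sign into the numerator */
    rational r = { num / g, den / g };
    return r;
}

/* Exact division: (a/b) / (c/d) = (a*d) / (b*c). y.num must be nonzero. */
rational rational_div(rational x, rational y)
{
    return rational_make(x.num * y.den, x.den * y.num);
}
```

Dividing 1/3 by 2/5 this way gives exactly 5/6, with nothing rounded away.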

June 30, 2014

Re: std.math performance (SSE vs. real)

Posted by Element 126
in reply to John Colvin

On 06/29/2014 11:04 PM, John Colvin wrote:
> [...]
>
> mixin(`alias real` ~ (real.sizeof*8).stringof ~ ` = real;`);
>
> is more useful to me.

Be careful: this code is tricky! real.sizeof is the storage size, i.e. 16 bytes on x86_64.

The following happily compiles ;-)

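import std.conv : to;

// real.sizeof is 16 on x86_64, so this declares `alias real128 = real;`
mixin(`alias real` ~ to!string(real.sizeof*8) ~ ` = real;`);

// ...yet the type still carries only a 64-bit mantissa (x87 extended precision)
static assert(real128.mant_dig == 64);

void main()
{
    real128 x = 1.0;
}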
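
The same storage-versus-precision gap exists for C's `long double`. The sketch below is a C analogue of the D snippet, not a translation of it, and the function names are hypothetical; on x86-64 System V targets the two values are expected to be 128 and 64, but other platforms differ.

```c
#include <float.h>
#include <limits.h>
#include <stddef.h>

/* Bits of storage occupied by a long double, padding included. */
size_t long_double_storage_bits(void)
{
    return sizeof(long double) * CHAR_BIT;
}

/* Significand digits actually provided, counted in base FLT_RADIX
   (which is 2 on mainstream hardware). */
int long_double_mantissa_bits(void)
{
    return LDBL_MANT_DIG;
}
```

Naming a type after the first number says nothing reliable about the second.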

June 30, 2014

Re: std.math performance (SSE vs. real)

Posted by H. S. Teoh

On Sun, Jun 29, 2014 at 11:33:20PM +0100, Iain Buclaw via Digitalmars-d wrote:
> On 29 June 2014 23:20, H. S. Teoh via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> > On Sun, Jun 29, 2014 at 08:54:49AM +0100, Iain Buclaw via Digitalmars-d wrote:
> >> On 29 Jun 2014 05:48, "H. S. Teoh via Digitalmars-d" < digitalmars-d@puremagic.com> wrote:
> >> >
> >> > On Sat, Jun 28, 2014 at 08:41:24PM -0700, Andrei Alexandrescu via Digitalmars-d wrote:
> >> > > On 6/28/14, 6:02 PM, Tofu Ninja wrote:
> >> > [...]
> >> > > >I think this thread needs to refocus on the main point, getting math overloads for float and double and how to mitigate any problems that might arise from that.
> >> > >
> >> > > Yes please. -- Andrei
> >> >
> >> > Let's see the PR!
> >> >
> >>
> >> I've already raised one (already linked in this thread).
> >
> > Are you talking about #2274? Interesting that your implementation is basically identical to my own idea for fixing std.math -- using unions instead of pointer casting.
> >
> 
> Not really.  The biggest speed up was from adding float+double overloads for floor, ceil, isNaN and isInfinity.  Firstly, the use of a union itself didn't make much of a dent in the speed up.  Removing the slow array copy operation did though.  Secondly, unions are required for this particular function (floor) because we need to set bits through type-punning, it just wouldn't work casting to a pointer.
[...]

I wasn't referring to the speedup (though that is certainly nice), I was talking about making things CTFE-able.

What's the status of repainting unions in CTFE? Is there a PR for that yet, or do we need to implement one?
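
For readers unfamiliar with the union-based type punning discussed here, this is a rough C analogue. It is not the std.math code from the PR: the names are hypothetical, and it assumes `float` is IEEE 754 binary32. Reading a union member other than the one last written is permitted in C99 and later.

```c
#include <stdint.h>

/* Overlays a float with its 32-bit pattern. */
typedef union {
    float    f;
    uint32_t bits;
} float_bits;

/* Reinterprets a raw bit pattern as a float. */
float float_from_bits(uint32_t bits)
{
    float_bits u;
    u.bits = bits;
    return u.f;
}

/* NaN: exponent all ones and a nonzero mantissa, sign ignored. */
int is_nan_f(float x)
{
    float_bits u;
    u.f = x;
    return (u.bits & 0x7fffffffu) > 0x7f800000u;
}

/* Setting bits through the union: clear the sign bit to get |x|. */
float clear_sign_f(float x)
{
    float_bits u;
    u.f = x;
    u.bits &= 0x7fffffffu;
    return u.f;
}
```

A dedicated `float` version like `is_nan_f` works on the 32-bit pattern directly instead of widening the argument first, which is the kind of overload the thread is asking for.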


T

-- 
People tell me that I'm skeptical, but I don't believe it.

June 30, 2014

Re: std.math performance (SSE vs. real)

Posted by dennis luehring
in reply to Walter Bright

On 30.06.2014 08:21, Walter Bright wrote:
>> The only way I know to access x87 is with inline asm.
>
> I suggest using "long double" on Linux and looking at the compiler output. You
> don't have to believe me - use gcc or clang.

Tried on gcc.godbolt.org with clang 3.4.1 -O3:

int main(int argc, char** argv)
{
  return ((long double)argc/12345.6789);
}

asm:

.LCPI0_0:
	.quad	4668012723080132769     # double 12345.678900000001
main:                                   # @main
	movl	%edi, -8(%rsp)
	fildl	-8(%rsp)
	fdivl	.LCPI0_0(%rip)
	fnstcw	-10(%rsp)
	movw	-10(%rsp), %ax
	movw	$3199, -10(%rsp)        # imm = 0xC7F
	fldcw	-10(%rsp)
	movw	%ax, -10(%rsp)
	fistpl	-4(%rsp)
	fldcw	-10(%rsp)
	movl	-4(%rsp), %eax
	ret
