June 29, 2014 Re: std.math performance (SSE vs. real)

Posted in reply to Russel Winder

On 6/29/2014 2:30 PM, Russel Winder via Digitalmars-d wrote:
> If D is a language that uses the underlying hardware representation then it cannot define the use of specific formats for hardware numbers. Thus, on hardware that provides IEEE754 format hardware float and double can map to the 32-bit and 64-bit IEEE754 numbers offered. However if the hardware does not provide IEEE754 hardware then either D must interpret floating point expressions (as per Java) or it cannot be ported to that architecture. cf. IBM 360.

That's correct. The D spec says IEEE 754.

> PS Walter just wrote that the type real is not defined as float and double are, so it does have a Humpty Dumpty factor even if float and double do not.

It's still IEEE, just the longer lengths if they exist on the hardware.

D is not unique in requiring IEEE 754 floats - Java does, too. So does Javascript.

June 29, 2014 Re: std.math performance (SSE vs. real)

On 29 June 2014 23:20, H. S. Teoh via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On Sun, Jun 29, 2014 at 08:54:49AM +0100, Iain Buclaw via Digitalmars-d wrote:
>> On 29 Jun 2014 05:48, "H. S. Teoh via Digitalmars-d" < digitalmars-d@puremagic.com> wrote:
>> >
>> > On Sat, Jun 28, 2014 at 08:41:24PM -0700, Andrei Alexandrescu via
>> Digitalmars-d wrote:
>> > > On 6/28/14, 6:02 PM, Tofu Ninja wrote:
>> > [...]
>> > > >I think this thread needs to refocus on the main point, getting math overloads for float and double and how to mitigate any problems that might arise from that.
>> > >
>> > > Yes please. -- Andrei
>> >
>> > Let's see the PR!
>> >
>>
>> I've already raised one (already linked in this thread).
>
> Are you talking about #2274? Interesting that your implementation is basically identical to my own idea for fixing std.math -- using unions instead of pointer casting.
>
Not really. The biggest speed up was from adding float+double overloads for floor, ceil, isNaN and isInfinity. Firstly, the use of a union itself didn't make much of a dent in the speed up. Removing the slow array copy operation did though. Secondly, unions are required for this particular function (floor) because we need to set bits through type-punning, it just wouldn't work casting to a pointer.
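For context, the union-based type punning being described looks roughly like the sketch below. The names are illustrative and this is not the code from the PR, just the general technique of writing one union member and reading the bits back through another:

    import std.stdio;

    // Sketch only: write the float member, read the uint member, and test
    // the bit pattern directly, instead of casting through a pointer.
    bool isNaNViaUnion(float x)
    {
        union Bits { float f; uint i; }
        Bits b;
        b.f = x;
        // NaN: all exponent bits set and a non-zero mantissa.
        return (b.i & 0x7F80_0000) == 0x7F80_0000 && (b.i & 0x007F_FFFF) != 0;
    }

    void main()
    {
        writeln(isNaNViaUnion(float.nan));      // true
        writeln(isNaNViaUnion(1.5f));           // false
        writeln(isNaNViaUnion(float.infinity)); // false - infinity has a zero mantissa
    }
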
Regards
Iain

June 29, 2014 Re: std.math performance (SSE vs. real)

Posted in reply to John Colvin

On 6/29/2014 2:04 PM, John Colvin wrote:
> Assuming there isn't one, then what is the point of having a type with hardware
> dependant precision?
The point is D is a systems programming language, and the D programmer should not be locked out of the hardware capabilities of the system he is running on.
D should not be constrained to be the least common denominator of all current and future processors.
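The hardware dependence is easy to observe from the built-in type properties. A small sketch - the printed values depend on the target D is compiled for, which is exactly the point:

    import std.stdio;

    void main()
    {
        // .mant_dig is the number of significand bits. float and double are
        // pinned down by IEEE 754; real's properties follow the hardware.
        writefln("float : %s mantissa bits, %s bytes", float.mant_dig, float.sizeof);
        writefln("double: %s mantissa bits, %s bytes", double.mant_dig, double.sizeof);
        writefln("real  : %s mantissa bits, %s bytes", real.mant_dig, real.sizeof);
        // Typical x86/x86-64 output: 24, 53 and 64 bits. On a target without
        // extended precision, real commonly reports the same 53 bits as double.
    }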

June 29, 2014 Re: std.math performance (SSE vs. real)

Posted in reply to Russel Winder

On 6/29/14, 11:13 AM, Russel Winder via Digitalmars-d wrote:
> On Sun, 2014-06-29 at 07:59 -0700, Andrei Alexandrescu via Digitalmars-d wrote:
> […]
>
>> A friend who works at a hedge fund (after making the rounds to the NYC large financial companies) told me that's a myth. Any nontrivial calculation involving money (interest, fixed income, derivatives, ...) needs floating point. He never needed more than double.
>
> Very definitely so. Fixed point or integer arithmetic for simple "household" finance fair enough, but for "finance house" calculations you generally need 22+ significant denary digits to meet with compliance requirements.

I don't know of US regulations that ask for such. What I do know is that I gave my hedge fund friend a call (today is his name day so it was as good a pretext as any) and mentioned that some people believe fixed point is used in finance. His answer was: BWAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAAAAAAAAAAAAAAAAAAAAAAA!

I asked how they solve accumulating numeric errors and he said it's handled on a case-by-case basis. Most of the time it's pennies for billions of dollars, so nobody cares. Sometimes reconciliations are needed - so-called RECs - that compare and adjust the outputs of different algorithms.

One nice war story he recalled: someone was storing the number of seconds as a double and truncating it to int where needed. An error of at most one second wasn't important in the context. However, sometimes the second was around midnight, so an error of one second became an error of one day, which was significant. The solution was to use rounding instead of truncation.

Andrei
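A hypothetical reconstruction of that war story, with invented numbers just to show the failure mode:

    import std.math : round;
    import std.stdio;

    void main()
    {
        // The true timestamp is exactly midnight (86400 s into the day), but
        // the stored double has drifted a fraction of a second short of it.
        double seconds = 86_399.9999;

        auto truncated = cast(long) seconds;        // 86399 -> still "yesterday"
        auto rounded   = cast(long) round(seconds); // 86400 -> the correct day

        writeln("truncated: ", truncated);
        writeln("rounded:   ", rounded);
        // Truncation turns a sub-second error into a whole-day error at the
        // boundary; rounding keeps the error bounded by half a second.
    }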

June 29, 2014 Re: std.math performance (SSE vs. real)

Posted in reply to Russel Winder

On 6/29/2014 2:45 PM, Russel Winder via Digitalmars-d wrote:
> So D float and double will not work on IBM 360 unless interpreted,

That's right. On the other hand, someone could create a "D360" fork of the language that was specifically targeted to the 360. Nothing wrong with that. Why burden the other 99.999999% of D programmers with 360 nutburger problems?

> I guess we just hope that all future hardware is IEEE754 compliant.

I'm not concerned about it. No CPU maker in their right head would do something different.

I've witnessed decades of "portable" C code where the programmer tried to be "portable" in his use of ints and chars, but never tested it on a machine where those sizes are different, and when it finally was tested it turned out to be broken. Meaning that whether the D spec defines 360 portability or not, there's just no way that FP code is going to be portable to the 360 unless someone actually tests it.

1's complement, 10 bit bytes, 18 bit words, and non-IEEE fp are all DEAD. I can pretty much guarantee you that about zero C/C++ programs will actually work without modification on those systems, despite the claims of the C/C++ Standard. I'd also bet you that most C/C++ code will break if ints are 64 bits, and about 99% will break if you try to compile it with a 16 bit C/C++ compiler. 90% will break if you feed it EBCDIC.

June 30, 2014 Re: std.math performance (SSE vs. real)

Posted in reply to Walter Bright

On 28 June 2014 15:16, Walter Bright via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On 6/27/2014 3:50 AM, Manu via Digitalmars-d wrote:
>> Totally agree.
>> Maintaining commitment to deprecated hardware which could be removed from the silicon at any time is a bit of a problem looking forwards. Regardless of the decision about whether overloads are created, at the very least I'd suggest x64 should define real as double, since the x87 is deprecated, and the x64 ABI uses the SSE unit. It makes no sense at all to use real under any general circumstances in x64 builds.
>>
>> And aside from that, if you *think* you need real for precision, the truth is, you probably have bigger problems. Double already has massive precision. I find it's extremely rare to have precision problems even with float under most normal usage circumstances, assuming you are conscious of the relative magnitudes of your terms.
>
> That's a common perception of people who do not use the floating point unit for numerical work, and whose main concern is speed instead of accuracy.
>
> I've done numerical floating point work. Two common cases where such precision matters:
>
> 1. numerical integration
> 2. inverting matrices
>
> It's amazing how quickly precision gets overwhelmed and you get garbage answers. For example, when inverting a matrix with doubles, the results are garbage for larger than 14*14 matrices or so. There are techniques for dealing with this, but they are complex and difficult to implement.

This is what I was alluding to wrt being aware of the relative magnitudes of terms in operations. You're right that it can be a little complex, but it's usually just a case of rearranging the operations a bit, or worst case, a temporary renormalisation from time to time.

> Increasing the precision is the most straightforward way to deal with it.

Is a 14*14 matrix really any more common than a 16*16 matrix though? It just moves the goal post a bit. Numerical integration will always manage to find its way into crazy big or crazy small numbers. It's all about relative magnitude with floats. 'real' is only good for about 4 more significant digits... I've often thought they went a bit overboard on exponent and skimped on mantissa.

Surely most users would reach for a lib in these cases anyway, and it would be written by an expert. Either way, I don't think it's sensible to have a std api defy the arch ABI.

> Note that the 80 bit precision comes from W.F. Kahan, and he's no fool when dealing with these issues.

I never argued this. I'm just saying I can't see how defying the ABI in a std api could be seen as a good idea applied generally to all software.

> Another boring Boeing anecdote: calculators have around 10 digits of precision. A colleague of mine was doing a multi-step calculation, and rounded each step to 2 decimal points. I told him he needed to keep the full 10 digits. He ridiculed me - but his final answer was off by a factor of 2. He could not understand why, and I'd explain, but he could never get how his 2 places past the decimal point did not work.

Rounding down to 2 decimal points is rather different than rounding from 19 to 15 decimal points.

> Do you think engineers like that will ever understand the problems with double precision, or have the remotest idea how to deal with them beyond increasing the precision? I don't.

I think they would use a library.

Either way, those jobs are so rare, I don't see that it's worth defying the arch ABI across the board for it. I think there should be a 'double' overload. The existing real overload would be chosen when people use the real type explicitly. Another advantage of this is that when people are using the double type, the API will produce the same results on all architectures, including the ones that don't have 'real'.

>> I find it's extremely rare to have precision problems even with float under most normal usage circumstances,
>
> Then you aren't doing numerical work, because it happens right away.

My key skillset includes physics, lighting, rendering, animation. These are all highly numerical workloads. While I am comfortable with some acceptable level of precision loss for performance, I possibly have to worry about maintaining numerical precision even more, since I use low-precision types exclusively. I understand the problem very well, probably better than most.

More often than not, the problems are easily mitigated by rearranging operations such that they are performed against terms of similar relative magnitude, or in some instances by temporarily renormalising terms. I agree these aren't skills that most people have, but most people use libraries for complex numerical work... or would, if such a robust library existed. Thing is, *everybody* will use std.math.
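A classic illustration of both points - double precision being overwhelmed by a perfectly ordinary loop, and the loss being mitigated by rearranging the operations rather than by reaching for more bits - is compensated (Kahan) summation. A sketch in double precision, not std.math code:

    import std.stdio;

    // Naive summation: once the running total dwarfs the terms, their
    // contribution is rounded away entirely.
    double naiveSum(const double[] xs)
    {
        double s = 0.0;
        foreach (x; xs) s += x;
        return s;
    }

    // Kahan (compensated) summation: same precision, but the low-order bits
    // lost in each addition are carried forward instead of discarded.
    double kahanSum(const double[] xs)
    {
        double s = 0.0, c = 0.0;
        foreach (x; xs)
        {
            immutable y = x - c;
            immutable t = s + y;
            c = (t - s) - y;   // the part of y that the addition dropped
            s = t;
        }
        return s;
    }

    void main()
    {
        // One huge term followed by a million small ones.
        auto xs = new double[](1_000_001);
        xs[0] = 1e16;
        xs[1 .. $] = 1.0;
        writefln("naive: %.1f", naiveSum(xs)); // 10000000000000000.0 - the small terms vanish
        writefln("kahan: %.1f", kahanSum(xs)); // within an ulp or two of 10000000001000000
    }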

June 30, 2014 Re: std.math performance (SSE vs. real)

Posted in reply to Walter Bright

On 28 June 2014 16:16, Walter Bright via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On 6/27/2014 10:18 PM, Walter Bright wrote:
>>
>> On 6/27/2014 4:10 AM, John Colvin wrote:
>>>
>>> *The number of algorithms that are both numerically stable/correct and
>>> benefit
>>> significantly from > 64bit doubles is very small.
>>
>>
>> To be blunt, baloney. I ran into these problems ALL THE TIME when doing professional numerical work.
>>
>
> Sorry for being so abrupt. FP is important to me - it's not just about performance, it's also about accuracy.
Well, here's the thing then. Consider that 'real' is actually supported on only a single (long deprecated!) architecture.
I think it's reasonable to see that 'real' is not actually an fp type.
It's more like an auxiliary type, which just happens to be supported
via a completely different (legacy) set of registers on x64 (most
arch's don't support it at all).
In x64's case, it has been deprecated for over a decade now, and may be
removed from the hardware at some unknown time. The moment x64
processors decide to stop supporting 32bit code, the x87 will go away,
and those opcodes will likely be emulated or microcoded.
Interacting real<->float/double means register swapping through
memory. It should be treated the same as float<->simd; they are
distinct (on most arch's).
For my money, x87 can only be considered, at best, a coprocessor (a slow one!), which may or may not exist. Software written today (10+ years after the hardware was deprecated) should probably even consider introducing runtime checks to see if the hardware is even present before making use of it.
It's fine to offer a great precise extended precision library, but I don't think it can be _the_ standard math library which is used by everyone in virtually all applications. It's not a defined part of the architecture, it's slow, and it will probably go away in the future.
It's the same situation with SIMD; on x64, the SIMD unit and the FPU are the same unit, but I don't think it's reasonable to design all the API's around that assumption. Most processors separate the SIMD unit from the FPU, and the language decisions reflect that. We can't make the language treat SIMD just like an FPU extensions on account of just one single architecture... although in that case, the argument would be even more compelling since x64 is actually current and active.

June 30, 2014 Re: std.math performance (SSE vs. real)

Posted in reply to Manu

On 6/29/2014 8:22 PM, Manu via Digitalmars-d wrote:
> Well, here's the thing then. Consider that 'real' is only actually supported on only a single (long deprecated!) architecture.

It's news to me that x86, x86-64, etc., are deprecated, despite being used to run pretty much all desktops and laptops and even servers. The 80 bit reals are also part of the C ABI for Linux, OSX, and FreeBSD, 32 and 64 bit.

> I think it's reasonable to see that 'real' is not actually an fp type.

I find that a bizarre statement.

> It's more like an auxiliary type, which just happens to be supported via a completely different (legacy) set of registers on x64 (most arch's don't support it at all).

The SIMD registers are also a "completely different set of registers".

> In x64's case, it is deprecated for over a decade now, and may be removed from the hardware at some unknown time. The moment that x64 processors decide to stop supporting 32bit code, the x87 will go away, and those opcodes will likely be emulated or microcoded. Interacting real<->float/double means register swapping through memory. It should be treated the same as float<->simd; they are distinct (on most arch's).

Since they are part of the 64 bit C ABI, that would seem to be in the category of "nevah hoppen".

> It's the same situation with SIMD; on x64, the SIMD unit and the FPU are the same unit, but I don't think it's reasonable to design all the API's around that assumption. Most processors separate the SIMD unit from the FPU, and the language decisions reflect that. We can't make the language treat SIMD just like an FPU extensions on account of just one single architecture... although in that case, the argument would be even more compelling since x64 is actually current and active.

Intel has yet to remove any SIMD instructions.

June 30, 2014 Re: std.math performance (SSE vs. real)

Posted in reply to deadalnix

On 29 June 2014 10:11, deadalnix via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On Saturday, 28 June 2014 at 09:07:17 UTC, John Colvin wrote:
>>
>> On Saturday, 28 June 2014 at 06:16:51 UTC, Walter Bright wrote:
>>>
>>> On 6/27/2014 10:18 PM, Walter Bright wrote:
>>>>
>>>> On 6/27/2014 4:10 AM, John Colvin wrote:
>>>>>
>>>>> *The number of algorithms that are both numerically stable/correct and
>>>>> benefit
>>>>> significantly from > 64bit doubles is very small.
>>>>
>>>>
>>>> To be blunt, baloney. I ran into these problems ALL THE TIME when doing professional numerical work.
>>>>
>>>
>>> Sorry for being so abrupt. FP is important to me - it's not just about performance, it's also about accuracy.
>>
>>
>> I still maintain that the need for the precision of 80bit reals is a niche demand. Its a very important niche, but it doesn't justify having its relatively extreme requirements be the default. Someone writing a matrix inversion has only themselves to blame if they don't know plenty of numerical analysis and look very carefully at the specifications of all operations they are using.
>>
>> Paying the cost of moving to/from the fpu, missing out on increasingly large SIMD units, these make everyone pay the price.
>>
>> inclusion of the 'real' type in D was a great idea, but std.math should be overloaded for float/double/real so people have the choice where they stand on the performance/precision front.
>
>
> Would it make sense to have std.math and std.fastmath, or something along these lines?
I've thought this too.
std.math and std.numeric maybe?
To me, 'fastmath' suggests comfort with approximations/estimates or
other techniques in favour of speed, and I don't think the non-'real'
version should presume that.
It's not that we have a 'normal' one and a 'fast' one. What we have is
a 'slow' one, and the other is merely normal; ie, "std.math".
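
The overload idea John Colvin describes in the quoted text amounts to something like the sketch below. The names and bodies are illustrative only, not std.math's actual implementation - the point is simply that the argument type selects the precision, so float and double callers never get routed through the extended-precision path:

    import std.stdio;

    float truncate(float x)
    {
        import core.stdc.math : truncf;
        return truncf(x);
    }

    double truncate(double x)
    {
        import core.stdc.math : trunc;
        return trunc(x);
    }

    real truncate(real x)
    {
        import core.stdc.math : truncl;
        return truncl(x);
    }

    void main()
    {
        // Each call resolves to the overload matching its argument type.
        writeln(truncate(3.7f)); // float path
        writeln(truncate(3.7));  // double path
        writeln(truncate(3.7L)); // real (extended) path
    }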

June 30, 2014 Re: std.math performance (SSE vs. real)

Posted in reply to Walter Bright

On 30 June 2014 14:15, Walter Bright via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On 6/29/2014 8:22 PM, Manu via Digitalmars-d wrote:
>> Well, here's the thing then. Consider that 'real' is only actually supported on only a single (long deprecated!) architecture.
>
> It's news to me that x86, x86-64, etc., are deprecated, despite being used to run pretty much all desktops and laptops and even servers. The 80 bit reals are also part of the C ABI for Linux, OSX, and FreeBSD, 32 and 64 bit.

x86_64 and x86 are different architectures, and they have very different ABI's. Nobody is manufacturing x86 (exclusive) cpu's. Current x86_64 cpu's maintain a backwards compatibility mode, but that's not a part of the x86-64 spec, and may go away when x86_64 is deemed sufficiently pervasive and x86 sufficiently redundant.

>> I think it's reasonable to see that 'real' is not actually an fp type.
>
> I find that a bizarre statement.

Well, it's not an fp type as implemented by the standard fp architecture of any cpu except x86, which is becoming less relevant with each passing day.

>> It's more like an auxiliary type, which just happens to be supported via a completely different (legacy) set of registers on x64 (most arch's don't support it at all).
>
> The SIMD registers are also a "completely different set of registers".

Correct, so they are deliberately treated separately. I argued for strong separation between simd and float, and you agreed.

>> In x64's case, it is deprecated for over a decade now, and may be removed from the hardware at some unknown time. The moment that x64 processors decide to stop supporting 32bit code, the x87 will go away, and those opcodes will likely be emulated or microcoded. Interacting real<->float/double means register swapping through memory. It should be treated the same as float<->simd; they are distinct (on most arch's).
>
> Since they are part of the 64 bit C ABI, that would seem to be in the category of "nevah hoppen".

Not in windows. You say they are in linux? I don't know.

"Intel started discouraging the use of x87 with the introduction of the P4 in late 2000. AMD deprecated x87 since the K8 in 2003, as x86-64 is defined with SSE2 support; VIA’s C7 has supported SSE2 since 2005. In 64-bit versions of Windows, x87 is deprecated for user-mode, and prohibited entirely in kernel-mode."

How do you distinguish x87 double and xmm double in C? The only way I know to access x87 is with inline asm.

>> It's the same situation with SIMD; on x64, the SIMD unit and the FPU are the same unit, but I don't think it's reasonable to design all the API's around that assumption. Most processors separate the SIMD unit from the FPU, and the language decisions reflect that. We can't make the language treat SIMD just like an FPU extensions on account of just one single architecture... although in that case, the argument would be even more compelling since x64 is actually current and active.
>
> Intel has yet to remove any SIMD instructions.

Huh? I think you misunderstood my point. I'm saying that fpu/simd units are distinct, and they are distanced by the type system in order to respect that separation.