June 27, 2014
On Friday, 27 June 2014 at 13:50:29 UTC, Iain Buclaw via Digitalmars-d wrote:
> On 27 June 2014 14:24, Element 126 via Digitalmars-d
> <digitalmars-d@puremagic.com> wrote:
>> On 06/27/2014 03:04 PM, dennis luehring wrote:
>>>
>>> Am 27.06.2014 14:20, schrieb Russel Winder via Digitalmars-d:
>>>>
>>>> On Fri, 2014-06-27 at 11:10 +0000, John Colvin via Digitalmars-d wrote:
>>>> [ ]
>>>>>
>>>>> I understand why the current situation exists. In 2000 x87 was
>>>>> the standard and the 80bit precision came for free.
>>>>
>>>>
>>>> Real programmers have been using 128-bit floating point for decades. All
>>>> this namby-pamby 80-bit stuff is just an aberration and should never
>>>> have happened.
>>>
>>>
>>> what consumer hardware and compiler supports 128-bit floating points?
>>>
>>
>> I noticed that std.math mentions partial support for big endian non-IEEE
>> doubledouble. I first thought that it was a software implementation like the
>> QD library [1][2][3], but I could not find how to use it on x86_64.
>> It looks like it is only available for the PowerPC architecture.
>> Does anyone know about it?
>>
>
> We only support native types in std.math.  And partial support is
> saying more than there actually is. :-)

The doubledouble type is available for PowerPC. In fact, I am trying to use it for my PowerPC64 port of LDC. The partial support here is a bit annoying, but I have not found the time to implement the missing functions myself.

It is "native" in the sense that it is a supported type by gcc and xlc.

Regards,
Kai
June 27, 2014
On Friday, 27 June 2014 at 14:50:14 UTC, Kai Nacke wrote:
> The doubledouble type is available for PowerPC. In fact, I am trying to use it for my PowerPC64 port of LDC. The partial support here is a bit annoying, but I have not found the time to implement the missing functions myself.
>
> It is "native" in the sense that it is a supported type by gcc and xlc.

Doesn't SSE2 effectively operate on double doubles too with instructions like addpd (and others *pd)?
June 27, 2014
I think we should make real==double on x86-64, like on other architectures, because double is the way to go.
June 27, 2014
On 06/27/2014 08:19 PM, Kagamin wrote:
> On Friday, 27 June 2014 at 14:50:14 UTC, Kai Nacke wrote:
>> The doubledouble type is available for PowerPC. In fact, I am trying to
>> use it for my PowerPC64 port of LDC. The partial support here is a bit
>> annoying, but I have not found the time to implement the missing
>> functions myself.
>>
>> It is "native" in the sense that it is a supported type by gcc and xlc.
>
> Doesn't SSE2 effectively operate on double doubles too with instructions
> like addpd (and others *pd)?

I'm anything but an assembly guru (so please correct me if I'm wrong), but if my understanding is right, SSE2 only operates element-wise (at least for the operations you are mentioning).
For instance, if you operate on two "double2" vectors (in pseudo-code):
  c[] = a[] # b[]
where # is a supported binary operation, then the value of the first element of c only depends on the first elements of a and b.
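
Concretely, in D (a minimal sketch, assuming core.simd's double2 is available for your compiler and target; the example is mine, not from any documentation):

  import core.simd;

  void elementwise()
  {
      double2 a = [1.0, 2.0];
      double2 b = [10.0, 20.0];
      // This maps to addpd: each lane is an independent double addition,
      // so c == [11.0, 22.0]. No bits carry from one lane into the other;
      // you get two separate doubles, not one 128-bit value.
      double2 c = a + b;
  }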

The idea of double-double is that you operate on two doubles in such a way that, if you "concatenate" the mantissas of both, you effectively obtain something very close to the semantics of a quadruple precision floating point number, with a much higher number of significant digits (~31 vs ~16 for double, in base 10).

I am not 100% sure yet, but I think that the idea is to simulate a floating point number with a 106 bit mantissa and an 11 bit exponent (the same exponent range as a plain double) as
  x = s * ( m1 + m2 * 2^(-53) ) * 2^(e-b)
    = s * m1 * 2^(e-b) + s * m2 * 2^(e-b-53)
where s is the sign bit (the same for both doubles), m1 and m2 the mantissas (including the implied 1 for normalized numbers), e the base-2 exponent, b the common bias, and 53 an extra bias for the low-order bits (I'm ignoring denormalized numbers and the special values). The mantissa m1 of the first double gives the first 53 significant bits, and that of the second (m2) the next 53 bits.

The addition is quite straightforward, but it gets tricky when implementing the other operations. The articles I mentioned in my previous post describe these operations for "quadruple-doubles", achieving ~62 digits of precision (implemented in the QD library, but there is also a CUDA implementation). It is completely overkill for most applications, but it can be useful for studying the convergence of numerical algorithms, and double-doubles can provide the extra precision needed in some simulations (or to compare the results with double precision).
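
To make the "straightforward" part concrete, here is a rough D sketch of the basic building blocks (my own naming, loosely following the QD papers; it assumes strict IEEE double arithmetic, i.e. SSE rather than x87 with its 80-bit intermediates):

  struct DoubleDouble
  {
      double hi;   // leading 53 significant bits
      double lo;   // trailing bits, |lo| <= 0.5 ulp(hi)
  }

  // Knuth's two-sum: after the call, s + err == a + b exactly.
  void twoSum(double a, double b, out double s, out double err)
  {
      s = a + b;
      double v = s - a;
      err = (a - (s - v)) + (b - v);
  }

  // The simple ("sloppy") double-double addition; the fully accurate
  // variant in the papers does a little more work on the low parts.
  DoubleDouble ddAdd(DoubleDouble a, DoubleDouble b)
  {
      double s, e;
      twoSum(a.hi, b.hi, s, e);
      e += a.lo + b.lo;
      double hi, lo;
      twoSum(s, e, hi, lo);   // renormalize: lo becomes a small correction again
      return DoubleDouble(hi, lo);
  }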

It is also a comparatively fast alternative to arbitrary-precision floating-point libraries like GMP/MPFR, since it does not emulate every single digit in software, but instead takes advantage of the native double precision instructions. The downside is that you cannot get more significant bits than n*53, which makes it unsuitable for computing the decimals of pi, for instance.

To give you more details, I will need to study these papers more thoroughly. I am actually considering bringing double-double and quad-double software support to D, either by writing a binding to QD, porting it, or starting from scratch based on the papers. I don't know if it will succeed, but it will be an interesting exercise anyway. I don't have a lot of time right now, but I will try to start working on it in a few weeks. I'd really like to be able to use it from D: having to rewrite an algorithm in C++, when I could otherwise just change one template argument in main(), can be quite painful :-)
June 28, 2014
On 6/27/2014 3:50 AM, Manu via Digitalmars-d wrote:
> Totally agree.
> Maintaining commitment to deprecated hardware which could be removed
> from the silicon at any time is a bit of a problem looking forward.
> Regardless of the decision about whether overloads are created, at
> very least, I'd suggest x64 should define real as double, since the
> x87 is deprecated, and x64 ABI uses the SSE unit. It makes no sense at
> all to use real under any general circumstances in x64 builds.
>
> And aside from that, if you *think* you need real for precision, the
> truth is, you probably have bigger problems.
> Double already has massive precision. I find it's extremely rare to
> have precision problems even with float under most normal usage
> circumstances, assuming you are conscious of the relative magnitudes
> of your terms.

That's a common perception of people who do not use the floating point unit for numerical work, and whose main concern is speed instead of accuracy.

I've done numerical floating point work. Two common cases where such precision matters:

1. numerical integration
2. inverting matrices

It's amazing how quickly precision gets overwhelmed and you get garbage answers. For example, when inverting a matrix with doubles, the results are garbage for matrices larger than about 14*14. There are techniques for dealing with this, but they are complex and difficult to implement.

Increasing the precision is the most straightforward way to deal with it.
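
A trivial illustration of how fast it goes wrong (a made-up example, not taken from real work): naive left-endpoint integration of 1/x over [1, 2], whose exact value is ln 2 = 0.6931471805...:

  import std.stdio;

  T integrate(T)(int n)
  {
      T h = cast(T) 1 / n;
      T sum = 0;
      foreach (i; 0 .. n)
          sum += h / (1 + i * h);   // left-endpoint rectangle rule for 1/x
      return sum;
  }

  void main()
  {
      enum n = 10_000_000;
      writefln("float : %.10f", integrate!float(n));
      writefln("double: %.10f", integrate!double(n));
      writefln("real  : %.10f", integrate!real(n));
  }

With n that large, the float result drifts visibly from ln 2 long before the rectangle rule's own error matters, while double still looks fine. The same thing happens to double, just further out - which is exactly when the extra bits of real start paying for themselves.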

Note that the 80 bit precision comes from W.F. Kahan, and he's no fool when dealing with these issues.

Another boring Boeing anecdote: calculators have around 10 digits of precision. A colleague of mine was doing a multi-step calculation and rounded each step to 2 decimal places. I told him he needed to keep the full 10 digits. He ridiculed me - but his final answer was off by a factor of 2. He could not understand why, and I'd explain, but he never could grasp why keeping only 2 places past the decimal point did not work.

Do you think engineers like that will ever understand the problems with double precision, or have the remotest idea how to deal with them beyond increasing the precision? I don't.


> I find it's extremely rare to have precision problems even with float under most normal usage
> circumstances,

Then you aren't doing numerical work, because it happens right away.
June 28, 2014
On 6/27/2014 4:10 AM, John Colvin wrote:
> *The number of algorithms that are both numerically stable/correct and benefit
> significantly from > 64bit doubles is very small.

To be blunt, baloney. I ran into these problems ALL THE TIME when doing professional numerical work.

June 28, 2014
On 6/27/2014 11:47 AM, Kagamin wrote:
> I think we should make real==double on x86-64, like on other architectures,
> because double is the way to go.

No.

Consider also that on non-Windows platforms, such a decision would shut D out from accessing C code written using long doubles.

BTW, there's a reason Fortran is still king for numerical work - that's because C compiler devs typically do not understand floating point math and provide crappy imprecise math functions. I had an argument with a physics computation prof a year back who was gobsmacked when I told him the FreeBSD 80 bit math functions were only accurate to 64 bits. He told me he didn't believe me, that C wouldn't make such mistakes. I suggested he test it and see for himself :-)

They can and do. The history of C, including the C Standard, shows a lack of knowledge of how to do numerical math. For example, it was years and years before the Standard mentioned what the math functions should do with infinity arguments.

Things have gotten better in recent years, but I'd always intended that D have proper support for fp right out of the gate, including fully accurate math functions. The reason D re-implements the math functions in Phobos rather than deferring to the C ones is the unreliability of the latter.
June 28, 2014
On 6/27/2014 10:18 PM, Walter Bright wrote:
> On 6/27/2014 4:10 AM, John Colvin wrote:
>> *The number of algorithms that are both numerically stable/correct and benefit
>> significantly from > 64bit doubles is very small.
>
> To be blunt, baloney. I ran into these problems ALL THE TIME when doing
> professional numerical work.
>

Sorry for being so abrupt. FP is important to me - it's not just about performance, it's also about accuracy.
June 28, 2014
On Saturday, 28 June 2014 at 06:16:51 UTC, Walter Bright wrote:
> On 6/27/2014 10:18 PM, Walter Bright wrote:
>> On 6/27/2014 4:10 AM, John Colvin wrote:
>>> *The number of algorithms that are both numerically stable/correct and benefit
>>> significantly from > 64bit doubles is very small.
>>
>> To be blunt, baloney. I ran into these problems ALL THE TIME when doing
>> professional numerical work.
>>
>
> Sorry for being so abrupt. FP is important to me - it's not just about performance, it's also about accuracy.

I still maintain that the need for the precision of 80bit reals is a niche demand. It's a very important niche, but it doesn't justify making its relatively extreme requirements the default. Someone writing a matrix inversion has only themselves to blame if they don't know plenty of numerical analysis and look very carefully at the specifications of all the operations they are using.

The cost of moving values to and from the x87 FPU, and of missing out on increasingly wide SIMD units, is paid by everyone.

The inclusion of the 'real' type in D was a great idea, but std.math should be overloaded for float/double/real so people can choose where they stand on the performance/precision front.
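
Something along these lines, sketched with a made-up name (hypot2 is not a Phobos function; std.math.sqrt happens to already have per-type overloads, which is exactly the pattern I'd like everywhere):

  import std.math : sqrt;

  float  hypot2(float a,  float b)  { return sqrt(a * a + b * b); } // fast, stays in SSE
  double hypot2(double a, double b) { return sqrt(a * a + b * b); } // the sensible default
  real   hypot2(real a,   real b)   { return sqrt(a * a + b * b); } // max precision, x87

  unittest
  {
      assert(hypot2(3.0f, 4.0f) == 5.0f);  // resolves to the float overload
      assert(hypot2(3.0,  4.0)  == 5.0);   // double
      assert(hypot2(3.0L, 4.0L) == 5.0L);  // real
  }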
June 28, 2014
On Saturday, 28 June 2014 at 09:07:17 UTC, John Colvin wrote:
> On Saturday, 28 June 2014 at 06:16:51 UTC, Walter Bright wrote:
>> On 6/27/2014 10:18 PM, Walter Bright wrote:
>>> On 6/27/2014 4:10 AM, John Colvin wrote:
>>>> *The number of algorithms that are both numerically stable/correct and benefit
>>>> significantly from > 64bit doubles is very small.
>>>
>>> To be blunt, baloney. I ran into these problems ALL THE TIME when doing
>>> professional numerical work.
>>>
>> Sorry for being so abrupt. FP is important to me - it's not just about performance, it's also about accuracy.

When you need accuracy, 999 times out of 1000 you change the numerical technique; you don't just blindly upgrade the precision.
The only real reason to use 80 bits is when there is an actual need to add values which differ by more than 16 orders of magnitude, and I've never seen that happen in any numerical paper I've read.
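
To give one concrete example of what I mean by changing the technique (my own sketch, nothing more): compensated summation recovers most of the lost low-order bits in plain double precision, no 80-bit hardware required.

  // Kahan (compensated) summation in plain doubles.
  // Assumes no fast-math style reassociation that would simplify the
  // compensation term away.
  double kahanSum(const double[] xs)
  {
      double sum = 0;
      double c = 0;              // running compensation for lost low-order bits
      foreach (x; xs)
      {
          double y = x - c;
          double t = sum + y;    // low-order bits of y are lost here...
          c = (t - sum) - y;     // ...and captured here for the next round
          sum = t;
      }
      return sum;
  }

Tricks like this, together with pivoting, scaling, orthogonal transformations and so on, are what actually buy you digits; 80 bits only delays the point where you need them.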

> I still maintain that the need for the precision of 80bit reals is a niche demand. It's a very important niche, but it doesn't justify making its relatively extreme requirements the default. Someone writing a matrix inversion has only themselves to blame if they don't know plenty of numerical analysis and look very carefully at the specifications of all the operations they are using.

Couldn't agree more. 80 bit IS a niche: really nice to have, but it shouldn't be the default if we lose performance for it.

> The cost of moving values to and from the x87 FPU, and of missing out on increasingly wide SIMD units, is paid by everyone.

The numerical analysts themselves, especially, will pay that price. 64 bit HAS to be as fast as possible if you want to be competitive in any kind of numerical work.