std.math performance (SSE vs. real) (page 4) - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » General » std.math performance (SSE vs. real) (page 4)

June 28, 2014

Re: std.math performance (SSE vs. real)

Posted by Russel Winder
in reply to John Colvin

Russel Winder

Posted in reply to John Colvin

Attachments:

signature.asc (This is a digitally signed message part)

On Sat, 2014-06-28 at 09:07 +0000, John Colvin via Digitalmars-d wrote: […]
> I still maintain that the need for the precision of 80bit reals is a niche demand. Its a very important niche, but it doesn't justify having its relatively extreme requirements be the default. Someone writing a matrix inversion has only themselves to blame if they don't know plenty of numerical analysis and look very carefully at the specifications of all operations they are using.

I fear the whole argument is getting misguided. We should reset.

If you are doing numerical calculations then accuracy is critical. Arbitrary precision floats are the only real (!) way of doing any numeric non-integer calculation, and arbitrary precision integers are the only way of doing integer calculations.

However speed is also an issue, so to obtain speed we have hardware integer and floating point ALUs.

The cost for the integer ALU is bounded integers. Python appreciates this and uses hardware integers when it can and software integers otherwise. Thus Python is very good for doing integer work. C, C++, Go, D, Fortran, etc. are fundamentally crap for integer calculation because integers are bounded. Of course if calculations are prvably within the hardware integer bounds this is not a constraint and we are happy with hardware integers. Just don't try calculating factorial, Fibonacci numbers and other numbers used in some bioinformatics and quant models. There is a reason why SciPy has a massive following in bioinformatics and quant comuting.

The cost for floating point ALU is accuracy. Hardware floating point numbers are dreadful in that sense, but again the issue is speed and for GPU they went 32-bit for speed. Now they are going 64-bit as they can just about get the same speed and the accuracy is so much greater. For hardware floating point the more bits you have the better. Hence IBM in the 360 and later having 128-bit floating point for accuracy at the expense of some speed. Sun had 128-bit in the SPARC processors for accuracy at the expense of a little speed.

As Walter has or will tell us, C (and thus C++) got things woefully wrong in support of numerical work because the inventors were focused on writing operating systems, supporting only PDP hardware. They and the folks that then wrote various algorithms didn't really get numerical analysis. If C had targeted IBM 360 from the outset things might have been better.

We have to be clear on this: Fortran is the only language that supports hardware floating types even at all well.

Intel's 80-bit floating point were an aberration, they should just have done 128-bit in the first place. OK so they got the 80-bit stuff as a sort of free side-effect of creating 64-bit, but they ran with.  They shouldn't have done. I cannot see it ever happening again. cf. ARM.

By being focused on Intel chips, D has failed to get floating point correct in avery analogous way to C failing to get floating point types right by focusing on PDP. Yes using 80-bit on Intel is good, but no-one else has this. Floating point sizes should be 32-, 64-, 128-, 256-bit, etc. D needs to be able to handle this. So does C, C++, Java, etc. Go will be able to handle it when it is ported to appropriate hardware as they use float32, float64, etc. as their types. None of this float, double, long double, double double rubbish.

So D should perhaps make a breaking change and have types int32, int64, float32, float64, float80, and get away from the vagaries of bizarre type relationships with hardware?

-- 
Russel. ============================================================================= Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder@ekiga.net 41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel@winder.org.uk London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder

June 28, 2014

Re: std.math performance (SSE vs. real)

Posted by Walter Bright
in reply to francesco cattoglio

Walter Bright

Posted in reply to francesco cattoglio

On 6/28/2014 2:47 AM, francesco cattoglio wrote:
> When you need accuracy, 999 times out of 1000 you change the numerical
> technique, you don't just blindly upgrade the precision.

I have experience doing numerical work? Upgrading the precision is the first thing people try.

> The only real reason one would use 80 bits is when there is an actual need of
> adding values which differ for more than 16 orders of magnitude. And I've never
> seen this happen in any numerical paper I've read.

It happens with both numerical integration and inverting matrices. Inverting matrices is commonplace for solving N equations with N unknowns.

Errors accumulate very rapidly and easily overwhelm the significance of the answer.

> Especially the numerical analysts themselves will pay that price. 64 bit HAS to
> be as fast as possible, if you want to be competitive when it comes to any kind
> of numerical work.

Getting the wrong answer quickly is not useful when you're calculating the stress levels in a part.

Again, I've done numerical programming in airframe design. The correct answer is what matters. You can accept wrong answers in graphics display algorithms, but not when designing critical parts.

June 28, 2014

Re: std.math performance (SSE vs. real)

Posted by Russel Winder
in reply to Walter Bright

Russel Winder

Posted in reply to Walter Bright

Attachments:

signature.asc (This is a digitally signed message part)

On Sat, 2014-06-28 at 03:42 -0700, Walter Bright via Digitalmars-d wrote:
> On 6/28/2014 2:47 AM, francesco cattoglio wrote:
> > When you need accuracy, 999 times out of 1000 you change the numerical technique, you don't just blindly upgrade the precision.
> 
> I have experience doing numerical work? Upgrading the precision is the first thing people try.

Nonetheless, algorithm and expression of algorithm are often more important. As proven by my Pi_Quadrature examples you can appear to have better results with greater precision, but actually the way the code operates is actually the core problem: the code I have written does not do things in the best way to achieve the best result as a given accuracy level..

[…]
> 
> Errors accumulate very rapidly and easily overwhelm the significance of the answer.

I wonder if programmers should only be allowed to use floating point number sin their code if they have studied numerical analysis?
> 
> > Especially the numerical analysts themselves will pay that price. 64 bit HAS to be as fast as possible, if you want to be competitive when it comes to any kind of numerical work.
> 
> Getting the wrong answer quickly is not useful when you're calculating the stress levels in a part.

[…]
> Again, I've done numerical programming in airframe design. The correct answer is what matters. You can accept wrong answers in graphics display algorithms, but not when designing critical parts.

Or indeed when calculating anything to do with money.

-- 
Russel. ============================================================================= Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder@ekiga.net 41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel@winder.org.uk London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder

June 28, 2014

Re: std.math performance (SSE vs. real)

Posted by Russel Winder
in reply to dennis luehring

Russel Winder

Posted in reply to dennis luehring

Attachments:

signature.asc (This is a digitally signed message part)

On Fri, 2014-06-27 at 15:04 +0200, dennis luehring via Digitalmars-d wrote:
> Am 27.06.2014 14:20, schrieb Russel Winder via Digitalmars-d:
> > On Fri, 2014-06-27 at 11:10 +0000, John Colvin via Digitalmars-d wrote: [âŠ]
> >> I understand why the current situation exists. In 2000 x87 was the standard and the 80bit precision came for free.
> >
> > Real programmers have been using 128-bit floating point for decades. All this namby-pamby 80-bit stuff is just an aberration and should never have happened.
> 
> what consumer hardware and compiler supports 128-bit floating points?

None but what has that do do with the core problem being debated?

The core problem here is that no programming language has a proper type system able to deal with hardware. C has a hack, Fortran as a less problematic hack. Go insists on float32, float64, etc. which is better but still not great.

-- 
Russel. ============================================================================= Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder@ekiga.net 41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel@winder.org.uk London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder

June 28, 2014

Re: std.math performance (SSE vs. real)

Posted by Russel Winder
in reply to John Colvin

Russel Winder

Posted in reply to John Colvin

Attachments:

signature.asc (This is a digitally signed message part)

On Fri, 2014-06-27 at 13:11 +0000, John Colvin via Digitalmars-d wrote:
> On Friday, 27 June 2014 at 13:04:31 UTC, dennis luehring wrote:
> > Am 27.06.2014 14:20, schrieb Russel Winder via Digitalmars-d:
> >> On Fri, 2014-06-27 at 11:10 +0000, John Colvin via
> >> Digitalmars-d wrote:
> >> [âŠ]
> >>> I understand why the current situation exists. In 2000 x87 was the standard and the 80bit precision came for free.
> >>
> >> Real programmers have been using 128-bit floating point for
> >> decades. All
> >> this namby-pamby 80-bit stuff is just an aberration and should
> >> never
> >> have happened.
> >
> > what consumer hardware and compiler supports 128-bit floating points?
> 
> I think he was joking :)

Actually no, but…

> No consumer hardware supports IEEE binary128 as far as I know. Wikipedia suggests that Sparc used to have some support.

For once Wikipedia is not wrong. IBM 128-bit is not IEEE compliant (but pre-dates IEEE standards). SPARC is IEEE compliant. No other hardware manufacturer appears to care about accuracy of floating point expression evaluation. GPU manufacturers have an excuse of sorts in that speed is more important than accuracy for graphics model evaluation. GPGPU suffers because of this.

-- 
Russel. ============================================================================= Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder@ekiga.net 41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel@winder.org.uk London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder

June 28, 2014

Re: std.math performance (SSE vs. real)

Posted by francesco cattoglio
in reply to Walter Bright

francesco cattoglio

Posted in reply to Walter Bright

On Saturday, 28 June 2014 at 10:42:19 UTC, Walter Bright wrote:
> On 6/28/2014 2:47 AM, francesco cattoglio wrote:
> I have experience doing numerical work? Upgrading the precision is the first thing people try.
>

Brute force is always the first thing people try :o)

> It happens with both numerical integration and inverting matrices. Inverting matrices is commonplace for solving N equations with N unknowns.
> Errors accumulate very rapidly and easily overwhelm the significance of the answer.

And that's exactly the reason you change approach instead of getting greater precision: the "adding precision" approach scales horribly, at least in my field of study, which is solving numerical PDEs.
(BTW: no sane person inverts matrices)

> Getting the wrong answer quickly is not useful when you're calculating the stress levels in a part.

We are talking about paying a price when you don't need it. With the correct approach, solving numerical problems with double precision floats yelds perfectly fine results. And it is, in fact, commonplace.

Again, I've not read yet a research paper in which it was clearly stated that 64bit floats were not good enough for solving a whole class of PDE problem. I'm not saying that real is useless, quite the opposite: I love the idea of having an extra tool when the need arises. I think the focus should be about not paying a price for what you don't use

June 28, 2014

Re: std.math performance (SSE vs. real)

Posted by Iain Buclaw
in reply to hane

Iain Buclaw

Posted in reply to hane

On 27 June 2014 10:37, hane via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On Friday, 27 June 2014 at 06:48:44 UTC, Iain Buclaw via Digitalmars-d wrote:
>>
>> On 27 June 2014 07:14, Iain Buclaw <ibuclaw@gdcproject.org> wrote:
>>>
>>> On 27 June 2014 02:31, David Nadlinger via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> right now, the use of std.math over core.stdc.math can cause a huge
>>>> performance problem in typical floating point graphics code. An instance
>>>> of
>>>> this has recently been discussed here in the "Perlin noise benchmark
>>>> speed"
>>>> thread [1], where even LDC, which already beat DMD by a factor of two,
>>>> generated code more than twice as slow as that by Clang and GCC. Here,
>>>> the
>>>> use of floor() causes trouble. [2]
>>>>
>>>> Besides the somewhat slow pure D implementations in std.math, the
>>>> biggest
>>>> problem is the fact that std.math almost exclusively uses reals in its
>>>> API.
>>>> When working with single- or double-precision floating point numbers,
>>>> this
>>>> is not only more data to shuffle around than necessary, but on x86_64
>>>> requires the caller to transfer the arguments from the SSE registers
>>>> onto
>>>> the x87 stack and then convert the result back again. Needless to say,
>>>> this
>>>> is a serious performance hazard. In fact, this accounts for an 1.9x
>>>> slowdown
>>>> in the above benchmark with LDC.
>>>>
>>>> Because of this, I propose to add float and double overloads (at the
>>>> very
>>>> least the double ones) for all of the commonly used functions in
>>>> std.math.
>>>> This is unlikely to break much code, but:
>>>>  a) Somebody could rely on the fact that the calls effectively widen the
>>>> calculation to 80 bits on x86 when using type deduction.
>>>>  b) Additional overloads make e.g. "&floor" ambiguous without context,
>>>> of
>>>> course.
>>>>
>>>> What do you think?
>>>>
>>>> Cheers,
>>>> David
>>>>
>>>
>>> This is the reason why floor is slow, it has an array copy operation.
>>>
>>> ---
>>>   auto vu = *cast(ushort[real.sizeof/2]*)(&x);
>>> ---
>>>
>>> I didn't like it at the time I wrote, but at least it prevented the compiler (gdc) from removing all bit operations that followed.
>>>
>>> If there is an alternative to the above, then I'd imagine that would speed up floor by tenfold.
>>>
>>
>> Can you test with this?
>>
>> https://github.com/D-Programming-Language/phobos/pull/2274
>>
>> Float and Double implementations of floor/ceil are trivial and I can add later.
>
>
> Nice! I tested with the Perlin noise benchmark, and it got faster(in my
> environment, 1.030s -> 0.848s).
> But floor still consumes almost half of the execution time.


I've done some further improvements in that PR.  I'd imagine you'd see a little more juice squeezed out.

June 28, 2014

Re: std.math performance (SSE vs. real)

Posted by John Colvin
in reply to Russel Winder

John Colvin

Posted in reply to Russel Winder

On Saturday, 28 June 2014 at 10:34:00 UTC, Russel Winder via Digitalmars-d wrote:

> So D should perhaps make a breaking change and have types int32, int64,
> float32, float64, float80, and get away from the vagaries of bizarre
> type relationships with hardware?

`real`* is the only builtin numerical type in D that doesn't have a defined width. http://dlang.org/type.html

*well I guess there's size_t and ptrdiff_t, but they aren't distinct types in their own right.

June 28, 2014

Re: std.math performance (SSE vs. real)

Posted by Element 126
in reply to Russel Winder

Element 126

Posted in reply to Russel Winder

On 06/28/2014 12:33 PM, Russel Winder via Digitalmars-d wrote:
> On Sat, 2014-06-28 at 09:07 +0000, John Colvin via Digitalmars-d wrote:
> […]
>> I still maintain that the need for the precision of 80bit reals
>> is a niche demand. Its a very important niche, but it doesn't
>> justify having its relatively extreme requirements be the
>> default. Someone writing a matrix inversion has only themselves
>> to blame if they don't know plenty of numerical analysis and look
>> very carefully at the specifications of all operations they are
>> using.
>
> I fear the whole argument is getting misguided. We should reset.
>
> If you are doing numerical calculations then accuracy is critical.
> Arbitrary precision floats are the only real (!) way of doing any
> numeric non-integer calculation, and arbitrary precision integers are
> the only way of doing integer calculations.
>
> However speed is also an issue, so to obtain speed we have hardware
> integer and floating point ALUs.
>
> The cost for the integer ALU is bounded integers. Python appreciates
> this and uses hardware integers when it can and software integers
> otherwise. Thus Python is very good for doing integer work. C, C++, Go,
> D, Fortran, etc. are fundamentally crap for integer calculation because
> integers are bounded. Of course if calculations are prvably within the
> hardware integer bounds this is not a constraint and we are happy with
> hardware integers. Just don't try calculating factorial, Fibonacci
> numbers and other numbers used in some bioinformatics and quant models.
> There is a reason why SciPy has a massive following in bioinformatics
> and quant comuting.
>
> The cost for floating point ALU is accuracy. Hardware floating point
> numbers are dreadful in that sense, but again the issue is speed and for
> GPU they went 32-bit for speed. Now they are going 64-bit as they can
> just about get the same speed and the accuracy is so much greater. For
> hardware floating point the more bits you have the better. Hence IBM in
> the 360 and later having 128-bit floating point for accuracy at the
> expense of some speed. Sun had 128-bit in the SPARC processors for
> accuracy at the expense of a little speed.
>
> As Walter has or will tell us, C (and thus C++) got things woefully
> wrong in support of numerical work because the inventors were focused on
> writing operating systems, supporting only PDP hardware. They and the
> folks that then wrote various algorithms didn't really get numerical
> analysis. If C had targeted IBM 360 from the outset things might have
> been better.
>
> We have to be clear on this: Fortran is the only language that supports
> hardware floating types even at all well.
>
> Intel's 80-bit floating point were an aberration, they should just have
> done 128-bit in the first place. OK so they got the 80-bit stuff as a
> sort of free side-effect of creating 64-bit, but they ran with.  They
> shouldn't have done. I cannot see it ever happening again. cf. ARM.
>
> By being focused on Intel chips, D has failed to get floating point
> correct in avery analogous way to C failing to get floating point types
> right by focusing on PDP. Yes using 80-bit on Intel is good, but no-one
> else has this. Floating point sizes should be 32-, 64-, 128-, 256-bit,
> etc. D needs to be able to handle this. So does C, C++, Java, etc. Go
> will be able to handle it when it is ported to appropriate hardware as
> they use float32, float64, etc. as their types. None of this float,
> double, long double, double double rubbish.
>
> So D should perhaps make a breaking change and have types int32, int64,
> float32, float64, float80, and get away from the vagaries of bizarre
> type relationships with hardware?
>

+1 for float32 & cie. These names are much more explicit than the current ones. But I see two problems with it :

 - These names are already used in core.simd to denote vectors, and AVX 3 (which should appear in mainstream CPUs next year) will require to use float16, so the next revision might cause a collision. This could be avoided by using real32, real64... instead, but I prefer floatxx since it reminds us that we are not dealing with an exact real number.

 - These types are redundant, and people coming from C/C++ will likely use float and double instead. It's much too late to think of deprecating them since it would break backward compatibility (although it would be trivial to update the code with "DFix"... if someone is still maintaining the code).

A workaround would be to use a template which maps to the correct native type, iff it has the exact number of bits specified, or issues an error. Here is a quick mockup (does not support all types). I used "fp" instead of "float" or "real" to avoid name collisions with the current types.

template fp(uint n) {

        static if (n == 32) {
                alias fp = float;
        } else static if (n == 64) {
                alias fp = double;
        } else static if (n == 80) {
                static if (real.mant_dig == 64) {
                        alias fp = real;
                } else {
                        static assert(false, "No 80 bit floating point type supported on this architecture");
                }
        } else static if (n == 128) {
                alias fp = quadruple; // Or doubledouble on PPC. Add other static ifs if necessary.
        } else {
                import std.conv: to;
                static assert(false, "No "~to!string(n)~" bit floating point type.");
        }
}

void main() {

        fp!32 x = 3.1415926;
        assert(is(typeof(x) == float));

        fp!64 y = 3.141592653589793;
        assert(is(typeof(y) == double));

        fp!80 z = 3.14159265358979323846;
        assert(is(typeof(z) == real));

        /* Fails on x86_64, as it should, but the error message could be made more explicit.
         * Currently : "undefined identifier quadruple"
         * Should ideally be : "No native 128 bit floating-point type supported on x86_64 architecture."
         */
        /*
        fp!128 w = 3.14159265358979323846264338327950288;
        assert(is(typeof(w) == quadruple));
        */
}

June 28, 2014

Re: std.math performance (SSE vs. real)

Posted by Andrei Alexandrescu
in reply to Walter Bright

Andrei Alexandrescu

Posted in reply to Walter Bright

On 6/27/14, 11:16 PM, Walter Bright wrote:
> On 6/27/2014 10:18 PM, Walter Bright wrote:
>> On 6/27/2014 4:10 AM, John Colvin wrote:
>>> *The number of algorithms that are both numerically stable/correct
>>> and benefit
>>> significantly from > 64bit doubles is very small.
>>
>> To be blunt, baloney. I ran into these problems ALL THE TIME when doing
>> professional numerical work.
>>
>
> Sorry for being so abrupt. FP is important to me - it's not just about
> performance, it's also about accuracy.

The only problem is/would be when the language forces one choice over the other. Both options of maximum performance and maximum precision should be handily accessible to D users.

Andrei

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation