May 16, 2016
On 5/16/2016 3:30 AM, Andrei Alexandrescu wrote:
> On 5/16/16 3:31 AM, Walter Bright wrote:
>>
>> Ironically, this Microsoft article argues for greater precision for
>> intermediate calculations, although Microsoft ditched 80 bits:
>>
>> https://msdn.microsoft.com/en-us/library/aa289157(VS.71).aspx
>
> Do you have an explanation on why Microsoft ditched 80-bit floats? -- Andrei

At the time they were porting NT to another architecture that didn't have 80-bit floating point. I suspect they dumped it in order to enhance code compatibility between the x86 compiler and the other compiler. I think Microsoft tools had an internal mandate to make the differences between the machines as small as possible.

The lack of 80-bit SIMD support likely gave it a good shove off the curb as well. Even dmd no longer generates x87 code for float/double on 64-bit targets, meaning no more 80-bit temporaries.

It's kinda sad, really. The x87 was a triumph of engineering when it came out. The comments that "64 bits is all anyone will ever need" are not made by people who do numerical work, where one constantly battles catastrophic loss of precision.

I've often wondered how NASA calculates trajectories out to Jupiter, because it has to be done iteratively, and that means cumulative loss of precision. I wrote some orbit software for fun in college, and catastrophic loss of precision would make the orbits go completely bonkers after just one orbit. I invented several schemes to fix it, but nothing worked.

When I was doing numerical analysis for my you-know-what job, I had to invert matrices all the time. I only knew the textbook algorithm, and it would produce nonsense on anything larger than about 14x14 using doubles. Better algorithms exist that compensate for the errors, but this was pre-internet and I didn't know where to get such a solution.

Those two experiences shaped my attitudes about the value of precision, as well as the TTL boards I designed where the design had to still work if faster parts were swapped in.
May 16, 2016
On 5/16/2016 4:18 AM, Joseph Rushton Wakeling wrote:
> you keep saying "correctness" or "accuracy", when people are
> consistently talking to you about "consistency" ... :-)

I see consistently wrong answers as being as virtuous as getting tires for my car that are not round :-)


> I can always request more precision if I need or want it.  Getting different
> results for a superficially identical float * double calculation, because one
> was performed at compile time and another at runtime, is an inconsistency that
> it might be nicer to avoid.

As the links I posted explain in great detail, such a design produces other problems. Second, as I think this thread amply demonstrates, few programmers are particularly aware of the issues of cumulative roundoff error. It's very easy to miss it and just assume the result of the calculation is precise to 7 digits, when it might be off by factors of 2.


> The latter result, at least (AIUI) is consistent depending on whether the
> calculation is done at compile time or runtime.

True, but I was talking about intuitiveness. Integral promotion rules are not intuitive, they are a surprise to every C/C++/D newbie. Except for me, since I came from a PDP-11 assembler background, and that's how the PDP-11 CPU worked, and C's semantics are based on the 11.
May 16, 2016
On 5/16/2016 3:33 AM, Andrei Alexandrescu wrote:
> On 5/16/16 4:10 AM, Walter Bright wrote:
>> FP behavior has complex trade-offs with speed, accuracy, compatibility,
>> and size. There are no easy, obvious answers.
>
> That's a fair statement. My understanding is also that 80-bit math is on the
> wrong side of the tradeoff simply because it's disproportionately slow (again I
> cite http://nicolas.limare.net/pro/notes/2014/12/12_arit_speed/). All modern
> ALUs I looked at have 32- and 64-bit FP units only. I'm trying to figure why. --


I think it is slow because no effort has been put into speeding it up. All the effort went into SIMD. The x87 FPU is a library module that is just plopped onto the chip for compatibility.

The x87 register stack also sux because it's hard to generate good code for. That didn't help.
May 16, 2016
On 5/16/2016 3:26 AM, Joseph Rushton Wakeling wrote:
> If I've understood people's arguments right, the point of concern is that there
> are use cases where the programmer wants to be able to guarantee _a specific
> precision of their choice_.
>
> That strikes me as a legitimate use-case that it would be worth trying to support.

Yes, and I've proposed roundToFloat() and roundToDouble() intrinsics in this thread and the last two or three times this came up.

I also strongly feel that use of those intrinsics is a red flag that there's a problem with the algorithm. At least using them will make it obvious in the code that there's a bug in the algorithm :-)

Floats are chosen for speed and storage. Insisting that lower accuracy is desired as a default consequence just baffles me. It's like saying people choose sugary snacks because they want dental cavities.
May 16, 2016
On 5/16/2016 3:29 AM, Andrei Alexandrescu wrote:
> Aren't good algorithms helping dramatically with that?

Yup. But they are not textbook math algorithms, tend to be complex and strange, and are not very discoverable by regular programmers (I tried and failed).

Extended precision is a simple, straightforward fix to precision problems in straightforward code.

May 16, 2016
On 5/16/2016 4:35 AM, Andrei Alexandrescu wrote:
> This may be the best angle in this discussion. For all I can tell 80 bit is slow
> as molasses and on the road to getting slower. Isn't that enough of an argument
> to move away from it?

We are talking CTFE here, not runtime.

May 16, 2016
On Monday, 16 May 2016 at 10:25:33 UTC, Andrei Alexandrescu wrote:
> I'm not sure about this. My understanding is that all SSE has hardware for 32 and 64 bit floats, and the 80-bit hardware is pretty much cut-and-pasted from the x87 days without anyone really looking into improving it. And that's been the case for more than a decade. Is that correct?

Pretty much. On the OS side, Windows has officially deprecated x87 for the 64-bit version in desktop mode, and it's flat-out forbidden in kernel mode. All development focus from Intel has been on improving the SSE/AVX instruction set and pipeline.

And on the gamedev side, we generally go for fast over precise. Or, more to the point, an acceptable loss in precision. The C++ codegen spits out SSE/AVX code by default in our builds, and I hand-optimise certain functions that get inlined with the appropriate intrinsics. SIMD is an even more appropriate point to bring up here: gaming is trending towards more parallel operations, and operating on a single float at a time is not the way to get the best performance out of your system.

This is one of those things where I can see the point of the D compiler doing things its own way, but only when it expects to operate in a pure D environment. We have heavy interop between C++ and D. If simple functions can give different results at compile time than at runtime, with no way for me to configure the compiler on both sides, what actual benefit does that give me?
May 16, 2016
On 5/16/2016 3:27 AM, Andrei Alexandrescu wrote:
> I'm not sure about this. My understanding is that all SSE has hardware for 32
> and 64 bit floats, and the 80-bit hardware is pretty much cut-and-pasted
> from the x87 days without anyone really looking into improving it. And that's been
> the case for more than a decade. Is that correct?

I believe so.

> I'm looking for example at
> http://nicolas.limare.net/pro/notes/2014/12/12_arit_speed/ and see that on all
> Intel and compatible hardware, the speed of 80-bit floating point operations
> ranges between much slower and disastrously slower.

It's not a totally fair comparison. A matrix inversion algorithm that compensates for cumulative precision loss involves executing a lot more FP instructions (I don't know the ratio).


> I think it's time to revisit our attitudes to floating point, which were formed
> last century in the heyday of x87. My perception is the world has moved to SSE
> and 32- and 64-bit float; the "real" type is a distraction for D; the whole
> let's do things in 128-bit during compilation is a time waster; and many of the
> original things we want to do with floating point are different without a
> distinction, and a further waste of our resources.

Some counter points:

1. Go uses 256-bit soft float for constant folding.

2. Speed is hardly the only criterion. Quickly getting the wrong answer (and not just a few bits off, but total loss of precision) is of no value.

3. Supporting 80 bit reals does not take away from the speed of floats/doubles at runtime.

4. Removing 80 bit reals will consume resources (adapting the test suite, rewriting the math library, ...).

5. Other languages not supporting it means D has a capability they don't have. My experience with selling products is that if you have an exclusive feature that a particular customer needs, it's a slam dunk sale.

6. My other experience with feature sets is if you drop things that make your product different, and concentrate on matching feature checklists with Major Brand X, customers go with Major Brand X.

7. 80 bit reals are there and they work. The support is mature, and is rarely worked on, i.e. it does not consume resources.

8. Removing it would break an unknown amount of code, and there's no reasonable workaround for those that rely on it.
May 16, 2016
On 5/16/16 7:53 AM, Walter Bright wrote:
> On 5/16/2016 3:33 AM, Andrei Alexandrescu wrote:
>> On 5/16/16 4:10 AM, Walter Bright wrote:
>>> FP behavior has complex trade-offs with speed, accuracy, compatibility,
>>> and size. There are no easy, obvious answers.
>>
>> That's a fair statement. My understanding is also that 80-bit math is on the
>> wrong side of the tradeoff simply because it's disproportionately slow (again I
>> cite http://nicolas.limare.net/pro/notes/2014/12/12_arit_speed/). All modern
>> ALUs I looked at have 32- and 64-bit FP units only. I'm trying to figure why. --
>
>
> I think it is slow because no effort has been put into speeding it up.
> All the effort went into SIMD. The x87 FPU is a library module that is
> just plopped onto the chip for compatibility.

That makes sense.

> The x87 register stack also sux because it's hard to generate good code
> for. That didn't help.

It may indeed be a bummer but it's the way it is. So we should act accordingly. -- Andrei

May 16, 2016
On 5/16/16 8:19 AM, Walter Bright wrote:
> On 5/16/2016 4:35 AM, Andrei Alexandrescu wrote:
>> This may be the best angle in this discussion. For all I can tell 80 bit is slow
>> as molasses and on the road to getting slower. Isn't that enough of an argument
>> to move away from it?
>
> We are talking CTFE here, not runtime.

I have big plans for floating-point CTFE, and all are elastic: the faster CTFE FP is, the more and better things we can do. Things that other languages can't dream of doing, like interpolation tables for transcendental functions. So a slowdown of FP CTFE would essentially be a strategic loss. -- Andrei