May 21, 2016
On Friday, 20 May 2016 at 22:22:57 UTC, Walter Bright wrote:
> On 5/20/2016 5:36 AM, Tobias M wrote:
>> Still an authority, though.
>
> If we're going to use the fallacy of appeal to authority, may I present Kahan who concurrently designed the IEEE 754 spec and the x87.

Since I'm just in the mood of appealing to authorities, let me cite Wikipedia:

The argument from authority (Latin: argumentum ad verecundiam) also appeal to authority, is a common argument form which can be fallacious, *such as when an authority is cited on a topic outside their area of expertise, when the authority cited is not a true expert or if the cited authority is stating a contentious or controversial position.*

(Emphasis is mine)
May 21, 2016
On Tuesday, 17 May 2016 at 21:07:21 UTC, Walter Bright wrote:
> [...] why an unusual case of producing a slightly worse answer trumps the usual case of producing better answers.

'Sometimes worse' is not 'better', but that's not the point, here.

Even if you managed to consistently produce not less accurate results: if the results computed at compile time differ from those computed at run time you cannot safely substitute run time calls with compile time calls without altering the result. Even slightest differences can lead to completely different outcomes.

For example, if we are interested in the number of real roots of a quadratic polynomial we compute its discriminant D. There are no solutions if D is negative, there is one solution if D is zero and there are two solutions if D is positive. In a floating point environment the crucial test is often implemented with a certain tolerance and good implementations compute the discriminant with increased precision. In any case, the critical point is that we have to make a decision based on a single computed value. A difference of one unit in the last place can change the result.

Note that this in not about correctness of the result but about reproducibility and substitutability across run time and compile time boundaries.

Please make a decision, document it, and move on.

Fool
May 21, 2016
On 21.05.2016 00:22, Walter Bright wrote:
> On 5/20/2016 5:36 AM, Tobias M wrote:
>> Still an authority, though.
>
> If we're going to use the fallacy of appeal to authority,

Authorities are not always wrong, the fallacy is to argue that they are right *because* they are authorities. However, in this case, I don't think you have provided a quote from Kahan that is applicable to the case we have here. The Gustafson quote is formulated generally enough to apply to the current case directly.

> may I present
> Kahan who concurrently designed the IEEE 754 spec and the x87.
>

The x87 is by far not the slam-dunk design you seem to make it out to be. (And you don't need to take my word for it. Otherwise, why would hardware designers prefer to phase it out in favour of systems that don't do the implicit extra scratch space thing?)

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=323

Bug report for this problem in gcc. The discussion raises an issue I wasn't aware of so far (or have forgotten about): Apparently the x87 does not even round to double correctly, it only rounds the mantissa but does not deal with out-of-range-exponents correctly. What a mess.


http://blog.jwhitham.org/2015/04/gcc-bug-323-journey-to-heart-of.html

x87 experience report.


http://www.vedetta.com/x87-fpu-php-bug-causes-infinite-loop-affected-websites-vulnerable-to-dos-via-php-get-function-2.2250738585072011e-308

DOS attack possible due to usage of x87 instructions.


http://stackoverflow.com/questions/3206101/extended-80-bit-double-floating-point-in-x87-not-sse2-we-dont-miss-it

Many complaints about x87, and some concerns about 80 bit floating point support in the future.


http://www.vinc17.org/research/extended.en.html

Good summary of some x87 problems. (Slightly broken English though.)


http://www.realworldtech.com/physx87/

x87 instructions used to make the GPU look faster compared to the CPU. (Not /directly/ a x87 design problem.)


http://www.cims.nyu.edu/~dbindel/class/cs279/87stack.pdf
http://www.cims.nyu.edu/~dbindel/class/cs279/stack87.pdf

Kahan criticizing the design of the register stack on the x87.


I haven't found any statement from Kahan on gcc bug 323, but this would be really interesting, if anyone can find anything.

I think there is ample evidence that x87 is ultimately a design failure, even though the intentions were good and based on real expertise.
May 21, 2016
On 20.05.2016 13:32, Joakim wrote:
> On Friday, 20 May 2016 at 11:02:45 UTC, Timon Gehr wrote:
>> On 20.05.2016 11:14, Joakim wrote:
>>> On Thursday, 19 May 2016 at 18:22:48 UTC, Timon Gehr wrote:
>>>> On 19.05.2016 08:04, Joakim wrote:
>>>>> On Wednesday, 18 May 2016 at 17:10:25 UTC, Timon Gehr wrote:
>>>>>> It's not just slightly worse, it can cut the number of useful bits in
>>>>>> half or more! It is not unusual, I have actually run into those
>>>>>> problems in the past, and it can break an algorithm that is in Phobos
>>>>>> today!
>>>>>
>>>>> I wouldn't call that broken.  Looking at the hex output by
>>>>> replacing %f
>>>>> with %A in writefln, it appears the only differences in all those
>>>>> results is the last byte in the significand.
>>>>
>>>> Argh...
>>>>
>>>> // ...
>>>>
>>>> void main(){
>>>>     //double[] data=[1e16,1,-9e15];
>>>>     import std.range;
>>>>     double[] data=1e16~repeat(1.0,100000000).array~(-9e15);
>>>>     import std.stdio;
>>>>     writefln("%f",sum(data)); // baseline
>>>>     writefln("%f",kahan(data)); // kahan
>>>>     writefln("%f",kahanBroken(data)); // broken kahan
>>>> }
>>>>
>>>>
>>>> dmd -run kahanDemo.d
>>>> 1000000000000000.000000
>>>> 1000000100000000.000000
>>>> 1000000000000000.000000
>>>>
>>>> dmd -m32 -O -run kahanDemo.d
>>>> 1000000000000000.000000
>>>> 1000000000000000.000000
>>>> 1000000000000000.000000
>>>>
>>>>
>>>> Better?
>>>>
>>>> Obviously there is more structure in the data that I invent manually
>>>> than in a real test case where it would go wrong. The problems carry
>>>> over though.
>>>
>>> I looked over your code a bit.  If I define sum and c as reals in
>>> "kahanBroken" at runtime, this problem goes away.
>>
>> Yes. That's absolutely obvious, and I have noted it before, but
>> thanks. Maybe try to understand why this problem occurs in the first
>> place.
>
> Yet you're the one arguing against increasing precision everywhere in CTFE.
> ...

High precision is usually good (use high precision, up to arbitrary precision or even symbolic arithmetic whenever it improves your results and you can afford it). *Implicitly* increasing precision during CTFE is bad. *Explicitly* using higher precision during CTFE than at running time /may or may not/ be good. In case it is good, there is often no reason to stop at 80 bits.

>>> Since that's what the
>>> CTFE rule is actually doing, ie extending all floating-point to reals at
>>> compile-time, I don't see what you're complaining about.  Try it, run
>>> even your original naive summation algorithm through CTFE and it will
>>> produce the result you want:
>>>
>>> enum double[] ctData=[1e16,1,-9e15];
>>> enum ctSum = sum(ctData);
>>> writefln("%f", ctSum);
>>> ...
>>
>> This example wasn't specifically about CTFE, but just imagine that
>> only part of the computation is done at CTFE, all local variables are
>> transferred to runtime and the computation is completed there.
>
> Why would I imagine that?

Because that's the most direct way to go from that example to one where implicit precision enhancement during CTFE only is bad.

> And this whole discussion is about what
> happens if you change the precision of all variables to real when doing
> CTFE, so what's the point of giving an example that isn't "specifically"
> about that?
> ...

Walter's request wasn't specifically about that, and it's the first thing I came up with:

> On 17.05.2016 23:07, Walter Bright wrote:
>> I'd like to see an example of double rounding "completely defeating" an
>> algorithm,

I don't have infinite amounts of time. The (significant) time spent writing up all of this competes with time spent doing things that are guaranteed to yield (often immediate) tangible benefits.


> And if any part of it is done at runtime using the algorithms you gave,
> which you yourself admit works fine if you use the right
> higher-precision types,

What's "right" about them? That the compiler will not implicitly transform some of them to even higher precision in order to break the algorithm again? (I don't think that is even guaranteed.)

> you don't seem to have a point at all.
> ...

Be assured that I have a point. If you spend some time reading, or ask some constructive questions, I might be able to get it across to you. Otherwise, we might as well stop arguing.

>>>>> As Don's talk pointed out,
>>>>> all floating-point calculations will see loss of precision starting
>>>>> there.
>>>>> ...
>>>>
>>>>
>>>> This is implicitly assuming a development model where the programmer
>>>> first writes down the computation as it would be correct in the real
>>>> number system and then naively replaces every operation by the
>>>> rounding equivalent and hopes for the best.
>>>
>>> No, it is intrinsic to any floating-point calculation.
>>> ...
>>
>> How do you even define accuracy if you don't specify an infinitely
>> precise reference result?
>
> There is no such thing as an infinitely precise result.  All one can do
> is compute using even higher precision and compare it to lower precision.
> ...

If I may ask, how much mathematics have you been exposed to?


>>>> It is a useful rule if that is what you're doing. One might be doing
>>>> something else. Consider the following paper for an example where the
>>>> last bit in the significant actually carries useful information for
>>>> many of the values used in the program.
>>>>
>>>> http://www.jaist.ac.jp/~s1410018/papers/qd.pdf
>>>
>>> Did you link to the wrong paper? ;)
>>
>> No. That paper uses multiple doubles per approximated real value to
>> implement arithmetic that is more precise than using just plain
>> doubles. If any bit in the first double is off, this is no better than
>> using a single double.
>>
>>> I skimmed it and that paper
>>> explicitly talks about error bounds all over the place.
>>
>> It is enough to read the abstract to figure out what the problem is.
>> This demonstrates a non-contrived case where CTFE using enhanced
>> precision throughout can break your program. Compute something as a
>> double-double at compile-time, and when it is transferred to runtime
>> you lose all the nice extra precision, because bits in the middle of
>> the (conceptual) mantissa are lost.
>
> That is a very specific case where they're implementing higher-precision
> algorithms using lower-precision registers.

If I argue in the abstract, people request examples. If I provide examples, people complain that they are too specific.

> If you're going to all that
> trouble, you should know not to blindly run the same code at compile-time.
> ...

The point of CTFE is to be able to run any code at compile-time that adheres to a well-defined set of restrictions. Not using floating point is not one of them.

>>> The only mention of "the last bit" is
>>
>> This part is actually funny. Thanks for the laugh. :-)
>> I was going to say that your text search was too naive, but then I
>> double-checked your claim and there are actually two mentions of "the
>> last bit", and close by to the other mention, the paper says that "the
>> first double a_0 is a double-precision approximation to the number a,
>> accurate to almost half an ulp."
>
> Is there a point to this paragraph?
>

I don't think further explanations are required here. Maybe be more careful next time.

>>> when they say they calculated their
>>> constants in arbitrary precision before rounding them for runtime use,
>>> which is ironically similar to what Walter suggested doing for D's CTFE
>>> also.
>>> ...
>>
>> Nothing "ironic" about that. It is sometimes a good idea and I can do
>> this explicitly and make sure the rounding is done correctly, just
>> like they did. Also, it is a lot more flexible if I can specify the
>> exact way the computation is done and the result is rounded. 80 bits
>> might not be enough anyway. There is no reason for the language to
>> apply potentially incorrect "hand holding" here.
>>
>> Again, please understand that my point is not that lower precision is
>> better. My point is that doing the same thing in every context and
>> allowing the programmer to specify what happens is better.
>
> I understand your point that sometimes the programmer wants more
> control.

Thanks!

> But as long as the way CTFE extending precision is
> consistently done and clearly communicated,

It never will be clearly communicated to everyone and it will also hit people by accident who would have been aware of it.

What is so incredibly awesome about /implicit/ 80 bit precision as to justify the loss of control? If I want to use high precision for certain constants precomputed at compile time, I can do it just as well, possibly even at more than 80 bits such as to actually obtain accuracy up to the last bit.

Also, maybe I will need to move the computation to startup at runtime some time in the future because of some CTFE limitation, and then the additional implicit gain from 80 bit precision will be lost and cause a regression. The compiler just has no way to guess what precision is actually needed for each operation.

> those people can always opt out and do it some other way.
> ...

Sure, but now they need to _carefully_ maintain different implementations for CTFE and runtime, for an ugly misfeature. It's a silly magic trick that is not actually very useful and prone to errors.


May 21, 2016
On 20.05.2016 08:25, Walter Bright wrote:
> On 5/19/2016 12:49 AM, Max Samukha wrote:
>> People are trying to get across that, if they wanted to maximize
>> accuracy, they
>> would request the most precise type explicitly. D has 'real' for that.
>> This
>> thread has shown unequivocally that the semantics you are insisting on
>> is bound
>> to cause endless confusion and complaints.
>
> C++ programs do the same thing, and have for decades, and such behavior
> is allowed by the C++ Standard.
> ...

And reported as a gcc bug regularly, and there seems to be a flag to turn x87-induced strangeness off completely. Also, the C99 standard rules out compiler-specific results. (And allows the program to query whether it is being compiled with the set of flags it requires: https://en.wikipedia.org/wiki/C99#IEEE.C2.A0754_floating_point_support .)

> People have been confused by and complained about floating point
> behavior at least since I started programming 40 years ago.
> ...

That's mostly an incidental problem. (Sharing language syntax with non-rounding operations, needing to map to mis-designed hardware, etc.)

> I have not invented anything new here.
>

You may not have invented anything new, but the old invention is broken and in the (slow and painful) process of getting fixed. There is no reason to build new things that use the same principle, as there is already a better way available.
May 21, 2016
On 20.05.2016 14:34, Johan Engelen wrote:
> On Thursday, 19 May 2016 at 18:22:48 UTC, Timon Gehr wrote:
>>
>> dmd -run kahanDemo.d
>> 1000000000000000.000000
>> 1000000100000000.000000
>> 1000000000000000.000000
>>
>> dmd -m32 -O -run kahanDemo.d
>> 1000000000000000.000000
>> 1000000000000000.000000
>> 1000000000000000.000000
>>
>> Better?
>
> Ignore if you think it's not relevant.
> I am a bit lost in this thread. I thought this was about floating point
> CTFE,

Yes, this is how it started out. :)
At some point, I made the comment that the compiler should never be allowed to use a higher precision than the one specified, which broadened the scope of the discussion somewhat. But the kinds of problems caused by using higher precision than requested can be similar for CTFE and runtime.

> but I think what you are seeing here (in the 2nd case) is the DMD
> optimizer allowing reassociation of fp expressions?

I don't believe it does (Walter knows /very/ well that a+(b+c) is not the same as (a+b)+c), but I have not looked at the assembly output. The result becomes wrong when 80 bit precision is used behind the scenes for some (but not all) of the computation, so this could be the explanation.

> LDC does not allow
> that (per default) and produces the 1000000100000000.000000 result also
> with optimization on.
>
> -Johan

Thanks!
May 21, 2016
On 17.05.2016 20:09, Max Samukha wrote:
> On Monday, 16 May 2016 at 19:01:19 UTC, Timon Gehr wrote:
>
>>> You are not even guaranteed to get the same result on two different x86
>>> implementations.
>>
>> Without reading the x86 specification, I think it is safe to claim
>> that you actually are guaranteed to get the same result.
>
> Correct. Sorry for the noise.

Interestingly, I was actually wrong:
https://hal.archives-ouvertes.fr/hal-00128124v5/document

I think that document is worth a read in any case.

Page 18: "Note, also, that different processors within the same architecture can implement the same transcendental functions with different accuracies. We already noted the difference between the AMD-K5 and the K6 and following processors with respect to angular reduction. Intel also notes that the algorithms’ precision was improved between the 80387 / i486DX processors and the Pentium processors. [Int, 1997,§7.5.10]"

The language should ideally not leak such differences to the user.
May 21, 2016
On 21.05.2016 15:45, Timon Gehr wrote:
> On 21.05.2016 00:22, Walter Bright wrote:
>> ...
>> may I present
>> Kahan who concurrently designed the IEEE 754 spec and the x87.
>>
>
> The x87 is by far not the slam-dunk design you seem to make it out to
> be. ...

https://hal.archives-ouvertes.fr/hal-00128124v5/document
Check out section 5 for some convincing examples showing why the x87 is horrible.
May 21, 2016
On 5/21/2016 2:26 AM, Tobias Müller wrote:
> On Friday, 20 May 2016 at 22:22:57 UTC, Walter Bright wrote:
>> On 5/20/2016 5:36 AM, Tobias M wrote:
>>> Still an authority, though.
>>
>> If we're going to use the fallacy of appeal to authority, may I present Kahan
>> who concurrently designed the IEEE 754 spec and the x87.
>
> Actually cited this *because* of you mentioning Kahan several times. And because
> you said "The people who design these things are not fools, and there are good
> reasons for the way things are."

I meant two things by this:

1. Do the homework before disagreeing with someone who literally wrote the book and designed the hardware for it.

2. Complaining that the x87 is not IEEE compliant, when the guy that designed the x87 wrote the spec at the same time, suggests a misunderstanding the spec. I.e. again, gotta do the homework first.

Dismissing several decades of FP designs, and every programming language, as being "obviously wrong" and "insane" is an extraordinary claim, and so requires extraordinary evidence.

After all, what would your first thought be when a sophomore physics student tells you that Feynman got it all wrong? It's good to revisit existing dogma now and then, and challenge the underlying assumptions of it, but again, you gotta understand the existing dogma all the way down first.

If you don't, you're very likely to miss something fundamental and produce a design that is less usable.
May 21, 2016
On 5/21/2016 10:03 AM, Timon Gehr wrote:
> Check out section 5 for some convincing examples showing why the x87 is horrible.

The polio vaccine winds up giving a handful of people polio, too.

It's good to list traps for the unwary in FP usage. It's disingenuous to list only problems with one design and pretend there are no traps in another design.