May 18, 2016
On Wednesday, 18 May 2016 at 15:42:56 UTC, Joakim wrote:
> I see, so the fact that both the C++ and D specs say the same thing doesn't matter, and the fact that D also has the const float in your example as single-precision at runtime, contrary to your claims, none of that matters.

D doesn't even have a spec, so how can they possibly say the same thing?

However, quoting the Wikipedia page on IEEE floats:

«The IEEE 754-1985 allowed many variations in implementations (such as the encoding of some values and the detection of certain exceptions). IEEE 754-2008 has strengthened up many of these, but a few variations still remain (especially for binary formats). The reproducibility clause recommends that language standards should provide a means to write reproducible programs (i.e., programs that will produce the same result in all implementations of a language), and describes what needs to be done to achieve reproducible results.»

That's the map people who care about floating point follow.


> No sane DSP programmer would write that like you did.

What kind of insult is that?  I've read lots of DSP code written by others.  I know what kind of programming it entails.

In fact, what I have described here are techniques picked up from state-of-the-art DSP code written by top-of-the-line DSP programmers.


> Since the vast majority of tests will never use such compile-time constants, your opinion is not only wrong but irrelevant.

Oh... Not only am I wrong, but my opinion is irrelevant. Well, with this attitude D will remain irrelevant as well.

For good reasons.


> Then don't use differently defined constants in different places

I don't, and I didn't. DMD did it.

May 18, 2016
On 17.05.2016 23:07, Walter Bright wrote:
> On 5/17/2016 11:08 AM, Timon Gehr wrote:
>> Right. Hence, the 80-bit CTFE results have to be converted to the final
>> precision at some point in order to commence the runtime computation.
>> This means
>> that additional rounding happens, which was not present in the
>> original program.
>> The additional total roundoff error this introduces can exceed the
>> roundoff
>> error you would have suffered by using the lower precision in the
>> first place,
>> sometimes completely defeating precision-enhancing improvements to an
>> algorithm.
>
> I'd like to see an example of double rounding "completely defeating" an
> algorithm,

I have given this example, and I have explained it.

However, let me provide one of the examples I have given before, in a more concrete fashion. Unfortunately, there is no way to use "standard" D to illustrate the problem, as there is no way to write an implementation that is guaranteed not to be broken. So let us assume hypothetically, for now, that we are using D', where all computations are performed at the specified precision.

I'm copying the code from:
https://en.wikipedia.org/wiki/Kahan_summation_algorithm


$ cat kahanDemo.d

module kahanDemo;

double sum(double[] arr){
    double s=0.0;
    foreach(x;arr) s+=x;
    return s;
}

double kahan(double[] arr){
    double sum = 0.0;
    double c = 0.0;
    foreach(x;arr){
        double y=x-c;
        double t=sum+y;
        c = (t-sum)-y;
        sum=t;
    }
    return sum;
}

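// Same as kahan, but the temporaries y and t are declared as 80-bit
// reals, simulating the precision enhancement a D compiler is allowed
// to apply; the correction c then misses the double rounding error.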
double kahanBroken(double[] arr){
    double sum = 0;
    double c= 0.0;
    foreach(x;arr){
        real y=x-c;
        real t=sum+y;
        c = (t-sum)-y;
        sum=t;
    }
    return sum;
}

void main(){
    double[] data=[1e16,1,-9e15];
    import std.stdio;
    writefln("%f",sum(data)); // baseline
    writefln("%f",kahan(data)); // kahan
    writefln("%f",kahanBroken(data)); // broken kahan
}

In D, the compiler is in principle allowed to transform the non-broken version to the broken version. (And maybe, it will soon be allowed to transform the baseline version to the Kahan version. Who knows.)

Now let's see what DMD does:

$ dmd --version
DMD64 D Compiler v2.071.0
Copyright (c) 1999-2015 by Digital Mars written by Walter Bright

$ dmd -m64 -run kahanDemo.d
1000000000000000.000000
1000000000000001.000000
1000000000000000.000000

Nice, this is what I expect.

$ dmd -m64 -O -run kahanDemo.d
1000000000000000.000000
1000000000000001.000000
1000000000000000.000000

Still great.

$ dmd -m32 -run kahanDemo.d
1000000000000000.000000
1000000000000001.000000
1000000000000000.000000

Liking this.

$ dmd -m32 -O -run kahanDemo.d
1000000000000000.000000
1000000000000000.000000
1000000000000000.000000

Screw you, DMD!

And suddenly, I need to compile and test my code with all combinations of compiler flags, and even then I am not sure the compiler is not intentionally screwing me over. How is this remotely acceptable?

> and why an unusual case of producing a slightly worse

It's not just slightly worse, it can cut the number of useful bits in half or more! It is not unusual, I have actually run into those problems in the past, and it can break an algorithm that is in Phobos today!

> answer trumps the usual case of producing better answers.
> ...

The 'usual case of producing better answers' /is not actually desirable/, because the compiler does not guarantee that it happens all the time! I don't want my code to rely on something to happen that might not always happen. I want to be sure that my code is correct. I cannot conveniently do so if you don't tell me in advance what it does, and/or if the behaviour has a lot of abstraction-breaking special cases.

>
>> There are other reasons why I think that this kind of
>> implementation-defined
>> behaviour is a terribly bad idea, eg.:
>>
>> - it breaks common assumptions about code, especially how it behaves
>> under
>> seemingly innocuous refactorings, or with a different set of compiler
>> flags.
>
> As pointed out, this already happens with just about every language. It
> happens with all C/C++ compilers I'm aware of.

I'm not claiming those languages don't have broken floating point semantics. I have sometimes been using inline assembler in C++ to get the results I want. It's painful and unnecessary.

> It happens as the default behavior of the x86.

I know. I don't care. It is a stupid idea. See above.

> And as pointed out, refactoring (x+y)+z to x+(y+z)
> often produces different results, and surprises a lot of people.
> ...

As far as I can tell, you are saying: "You think A is bad, but A is similar to B, and B is bad, but B is hard to fix, hence A is actually good." I disagree.


For floating point, '+' means: "add precisely, then round". It is indeed potentially surprising and not very helpful for generic code that floating point types and integral types use the same syntax for conceptually different things (i.e., the language commits operator overloading abuse), but we are stuck with that now. Implementation-defined behaviour can actually be eliminated in a backward-compatible way.
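
To spell out what "add precisely, then round" means with the values from kahanDemo above (standard D, assuming the default round-to-nearest mode):

double a = 1e16;
// The exact sum 10000000000000001 is not representable as a double
// (adjacent doubles near 1e16 are 2 apart), so "add precisely, then
// round" yields 1e16 again and the 1 is lost:
assert(a + 1.0 == a);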

Refactorings as simple as moving some expression into its own function should be possible without surprisingly wreaking havoc. Implementations can (and do) choose strange and useless behaviours. The fact that types do not actually specify what kind of value will be used at runtime is a pointless waste of type safety.

>
>> - it breaks reproducibility, which is sometimes more important than
>> being close
>> to the infinite precision result (which you cannot guarantee with any
>> finite
>> floating point type anyway).
>>   (E.g. in a game, it is enough if the result seems plausible, but it
>> should be
>> the same for everyone. For some scientific experiments, the ideal case
>> is to
>> have 100% reproducibility of the computation, even if it is horribly
>> wrong, such
>> that other scientists can easily uncover and diagnose the problem, for
>> example.)
>
> Nobody is proposing a D feature that does not produce reproducible
> results

Do you disagree with the notion that implementation-defined behaviour is almost by definition detrimental to reproducibility? (I'm not trying to say that it makes reproducing results impossible. It is a continuum.)

> with the same program on the same inputs.

I'm talking about computations, not just programs, and I ideally want consistent behaviour across compilers/compiler versions/compiler flags (at least those flags not specifically designed to change language semantics in precisely that way). If you can additionally give me refactorability (i.e. a lack of unprincipled interplay between language features that should be orthogonal [1]), that would be really awesome.

The result should ideally not depend on e.g. whether a computation is run at compile time or at run time, or on whatever other irrelevant implicit detail changes during refactoring or when switching machines. (E.g. someone might be running a 32-bit system, and another person (or the same person) might be running a 64-bit system, and they get different floating-point results. It just adds a lot of friction and pointless work.)

> This complaint is a strawman, as I've pointed out multiple times.
> ...

A strawman is an argument one incorrectly claims that the other party has made.

Here, I think the underlying problem is that there is a misunderstanding. (I.e. you think I'm saying something I didn't intend to say, or vice-versa. [2])

This is often the case, hence I try to avoid calling out strawmen by name and instead try to put my argument differently. (E.g., when you seemingly started to claim that my argument was something like "using lower precision for the entire computation throughout should be expected to yield more accurate results" and then ridiculed it.)

> In fact, the results would be MORE portable than with C/C++,

I know that it is better than completely broken. I'm sorry, but I have higher standards. It should be not broken.

> because the
> FP behavior is completely implementation defined, and compilers take
> advantage of that.
>

I guess the reason why it is completely implementation defined is the desire that there should be completely standard-compliant C/C++ compilers for each platform, and the fact that implementers didn't want to support IEEE 754 everywhere. One way to implement a specification is to weaken the specification.



[1] This is the single most important thing programming languages developed in academia often get right.

[2] To clarify: I know that if I compile a program on systems with identical states, on an identical compiler with identical flags, I will (or should) get identical binaries that then produce identical results on conforming hardware. (See how I didn't need to say "identical hardware"? I want to be there!) It is not always desirable or practical to keep around all that additional state though.

May 18, 2016
I had written and sent this message three days ago, but it seemingly never showed up on the newsgroup. I'm sorry if it seemed that I didn't explain myself; I was operating under the assumption that this message had been made available to you.


On 14.05.2016 03:26, Walter Bright wrote:
> On 5/13/2016 5:49 PM, Timon Gehr wrote:
>> Nonsense. That might be true for your use cases. Others might actually
>> depend on
>> IEEE 754 semantics in non-trivial ways. Higher precision for
>> temporaries does not
>> imply higher accuracy for the overall computation.
>
> Of course it implies it.
> ...

No, see below.


> An anecdote: a colleague of mine was once doing a chained calculation.
> At every step, he rounded to 2 digits of precision after the decimal
> point, because 2 digits of precision was enough for anybody. I carried
> out the same calculation to the max precision of the calculator (10
> digits). He simply could not understand why his result was off by a
> factor of 2, which was a couple hundred times his individual roundoff
> error.
> ...

Now assume that colleague of yours was doing that chained calculation, and his calculator magically added the additional digits behind his back (it can do this by caching the last full-precision value for each number prefix). He wouldn't even notice that his rounding strategy does not work. Sometime later he might then use a calculator that does not do the magical enhancing.

>
>> E.g., correctness of double-double arithmetic is crucially dependent
>> on correct
>> rounding semantics for double:
>> https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format#Double-double_arithmetic
>>
>
> Double-double has its own peculiar issues, and is not relevant to this
> discussion.
> ...

It is relevant to this discussion insofar as it can occur in algorithms that use double-precision floating-point arithmetic. It illustrates a potential issue with implicit enhancement of precision. For double-double, there are two values of type double that together represent a higher-precision value. (One of them has a shifted exponent, such that their mantissa bits do not overlap.)

You have mantissas like:

|------double1------| |------double2--------|


Now assume that the compiler instead uses extended precision, what you get is something we might call extended-extended of the form:

|---------extended1---------| |---------extended2-----------|

Now those values are written back into 64-bit double storage. Observe which part of the double-double mantissa is lost:


|---------extended1---xxxxxx| |---------extended2-----xxxxxx|

|
v

|------double1------| |------double2--------|


The middle part of the mantissa is thrown away, and we are left with a single double's worth of precision plus some noise. Implicitly using extended precision for some parts of the computation approximately cuts the number of accurate mantissa bits in half. I don't want to have to deal with this. Just give me what I ask for.
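
To connect this to concrete code, here is the error-free addition primitive that double-double arithmetic is built from (a sketch, written in the hypothetical D' from my other post, where every operation is rounded to the declared type):

// Knuth's TwoSum: s is the correctly rounded double sum of a and b,
// and e is the exact rounding error, so that a + b == s + e holds
// exactly. The derivation of e is only valid if every intermediate
// operation is rounded to double; with 80-bit temporaries, e instead
// captures the much smaller extended-precision error, and the
// invariant above is lost.
void twoSum(double a, double b, out double s, out double e){
    s = a + b;
    double t = s - a;
    e = (a - (s - t)) + (b - t);
}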


>
>> Also, it seems to me that for e.g.
>> https://en.wikipedia.org/wiki/Kahan_summation_algorithm,
>> the result can actually be made less precise by adding casts to higher
>> precision
>> and truncations back to lower precision at appropriate places in the
>> code.
>
> I don't see any support for your claim there.
> ....

It's using the same trick that double-double does: Kahan's correction term c = (t-sum)-y is designed to recover the rounding error of the double-precision addition sum+y. If the temporaries are silently promoted to extended precision, c instead captures the much smaller extended-precision error, so the compensation no longer matches the error actually incurred when sum is rounded back to double. The above reasoning should apply.

>
>> And even if higher precision helps, what good is a "precision-boost"
>> that e.g.
>> disappears on 64-bit builds and then creates inconsistent results?
>
> That's why I was thinking of putting in 128 bit floats for the compiler
> internals.
> ...

Runtime should do the same as CTFE. Are you suggesting we use 128-bit soft-floats at run time for all float types?


>
>> Sometimes reproducibility/predictability is more important than maybe
>> making
>> fewer rounding errors sometimes. This includes reproducibility between
>> CTFE and
>> runtime.
>
> A more accurate answer should never cause your algorithm to fail.

It's not more accurate, just more precise, and it is only for some temporary computations, and you don't necessarily know which. The way the new roundoff errors propagate is chaotic, and might not be what the code anticipated.

> It's like putting better parts in your car causing the car to fail.
> ...

It's like (possibly repeatedly) interchanging "better" parts and "worse" parts while the engine is still running.

Anyway, it should be obvious that this kind of reasoning by analogy does not lead anywhere.


>
>> Just actually comply to the IEEE floating point standard when using their
>> terminology. There are algorithms that are designed for it and that
>> might stop
>> working if the language does not comply.
>
> Conjecture.

I have given a concrete example.

> I've written FP algorithms (from Cody+Waite, for example),
> and none of them degraded when using more precision.
> ...

For the entire computation or some random temporaries?

>
> Consider that the 8087 has been operating at 80 bits precision by
> default for 30 years. I've NEVER heard of anyone getting actual bad
> results from this.

Fine, so you haven't.

> They have complained about their test suites that
> tested for less accurate results broke.

What happened is that the test suites broke.


> They have complained about the
> speed of x87. And Intel has been trying to get rid of the x87 forever.

It's nice to have 80-bit precision. I just want to explicitly ask for it.


> Sometimes I wonder if there's a disinformation campaign about more
> accuracy being bad, because it smacks of nonsense.
>
> BTW, I once asked Prof Kahan about this. He flat out told me that the
> only reason to downgrade precision was if storage was tight or you
> needed it to run faster. I am not making this up.

Obviously, but I think his comment was about enhancing precision for the entire computation front-to-back, not just some parts of it. I can do that on my own. I don't need the compiler to second-guess me.



May 18, 2016
On 17.05.2016 21:31, deadalnix wrote:
> On Tuesday, 17 May 2016 at 18:08:47 UTC, Timon Gehr wrote:
>> Right. Hence, the 80-bit CTFE results have to be converted to the
>> final precision at some point in order to commence the runtime
>> computation. This means that additional rounding happens, which was
>> not present in the original program. The additional total roundoff
>> error this introduces can exceed the roundoff error you would have
>> suffered by using the lower precision in the first place, sometimes
>> completely defeating precision-enhancing improvements to an algorithm.
>>
>
> WAT ? Is that really possible ?
>

Yes, I'm sorry, but this can and does happen.
Consider http://forum.dlang.org/post/nhi7m4$css$1@digitalmars.com

You can build similar examples involving only CTFE. Refer to http://forum.dlang.org/post/nhi9gh$fa4$1@digitalmars.com for an explanation of one case where this can happen. (I had actually written that post three days ago, and assumed that it had been posted to the newsgroup, but something went wrong, apparently.)
May 18, 2016
On Wednesday, 18 May 2016 at 12:39:21 UTC, Johannes Pfau wrote:
>
> Do you have a link explaining GCC actually uses such a soft float?

I'm confused as to why the compiler would be using soft floats instead of hard floats.
May 18, 2016
On Wednesday, 18 May 2016 at 19:20:20 UTC, jmh530 wrote:
> On Wednesday, 18 May 2016 at 12:39:21 UTC, Johannes Pfau wrote:
>>
>> Do you have a link explaining GCC actually uses such a soft float?
>
> I'm confused as to why the compiler would be using soft floats instead of hard floats.

Cross compilation: the host machine may not have (or may not behave bit-for-bit like) the target's floating-point hardware, so the compiler emulates the target's arithmetic in software when folding constants. GCC, for example, uses the MPFR library for this.

May 18, 2016
On Wednesday, 18 May 2016 at 11:46:37 UTC, Era Scarecrow wrote:
> On Wednesday, 18 May 2016 at 10:25:10 UTC, tsbockman wrote:
>> https://code.dlang.org/packages/checkedint
>> https://dlang.org/phobos/core_checkedint.html
>
>  Glancing at the checkedInt I really don't see it as being the same as what I'm talking about. Overflow/carry for add perhaps, but unless it breaks down to a single instruction for the compiler to determine if it needs to do something, I see it as a failure (at best, a workaround).
>
> That's just my thoughts. CheckedInt simply _doesn't_ cover what I was talking about.

The functions in druntime's `core.checkedint` are intrinsics that map directly to the hardware overflow/carry instructions.

The DUB package I linked provides various wrapper functions and data structures to make it easier to use the `core.checkedint` intrinsics (among other things). The performance cost of the wrappers is low with proper inlining, which GDC and LDC are able to provide. (DMD is another story...)
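
For example (standard D with druntime; note the overflow flag is sticky: the intrinsics set it on overflow and otherwise leave it unchanged):

import core.checkedint : adds, muls;
import std.stdio;

void main(){
    bool overflow = false;
    long a = adds(long.max, 1, overflow);                    // wraps; sets overflow
    long m = muls(4_000_000_000L, 4_000_000_000L, overflow); // also overflows
    writefln("a=%s m=%s overflow=%s", a, m, overflow);
}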

> Obtaining the modulus for 0 cost/instructions after doing a division which is in the hardware's opcode side effects (unless the compiler recognizes the pattern and offers it as an optimization), or having the full result of a multiply on hand (that exceeds it's built-in size, long.max*long.max = 128bit result, which the hardware hands to you if you check the register it stores the other half of the result in).

I agree that intrinsics for this would be nice. I doubt that any current D platform is actually computing the full 128 bit result for every 64 bit multiply though - that would waste both power and performance, for most programs.

May 18, 2016
On Wednesday, 18 May 2016 at 19:36:59 UTC, tsbockman wrote:
> I agree that intrinsics for this would be nice. I doubt that any current D platform is actually computing the full 128 bit result for every 64 bit multiply though - that would waste both power and performance, for most programs.

 Except the 128-bit result is _already_ there for zero cost (at least for the x86 instructions I'm aware of). There are bound to be enough use cases (say, pseudo-random number generation, encryption, or numerical processing above 64 bits) that I'd like access to it supported by the language, rather than having to inject instructions using the asm command.

 The same goes for division, where the dividend could be a 128-bit number divided by a 64-bit divisor. We could get the main benefit of the 128-bit cent type with very few instructions if we simply guarantee that one of the two arguments fits in 64 bits (although the quotient and remainder must also each fit in 64 bits).
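
 To illustrate the current workaround: a sketch using DMD-style inline assembler (x86-64 only; mul128 is a hypothetical helper, not an existing intrinsic) that retrieves the full 128-bit product the hardware MUL instruction already leaves in RDX:RAX:

ulong[2] mul128(ulong x, ulong y){
    ulong lo, hi;
    asm
    {
        mov RAX, x;
        mov RCX, y;
        mul RCX;      // unsigned multiply: RDX:RAX = RAX * RCX
        mov lo, RAX;  // low 64 bits of the product
        mov hi, RDX;  // high 64 bits, normally discarded
    }
    return [lo, hi];
}

 With a language-level intrinsic instead, the optimizer could simply read RDX when the high half is wanted, at no extra cost.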
May 18, 2016
On 5/18/2016 4:48 AM, deadalnix wrote:
> Typo: arbitrary precision FP. Meaning some soft float that grows as big as
> necessary to not lose precision à la BitInt but for floats.

0.10 is not representable in a binary format regardless of precision.
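
(Concretely: 0.1 has the infinitely repeating binary expansion 0.000110011001100..., so a binary significand of any finite width stores a rounded approximation. A quick check in D, digits shown for 64-bit double:)

import std.stdio;

void main(){
    writefln("%.20f", 0.1); // prints 0.10000000000000000555
}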

May 18, 2016
On Wednesday, 18 May 2016 at 20:14:22 UTC, Walter Bright wrote:
> On 5/18/2016 4:48 AM, deadalnix wrote:
>> Typo: arbitrary precision FP. Meaning some soft float that grows as big as
>> necessary to not lose precision à la BitInt but for floats.
>
> 0.10 is not representable in a binary format regardless of precision.

You should ask the gcc guys how they do it, but you can surely represent this as a fraction, so I see no major blocker.
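
A minimal sketch of that idea using only Phobos BigInt (the Rational type here is hypothetical, not an existing library):

import std.bigint;

// Hypothetical exact fraction type: no operation ever rounds.
struct Rational
{
    BigInt num, den;

    Rational opBinary(string op : "+")(Rational rhs)
    {
        return Rational(num * rhs.den + rhs.num * den, den * rhs.den);
    }
}

unittest
{
    auto tenth = Rational(BigInt(1), BigInt(10)); // exactly 0.10
    auto s = tenth;
    foreach (i; 0 .. 9) s = s + tenth;
    // Summing 1/10 ten times yields exactly 1 -- no rounding anywhere.
    assert(s.num == s.den);
}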