August 25, 2015
On Monday, 24 August 2015 at 16:52:54 UTC, Márcio Martins wrote:
> I'm posting this here for visibility. This was silently corrupting our data, and might be doing the same for others as well.
>
> import std.stdio;
> void main() {
>   double x = 1.2;
>   writeln(cast(ulong)(x * 10.0));
>   double y = 1.2 * 10.0;
>   writeln(cast(ulong)y);
> }
>
> Output:
> 11
> 12
>

Internally, the first case calculates x * 10.0 in real (80-bit) precision and truncates it to ulong directly. As 1.2 is not exactly representable, x is really 1.199999999999999956, and the result is trunc(11.99999999999999956) = 11.

In the second case x * 10.0 is also calculated in real precision, but the product is first converted to double in round-to-nearest mode (yielding exactly 12.0) and only then truncated.
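As a sketch of the difference (assuming the explicit cast(double) forces the intermediate rounding step, which can depend on the compiler and target):

import std.stdio;

void main() {
    double x = 1.2;
    // Truncates the extended-precision intermediate 11.999... directly:
    writeln(cast(ulong)(x * 10.0));              // 11 with real intermediates
    // Rounding the product to double first yields exactly 12.0:
    writeln(cast(ulong)cast(double)(x * 10.0));  // 12
}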


August 25, 2015
On 8/25/15 9:51 AM, "Márcio Martins" <marcioapm@gmail.com> wrote:
> On Tuesday, 25 August 2015 at 11:14:35 UTC, Steven Schveighoffer wrote:
>> On 8/24/15 5:34 PM, "Márcio Martins" <marcioapm@gmail.com> wrote:
>>> On Monday, 24 August 2015 at 21:03:50 UTC, Steven Schveighoffer wrote:
>>
>>>> I understand the inconsistency, and I agree it is an issue that should
>>>> be examined. But the issue is entirely avoidable by not using
>>>> incorrect methods to convert from floating point to integer after
>>>> floating point operations introduce some small level of error.
>>>>
>>>> Perhaps there is some way to make it properly round in this case, but
>>>> I guarantee it will not fix all floating point errors.
>>>>
>>>
>>> What is the correct way to truncate, not round, a floating-point value
>>> to an integer?
>>
>> auto result = cast(ulong)(x * 10.0 + x.epsilon);
>>
>
> import std.stdio;
> void main() {
>      double x = 1.2;
>      writeln(cast(ulong)(x * 10.0 + x.epsilon));

Sorry, I misunderstood what epsilon was (I think it's the smallest incremental value for a given floating point type with an exponent of 1). Because your number is further away than this value, it doesn't help.

You need to add something to correct for the error that might exist. The best thing to do is to add a very small number, as that will only adjust truly close numbers.

In this case, the number you could add is 0.1, since it's not going to affect anything other than a slightly-off value. It depends on where you expect the error to be.

As bachmeier says, it's not something that's easy to get right.
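As a sketch of that idea (truncTolerant and its 1e-9 tolerance are hypothetical; the tolerance must be chosen to match the error scale you actually expect):

import std.math : floor;
import std.stdio;

// Truncate toward zero, but forgive values that sit just below an
// integer because of accumulated floating-point error.
ulong truncTolerant(double v, double tol = 1e-9)
{
    return cast(ulong) floor(v + tol);
}

void main()
{
    double x = 1.2;
    writeln(truncTolerant(x * 10.0)); // 12
    writeln(truncTolerant(11.4));     // 11; genuine fractions still truncate
}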

>
>      double y = x * 10.0;
>      writeln(cast(ulong)(y + x.epsilon));
>
>      double z = x * 10.0 + x.epsilon;
>      writeln(cast(ulong)(z));

these work because you have converted to double, which appears to round up.

-Steve
August 25, 2015
On Tuesday, 25 August 2015 at 14:54:41 UTC, Steven Schveighoffer wrote:
> On 8/25/15 9:51 AM, "Márcio Martins" <marcioapm@gmail.com> wrote:
>>      [...]
>
> Sorry, I misunderstood what epsilon was (I think it's the smallest incremental value for a given floating point type with an exponent of 1). Because your number is further away than this value, it doesn't help.
>
> You need to add something to correct for the error that might exist. The best thing to do is to add a very small number, as that will only adjust truly close numbers.
>
> In this case, the number you could add is 0.1, since it's not going to affect anything other than a slightly-off value. It depends on where you expect the error to be.
>
> As bachmeier says, it's not something that's easy to get right.
>
>>      [...]
>
> these work because you have converted to double, which appears to round up.
>
> -Steve

I didn't convert to double. My computations are all in double to start with, as you can see from the explicit types everywhere.

If you compile it with *GDC* it works fine. If you compile a port of it with clang, gcc, or msvc, it also works correctly. I suspect it will work fine with LDC as well.
August 25, 2015
On Tuesday, 25 August 2015 at 14:54:41 UTC, Steven Schveighoffer wrote:
> As bachmeier says, it's not something that's easy to get right.

Are you sure you follow the IEEE 754 recommendations? Floating-point arithmetic should be reproducible according to the chosen rounding mode.

https://en.wikipedia.org/wiki/IEEE_floating_point#Reproducibility

«The reproducibility clause recommends that language standards should provide a means to write reproducible programs (i.e., programs that will produce the same result in all implementations of a language), and describes what needs to be done to achieve reproducible results.»
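For reference, D exposes the hardware rounding mode through std.math's FloatingPointControl (a minimal sketch; note that it controls rounding only, not the precision of intermediates):

import std.math : FloatingPointControl, rint;
import std.stdio;

void main()
{
    FloatingPointControl fpctrl;
    fpctrl.rounding = FloatingPointControl.roundDown; // toward -infinity
    writeln(rint(11.5)); // 11 under round-down instead of 12
    // The previous mode is restored when fpctrl goes out of scope.
}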

August 25, 2015
On 8/25/15 11:56 AM, "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= <ola.fosheim.grostad+dlang@gmail.com>" wrote:
> On Tuesday, 25 August 2015 at 14:54:41 UTC, Steven Schveighoffer wrote:
>> As bachmeier says, it's not something that's easy to get right.
>
> Are you sure you follow IEEE 754 recommendations? Floating point
> arithmetics should be reproducible according to the chosen rounding mode.
>
> https://en.wikipedia.org/wiki/IEEE_floating_point#Reproducibility
>
> «The reproducibility clause recommends that language standards should
> provide a means to write reproducible programs (i.e., programs that will
> produce the same result in all implementations of a language), and
> describes what needs to be done to achieve reproducible results.»
>

I'm not an expert on floating point, but I have written code that uses it, and I have gotten it very wrong because I didn't take the floating-point error into account (the worst case caused an infinite loop in a corner case).

I'll note that D does exactly what C does in the case where you are using 80-bit floating point numbers.

There is definitely an issue with the fact that storing the result as a double changes the behavior, and that D doesn't treat expressions typed as double as doubles.

I see an issue with this:

import std.stdio;

void main() {
    double x = 1.2;
    auto y = x * 10.0; // typed as double
    writefln("%s %s", cast(ulong)y, cast(ulong)(x * 10.0)); // 12 11
}

IMO, these two operations should be the same. If the result of an expression is detected to be double, then it should behave like one. You can't have the calculation done in 80-bit mode, and then magically throw away the rounding to get to 64-bit mode.

I think Marcio has a point that this is both surprising and troublesome. But I think this is an anecdotal instance of a toy example. I'd expect real code to use adjustments when truncating to avoid the FP error (this obviously isn't his real code).

-Steve
August 25, 2015
On Tuesday, 25 August 2015 at 17:40:06 UTC, Steven Schveighoffer wrote:
> I'll note that D does exactly what C does in the case where you are using 80-bit floating point numbers.

I don't think C specifies how it should be done, but some compilers have a "precise" compilation flag that is supposed to retain order and accurate intermediate rounding.

> IMO, these two operations should be the same. If the result of an expression is detected to be double, then it should behave like one. You can't have the calculation done in 80-bit mode, and then magically throw away the rounding to get to 64-bit mode.

Yes, that is rather obvious. IEEE 754-2008 goes much further than that, though: it requires that all arithmetic be correctly rounded. Yes, I am aware that the D specification allows higher precision, but it seems to me that this gets you neither predictable results nor maximum performance. And what is the point of being able to set the rounding mode if you don't know the bit width used?

It is a practical issue in all simulations where you want reproducible results. If D is meant for scientific computing, it should support correct rounding and reproducible results. If D is meant for gaming, it should provide ways of expressing minimum precision or other ways of loosening the accuracy where needed. I'm not really sure which group the current semantics appeal to. I personally want either reproducible or very fast...

August 25, 2015
On Tuesday, 25 August 2015 at 18:15:03 UTC, Ola Fosheim Grøstad wrote:

> It is a practical issue in all simulations where you want reproducible results. If D is meant for scientific computing it should support correct rounding and reproducible results. If D is meant for gaming it should provide ways of expressing minimum precision or other ways of loosening the accuracy where needed.  I'm not really sure which group the current semantics appeals to. I personally either want reproducible or very fast...

As long as it doesn't change from one release of the compiler to the next, we have reproducibility. In many cases, though, reproducibility doesn't mean exact reproducibility; at least in the old days it didn't, due to floating-point issues. You generally want to allow for replication of the results in other languages, so you have to allow for some differences.

I'm pretty sure Walter has stated the reason that you cannot count on exact precision, but I don't remember what it is.
August 25, 2015
On Tuesday, 25 August 2015 at 15:19:41 UTC, Márcio Martins wrote:
>
> If you compile it with *GDC* it works fine. If you compile a port with clang, gcc or msvc, it works right as well. I suspect it will also work fine with LDC.

The same program "fails" in gcc too, if you use x87 math. C compilers usually allow excess precision for intermediate results, because the extra precision seldom hurts and changing the precision on x87 is very expensive (it depends on the CPU, but on some models it is more expensive than the trigonometric functions).

August 25, 2015
On Tuesday, 25 August 2015 at 20:00:11 UTC, bachmeier wrote:
> On Tuesday, 25 August 2015 at 18:15:03 UTC, Ola Fosheim Grøstad wrote:
>
> I'm pretty sure Walter has stated the reason that you cannot count on exact precision, but I don't remember what it is.

Probably because DMD is spewing out x87 code. The x87 FPU converts everything to its internal working bit depth before it does the math op. You can set it to work at different bit depths, but IIRC it's a fairly expensive operation to change the FPU flags. You really don't want to be doing that every time someone mixes a double and a float.

The compilers that don't exhibit this problem might set the x87 to work at 64 bits at startup, or more likely they are using scalar SSE. You can't mix operands of different depths in SSE. You can't multiply a float by a double, for example; you have to convert one of them so they have the same type. So in SSE the bit depth of every op is always explicit.
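D mirrors this at the type level: mixing float and double promotes the float operand, so the width of every operation is explicit in the types (a small illustration):

void main()
{
    float  f = 1.5f;
    double d = 2.5;
    auto p = f * d; // f is converted to double before the multiply
    static assert(is(typeof(p) == double));
}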
August 25, 2015
On Tuesday, 25 August 2015 at 21:30:03 UTC, Warwick wrote:

> The compilers that don't exhibit this problem might set the x87 to work at 64 bits at startup, or more likely they are using scalar SSE. You can't mix operands of different depths in SSE. You can't multiply a float by a double, for example; you have to convert one of them so they have the same type. So in SSE the bit depth of every op is always explicit.

True words:

This is msvc compiler generated code (default configuration, debug):

    double x = 1.2;
012F174E  movsd       xmm0,mmword ptr ds:[12F6B30h]
012F1756  movsd       mmword ptr [x],xmm0
    unsigned long long u = (unsigned long long)(x * 10);
012F175B  movsd       xmm0,mmword ptr [x]
012F1760  mulsd       xmm0,mmword ptr ds:[12F6B40h]
012F1768  call        __dtoul3 (012F102Dh)
012F176D  mov         dword ptr [u],eax
012F1770  mov         dword ptr [ebp-18h],edx