August 05, 2016
On Friday, 5 August 2016 at 08:43:48 UTC, deadalnix wrote:
> On Friday, 5 August 2016 at 08:17:00 UTC, Ilya Yaroshenko wrote:
>> 1. Could you please provide an assembler example with clang or recent gcc?
>
> I have better: compile your favorite project with -Wdouble-promotion and enjoy the rain of warnings.
>
> But try it yourself:
>
> float foo(float a, float b) {
>   return 3.0 * a / b;
> }

Your example is just a speculation. 3.0 force compiler to convert a and  b to double. This is obvious.
August 05, 2016
On 8/5/2016 12:43 AM, Ilya Yaroshenko wrote:
> You are wrong that there are far fewer of those cases. This is naive point of
> view. A lot of netlib math functions require exact IEEE arithmetic. Tinflex
> requires it. Python C backend and Mir library require exact IEEE arithmetic.
> Atmosphere package requires it, Atmosphere is used as reference code for my
> publication in JMS, Springer. And the most important case: no one top scientific
> laboratory will use a language without exact IEEE arithmetic by default.

A library has a lot of algorithms in it, a library requiring exact IEEE arithmetic doesn't mean every algorithm in it does. None of the Phobos math library functions require it, and as far as I can tell they are correct out to the last bit.

And besides, all these libraries presumably work, or used to work, on the x87, which does not provide exact IEEE arithmetic for intermediate results without a special setting, and that setting substantially slows it down.

By netlib do you mean the Cephes functions? I've used them, and am not aware of any of them that require reduced precision.
August 05, 2016
On 8/5/2016 1:17 AM, Ilya Yaroshenko wrote:
> 2. C compilers not promote double to 80-bit reals anyway.

Java originally came out with an edict that floats will all be done in float precision, and double in double.

Sun had evidently never used an x87 before, because it soon became obvious that it was unworkable on the x87, and the spec was changed to allow intermediate values out to 80 bits.
August 05, 2016
On Friday, 5 August 2016 at 08:43:48 UTC, deadalnix wrote:
> On Friday, 5 August 2016 at 08:17:00 UTC, Ilya Yaroshenko wrote:
>> 1. Could you please provide an assembler example with clang or recent gcc?
>
> I have better: compile your favorite project with -Wdouble-promotion and enjoy the rain of warnings.
>
> But try it yourself:
>
> float foo(float a, float b) {
>   return 3.0 * a / b;
> }
>
> GCC 5.3 gives me
>
> foo(float, float):
>         cvtss2sd        xmm0, xmm0
>         cvtss2sd        xmm1, xmm1
>         mulsd   xmm0, QWORD PTR .LC0[rip]
>         divsd   xmm0, xmm1
>         cvtsd2ss        xmm0, xmm0
>         ret
> .LC0:
>         .long   0
>         .long   1074266112
>

Gotta be careful with those examples. See this: https://godbolt.org/g/0yNUSG

float foo1(float a, float b) {
  return 3.42 * (a / b);
}

float foo2(float a, float b) {
  return 3.0 * (a / b);
}

float foo3(float a, float b) {
  return 3.0 * a / b;
}

float foo4(float a, float b) {
  return 3.0f * a / b;
}

foo1(float, float):
        divss   xmm0, xmm1
        cvtss2sd        xmm0, xmm0
        mulsd   xmm0, QWORD PTR .LC0[rip]
        cvtsd2ss        xmm0, xmm0
        ret
foo2(float, float):
        divss   xmm0, xmm1
        mulss   xmm0, DWORD PTR .LC2[rip]
        ret
foo3(float, float):
        cvtss2sd        xmm0, xmm0
        cvtss2sd        xmm1, xmm1
        mulsd   xmm0, QWORD PTR .LC4[rip]
        divsd   xmm0, xmm1
        cvtsd2ss        xmm0, xmm0
        ret
foo4(float, float):
        mulss   xmm0, DWORD PTR .LC2[rip]
        divss   xmm0, xmm1
        ret
.LC0:
        .long   4123168604
        .long   1074486312
.LC2:
        .long   1077936128
.LC4:
        .long   0
        .long   1074266112


It depends on the literal value, not just the type.
August 05, 2016
On 8/5/2016 12:43 AM, Ilya Yaroshenko wrote:
> You are wrong that there are far fewer of those cases. This is naive point of
> view. A lot of netlib math functions require exact IEEE arithmetic. Tinflex
> requires it. Python C backend and Mir library require exact IEEE arithmetic.
> Atmosphere package requires it, Atmosphere is used as reference code for my
> publication in JMS, Springer. And the most important case: no one top scientific
> laboratory will use a language without exact IEEE arithmetic by default.

I'd appreciate it if you could provide links to where these requirements are. I can't find anything on Tinflex, for example.
August 05, 2016
On Friday, 5 August 2016 at 09:24:49 UTC, Walter Bright wrote:
> On 8/5/2016 12:43 AM, Ilya Yaroshenko wrote:
>> You are wrong that there are far fewer of those cases. This is naive point of
>> view. A lot of netlib math functions require exact IEEE arithmetic. Tinflex
>> requires it. Python C backend and Mir library require exact IEEE arithmetic.
>> Atmosphere package requires it, Atmosphere is used as reference code for my
>> publication in JMS, Springer. And the most important case: no one top scientific
>> laboratory will use a language without exact IEEE arithmetic by default.
>
> A library has a lot of algorithms in it, a library requiring exact IEEE arithmetic doesn't mean every algorithm in it does. None of the Phobos math library functions require it, and as far as I can tell they are correct out to the last bit.

No. For example std.math.log requires it! But you don't care about other compilers which not use yl2x and about making it template (real version slows down code for double and float).

> And besides, all these libraries presumably work, or used to work, on the x87, which does not provide exact IEEE arithmetic for intermediate results without a special setting, and that setting substantially slows it down.

x87 FPU is deprecated. We have more significant performance issues with std.math. For example, it is x87 oriented, is slows down the code for double and float. Many functions are not inlined. This 2 are real performance problems.

> By netlib do you mean the Cephes functions? I've used them, and am not aware of any of them that require reduced precision.

Yes and many of its functions requires IEEE. For example log2.c for doubles:
	z = x - 0.5;
	z -= 0.5;
	y = 0.5 * x  + 0.5;
This code requires IEEE. The same code appears in our std.math :P

August 05, 2016
On Friday, 5 August 2016 at 09:40:23 UTC, Walter Bright wrote:
> On 8/5/2016 12:43 AM, Ilya Yaroshenko wrote:
>> You are wrong that there are far fewer of those cases. This is naive point of
>> view. A lot of netlib math functions require exact IEEE arithmetic. Tinflex
>> requires it. Python C backend and Mir library require exact IEEE arithmetic.
>> Atmosphere package requires it, Atmosphere is used as reference code for my
>> publication in JMS, Springer. And the most important case: no one top scientific
>> laboratory will use a language without exact IEEE arithmetic by default.
>
> I'd appreciate it if you could provide links to where these requirements are. I can't find anything on Tinflex, for example.

1. https://www.python.org/ftp/python/3.5.2/Python-3.5.2.tgz
mathmodule.c, math_fsum has comment:
   Depends on IEEE 754 arithmetic guarantees and half-even rounding.
The same algorithm also available in Mir. And it does not work with 32 bit DMD.

2. sum_kbn in https://github.com/JuliaLang/julia/blob/master/base/reduce.jl
requires ieee arithmetic. The same algorithm also available in Mir.

3. http://www.netlib.org/cephes/
See log2.c for example:
	z = x - 0.5;
	z -= 0.5;
	y = 0.5 * x  + 0.5;
This code requires IEEE. And you can found it in Phobos std.math

4. Mir has 5 types of summation, and 3 of them requires IEEE.

5. Tinflex requires IEEE arithmetic because extended precision may force algorithm to choose wrong computation branch. The most significant part of original code was written in R, and the scientists who create this algorithm did not care about non IEEE compilers at all.

6. Atmosphere requires IEEE for may functions such as https://github.com/9il/atmosphere/blob/master/source/atmosphere/math.d#L616
Without proper IEEE rounding the are not guarantee that this functions will stop.

7. The same true for real world implementations of algorithms presented in Numeric Recipes, which uses various series expansion such as for Modified Bessel Function.


August 05, 2016
Here is a relevant example:

https://hal.inria.fr/inria-00171497v1/document

It is used in at least one real world geometric modeling system.
August 05, 2016
On Friday, 5 August 2016 at 09:40:59 UTC, Ilya Yaroshenko wrote:
> On Friday, 5 August 2016 at 09:24:49 UTC, Walter Bright wrote:
>> On 8/5/2016 12:43 AM, Ilya Yaroshenko wrote:
>>> You are wrong that there are far fewer of those cases. This is naive point of
>>> view. A lot of netlib math functions require exact IEEE arithmetic. Tinflex
>>> requires it. Python C backend and Mir library require exact IEEE arithmetic.
>>> Atmosphere package requires it, Atmosphere is used as reference code for my
>>> publication in JMS, Springer. And the most important case: no one top scientific
>>> laboratory will use a language without exact IEEE arithmetic by default.
>>
>> A library has a lot of algorithms in it, a library requiring exact IEEE arithmetic doesn't mean every algorithm in it does. None of the Phobos math library functions require it, and as far as I can tell they are correct out to the last bit.
>
> No. For example std.math.log requires it! But you don't care about other compilers which not use yl2x and about making it template (real version slows down code for double and float).

Yep.

1) There are some function (exp, pow, log, round, sqrt) for which using llvm_intrinsincs significantly increases your performance.

It's a simple benchmark and might be flawed, but I hope it shows the point.

Code is here: https://gist.github.com/wilzbach/2b64e10dec66a3153c51fbd1e6848f72

> ldmd -inline -release -O3 -boundscheck=off test.d

fun: pow
std.math.pow   = 15 secs, 914 ms, 102 μs, and 8 hnsecs
core.stdc.pow  = 11 secs, 590 ms, 702 μs, and 5 hnsecs
llvm_pow       = 13 secs, 570 ms, 439 μs, and 7 hnsecs
fun: exp
std.math.exp   = 6 secs, 85 ms, 741 μs, and 7 hnsecs
core.stdc.exp  = 16 secs, 267 ms, 997 μs, and 4 hnsecs
llvm_exp       = 2 secs, 22 ms, and 876 μs
fun: exp2
std.math.exp2  = 3 secs, 117 ms, 624 μs, and 2 hnsecs
core.stdc.exp2 = 2 secs, 973 ms, and 243 μs
llvm_exp2      = 2 secs, 451 ms, 628 μs, and 9 hnsecs
fun: sin
std.math.sin   = 1 sec, 805 ms, 626 μs, and 7 hnsecs
core.stdc.sin  = 17 secs, 743 ms, 33 μs, and 5 hnsecs
llvm_sin       = 2 secs, 95 ms, and 178 μs
fun: cos
std.math.cos   = 2 secs, 820 ms, 684 μs, and 5 hnsecs
core.stdc.cos  = 17 secs, 626 ms, 78 μs, and 1 hnsec
llvm_cos       = 2 secs, 814 ms, 60 μs, and 5 hnsecs
fun: log
std.math.log   = 5 secs, 584 ms, 344 μs, and 5 hnsecs
core.stdc.log  = 16 secs, 443 ms, 893 μs, and 3 hnsecs
llvm_log       = 2 secs, 13 ms, 291 μs, and 1 hnsec
fun: log2
std.math.log2  = 5 secs, 583 ms, 777 μs, and 7 hnsecs
core.stdc.log2 = 2 secs, 800 ms, 848 μs, and 5 hnsecs
llvm_log2      = 2 secs, 165 ms, 849 μs, and 6 hnsecs
fun: sqrt
std.math.sqrt  = 799 ms and 917 μs
core.stdc.sqrt = 864 ms, 834 μs, and 7 hnsecs
llvm_sqrt      = 439 ms, 469 μs, and 2 hnsecs
fun: ceil
std.math.ceil  = 540 ms and 167 μs
core.stdc.ceil = 971 ms, 533 μs, and 6 hnsecs
llvm_ceil      = 562 ms, 490 μs, and 2 hnsecs
fun: round
std.math.round = 3 secs, 52 ms, 567 μs, and 3 hnsecs
core.stdc.round = 958 ms and 217 μs
llvm_round     = 590 ms, 742 μs, and 7 hnsecs


2) As mentioned before they can yield _different_ results

https://dpaste.dzfl.pl/c0ab5131b49d
August 05, 2016
On Friday, 5 August 2016 at 09:21:53 UTC, Ilya Yaroshenko wrote:
> On Friday, 5 August 2016 at 08:43:48 UTC, deadalnix wrote:
>> On Friday, 5 August 2016 at 08:17:00 UTC, Ilya Yaroshenko wrote:
>>> 1. Could you please provide an assembler example with clang or recent gcc?
>>
>> I have better: compile your favorite project with -Wdouble-promotion and enjoy the rain of warnings.
>>
>> But try it yourself:
>>
>> float foo(float a, float b) {
>>   return 3.0 * a / b;
>> }
>
> Your example is just a speculation. 3.0 force compiler to convert a and  b to double. This is obvious.

Ha you are right.

Testing more it seems that gcc and clang are not promoting on 64 bit code, but still are on 32 bits.