August 05, 2016 Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?

Posted in reply to deadalnix

On Friday, 5 August 2016 at 08:43:48 UTC, deadalnix wrote:
> On Friday, 5 August 2016 at 08:17:00 UTC, Ilya Yaroshenko wrote:
>> 1. Could you please provide an assembler example with clang or recent gcc?
>
> I have better: compile your favorite project with -Wdouble-promotion and enjoy the rain of warnings.
>
> But try it yourself:
>
> float foo(float a, float b) {
> return 3.0 * a / b;
> }
Your example is just speculation. The 3.0 literal forces the compiler to convert a and b to double. This is obvious.
August 05, 2016 Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?

Posted in reply to Ilya Yaroshenko

On 8/5/2016 12:43 AM, Ilya Yaroshenko wrote:
> You are wrong that there are far fewer of those cases. This is a naive point of
> view. A lot of netlib math functions require exact IEEE arithmetic. Tinflex
> requires it. The Python C backend and the Mir library require exact IEEE arithmetic.
> The Atmosphere package requires it; Atmosphere is used as reference code for my
> publication in JMS, Springer. And the most important case: not one top scientific
> laboratory will use a language without exact IEEE arithmetic by default.
A library has a lot of algorithms in it; a library requiring exact IEEE arithmetic doesn't mean every algorithm in it does. None of the Phobos math library functions require it, and as far as I can tell they are correct out to the last bit.
And besides, all these libraries presumably work, or used to work, on the x87, which does not provide exact IEEE arithmetic for intermediate results without a special setting, and that setting substantially slows it down.
By netlib do you mean the Cephes functions? I've used them, and am not aware of any of them that require reduced precision.
August 05, 2016 Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?

Posted in reply to Ilya Yaroshenko

On 8/5/2016 1:17 AM, Ilya Yaroshenko wrote:
> 2. C compilers do not promote double to 80-bit reals anyway.
Java originally came out with an edict that float operations will all be done in float precision, and double in double.
Sun had evidently never used an x87 before, because it soon became obvious that it was unworkable on the x87, and the spec was changed to allow intermediate values out to 80 bits.
August 05, 2016 Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
Posted in reply to deadalnix

On Friday, 5 August 2016 at 08:43:48 UTC, deadalnix wrote:
> On Friday, 5 August 2016 at 08:17:00 UTC, Ilya Yaroshenko wrote:
>> 1. Could you please provide an assembler example with clang or recent gcc?
>
> I have better: compile your favorite project with -Wdouble-promotion and enjoy the rain of warnings.
>
> But try it yourself:
>
> float foo(float a, float b) {
>     return 3.0 * a / b;
> }
>
> GCC 5.3 gives me
>
> foo(float, float):
>     cvtss2sd xmm0, xmm0
>     cvtss2sd xmm1, xmm1
>     mulsd    xmm0, QWORD PTR .LC0[rip]
>     divsd    xmm0, xmm1
>     cvtsd2ss xmm0, xmm0
>     ret
> .LC0:
>     .long 0
>     .long 1074266112

Gotta be careful with those examples. See this: https://godbolt.org/g/0yNUSG

float foo1(float a, float b) { return 3.42 * (a / b); }
float foo2(float a, float b) { return 3.0 * (a / b); }
float foo3(float a, float b) { return 3.0 * a / b; }
float foo4(float a, float b) { return 3.0f * a / b; }

foo1(float, float):
    divss    xmm0, xmm1
    cvtss2sd xmm0, xmm0
    mulsd    xmm0, QWORD PTR .LC0[rip]
    cvtsd2ss xmm0, xmm0
    ret
foo2(float, float):
    divss    xmm0, xmm1
    mulss    xmm0, DWORD PTR .LC2[rip]
    ret
foo3(float, float):
    cvtss2sd xmm0, xmm0
    cvtss2sd xmm1, xmm1
    mulsd    xmm0, QWORD PTR .LC4[rip]
    divsd    xmm0, xmm1
    cvtsd2ss xmm0, xmm0
    ret
foo4(float, float):
    mulss    xmm0, DWORD PTR .LC2[rip]
    divss    xmm0, xmm1
    ret
.LC0:
    .long 4123168604
    .long 1074486312
.LC2:
    .long 1077936128
.LC4:
    .long 0
    .long 1074266112

It depends on the literal value, not just the type.
August 05, 2016 Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?

Posted in reply to Ilya Yaroshenko

On 8/5/2016 12:43 AM, Ilya Yaroshenko wrote:
> You are wrong that there are far fewer of those cases. This is a naive point of
> view. A lot of netlib math functions require exact IEEE arithmetic. Tinflex
> requires it. The Python C backend and the Mir library require exact IEEE arithmetic.
> The Atmosphere package requires it; Atmosphere is used as reference code for my
> publication in JMS, Springer. And the most important case: not one top scientific
> laboratory will use a language without exact IEEE arithmetic by default.
I'd appreciate it if you could provide links to where these requirements are. I can't find anything on Tinflex, for example.
August 05, 2016 Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
Posted in reply to Walter Bright

On Friday, 5 August 2016 at 09:24:49 UTC, Walter Bright wrote:
> On 8/5/2016 12:43 AM, Ilya Yaroshenko wrote:
>> You are wrong that there are far fewer of those cases. This is a naive point of
>> view. A lot of netlib math functions require exact IEEE arithmetic. Tinflex
>> requires it. The Python C backend and the Mir library require exact IEEE arithmetic.
>> The Atmosphere package requires it; Atmosphere is used as reference code for my
>> publication in JMS, Springer. And the most important case: not one top scientific
>> laboratory will use a language without exact IEEE arithmetic by default.
>
> A library has a lot of algorithms in it; a library requiring exact IEEE arithmetic doesn't mean every algorithm in it does. None of the Phobos math library functions require it, and as far as I can tell they are correct out to the last bit.

No. For example, std.math.log requires it! But you don't care about other compilers which do not use yl2x, or about making it a template (the real version slows down the code for double and float).

> And besides, all these libraries presumably work, or used to work, on the x87, which does not provide exact IEEE arithmetic for intermediate results without a special setting, and that setting substantially slows it down.

The x87 FPU is deprecated. We have more significant performance issues with std.math. For example, it is x87 oriented, which slows down the code for double and float. Many functions are not inlined. These two are the real performance problems.

> By netlib do you mean the Cephes functions? I've used them, and am not aware of any of them that require reduced precision.

Yes, and many of its functions require IEEE. For example, log2.c for doubles:

    z = x - 0.5;
    z -= 0.5;
    y = 0.5 * x + 0.5;

This code requires IEEE. The same code appears in our std.math :P
August 05, 2016 Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
Posted in reply to Walter Bright

On Friday, 5 August 2016 at 09:40:23 UTC, Walter Bright wrote:
> On 8/5/2016 12:43 AM, Ilya Yaroshenko wrote:
>> You are wrong that there are far fewer of those cases. This is a naive point of
>> view. A lot of netlib math functions require exact IEEE arithmetic. Tinflex
>> requires it. The Python C backend and the Mir library require exact IEEE arithmetic.
>> The Atmosphere package requires it; Atmosphere is used as reference code for my
>> publication in JMS, Springer. And the most important case: not one top scientific
>> laboratory will use a language without exact IEEE arithmetic by default.
>
> I'd appreciate it if you could provide links to where these requirements are. I can't find anything on Tinflex, for example.

1. https://www.python.org/ftp/python/3.5.2/Python-3.5.2.tgz
   In mathmodule.c, math_fsum has the comment: "Depends on IEEE 754 arithmetic guarantees and half-even rounding." The same algorithm is also available in Mir. And it does not work with 32-bit DMD.

2. sum_kbn in https://github.com/JuliaLang/julia/blob/master/base/reduce.jl requires IEEE arithmetic. The same algorithm is also available in Mir.

3. http://www.netlib.org/cephes/
   See log2.c for example:

       z = x - 0.5;
       z -= 0.5;
       y = 0.5 * x + 0.5;

   This code requires IEEE. And you can find it in Phobos std.math.

4. Mir has 5 types of summation, and 3 of them require IEEE.

5. Tinflex requires IEEE arithmetic because extended precision may force the algorithm to choose the wrong computation branch. The most significant part of the original code was written in R, and the scientists who created this algorithm did not care about non-IEEE compilers at all.

6. Atmosphere requires IEEE for many functions, such as https://github.com/9il/atmosphere/blob/master/source/atmosphere/math.d#L616 and without proper IEEE rounding there is no guarantee that these functions will stop.

7. The same is true for real-world implementations of algorithms presented in Numerical Recipes, which use various series expansions, such as for the Modified Bessel Function.
August 05, 2016 Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
Posted in reply to Walter Bright

Here is a relevant example: https://hal.inria.fr/inria-00171497v1/document

It is used in at least one real-world geometric modeling system.
August 05, 2016 Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
Posted in reply to Ilya Yaroshenko

On Friday, 5 August 2016 at 09:40:59 UTC, Ilya Yaroshenko wrote:
> On Friday, 5 August 2016 at 09:24:49 UTC, Walter Bright wrote:
>> On 8/5/2016 12:43 AM, Ilya Yaroshenko wrote:
>>> You are wrong that there are far fewer of those cases. This is a naive point of
>>> view. A lot of netlib math functions require exact IEEE arithmetic. Tinflex
>>> requires it. The Python C backend and the Mir library require exact IEEE arithmetic.
>>> The Atmosphere package requires it; Atmosphere is used as reference code for my
>>> publication in JMS, Springer. And the most important case: not one top scientific
>>> laboratory will use a language without exact IEEE arithmetic by default.
>>
>> A library has a lot of algorithms in it; a library requiring exact IEEE arithmetic doesn't mean every algorithm in it does. None of the Phobos math library functions require it, and as far as I can tell they are correct out to the last bit.
>
> No. For example, std.math.log requires it! But you don't care about other compilers which do not use yl2x, or about making it a template (the real version slows down the code for double and float).

Yep.

1) There are some functions (exp, pow, log, round, sqrt) for which using llvm_intrinsics significantly increases your performance. It's a simple benchmark and might be flawed, but I hope it shows the point.

Code is here: https://gist.github.com/wilzbach/2b64e10dec66a3153c51fbd1e6848f72

> ldmd -inline -release -O3 -boundscheck=off test.d

fun: pow
std.math.pow   = 15 secs, 914 ms, 102 μs, and 8 hnsecs
core.stdc.pow  = 11 secs, 590 ms, 702 μs, and 5 hnsecs
llvm_pow       = 13 secs, 570 ms, 439 μs, and 7 hnsecs

fun: exp
std.math.exp   = 6 secs, 85 ms, 741 μs, and 7 hnsecs
core.stdc.exp  = 16 secs, 267 ms, 997 μs, and 4 hnsecs
llvm_exp       = 2 secs, 22 ms, and 876 μs

fun: exp2
std.math.exp2  = 3 secs, 117 ms, 624 μs, and 2 hnsecs
core.stdc.exp2 = 2 secs, 973 ms, and 243 μs
llvm_exp2      = 2 secs, 451 ms, 628 μs, and 9 hnsecs

fun: sin
std.math.sin   = 1 sec, 805 ms, 626 μs, and 7 hnsecs
core.stdc.sin  = 17 secs, 743 ms, 33 μs, and 5 hnsecs
llvm_sin       = 2 secs, 95 ms, and 178 μs

fun: cos
std.math.cos   = 2 secs, 820 ms, 684 μs, and 5 hnsecs
core.stdc.cos  = 17 secs, 626 ms, 78 μs, and 1 hnsec
llvm_cos       = 2 secs, 814 ms, 60 μs, and 5 hnsecs

fun: log
std.math.log   = 5 secs, 584 ms, 344 μs, and 5 hnsecs
core.stdc.log  = 16 secs, 443 ms, 893 μs, and 3 hnsecs
llvm_log       = 2 secs, 13 ms, 291 μs, and 1 hnsec

fun: log2
std.math.log2  = 5 secs, 583 ms, 777 μs, and 7 hnsecs
core.stdc.log2 = 2 secs, 800 ms, 848 μs, and 5 hnsecs
llvm_log2      = 2 secs, 165 ms, 849 μs, and 6 hnsecs

fun: sqrt
std.math.sqrt  = 799 ms and 917 μs
core.stdc.sqrt = 864 ms, 834 μs, and 7 hnsecs
llvm_sqrt      = 439 ms, 469 μs, and 2 hnsecs

fun: ceil
std.math.ceil  = 540 ms and 167 μs
core.stdc.ceil = 971 ms, 533 μs, and 6 hnsecs
llvm_ceil      = 562 ms, 490 μs, and 2 hnsecs

fun: round
std.math.round  = 3 secs, 52 ms, 567 μs, and 3 hnsecs
core.stdc.round = 958 ms and 217 μs
llvm_round      = 590 ms, 742 μs, and 7 hnsecs

2) As mentioned before, they can yield _different_ results: https://dpaste.dzfl.pl/c0ab5131b49d
August 05, 2016 Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?

Posted in reply to Ilya Yaroshenko

On Friday, 5 August 2016 at 09:21:53 UTC, Ilya Yaroshenko wrote:
> On Friday, 5 August 2016 at 08:43:48 UTC, deadalnix wrote:
>> On Friday, 5 August 2016 at 08:17:00 UTC, Ilya Yaroshenko wrote:
>>> 1. Could you please provide an assembler example with clang or recent gcc?
>>
>> I have better: compile your favorite project with -Wdouble-promotion and enjoy the rain of warnings.
>>
>> But try it yourself:
>>
>> float foo(float a, float b) {
>> return 3.0 * a / b;
>> }
>
> Your example is just speculation. The 3.0 literal forces the compiler to convert a and b to double. This is obvious.
Ha, you are right.

Testing more, it seems that gcc and clang are not promoting in 64-bit code, but still are in 32-bit.
Copyright © 1999-2021 by the D Language Foundation