March 18, 2018 [Issue 18627] std.complex is a lot slower than builtin complex types at number crunching
https://issues.dlang.org/show_bug.cgi?id=18627

greenify <greeenify@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |greeenify@gmail.com

--- Comment #1 from greenify <greeenify@gmail.com> ---
See also: https://github.com/dlang/dmd/pull/7640
March 18, 2018 [Issue 18627] std.complex is a lot slower than builtin complex types at number crunching
https://issues.dlang.org/show_bug.cgi?id=18627

--- Comment #2 from ponce <aliloko@gmail.com> ---
I've posted there, thanks.
March 20, 2018 [Issue 18627] std.complex is a lot slower than builtin complex types at number crunching
https://issues.dlang.org/show_bug.cgi?id=18627

Iain Buclaw <ibuclaw@gdcproject.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ibuclaw@gdcproject.org

--- Comment #3 from Iain Buclaw <ibuclaw@gdcproject.org> ---
FYI, GDC is missing, but I'll post it anyway, along with DMD as a comparative benchmark, because each machine is different and DMD may optimize weirdly for one CPU but is perfectly fine for another (see for instance issue 5100).

DMD64 D Compiler v2.076.1
---
$ dmd complex.d -O -inline -release
With cfloat: 75 ms, 688 μs, and 2 hnsecs
With cdouble: 61 ms, 546 μs, and 7 hnsecs
With Complex!float: 161 ms, 816 μs, and 8 hnsecs
With Complex!double: 109 ms, 66 μs, and 1 hnsec
---

There seems to be room for improvement in dmd or the general phobos implementation.

gdc (GCC) 8.0.1 20180226 (2.076.1 library and patches)
---
$ gdc complex.d -O2 -frelease
With cfloat: 154 ms, 871 μs, and 8 hnsecs
With cdouble: 59 ms, 205 μs, and 7 hnsecs
With Complex!float: 32 ms, 566 μs, and 5 hnsecs
With Complex!double: 34 ms, 961 μs, and 6 hnsecs
---

However with gdc, std.complex is /faster/ than native.
March 20, 2018 [Issue 18627] std.complex is a lot slower than builtin complex types at number crunching
https://issues.dlang.org/show_bug.cgi?id=18627

--- Comment #4 from ponce <aliloko@gmail.com> ---
This benchmark is a variation that does only division.

--- divide.d
import std.string;
import std.datetime;
import std.datetime.stopwatch : benchmark, StopWatch;
import std.complex;
import std.stdio;
import std.math;

void main()
{
    int[] divider = new int[1024];
    cfloat[] A = new cfloat[1024];
    cdouble[] B = new cdouble[1024];
    Complex!float[] C = new Complex!float[1024];
    Complex!double[] D = new Complex!double[1024];

    foreach(i; 0..1024)
    {
        divider[i] = (i*69060) / 10000;

        // Initialize with something
        A[i] = i + 1i;
        B[i] = i + 1i;
        C[i] = Complex!float(i, 1);
        D[i] = Complex!double(i, 1);
    }

    void justDivide(ComplexType)(ComplexType[] arr)
    {
        int size = cast(int)(arr.length);
        for (int i = 0; i < size; ++i)
        {
            arr[i] = divider[i] / arr[i];
        }
    }

    void fA() { justDivide!(cfloat)(A); }
    void fB() { justDivide!(cdouble)(B); }
    void fC() { justDivide!(Complex!float)(C); }
    void fD() { justDivide!(Complex!double)(D); }

    auto r = benchmark!(fA, fB, fC, fD)(1000000);
    {
        writefln("With cfloat: %s", r[0] );
        writefln("With cdouble: %s", r[1] );
        writefln("With Complex!float: %s", r[2] );
        writefln("With Complex!double: %s", r[3] );
    }
}
---

RESULTS

* With ldc 1.8.0 64-bit:

$ ldc2.exe -O3 -enable-inlining -release divide.d -m64
$ divide.exe
With cfloat: 7 secs, 623 ms, 829 μs, and 9 hnsecs
With cdouble: 7 secs, 594 ms, 449 μs, and 8 hnsecs
With Complex!float: 7 secs, 988 ms, 642 μs, and 4 hnsecs
With Complex!double: 15 secs, 501 ms, 128 μs, and 4 hnsecs

* With ldc 1.8.0 32-bit:

$ ldc2.exe -O3 -enable-inlining -release divide.d -m32
$ divide.exe
With cfloat: 7 secs, 618 ms, 202 μs, and 1 hnsec
With cdouble: 7 secs, 593 ms, 777 μs, and 2 hnsecs
With Complex!float: 7 secs, 958 ms, 692 μs, and 9 hnsecs
With Complex!double: 15 secs, 414 ms, and 344 μs

This shows that even with the latest LDC you can have a regression.

I appreciate that std.complex gives more precision in the divide operation, but that also makes it behave _differently_ from the builtin complex types it replaces.
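To make the precision point concrete, here is a minimal sketch comparing std.complex division with the textbook formula computed by hand. It assumes that std.complex scales the operands before dividing, and that double arithmetic is rounded to 64 bits (as on typical x86-64 SSE builds); where intermediates are kept in wider x87 registers, the textbook result may differ. The operand values are chosen purely for illustration.

```d
import std.complex : Complex;
import std.stdio : writeln;

void main()
{
    // Operands whose squared magnitude exceeds double.max (about 1.8e308).
    auto a = Complex!double(1e200, 1e200);
    auto b = Complex!double(1e200, 1e200);

    // Textbook formula: the denominator re^2 + im^2 can overflow to infinity
    // when evaluated in 64-bit double precision (assumption, see above).
    immutable double denom = b.re * b.re + b.im * b.im;
    immutable double naiveRe = (a.re * b.re + a.im * b.im) / denom;

    // std.complex division on the same operands; mathematically a / b == 1.
    auto q = a / b;

    writeln("textbook denominator: ", denom);
    writeln("textbook real part:   ", naiveRe);
    writeln("std.complex quotient: ", q);
}
```

The extra comparison and scaling is what costs time in a tight loop, which is the trade-off this thread is about.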
March 20, 2018 [Issue 18627] std.complex is a lot slower than builtin complex types at number crunching
https://issues.dlang.org/show_bug.cgi?id=18627

--- Comment #5 from ponce <aliloko@gmail.com> ---
Division with DMD 32-bit:

With cfloat: 1 minute, 18 secs, 451 ms, 932 μs, and 9 hnsecs
With cdouble: 1 minute, 19 secs, 747 ms, 70 μs, and 5 hnsecs
With Complex!float: 27 secs, 412 ms, 926 μs, and 5 hnsecs
With Complex!double: 25 secs, 39 ms, 159 μs, and 2 hnsecs
March 20, 2018 [Issue 18627] std.complex is a lot slower than builtin complex types at number crunching
https://issues.dlang.org/show_bug.cgi?id=18627

--- Comment #6 from ponce <aliloko@gmail.com> ---
Conversely, with DMD, complex division seems faster with std.complex than with the builtins.
March 20, 2018 [Issue 18627] std.complex is a lot slower than builtin complex types at number crunching
https://issues.dlang.org/show_bug.cgi?id=18627

Seb <greensunny12@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |greensunny12@gmail.com

--- Comment #7 from Seb <greensunny12@gmail.com> ---
> Division with DMD 32-bit:

Using DMD for any performance arguments is a bit of a moot point as DMD's optimizer is pretty bad. So this would halt almost all development as there are many many performance regressions with DMD.
March 20, 2018 [Issue 18627] std.complex is a lot slower than builtin complex types at number crunching
https://issues.dlang.org/show_bug.cgi?id=18627

--- Comment #8 from ponce <aliloko@gmail.com> ---
@Seb: It's not only about DMD, there is a 2x performance regression with Complex!double vs cdouble using LDC. There are probably more I haven't exposed yet.

And yes, I use cdouble for designing IIR filters, in a real-time program. Our main product uses builtin complexes; it's downloaded 2000 times per month.
March 22, 2018 [Issue 18627] std.complex is a lot slower than builtin complex types at number crunching
https://issues.dlang.org/show_bug.cgi?id=18627

--- Comment #9 from ponce <aliloko@gmail.com> ---
I think at the very least std.complex should contain a function to divide Complex values without the additional precision provided by the check with the 2 fabs(). People who want speed could opt in, and others would enjoy increased precision without noticing.
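A rough sketch of what such an opt-in function might look like. The name `fastDivide` is hypothetical and not part of Phobos; it simply applies the textbook formula (a + bi)/(c + di) = ((ac + bd) + (bc - ad)i)/(c² + d²) with no magnitude comparison, so it trades robustness against overflow and cancellation for fewer branches.

```d
import std.complex : Complex;
import std.math : fabs;

/// Hypothetical opt-in helper (not part of Phobos): textbook complex
/// division without comparing the magnitudes of the divisor's components.
/// Cheaper per call, but it can overflow or lose accuracy when the divisor's
/// components are very large, very small, or of very different magnitude.
Complex!T fastDivide(T)(Complex!T a, Complex!T b) @safe pure nothrow @nogc
{
    immutable T denom = b.re * b.re + b.im * b.im;
    return Complex!T((a.re * b.re + a.im * b.im) / denom,
                     (a.im * b.re - a.re * b.im) / denom);
}

void main()
{
    auto a = Complex!double(1, 2);
    auto b = Complex!double(3, 4);

    auto fast = fastDivide(a, b); // opt-in fast path
    auto safe = a / b;            // default std.complex division

    // For well-scaled inputs the two paths agree to within rounding error.
    assert(fabs(fast.re - safe.re) < 1e-12);
    assert(fabs(fast.im - safe.im) < 1e-12);
}
```

Keeping this as a separately named function, instead of changing the `/` operator, would leave the default behaviour untouched for code that relies on the extra precision.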
July 05, 2020 [Issue 18627] std.complex is a lot slower than builtin complex types at number crunching
https://issues.dlang.org/show_bug.cgi?id=18627

--- Comment #10 from Iain Buclaw <ibuclaw@gdcproject.org> ---
(In reply to ponce from comment #4)
> This benchmark is a variation that does only division.
>
> --- divide.d

* With gdc -O2 -frelease -m64
With cfloat: 11 secs, 204 ms, 475 μs, and 2 hnsecs
With cdouble: 13 secs, 420 ms, 497 μs, and 6 hnsecs
With Complex!float: 4 secs, 689 ms, 546 μs, and 2 hnsecs
With Complex!double: 8 secs, 903 ms, 172 μs, and 4 hnsecs

* With gdc -O2 -frelease -m32
With cfloat: 29 secs, 471 ms, 678 μs, and 9 hnsecs
With cdouble: 29 secs, 176 ms, 189 μs, and 2 hnsecs
With Complex!float: 13 secs, 379 ms, 856 μs, and 8 hnsecs
With Complex!double: 18 secs, 240 ms, 975 μs, and 5 hnsecs

Native complex floating point must die.