[Issue 18627] std.complex is a lot slower than builtin complex types at number crunching
March 18, 2018
https://issues.dlang.org/show_bug.cgi?id=18627

greenify <greeenify@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |greeenify@gmail.com

--- Comment #1 from greenify <greeenify@gmail.com> ---
See also: https://github.com/dlang/dmd/pull/7640

--
March 18, 2018
https://issues.dlang.org/show_bug.cgi?id=18627

--- Comment #2 from ponce <aliloko@gmail.com> ---
I've posted there, thanks.

--
March 20, 2018
https://issues.dlang.org/show_bug.cgi?id=18627

Iain Buclaw <ibuclaw@gdcproject.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ibuclaw@gdcproject.org

--- Comment #3 from Iain Buclaw <ibuclaw@gdcproject.org> ---
FYI, GDC is missing from the comparison, but I'll post its numbers anyway, along with DMD as a comparative benchmark, because each machine is different and DMD may optimize weirdly for one CPU yet be perfectly fine on another (see for instance issue 5100).


DMD64 D Compiler v2.076.1
---
$ dmd complex.d -O -inline -release
With cfloat: 75 ms, 688 μs, and 2 hnsecs
With cdouble: 61 ms, 546 μs, and 7 hnsecs
With Complex!float: 161 ms, 816 μs, and 8 hnsecs
With Complex!double: 109 ms, 66 μs, and 1 hnsec
---

There seems to be room for improvement in dmd or the general phobos implementation.


gdc (GCC) 8.0.1 20180226 (2.076.1 library and patches)
---
$ gdc complex.d -O2 -frelease
With cfloat: 154 ms, 871 μs, and 8 hnsecs
With cdouble: 59 ms, 205 μs, and 7 hnsecs
With Complex!float: 32 ms, 566 μs, and 5 hnsecs
With Complex!double: 34 ms, 961 μs, and 6 hnsecs
---

However with gdc, std.complex is /faster/ than native.

--
March 20, 2018
https://issues.dlang.org/show_bug.cgi?id=18627

--- Comment #4 from ponce <aliloko@gmail.com> ---
This benchmark is a variation that does only division.

--- divide.d

import std.string;
import std.datetime;
import std.datetime.stopwatch : benchmark, StopWatch;
import std.complex;
import std.stdio;
import std.math;

void main()
{
    int[] divider = new int[1024];
    cfloat[] A = new cfloat[1024];
    cdouble[] B = new cdouble[1024];
    Complex!float[] C = new Complex!float[1024];
    Complex!double[] D = new Complex!double[1024];
    foreach(i; 0..1024)
    {
        divider[i] = (i*69060) / 10000;
        // Initialize with something
        A[i] = i + 1i;
        B[i] = i + 1i;
        C[i] = Complex!float(i, 1);
        D[i] = Complex!double(i, 1);
    }

    void justDivide(ComplexType)(ComplexType[] arr)
    {
        int size = cast(int)(arr.length);
        for (int i = 0; i < size; ++i)
        {
            arr[i] = divider[i] / arr[i];
        }
    }

    void fA()
    {
        justDivide!(cfloat)(A);
    }

    void fB()
    {
        justDivide!(cdouble)(B);
    }

    void fC()
    {
        justDivide!(Complex!float)(C);
    }

    void fD()
    {
        justDivide!(Complex!double)(D);
    }

    auto r = benchmark!(fA, fB, fC, fD)(1000000);

    {
        writefln("With cfloat: %s", r[0] );
        writefln("With cdouble: %s", r[1] );
        writefln("With Complex!float: %s", r[2] );
        writefln("With Complex!double: %s", r[3] );
    }
}

RESULTS

* With ldc 1.8.0 64-bit:

$ ldc2.exe -O3 -enable-inlining -release divide.d -m64
$ divide.exe

With cfloat: 7 secs, 623 ms, 829 μs, and 9 hnsecs
With cdouble: 7 secs, 594 ms, 449 μs, and 8 hnsecs
With Complex!float: 7 secs, 988 ms, 642 μs, and 4 hnsecs
With Complex!double: 15 secs, 501 ms, 128 μs, and 4 hnsecs


* With ldc 1.8.0 32-bit:

$ ldc2.exe -O3 -enable-inlining -release divide.d -m32
$ divide.exe

With cfloat: 7 secs, 618 ms, 202 μs, and 1 hnsec
With cdouble: 7 secs, 593 ms, 777 μs, and 2 hnsecs
With Complex!float: 7 secs, 958 ms, 692 μs, and 9 hnsecs
With Complex!double: 15 secs, 414 ms, and 344 μs


This shows that even with the latest LDC you can have a regression.

I appreciate that std.complex gives more precision in the divide operation, but it's also something that is _different_ from the builtin complex it replaces.

--
March 20, 2018
https://issues.dlang.org/show_bug.cgi?id=18627

--- Comment #5 from ponce <aliloko@gmail.com> ---
Division with DMD 32-bit:

With cfloat: 1 minute, 18 secs, 451 ms, 932 μs, and 9 hnsecs
With cdouble: 1 minute, 19 secs, 747 ms, 70 μs, and 5 hnsecs
With Complex!float: 27 secs, 412 ms, 926 μs, and 5 hnsecs
With Complex!double: 25 secs, 39 ms, 159 μs, and 2 hnsecs

--
March 20, 2018
https://issues.dlang.org/show_bug.cgi?id=18627

--- Comment #6 from ponce <aliloko@gmail.com> ---
Conversely, with DMD, complex division seems faster with std.complex than with the builtins.

--
March 20, 2018
https://issues.dlang.org/show_bug.cgi?id=18627

Seb <greensunny12@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |greensunny12@gmail.com

--- Comment #7 from Seb <greensunny12@gmail.com> ---
> Division with DMD 32-bit:

Using DMD for any performance argument is a bit of a moot point, as DMD's optimizer is pretty bad. Treating DMD slowdowns as blockers would halt almost all development, as there are many, many performance regressions with DMD.

--
March 20, 2018
https://issues.dlang.org/show_bug.cgi?id=18627

--- Comment #8 from ponce <aliloko@gmail.com> ---
@Seb: It's not only about DMD: there is a 2x performance regression with Complex!double vs cdouble using LDC. There are probably more I haven't exposed yet. And yes, I use cdouble for designing IIR filters, in a real-time program.

Our main product uses builtin complexes; it's downloaded 2000 times per month.

--
March 22, 2018
https://issues.dlang.org/show_bug.cgi?id=18627

--- Comment #9 from ponce <aliloko@gmail.com> ---
I think at the very least std.complex should contain a function to divide Complex values without the additional precision provided by the check using the two fabs() calls.

People who want speed could opt in, and others would enjoy the increased precision without noticing.
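
As a rough illustration of the opt-in idea, here is a minimal sketch of a division that skips the fabs() comparison and applies the textbook formula directly. The name `fastDivide` is hypothetical and not part of Phobos; the sketch assumes the caller accepts that `b.re*b.re + b.im*b.im` may overflow or underflow when the divisor's components are very large or very small, which is the case the fabs() branch in std.complex is there to handle.

```d
import std.complex : Complex;

/// Hypothetical helper (not in Phobos): textbook complex division,
/// a / b = a * conj(b) / |b|^2, with no comparison of |b.re| and |b.im|.
Complex!T fastDivide(T)(Complex!T a, Complex!T b)
{
    // Squared magnitude of the divisor. Unlike the scaled division in
    // std.complex, this can overflow or underflow for extreme inputs.
    immutable d = b.re * b.re + b.im * b.im;
    return Complex!T((a.re * b.re + a.im * b.im) / d,
                     (a.im * b.re - a.re * b.im) / d);
}

unittest
{
    import std.math : fabs;

    // (1 + 2i) / (3 + 4i) = (11 + 2i) / 25
    auto q = fastDivide(Complex!double(1, 2), Complex!double(3, 4));
    assert(fabs(q.re - 0.44) < 1e-12);
    assert(fabs(q.im - 0.08) < 1e-12);
}
```

Code that wants the speed would call `fastDivide(x, y)` explicitly, while `x / y` would keep the existing behaviour. Whether this is actually faster would still need to be measured with each compiler, as the results in this thread vary a lot.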

--
July 05, 2020
https://issues.dlang.org/show_bug.cgi?id=18627

--- Comment #10 from Iain Buclaw <ibuclaw@gdcproject.org> ---
(In reply to ponce from comment #4)
> This benchmark is a variation that does only division.
> 
> --- divide.d

* With gdc -O2 -frelease -m64

With cfloat: 11 secs, 204 ms, 475 μs, and 2 hnsecs
With cdouble: 13 secs, 420 ms, 497 μs, and 6 hnsecs
With Complex!float: 4 secs, 689 ms, 546 μs, and 2 hnsecs
With Complex!double: 8 secs, 903 ms, 172 μs, and 4 hnsecs

* With gdc -O2 -frelease -m32

With cfloat: 29 secs, 471 ms, 678 μs, and 9 hnsecs
With cdouble: 29 secs, 176 ms, 189 μs, and 2 hnsecs
With Complex!float: 13 secs, 379 ms, 856 μs, and 8 hnsecs
With Complex!double: 18 secs, 240 ms, 975 μs, and 5 hnsecs

Native complex floating point must die.

--