February 13, 2013
On 02/13/2013 04:41 PM, Joseph Rushton Wakeling wrote:
> On 02/13/2013 04:17 PM, FG wrote:
>> Good point about choosing the right type of floating point numbers.
>> Conclusion: when there's enough space, always pick double over float.
>> Tested with GDC in win64. floats: 16.0s / doubles: 14.1s / reals: 11.2s.
>> I thought to myself: cool, I almost beat the 13.4s I got with C++, until I
>> changed the C++ code to also use doubles and... got a massive speedup: 7.1s!
>
> Yea, ditto for C++: 5.3 sec with double, 9.3 with float (using g++ -O3).

Just to update on times.  I was running another large job at the same time as doing all these tests, so there was some slowdown.  Current results are:

-- with g++ -O3 and using double rather than float: about 4.3 s

-- with clang++ -O3 and using double rather than float: about 3.1 s

-- with gdmd -O -release -inline:

    D code serial with dimension 32768 ...
      using floats Total time: 17.179 [sec], Julia value: 0
      using doubles Total time: 10.298 [sec], Julia value: 0
      using reals Total time: 17.126 [sec], Julia value: 0

-- with ldmd2 -O -release -inline:

    D code serial with dimension 32768 ...
      using floats Total time: 3.548 [sec], Julia value: 0
      using doubles Total time: 2.708 [sec], Julia value: 0
      using reals Total time: 4.371 [sec], Julia value: 0

-- with dmd -O -release -inline:

    D code serial with dimension 32768 ...
      using floats Total time: 15.696 [sec], Julia value: 0
      using doubles Total time: 7.233 [sec], Julia value: 0
      using reals Total time: 28.71 [sec], Julia value: 0

You'll note that I added a writeout of the global juliaValue in order to check that certain calculations weren't being optimized away.
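
As a rough illustration of that kind of check (not the code from this thread), the sketch below sums every result and prints the total. A sum depends on every iteration, so it is a somewhat stronger guard than writing out only the last value stored in a global. The `work` function is a hypothetical stand-in for the real per-pixel computation.

```d
import std.stdio;

// Hypothetical stand-in for the per-pixel Julia test; returns 0 or 1.
int work(int x, int y)
{
    return (x * 31 + y) & 1;
}

// Sum all results so that no iteration can be dropped
// without changing the program's visible output.
long kernel(int dim)
{
    long checksum = 0;
    foreach (x; 0 .. dim)
        foreach (y; 0 .. dim)
            checksum += work(x, y);
    return checksum;
}

void main()
{
    // Writing the value out makes the computation observable.
    writeln("checksum: ", kernel(4));
}
```

Whether a given compiler would really remove the unchecked loop depends on the optimizer, so this is a precaution rather than a guarantee of comparable timings.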

It's striking that in this case GDC is slower not only than LDC but also DMD. Current GDC is based off 2.060 as far as I know, whereas current LDC has upgraded to 2.061, so are there some changes between D 2.060 and 2.061 that could explain this?

It's also interesting that clang++ produces a faster executable than g++, but it's not possible to make a direct LLVM vs GCC comparison here, as g++ is GCC 4.7.2 whereas GDC is based off a GCC snapshot.

My guess would be that it's some combination of LLVM superiority in a particular case here, together with some change between 2.060 and 2.061.

Are these results comparable to what other people are getting?

I can confirm that where code of mine is concerned, GDC still seems to have the edge in terms of executable speed ...
February 13, 2013
Am Wed, 13 Feb 2013 18:10:47 +0100
schrieb Joseph Rushton Wakeling <joseph.wakeling@webdrake.net>:

> Just to update on times.  I was running another large job at the same time as doing all these tests, so there was some slowdown.  Current results are:
> 
> -- with g++ -O3 and using double rather than float: about 4.3 s
> 
> -- with clang++ -O3 and using double rather than float: about 3.1 s
> 
> -- with gdmd -O -release -inline:
> 
>      D code serial with dimension 32768 ...
>        using floats Total time: 17.179 [sec], Julia value: 0
>        using doubles Total time: 10.298 [sec], Julia value: 0
>        using reals Total time: 17.126 [sec], Julia value: 0
> 
> -- with ldmd2 -O -release -inline:
> 
>      D code serial with dimension 32768 ...
>        using floats Total time: 3.548 [sec], Julia value: 0
>        using doubles Total time: 2.708 [sec], Julia value: 0
>        using reals Total time: 4.371 [sec], Julia value: 0
> 
> -- with dmd -O -release -inline:
> 
>      D code serial with dimension 32768 ...
>        using floats Total time: 15.696 [sec], Julia value: 0
>        using doubles Total time: 7.233 [sec], Julia value: 0
>        using reals Total time: 28.71 [sec], Julia value: 0
> 
> You'll note that I added a writeout of the global juliaValue in order to check that certain calculations weren't being optimized away.
> 
> It's striking that in this case GDC is slower not only than LDC but also DMD. Current GDC is based off 2.060 as far as I know, whereas current LDC has upgraded to 2.061, so are there some changes between D 2.060 and 2.061 that could explain this?

???
Anyway, I upgraded to LLVM 3.2: no change. You have an i7, I
have a Core2. It would be really interesting to know what LDC
does there. GDC's output seems rather CPU-agnostic, whereas
LDC's output is better in every case but also varies between
systems to a degree I would never have imagined possible.
Has Intel really changed their CPU design so radically?

> It's also interesting that clang++ produces a faster executable than g++, but it's not possible to make a direct LLVM vs GCC comparison here, as g++ is GCC 4.7.2 whereas GDC is based off a GCC snapshot.

I've compiled GDC based on the same source that the Gentoo package manager built G++ 4.7.2 from, and I get similar numbers.

> My guess would be that it's some combination of LLVM superiority in a particular case here, together with some change between 2.060 and 2.061.
> 
> Are these results comparable to what other people are getting?
> 
> I can confirm that where code of mine is concerned, GDC still seems to have the edge in terms of executable speed ...

I've seen a tête à tête between LDC and GDC in some of my code.

-- 
Marco

February 13, 2013
When you are comparing LDC and GDC, you should either use -mcpu=generic for LDC or -march=native for GDC, because their default targets are different. By default, GDC produces code that works on most x86_64 CPUs (if you are on an x86_64 system), whereas LDC targets the host CPU. But this does not explain the difference in timings you are seeing here.

One reason why the code generated by GDC is slower is that squarePlusMag isn't inlined. It seems that the fact that its parameter is const is somehow preventing it from being inlined - I have no idea why. Removing const and adding -march=native to the gdc flags gives me:

gdc -O3 -finline-functions -frelease tmp.d -o tmp -march=native:
  using floats Total time: 8.283 [sec]
  using doubles Total time: 6.827 [sec]
  using reals Total time: 6.795 [sec]

ldc2 -O3  -release -singleobj tmp.d -oftmp:
  using floats Total time: 3.348 [sec]
  using doubles Total time: 3.08 [sec]
  using reals Total time: 4.174 [sec]

The difference is smaller, but still pretty large.

I have noticed that there are needless conversions in this code that are slowing down both GDC generated and LDC generated code. This code is a bit faster:

module main;

import std.datetime;
import std.metastrings;
import std.stdio;
import std.typetuple;


enum DIM = 32 * 1024;

int juliaValue;

template Julia(TReal)
{
    struct ComplexStruct
    {
        TReal r;
        TReal i;

        TReal squarePlusMag(ComplexStruct another)
        {
            TReal r1 = r*r - i*i + another.r;
            TReal i1 = cast(TReal)2.0*i*r + another.i;

            r = r1;
            i = i1;

            return (r1*r1 + i1*i1);
        }
    }

    int juliaFunction( int x, int y )
    {
        auto c = ComplexStruct(0.8, 0.156);
        auto a = ComplexStruct(x, y);

        foreach (i; 0 .. 200)
            if (a.squarePlusMag(c) > cast(TReal) 1000)
                return 0;
        return 1;
    }

    void kernel()
    {
        foreach (x; 0 .. DIM) {
            foreach (y; 0 .. DIM) {
                juliaValue = juliaFunction( x, y );
            }
        }
    }
}

void main()
{
    writeln("D code serial with dimension " ~ toStringNow!DIM ~ " ...");
    StopWatch sw;
    foreach (Math; TypeTuple!(float, double, real))
    {
        sw.start();
        Julia!(Math).kernel();
        sw.stop();
        writefln("  using %ss Total time: %s [sec]",
                 Math.stringof, (sw.peek().msecs * 0.001));
        sw.reset();
    }
}

This gives me:

gdc -O3 -finline-functions -frelease tmp.d -o tmp -march=native:
  using floats Total time: 6.746 [sec]
  using doubles Total time: 6.872 [sec]
  using reals Total time: 5.226 [sec]

ldc2 -O3  -release -singleobj tmp.d -oftmp:
  using floats Total time: 2.36 [sec]
  using doubles Total time: 2.535 [sec]
  using reals Total time: 4.106 [sec]
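
A minimal sketch of the kind of conversion the casts in the listing above avoid, assuming the earlier version used uncast literals (that version is not shown in this post). In D a bare `2.0` is a double literal, so multiplying it by a float promotes the whole expression to double, and the result has to be converted back when it is stored in a float. The variable names here are illustrative only.

```d
import std.stdio;

void main()
{
    float i = 1.5f, r = 2.0f;

    // A bare 2.0 is a double, so the product is computed as double.
    static assert(is(typeof(2.0 * i * r) == double));

    // Casting the literal to the working type keeps the arithmetic in float.
    static assert(is(typeof(cast(float) 2.0 * i * r) == float));

    // A float literal has the same effect for the float case.
    static assert(is(typeof(2.0f * i * r) == float));

    writeln(typeof(2.0 * i * r).stringof, " vs ",
            typeof(cast(float) 2.0 * i * r).stringof);
}
```

The static asserts only show the types involved; how much a particular compiler actually pays for the promotion is something the timings have to answer.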

At least part of the difference is due to the fact that juliaFunction still isn't getting inlined (but squarePlusMag is). Making juliaFunction a static method of ComplexStruct causes it to get inlined (again, I have no idea why). Moving juliaFunction inside ComplexStruct does not affect the performance of LDC generated code, but for GDC it gives me:

  using floats Total time: 4.262 [sec]
  using doubles Total time: 4.251 [sec]
  using reals Total time: 3.512 [sec]
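
For readers who want to try the same change, here is a sketch of what moving juliaFunction into ComplexStruct as a static method might look like. It reuses the names from the listing above, but the exact layout is an assumption, since the modified source was not posted. Whether it actually gets inlined depends on the compiler and flags.

```d
import std.stdio;

template Julia(TReal)
{
    struct ComplexStruct
    {
        TReal r;
        TReal i;

        // One iteration z = z*z + another; returns the new squared magnitude.
        TReal squarePlusMag(ComplexStruct another)
        {
            TReal r1 = r*r - i*i + another.r;
            TReal i1 = cast(TReal) 2.0 * i * r + another.i;

            r = r1;
            i = i1;

            return r1*r1 + i1*i1;
        }

        // Formerly a free function in the template; now a static method.
        static int juliaFunction(int x, int y)
        {
            auto c = ComplexStruct(0.8, 0.156);
            auto a = ComplexStruct(x, y);

            foreach (n; 0 .. 200)
                if (a.squarePlusMag(c) > cast(TReal) 1000)
                    return 0;   // diverged
            return 1;
        }
    }
}

void main()
{
    // The call site in kernel() would change accordingly.
    writeln(Julia!double.ComplexStruct.juliaFunction(100, 100));
}
```

Only the location of juliaFunction changes; the arithmetic is the same as in the listing above.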

There is still a large difference between LDC and GDC for floats and doubles, and I can't explain it. But at least it is much smaller than it was initially.

I ran all the benchmarks on 64-bit Linux, using a Core i5 2500K.
February 14, 2013
Thanks a lot for your reply.