Jump to page: 1 2
Thread overview
A significant performance difference
Aug 31, 2014
bearophile
Sep 01, 2014
Daniel Kozak
Sep 01, 2014
ketmar
Sep 01, 2014
ketmar
Sep 01, 2014
bearophile
Sep 01, 2014
ketmar
Sep 01, 2014
bearophile
Sep 01, 2014
ketmar
Sep 01, 2014
ketmar
Sep 01, 2014
Daniel Kozak
Sep 01, 2014
ketmar
Sep 01, 2014
ketmar
Sep 01, 2014
bearophile
Sep 01, 2014
ketmar
Sep 01, 2014
ketmar
Sep 01, 2014
ketmar
Sep 01, 2014
safety0ff
Oct 01, 2014
Martin Nowak
August 31, 2014
This is C++ code that solves one Euler problem:

--------------
#include <stdio.h>
#include <map>

const unsigned int H = 9, W = 12;

const int g[6][3] = {{7, 0, H - 3},
                     {1 + (1 << H) + (1 << (2 * H)), 0, H - 1},
                     {3 + (1 << H), 0, H - 2},
                     {3 + (2 << H), 0, H - 2},
                     {1 + (1 << H) + (2 << H), 0, H - 2},
                     {1 + (1 << H) + (1 << (H - 1)), 1, H - 1}};

int main() {
    unsigned long long p, i, k;
    unsigned int j, l;
    std::map<unsigned int, unsigned long long> x, y;
    x[0] = 1;

    for (i = 0; i < W; ++i) {
        y.clear();
        while (!x.empty()) {
            j = x.begin()->first;
            p = x.begin()->second;
            x.erase(x.begin());

            for (k = 0; k < H; ++k)
                if ((j & (1 << k)) == 0)
                    break;

            if (k == H)
                y[j >> H] += p;
            else
                for (l = 0; l < 6; ++l)
                    if (k >= g[l][1] && k <= g[l][2])
                        if ((j & (g[l][0] << k)) == 0)
                            x[j + (g[l][0] << k)] += p;
        }
        x = y;
    }

    printf("%lld\n", y[0]);
    return 0;
}
--------------


I have translated it to D like this (I know in D there are nicer ways to write it, but I have tried to keep the look of the code as much similar as possible to the C++ code):


--------------
import core.stdc.stdio;

const uint H = 9, W = 12;

const uint[3][6] g = [[7, 0, H - 3],
                      [1 + (1 << H) + (1 << (2 * H)), 0, H - 1],
                      [3 + (1 << H), 0, H - 2],
                      [3 + (2 << H), 0, H - 2],
                      [1 + (1 << H) + (2 << H), 0, H - 2],
                      [1 + (1 << H) + (1 << (H - 1)), 1, H - 1]];

int main() {
    ulong p, i, k;
    uint j, l;
    ulong[uint] x, y;
    x[0] = 1;

    for (i = 0; i < W; ++i) {
        y = null;
        while (x.length) {
            j = x.byKey.front;
            p = x.byValue.front;
            x.remove(cast(int)j);

            for (k = 0; k < H; ++k)
                if ((j & (1 << k)) == 0)
                    break;

            if (k == H)
                y[j >> H] += p;
            else
                for (l = 0; l < 6; ++l)
                    if (k >= g[l][1] && k <= g[l][2])
                        if ((j & (g[l][0] << k)) == 0)
                            x[j + (g[l][0] << k)] += p;
        }
        x = y;
    }

    printf("%lld\n", y[0]);
    return 0;
}
--------------


The C++ code is much faster than the D code (I see the D code 30+ times slower with dmd and about like 20 times with ldc2). One difference between the C++ and D code is that the C++ map uses a search tree (red-black probably), while the D code uses a hash.

To test that algorithmic difference, if I replace the map in the C++ code with a std::unordered_map (C++11):

#include <unordered_map>
...
std::unordered_map<unsigned int, unsigned long long> x, y;


then the run-time increases (more than two times) but it's still much faster than the D code.

Is it possible to fix the D code to increase its performance (there are associative array libraries for D, but I have not tried them in this program).

Bye,
bearophile
September 01, 2014
V Sun, 31 Aug 2014 10:55:31 +0000
bearophile via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com>
napsáno:

> This is C++ code that solves one Euler problem:
> 
> --------------
> #include <stdio.h>
> #include <map>
> 
> const unsigned int H = 9, W = 12;
> 
> const int g[6][3] = {{7, 0, H - 3},
>                       {1 + (1 << H) + (1 << (2 * H)), 0, H - 1},
>                       {3 + (1 << H), 0, H - 2},
>                       {3 + (2 << H), 0, H - 2},
>                       {1 + (1 << H) + (2 << H), 0, H - 2},
>                       {1 + (1 << H) + (1 << (H - 1)), 1, H - 1}};
> 
> int main() {
>      unsigned long long p, i, k;
>      unsigned int j, l;
>      std::map<unsigned int, unsigned long long> x, y;
>      x[0] = 1;
> 
>      for (i = 0; i < W; ++i) {
>          y.clear();
>          while (!x.empty()) {
>              j = x.begin()->first;
>              p = x.begin()->second;
>              x.erase(x.begin());
> 
>              for (k = 0; k < H; ++k)
>                  if ((j & (1 << k)) == 0)
>                      break;
> 
>              if (k == H)
>                  y[j >> H] += p;
>              else
>                  for (l = 0; l < 6; ++l)
>                      if (k >= g[l][1] && k <= g[l][2])
>                          if ((j & (g[l][0] << k)) == 0)
>                              x[j + (g[l][0] << k)] += p;
>          }
>          x = y;
>      }
> 
>      printf("%lld\n", y[0]);
>      return 0;
> }
> --------------
> 
> 
> I have translated it to D like this (I know in D there are nicer ways to write it, but I have tried to keep the look of the code as much similar as possible to the C++ code):
> 
> 
> --------------
> import core.stdc.stdio;
> 
> const uint H = 9, W = 12;
> 
> const uint[3][6] g = [[7, 0, H - 3],
>                        [1 + (1 << H) + (1 << (2 * H)), 0, H - 1],
>                        [3 + (1 << H), 0, H - 2],
>                        [3 + (2 << H), 0, H - 2],
>                        [1 + (1 << H) + (2 << H), 0, H - 2],
>                        [1 + (1 << H) + (1 << (H - 1)), 1, H - 1]];
> 
> int main() {
>      ulong p, i, k;
>      uint j, l;
>      ulong[uint] x, y;
>      x[0] = 1;
> 
>      for (i = 0; i < W; ++i) {
>          y = null;
>          while (x.length) {
>              j = x.byKey.front;
>              p = x.byValue.front;
>              x.remove(cast(int)j);
> 
>              for (k = 0; k < H; ++k)
>                  if ((j & (1 << k)) == 0)
>                      break;
> 
>              if (k == H)
>                  y[j >> H] += p;
>              else
>                  for (l = 0; l < 6; ++l)
>                      if (k >= g[l][1] && k <= g[l][2])
>                          if ((j & (g[l][0] << k)) == 0)
>                              x[j + (g[l][0] << k)] += p;
>          }
>          x = y;
>      }
> 
>      printf("%lld\n", y[0]);
>      return 0;
> }
> --------------
> 
> 
> The C++ code is much faster than the D code (I see the D code 30+ times slower with dmd and about like 20 times with ldc2). One difference between the C++ and D code is that the C++ map uses a search tree (red-black probably), while the D code uses a hash.
> 
> To test that algorithmic difference, if I replace the map in the C++ code with a std::unordered_map (C++11):
> 
> #include <unordered_map>
> ...
> std::unordered_map<unsigned int, unsigned long long> x, y;
> 
> 
> then the run-time increases (more than two times) but it's still much faster than the D code.
> 
> Is it possible to fix the D code to increase its performance (there are associative array libraries for D, but I have not tried them in this program).
> 
> Bye,
> bearophile

I think main problem is in calling delegates (x.byKey.front and x.byValue.front;). If I change x.byValue.front with x[j], It makes program a lot faster

September 01, 2014
On Mon, 1 Sep 2014 09:21:03 +0200
Daniel Kozak via Digitalmars-d-learn
<digitalmars-d-learn@puremagic.com> wrote:

> I think main problem is in calling delegates (x.byKey.front and x.byValue.front;). If I change x.byValue.front with x[j], It makes program a lot faster
yes. replacing `p = x.byValue.front;` by `p = x[j];` cutted down execution time from 3.2 seconds to 1.8 seconds with my gdc.

i always dreamt about official 'get any key from AA' API. and, btw, 'get value and remove key' API too. they are useful sometimes.

seems that i have to write another silly patch for this. ;-)


September 01, 2014
On Mon, 1 Sep 2014 09:21:03 +0200
Daniel Kozak via Digitalmars-d-learn
<digitalmars-d-learn@puremagic.com> wrote:

and this version executes in 0.2 seconds:

    import core.stdc.stdio;

    immutable H = 9, W = 12;

    immutable uint[3][6] g = [
      [7, 0, H - 3],
      [1 + (1 << H) + (1 << (2 * H)), 0, H - 1],
      [3 + (1 << H), 0, H - 2],
      [3 + (2 << H), 0, H - 2],
      [1 + (1 << H) + (2 << H), 0, H - 2],
      [1 + (1 << H) + (1 << (H - 1)), 1, H - 1]
    ];


    int main () {
      ulong p, i, k;
      uint j, l;
      ulong[uint] x, y;
      x[0] = 1;

      for (i = 0; i < W; ++i) {
        y = null;
        while (x.length) {
          foreach (key; x.keys) {
            auto pp = key in x;
            if (pp is null) continue;
            j = key;
            p = *pp;
            x.remove(j);

            for (k = 0; k < H; ++k) {
              if ((j & (1 << k)) == 0) break;
            }

            if (k == H) {
              y[j >> H] += p;
            } else {
              for (l = 0; l < 6; ++l) {
                if (k >= g[l][1] && k <= g[l][2]) {
                  if ((j & (g[l][0] << k)) == 0) {
                    x[j + (g[l][0] << k)] += p;
                  }
                }
              }
            }
          }
        }
        x = y;
      }

      printf("%lld\n", y[0]);
      return 0;
    }

so yes, creating delegates are SLOW. even taking into account
that we creating dynamic arrays with keys in this version, it's rockin'
fast.


September 01, 2014
ketmar:

Thank you for your analysis and code.

>         while (x.length) {
>           foreach (key; x.keys) {
>             auto pp = key in x;
>             if (pp is null) continue;
>             j = key;
>             p = *pp;
>             x.remove(j);

This is a little cleaner:

        while (x.length) {
            foreach (immutable j; x.keys) {
                p = x[j];
                x.remove(j);


> so yes, creating delegates are SLOW. even taking into account
> that we creating dynamic arrays with keys in this version, it's rockin' fast.

This D version is much faster, it runs in about 0.26 seconds (while the C++ version runs in about 0.21 seconds), this is acceptable because x.keys allocates a dynamic array of keys in the middle of the code.

But I think the "j = x.byKey.front;  p = x.byValue.front;" code looks sufficiently idiomatic, and its performance is terrible. It seems fit for a performance bug report. I have to create a synthetic benchmark that shows just the problem.

Bye,
bearophile
September 01, 2014
On Mon, 01 Sep 2014 08:53:45 +0000
bearophile via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com>
wrote:

> ketmar:
> Thank you for your analysis and code.
actually, it's based on Daniel Kozak's observation, so credit him instead. ;-)

> But I think the "j = x.byKey.front;  p = x.byValue.front;" code looks sufficiently idiomatic, and its performance is terrible.

that's why i suggest to extend AA API, adding x.firstKey and x.firstValue functions. i'll try to make the patch soon.

ah, and x.getAndRemove() too, it's handy sometimes. but this will
need new druntime hook, i think. ;-)


September 01, 2014
On Mon, 01 Sep 2014 08:53:45 +0000
bearophile via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com>
wrote:

> This is a little cleaner:
yes, i just did a 'quickhack' without even analyzing what the code does. ;-)


September 01, 2014
> It seems fit for a performance bug report. I have to create a synthetic benchmark that shows just the problem.

https://issues.dlang.org/show_bug.cgi?id=13410

(Perhaps I have credited the wrong person there, sorry, I will fix).

Bye,
bearophile
September 01, 2014
ketmar:

>> But I think the "j = x.byKey.front;  p = x.byValue.front;" code looks sufficiently idiomatic, and its performance is terrible.
>
> that's why i suggest to extend AA API, adding x.firstKey and
> x.firstValue functions. i'll try to make the patch soon.

In theory the best solution is to improve the performance of the "byKey.front" and "byValue.front" idioms.

If that's not possible and you need to add the firstKey/firstValue functions, then it's a good idea to make the D front-end rewrite "byKey.front" with a call to "firstKey" and a "byValue.front" with a call to firstValue.


> ah, and x.getAndRemove() too, it's handy sometimes. but this will need new druntime hook, i think. ;-)

x.pop() sounds nicer :-)

Bye,
bearophile
September 01, 2014
On Mon, 01 Sep 2014 09:22:50 +0000
bearophile via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com>
wrote:

> In theory the best solution is to improve the performance of the "byKey.front" and "byValue.front" idioms.
i found that slowdown is from _aaRange(), not from delegates. the following code is slow even w/o delegate creation:

    import core.stdc.stdio;

    ref K firstKey(T : V[K], V, K)(T aa) @property
    {
        return *cast(K*)_aaRangeFrontKey(_aaRange(cast(void*)aa));
    }

    ref V firstVal(T : V[K], V, K)(T aa) @property
    {
        return *cast(V*)_aaRangeFrontValue(_aaRange(cast(void*)aa));
    }


    int main() {
        long[int] aa;
        for (int i = 0; i < 50000; i++)
            aa[i] = i;

        long total = 0;

        while (aa.length) {
            //int k = aa.byKey.front;
            int k = aa.firstKey;
            //long v = aa.byValue.front;
            long v = aa.firstVal;
            aa.remove(k);
            total += k + v;
        }

        printf("%lld\n", total);
        return 0;
    }


seems that we need two more hooks for AAs: `_aaFrontKey()` and
`_aaFrontValue()`.

> x.pop() sounds nicer :-)
ah, sure. i'm not very good in inventing names. ;-)



« First   ‹ Prev
1 2