April 25, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=5650



--- Comment #10 from SomeDude <lovelydear@mailmetrash.com> 2012-04-25 01:12:19 PDT ---
For range = 100, n = 10_000_000

Eclipse, JVM1.6 with option -Xmx512M : 381 ms
dmd -O -release -inline (2.059): around 1800 ms

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
April 25, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=5650



--- Comment #11 from SomeDude <lovelydear@mailmetrash.com> 2012-04-25 01:24:40 PDT ---
I noticed that just after compilation, the first run is always slightly longer than the following ones by about 50 to 80 ms.

The other observation, - and that's crazy -, is if I uncomment the line
   writeln(walkLength(t[]));
I am consistently *faster* by 10 to 25 ms than when it's commented !!

My dmd measurements are done via Powershell command: Measure-Command{./bug.exe}

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
April 27, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=5650



--- Comment #12 from dawg@dawgfoto.de 2012-04-26 19:54:08 PDT ---
Created an attachment (id=1099)
optimzations

dmd -O -release -inline rbtree.d
time ./rbtree 0.566 total

With "scope(success) ++_length" in RBTree._add
the property calls are not inlined.

time ./rbtree 0.362 total

Then new RBTree!Elem allocates 33 bytes spending lots of time and space for the unused 31 bytes.

Using GC.malloc directly:
time ./rbtree 0.228 total

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
April 27, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=5650



--- Comment #13 from Steven Schveighoffer <schveiguy@yahoo.com> 2012-04-27 03:29:07 PDT ---
(In reply to comment #12)
> With "scope(success) ++_length" in RBTree._add
> the property calls are not inlined.

Gr... I hate how dmd makes poor decisions about inlining and then won't let you force the issue...

Your optimization ignores the multiple option, it will have to be fixed.

> Then new RBTree!Elem allocates 33 bytes spending lots of time and space for the unused 31 bytes.

This will be fixed when allocators are added.  I think we can hold off on this optimization.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
April 27, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=5650



--- Comment #14 from SomeDude <lovelydear@mailmetrash.com> 2012-04-27 08:40:08 PDT ---
(In reply to comment #11)
> I noticed that just after compilation, the first run is always slightly longer than the following ones by about 50 to 80 ms.
> 
> The other observation, - and that's crazy -, is if I uncomment the line
>    writeln(walkLength(t[]));
> I am consistently *faster* by 10 to 25 ms than when it's commented !!
> 
> My dmd measurements are done via Powershell command: Measure-Command{./bug.exe}

OK, forget it, it turns out using Measure-Command{...} was a rather poor way of measuring runtimes because of startup/shutdown.

Using StopWatch, I now get consistent results.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
April 29, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=5650


dawg@dawgfoto.de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dawg@dawgfoto.de


--- Comment #15 from dawg@dawgfoto.de 2012-04-29 07:04:14 PDT ---
>Gr... I hate how dmd makes poor decisions about inlining and then won't let you force the issue...

This is an issue with how the exception handling alters the control
flow. Perhaps annotating the properties with nothrow would have a similar
effect.
Anyhow scope (success) means that you'll have to temporarily store the return
value, so a small penalty will remain in most cases.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
May 22, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=5650



--- Comment #16 from bearophile_hugs@eml.cc 2012-05-22 14:56:09 PDT ---
Now on my PC the D version (using redBlackTree) is just 3.7 times slower than the Java version that uses boxed Integers. This is not good enough still, but now redBlackTree is usable (DMD 2.060alpha, 32 bit, Windows):

Timings, seconds:
  D version: 1.65
  D versions + GC.disable() at the top: 0.78
  Java version: 0.45

I think a good D version has to run this benchmark in 0.2 - 0.3 seconds.

And indeed, this C++ program (that uses a STL ordered set, that is usually
implemented with a Red-Black Tree) runs in 0.27 seconds on my PC (GCC 4.7.0,
-Ofast -flto -s), it's about 6.1 times faster than the current D code (that
uses a different and less efficient back-end):


#include <set>
#include <stdio.h>
using namespace std;

int main() {
    const int range = 100;
    const int n = 1000000;

    set<int> t;
    t.insert(0);

    for (int i = 0; i < n; i++) {
        if (i > range)
            t.erase(t.begin());
        t.insert(i);
    }

    /*
    printf("%u\n", t.size()); // prints 101
    for (set<int>::iterator it = t.begin(); it != t.end(); it++)
        printf("%d ", *it);
    printf("\n");
    */
}

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
May 22, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=5650



--- Comment #17 from Steven Schveighoffer <schveiguy@yahoo.com> 2012-05-22 16:40:50 PDT ---
If the issue is that my redblacktree implementation isn't fast enough, you are welcome to try and improve it.

I implemented it based on the algorithm I found inside my CLR Algorithms book. I have never pretended really to be an expert in RBT implementation.  I have no idea how optimized it is, or how much more optimized it could be.  I made some refactoring adjustments from the standard algorithm, and the red black algorithm in my code is well documented.

I also have not looked at any of the g++ STL code, so I have no idea how it compares to that.  Obviously I haven't looked at any Java implementations.

On my system, the D version with dcollections, which uses the same RBT implementation as std.algorithm, takes about 380 milliseconds (-release -O -inline), while the equivalent C++ version takes 250 ms.  I don't think I have the JDK installed, so I cannot test the java version.

I don't think the GC is really hindering performance any more, it's just pure algorithmic + compiler optimization comparison.

BTW, MAKE SURE you compile with -release for testing, because without this, the runtime calls Object.invariant for every method call.

Here is the optimized-for-dcollections code (which keeps track of the begin element), you can't do this in std.container, since the concept of cursors does not exist:

import std.stdio;
import dcollections.TreeSet;

void main() {
    enum int range = 100;
    enum int n = 1_000_000;

    auto t = new TreeSet!int(0);
    auto nextToRemove = t.begin;

    for (int i = 0; i < n; i++) {
        if (i > range)
            nextToRemove = t.remove(nextToRemove);
        t.add(i);
    }

    //writeln(walkLength(t[]));
    //writeln(t[]);
}

Same code in C++:

#include <set>
#include <stdio.h>
using namespace std;

int main() {
    const int range = 100;
    const int n = 1000000;

    set<int> t;
    t.insert(0);
    set<int>::iterator nextToRemove = t.begin();

    for (int i = 0; i < n; i++) {
        if (i > range)
        {
            set<int>::iterator tmp = nextToRemove;
            tmp++;
            t.erase(nextToRemove);
            nextToRemove = tmp;
        }
        t.insert(i);
    }

    /*
       printf("%u\n", t.size()); // prints 101
       for (set<int>::iterator it = t.begin(); it != t.end(); it++)
       printf("%d ", *it);
       printf("\n");
     */
}

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
May 22, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=5650



--- Comment #18 from Steven Schveighoffer <schveiguy@yahoo.com> 2012-05-22 16:42:17 PDT ---
(In reply to comment #17)
> On my system, the D version with dcollections, which uses the same RBT implementation as std.algorithm...

should be std.container, not std.algorithm!

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
May 23, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=5650



--- Comment #19 from bearophile_hugs@eml.cc 2012-05-22 18:00:40 PDT ---
(In reply to comment #17)

Thank you for implementing the Red-Black Tree.

> On my system, the D version with dcollections, which uses the same RBT implementation as std.algorithm, takes about 380 milliseconds (-release -O -inline), while the equivalent C++ version takes 250 ms.

There is a large difference between your D timings and mine.


> BTW, MAKE SURE you compile with -release for testing, because without this, the runtime calls Object.invariant for every method call.

I have used -release too.

I will wait for other people to time the D and C++ versions. If they confirm such small timing difference (like 380 ms Vs of 250 ms), we can close this bug report.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------