Jump to page: 1 2 3
Thread overview
[phobos] std.algorithm.sort slow as molasses
Jul 02, 2010
David Simcha
Jul 02, 2010
Sean Kelly
Jul 02, 2010
David Simcha
Jul 02, 2010
David Simcha
Jul 02, 2010
Walter Bright
Jul 02, 2010
Sean Kelly
Jul 02, 2010
Brad Roberts
Jul 02, 2010
Sean Kelly
Jul 02, 2010
Walter Bright
Jul 02, 2010
Leandro Lucarella
Jul 02, 2010
Brad Roberts
Jul 02, 2010
David Simcha
Jul 02, 2010
David Simcha
Jul 02, 2010
David Simcha
Jul 02, 2010
David Simcha
Jul 02, 2010
Jonathan M Davis
Jul 03, 2010
David Simcha
Jul 03, 2010
David Simcha
Jul 02, 2010
Leandro Lucarella
Jul 02, 2010
Jason Spencer
Jul 02, 2010
David Simcha
July 02, 2010
Here are some benchmarks of std.algorithm.sort vs. GCC 4.1.2's implementation of STL sort() and the implementation I use in my dstats library for lots of statistical calculations.

D code:
import std.stdio, std.perf, std.random, std.algorithm, dstats.sort;

void main(string[] args) {
    auto nums = new float[1_000_000];
    foreach(ref num; nums) {
        num = uniform(0.0, 10_000_000.0);
    }

    auto pc = new PerformanceCounter;
    pc.start;

    if(args.length > 1 && args[1] == "--dstats") {
        dstats.sort.qsort(nums);
    } else {
        std.algorithm.sort(nums);
    }

    pc.stop;
    writeln(pc.milliseconds);
}

C++ code:

#include <vector>
#include <ctime>
#include <cstdlib>
#include <algorithm>
#include <iostream>

using namespace std;

// Generates quick and dirty random numbers.
float sloppyRand() {
    unsigned num = 0;
    num += (rand() << 16);
    num += rand();
    return num;
}

int main() {
    vector<float> nums(1000000);
    for(int i = 0; i < nums.size(); i++) {
        nums[i] = sloppyRand();
    }

    double startTime = clock();
    sort(nums.begin(), nums.end());
    double stopTime = clock();

    double clocksPerMillisec = CLOCKS_PER_SEC / 1000.0;
    cout << (stopTime - startTime) / clocksPerMillisec << endl;
}

Compilers:
DMD 2.047 (for D)
GCC 4.1.2  (For C++;  I couldn't get the C++ code to compile on DMC because
of STL issues that I don't feel like solving, even though this would level
the playing field because D and C++ would have the same backend)

Results on Indel Xeon x5472:

Compiler settings:  -O -inline -release (for D), -O3 (for GCC)

D, using std.algorithm.sort:  330 milliseconds
D, using dstats.qsort:  96 milliseconds
C++, using 64-bit compile:  90 milliseconds
C++, using 32-bit compile:  100 milliseconds

I'd say std.algorithm.sort could use some serious optimization.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.puremagic.com/pipermail/phobos/attachments/20100702/1a52f247/attachment.html>
July 02, 2010
The Array.sort implementation I put in Tango is pretty well tuned.  It would have to be adapted to use ranges, but I'd be happy to contribute it.

On Jul 2, 2010, at 7:55 AM, David Simcha wrote:

> Here are some benchmarks of std.algorithm.sort vs. GCC 4.1.2's implementation of STL sort() and the implementation I use in my dstats library for lots of statistical calculations.
> 
> D code:
> import std.stdio, std.perf, std.random, std.algorithm, dstats.sort;
> 
> void main(string[] args) {
>     auto nums = new float[1_000_000];
>     foreach(ref num; nums) {
>         num = uniform(0.0, 10_000_000.0);
>     }
> 
>     auto pc = new PerformanceCounter;
>     pc.start;
> 
>     if(args.length > 1 && args[1] == "--dstats") {
>         dstats.sort.qsort(nums);
>     } else {
>         std.algorithm.sort(nums);
>     }
> 
>     pc.stop;
>     writeln(pc.milliseconds);
> }
> 
> C++ code:
> 
> #include <vector>
> #include <ctime>
> #include <cstdlib>
> #include <algorithm>
> #include <iostream>
> 
> using namespace std;
> 
> // Generates quick and dirty random numbers.
> float sloppyRand() {
>     unsigned num = 0;
>     num += (rand() << 16);
>     num += rand();
>     return num;
> }
> 
> int main() {
>     vector<float> nums(1000000);
>     for(int i = 0; i < nums.size(); i++) {
>         nums[i] = sloppyRand();
>     }
> 
>     double startTime = clock();
>     sort(nums.begin(), nums.end());
>     double stopTime = clock();
> 
>     double clocksPerMillisec = CLOCKS_PER_SEC / 1000.0;
>     cout << (stopTime - startTime) / clocksPerMillisec << endl;
> }
> 
> Compilers:
> DMD 2.047 (for D)
> GCC 4.1.2  (For C++;  I couldn't get the C++ code to compile on DMC because of STL issues that I don't feel like solving, even though this would level the playing field because D and C++ would have the same backend)
> 
> Results on Indel Xeon x5472:
> 
> Compiler settings:  -O -inline -release (for D), -O3 (for GCC)
> 
> D, using std.algorithm.sort:  330 milliseconds
> D, using dstats.qsort:  96 milliseconds
> C++, using 64-bit compile:  90 milliseconds
> C++, using 32-bit compile:  100 milliseconds
> 
> I'd say std.algorithm.sort could use some serious optimization.
> _______________________________________________
> phobos mailing list
> phobos at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/phobos

July 02, 2010
One simple solution would be for you to contribute dstat's sort to Phobos. However, I'd be curious what the reason of std's sort slowness is. I suspect it might be the fact that I use qsort all the way down instead of switching to insertion sort. What is your sort's strategy?

Andrei

David Simcha wrote:
> Here are some benchmarks of std.algorithm.sort vs. GCC 4.1.2's implementation of STL sort() and the implementation I use in my dstats library for lots of statistical calculations.
> 
> D code:
> import std.stdio, std.perf, std.random, std.algorithm, dstats.sort;
> 
> void main(string[] args) {
>     auto nums = new float[1_000_000];
>     foreach(ref num; nums) {
>         num = uniform(0.0, 10_000_000.0);
>     }
> 
>     auto pc = new PerformanceCounter;
>     pc.start;
> 
>     if(args.length > 1 && args[1] == "--dstats") {
>         dstats.sort.qsort(nums);
>     } else {
>         std.algorithm.sort(nums);
>     }
> 
>     pc.stop;
>     writeln(pc.milliseconds);
> }
> 
> C++ code:
> 
> #include <vector>
> #include <ctime>
> #include <cstdlib>
> #include <algorithm>
> #include <iostream>
> 
> using namespace std;
> 
> // Generates quick and dirty random numbers.
> float sloppyRand() {
>     unsigned num = 0;
>     num += (rand() << 16);
>     num += rand();
>     return num;
> }
> 
> int main() {
>     vector<float> nums(1000000);
>     for(int i = 0; i < nums.size(); i++) {
>         nums[i] = sloppyRand();
>     }
> 
>     double startTime = clock();
>     sort(nums.begin(), nums.end());
>     double stopTime = clock();
> 
>     double clocksPerMillisec = CLOCKS_PER_SEC / 1000.0;
>     cout << (stopTime - startTime) / clocksPerMillisec << endl;
> }
> 
> Compilers:
> DMD 2.047 (for D)
> GCC 4.1.2  (For C++;  I couldn't get the C++ code to compile on DMC
> because of STL issues that I don't feel like solving, even though this
> would level the playing field because D and C++ would have the same backend)
> 
> Results on Indel Xeon x5472:
> 
> Compiler settings:  -O -inline -release (for D), -O3 (for GCC)
> 
> D, using std.algorithm.sort:  330 milliseconds
> D, using dstats.qsort:  96 milliseconds
> C++, using 64-bit compile:  90 milliseconds
> C++, using 32-bit compile:  100 milliseconds
> 
> I'd say std.algorithm.sort could use some serious optimization.
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> phobos mailing list
> phobos at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/phobos
July 02, 2010


>
>From: David Simcha <dsimcha at gmail.com>
>To: Discuss the phobos library for D <phobos at puremagic.com>
>
>Compilers:
>DMD 2.047 (for D)
>GCC 4.1.2  (For C++;  I couldn't get the C++ code to compile on DMC because of STL issues that I don't feel like solving, even though this would level the playing field because D and C++ would have the same backend)
>
>
>You could use the Gnu D compiler and still have the same backend.  Might be interesting.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.puremagic.com/pipermail/phobos/attachments/20100702/47cca44c/attachment.html>
July 02, 2010
dstats.sort only compiles on recent versions of D2.

On Fri, Jul 2, 2010 at 12:22 PM, Jason Spencer <spencer8 at sbcglobal.net>wrote:

>
>
> *From:* David Simcha <dsimcha at gmail.com>
> *To:* Discuss the phobos library for D <phobos at puremagic.com>
> *
> *Compilers:
> DMD 2.047 (for D)
> GCC 4.1.2  (For C++;  I couldn't get the C++ code to compile on DMC because
> of STL issues that I don't feel like solving, even though this would level
> the playing field because D and C++ would have the same backend)
>
> You could use the Gnu D compiler and still have the same backend.  Might be interesting.
>
> _______________________________________________
> phobos mailing list
> phobos at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/phobos
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.puremagic.com/pipermail/phobos/attachments/20100702/6fa8c64e/attachment-0001.html>
July 02, 2010
.
On Fri, Jul 2, 2010 at 12:21 PM, Andrei Alexandrescu <andrei at erdani.com>wrote:

> One simple solution would be for you to contribute dstat's sort to Phobos. However, I'd be curious what the reason of std's sort slowness is. I suspect it might be the fact that I use qsort all the way down instead of switching to insertion sort. What is your sort's strategy?
>
>
Insertion sort at 25 elements.  This is based on fairly heavy empirical testing.  I tried disabling this and doing qsort all the way down.  This only explains a small part of the difference (about 30 milliseconds' worth).

I looked at the std.algorithm code and I think I see at least part of the problem, but I don't know how to fix it w/o completely gutting the code and rewriting it:

// This is probably not inlined b/c I don't think DMD can inline nested
functions
// that access the outer scope.  Someone please confirm this
bool pred(ElementType!(Range) a)
{
       return less(a, r.back);
}
auto right = partition!(pred, ss)(r);
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.puremagic.com/pipermail/phobos/attachments/20100702/49015881/attachment.html>
July 02, 2010
Working around that would mess much of the elegance of std.sort, sigh. The nice thing is that sort reuses partition instead of writing a dedicated version of it (as I had in my implementation for a while). I was quite glad I'd pulled that off.

Andrei

David Simcha wrote:
> .
> On Fri, Jul 2, 2010 at 12:21 PM, Andrei Alexandrescu <andrei at erdani.com
> <mailto:andrei at erdani.com>> wrote:
> 
>     One simple solution would be for you to contribute dstat's sort to
>     Phobos. However, I'd be curious what the reason of std's sort
>     slowness is. I suspect it might be the fact that I use qsort all the
>     way down instead of switching to insertion sort. What is your sort's
>     strategy?
> 
> 
> Insertion sort at 25 elements.  This is based on fairly heavy empirical testing.  I tried disabling this and doing qsort all the way down.  This only explains a small part of the difference (about 30 milliseconds' worth).
> 
> I looked at the std.algorithm code and I think I see at least part of the problem, but I don't know how to fix it w/o completely gutting the code and rewriting it:
> 
> // This is probably not inlined b/c I don't think DMD can inline nested
> functions
> // that access the outer scope.  Someone please confirm this
> bool pred(ElementType!(Range) a)
> {
>        return less(a, r.back);
> }
> auto right = partition!(pred, ss)(r);
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> phobos mailing list
> phobos at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/phobos
July 02, 2010
On Fri, Jul 2, 2010 at 12:44 PM, Andrei Alexandrescu <andrei at erdani.com>wrote:

> Working around that would mess much of the elegance of std.sort, sigh. The nice thing is that sort reuses partition instead of writing a dedicated version of it (as I had in my implementation for a while). I was quite glad I'd pulled that off.
>
> Andrei
>

Couldn't you just put in a second version of std.algorithm.partition that takes a second argument in addition to pred, similar to what you did for std.algorithm.count()?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.puremagic.com/pipermail/phobos/attachments/20100702/c4501da8/attachment.html>
July 02, 2010
The rules for inlining are weird.  When I wrote Array.sort I noticed that a nested function that took reference parameters (ie. swap) wasn't inlined either, so I changed mine to use indexes and accessed the outer array that way.  For this level of performance tuning you really have to look at the ASM output to see what's going on.

On Jul 2, 2010, at 9:39 AM, David Simcha wrote:

> .
> On Fri, Jul 2, 2010 at 12:21 PM, Andrei Alexandrescu <andrei at erdani.com> wrote:
> One simple solution would be for you to contribute dstat's sort to Phobos. However, I'd be curious what the reason of std's sort slowness is. I suspect it might be the fact that I use qsort all the way down instead of switching to insertion sort. What is your sort's strategy?
> 
> 
> Insertion sort at 25 elements.  This is based on fairly heavy empirical testing.  I tried disabling this and doing qsort all the way down.  This only explains a small part of the difference (about 30 milliseconds' worth).
> 
> I looked at the std.algorithm code and I think I see at least part of the problem, but I don't know how to fix it w/o completely gutting the code and rewriting it:
> 
> // This is probably not inlined b/c I don't think DMD can inline nested functions
> // that access the outer scope.  Someone please confirm this
> bool pred(ElementType!(Range) a)
> {
>        return less(a, r.back);
> }
> auto right = partition!(pred, ss)(r);
> 
> _______________________________________________
> phobos mailing list
> phobos at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/phobos

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.puremagic.com/pipermail/phobos/attachments/20100702/61458618/attachment.html>
July 02, 2010
Please file bugs with reduced test cases where seemingly obvious inlining isn't done.  I'd like to spend time on them.  Ref parameters were one of the things that blocked inlining until this most recent release (or maybe the one before that).  That works now. :)

On Fri, 2 Jul 2010, Sean Kelly wrote:

> Date: Fri, 2 Jul 2010 11:26:58 -0700
> From: Sean Kelly <sean at invisibleduck.org>
> Reply-To: Discuss the phobos library for D <phobos at puremagic.com>
> To: Discuss the phobos library for D <phobos at puremagic.com>
> Subject: Re: [phobos] std.algorithm.sort slow as molasses
> 
> The rules for inlining are weird.  When I wrote Array.sort I noticed that a nested function that took reference parameters (ie. swap) wasn't inlined either, so I changed mine to use indexes and accessed the outer array that way.  For this level of performance tuning you really have to look at the ASM output to see what's going on.
> 
> On Jul 2, 2010, at 9:39 AM, David Simcha wrote:
> 
> > .
> > On Fri, Jul 2, 2010 at 12:21 PM, Andrei Alexandrescu <andrei at erdani.com> wrote:
> > One simple solution would be for you to contribute dstat's sort to Phobos. However, I'd be curious what the reason of std's sort slowness is. I suspect it might be the fact that I use qsort all the way down instead of switching to insertion sort. What is your sort's strategy?
> > 
> > 
> > Insertion sort at 25 elements.  This is based on fairly heavy empirical testing.  I tried disabling this and doing qsort all the way down.  This only explains a small part of the difference (about 30 milliseconds' worth).
> > 
> > I looked at the std.algorithm code and I think I see at least part of the problem, but I don't know how to fix it w/o completely gutting the code and rewriting it:
> > 
> > // This is probably not inlined b/c I don't think DMD can inline nested functions
> > // that access the outer scope.  Someone please confirm this
> > bool pred(ElementType!(Range) a)
> > {
> >        return less(a, r.back);
> > }
> > auto right = partition!(pred, ss)(r);
> > 
> > _______________________________________________
> > phobos mailing list
> > phobos at puremagic.com
> > http://lists.puremagic.com/mailman/listinfo/phobos
> 
> 
-------------- next part --------------
_______________________________________________
phobos mailing list
phobos at puremagic.com
http://lists.puremagic.com/mailman/listinfo/phobos
« First   ‹ Prev
1 2 3