Thread overview | |||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
July 02, 2010 [phobos] std.algorithm.sort slow as molasses | ||||
---|---|---|---|---|
| ||||
Here are some benchmarks of std.algorithm.sort vs. GCC 4.1.2's implementation of STL sort() and the implementation I use in my dstats library for lots of statistical calculations. D code: import std.stdio, std.perf, std.random, std.algorithm, dstats.sort; void main(string[] args) { auto nums = new float[1_000_000]; foreach(ref num; nums) { num = uniform(0.0, 10_000_000.0); } auto pc = new PerformanceCounter; pc.start; if(args.length > 1 && args[1] == "--dstats") { dstats.sort.qsort(nums); } else { std.algorithm.sort(nums); } pc.stop; writeln(pc.milliseconds); } C++ code: #include <vector> #include <ctime> #include <cstdlib> #include <algorithm> #include <iostream> using namespace std; // Generates quick and dirty random numbers. float sloppyRand() { unsigned num = 0; num += (rand() << 16); num += rand(); return num; } int main() { vector<float> nums(1000000); for(int i = 0; i < nums.size(); i++) { nums[i] = sloppyRand(); } double startTime = clock(); sort(nums.begin(), nums.end()); double stopTime = clock(); double clocksPerMillisec = CLOCKS_PER_SEC / 1000.0; cout << (stopTime - startTime) / clocksPerMillisec << endl; } Compilers: DMD 2.047 (for D) GCC 4.1.2 (For C++; I couldn't get the C++ code to compile on DMC because of STL issues that I don't feel like solving, even though this would level the playing field because D and C++ would have the same backend) Results on Indel Xeon x5472: Compiler settings: -O -inline -release (for D), -O3 (for GCC) D, using std.algorithm.sort: 330 milliseconds D, using dstats.qsort: 96 milliseconds C++, using 64-bit compile: 90 milliseconds C++, using 32-bit compile: 100 milliseconds I'd say std.algorithm.sort could use some serious optimization. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.puremagic.com/pipermail/phobos/attachments/20100702/1a52f247/attachment.html> |
July 02, 2010 [phobos] std.algorithm.sort slow as molasses | ||||
---|---|---|---|---|
| ||||
Posted in reply to David Simcha | The Array.sort implementation I put in Tango is pretty well tuned. It would have to be adapted to use ranges, but I'd be happy to contribute it.
On Jul 2, 2010, at 7:55 AM, David Simcha wrote:
> Here are some benchmarks of std.algorithm.sort vs. GCC 4.1.2's implementation of STL sort() and the implementation I use in my dstats library for lots of statistical calculations.
>
> D code:
> import std.stdio, std.perf, std.random, std.algorithm, dstats.sort;
>
> void main(string[] args) {
> auto nums = new float[1_000_000];
> foreach(ref num; nums) {
> num = uniform(0.0, 10_000_000.0);
> }
>
> auto pc = new PerformanceCounter;
> pc.start;
>
> if(args.length > 1 && args[1] == "--dstats") {
> dstats.sort.qsort(nums);
> } else {
> std.algorithm.sort(nums);
> }
>
> pc.stop;
> writeln(pc.milliseconds);
> }
>
> C++ code:
>
> #include <vector>
> #include <ctime>
> #include <cstdlib>
> #include <algorithm>
> #include <iostream>
>
> using namespace std;
>
> // Generates quick and dirty random numbers.
> float sloppyRand() {
> unsigned num = 0;
> num += (rand() << 16);
> num += rand();
> return num;
> }
>
> int main() {
> vector<float> nums(1000000);
> for(int i = 0; i < nums.size(); i++) {
> nums[i] = sloppyRand();
> }
>
> double startTime = clock();
> sort(nums.begin(), nums.end());
> double stopTime = clock();
>
> double clocksPerMillisec = CLOCKS_PER_SEC / 1000.0;
> cout << (stopTime - startTime) / clocksPerMillisec << endl;
> }
>
> Compilers:
> DMD 2.047 (for D)
> GCC 4.1.2 (For C++; I couldn't get the C++ code to compile on DMC because of STL issues that I don't feel like solving, even though this would level the playing field because D and C++ would have the same backend)
>
> Results on Indel Xeon x5472:
>
> Compiler settings: -O -inline -release (for D), -O3 (for GCC)
>
> D, using std.algorithm.sort: 330 milliseconds
> D, using dstats.qsort: 96 milliseconds
> C++, using 64-bit compile: 90 milliseconds
> C++, using 32-bit compile: 100 milliseconds
>
> I'd say std.algorithm.sort could use some serious optimization.
> _______________________________________________
> phobos mailing list
> phobos at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/phobos
|
July 02, 2010 [phobos] std.algorithm.sort slow as molasses | ||||
---|---|---|---|---|
| ||||
Posted in reply to David Simcha | One simple solution would be for you to contribute dstat's sort to Phobos. However, I'd be curious what the reason of std's sort slowness is. I suspect it might be the fact that I use qsort all the way down instead of switching to insertion sort. What is your sort's strategy?
Andrei
David Simcha wrote:
> Here are some benchmarks of std.algorithm.sort vs. GCC 4.1.2's implementation of STL sort() and the implementation I use in my dstats library for lots of statistical calculations.
>
> D code:
> import std.stdio, std.perf, std.random, std.algorithm, dstats.sort;
>
> void main(string[] args) {
> auto nums = new float[1_000_000];
> foreach(ref num; nums) {
> num = uniform(0.0, 10_000_000.0);
> }
>
> auto pc = new PerformanceCounter;
> pc.start;
>
> if(args.length > 1 && args[1] == "--dstats") {
> dstats.sort.qsort(nums);
> } else {
> std.algorithm.sort(nums);
> }
>
> pc.stop;
> writeln(pc.milliseconds);
> }
>
> C++ code:
>
> #include <vector>
> #include <ctime>
> #include <cstdlib>
> #include <algorithm>
> #include <iostream>
>
> using namespace std;
>
> // Generates quick and dirty random numbers.
> float sloppyRand() {
> unsigned num = 0;
> num += (rand() << 16);
> num += rand();
> return num;
> }
>
> int main() {
> vector<float> nums(1000000);
> for(int i = 0; i < nums.size(); i++) {
> nums[i] = sloppyRand();
> }
>
> double startTime = clock();
> sort(nums.begin(), nums.end());
> double stopTime = clock();
>
> double clocksPerMillisec = CLOCKS_PER_SEC / 1000.0;
> cout << (stopTime - startTime) / clocksPerMillisec << endl;
> }
>
> Compilers:
> DMD 2.047 (for D)
> GCC 4.1.2 (For C++; I couldn't get the C++ code to compile on DMC
> because of STL issues that I don't feel like solving, even though this
> would level the playing field because D and C++ would have the same backend)
>
> Results on Indel Xeon x5472:
>
> Compiler settings: -O -inline -release (for D), -O3 (for GCC)
>
> D, using std.algorithm.sort: 330 milliseconds
> D, using dstats.qsort: 96 milliseconds
> C++, using 64-bit compile: 90 milliseconds
> C++, using 32-bit compile: 100 milliseconds
>
> I'd say std.algorithm.sort could use some serious optimization.
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> phobos mailing list
> phobos at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/phobos
|
July 02, 2010 [phobos] std.algorithm.sort slow as molasses | ||||
---|---|---|---|---|
| ||||
Posted in reply to David Simcha | > >From: David Simcha <dsimcha at gmail.com> >To: Discuss the phobos library for D <phobos at puremagic.com> > >Compilers: >DMD 2.047 (for D) >GCC 4.1.2 (For C++; I couldn't get the C++ code to compile on DMC because of STL issues that I don't feel like solving, even though this would level the playing field because D and C++ would have the same backend) > > >You could use the Gnu D compiler and still have the same backend. Might be interesting. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.puremagic.com/pipermail/phobos/attachments/20100702/47cca44c/attachment.html> |
July 02, 2010 [phobos] std.algorithm.sort slow as molasses | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jason Spencer | dstats.sort only compiles on recent versions of D2. On Fri, Jul 2, 2010 at 12:22 PM, Jason Spencer <spencer8 at sbcglobal.net>wrote: > > > *From:* David Simcha <dsimcha at gmail.com> > *To:* Discuss the phobos library for D <phobos at puremagic.com> > * > *Compilers: > DMD 2.047 (for D) > GCC 4.1.2 (For C++; I couldn't get the C++ code to compile on DMC because > of STL issues that I don't feel like solving, even though this would level > the playing field because D and C++ would have the same backend) > > You could use the Gnu D compiler and still have the same backend. Might be interesting. > > _______________________________________________ > phobos mailing list > phobos at puremagic.com > http://lists.puremagic.com/mailman/listinfo/phobos > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.puremagic.com/pipermail/phobos/attachments/20100702/6fa8c64e/attachment-0001.html> |
July 02, 2010 [phobos] std.algorithm.sort slow as molasses | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | . On Fri, Jul 2, 2010 at 12:21 PM, Andrei Alexandrescu <andrei at erdani.com>wrote: > One simple solution would be for you to contribute dstat's sort to Phobos. However, I'd be curious what the reason of std's sort slowness is. I suspect it might be the fact that I use qsort all the way down instead of switching to insertion sort. What is your sort's strategy? > > Insertion sort at 25 elements. This is based on fairly heavy empirical testing. I tried disabling this and doing qsort all the way down. This only explains a small part of the difference (about 30 milliseconds' worth). I looked at the std.algorithm code and I think I see at least part of the problem, but I don't know how to fix it w/o completely gutting the code and rewriting it: // This is probably not inlined b/c I don't think DMD can inline nested functions // that access the outer scope. Someone please confirm this bool pred(ElementType!(Range) a) { return less(a, r.back); } auto right = partition!(pred, ss)(r); -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.puremagic.com/pipermail/phobos/attachments/20100702/49015881/attachment.html> |
July 02, 2010 [phobos] std.algorithm.sort slow as molasses | ||||
---|---|---|---|---|
| ||||
Posted in reply to David Simcha | Working around that would mess much of the elegance of std.sort, sigh. The nice thing is that sort reuses partition instead of writing a dedicated version of it (as I had in my implementation for a while). I was quite glad I'd pulled that off.
Andrei
David Simcha wrote:
> .
> On Fri, Jul 2, 2010 at 12:21 PM, Andrei Alexandrescu <andrei at erdani.com
> <mailto:andrei at erdani.com>> wrote:
>
> One simple solution would be for you to contribute dstat's sort to
> Phobos. However, I'd be curious what the reason of std's sort
> slowness is. I suspect it might be the fact that I use qsort all the
> way down instead of switching to insertion sort. What is your sort's
> strategy?
>
>
> Insertion sort at 25 elements. This is based on fairly heavy empirical testing. I tried disabling this and doing qsort all the way down. This only explains a small part of the difference (about 30 milliseconds' worth).
>
> I looked at the std.algorithm code and I think I see at least part of the problem, but I don't know how to fix it w/o completely gutting the code and rewriting it:
>
> // This is probably not inlined b/c I don't think DMD can inline nested
> functions
> // that access the outer scope. Someone please confirm this
> bool pred(ElementType!(Range) a)
> {
> return less(a, r.back);
> }
> auto right = partition!(pred, ss)(r);
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> phobos mailing list
> phobos at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/phobos
|
July 02, 2010 [phobos] std.algorithm.sort slow as molasses | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Fri, Jul 2, 2010 at 12:44 PM, Andrei Alexandrescu <andrei at erdani.com>wrote: > Working around that would mess much of the elegance of std.sort, sigh. The nice thing is that sort reuses partition instead of writing a dedicated version of it (as I had in my implementation for a while). I was quite glad I'd pulled that off. > > Andrei > Couldn't you just put in a second version of std.algorithm.partition that takes a second argument in addition to pred, similar to what you did for std.algorithm.count()? -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.puremagic.com/pipermail/phobos/attachments/20100702/c4501da8/attachment.html> |
July 02, 2010 [phobos] std.algorithm.sort slow as molasses | ||||
---|---|---|---|---|
| ||||
Posted in reply to David Simcha | The rules for inlining are weird. When I wrote Array.sort I noticed that a nested function that took reference parameters (ie. swap) wasn't inlined either, so I changed mine to use indexes and accessed the outer array that way. For this level of performance tuning you really have to look at the ASM output to see what's going on. On Jul 2, 2010, at 9:39 AM, David Simcha wrote: > . > On Fri, Jul 2, 2010 at 12:21 PM, Andrei Alexandrescu <andrei at erdani.com> wrote: > One simple solution would be for you to contribute dstat's sort to Phobos. However, I'd be curious what the reason of std's sort slowness is. I suspect it might be the fact that I use qsort all the way down instead of switching to insertion sort. What is your sort's strategy? > > > Insertion sort at 25 elements. This is based on fairly heavy empirical testing. I tried disabling this and doing qsort all the way down. This only explains a small part of the difference (about 30 milliseconds' worth). > > I looked at the std.algorithm code and I think I see at least part of the problem, but I don't know how to fix it w/o completely gutting the code and rewriting it: > > // This is probably not inlined b/c I don't think DMD can inline nested functions > // that access the outer scope. Someone please confirm this > bool pred(ElementType!(Range) a) > { > return less(a, r.back); > } > auto right = partition!(pred, ss)(r); > > _______________________________________________ > phobos mailing list > phobos at puremagic.com > http://lists.puremagic.com/mailman/listinfo/phobos -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.puremagic.com/pipermail/phobos/attachments/20100702/61458618/attachment.html> |
July 02, 2010 [phobos] std.algorithm.sort slow as molasses | ||||
---|---|---|---|---|
| ||||
Posted in reply to Sean Kelly | Please file bugs with reduced test cases where seemingly obvious inlining isn't done. I'd like to spend time on them. Ref parameters were one of the things that blocked inlining until this most recent release (or maybe the one before that). That works now. :) On Fri, 2 Jul 2010, Sean Kelly wrote: > Date: Fri, 2 Jul 2010 11:26:58 -0700 > From: Sean Kelly <sean at invisibleduck.org> > Reply-To: Discuss the phobos library for D <phobos at puremagic.com> > To: Discuss the phobos library for D <phobos at puremagic.com> > Subject: Re: [phobos] std.algorithm.sort slow as molasses > > The rules for inlining are weird. When I wrote Array.sort I noticed that a nested function that took reference parameters (ie. swap) wasn't inlined either, so I changed mine to use indexes and accessed the outer array that way. For this level of performance tuning you really have to look at the ASM output to see what's going on. > > On Jul 2, 2010, at 9:39 AM, David Simcha wrote: > > > . > > On Fri, Jul 2, 2010 at 12:21 PM, Andrei Alexandrescu <andrei at erdani.com> wrote: > > One simple solution would be for you to contribute dstat's sort to Phobos. However, I'd be curious what the reason of std's sort slowness is. I suspect it might be the fact that I use qsort all the way down instead of switching to insertion sort. What is your sort's strategy? > > > > > > Insertion sort at 25 elements. This is based on fairly heavy empirical testing. I tried disabling this and doing qsort all the way down. This only explains a small part of the difference (about 30 milliseconds' worth). > > > > I looked at the std.algorithm code and I think I see at least part of the problem, but I don't know how to fix it w/o completely gutting the code and rewriting it: > > > > // This is probably not inlined b/c I don't think DMD can inline nested functions > > // that access the outer scope. Someone please confirm this > > bool pred(ElementType!(Range) a) > > { > > return less(a, r.back); > > } > > auto right = partition!(pred, ss)(r); > > > > _______________________________________________ > > phobos mailing list > > phobos at puremagic.com > > http://lists.puremagic.com/mailman/listinfo/phobos > > -------------- next part -------------- _______________________________________________ phobos mailing list phobos at puremagic.com http://lists.puremagic.com/mailman/listinfo/phobos |
Copyright © 1999-2021 by the D Language Foundation