Thread overview | ||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
August 25, 2013 Performance penalty for using ranges | ||||
---|---|---|---|---|
| ||||
I wrote a few versions of a palindrome testing function and noticed that versions using ranges (isPalindrome1..isPalindrome3) are 3 times slower than array indexing one (isPalindrome0). Is there a more efficient way of using ranges? Timing was done on Windows with DMD 2.063.2 (32-bit) dmd -O -noboundscheck -inline -release. // 6.9ms bool isPalindrome0(T)(T[] s) { auto i = 0; for (; i < s.length/2 && s[i] == s[$-i-1]; ++i) {} return i == s.length/2; } // 21ms bool isPalindrome1(T)(T[] s) { while (s.length > 1 && s.front == s.back) { s.popBack(); s.popFront(); } return s.length <= 1; } // 21ms bool isPalindrome2(T)(T[] s) { for (; s.length > 1 && s.front == s.back; s.popBack(), s.popFront()) {} return s.length <= 1; } // 21.4ms bool isPalindrome3(T)(T s) { foreach (i; 0..s.length/2) if (s[i] != s[$-i-1]) return false; return true; } int test() { int a[10]; int n = 0; foreach (i; 0..1_000_000) { a[4] = !a[4]; n += isPalindrome0(a); } return n; } void main() { enum N = 16; auto t = benchmark!(test)(N); writeln(cast(float)t[0].msecs/N); writeln(test); } |
August 25, 2013 Re: Performance penalty for using ranges | ||||
---|---|---|---|---|
| ||||
Posted in reply to Paul Jurczak | Paul Jurczak:
> I wrote a few versions of a palindrome testing function and noticed that versions using ranges (isPalindrome1..isPalindrome3) are 3 times slower than array indexing one (isPalindrome0). Is there a more efficient way of using ranges?
I have modified and improved and fixed your code a little:
import std.stdio, std.datetime, std.array, std.typetuple,
std.range, std.algorithm;
bool isPalindrome0(T)(in T[] s) pure nothrow {
size_t i = 0;
for (; i < s.length / 2 && s[i] == s[$ - i - 1]; ++i) {}
return i == s.length / 2;
}
bool isPalindrome1(T)(T[] s) pure nothrow {
while (s.length > 1 && s.front == s.back) {
s.popBack;
s.popFront;
}
return s.length <= 1;
}
bool isPalindrome2(T)(T[] s) pure nothrow {
for (;
s.length > 1 && s.front == s.back;
s.popBack, s.popFront) {}
return s.length <= 1;
}
bool isPalindrome3(T)(in T[] s) pure nothrow {
foreach (immutable i; 0 .. s.length / 2)
if (s[i] != s[$ - i - 1])
return false;
return true;
}
/// A high-level version.
bool isPalindrome4(T)(in T[] s) /*pure nothrow*/ {
return s.retro.equal(s);
}
int test(alias F)(in int nLoops) /*pure nothrow*/ {
int[10] a;
typeof(return) n = 0;
foreach (immutable _; 0 .. nLoops) {
a[4] = !a[4];
n += F(a);
}
return n;
}
void main() {
enum size_t nLoops = 60_000_000;
StopWatch sw;
foreach (alg; TypeTuple!(isPalindrome0, isPalindrome1,
isPalindrome2, isPalindrome3,
isPalindrome4)) {
sw.reset;
sw.start;
immutable n = test!alg(nLoops);
sw.stop;
writeln(n, " ", sw.peek.msecs);
}
}
Using a compiler with a better optimizing back-end (especially with a better inlining), the timings show that the faster versions are the second and third, but all of them have a very similar performance (the last one reads the whole array twice, so it's a little slower, as expected):
...>ldmd2 -O -release -inline -noboundscheck -run test2.d
30000000 1150
30000000 1073
30000000 1038
30000000 1215
30000000 1621
Someday a function like isPalindrome4 will be pure & nothrow.
Bye,
bearophile
|
August 25, 2013 Re: Performance penalty for using ranges | ||||
---|---|---|---|---|
| ||||
Posted in reply to bearophile | On Sunday, 25 August 2013 at 14:54:04 UTC, bearophile wrote: > I have modified and improved and fixed your code a little: [..] Thank you for your analysis. I've run it on Linux with DMD64 2.063.2: dmd -O -noboundscheck -inline -release and relative timings are different than on Windows: 500000 7160 500000 18588 500000 18592 500000 14466 500000 28368 I'm also running it with LDC, but it reports timings too good to be true - something meaningful is getting optimized away and I'm trying to find out why. I have a few newbie questions below: > int test(alias F)(in int nLoops) /*pure nothrow*/ { > int[10] a; > typeof(return) n = 0; > > foreach (immutable _; 0 .. nLoops) { > a[4] = !a[4]; > n += F(a); > } > > return n; > } What is the purpose of "immutable _;" above? Why not "i;"? > void main() { > enum size_t nLoops = 60_000_000; > StopWatch sw; > > foreach (alg; TypeTuple!(isPalindrome0, isPalindrome1, > isPalindrome2, isPalindrome3, > isPalindrome4)) { > sw.reset; > sw.start; > immutable n = test!alg(nLoops); > sw.stop; > writeln(n, " ", sw.peek.msecs); > } > } Same question here: why immutable? I wanted to get more stable run-to-run results, so I'm looking for minimums: void main() { enum N = 100; enum M = 1_000_000; StopWatch sw; foreach (alg; TypeTuple!(isPalindrome0, isPalindrome1, isPalindrome2, isPalindrome3, isPalindrome4)) { int [N] results; long[N] timings; foreach (i; 0..N) { sw.reset; sw.start; results[i] = test!alg(M); sw.stop; timings[i] = sw.peek.usecs; } writeln(results.reduce!min, " ", timings.reduce!min); } } |
August 25, 2013 Re: Performance penalty for using ranges | ||||
---|---|---|---|---|
| ||||
Posted in reply to Paul Jurczak | On 25/08/13 20:07, Paul Jurczak wrote:
> What is the purpose of "immutable _;" above? Why not "i;"?
It's never used, so notating it like this means you'll never clash with another variable name by accident (e.g. if somewhere else in the function you declare an int i).
Had never thought of that before but it's a handy trick, I'll have to make use of it in my own code ...
|
August 25, 2013 Re: Performance penalty for using ranges | ||||
---|---|---|---|---|
| ||||
Posted in reply to Joseph Rushton Wakeling | On Sunday, 25 August 2013 at 18:13:39 UTC, Joseph Rushton Wakeling wrote: > On 25/08/13 20:07, Paul Jurczak wrote: >> What is the purpose of "immutable _;" above? Why not "i;"? > > It's never used, so notating it like this means you'll never clash with another variable name by accident (e.g. if somewhere else in the function you declare an int i). > > Had never thought of that before but it's a handy trick, I'll have to make use of it in my own code ... It never clashes untill you nest two foreach, and then you have to use __ ... foreach( _ ; 0 .. M) foreach( __ ; 0 .. N) ... I have an enhancement request to simply allow anonymous iteration: foreach( ; 0 .. N) http://d.puremagic.com/issues/show_bug.cgi?id=9009 |
August 25, 2013 Re: Performance penalty for using ranges | ||||
---|---|---|---|---|
| ||||
Posted in reply to Paul Jurczak | On Sunday, 25 August 2013 at 18:07:33 UTC, Paul Jurczak wrote:
[..]
> I'm also running it with LDC, but it reports timings too good to be true - something meaningful is getting optimized away and I'm trying to find out why.
[..]
It seems that LDC optimizer is smarter than I expected and it was able to optimize away comparison on constant elements of array a in my test:
int test(alias F)(int N) {
int a[10];
int n = 0;
foreach (immutable _; 0..N) {
a[4] = !a[4];
n += F(a);
}
return n;
}
I modified it, so there is no constant elements and all cases of pairwise comparison are exercised:
int test(alias F)(int N) {
enum L = 10;
int a[L];
int n = 0;
foreach (int i; 0..N) {
auto j = (i%L + L/2)%L;
a[j] = !a[j];
n += F(a);
}
return n;
}
Here are the results:
Xubuntu 13.04 64-bit Core i5 3450S 2.8GHz (3.5GHz turbo):
LDC 0.11.0: ldmd2 -m64 -O -noboundscheck -inline -release
100000 5284
100000 4265
100000 4268
100000 5717
100000 4265
GDC 4.8.1: gdc -m64 -march=native -fno-bounds-check -frename-registers -frelease -O3
100000 4612
100000 5717
100000 5718
100000 4984
100000 11546
DMD64 2.063.2: dmd -O -noboundscheck -inline -release
100000 8350
100000 14615
100000 14597
100000 14270
100000 18509
Windows 7 64-bit Core i5 2500 3.4GHz turbo:
DMD 2.063.2 (32-bit) dmd -O -noboundscheck -inline -release
100000 9803
100000 17050
100000 17031
100000 20851
100000 20283
I'm stunned by LDC ability to optimize isPalindrome4 to the best performance level of the whole group. Other compilers placed it at the bottom of performance rank.
|
August 25, 2013 Re: Performance penalty for using ranges | ||||
---|---|---|---|---|
| ||||
Posted in reply to monarch_dodra | On 25/08/13 21:10, monarch_dodra wrote:
> It never clashes untill you nest two foreach, and then you have to use __ ...
>
> foreach( _ ; 0 .. M)
> foreach( __ ; 0 .. N)
> ...
>
> I have an enhancement request to simply allow anonymous iteration:
> foreach( ; 0 .. N)
>
> http://d.puremagic.com/issues/show_bug.cgi?id=9009
Good call. :-)
|
August 25, 2013 Re: Performance penalty for using ranges | ||||
---|---|---|---|---|
| ||||
Posted in reply to Paul Jurczak | Paul Jurczak: >> int test(alias F)(in int nLoops) /*pure nothrow*/ { >> int[10] a; >> typeof(return) n = 0; >> >> foreach (immutable _; 0 .. nLoops) { >> a[4] = !a[4]; >> n += F(a); >> } >> >> return n; >> } > > What is the purpose of "immutable _;" above? Why not "i;"? I have used _ as variable iteration name to remind the person that is reading the code that loop variable name is not used inside the loop itself. When you have more of those variables you can number them as _1, _2, or use no more than two underscores __. And there the usage of immutable is to disallow the mutation the iteration variable inside the loop. In my opinion, to keep code clean and less bug-prone D should on make foreach indexes immutable on default. This saves you from bugs when you iterate on struct items: struct Foo { int x; } auto foos = [Foo(10), Foo(20)]; foreach (f; foos) f.x++; It's not uncommon in those cases to forget a "ref" and think foos are changed. If iteration items are immutable as in C# this saves you from this class of bugs. I have asked this in past, but D lacks a "mutable" keyword for the situations when you want to actually mutate the foreach variable, and generally Walter didn't sound very interested on this. So as workaround for this language design problem, it's a good habit to always put a const/immutable on foreach variables, unless you don't wait it, and unless something forces you to not use a const. >> void main() { >> enum size_t nLoops = 60_000_000; >> StopWatch sw; >> >> foreach (alg; TypeTuple!(isPalindrome0, isPalindrome1, >> isPalindrome2, isPalindrome3, >> isPalindrome4)) { >> sw.reset; >> sw.start; >> immutable n = test!alg(nLoops); >> sw.stop; >> writeln(n, " ", sw.peek.msecs); >> } >> } > > Same question here: why immutable? It's explained here: http://blog.knatten.org/2011/11/11/disempower-every-variable/ In short all your variables/function arguments should be tagged as const/immutable unless the semantics of your algorithm needs them to mutate, or unless some limit in D/Phobos forbids you to (like when you have created a lazy range, currently you can't assign it to const and then use it). Bye, bearophile |
August 25, 2013 Re: Performance penalty for using ranges | ||||
---|---|---|---|---|
| ||||
Posted in reply to bearophile | On 25/08/13 22:22, bearophile wrote:
> In short all your variables/function arguments should be tagged as
> const/immutable unless the semantics of your algorithm needs them to mutate, or
> unless some limit in D/Phobos forbids you to (like when you have created a lazy
> range, currently you can't assign it to const and then use it).
It's slightly annoying that one can't readily get immutability to play nice with more general iterations than i .. j.
For example if you consider the loop,
for (i = 10; i > 0; --i) { ... }
You can't make i immutable for obvious reasons but it's not clear to me how you can get a performant version of that with immutable i.
You could do something like [*],
foreach (immutable i; iota(1, 11).retro) { ... }
... but that will be slow and inefficient.
Ditto for cases like
for (i = 0; i < n; i += 2) { ... }
which you could write as
foreach (immutable i; iota(0, n, 2)) { ... }
... but again, that will be slow compared to the for loop or a foreach across an interval.
[* Hijacking of discussion: a while back I think I floated the idea of generalizing iota() with closed/open boundary conditions similar to those found in std.random.uniform; so e.g. you could do iota!"[]"(0, 10) and the upper bound would be included in the values returned. Would be useful for cases like these.]
|
August 25, 2013 Re: Performance penalty for using ranges | ||||
---|---|---|---|---|
| ||||
Posted in reply to Joseph Rushton Wakeling | Joseph Rushton Wakeling: > It's slightly annoying that one can't readily get immutability to play nice with more general iterations than i .. j. > > For example if you consider the loop, > > for (i = 10; i > 0; --i) { ... } Thankfully foreach_reverse was not deprecated: void main() { import std.stdio; for (auto i = 10; i > 0; --i) write(i, " "); writeln; foreach_reverse (immutable i; 1 .. 11) write(i, " "); writeln; } > [* Hijacking of discussion: a while back I think I floated the idea of generalizing iota() with closed/open boundary conditions similar to those found in std.random.uniform; so e.g. you could do iota!"[]"(0, 10) and the upper bound would be included in the values returned. Would be useful for cases like these.] Yes, it's a kind of necessary enhancement: http://d.puremagic.com/issues/show_bug.cgi?id=10466 Bye, bearophile |
Copyright © 1999-2021 by the D Language Foundation