Jump to page: 1 2 3
Thread overview
Performance penalty for using ranges
Aug 25, 2013
Paul Jurczak
Aug 25, 2013
bearophile
Aug 25, 2013
Paul Jurczak
Aug 25, 2013
monarch_dodra
Aug 25, 2013
bearophile
Aug 25, 2013
bearophile
Aug 26, 2013
monarch_dodra
Aug 25, 2013
Paul Jurczak
Aug 25, 2013
bearophile
Aug 25, 2013
bearophile
Aug 25, 2013
monarch_dodra
Aug 25, 2013
H. S. Teoh
Aug 25, 2013
Timon Gehr
Aug 25, 2013
bearophile
Aug 25, 2013
Timon Gehr
Aug 25, 2013
Timon Gehr
Aug 25, 2013
bearophile
Aug 25, 2013
Timon Gehr
August 25, 2013
I wrote a few versions of a palindrome testing function and noticed that versions using ranges (isPalindrome1..isPalindrome3) are 3 times slower than array indexing one (isPalindrome0). Is there a more efficient way of using ranges? Timing was done on Windows with DMD 2.063.2 (32-bit) dmd -O -noboundscheck -inline -release.

// 6.9ms
bool isPalindrome0(T)(T[] s) {
   auto i = 0;
   for (; i < s.length/2 && s[i] == s[$-i-1]; ++i) {}
   return i == s.length/2;
}

// 21ms
bool isPalindrome1(T)(T[] s) {
   while (s.length > 1 && s.front == s.back) {
      s.popBack();
      s.popFront();
   }

   return s.length <= 1;
}

// 21ms
bool isPalindrome2(T)(T[] s) {
   for (; s.length > 1 && s.front == s.back; s.popBack(), s.popFront()) {}
   return s.length <= 1;
}

// 21.4ms
bool isPalindrome3(T)(T s) {
   foreach (i; 0..s.length/2)
      if (s[i] != s[$-i-1])
         return false;

   return true;
}


int test() {
   int a[10];
   int n = 0;

   foreach (i; 0..1_000_000) {
      a[4] = !a[4];
      n += isPalindrome0(a);
   }

   return n;
}


void main()
{
   enum N = 16;
   auto t = benchmark!(test)(N);

   writeln(cast(float)t[0].msecs/N);
   writeln(test);
}
August 25, 2013
Paul Jurczak:

> I wrote a few versions of a palindrome testing function and noticed that versions using ranges (isPalindrome1..isPalindrome3) are 3 times slower than array indexing one (isPalindrome0). Is there a more efficient way of using ranges?

I have modified and improved and fixed your code a little:


import std.stdio, std.datetime, std.array, std.typetuple,
       std.range, std.algorithm;

bool isPalindrome0(T)(in T[] s) pure nothrow {
    size_t i = 0;
    for (; i < s.length / 2 && s[i] == s[$ - i - 1]; ++i) {}
    return i == s.length / 2;
}

bool isPalindrome1(T)(T[] s) pure nothrow {
    while (s.length > 1 && s.front == s.back) {
        s.popBack;
        s.popFront;
    }

    return s.length <= 1;
}

bool isPalindrome2(T)(T[] s) pure nothrow {
    for (;
         s.length > 1 && s.front == s.back;
         s.popBack, s.popFront) {}
    return s.length <= 1;
}

bool isPalindrome3(T)(in T[] s) pure nothrow {
    foreach (immutable i; 0 .. s.length / 2)
        if (s[i] != s[$ - i - 1])
            return false;
    return true;
}

/// A high-level version.
bool isPalindrome4(T)(in T[] s) /*pure nothrow*/ {
    return s.retro.equal(s);
}

int test(alias F)(in int nLoops) /*pure nothrow*/ {
    int[10] a;
    typeof(return) n = 0;

    foreach (immutable _; 0 .. nLoops) {
        a[4] = !a[4];
        n += F(a);
    }

    return n;
}

void main() {
    enum size_t nLoops = 60_000_000;
    StopWatch sw;

    foreach (alg; TypeTuple!(isPalindrome0, isPalindrome1,
                             isPalindrome2, isPalindrome3,
                             isPalindrome4)) {
        sw.reset;
        sw.start;
        immutable n = test!alg(nLoops);
        sw.stop;
        writeln(n, " ", sw.peek.msecs);
    }
}


Using a compiler with a better optimizing back-end (especially with a better inlining), the timings show that the faster versions are the second and third, but all of them have a very similar performance (the last one reads the whole array twice, so it's a little slower, as expected):


...>ldmd2 -O -release -inline -noboundscheck -run test2.d
30000000 1150
30000000 1073
30000000 1038
30000000 1215
30000000 1621


Someday a function like isPalindrome4 will be pure & nothrow.

Bye,
bearophile
August 25, 2013
On Sunday, 25 August 2013 at 14:54:04 UTC, bearophile wrote:
> I have modified and improved and fixed your code a little:
[..]
Thank you for your analysis. I've run it on Linux with DMD64 2.063.2: dmd -O -noboundscheck -inline -release and relative timings are different than on Windows:

500000 7160
500000 18588
500000 18592
500000 14466
500000 28368

I'm also running it with LDC, but it reports timings too good to be true - something meaningful is getting optimized away and I'm trying to find out why. I have a few newbie questions below:

> int test(alias F)(in int nLoops) /*pure nothrow*/ {
>     int[10] a;
>     typeof(return) n = 0;
>
>     foreach (immutable _; 0 .. nLoops) {
>         a[4] = !a[4];
>         n += F(a);
>     }
>
>     return n;
> }

What is the purpose of "immutable _;" above? Why not "i;"?

> void main() {
>     enum size_t nLoops = 60_000_000;
>     StopWatch sw;
>
>     foreach (alg; TypeTuple!(isPalindrome0, isPalindrome1,
>                              isPalindrome2, isPalindrome3,
>                              isPalindrome4)) {
>         sw.reset;
>         sw.start;
>         immutable n = test!alg(nLoops);
>         sw.stop;
>         writeln(n, " ", sw.peek.msecs);
>     }
> }

Same question here: why immutable?

I wanted to get more stable run-to-run results, so I'm looking for minimums:

void main()
{
   enum      N = 100;
   enum      M = 1_000_000;
   StopWatch sw;

   foreach (alg; TypeTuple!(isPalindrome0, isPalindrome1, isPalindrome2,
	                          isPalindrome3, isPalindrome4)) {
      int [N] results;
      long[N] timings;

      foreach (i; 0..N) {
         sw.reset;
         sw.start;
         results[i] = test!alg(M);
         sw.stop;
         timings[i] = sw.peek.usecs;
      }

      writeln(results.reduce!min, "  ", timings.reduce!min);
   }
}
August 25, 2013
On 25/08/13 20:07, Paul Jurczak wrote:
> What is the purpose of "immutable _;" above? Why not "i;"?

It's never used, so notating it like this means you'll never clash with another variable name by accident (e.g. if somewhere else in the function you declare an int i).

Had never thought of that before but it's a handy trick, I'll have to make use of it in my own code ...
August 25, 2013
On Sunday, 25 August 2013 at 18:13:39 UTC, Joseph Rushton Wakeling wrote:
> On 25/08/13 20:07, Paul Jurczak wrote:
>> What is the purpose of "immutable _;" above? Why not "i;"?
>
> It's never used, so notating it like this means you'll never clash with another variable name by accident (e.g. if somewhere else in the function you declare an int i).
>
> Had never thought of that before but it's a handy trick, I'll have to make use of it in my own code ...

It never clashes untill you nest two foreach, and then you have to use __ ...

foreach( _ ; 0 .. M)
    foreach( __ ; 0 .. N)
        ...

I have an enhancement request to simply allow anonymous iteration:
foreach( ; 0 .. N)

http://d.puremagic.com/issues/show_bug.cgi?id=9009
August 25, 2013
On Sunday, 25 August 2013 at 18:07:33 UTC, Paul Jurczak wrote:
[..]
> I'm also running it with LDC, but it reports timings too good to be true - something meaningful is getting optimized away and I'm trying to find out why.
[..]

It seems that LDC optimizer is smarter than I expected and it was able to optimize away comparison on constant elements of array a in my test:

int test(alias F)(int N) {
   int a[10];
   int n = 0;

   foreach (immutable _; 0..N) {
      a[4] = !a[4];
      n += F(a);
   }

   return n;
}

I modified it, so there is no constant elements and all cases of pairwise comparison are exercised:

int test(alias F)(int N) {
   enum L = 10;
   int  a[L];
   int  n = 0;

   foreach (int i; 0..N) {
      auto j = (i%L + L/2)%L;
      a[j] = !a[j];
      n += F(a);
   }

   return n;
}

Here are the results:

Xubuntu 13.04 64-bit Core i5 3450S 2.8GHz (3.5GHz turbo):
LDC 0.11.0: ldmd2 -m64 -O -noboundscheck -inline -release
100000  5284
100000  4265
100000  4268
100000  5717
100000  4265

GDC 4.8.1: gdc -m64 -march=native -fno-bounds-check -frename-registers -frelease -O3
100000  4612
100000  5717
100000  5718
100000  4984
100000  11546

DMD64 2.063.2: dmd -O -noboundscheck -inline -release
100000  8350
100000  14615
100000  14597
100000  14270
100000  18509

Windows 7 64-bit Core i5 2500 3.4GHz turbo:
DMD 2.063.2 (32-bit) dmd -O -noboundscheck -inline -release
100000  9803
100000  17050
100000  17031
100000  20851
100000  20283

I'm stunned by LDC ability to optimize isPalindrome4 to the best performance level of the whole group. Other compilers placed it at the bottom of performance rank.

August 25, 2013
On 25/08/13 21:10, monarch_dodra wrote:
> It never clashes untill you nest two foreach, and then you have to use __ ...
>
> foreach( _ ; 0 .. M)
>      foreach( __ ; 0 .. N)
>          ...
>
> I have an enhancement request to simply allow anonymous iteration:
> foreach( ; 0 .. N)
>
> http://d.puremagic.com/issues/show_bug.cgi?id=9009

Good call. :-)

August 25, 2013
Paul Jurczak:

>> int test(alias F)(in int nLoops) /*pure nothrow*/ {
>>    int[10] a;
>>    typeof(return) n = 0;
>>
>>    foreach (immutable _; 0 .. nLoops) {
>>        a[4] = !a[4];
>>        n += F(a);
>>    }
>>
>>    return n;
>> }
>
> What is the purpose of "immutable _;" above? Why not "i;"?

I have used _ as variable iteration name to remind the person that is reading the code that loop variable name is not used inside the loop itself. When you have more of those variables you can number them as _1, _2, or use no more than two underscores __.

And there the usage of immutable is to disallow the mutation the iteration variable inside the loop. In my opinion, to keep code clean and less bug-prone D should on make foreach indexes immutable on default. This saves you from bugs when you iterate on struct items:

struct Foo { int x; }
auto foos = [Foo(10), Foo(20)];
foreach (f; foos)
    f.x++;

It's not uncommon in those cases to forget a "ref" and think foos are changed. If iteration items are immutable as in C# this saves you from this class of bugs.

I have asked this in past, but D lacks a "mutable" keyword for the situations when you want to actually mutate the foreach variable, and generally Walter didn't sound very interested on this.

So as workaround for this language design problem, it's a good habit to always put a const/immutable on foreach variables, unless you don't wait it, and unless something forces you to not use a const.


>> void main() {
>>    enum size_t nLoops = 60_000_000;
>>    StopWatch sw;
>>
>>    foreach (alg; TypeTuple!(isPalindrome0, isPalindrome1,
>>                             isPalindrome2, isPalindrome3,
>>                             isPalindrome4)) {
>>        sw.reset;
>>        sw.start;
>>        immutable n = test!alg(nLoops);
>>        sw.stop;
>>        writeln(n, " ", sw.peek.msecs);
>>    }
>> }
>
> Same question here: why immutable?

It's explained here:
http://blog.knatten.org/2011/11/11/disempower-every-variable/

In short all your variables/function arguments should be tagged as const/immutable unless the semantics of your algorithm needs them to mutate, or unless some limit in D/Phobos forbids you to (like when you have created a lazy range, currently you can't assign it to const and then use it).

Bye,
bearophile
August 25, 2013
On 25/08/13 22:22, bearophile wrote:
> In short all your variables/function arguments should be tagged as
> const/immutable unless the semantics of your algorithm needs them to mutate, or
> unless some limit in D/Phobos forbids you to (like when you have created a lazy
> range, currently you can't assign it to const and then use it).

It's slightly annoying that one can't readily get immutability to play nice with more general iterations than i .. j.

For example if you consider the loop,

    for (i = 10; i > 0; --i) { ... }

You can't make i immutable for obvious reasons but it's not clear to me how you can get a performant version of that with immutable i.

You could do something like [*],

    foreach (immutable i; iota(1, 11).retro) { ... }

... but that will be slow and inefficient.

Ditto for cases like

    for (i = 0; i < n; i += 2) { ... }

which you could write as

    foreach (immutable i; iota(0, n, 2)) { ... }

... but again, that will be slow compared to the for loop or a foreach across an interval.


[* Hijacking of discussion: a while back I think I floated the idea of generalizing iota() with closed/open boundary conditions similar to those found in std.random.uniform; so e.g. you could do iota!"[]"(0, 10) and the upper bound would be included in the values returned.  Would be useful for cases like these.]
August 25, 2013
Joseph Rushton Wakeling:

> It's slightly annoying that one can't readily get immutability to play nice with more general iterations than i .. j.
>
> For example if you consider the loop,
>
>     for (i = 10; i > 0; --i) { ... }

Thankfully foreach_reverse was not deprecated:

void main() {
    import std.stdio;

    for (auto i = 10; i > 0; --i)
        write(i, " ");
    writeln;

    foreach_reverse (immutable i; 1 .. 11)
        write(i, " ");
    writeln;
}


> [* Hijacking of discussion: a while back I think I floated the idea of generalizing iota() with closed/open boundary conditions similar to those found in std.random.uniform; so e.g. you could do iota!"[]"(0, 10) and the upper bound would be included in the values returned.  Would be useful for cases like these.]

Yes, it's a kind of necessary enhancement:
http://d.puremagic.com/issues/show_bug.cgi?id=10466

Bye,
bearophile
« First   ‹ Prev
1 2 3