Performance of loops - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » General » Performance of loops

Thread overview

Performance of loops
Apr 24, 2015 Chris
Apr 24, 2015 John Colvin
Apr 24, 2015 Chris
Apr 24, 2015 Marc Schütz
Apr 24, 2015 Chris
Apr 24, 2015 Steven Schveighoffer
Apr 24, 2015 Chris
Apr 24, 2015 Baz
Apr 27, 2015 Chris

April 24, 2015

Performance of loops

Posted by Chris

Chris

I tested the performance of three types of loops (see code below). It turns out that the fastest loop is the "plainLoop". Unless my examples are completely screwed up, the difference between "plainLoop" and the other two loops is gigantic (e.g.):

9 ms, 149 μs, and 4 hnsecs  // foreach (const ref w)
9 ms, 77 μs, and 8 hnsecs   // foreach (ref w)
1 ms, 183 μs, and 6 hnsecs  // foreach (w)

with -release -inline -O -boundscheck=off

8 ms, 492 μs, and 3 hnsecs
8 ms, 287 μs, and 1 hnsec
341 μs and 2 hnsecs

[compiler dmd v2.067.0, Linux Ubuntu, 64bit]


import std.datetime : benchmark, Duration;
import std.string : format;
import std.conv : to;
import std.stdio : writeln;

enum {
  string[] words = ["Hello", "world", "Ola", "mundo"],
}

void main() {
  auto result = benchmark!(constLoop, refLoop, plainLoop)(100_000);
  writeln(to!Duration(result[0]));
  writeln(to!Duration(result[1]));
  writeln(to!Duration(result[2]));

}

void constLoop() {
  size_t cnt;
  foreach (const ref w; words)
    cnt += w.length;
}

void refLoop() {
  size_t cnt;
  foreach (ref w; words)
    cnt += w.length;
}

void plainLoop() {
  size_t cnt;
  foreach (w; words)
    cnt += w.length;
}

April 24, 2015

Re: Performance of loops

Posted by John Colvin
in reply to Chris

John Colvin

Posted in reply to Chris

On Friday, 24 April 2015 at 11:00:23 UTC, Chris wrote:
> I tested the performance of three types of loops (see code below). It turns out that the fastest loop is the "plainLoop". Unless my examples are completely screwed up, the difference between "plainLoop" and the other two loops is gigantic (e.g.):
>
> 9 ms, 149 μs, and 4 hnsecs  // foreach (const ref w)
> 9 ms, 77 μs, and 8 hnsecs   // foreach (ref w)
> 1 ms, 183 μs, and 6 hnsecs  // foreach (w)
>
> with -release -inline -O -boundscheck=off
>
> 8 ms, 492 μs, and 3 hnsecs
> 8 ms, 287 μs, and 1 hnsec
> 341 μs and 2 hnsecs
>
> [compiler dmd v2.067.0, Linux Ubuntu, 64bit]
>
>
> import std.datetime : benchmark, Duration;
> import std.string : format;
> import std.conv : to;
> import std.stdio : writeln;
>
> enum {
>   string[] words = ["Hello", "world", "Ola", "mundo"],
> }
>
> void main() {
>   auto result = benchmark!(constLoop, refLoop, plainLoop)(100_000);
>   writeln(to!Duration(result[0]));
>   writeln(to!Duration(result[1]));
>   writeln(to!Duration(result[2]));
>
> }
>
> void constLoop() {
>   size_t cnt;
>   foreach (const ref w; words)
>     cnt += w.length;
> }
>
> void refLoop() {
>   size_t cnt;
>   foreach (ref w; words)
>     cnt += w.length;
> }
>
> void plainLoop() {
>   size_t cnt;
>   foreach (w; words)
>     cnt += w.length;
> }

dmd's optimiser isn't great. GDC creates identical assembly for all three functions.

Rule of thumb: if you want high performance, use GDC or LDC.

April 24, 2015

Re: Performance of loops

Posted by Chris
in reply to John Colvin

Chris

Posted in reply to John Colvin

On Friday, 24 April 2015 at 11:39:14 UTC, John Colvin wrote:
> On Friday, 24 April 2015 at 11:00:23 UTC, Chris wrote:
>> I tested the performance of three types of loops (see code below). It turns out that the fastest loop is the "plainLoop". Unless my examples are completely screwed up, the difference between "plainLoop" and the other two loops is gigantic (e.g.):
>>
>> 9 ms, 149 μs, and 4 hnsecs  // foreach (const ref w)
>> 9 ms, 77 μs, and 8 hnsecs   // foreach (ref w)
>> 1 ms, 183 μs, and 6 hnsecs  // foreach (w)
>>
>> with -release -inline -O -boundscheck=off
>>
>> 8 ms, 492 μs, and 3 hnsecs
>> 8 ms, 287 μs, and 1 hnsec
>> 341 μs and 2 hnsecs
>>
>> [compiler dmd v2.067.0, Linux Ubuntu, 64bit]
>>
>>
>> import std.datetime : benchmark, Duration;
>> import std.string : format;
>> import std.conv : to;
>> import std.stdio : writeln;
>>
>> enum {
>>  string[] words = ["Hello", "world", "Ola", "mundo"],
>> }
>>
>> void main() {
>>  auto result = benchmark!(constLoop, refLoop, plainLoop)(100_000);
>>  writeln(to!Duration(result[0]));
>>  writeln(to!Duration(result[1]));
>>  writeln(to!Duration(result[2]));
>>
>> }
>>
>> void constLoop() {
>>  size_t cnt;
>>  foreach (const ref w; words)
>>    cnt += w.length;
>> }
>>
>> void refLoop() {
>>  size_t cnt;
>>  foreach (ref w; words)
>>    cnt += w.length;
>> }
>>
>> void plainLoop() {
>>  size_t cnt;
>>  foreach (w; words)
>>    cnt += w.length;
>> }
>
> dmd's optimiser isn't great. GDC creates identical assembly for all three functions.
>
> Rule of thumb: if you want high performance, use GDC or LDC.

Yeah, I figure. However, the gap is a serious issue for dmd, even if it's only a reference compiler.

April 24, 2015

Re: Performance of loops

Posted by Marc Schütz
in reply to Chris

Marc Schütz

Posted in reply to Chris

Most of the time is taken up by the array's allocation. The optimizers of both DMD and LDC evidently doesn't optimize it away when you use `ref`, even though it could in theory.

Remove the `enum` and just use a normal global variable, and you will get more or less identical times for all three loops. LDC even turns then into empty functions (didn't check for DMD).

April 24, 2015

Re: Performance of loops

Posted by Chris
in reply to Marc Schütz

Chris

Posted in reply to Marc Schütz

On Friday, 24 April 2015 at 11:53:56 UTC, Marc Schütz wrote:
> Most of the time is taken up by the array's allocation. The optimizers of both DMD and LDC evidently doesn't optimize it away when you use `ref`, even though it could in theory.
>
> Remove the `enum` and just use a normal global variable, and you will get more or less identical times for all three loops. LDC even turns then into empty functions (didn't check for DMD).

I've tried it, and the loops indeed show similar results. What is it about `enum` that slows it down?

April 24, 2015

Re: Performance of loops

Posted by Baz
in reply to Chris

Baz

Posted in reply to Chris

On Friday, 24 April 2015 at 11:00:23 UTC, Chris wrote:
> I tested the performance of three types of loops (see code below). It turns out that the fastest loop is the "plainLoop". Unless my examples are completely screwed up, the difference between "plainLoop" and the other two loops is gigantic (e.g.):
>
> 9 ms, 149 μs, and 4 hnsecs  // foreach (const ref w)
> 9 ms, 77 μs, and 8 hnsecs   // foreach (ref w)
> 1 ms, 183 μs, and 6 hnsecs  // foreach (w)
>
> with -release -inline -O -boundscheck=off
>
> 8 ms, 492 μs, and 3 hnsecs
> 8 ms, 287 μs, and 1 hnsec
> 341 μs and 2 hnsecs
>
> [compiler dmd v2.067.0, Linux Ubuntu, 64bit]
>
>
> import std.datetime : benchmark, Duration;
> import std.string : format;
> import std.conv : to;
> import std.stdio : writeln;
>
> enum {
>   string[] words = ["Hello", "world", "Ola", "mundo"],
> }
>
> void main() {
>   auto result = benchmark!(constLoop, refLoop, plainLoop)(100_000);
>   writeln(to!Duration(result[0]));
>   writeln(to!Duration(result[1]));
>   writeln(to!Duration(result[2]));
>
> }
>
> void constLoop() {
>   size_t cnt;
>   foreach (const ref w; words)
>     cnt += w.length;
> }
>
> void refLoop() {
>   size_t cnt;
>   foreach (ref w; words)
>     cnt += w.length;
> }
>
> void plainLoop() {
>   size_t cnt;
>   foreach (w; words)
>     cnt += w.length;
> }

You should temper the results because these kinds of tests are known not to represent well the reality. In a real program, the loops performances are affected by the instruction cache, the memory cache, the complexity of the operations inside the loop (nested TEST/CMP). Actually this kind of test represents an ideal, unreallisctic situation, that will never happend in a "IRL" program.

April 24, 2015

Re: Performance of loops

Posted by Steven Schveighoffer
in reply to Chris

Steven Schveighoffer

Posted in reply to Chris

On 4/24/15 9:13 AM, Chris wrote:
> On Friday, 24 April 2015 at 11:53:56 UTC, Marc Schütz wrote:
>> Most of the time is taken up by the array's allocation. The optimizers
>> of both DMD and LDC evidently doesn't optimize it away when you use
>> `ref`, even though it could in theory.
>>
>> Remove the `enum` and just use a normal global variable, and you will
>> get more or less identical times for all three loops. LDC even turns
>> then into empty functions (didn't check for DMD).
>
> I've tried it, and the loops indeed show similar results. What is it
> about `enum` that slows it down?

enum of an array literal is like pasting the array literal wherever the enum lives.

So for example, your const loop:

void constLoop() {
  size_t cnt;
  foreach (const ref w; words)
    cnt += w.length;
}

Is really like saying:

void constLoop() {
  size_t cnt;
  foreach (const ref w; ["Hello", "world", "Ola", "mundo"])
    cnt += w.length;
}

And note that in D right now, an array literal is NOT stored in the data segment. It's generated every time you use it (with a call to _d_newarray, which allocates on the GC).

There was a push a while back to make array literals immutable and placed in the data segment. But it never panned out, would break too much code.

-Steve

April 24, 2015

Re: Performance of loops

Posted by Chris
in reply to Steven Schveighoffer

Chris

Posted in reply to Steven Schveighoffer

On Friday, 24 April 2015 at 13:55:49 UTC, Steven Schveighoffer wrote:
> On 4/24/15 9:13 AM, Chris wrote:
>> On Friday, 24 April 2015 at 11:53:56 UTC, Marc Schütz wrote:
>>> Most of the time is taken up by the array's allocation. The optimizers
>>> of both DMD and LDC evidently doesn't optimize it away when you use
>>> `ref`, even though it could in theory.
>>>
>>> Remove the `enum` and just use a normal global variable, and you will
>>> get more or less identical times for all three loops. LDC even turns
>>> then into empty functions (didn't check for DMD).
>>
>> I've tried it, and the loops indeed show similar results. What is it
>> about `enum` that slows it down?
>
> enum of an array literal is like pasting the array literal wherever the enum lives.
>
> So for example, your const loop:
>
> void constLoop() {
>   size_t cnt;
>   foreach (const ref w; words)
>     cnt += w.length;
> }
>
> Is really like saying:
>
> void constLoop() {
>   size_t cnt;
>   foreach (const ref w; ["Hello", "world", "Ola", "mundo"])
>     cnt += w.length;
> }
>
> And note that in D right now, an array literal is NOT stored in the data segment. It's generated every time you use it (with a call to _d_newarray, which allocates on the GC).
>
> There was a push a while back to make array literals immutable and placed in the data segment. But it never panned out, would break too much code.
>
> -Steve

Thanks for the clarification.

April 27, 2015

Re: Performance of loops

Posted by Chris
in reply to Baz

Chris

Posted in reply to Baz

On Friday, 24 April 2015 at 13:27:18 UTC, Baz wrote:
> On Friday, 24 April 2015 at 11:00:23 UTC, Chris wrote:
>> I tested the performance of three types of loops (see code below). It turns out that the fastest loop is the "plainLoop". Unless my examples are completely screwed up, the difference between "plainLoop" and the other two loops is gigantic (e.g.):
>>
>> 9 ms, 149 μs, and 4 hnsecs  // foreach (const ref w)
>> 9 ms, 77 μs, and 8 hnsecs   // foreach (ref w)
>> 1 ms, 183 μs, and 6 hnsecs  // foreach (w)
>>
>> with -release -inline -O -boundscheck=off
>>
>> 8 ms, 492 μs, and 3 hnsecs
>> 8 ms, 287 μs, and 1 hnsec
>> 341 μs and 2 hnsecs
>>
>> [compiler dmd v2.067.0, Linux Ubuntu, 64bit]
>>
>>
>> import std.datetime : benchmark, Duration;
>> import std.string : format;
>> import std.conv : to;
>> import std.stdio : writeln;
>>
>> enum {
>>  string[] words = ["Hello", "world", "Ola", "mundo"],
>> }
>>
>> void main() {
>>  auto result = benchmark!(constLoop, refLoop, plainLoop)(100_000);
>>  writeln(to!Duration(result[0]));
>>  writeln(to!Duration(result[1]));
>>  writeln(to!Duration(result[2]));
>>
>> }
>>
>> void constLoop() {
>>  size_t cnt;
>>  foreach (const ref w; words)
>>    cnt += w.length;
>> }
>>
>> void refLoop() {
>>  size_t cnt;
>>  foreach (ref w; words)
>>    cnt += w.length;
>> }
>>
>> void plainLoop() {
>>  size_t cnt;
>>  foreach (w; words)
>>    cnt += w.length;
>> }
>
> You should temper the results because these kinds of tests are known not to represent well the reality. In a real program, the loops performances are affected by the instruction cache, the memory cache, the complexity of the operations inside the loop (nested TEST/CMP). Actually this kind of test represents an ideal, unreallisctic situation, that will never happend in a "IRL" program.

Yes, I am aware of this. It's hard to come up with meaningful tests in no time. At least this one helped me to understand that `enum` can be very harmful performance wise.

Unfortunately, I cannot easily extract loops and logic of my real programs to put on the forum. Too complicated.

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation