April 01, 2021

On Thursday, 1 April 2021 at 19:00:08 UTC, Berni44 wrote:
> On Thursday, 1 April 2021 at 16:52:17 UTC, Nestor wrote:
>> I was hoping to beat my dear Python and get similar results to Go, but that is not the case, whether using rdmd or running the executable generated by dmd. I am getting values between 350-380 ms, and 81 ms in Python.
>
> Try using ldc2 instead of dmd:
>
> ```
> ldc2 -O3 -release -boundscheck=off -flto=full -defaultlib=phobos2-ldc-lto,druntime-ldc-lto speed.d
> ```
>
> should produce much better results.

It did! I tried those flags with dmd and ldc and got the following times (in ms) for the approaches I had earlier (two runs of each):

DMD: 11, 7, 6, 4
DMD: 15, 7, 10, 6
LDC: 6, 7, 9, 6
LDC: 12, 6, 8, 5

April 01, 2021
On Thursday, 1 April 2021 at 19:00:08 UTC, Berni44 wrote:
>
> Try using ldc2 instead of dmd:
>
> ```
> ldc2 -O3 -release -boundscheck=off -flto=full -defaultlib=phobos2-ldc-lto,druntime-ldc-lto speed.d
> ```
>
> should produce much better results.

Since this is a "Learn" part of the Foruam, be careful with "-boundscheck=off".

I mean for this little snippet is OK, but for a other projects this my be wrong, and as it says here: https://dlang.org/dmd-windows.html#switch-boundscheck

"This option should be used with caution and as a last resort to improve performance. Confirm turning off @safe bounds checks is worthwhile by benchmarking."

Matheus.
April 01, 2021
On 01.04.21 21:00, Berni44 wrote:
> ```
> ldc2 -O3 -release -boundscheck=off -flto=full -defaultlib=phobos2-ldc-lto,druntime-ldc-lto speed.d
> ```

Please don't recommend `-boundscheck=off` to newbies. It's not just an optimization. It breaks @safe. If you want to do welding without eye protection, that's on you. But please don't recommend it to the new guy.
April 01, 2021
On 4/1/21 3:27 PM, ag0aep6g wrote:
> On 01.04.21 21:00, Berni44 wrote:
>> ```
>> ldc2 -O3 -release -boundscheck=off -flto=full -defaultlib=phobos2-ldc-lto,druntime-ldc-lto speed.d
>> ```
> 
> Please don't recommend `-boundscheck=off` to newbies. It's not just an optimization. It breaks @safe. If you want to do welding without eye protection, that's on you. But please don't recommend it to the new guy.

Yes, but you can recommend `-boundscheck=safeonly`, which leaves it on for @safe code.

Though I personally leave it on for everything.
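	
A quick sketch of what that means in practice (illustrative only, the two helpers are made up for the example): built with `-boundscheck=safeonly`, the @safe function below keeps its bounds check while the @system one does not.

```
import std.stdio;

int safeRead(int[] a, size_t i) @safe
{
    // Still bounds-checked under -boundscheck=safeonly.
    return a[i];
}

int systemRead(int[] a, size_t i) @system
{
    // Not checked under -boundscheck=safeonly.
    return a[i];
}

void main()
{
    int[] a = [1, 2, 3];
    writeln(safeRead(a, 1), " ", systemRead(a, 2));
}
```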

-Steve
April 01, 2021
On Thu, Apr 01, 2021 at 04:52:17PM +0000, Nestor via Digitalmars-d-learn wrote: [...]
> ```
> import std.stdio;
> import std.random;
> import std.datetime.stopwatch : benchmark, StopWatch, AutoStart;
> import std.algorithm;
> 
> void main()
> {
>     auto sw = StopWatch(AutoStart.no);
>     sw.start();
>     int[] mylist;

Since the length of the array is already known beforehand, you could get significant speedups by preallocating the array:

	int[] mylist = new int[100000];
	for (int number ...)
	{
		...
		mylist[number] = n;
	}


>     for (int number = 0; number < 100000; ++number)
>     {
>         auto rnd = Random(unpredictableSeed);
[...]

Don't reseed the RNG every loop iteration. (1) It's very inefficient and slow, and (2) it actually makes it *less* random than if you seeded it only once at the start of the program.  Move this outside the loop, and you should see some gains.


>         auto n = uniform(0, 100, rnd);
>         mylist ~= n;
>     }
>     mylist.sort();
>     sw.stop();
>     long msecs = sw.peek.total!"msecs";
>     writefln("%s", msecs);
> }
[...]
> ```

Also, whenever performance matters, use gdc or ldc2 instead of dmd. Try `ldc2 -O2`, for example.


I did a quick test with LDC, with a side-by-side comparison of your original version and my improved version:

-------------
import std.stdio;
import std.random;
import std.datetime.stopwatch : benchmark, StopWatch, AutoStart;
import std.algorithm;

void original()
{
    auto sw = StopWatch(AutoStart.no);
    sw.start();
    int[] mylist;
    for (int number = 0; number < 100000; ++number)
    {
        auto rnd = Random(unpredictableSeed);
        auto n = uniform(0, 100, rnd);
        mylist ~= n;
    }
    mylist.sort();
    sw.stop();
    long msecs = sw.peek.total!"msecs";
    writefln("%s", msecs);
}

void improved()
{
    auto sw = StopWatch(AutoStart.no);
    sw.start();
    int[] mylist = new int[100000];
    auto rnd = Random(unpredictableSeed);
    for (int number = 0; number < 100000; ++number)
    {
        auto n = uniform(0, 100, rnd);
        mylist[number] = n;
    }
    mylist.sort();
    sw.stop();
    long msecs = sw.peek.total!"msecs";
    writefln("%s", msecs);
}

void main()
{
    original();
    improved();
}
-------------


Here's the typical output:
-------------
209
5
-------------

As you can see, that's a 40x improvement in speed. ;-)

Assuming that the ~209 msec on my PC corresponds with your observed 280ms, and assuming that the 40x improvement also applies on your machine, the improved version should run in about 9-10 msec.  So this *should* give you a roughly 4x speedup over the Python version, in theory. I'd love to see how it actually measures on your machine, if you don't mind. ;-)


T

-- 
Holding a grudge is like drinking poison and hoping the other person dies. -- seen on the 'Net
April 01, 2021
On 01.04.21 21:36, Steven Schveighoffer wrote:
> On 4/1/21 3:27 PM, ag0aep6g wrote:
>> On 01.04.21 21:00, Berni44 wrote:
>>> ```
>>> ldc2 -O3 -release -boundscheck=off -flto=full -defaultlib=phobos2-ldc-lto,druntime-ldc-lto speed.d
>>> ```
[...]
> Yes, but you can recommend `-boundscheck=safeonly`, which leaves it on for @safe code.
`-O -release` already does that, doesn't it?
April 01, 2021
On 4/1/21 3:44 PM, ag0aep6g wrote:
> On 01.04.21 21:36, Steven Schveighoffer wrote:
>> On 4/1/21 3:27 PM, ag0aep6g wrote:
>>> On 01.04.21 21:00, Berni44 wrote:
>>>> ```
>>>> ldc2 -O3 -release -boundscheck=off -flto=full -defaultlib=phobos2-ldc-lto,druntime-ldc-lto speed.d
>>>> ```
> [...]
>> Yes, but you can recommend `-boundscheck=safeonly`, which leaves it on for @safe code.
> `-O -release` already does that, doesn't it?

Maybe, but I wasn't responding to that, just your statement not to recommend -boundscheck=off. In any case, it wouldn't hurt, right?

I don't know what -O3 and -release do on ldc.

-Steve
April 01, 2021
On Thu, Apr 01, 2021 at 07:25:53PM +0000, matheus via Digitalmars-d-learn wrote: [...]
> Since this is a "Learn" part of the Foruam, be careful with "-boundscheck=off".
> 
> I mean for this little snippet is OK, but for a other projects this my be wrong, and as it says here: https://dlang.org/dmd-windows.html#switch-boundscheck
> 
> "This option should be used with caution and as a last resort to improve performance. Confirm turning off @safe bounds checks is worthwhile by benchmarking."
[...]

It's interesting that whenever a question about D's performance pops up in the forums, people tend to reach for optimization flags.  I wouldn't say it doesn't help; but I've found that significant performance improvements can usually be obtained by examining the code first, and catching common newbie mistakes.  Those usually account for the majority of the observed performance degradation.

Only after the code has been cleaned up and obvious mistakes fixed, is it worth reaching for optimization flags, IMO.

Common mistakes I've noticed include:

- Constructing large arrays by appending 1 element at a time with `~`.
  Obviously, this requires many array reallocations and the associated
  copying; not to mention greatly-increased GC load that could have been
  easily avoided by preallocation or using std.array.appender (a rough
  sketch follows this list).

- Failing to move repeated computations (esp. inefficient ones) outside
  the inner loop.  Sometimes a good optimizing compiler is able to hoist
  it out automatically, but not always.

- Constructing lots of temporaries in inner loops as heap-allocated
  classes instead of by-value structs: the former leads to heavy GC
  load, not to mention memory allocation is generally slow and should be
  avoided inside inner loops. Heap-allocated objects also require
  indirections, which slow things down even more. The latter can be
  passed around in registers: no GC pressure, no indirections; so can
  significantly improve performance.

- Using O(N^2) (or other super-linear) algorithms with large data sets
  where a more efficient algorithm is available. This one ought to speak
  for itself. :-D  Nevertheless it still crops up from time to time, so
  deserves to be mentioned again.
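	
To make the first point concrete, here is a rough sketch (illustrative only, untimed; the helper names are made up) of three ways to build the same array. The appender and preallocated versions avoid most of the reallocation and GC churn:

-------------
import std.array : appender;

// Naive: each ~= may have to reallocate and copy the whole array.
int[] naive(int n)
{
    int[] a;
    foreach (i; 0 .. n)
        a ~= i;
    return a;
}

// Appender: amortizes growth and keeps the GC bookkeeping down.
int[] withAppender(int n)
{
    auto app = appender!(int[])();
    app.reserve(n);          // optional: request the capacity up front
    foreach (i; 0 .. n)
        app.put(i);
    return app.data;
}

// Preallocation: one allocation, then plain element stores.
int[] preallocated(int n)
{
    auto a = new int[n];
    foreach (i; 0 .. n)
        a[i] = i;
    return a;
}

void main()
{
    assert(naive(10) == withAppender(10));
    assert(withAppender(10) == preallocated(10));
}
-------------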


T

-- 
Those who don't understand Unix are condemned to reinvent it, poorly.
April 01, 2021
On 01.04.21 21:53, Steven Schveighoffer wrote:
> Maybe, but I wasn't responding to that, just your statement not to recommend -boundscheck=off. In any case, it wouldn't hurt, right?

Right.
April 01, 2021
On 4/1/21 12:55 PM, H. S. Teoh wrote:

> - Constructing large arrays by appending 1 element at a time with `~`.
>    Obviously, this requires many array reallocations and the associated
>    copying

And that may not be a contributing factor. :) The following program sees just 15 allocations and 1722 element copies for 1 million appending operations:

import std.stdio;

void main() {
  int[] arr;
  auto place = arr.ptr;
  size_t relocated = 0;
  size_t copied = 0;
  foreach (i; 0 .. 1_000_000) {
    arr ~= i;
    if (arr.ptr != place) {
      ++relocated;
      copied += arr.length - 1;
      place = arr.ptr;
    }
  }

  writeln("relocated: ", relocated);
  writeln("copied   : ", copied);
}

This is because the GC does not allocate if there are unused pages right after the array. (However, increasing the element count to 10 million increases allocations slightly to 18 but element copies jump to 8 million.)
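	
And when relocation needs to be avoided entirely, `reserve` can request the capacity up front; a quick sketch of the same measurement, which I would expect to report zero relocations:

import std.stdio;

void main() {
  int[] arr;
  arr.reserve(1_000_000);               // ask for the capacity up front
  writeln("capacity : ", arr.capacity); // at least 1_000_000

  auto place = arr.ptr;
  size_t relocated = 0;
  foreach (i; 0 .. 1_000_000) {
    arr ~= i;
    if (arr.ptr != place) {
      ++relocated;
      place = arr.ptr;
    }
  }

  writeln("relocated: ", relocated);    // expected: 0
}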

Ali