Jump to page: 1 2 3
Thread overview
Reduce has dreadful performance?
Jun 18, 2015
Russel Winder
Jun 18, 2015
Laeeth Isharc
Jun 18, 2015
Laeeth Isharc
Jun 18, 2015
weaselcat
Jun 18, 2015
Ilya Yaroshenko
Jun 18, 2015
Andrea Fontana
Jun 18, 2015
Ilya Yaroshenko
Jun 18, 2015
Russel Winder
Jun 18, 2015
Laeeth Isharc
Jun 18, 2015
Laeeth Isharc
Jun 19, 2015
John Colvin
Jun 20, 2015
Russel Winder
Jun 18, 2015
Walter Bright
Jun 18, 2015
weaselcat
Jun 20, 2015
Russel Winder
Jun 20, 2015
weaselcat
Jun 20, 2015
Russel Winder
Jun 20, 2015
Martin Nowak
Jun 18, 2015
John Colvin
Jun 18, 2015
weaselcat
June 18, 2015
On a given machine, the code:

double sequential_loop(const int n, const double delta) {
  auto sum = 0.0;
  foreach (immutable i; 1 .. n + 1) {
    immutable x = (i - 0.5) * delta;
    sum += 1.0 / (1.0 + x * x);
  }
  return 4.0 * delta * sum;
}

runs in about 6.70s. However the code:

double sequential_reduce(const int n, const double delta) {
  return 4.0 * delta * reduce!((double t, int i){immutable x = (i -
0.5) * delta; return t + 1.0 / (1.0 + x * x);})(0.0, iota(1, n + 1));
}

runs in about 17.03s, whilst:

double sequential_reduce_alt(const int n, const double delta) {
  return 4.0 * delta * reduce!"a + b"(
         map!((int i){immutable x = (i - 0.5) * delta; return 1.0 /
(1.0 + x * x);})(iota(1, n + 1)));
}

takes about 28.02s. Unless I am missing something (very possible), this
is not going to create a good advert for D as an imperative language
with declarative (internal iteration) expression.


-- 
Russel. ============================================================================= Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder@ekiga.net 41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel@winder.org.uk London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder


June 18, 2015
On Thursday, 18 June 2015 at 10:27:58 UTC, Russel Winder wrote:
> On a given machine, the code:
>
> double sequential_loop(const int n, const double delta) {
>   auto sum = 0.0;
>   foreach (immutable i; 1 .. n + 1) {
>     immutable x = (i - 0.5) * delta;
>     sum += 1.0 / (1.0 + x * x);
>   }
>   return 4.0 * delta * sum;
> }
>
> runs in about 6.70s. However the code:
>
> double sequential_reduce(const int n, const double delta) {
>   return 4.0 * delta * reduce!((double t, int i){immutable x = (i -
> 0.5) * delta; return t + 1.0 / (1.0 + x * x);})(0.0, iota(1, n + 1));
> }
>
> runs in about 17.03s, whilst:
>
> double sequential_reduce_alt(const int n, const double delta) {
>   return 4.0 * delta * reduce!"a + b"(
>          map!((int i){immutable x = (i - 0.5) * delta; return 1.0 /
> (1.0 + x * x);})(iota(1, n + 1)));
> }
>
> takes about 28.02s. Unless I am missing something (very possible), this
> is not going to create a good advert for D as an imperative language
> with declarative (internal iteration) expression.

which compiler ?

June 18, 2015
On Thursday, 18 June 2015 at 10:27:58 UTC, Russel Winder wrote:
> On a given machine, the code:
>
> double sequential_loop(const int n, const double delta) {
>   auto sum = 0.0;
>   foreach (immutable i; 1 .. n + 1) {
>     immutable x = (i - 0.5) * delta;
>     sum += 1.0 / (1.0 + x * x);
>   }
>   return 4.0 * delta * sum;
> }
>
> runs in about 6.70s. However the code:
>
> double sequential_reduce(const int n, const double delta) {
>   return 4.0 * delta * reduce!((double t, int i){immutable x = (i -
> 0.5) * delta; return t + 1.0 / (1.0 + x * x);})(0.0, iota(1, n + 1));
> }
>
> runs in about 17.03s, whilst:
>
> double sequential_reduce_alt(const int n, const double delta) {
>   return 4.0 * delta * reduce!"a + b"(
>          map!((int i){immutable x = (i - 0.5) * delta; return 1.0 /
> (1.0 + x * x);})(iota(1, n + 1)));
> }
>
> takes about 28.02s. Unless I am missing something (very possible), this
> is not going to create a good advert for D as an imperative language
> with declarative (internal iteration) expression.

first two run in the same time on LDC, third one runs about 30-40% faster.
June 18, 2015
On Thursday, 18 June 2015 at 10:34:46 UTC, Laeeth Isharc wrote:
> On Thursday, 18 June 2015 at 10:27:58 UTC, Russel Winder wrote:
>> On a given machine, the code:
>>
>> double sequential_loop(const int n, const double delta) {
>>   auto sum = 0.0;
>>   foreach (immutable i; 1 .. n + 1) {
>>     immutable x = (i - 0.5) * delta;
>>     sum += 1.0 / (1.0 + x * x);
>>   }
>>   return 4.0 * delta * sum;
>> }
>>
>> runs in about 6.70s. However the code:
>>
>> double sequential_reduce(const int n, const double delta) {
>>   return 4.0 * delta * reduce!((double t, int i){immutable x = (i -
>> 0.5) * delta; return t + 1.0 / (1.0 + x * x);})(0.0, iota(1, n + 1));
>> }
>>
>> runs in about 17.03s, whilst:
>>
>> double sequential_reduce_alt(const int n, const double delta) {
>>   return 4.0 * delta * reduce!"a + b"(
>>          map!((int i){immutable x = (i - 0.5) * delta; return 1.0 /
>> (1.0 + x * x);})(iota(1, n + 1)));
>> }
>>
>> takes about 28.02s. Unless I am missing something (very possible), this
>> is not going to create a good advert for D as an imperative language
>> with declarative (internal iteration) expression.
>
> which compiler ?

fwiw n=1bn, delta=0.1
ldc no params:
8.749s/17.466s
ldc -O2
4.541s/4.562s
gdc no params
4.690/22.163 (!)
gdc -O2
4.552/4.551



June 18, 2015
On Thursday, 18 June 2015 at 10:27:58 UTC, Russel Winder wrote:
> On a given machine, the code:
>
> double sequential_loop(const int n, const double delta) {
>   auto sum = 0.0;
>   foreach (immutable i; 1 .. n + 1) {
>     immutable x = (i - 0.5) * delta;
>     sum += 1.0 / (1.0 + x * x);
>   }
>   return 4.0 * delta * sum;
> }
>
> runs in about 6.70s. However the code:
>
> double sequential_reduce(const int n, const double delta) {
>   return 4.0 * delta * reduce!((double t, int i){immutable x = (i -
> 0.5) * delta; return t + 1.0 / (1.0 + x * x);})(0.0, iota(1, n + 1));
> }
>
> runs in about 17.03s, whilst:
>
> double sequential_reduce_alt(const int n, const double delta) {
>   return 4.0 * delta * reduce!"a + b"(
>          map!((int i){immutable x = (i - 0.5) * delta; return 1.0 /
> (1.0 + x * x);})(iota(1, n + 1)));
> }
>
> takes about 28.02s. Unless I am missing something (very possible), this
> is not going to create a good advert for D as an imperative language
> with declarative (internal iteration) expression.

import std.stdio, std.datetime, std.algorithm, std.range, std.conv;

double sequential_loop(const int n, const double delta) {
  auto sum = 0.0;
  foreach (immutable i; 1 .. n + 1) {
    immutable x = (i - 0.5) * delta;
    sum += 1.0 / (1.0 + x * x);
  }
  return 4.0 * delta * sum;
}

//runs in about 6.70s. However the code:

double sequential_reduce(const int n, const double delta) {
  return 4.0 * delta * reduce!((double t, int i){immutable x = (i -
0.5) * delta; return t + 1.0 / (1.0 + x * x);})(0.0, iota(1, n + 1));
}

//runs in about 17.03s, whilst:

double sequential_reduce_alt(const int n, const double delta) {
  return 4.0 * delta * reduce!"a + b"(
         map!((int i){immutable x = (i - 0.5) * delta; return 1.0 /
(1.0 + x * x);})(iota(1, n + 1)));
}

void main() {
	auto res = benchmark!(
		{sequential_loop(1000, 10);},
		{sequential_reduce(1000, 10);},
		{sequential_reduce_alt(1000, 10);},
		)(1000);
	writeln(res[].map!(to!Duration));
}

$ dmd -run test.d
[9 ms, 305 μs, and 9 hnsecs, 16 ms, 625 μs, and 3 hnsecs, 27 ms, 417 μs, and 3 hnsecs]

$ dmd -O -inline -release -run test.d
[4 ms, 567 μs, and 6 hnsecs, 4 ms, 853 μs, and 8 hnsecs, 5 ms, 52 μs, and 5 hnsecs]

OS X, DMD64 D Compiler v2.067.1

Looks like you forgot optimisation troika "-O -inline -release"

Regards,
Ilya
June 18, 2015
On Thursday, 18 June 2015 at 10:46:18 UTC, Ilya Yaroshenko wrote:
> On Thursday, 18 June 2015 at 10:27:58 UTC, Russel Winder wrote:
>> On a given machine, the code:
>>
>> double sequential_loop(const int n, const double delta) {
>>   auto sum = 0.0;
>>   foreach (immutable i; 1 .. n + 1) {
>>     immutable x = (i - 0.5) * delta;
>>     sum += 1.0 / (1.0 + x * x);
>>   }
>>   return 4.0 * delta * sum;
>> }

What about this:

double sequential_alternative(const int n, const double delta) {
	return 4.0 * delta * iota(1, n+1).map!(x => (x-0.5)*delta).map!(x => 1.0/(1.0 + x*x)).sum;
}

?
June 18, 2015
On Thursday, 18 June 2015 at 10:50:09 UTC, Andrea Fontana wrote:
> On Thursday, 18 June 2015 at 10:46:18 UTC, Ilya Yaroshenko wrote:
>> On Thursday, 18 June 2015 at 10:27:58 UTC, Russel Winder wrote:
>>> On a given machine, the code:
>>>
>>> double sequential_loop(const int n, const double delta) {
>>>   auto sum = 0.0;
>>>   foreach (immutable i; 1 .. n + 1) {
>>>     immutable x = (i - 0.5) * delta;
>>>     sum += 1.0 / (1.0 + x * x);
>>>   }
>>>   return 4.0 * delta * sum;
>>> }
>
> What about this:
>
> double sequential_alternative(const int n, const double delta) {
> 	return 4.0 * delta * iota(1, n+1).map!(x => (x-0.5)*delta).map!(x => 1.0/(1.0 + x*x)).sum;
> }
>
> ?

This is not equivalent, because sum uses another (more precise) algorithms to do summation. It should work a little bit slower.
June 18, 2015
On Thursday, 18 June 2015 at 10:27:58 UTC, Russel Winder wrote:
> On a given machine, the code:
>
> double sequential_loop(const int n, const double delta) {
>   auto sum = 0.0;
>   foreach (immutable i; 1 .. n + 1) {
>     immutable x = (i - 0.5) * delta;
>     sum += 1.0 / (1.0 + x * x);
>   }
>   return 4.0 * delta * sum;
> }
>
> runs in about 6.70s. However the code:
>
> double sequential_reduce(const int n, const double delta) {
>   return 4.0 * delta * reduce!((double t, int i){immutable x = (i -
> 0.5) * delta; return t + 1.0 / (1.0 + x * x);})(0.0, iota(1, n + 1));
> }
>
> runs in about 17.03s, whilst:
>
> double sequential_reduce_alt(const int n, const double delta) {
>   return 4.0 * delta * reduce!"a + b"(
>          map!((int i){immutable x = (i - 0.5) * delta; return 1.0 /
> (1.0 + x * x);})(iota(1, n + 1)));
> }
>
> takes about 28.02s. Unless I am missing something (very possible), this
> is not going to create a good advert for D as an imperative language
> with declarative (internal iteration) expression.

What you've forgotten is to turn on your optimisation flags. Even DMD gets the first 2 vaguely right with -O -inline -release

gdc is able to vectorize all 3 examples if you give it -Ofast

The 3rd one does appear to be a bit slower in all cases, not sure why right now.
June 18, 2015
On Thursday, 18 June 2015 at 13:20:04 UTC, John Colvin wrote:
> On Thursday, 18 June 2015 at 10:27:58 UTC, Russel Winder wrote:
>> [...]
>
> What you've forgotten is to turn on your optimisation flags. Even DMD gets the first 2 vaguely right with -O -inline -release
>
> gdc is able to vectorize all 3 examples if you give it -Ofast
>
> The 3rd one does appear to be a bit slower in all cases, not sure why right now.

third one was faster for me on ldc, fwiw
June 18, 2015
On Thu, 2015-06-18 at 10:46 +0000, Ilya Yaroshenko via Digitalmars-d wrote:
> 
[…]
> Looks like you forgot optimisation troika "-O -inline -release"
> 

Don't I feel stupid :-)

Thanks to all for quick answers, most helpful.

Now I get

Loop: 3.14s
Reduce 1: 4.76s
Reduce 2: 5.12s

This is DMD 2.067

Anyone know how to get PyD to use LDC rather than DMD in the setuptools build spec?

(I can't use GDC just now as it is not packaged for Fedora.)

-- 
Russel. ============================================================================= Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder@ekiga.net 41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel@winder.org.uk London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder


« First   ‹ Prev
1 2 3