Reduce has dreadful performance? - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » General » Reduce has dreadful performance?

Thread overview

Reduce has dreadful performance?
Jun 18, 2015 Russel Winder
Jun 18, 2015 Laeeth Isharc
Jun 18, 2015 Laeeth Isharc
Jun 18, 2015 weaselcat
Jun 18, 2015 Ilya Yaroshenko
Jun 18, 2015 Andrea Fontana
Jun 18, 2015 Ilya Yaroshenko
Jun 18, 2015 Russel Winder
Jun 18, 2015 Laeeth Isharc
Jun 18, 2015 Laeeth Isharc
Jun 19, 2015 John Colvin
Jun 20, 2015 Russel Winder
Jun 18, 2015 Walter Bright
Jun 18, 2015 weaselcat
Jun 18, 2015 Andrei Alexandrescu
Jun 20, 2015 Russel Winder
Jun 20, 2015 weaselcat
Jun 20, 2015 Russel Winder
Jun 20, 2015 Martin Nowak
Jun 18, 2015 John Colvin
Jun 18, 2015 weaselcat

June 18, 2015

Reduce has dreadful performance?

Posted by Russel Winder

Russel Winder

Attachments:

signature.asc (This is a digitally signed message part)

On a given machine, the code:

double sequential_loop(const int n, const double delta) {
  auto sum = 0.0;
  foreach (immutable i; 1 .. n + 1) {
    immutable x = (i - 0.5) * delta;
    sum += 1.0 / (1.0 + x * x);
  }
  return 4.0 * delta * sum;
}

runs in about 6.70s. However the code:

double sequential_reduce(const int n, const double delta) {
  return 4.0 * delta * reduce!((double t, int i){immutable x = (i -
0.5) * delta; return t + 1.0 / (1.0 + x * x);})(0.0, iota(1, n + 1));
}

runs in about 17.03s, whilst:

double sequential_reduce_alt(const int n, const double delta) {
  return 4.0 * delta * reduce!"a + b"(
         map!((int i){immutable x = (i - 0.5) * delta; return 1.0 /
(1.0 + x * x);})(iota(1, n + 1)));
}

takes about 28.02s. Unless I am missing something (very possible), this
is not going to create a good advert for D as an imperative language
with declarative (internal iteration) expression.


-- 
Russel. ============================================================================= Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder@ekiga.net 41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel@winder.org.uk London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder

June 18, 2015

Re: Reduce has dreadful performance?

Posted by Laeeth Isharc
in reply to Russel Winder

Laeeth Isharc

Posted in reply to Russel Winder

On Thursday, 18 June 2015 at 10:27:58 UTC, Russel Winder wrote:
> On a given machine, the code:
>
> double sequential_loop(const int n, const double delta) {
>   auto sum = 0.0;
>   foreach (immutable i; 1 .. n + 1) {
>     immutable x = (i - 0.5) * delta;
>     sum += 1.0 / (1.0 + x * x);
>   }
>   return 4.0 * delta * sum;
> }
>
> runs in about 6.70s. However the code:
>
> double sequential_reduce(const int n, const double delta) {
>   return 4.0 * delta * reduce!((double t, int i){immutable x = (i -
> 0.5) * delta; return t + 1.0 / (1.0 + x * x);})(0.0, iota(1, n + 1));
> }
>
> runs in about 17.03s, whilst:
>
> double sequential_reduce_alt(const int n, const double delta) {
>   return 4.0 * delta * reduce!"a + b"(
>          map!((int i){immutable x = (i - 0.5) * delta; return 1.0 /
> (1.0 + x * x);})(iota(1, n + 1)));
> }
>
> takes about 28.02s. Unless I am missing something (very possible), this
> is not going to create a good advert for D as an imperative language
> with declarative (internal iteration) expression.

which compiler ?

June 18, 2015

Re: Reduce has dreadful performance?

Posted by weaselcat
in reply to Russel Winder

weaselcat

Posted in reply to Russel Winder

On Thursday, 18 June 2015 at 10:27:58 UTC, Russel Winder wrote:
> On a given machine, the code:
>
> double sequential_loop(const int n, const double delta) {
>   auto sum = 0.0;
>   foreach (immutable i; 1 .. n + 1) {
>     immutable x = (i - 0.5) * delta;
>     sum += 1.0 / (1.0 + x * x);
>   }
>   return 4.0 * delta * sum;
> }
>
> runs in about 6.70s. However the code:
>
> double sequential_reduce(const int n, const double delta) {
>   return 4.0 * delta * reduce!((double t, int i){immutable x = (i -
> 0.5) * delta; return t + 1.0 / (1.0 + x * x);})(0.0, iota(1, n + 1));
> }
>
> runs in about 17.03s, whilst:
>
> double sequential_reduce_alt(const int n, const double delta) {
>   return 4.0 * delta * reduce!"a + b"(
>          map!((int i){immutable x = (i - 0.5) * delta; return 1.0 /
> (1.0 + x * x);})(iota(1, n + 1)));
> }
>
> takes about 28.02s. Unless I am missing something (very possible), this
> is not going to create a good advert for D as an imperative language
> with declarative (internal iteration) expression.

first two run in the same time on LDC, third one runs about 30-40% faster.

June 18, 2015

Re: Reduce has dreadful performance?

Posted by Laeeth Isharc
in reply to Laeeth Isharc

Laeeth Isharc

Posted in reply to Laeeth Isharc

On Thursday, 18 June 2015 at 10:34:46 UTC, Laeeth Isharc wrote:
> On Thursday, 18 June 2015 at 10:27:58 UTC, Russel Winder wrote:
>> On a given machine, the code:
>>
>> double sequential_loop(const int n, const double delta) {
>>   auto sum = 0.0;
>>   foreach (immutable i; 1 .. n + 1) {
>>     immutable x = (i - 0.5) * delta;
>>     sum += 1.0 / (1.0 + x * x);
>>   }
>>   return 4.0 * delta * sum;
>> }
>>
>> runs in about 6.70s. However the code:
>>
>> double sequential_reduce(const int n, const double delta) {
>>   return 4.0 * delta * reduce!((double t, int i){immutable x = (i -
>> 0.5) * delta; return t + 1.0 / (1.0 + x * x);})(0.0, iota(1, n + 1));
>> }
>>
>> runs in about 17.03s, whilst:
>>
>> double sequential_reduce_alt(const int n, const double delta) {
>>   return 4.0 * delta * reduce!"a + b"(
>>          map!((int i){immutable x = (i - 0.5) * delta; return 1.0 /
>> (1.0 + x * x);})(iota(1, n + 1)));
>> }
>>
>> takes about 28.02s. Unless I am missing something (very possible), this
>> is not going to create a good advert for D as an imperative language
>> with declarative (internal iteration) expression.
>
> which compiler ?

fwiw n=1bn, delta=0.1
ldc no params:
8.749s/17.466s
ldc -O2
4.541s/4.562s
gdc no params
4.690/22.163 (!)
gdc -O2
4.552/4.551

June 18, 2015

Re: Reduce has dreadful performance?

Posted by Ilya Yaroshenko
in reply to Russel Winder

Ilya Yaroshenko

Posted in reply to Russel Winder

On Thursday, 18 June 2015 at 10:27:58 UTC, Russel Winder wrote:
> On a given machine, the code:
>
> double sequential_loop(const int n, const double delta) {
>   auto sum = 0.0;
>   foreach (immutable i; 1 .. n + 1) {
>     immutable x = (i - 0.5) * delta;
>     sum += 1.0 / (1.0 + x * x);
>   }
>   return 4.0 * delta * sum;
> }
>
> runs in about 6.70s. However the code:
>
> double sequential_reduce(const int n, const double delta) {
>   return 4.0 * delta * reduce!((double t, int i){immutable x = (i -
> 0.5) * delta; return t + 1.0 / (1.0 + x * x);})(0.0, iota(1, n + 1));
> }
>
> runs in about 17.03s, whilst:
>
> double sequential_reduce_alt(const int n, const double delta) {
>   return 4.0 * delta * reduce!"a + b"(
>          map!((int i){immutable x = (i - 0.5) * delta; return 1.0 /
> (1.0 + x * x);})(iota(1, n + 1)));
> }
>
> takes about 28.02s. Unless I am missing something (very possible), this
> is not going to create a good advert for D as an imperative language
> with declarative (internal iteration) expression.

import std.stdio, std.datetime, std.algorithm, std.range, std.conv;

double sequential_loop(const int n, const double delta) {
  auto sum = 0.0;
  foreach (immutable i; 1 .. n + 1) {
    immutable x = (i - 0.5) * delta;
    sum += 1.0 / (1.0 + x * x);
  }
  return 4.0 * delta * sum;
}

//runs in about 6.70s. However the code:

double sequential_reduce(const int n, const double delta) {
  return 4.0 * delta * reduce!((double t, int i){immutable x = (i -
0.5) * delta; return t + 1.0 / (1.0 + x * x);})(0.0, iota(1, n + 1));
}

//runs in about 17.03s, whilst:

double sequential_reduce_alt(const int n, const double delta) {
  return 4.0 * delta * reduce!"a + b"(
         map!((int i){immutable x = (i - 0.5) * delta; return 1.0 /
(1.0 + x * x);})(iota(1, n + 1)));
}

void main() {
	auto res = benchmark!(
		{sequential_loop(1000, 10);},
		{sequential_reduce(1000, 10);},
		{sequential_reduce_alt(1000, 10);},
		)(1000);
	writeln(res[].map!(to!Duration));
}

$ dmd -run test.d
[9 ms, 305 μs, and 9 hnsecs, 16 ms, 625 μs, and 3 hnsecs, 27 ms, 417 μs, and 3 hnsecs]

$ dmd -O -inline -release -run test.d
[4 ms, 567 μs, and 6 hnsecs, 4 ms, 853 μs, and 8 hnsecs, 5 ms, 52 μs, and 5 hnsecs]

OS X, DMD64 D Compiler v2.067.1

Looks like you forgot optimisation troika "-O -inline -release"

Regards,
Ilya

June 18, 2015

Re: Reduce has dreadful performance?

Posted by Andrea Fontana
in reply to Ilya Yaroshenko

Andrea Fontana

Posted in reply to Ilya Yaroshenko

On Thursday, 18 June 2015 at 10:46:18 UTC, Ilya Yaroshenko wrote:
> On Thursday, 18 June 2015 at 10:27:58 UTC, Russel Winder wrote:
>> On a given machine, the code:
>>
>> double sequential_loop(const int n, const double delta) {
>>   auto sum = 0.0;
>>   foreach (immutable i; 1 .. n + 1) {
>>     immutable x = (i - 0.5) * delta;
>>     sum += 1.0 / (1.0 + x * x);
>>   }
>>   return 4.0 * delta * sum;
>> }

What about this:

double sequential_alternative(const int n, const double delta) {
	return 4.0 * delta * iota(1, n+1).map!(x => (x-0.5)*delta).map!(x => 1.0/(1.0 + x*x)).sum;
}

?

June 18, 2015

Re: Reduce has dreadful performance?

Posted by Ilya Yaroshenko
in reply to Andrea Fontana

Ilya Yaroshenko

Posted in reply to Andrea Fontana

On Thursday, 18 June 2015 at 10:50:09 UTC, Andrea Fontana wrote:
> On Thursday, 18 June 2015 at 10:46:18 UTC, Ilya Yaroshenko wrote:
>> On Thursday, 18 June 2015 at 10:27:58 UTC, Russel Winder wrote:
>>> On a given machine, the code:
>>>
>>> double sequential_loop(const int n, const double delta) {
>>>   auto sum = 0.0;
>>>   foreach (immutable i; 1 .. n + 1) {
>>>     immutable x = (i - 0.5) * delta;
>>>     sum += 1.0 / (1.0 + x * x);
>>>   }
>>>   return 4.0 * delta * sum;
>>> }
>
> What about this:
>
> double sequential_alternative(const int n, const double delta) {
> 	return 4.0 * delta * iota(1, n+1).map!(x => (x-0.5)*delta).map!(x => 1.0/(1.0 + x*x)).sum;
> }
>
> ?

This is not equivalent, because sum uses another (more precise) algorithms to do summation. It should work a little bit slower.

June 18, 2015

Re: Reduce has dreadful performance?

Posted by John Colvin
in reply to Russel Winder

John Colvin

Posted in reply to Russel Winder

On Thursday, 18 June 2015 at 10:27:58 UTC, Russel Winder wrote:
> On a given machine, the code:
>
> double sequential_loop(const int n, const double delta) {
>   auto sum = 0.0;
>   foreach (immutable i; 1 .. n + 1) {
>     immutable x = (i - 0.5) * delta;
>     sum += 1.0 / (1.0 + x * x);
>   }
>   return 4.0 * delta * sum;
> }
>
> runs in about 6.70s. However the code:
>
> double sequential_reduce(const int n, const double delta) {
>   return 4.0 * delta * reduce!((double t, int i){immutable x = (i -
> 0.5) * delta; return t + 1.0 / (1.0 + x * x);})(0.0, iota(1, n + 1));
> }
>
> runs in about 17.03s, whilst:
>
> double sequential_reduce_alt(const int n, const double delta) {
>   return 4.0 * delta * reduce!"a + b"(
>          map!((int i){immutable x = (i - 0.5) * delta; return 1.0 /
> (1.0 + x * x);})(iota(1, n + 1)));
> }
>
> takes about 28.02s. Unless I am missing something (very possible), this
> is not going to create a good advert for D as an imperative language
> with declarative (internal iteration) expression.

What you've forgotten is to turn on your optimisation flags. Even DMD gets the first 2 vaguely right with -O -inline -release

gdc is able to vectorize all 3 examples if you give it -Ofast

The 3rd one does appear to be a bit slower in all cases, not sure why right now.

June 18, 2015

Re: Reduce has dreadful performance?

Posted by weaselcat
in reply to John Colvin

weaselcat

Posted in reply to John Colvin

On Thursday, 18 June 2015 at 13:20:04 UTC, John Colvin wrote:
> On Thursday, 18 June 2015 at 10:27:58 UTC, Russel Winder wrote:
>> [...]
>
> What you've forgotten is to turn on your optimisation flags. Even DMD gets the first 2 vaguely right with -O -inline -release
>
> gdc is able to vectorize all 3 examples if you give it -Ofast
>
> The 3rd one does appear to be a bit slower in all cases, not sure why right now.

third one was faster for me on ldc, fwiw

June 18, 2015

Re: Reduce has dreadful performance?

Posted by Russel Winder
in reply to Ilya Yaroshenko

Russel Winder

Posted in reply to Ilya Yaroshenko

Attachments:

signature.asc (This is a digitally signed message part)

On Thu, 2015-06-18 at 10:46 +0000, Ilya Yaroshenko via Digitalmars-d wrote:
> 
[…]
> Looks like you forgot optimisation troika "-O -inline -release"
> 

Don't I feel stupid :-)

Thanks to all for quick answers, most helpful.

Now I get

Loop: 3.14s
Reduce 1: 4.76s
Reduce 2: 5.12s

This is DMD 2.067

Anyone know how to get PyD to use LDC rather than DMD in the setuptools build spec?

(I can't use GDC just now as it is not packaged for Fedora.)

-- 
Russel. ============================================================================= Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder@ekiga.net 41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel@winder.org.uk London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation