WTF! Parallel foreach more slower that normal foreach in multicore CPU ? - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » Learn » WTF! Parallel foreach more slower that normal foreach in multicore CPU ?

Thread overview

WTF! Parallel foreach more slower that normal foreach in multicore CPU ?

Jun 23, 2011

Jun 23, 2011

Robert Clipsham

Jun 23, 2011

Jun 23, 2011

Jun 23, 2011

Jun 23, 2011

Andrej Mitrovic

Jun 24, 2011

Jun 24, 2011

Jun 24, 2011

Re: WTF! Parallel foreach more slower that normal foreach in multicore
Jun 24, 2011 Kagamin
Jun 24, 2011 Khint Enco

June 23, 2011

WTF! Parallel foreach more slower that normal foreach in multicore CPU ?

Posted by Zardoz

Zardoz

I'm trying std.parallelism, and I made this code (based over foreach parallel example) :
import std.stdio;
import std.parallelism;
import std.math;
import std.c.time;

void main () {
  auto logs = new double[20_000_000];
		const num = 10;

		clock_t clk;
		double norm;
		double par;

		writeln("CPUs : ",totalCPUs );

		clk = clock();
		foreach (t; 0..num) {

	    foreach(i, ref elem; logs) {
	        elem = log(i + 1.0);
	    }
		}
		norm = clock() -clk;

		clk = clock();
		foreach (t; 0..num) {

	    foreach(i, ref elem; taskPool.parallel(logs, 100)) {
	        elem = log(i + 1.0);
	    }

    }
		par = clock() -clk;

		norm = norm / num;
		par = par / num;

    writeln("Normal : ", norm / CLOCKS_PER_SEC, " Parallel : ", par / CLOCKS_PER_SEC);
}

I get this result :

CPUs : 2
Normal : 1.325 Parallel : 1.646

And the result changes, every time that I run it, around +-100ms (I think that depends of how are CPUs busy in these moment)

I played changin workUnitSize from 1 to 10000000 without any apreciable change....
My computer it's a AMD Athlon 64 X2 Dual Core Processor 6000+ running over a kUbuntu 11.04 64bits with 2 GiB of ram. I compiled it with dmd 2.053
htop shows that when test program are running parallel foreach, both cores are at ~98% of load and with normal foreach, only one core gets at ~99% of load.

June 23, 2011

Re: WTF! Parallel foreach more slower that normal foreach in multicore CPU ?

Posted by Robert Clipsham
in reply to Zardoz

Robert Clipsham

Posted in reply to Zardoz

On 23/06/2011 11:05, Zardoz wrote:
> I'm trying std.parallelism, and I made this code (based over foreach parallel example) :
> import std.stdio;
> import std.parallelism;
> import std.math;
> import std.c.time;
>
> void main () {
>    auto logs = new double[20_000_000];
> 		const num = 10;
>
> 		clock_t clk;
> 		double norm;
> 		double par;
>
> 		writeln("CPUs : ",totalCPUs );
>
> 		clk = clock();
> 		foreach (t; 0..num) {
>
> 	    foreach(i, ref elem; logs) {
> 	        elem = log(i + 1.0);
> 	    }
> 		}
> 		norm = clock() -clk;
>
> 		clk = clock();
> 		foreach (t; 0..num) {
>
> 	    foreach(i, ref elem; taskPool.parallel(logs, 100)) {
> 	        elem = log(i + 1.0);
> 	    }
>
>      }
> 		par = clock() -clk;
>
> 		norm = norm / num;
> 		par = par / num;
>
>      writeln("Normal : ", norm / CLOCKS_PER_SEC, " Parallel : ", par / CLOCKS_PER_SEC);
> }
>
> I get this result :
>
> CPUs : 2
> Normal : 1.325 Parallel : 1.646
>
> And the result changes, every time that I run it, around +-100ms (I think that depends of how are CPUs busy in these moment)
>
> I played changin workUnitSize from 1 to 10000000 without any apreciable change....
> My computer it's a AMD Athlon 64 X2 Dual Core Processor 6000+ running over a kUbuntu 11.04 64bits with 2 GiB of ram. I compiled it with dmd 2.053
> htop shows that when test program are running parallel foreach, both cores are at ~98% of load and with normal foreach, only one core gets at ~99% of load.

The reason for this is your workload is very small - it's likely that the overhead from context switching and spawning threads is greater than the gain in performance from running in parallel. Using parallel() in foreach will only be faster if you're doing something more expensive.

Also note that parallel() is an alias for taskPool.parallel(), saving you a few characters :)

-- 
Robert
http://octarineparrot.com/

June 23, 2011

Re: WTF! Parallel foreach more slower that normal foreach in multicore CPU ?

Posted by Zardoz
in reply to Robert Clipsham

Zardoz

Posted in reply to Robert Clipsham

Ok, so why in std.parallelism examples are this :

// Same thing, but use the default work unit size.
    //
    // Timings on an Athlon 64 X2 dual core machine:
    //
    // Parallel foreach:  388 milliseconds
    // Regular foreach:   619 milliseconds
    foreach(i, ref elem; taskPool.parallel(logs)) {
        elem = log(i + 1.0);
    }

Plus, a change my code to make that for same elem, calc log 100000 times in each loop, and now I get  parallel foreach get a bit shorter time that normal foreach....
Change code :

  auto logs = new double[200];
		const num = 2;

		clock_t clk;
		double norm;
		double par;

		writeln("CPUs : ",totalCPUs );
foreach (t; 0..num) {

	    foreach(i, ref elem; logs) {
	    	foreach(p; 0..100_000)
	        elem = log(i + 1.0);
	    }
		}
		norm = clock() -clk;

		clk = clock();
		foreach (t; 0..num) {

	    foreach(i, ref elem; taskPool.parallel(logs, 100)) {
	    	foreach(p; 0..100_000)
	        elem = log(i + 1.0);
	    }

    }

New times : Normal : 12.725 Parallel : 12.499

June 23, 2011

Re: WTF! Parallel foreach more slower that normal foreach in multicore CPU ?

Posted by Piotr Szturmaj
in reply to Zardoz

Piotr Szturmaj

Posted in reply to Zardoz

Zardoz wrote:
> 		const num = 10;
>
> 		foreach (t; 0..num) {
>
> 	    foreach(i, ref elem; taskPool.parallel(logs, 100)) {
> 	        elem = log(i + 1.0);
> 	    }
>
>      }

I think you just spawned 10 tasks. Look at foreach (t; 0..num).

June 23, 2011

Re: WTF! Parallel foreach more slower that normal foreach in multicore CPU ?

Posted by Zardoz
in reply to Piotr Szturmaj

Zardoz

Posted in reply to Piotr Szturmaj

Code :
 auto logs = new double[200];
 const num = 2;
 clock_t clk;
 double norm;
 double par;
 writeln("CPUs : ",totalCPUs );
 clk = clock();
 foreach(i, ref elem; logs) {
  elem = log(i + 1.0);
 }
 norm = clock() -clk;
 clk = clock();
 foreach(i, ref elem; taskPool.parallel(logs, 100)) {
  elem = log(i + 1.0);
 }

I get same problem. Parallel foreach, is more slower that normal foreach. And it's same code that hace lib example that claims that parallel foreach do it in aprox. half time in Athlon X2

June 23, 2011

Re: WTF! Parallel foreach more slower that normal foreach in multicore CPU ?

Posted by Andrej Mitrovic
in reply to Zardoz

Andrej Mitrovic

Posted in reply to Zardoz

I don't know why David set a work unit of 100 for a 1 million element
array. I get slow results for this example:
    foreach(i, ref elem; taskPool.parallel(logs, 100)) {
        elem = log(i + 1.0);
    }

CPUs : 4
Serial usecs:   70418.
Parallel usecs: 91519.

But if I up the work unit size to 100_000 I get much better results:
    foreach(i, ref elem; taskPool.parallel(logs, 100_000)) {
        elem = log(i + 1.0);
    }

CPUs : 4
Serial usecs:   69979.
Parallel usecs: 25355.

Sometimes the best thing to do is let parallel use the default work unit size:
    foreach(i, ref elem; taskPool.parallel(logs)) {
        elem = log(i + 1.0);
    }
CPUs : 4
Serial usecs:   70219.
Parallel usecs: 21942.

Here's your original example on my PC:
CPUs : 4
Normal : 1.4609 Parallel : 2.4797

And here it is by letting parallel use the default work unit size:
CPUs : 4
Normal : 1.461 Parallel : 0.425

It's all about fine-tuning your parameters. Essentially when you up the work unit size it means each thread will process more elements from the array or range. If the loop body executes really fast, then you should increase the work unit size.

June 24, 2011

Re: WTF! Parallel foreach more slower that normal foreach in multicore CPU ?

Posted by Ali Çehreli
in reply to Zardoz

Ali Çehreli

Posted in reply to Zardoz

On Thu, 23 Jun 2011 23:18:36 +0000, Zardoz wrote:

> Code :
>  auto logs = new double[200];
>  const num = 2;
>  clock_t clk;
>  double norm;
>  double par;
>  writeln("CPUs : ",totalCPUs );
>  clk = clock();
>  foreach(i, ref elem; logs) {
>   elem = log(i + 1.0);
>  }
>  norm = clock() -clk;
>  clk = clock();
>  foreach(i, ref elem; taskPool.parallel(logs, 100)) {
>   elem = log(i + 1.0);
>  }
> 
> I get same problem. Parallel foreach, is more slower that normal foreach. And it's same code that hace lib example that claims that parallel foreach do it in aprox. half time in Athlon X2

I was able to reproduce your results. I think there is a problem with clock(). Try StopWatch:

import std.parallelism;
import std.stdio;
import std.math;
import std.datetime;

void main()
{
    auto logs = new double[200_000_000];

    writeln("CPUs : ",totalCPUs );

    {
        StopWatch stopWatch;
        stopWatch.start();

        foreach(i, ref elem; logs) {
            elem = log(i + 1.0);
        }

        writeln(stopWatch.peek().msecs);
    }

    {
        StopWatch stopWatch;
        stopWatch.start();

        foreach(i, ref elem; parallel(logs)) {
            elem = log(i + 1.0);
        }

        writeln(stopWatch.peek().msecs);
    }
}

Here is my output:

CPUs : 4
8061
2686

I get similar results whether I pass 100_000 to parallel() or not.

Ali

June 24, 2011

Re: WTF! Parallel foreach more slower that normal foreach in multicore CPU ?

Posted by Zardoz
in reply to Ali Çehreli

Zardoz

Posted in reply to Ali Çehreli

Thanks !!! Now I'm getting a more logical result. I get  1407ms normal vs 759ms in parallel foreach with logs[20_000_000]
It's strange that clock() do weird things with threading, not ?

PD: It's me, or http version of the forum do strange things, like not getting correct javascript action for the buttons if you hover from one to other ??

June 24, 2011

Re: WTF! Parallel foreach more slower that normal foreach in multicore CPU ?

Posted by Khint Enco
in reply to Zardoz

Khint Enco

Posted in reply to Zardoz

On 24/06/11 09:19, Zardoz wrote:
> Thanks !!! Now I'm getting a more logical result. I get  1407ms normal vs 759ms in parallel foreach with logs[20_000_000]
> It's strange that clock() do weird things with threading, not ?
>
> PD: It's me, or http version of the forum do strange things, like not getting correct javascript action for the buttons if you hover from one to other ??

I'm surprised that web ng thing hasn't been nuked already .. everyone around here (except you and some others) use Thunderbird!

http://www.mozillamessaging.com/en-US/thunderbird/

It's cross platform and comes in several languages, there's also support for POP3 and newsgroups. There may be more features, but I don't use them ..

June 24, 2011

Re: WTF! Parallel foreach more slower that normal foreach in multicore

Posted by Kagamin
in reply to Khint Enco

Kagamin

Posted in reply to Khint Enco

Khint Enco Wrote:

> I'm surprised that web ng thing hasn't been nuked already .. everyone around here (except you and some others) use Thunderbird!
> 
> http://www.mozillamessaging.com/en-US/thunderbird/

There's no point in internet if every site would require its own brand name client to access it.

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation