March 26

On Sunday, 24 March 2024 at 19:31:19 UTC, Csaba wrote:

>

I know that benchmarks are always controversial and depend on a lot of factors. So far, I have read that D performs very well in benchmarks, as well as, if not better than, C.

I wrote a little program that approximates PI using the Leibniz formula. I implemented the same thing in C, D and Python, all of them execute 1,000,000 iterations 20 times and display the average time elapsed.

Here are the results:

C: 0.04s
Python: 0.33s
D: 0.73s

What the hell? D slower than Python? This cannot be real. I am sure I am making a mistake here. I'm sharing all 3 programs here:

C: https://pastebin.com/s7e2HFyL
D: https://pastebin.com/fuURdupc
Python: https://pastebin.com/zcXAkSEf

As you can see the function that does the job is exactly the same in C and D.

Here are the compile/run commands used:

C: gcc leibniz.c -lm -oleibc
D: gdc leibniz.d -frelease -oleibd
Python: python3 leibniz.py

PS. my CPU is AMD A8-5500B and my OS is Ubuntu Linux, if that matters.

As others suggested, pow is the problem. I've noticed that the C versions are often much faster than their D counterparts. (And I don't view that as a problem, since both come built in; my only thought is that the D version should simply call the C version.)

Changing

import std.math:pow;

to

import core.stdc.math: pow;

and leaving everything else unchanged, I get

C: Avg execution time: 0.007918
D (original): Avg execution time: 0.102612
D (using core.stdc.math): Avg execution time: 0.008134

So more or less the exact same numbers if you use core.stdc.math.
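
For reference, here is a minimal self-contained sketch of the kind of loop involved (the pastebin sources are not reproduced above, so the exact loop body is an assumption, not Csaba's code); the only line that differs between the slow and the fast variant is the import:

// import std.math: pow;       // the slower variant discussed above
import core.stdc.math: pow;    // calls the C runtime's pow directly
import std.stdio: writefln;

double leibniz(int iters) {
    double n = 0.0;
    foreach (i; 0 .. iters)
        n += pow(-1.0, i) / (i * 2.0 + 1.0);
    return n * 4.0;
}

void main() {
    writefln("%.16f", leibniz(1_000_000));
}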

March 26

On Tuesday, 26 March 2024 at 14:25:53 UTC, Lance Bachmeier wrote:

>

On Sunday, 24 March 2024 at 19:31:19 UTC, Csaba wrote:

[...]

As others suggested, pow is the problem. I've noticed that the C versions are often much faster than their D counterparts. (And I don't view that as a problem, since both come built in; my only thought is that the D version should simply call the C version.)

Changing

import std.math:pow;

to

import core.stdc.math: pow;

and leaving everything else unchanged, I get

C: Avg execution time: 0.007918
D (original): Avg execution time: 0.102612
D (using core.stdc.math): Avg execution time: 0.008134

So more or less the exact same numbers if you use core.stdc.math.

And then the other thing is changing

const int BENCHMARKS = 20;

to

enum BENCHMARKS = 20;

which allows the constant to be substituted directly into the rest of the program. That gives

Avg execution time: 0.007564

On my Ubuntu 22.04 machine, therefore, the LDC binary with no flags is slightly faster than the C code compiled with your flags.
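
For readers less familiar with D, a small illustration (not from the benchmark code) of what that change means: enum declares a manifest constant whose value is substituted at compile time, while a const int is a read-only variable:

enum BENCHMARKS = 20;       // manifest constant: no storage, the literal 20
                            // is pasted in wherever BENCHMARKS is used
const int benchmarks = 20;  // read-only variable: has an address and may be
                            // loaded from memory at run time

void main() {
    int[BENCHMARKS] a;       // an enum value is always usable at compile time
    auto p = &benchmarks;    // a const variable has an address...
    // auto q = &BENCHMARKS; // ...a manifest constant does not (compile error)
}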

March 27

I apologize for digressing a little bit further; this is just to share insights with other learners.

I wondered why my binary was so big (> 4M) and discovered the gdc -Wall -O2 -frelease -shared-libphobos options (now > 200K).
Then I tried to avoid the GC and learnt this: the GC in the Leibniz code is pulled in only for the writeln. With a change to (again, standard C) printf, the @nogc modifier can be applied, and the binary then gets down to ~17K, a size comparable to the C counterpart.

Another observation regarding precision: the iteration proceeds in the wrong order. Adding the small contributions first and the bigger ones last causes less loss, because the small parts are not rounded away below the LSB limit of an already large real/double sum.
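
A tiny sketch (not part of the benchmark code) of the effect: the same partial sum computed forward (big terms first) and backward (small terms first) can differ in the last digits, because a tiny term added to an already large sum is partly rounded away:

import std.stdio: writefln;

enum N = 1_000_000;

double forward() {               // big contributions first
    double s = 0.0;
    for (int i = 0; i < N; i++)
        s += ((i % 2) ? -1.0 : 1.0) / (i * 2.0 + 1.0);
    return s * 4.0;
}

double backward() {              // small contributions first
    double s = 0.0;
    for (int i = N - 1; i >= 0; i--)
        s += ((i % 2) ? -1.0 : 1.0) / (i * 2.0 + 1.0);
    return s * 4.0;
}

void main() {
    writefln("forward : %.17f", forward());
    writefln("backward: %.17f", backward());  // any difference is only in the last digits
}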

So I'm now at this code (dropping the average over 20 iterations as unnecessary):

// import std.stdio;  // writeln would pull in the garbage collector
import core.stdc.stdio: printf;
import std.datetime.stopwatch;

const int ITERATIONS = 1_000_000_000;

@nogc pure double leibniz(int it) {  // sum up the small values first
    double n = 0.5 * ((it % 2) ? -1.0 : 1.0) / (it * 2.0 + 1.0);
    for (int i = it - 1; i >= 0; i--)
        n += ((i % 2) ? -1.0 : 1.0) / (i * 2.0 + 1.0);
    return n * 4.0;
}

@nogc void main() {
    double result;
    double total_time = 0;
    auto sw = StopWatch(AutoStart.yes);
    result = leibniz(ITERATIONS);
    sw.stop();
    total_time = sw.peek.total!"nsecs";
    printf("%.16f\n", result);
    printf("Execution time: %f\n", total_time / 1e9);
}

result:

3.1415926535897931
Execution time: 1.068111

March 28

On Wednesday, 27 March 2024 at 08:22:42 UTC, rkompass wrote:

>

I apologize for digressing a little bit further; this is just to share insights with other learners.

Good thing you're digressing; I am 45 years old and I still cannot say that I am finished as a student! For me this is version 4 and it looks like we don't need a 3rd variable other than the function parameter and return value:

auto leibniz_v4(int i) @nogc pure {
  double n = 0.5*((i%2) ? -1.0 : 1.0) / (i * 2.0 + 1.0);

  while(--i >= 0)
    n += ((i%2) ? -1.0 : 1.0) / (i * 2.0 + 1.0);

  return n * 4.0;
} /*
3.1415926535892931
3.141592653589 793238462643383279502884197169399375105
3.141593653590774200000 (v1)
Avg execution time: 0.000033
*/

SDB@79

March 28

On Thursday, 28 March 2024 at 01:09:34 UTC, Salih Dincer wrote:

>

Good thing you're digressing; I am 45 years old and I still cannot say that I am finished as a student! For me this is version 4 and it looks like we don't need a 3rd variable other than the function parameter and return value:

So we continue with another digression. I discovered std.parallelism's parallel, and also avoided the extra variable, as Salih suggested:

import std.range;
import std.parallelism;
import core.stdc.stdio: printf;
import std.datetime.stopwatch;

enum ITERS = 1_000_000_000;
enum STEPS = 31; // 5 is fine; even numbers (e.g. 10) may give bad precision (for some math reason?)

pure double leibniz(int i) {  // sums every STEPS-th term, small values first
    double r = (i == ITERS) ? 0.5 * ((i % 2) ? -1.0 : 1.0) / (i * 2.0 + 1.0) : 0.0;
    for (--i; i >= 0; i -= STEPS)
        r += ((i % 2) ? -1.0 : 1.0) / (i * 2.0 + 1.0);
    return r * 4.0;
}

void main() {
    auto start = iota(ITERS, ITERS - STEPS, -1).array;  // the STEPS interleaved starting indices
    auto sw = StopWatch(AutoStart.yes);
    double result = 0.0;
    foreach (s; start.parallel)
        result += leibniz(s);
    double total_time = sw.peek.total!"nsecs";
    printf("%.16f\n", result);
    printf("Execution time: %f\n", total_time / 1e9);
}

gives:

3.1415926535897931
Execution time: 0.211667

My laptop has 6 cores and obviously 5 are used in parallel by this.
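
One detail worth flagging: result += leibniz(s) in the parallel foreach is executed by several worker threads on the same variable, which is formally a data race (with only 31 quick iterations it apparently does not bite here). A race-free sketch would let std.parallelism do the reduction itself, along these lines (same leibniz and constants assumed):

import std.algorithm: map;
import std.parallelism: taskPool;
import std.range: iota;

// inside main(), replacing the foreach accumulation:
double result = taskPool.reduce!"a + b"(
    iota(ITERS, ITERS - STEPS, -1).map!(s => leibniz(s)));

The partitioning into STEPS interleaved sub-sums stays exactly as above; only the accumulation across tasks changes.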

The original question related to a comparison between C, D and Python.
Turning back to this: are there similarly simple libraries for C that allow for parallel computation?

March 28

On Thursday, 28 March 2024 at 11:50:38 UTC, rkompass wrote:

>

Turning back to this: are there similarly simple libraries for C that allow for parallel computation?

You can achieve parallelism in C using libraries such as OpenMP, which provides a set of compiler directives and runtime library routines for parallel programming.

Here’s an example of how you might modify the code to use OpenMP for parallel processing:

#include <stdio.h>
#include <time.h>
#include <omp.h>

#define ITERS 1000000000
#define STEPS 31

double leibniz(int i) {
  double r = (i == ITERS) ? 0.5 * ((i % 2) ? -1.0 : 1.0) / (i * 2.0 + 1.0) : 0.0;
  for (--i; i >= 0; i -= STEPS)
    r += ((i % 2) ? -1.0 : 1.0) / (i * 2.0 + 1.0);
  return r * 4.0;
}

int main() {
  double start_time = omp_get_wtime();

  double result = 0.0;

  #pragma omp parallel for reduction(+:result)
  for (int s = ITERS; s >= 0; s -= STEPS) {
    result += leibniz(s);
  }

  // Calculate the time taken
  double time_taken = omp_get_wtime() - start_time;

  printf("%.16f\n", result);
  printf("%f (seconds)\n", time_taken);

  return 0;
}

To compile this code with OpenMP support, you would use a command like gcc -fopenmp your_program.c. This tells the GCC compiler to enable OpenMP directives. The #pragma omp parallel for directive tells the compiler to parallelize the loop, and the reduction clause is used to safely accumulate the result variable across multiple threads.

SDB@79

March 28

On Thursday, 28 March 2024 at 14:07:43 UTC, Salih Dincer wrote:

>

On Thursday, 28 March 2024 at 11:50:38 UTC, rkompass wrote:

>

Turning back to this: are there similarly simple libraries for C that allow for parallel computation?

You can achieve parallelism in C using libraries such as OpenMP, which provides a set of compiler directives and runtime library routines for parallel programming.

Here’s an example of how you might modify the code to use OpenMP for parallel processing:

 . . .

  #pragma omp parallel for reduction(+:result)
  for (int s = ITERS; s >= 0; s -= STEPS) {
    result += leibniz(s);
  }
 . . .
To compile this code with OpenMP support, you would use a command like gcc -fopenmp your_program.c. This tells the GCC compiler to enable OpenMP directives. The #pragma omp parallel for directive tells the compiler to parallelize the loop, and the reduction clause is used to safely accumulate the result variable across multiple threads.

SDB@79

Nice, thank you.
It ran endlessly until I saw that I had to correct the for loop to
for (int s = ITERS; s > ITERS-STEPS; s--)
Now the result is:

3.1415926535897936
Execution time: 0.212483 (seconds).

This result is sooo similar!

I didn't know that OpenMP programming could be that easy.
The binary size is 16K, the same order of magnitude, although somewhat smaller.
The D advantage is gone here, I would say.

March 28

On Thursday, 28 March 2024 at 20:18:10 UTC, rkompass wrote:

>

The D advantage is gone here, I would say.

It's hard to compare, actually.
std.parallelism has somewhat different mechanics and is, I think, easier to use. The syntax is nicer.

OpenMP is a well-known and widely adopted tool, which is also quite flexible, but it is usually applied to initially sequential code, and its syntax is not very intuitive.

Interesting point from Dr Russel here: https://forum.dlang.org/thread/qvksmhwkaxbrnggsvtxe@forum.dlang.org

However, OpenMP has also seen development and improvement since 2012, and the HPC world is pretty conservative, so it remains one of the most popular tools in the area, along with MPI: https://www.openmp.org/wp-content/uploads/sc23-openmp-popularity-mattson.pdf
But with the AI and GPU revolution the balance will probably shift a bit towards CUDA-like technologies.

March 28

On Thursday, 28 March 2024 at 20:18:10 UTC, rkompass wrote:

>

I didn't know that OpenMP programming could be that easy.
The binary size is 16K, the same order of magnitude, although somewhat smaller.
The D advantage is gone here, I would say.

There is no such thing as parallel programming in D anyway. At least it has the modules, but I haven't seen them work. Whenever I use the toys built on foreach() it always ends in disappointment :)

SDB@79

March 29

On Thursday, 28 March 2024 at 23:15:26 UTC, Salih Dincer wrote:

>

There is no such thing as parallel programming in D anyway. At least it has the modules, but I haven't seen them work. Whenever I use the toys built on foreach() it always ends in disappointment

I think it just works :)
Which issues did you have with it?