Beginner's Comparison Benchmark
May 05, 2020
Hello!

I made a little test (counting to 1 billion by adding 1) to compare the execution speed of a small counting for loop in C, D, Julia and Python.
=========================================================================================
The C version:

#include <stdio.h>
int a = 0;
int main() {
    int i;
    for (i = 0; i < bil; i++) {
        a = a + 1;
    }
    printf("%d", a);
}

The D version:

import std.stdio;
int main() {
    int a = 0;
    for (int i = 0; i <= bil; i++) {
        a = a + 1;
    }
    write(a);
    return 0;
}

The Julia version:

function counter()
    z = 0
    for i = 1:bil
        z = z + 1
    end
    print(z)
end
counter()

The Python version:

def counter():
    z = 0
    for i in range(1, bil):
        z = z + 1
    print(z)
counter()
=========================================================================================
Test Results without optimization:
C              |DLANG           |JULIA              | Python
real 0m2,981s  | real 0m3,051s  | real 0m0,413s     | real 2m19,501s
user 0m2,973s  | user 0m2,975s  | user 0m0,270s     | user 2m18,095s
sys  0m0,001s  | sys  0m0,006s  | sys  0m0,181s     | sys  0m0,033s
=========================================================================================
Test Results with optimization:
C - GCC -O3    |DLANG LDC2 --O3 |JULIA --optimize=3 | Python -O
real 0m0,002s  | real 0m0,006s  | real 0m0,408s     | real 2m21,801s
user 0m0,001s  | user 0m0,003s  | user 0m0,269s     | user 2m19,964s
sys  0m0,001s  | sys  0m0,003s  | sys  0m0,177s     | sys  0m0,050s
=========================================================================================
=========================================================================================
bil is shorthand for 1000000000 in the snippets above
gcc 9.3.0
ldc2 1.21.0
python 3.8.2
julia 1.4.1
all on Ubuntu 20.04 - 64bit
Host CPU: k8-sse3
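
For reference, here is the D snippet again as a single self-contained file with the bil placeholder written out (a reconstruction for readability, not necessarily byte-for-byte the file that was timed):

// counter.d - the D version with bil spelled out as 1_000_000_000
import std.stdio;

enum bil = 1_000_000_000;

int main() {
    int a = 0;
    for (int i = 0; i <= bil; i++) {
        a = a + 1;
    }
    write(a);   // prints 1000000001, since i runs from 0 through bil inclusive
    return 0;
}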

Unoptimized C and D are slow compared with Julia. Optimization speeds up C and D enormously, but has almost no effect on Julia.
Python, the slowest of all, runs even slower when optimized :)))

Although I can see that some times are better than others, I do not really know the difference between user and sys, so I am not sure which one is the time the app actually took.

I am just a beginner, not a specialist; I made this purely out of curiosity. If there is any error in my method, please let me know.
May 05, 2020
On 5/5/20 4:07 PM, RegeleIONESCU wrote:
> [...]
> 
> Although I can see that some times are better than others, I do not really know the difference between user and sys, so I am not sure which one is the time the app actually took.

1: You are interested in "real" time; that's how much time the whole thing took.
2: If you want to run benchmarks, run multiple tests and throw out the outliers, or use an average.
3: With simple things like this, the compiler is smarter than you ;) It doesn't really take 0.002s to do what you wrote; what happens is that the optimizer recognizes what you are doing and changes your code to:

writeln(1_000_000_001);

(yes, you can use underscores to make literals more readable in D)

doing benchmarks like this is really tricky.

Julia probably recognizes the thing too, but has to optimize at runtime? Not sure.
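
As a rough sketch of points 1 and 2, the timing could also be done in-process and repeated a few times so outliers stand out (this still won't stop the folding from point 3, because the loop bound is a compile-time constant):

import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writeln;

enum bil = 1_000_000_000;

void main() {
    // run the counting loop several times and report each run separately,
    // so a single odd run (an outlier) is easy to spot
    foreach (run; 0 .. 5) {
        auto sw = StopWatch(AutoStart.yes);
        int a = 0;
        for (int i = 0; i <= bil; i++) {
            a = a + 1;
        }
        sw.stop();
        writeln("run ", run, ": ", sw.peek, "  a = ", a);
    }
}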

-Steve
May 06, 2020
On Tuesday, 5 May 2020 at 20:29:13 UTC, Steven Schveighoffer wrote:
> the optimizer recognizes what you are doing and changes your code to:
>
> writeln(1_000_000_001);
>
Oh yes, classic constant folding. The other thing to worry about is dead code elimination. Walter has a nice story where he sent his compiler for benchmarking, and the compiler figured out that the result of the calculation in the benchmark was not used, so it deleted the whole benchmark.
May 06, 2020
On Wed, May 06, 2020 at 09:59:48AM +0000, welkam via Digitalmars-d-learn wrote:
> On Tuesday, 5 May 2020 at 20:29:13 UTC, Steven Schveighoffer wrote:
> > the optimizer recognizes what you are doing and changes your code to:
> > 
> > writeln(1_000_000_001);
> > 
> Oh yes, classic constant folding. The other thing to worry about is dead code elimination. Walter has a nice story where he sent his compiler for benchmarking, and the compiler figured out that the result of the calculation in the benchmark was not used, so it deleted the whole benchmark.

I remember one time I was doing some benchmarks between different compilers, and LDC consistently beat them all -- which is not surprising, but what was surprising was that running times were suspiciously short.  Curious to learn what magic code transformation LDC applied to make it run so incredibly fast, I took a look at the generated assembly.

Turns out, because I was calling the function being benchmarked with constant arguments, LDC decided to execute the entire danged thing at compile-time and substitute the entire function call with a single instruction that loaded its return value(!).

Another classic guffaw was when the function return value was simply discarded: LDC figured out that the function had no side-effects and its return value was not being used, so it deleted the function call, leaving the benchmark with the equivalent of:

	void main() {}

which, needless to say, beat all other benchmarks hands down. :-D

Lessons learned:

(1) Always use external input to your benchmark (e.g., load from a file, so that an overly aggressive optimizer won't decide to execute the entire program at compile-time);

(2) Always make use of the return value somehow, even if it's just to print 0 to stdout, or pipe the whole thing to /dev/null, so that the overly aggressive optimizer won't decide that since your program has no effect on the outside world, it should just consist of a single ret instruction. :-D A rough sketch of both lessons follows below.
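
Something along these lines (here using a command-line argument rather than a file as the external input; an optimizer may still simplify the loop body, but it can no longer run the whole program at compile time or throw the call away):

import std.conv : to;
import std.stdio : writeln;

// the work being measured
long count(long n) {
    long z = 0;
    foreach (i; 0 .. n)
        z = z + 1;
    return z;
}

void main(string[] args) {
    // lesson 1: external input, so the optimizer cannot know n at compile time
    long n = args.length > 1 ? args[1].to!long : 1_000_000_000;

    // lesson 2: use the result; printing it keeps the call from being removed
    writeln(count(n));
}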


T

-- 
This is not a sentence.
May 06, 2020
On Tuesday, 5 May 2020 at 20:07:54 UTC, RegeleIONESCU wrote:
> [...]

Python should be ruled out; this is not its war :)

I have done benchmarks against NumPy if you are interested:
https://github.com/tastyminerals/mir_benchmarks