May 30, 2014 Re: Performance
Posted in reply to Thomas | On Friday, 30 May 2014 at 13:35:59 UTC, Thomas wrote:
> gdc ./source/perf/testperf.d -frelease -o testperf
This effectively compiles the program without optimizations. Try -O3 or -Ofast.
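For example, something like:

gdc -O3 -frelease -o testperf ./source/perf/testperf.d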
David
May 31, 2014 Re: Performance
Posted in reply to Thomas | Run this with: -O3 -frelease -fno-assert -fno-bounds-check -march=native
This way GCC and LLVM will recognize that you alternately add
p0 and p1 to the sum and will partially unroll the loop, thereby
removing the conditional. It takes roughly 1.4 nanoseconds per step
on my not-so-new 2.0 GHz notebook, so I assume your PC will
easily reach parity with your original C++ version.
import std.stdio;
import core.time;

alias ℕ = size_t;

void main()
{
    run!plus(1_000_000_000);
}

double plus(ℕ steps)
{
    enum p0 = 0.0045;
    enum p1 = 1.00045452 - p0;
    double sum = 1.346346;
    foreach (i; 0 .. steps)
        sum += i % 2 ? p1 : p0;
    return sum;
}

void run(alias func)(ℕ steps)
{
    auto t1 = TickDuration.currSystemTick;
    auto output = func(steps);
    auto t2 = TickDuration.currSystemTick;
    auto nanotime = 1_000_000_000.0 / steps * (t2 - t1).length / TickDuration.ticksPerSec;
    writefln("Last: %s", output);
    writefln("Time per op: %s", nanotime);
    writeln();
}
--
Marco
May 31, 2014 Re: Performance
Posted in reply to Thomas | Faulty benchmark:
- do not benchmark "format"
- use a dummy variable: just add your plus() results to it (overflow is not a problem) and return it from main, to prevent dead-code elimination from removing the work
- introduce some sort of random value into your plus() code - for example, use a random-number generator or the int-cast pointer to the program args as the start value
- do not benchmark anything without millions of loop iterations, and use the average as the result
Anything else does not make sense. A minimal sketch of the first three points is below.
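Something along these lines - a sketch only, with the timing loop left out for brevity and the iteration counts as placeholders:

import std.stdio;

// Seed the computation from a runtime value and fold the result into a dummy
// accumulator that main returns, so the optimizer cannot delete the work.
double plus(size_t steps, double start)
{
    immutable p0 = 0.0045;
    immutable p1 = 1.00045452 - p0;
    double sum = start;                         // runtime start value, not a compile-time constant
    foreach (i; 0 .. steps)
        sum += i % 2 ? p1 : p0;
    return sum;
}

int main(string[] args)
{
    immutable start = 1.346346 + args.length;   // runtime seed the compiler cannot predict
    size_t dummy;
    foreach (run; 0 .. 10)                      // repeat the run; average the measured times
        dummy += cast(size_t) plus(100_000_000, start);
    writeln(dummy);                             // observe the dummy value
    return cast(int) (dummy & 0x7f);            // and return it, so nothing is dead code
}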
On 30.05.2014 15:35, Thomas wrote:
> I made the following performance test, which adds 10^9 doubles
> on Linux with the latest dmd compiler in the Eclipse IDE and with
> the Gdc-Compiler also on Linux. Then the same test was done with
> C++ on Linux and with Scala in the Java ecosystem on Linux. All
> the testing was done on the same PC.
> The results for one addition are:
>
> D-DMD: 3.1 nanoseconds
> D-GDC: 3.8 nanoseconds
> C++: 1.0 nanoseconds
> Scala: 1.0 nanoseconds
>
>
> D-Source:
>
> import std.stdio;
> import std.datetime;
> import std.string;
> import core.time;
>
>
> void main() {
> run!(plus)( 1000*1000*1000 );
> }
>
> class C {
> }
>
> string plus( int steps ) {
>     double sum = 1.346346;
>     immutable double p0 = 0.0045;
>     immutable double p1 = 1.00045452-p0;
>     auto b = true;
>     for( int i=0; i<steps; i++){
>         switch( b ){
>             case true :
>                 sum += p0;
>                 break;
>             default:
>                 sum += p1;
>                 break;
>         }
>         b = !b;
>     }
>     return (format("%s %f","plus\nLast: ", sum) );
>     // return ("plus\nLast: ", sum );
> }
>
>
> void run( alias func )( int steps )
>     if( is(typeof(func(steps)) == string)) {
>     auto begin = Clock.currStdTime();
>     string output = func( steps );
>     auto end = Clock.currStdTime();
>     double nanotime = toNanos(end-begin)/steps;
>     writeln( output );
>     writeln( "Time per op: " , nanotime );
>     writeln( );
> }
>
> double toNanos( long hns ) { return hns*100.0; }
>
>
> Compiler settings for D:
>
> dmd -c
> -of.dub/build/application-release-nobounds-linux.posix-x86-dmd-DF74188E055ED2E8ADD9C152107A632F/first.o
> -release -inline -noboundscheck -O -w -version=Have_first
> -Isource source/perf/testperf.d
>
> gdc ./source/perf/testperf.d -frelease -o testperf
>
> So what is the problem? Are the compiler switches wrong? Or is
> D with these compilers really that slow? Can you help me?
>
>
> Thomas
>
>
May 31, 2014 Re: Performance
Posted in reply to dennis luehring | On Sat, 2014-05-31 at 07:32 +0200, dennis luehring via Digitalmars-d wrote:
> Faulty benchmark:

Indeed.

> - do not benchmark "format"
>
> - use a dummy variable: just add your plus() results to it (overflow is not a problem) and return it from main, to prevent dead-code elimination from removing the work
>
> - introduce some sort of random value into your plus() code - for example, use a random-number generator or the int-cast pointer to the program args as the start value
>
> - do not benchmark anything without millions of loop iterations, and use the average as the result
>
> Anything else does not make sense.

As well as the average (mean), you must provide standard deviation and degrees of freedom so that a proper error analysis and t-tests are feasible. Or, to put it another way: even if you quote a mean, without knowing how many are in the sample and what the spread is you cannot judge the error and so cannot make deductions or inferences.

--
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder@ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel@winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder
May 31, 2014 Re: Performance
Posted in reply to bearophile | On Fri, 2014-05-30 at 19:58 +0000, bearophile via Digitalmars-d wrote:
> Russel Winder:
>
>> A priori I would believe there is a problem with these numbers: my experience of CPU-bound D code is that it is generally as fast as C++.
>
> The C++ code I've shown above, if compiled with -Ofast, seems faster than the D code compiled with ldc2.

I am assuming you are comparing C++/clang with D/ldc2; it is only reasonable to compare C++/g++ with D/gdc. I am not sure about other compilers. Of course there is then the question of whether C++/clang is better/worse than C++/g++. Lots of fun experimentation and data analysis to be had here, if only there were microbenchmarking frameworks for C++ as well as D ;-)

--
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder@ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel@winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder
May 31, 2014 Re: Performance | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Russel Winder | On 31.05.2014 08:36, Russel Winder via Digitalmars-d wrote:
> As well as the average (mean), you must provide standard deviation and
> degrees of freedom so that a proper error analysis and t-tests are
> feasible.
By average I mean the average of the benchmarked times.

And the dummy values are only there to keep the compiler from removing anything it can reduce at compile time - that is what makes benchmarks comparable. These values do not change the algorithm or the quality of the result in any way - it is more like an overflowing secondary output based on the result of the original algorithm (it should be just a simple addition or subtraction, ignoring overflow etc.).

That is the basis of all non-stupid benchmarking - the next (pro) step is to look at the resulting assembler code.
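For example, something like this should emit the assembly for inspection (paths and release flags as in the commands quoted above; -S is the standard GCC/GDC switch for stopping after code generation):

gdc -O3 -frelease -S -o testperf.s ./source/perf/testperf.d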
May 31, 2014 Re: Performance
Posted in reply to dennis luehring | On 31.05.2014 13:25, dennis luehring wrote:
> On 31.05.2014 08:36, Russel Winder via Digitalmars-d wrote:
>> As well as the average (mean), you must provide standard deviation and
>> degrees of freedom so that a proper error analysis and t-tests are
>> feasible.
>
> By average I mean the average of the benchmarked times.
>
> And the dummy values are only there to keep the compiler from removing
> anything it can reduce at compile time - that is what makes benchmarks
> comparable. These values do not change the algorithm or the quality of
> the result in any way - it is more like an overflowing secondary output
> based on the result of the original algorithm (it should be just a
> simple addition or subtraction, ignoring overflow etc.).
>
> That is the basis of all non-stupid benchmarking - the next (pro) step
> is to look at the resulting assembler code.

So the anti-optimizer overflowing secondary output, aka AOOSO, should be initialized outside the test function with a runtime value - I normally use the pointer to the main args cast to an int.

The AOOSO should then be incremented by the needed result of the benchmarked algorithm - that could be an int-cast float/double value, the varying length of a string, or whatever depends on the result and is cheap enough to use.

Then return the AOOSO from main. The original algorithm is not changed, but the compiler has absolutely nothing it could use to eliminate the use and the final output of this AOOSO dummy value.

Yes, this ignores that the code size (and hence cache behaviour) is changed by the AOOSO incrementation - that is the reason for using simple casting/overflowing integer stuff here; but if the benchmarking goes that deep, you should rather take a look at the assembler level anyway.
May 31, 2014 Re: Performance
Posted in reply to dennis luehring | On 5/30/14, 10:32 PM, dennis luehring wrote:
> -do not benchmark anything without millions of loops - use the average
> as the result
Use the minimum unless networking is involved. -- Andrei
May 31, 2014 Re: Performance
Posted in reply to Russel Winder | On 5/30/14, 11:36 PM, Russel Winder via Digitalmars-d wrote:
> As well as the average (mean), you must provide standard deviation and
> degrees of freedom so that a proper error analysis and t-tests are
> feasible. Or, to put it another way: even if you quote a mean, without
> knowing how many are in the sample and what the spread is you cannot
> judge the error and so cannot make deductions or inferences.
No. Elapsed time in a benchmark does not follow a Student or Gaussian distribution. Use the mode or (better) the minimum. -- Andrei
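For illustration, a minimal sketch of the minimum-of-N-runs idea (the helper and workload names are made up; Clock.currStdTime is the same std.datetime call used earlier in the thread):

import std.algorithm : min;
import std.datetime : Clock;
import std.stdio;

// Run the function under test several times and keep the minimum elapsed time.
// The earlier advice about preventing dead-code elimination still applies.
long bestOfHns(alias f)(int runs)
{
    long best = long.max;
    foreach (r; 0 .. runs)
    {
        immutable t0 = Clock.currStdTime();     // hectonanosecond resolution
        f();
        immutable t1 = Clock.currStdTime();
        best = min(best, t1 - t0);
    }
    return best;                                // fastest run, in units of 100 ns
}

double sink = 0;                                // module-level sink so the loop is not removed

void work()
{
    foreach (i; 0 .. 1_000_000)
        sink += i % 2 ? 1.00045452 : 0.0045;
}

void main()
{
    writefln("best of 10 runs: %s ns", bestOfHns!work(10) * 100.0);
    writeln(sink);                              // observe the sink
}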
May 31, 2014 Re: Performance
Posted in reply to Andrei Alexandrescu | On Sat, 2014-05-31 at 07:02 -0700, Andrei Alexandrescu via Digitalmars-d wrote:
> On 5/30/14, 11:36 PM, Russel Winder via Digitalmars-d wrote:
>> As well as the average (mean), you must provide standard deviation and degrees of freedom so that a proper error analysis and t-tests are feasible. Or, to put it another way: even if you quote a mean, without knowing how many are in the sample and what the spread is you cannot judge the error and so cannot make deductions or inferences.
>
> No. Elapsed time in a benchmark does not follow a Student or Gaussian distribution. Use the mode or (better) the minimum. -- Andrei

We almost certainly need to unpack that more. I agree that behind my comment was an implicit assumption of a normal distribution of results. This is an easy assumption to make even if it is wrong. So is it provably wrong? What is the distribution? If we know that, then we know the parameters, which then allows for statistical inference and deduction.

--
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder@ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel@winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder