June 04, 2011 Re: Port a benchmark to D?
Posted in reply to Walter Bright
On 06/04/2011 04:25 AM, Walter Bright wrote:
> On 6/3/2011 1:24 PM, bearophile wrote:
>> The Ada language has a syntax to write those names at the closing ends, and
>> the Ada compiler enforces such names to be always coherent and correct. In
>> C/C++/D unfortunately such names (written as comments) may go out of sync.
>
> I never understood the point of such.
>
> My editor has a single key "find matching { [ < ( ) > ] }" command, and
> so I never have a need for such ugly comments. In fact I usually delete
> such comments when I encounter them.
I thought you were big on printed out code reviews and not requiring any editing features from the language?
June 04, 2011 Re: Port a benchmark to D?
Posted in reply to Caligo
On 06/04/2011 02:10 AM, Caligo wrote:
> And just to be fair to C++:
>
> g++ -O2 -m32
> [VIRT: 94MB, RES: 92MB]
> real 0m24.567s
> user 0m24.500s
> sys 0m0.060s

Thanks to all who participated in this. I've shared these results on reddit:

http://www.reddit.com/r/programming/comments/hqkwk/google_paper_comparing_performance_of_c_java/

Andrei
June 04, 2011 Re: Port a benchmark to D?
Posted in reply to Andrei Alexandrescu
On 04.06.2011 16:37, Andrei Alexandrescu wrote:
> On 06/04/2011 02:10 AM, Caligo wrote:
>> And just to be fair to C++:
>>
>> g++ -O2 -m32
>> [VIRT: 94MB, RES: 92MB]
>> real 0m24.567s
>> user 0m24.500s
>> sys 0m0.060s
>
> Thanks to all who participated in this. I've shared these results on
> reddit:
>
> http://www.reddit.com/r/programming/comments/hqkwk/google_paper_comparing_performance_of_c_java/
>
>
>
> Andrei
Could you please give a link to the post? I can't find it.
June 04, 2011 Re: Port a benchmark to D?
Posted in reply to bearophile
On 6/4/2011 9:15 AM, bearophile wrote:
> Walter:
>
>> It would be nice to figure out what is different. Try using the coverage
>> analyzer and profiler for starters!
>
> There are little differences and inefficiencies here and there, but in the second D version I think most of the performance difference over the C++ code is caused by the GC. I will do some tests.
>
> Bye,
> bearophile
That's probably right, for two reasons:
1. Other than the GC, D doesn't have any "hidden cost" features that would explain it being slower than C++ for similarly written code.
2. A few posts back, it was noted that DMD 2.053, which includes my GC optimizations, was substantially faster than 2.052, which doesn't.
June 04, 2011 Re: Port a benchmark to D?
Posted in reply to dsimcha
On 04.06.2011 16:54, dsimcha wrote:
> On 6/4/2011 9:15 AM, bearophile wrote:
>> Walter:
>>
>>> It would be nice to figure out what is different. Try using the coverage analyzer and profiler for starters!
>>
>> There are little differences and inefficiencies here and there, but in the second D version I think most of the performance difference over the C++ code is caused by the GC. I will do some tests.
>>
>> Bye,
>> bearophile
>
> That's probably right, for two reasons:
>
> 1. Other than the GC, D doesn't have any "hidden cost" features that would explain it being slower than C++ for similarly written code.

There is also the difference in how well the compilers optimize; see Adam Ruppe's post, three posts up:

> On my computer, the D version ran slightly faster (56 seconds vs 63s
> for C++) without optimizations turned on.
>
> With optimizations turned on, C++ took a nice lead (28 seconds vs 53
> seconds for D).

So it seems like it's not all the GC's fault.

> 2. A few posts back, it was noted that DMD 2.053, which includes my GC optimizations, was substantially faster than 2.052, which doesn't.
June 04, 2011 Re: Port a benchmark to D?
Posted in reply to bearophile
> Andrei:
>
>> Far as I can tell D comes in the second place after C++ at run time. With optimizations and all it could get significantly closer.
>
> First version, with just classes, a bit better cleaned up: http://codepad.org/DggCx26d
>
> Second version, with all structs: http://codepad.org/etsLsZV5
>
> Tomorrow I'll de-optimize it a bit replacing some structs with classes. And
> then I'll create one or two more optimized versions (one using a memory pool for the nodes, and one trying to apply some of the C++ improvement ideas
> from the original paper).
>
> The number of instances allocated:
> Class instances:
> SimpleLoop_counter 3_936_102
> LoopStructureGraph_counter 15_051
> UnionFindNode_counter 13_017_663
> HavlakLoopFinder_counter 15_051
> BasicBlockEdge_counter 378_036
> BasicBlock_counter 252_013
> MaoCFG_counter 1
>
> UnionFindNode probably will give some gain if allocated from a pool.
>
> Later,
> bearophile

Your port segfaults DMD 2.053 with the -g flag (at least on Linux).

@Andrei: You may want to point out on reddit that the code is approximately a 1-to-1 port of the C++ code and not specially tuned.

Timon
June 04, 2011 Re: Port a benchmark to D?
Posted in reply to bearophile
On 6/4/2011 6:15 AM, bearophile wrote:
> There are little differences and inefficiencies here and there, but in the
> second D version I think most of the performance difference over the C++ code
> is caused by the GC. I will do some tests.
Easy to test: simply disable the GC.
June 04, 2011 Re: Port a benchmark to D?
Posted in reply to Jeff Nowakowski
On 6/4/2011 7:14 AM, Jeff Nowakowski wrote:
> I thought you were big on printed out code reviews and not requiring any editing
> features from the language?
I don't find those comments useful in printed code either.
June 04, 2011 Re: Port a benchmark to D?
Posted in reply to Timon Gehr
> Andrei:
>
>> Far as I can tell D comes in the second place after C++ at run time. With optimizations and all it could get significantly closer.
>
> First version, with just classes, a bit better cleaned up: http://codepad.org/DggCx26d
>
> Second version, with all structs: http://codepad.org/etsLsZV5
>
> Tomorrow I'll de-optimize it a bit replacing some structs with classes. And
> then I'll create one or two more optimized versions (one using a memory pool for the nodes, and one trying to apply some of the C++ improvement ideas
> from the original paper).
>
> The number of instances allocated:
> Class instances:
> SimpleLoop_counter 3_936_102
> LoopStructureGraph_counter 15_051
> UnionFindNode_counter 13_017_663
> HavlakLoopFinder_counter 15_051
> BasicBlockEdge_counter 378_036
> BasicBlock_counter 252_013
> MaoCFG_counter 1
>
> UnionFindNode probably will give some gain if allocated from a pool.
>
> Later,
> bearophile

One simple but very powerful optimization is to minimize the number of GC runs. I added a call to GC.disable() at the beginning of main and then a call to GC.collect() after every 10 test runs.

Results on my machine (32-bit executables):

C++ (-O2): 30.7s, ~170MB
D (-release -O -inline): 29.5s, ~520MB

D's GC needs to get faster. A concurrent GC would have hidden away most of the overhead on a multi-core processor ;).

Timon
June 04, 2011 Re: Port a benchmark to D?
Posted in reply to Timon Gehr
On 04.06.2011 18:25, Timon Gehr wrote:
>> Andrei:
>>
>>> Far as I can tell D comes in the second place after C++ at run time. With optimizations and all it could get significantly closer.
>>
>> First version, with just classes, a bit better cleaned up: http://codepad.org/DggCx26d
>>
>> Second version, with all structs: http://codepad.org/etsLsZV5
>>
>> Tomorrow I'll de-optimize it a bit replacing some structs with classes. And
>> then I'll create one or two more optimized versions (one using a memory pool for the nodes, and one trying to apply some of the C++ improvement ideas
>> from the original paper).
>>
>> The number of instances allocated:
>> Class instances:
>> SimpleLoop_counter 3_936_102
>> LoopStructureGraph_counter 15_051
>> UnionFindNode_counter 13_017_663
>> HavlakLoopFinder_counter 15_051
>> BasicBlockEdge_counter 378_036
>> BasicBlock_counter 252_013
>> MaoCFG_counter 1
>>
>> UnionFindNode probably will give some gain if allocated from a pool.
>>
>> Later,
>> bearophile
>
> One simple but very powerful optimization is to minimize the number of GC runs. I
> added a call to GC.disable() at the beginning of main and then a call to
> GC.collect() after every 10 test runs.
>
> Results on my machine (32-bit executables):
>
> C++ (-O2): 30.7s, ~170MB
> D (-release -O -inline): 29.5s, ~520MB
>
> D's GC needs to get faster. A concurrent GC would have hidden away most of the overhead on a multi-core processor ;).
>
> Timon
What was your time for D without disabling the GC? Probably 40-50s? This is certainly a big improvement; I didn't think the GC slowed it down that much.
What would be really interesting is a benchmark of a D-style implementation of the code (if I understand correctly, the current versions are more or less direct translations of the C++ code to D).
Cheers,
- Daniel
Copyright © 1999-2021 by the D Language Foundation