June 04, 2011
On 06/04/2011 04:25 AM, Walter Bright wrote:
> On 6/3/2011 1:24 PM, bearophile wrote:
>> The Ada language has a syntax to write those names at the closing
>> ends, and
>> the Ada compiler enforces such names to be always coherent and
>> correct. In
>> C/C++/D unfortunately such names (written as comments) may go out of
>> sync.
>
> I never understood the point of such.
>
> My editor has a single key "find matching { [ < ( ) > ] }" command, and
> so I never have a need for such ugly comments. In fact I usually delete
> such comments when I encounter them.

I thought you were big on printed out code reviews and not requiring any editing features from the language?
June 04, 2011
On 06/04/2011 02:10 AM, Caligo wrote:
> And just to be fair to C++:
>
> g++ -O2 -m32
> [VIRT: 94MB,  RES: 92MB]
> real    0m24.567s
> user    0m24.500s
> sys     0m0.060s

Thanks to all who participated in this. I've shared these results on reddit:

http://www.reddit.com/r/programming/comments/hqkwk/google_paper_comparing_performance_of_c_java/


Andrei
June 04, 2011
On 04.06.2011 16:37, Andrei Alexandrescu wrote:
> On 06/04/2011 02:10 AM, Caligo wrote:
>> And just to be fair to C++:
>>
>> g++ -O2 -m32
>> [VIRT: 94MB, RES: 92MB]
>> real 0m24.567s
>> user 0m24.500s
>> sys 0m0.060s
>
> Thanks to all who participated in this. I've shared these results on
> reddit:
>
> http://www.reddit.com/r/programming/comments/hqkwk/google_paper_comparing_performance_of_c_java/
>
> Andrei

Could you please give a link to the post? I can't find it.
June 04, 2011
On 6/4/2011 9:15 AM, bearophile wrote:
> Walter:
>
>> It would be nice to figure out what is different. Try using the coverage
>> analyzer and profiler for starters!
>
> There are small differences and inefficiencies here and there, but in the second D version I think most of the performance difference relative to the C++ code is caused by the GC. I will do some tests.
>
> Bye,
> bearophile

That's probably right, for two reasons:

1.  Other than the GC, D doesn't have any "hidden cost" features that would explain it being slower than C++ for similarly written code.

2.  A few posts back, it was noted that DMD 2.053, which includes my GC optimizations, was substantially faster than 2.052, which doesn't.
June 04, 2011
On 04.06.2011 16:54, dsimcha wrote:
> On 6/4/2011 9:15 AM, bearophile wrote:
>> Walter:
>>
>>> It would be nice to figure out what is different. Try using the coverage analyzer and profiler for starters!
>>
>> There are small differences and inefficiencies here and there, but in the second D version I think most of the performance difference relative to the C++ code is caused by the GC. I will do some tests.
>>
>> Bye,
>> bearophile
> 
> That's probably right, for two reasons:
> 
> 1.  Other than the GC, D doesn't have any "hidden cost" features that would explain it being slower than C++ for similarly written code.
> 

Besides the GC, there is also the C++ compiler's better optimization;
see Adam Ruppe's post, three posts up:

> On my computer, the D version ran slightly faster (56 seconds vs 63 seconds
> for C++) without optimizations turned on.
>
> With optimizations turned on, C++ took a nice lead (28 seconds vs 53
> seconds for D).

So it seems it's not all the GC's fault.

> 2.  A few posts back, it was noted that DMD 2.053, which includes my GC optimizations, was substantially faster than 2.052, which doesn't.

June 04, 2011
> Andrei:
>
>> Far as I can tell D comes in the second place after C++ at run time. With optimizations and all it could get significantly closer.
>
> First version, with just classes, a bit better cleaned up: http://codepad.org/DggCx26d
>
> Second version, with all structs: http://codepad.org/etsLsZV5
>
> Tomorrow I'll de-optimize it a bit, replacing some structs with classes. And then
> I'll create one or two more optimized versions (one using a memory pool for the nodes, and one trying to apply some of the C++ improvement ideas from the original paper).
>
> The number of instances allocated:
> Class instances:
> SimpleLoop_counter            3_936_102
> LoopStructureGraph_counter       15_051
> UnionFindNode_counter        13_017_663
> HavlakLoopFinder_counter         15_051
> BasicBlockEdge_counter          378_036
> BasicBlock_counter              252_013
> MaoCFG_counter                        1
>
> UnionFindNode probably will give some gain if allocated from a pool.
>
> Later,
> bearophile

Your port segfaults DMD 2.053 with the -g flag (at least on Linux).
@Andrei: You may want to point out on reddit that the code is roughly a one-to-one
port of the C++ code and not specially tuned.

Timon
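
bearophile's idea of allocating UnionFindNode instances from a pool can be sketched as a chunked arena. This is a minimal D sketch, assuming a struct-based node; the field names, `NodeArena` and `chunkSize` are hypothetical and not taken from the actual port. Instead of one GC allocation per node (about 13 million UnionFindNode instances in the counts quoted above), it makes one GC allocation per chunk of nodes.

```d
import std.stdio : writeln;

// Hypothetical node layout; the real port's fields may differ.
struct UnionFindNode
{
    UnionFindNode* parent;
    int dfsNumber;
}

// Chunked arena: nodes are carved out of large GC-allocated arrays,
// so the GC sees one allocation per chunk instead of one per node.
struct NodeArena
{
    enum size_t chunkSize = 4096;

    private UnionFindNode[] chunk; // current chunk being filled
    private size_t used;           // nodes handed out from `chunk`
    size_t chunksAllocated;        // number of chunk allocations so far

    UnionFindNode* make()
    {
        if (used == chunk.length)
        {
            // Older chunks stay alive while nodes inside them are still
            // referenced, because the D GC recognizes interior pointers.
            chunk = new UnionFindNode[](chunkSize);
            used = 0;
            ++chunksAllocated;
        }
        return &chunk[used++];
    }
}

void main()
{
    NodeArena arena;
    UnionFindNode*[] nodes;
    foreach (i; 0 .. 10_000)
    {
        auto n = arena.make();
        n.dfsNumber = i;
        n.parent = n;      // a fresh union-find node is its own root
        nodes ~= n;
    }
    writeln(arena.chunksAllocated);   // 3 chunks for 10_000 nodes
    writeln(nodes[9_999].dfsNumber);  // 9999
}
```

An arena like this suits the case where nodes are discarded together; if nodes were released one at a time, a free list would be the more natural kind of pool.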
June 04, 2011
On 6/4/2011 6:15 AM, bearophile wrote:
> There are small differences and inefficiencies here and there, but in the
> second D version I think most of the performance difference relative to the C++ code
> is caused by the GC. I will do some tests.

Easy to test, simply disable the gc.
June 04, 2011
On 6/4/2011 7:14 AM, Jeff Nowakowski wrote:
> I thought you were big on printed out code reviews and not requiring any editing
> features from the language?

I don't find those comments useful in printed code either.
June 04, 2011
> Andrei:
>
>> Far as I can tell D comes in the second place after C++ at run time. With optimizations and all it could get significantly closer.
>
> First version, with just classes, a bit better cleaned up: http://codepad.org/DggCx26d
>
> Second version, with all structs: http://codepad.org/etsLsZV5
>
> Tomorrow I'll de-optimize it a bit, replacing some structs with classes. And then
> I'll create one or two more optimized versions (one using a memory pool for the nodes, and one trying to apply some of the C++ improvement ideas from the original paper).
>
> The number of instances allocated:
> Class instances:
> SimpleLoop_counter            3_936_102
> LoopStructureGraph_counter       15_051
> UnionFindNode_counter        13_017_663
> HavlakLoopFinder_counter         15_051
> BasicBlockEdge_counter          378_036
> BasicBlock_counter              252_013
> MaoCFG_counter                        1
>
> UnionFindNode probably will give some gain if allocated from a pool.
>
> Later,
> bearophile

One simple but very powerful optimization is to minimize the number of GC runs. I
added a call to GC.disable() at the beginning of main and then a
GC.collect() after every 10 test runs.
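
A minimal D sketch of this change, assuming druntime's `core.memory.GC` API (`GC.disable`, `GC.enable`, `GC.collect`). `runOneTest` and `runWithDeferredGC` are hypothetical stand-ins, not code from the actual port: the test body only allocates short-lived arrays to produce garbage.

```d
import core.memory : GC;
import std.stdio : writeln;

// Hypothetical stand-in for one benchmark iteration: it only creates
// short-lived garbage, the way the loop finder allocates temporary nodes.
size_t runOneTest()
{
    size_t total;
    foreach (i; 0 .. 10_000)
    {
        auto tmp = new int[](4); // small GC allocation that becomes garbage
        total += tmp.length;
    }
    return total;
}

// Runs `iterations` tests with automatic collections suspended and
// forces a full collection after every `collectEvery` runs.
size_t runWithDeferredGC(size_t iterations, size_t collectEvery)
{
    assert(collectEvery > 0, "collectEvery must be positive");

    GC.disable();             // no automatic collections from here on
    scope (exit) GC.enable(); // restore normal GC behaviour on return

    size_t total;
    foreach (i; 0 .. iterations)
    {
        total += runOneTest();
        if ((i + 1) % collectEvery == 0)
            GC.collect();     // explicit collections still run while disabled
    }
    return total;
}

void main()
{
    writeln(runWithDeferredGC(50, 10)); // 50 runs * 10_000 arrays * 4 ints
}
```

The trade-off is visible in the numbers below: fewer collections buy speed at the cost of a larger peak footprint.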

Results on my machine (32-bit executables):

C++ (-O2): 30.7s, ~170MB
D (-release -O -inline): 29.5s, ~520MB

D's GC needs to get faster. A concurrent GC would have hidden most of the overhead on a multi-core processor ;).

Timon
June 04, 2011
On 04.06.2011 18:25, Timon Gehr wrote:
>> Andrei:
>>
>>> Far as I can tell D comes in the second place after C++ at run time. With optimizations and all it could get significantly closer.
>>
>> First version, with just classes, a bit better cleaned up: http://codepad.org/DggCx26d
>>
>> Second version, with all structs: http://codepad.org/etsLsZV5
>>
>> Tomorrow I'll de-optimize it a bit, replacing some structs with classes. And then
>> I'll create one or two more optimized versions (one using a memory pool for the nodes, and one trying to apply some of the C++ improvement ideas from the original paper).
>>
>> The number of instances allocated:
>> Class instances:
>> SimpleLoop_counter            3_936_102
>> LoopStructureGraph_counter       15_051
>> UnionFindNode_counter        13_017_663
>> HavlakLoopFinder_counter         15_051
>> BasicBlockEdge_counter          378_036
>> BasicBlock_counter              252_013
>> MaoCFG_counter                        1
>>
>> UnionFindNode probably will give some gain if allocated from a pool.
>>
>> Later,
>> bearophile
> 
> One simple but very powerful optimization is to minimize the number of GC runs. I
> added a call to GC.disable() at the beginning of main and then a
> GC.collect() after every 10 test runs.
> 
> Results on my machine (32-bit executables):
> 
> C++ (-O2): 30.7s, ~170MB
> D (-release -O -inline): 29.5s, ~520MB
> 
> D's GC needs to get faster. A concurrent GC would have hidden most of the overhead on a multi-core processor ;).
> 
> Timon

What was your time for D without disabling the GC? Probably 40-50s? This is certainly a big improvement; I didn't think the GC slowed it down that much.

What would be really interesting is a benchmark of a D-style implementation of the code (if I understand correctly, the current versions are more or less direct translations of the C++ code to D).

Cheers,
- Daniel