June 03, 2009
What's the difference between:
>   D 1:  40.20 DMD
>   D 2:  21.83 DMD
>   D 2:  18.80 DMD, struct + scope

and:
>   D 1:   8.47 DMD
>   D 2:   7.41 DMD + scope

...?
June 03, 2009
Robert Fraser:

> What's the difference between:
> >   D 1:  40.20 DMD
> >   D 2:  21.83 DMD

That's the standard code.

> and:
> >   D 1:   8.47 DMD
> >   D 2:   7.41 DMD + scope

They are both with scope, on D1 and D2.
Sorry for my small omission.

Bye,
bearophile
June 04, 2009
bearophile Wrote:

> I have tried the new JavaVM on Win, which optionally performs escape analysis, and the results are nice:
> 
> Timings, N=100_000_000, Windows, seconds:
>   D 1:  40.20 DMD
>   D 2:  21.83 DMD
>   D 2:  18.80 DMD, struct + scope
>   C++:  18.06
>   D 1:   8.47 DMD
>   D 2:   7.41 DMD + scope
>   Java:  1.84 V.1.6.0_14, -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC
>   Java:  1.78 -server
>   Java:  1.44
>   Java:  1.38 V.1.6.0_14
>   Java:  0.28 V.1.6.0_14, -server -XX:+DoEscapeAnalysis
> 
> Timings, N=100_000_000, Pubuntu, seconds:
>   D 1:  25.7  LDC
>   C++:   6.87
>   D 1:   2.67 LDC + scope
>   Java:  1.49
> 
> Bye,
> bearophile

Sorry for stepping in...

What does this result mean? Does it mean D is slower than Java, and that C++ is also slower than Java? Or is that only true under certain circumstances?
I am really confused and would really appreciate any further explanation.


Regards,
Sam
June 04, 2009
Sam Hu wrote:
> bearophile Wrote:
> 
>> I have tried the new JavaVM on Win, which optionally performs escape analysis, and the results are nice:
>>
>> Timings, N=100_000_000, Windows, seconds:
>>   D 1:  40.20 DMD
>>   D 2:  21.83 DMD
>>   D 2:  18.80 DMD, struct + scope
>>   C++:  18.06
>>   D 1:   8.47 DMD
>>   D 2:   7.41 DMD + scope
>>   Java:  1.84 V.1.6.0_14, -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC
>>   Java:  1.78 -server
>>   Java:  1.44
>>   Java:  1.38 V.1.6.0_14
>>   Java:  0.28 V.1.6.0_14, -server -XX:+DoEscapeAnalysis
>>
>> Timings, N=100_000_000, Pubuntu, seconds:
>>   D 1:  25.7  LDC
>>   C++:   6.87
>>   D 1:   2.67 LDC + scope
>>   Java:  1.49
>>
>> Bye,
>> bearophile
> 
> Sorry for stepping in...
> 
> What does this result mean? Does it mean D is slower than Java, and that C++ is also slower than Java? Or is that only true under certain circumstances?
> I am really confused and would really appreciate any further explanation.
> 
> 
> Regards,
> Sam

It suggests that for dynamic allocation of many small objects via "new", Java is an order of magnitude faster than C++, which in turn is slightly faster than D.
June 04, 2009
Sam Hu wrote:
> What does this result mean? Does it mean D is slower than Java, and that C++ is also slower than Java? Or is that only true under certain circumstances? I am really confused and would really appreciate any further explanation.

These are the timings for using dynamic memory allocation:
>>   D 1:  40.20 DMD
>>   D 2:  21.83 DMD
>>   C++:  18.06
>>   Java:  1.38 V.1.6.0_14

Java is the fastest by a large margin because it has the benefit of a moving garbage collector.  This means allocation is a simple pointer bump and deallocation is completely free.  The slow aspects of this garbage collector (detection and preservation) aren't really tested by this benchmark.
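As a rough illustration of that fast path, here is a toy bump-pointer allocator in C++. This is an invented sketch (`BumpAllocator` is not from HotSpot or from the benchmark) and it omits everything a real moving collector does: object headers, GC triggering, and the compaction that makes freeing the nursery a single pointer reset.

```cpp
#include <cstddef>
#include <cstdint>

// Toy sketch of bump-pointer allocation: in a moving GC, allocating is
// little more than incrementing a pointer into a contiguous region.
struct BumpAllocator {
    uint8_t* base;
    size_t   offset;
    size_t   capacity;

    void* alloc(size_t n) {
        n = (n + 7) & ~size_t(7);   // round up to 8-byte alignment
        if (offset + n > capacity)
            return nullptr;          // out of space; a real VM would GC here
        void* p = base + offset;
        offset += n;                 // the entire allocation "algorithm"
        return p;
    }
};
```

Compare this with a general-purpose malloc-style allocator, which must search free lists and handle fragmentation on every call.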

These are the timings without dynamic memory allocation:
>>   D 1:   8.47 DMD [+ scope]
>>   D 2:   7.41 DMD + scope
>>   Java:  0.28 V.1.6.0_14, -server -XX:+DoEscapeAnalysis

D's performance is unexpectedly bad, so much so that I suspect it might be using dynamic memory allocation anyway despite the 'scope' keyword. Java is clever in that it eliminates unnecessary dynamic memory allocations automatically.  C++ is notably absent, but I fully expect it to outperform Java by a significant margin.


-- 
Rainer Deyke - rainerd@eldwood.com
June 04, 2009
> D's performance is unexpectedly bad, so much so that I suspect it might be using dynamic memory allocation anyway despite the 'scope' keyword.

I am sorry to hear that, really, really sorry.
June 04, 2009
Rainer Deyke:

>The slow aspects of this garbage collector (detection and preservation) aren't really tested by this benchmark.<

In practice, real-world Java programs usually show good enough performance even when detection and preservation are taken into account.


>These are the timings without dynamic memory allocation:
>>>   D 1:   8.47 DMD [+ scope]
>>>   D 2:   7.41 DMD + scope
>>>   Java:  0.28 V.1.6.0_14, -server -XX:+DoEscapeAnalysis

It's not exactly the same, because in that Java code I have used a program-wide optimization flag (which I guess will become the default), while in D I have had to add a "scope" everywhere, and I think adding "scope" is less safe than letting the compiler perform escape analysis.

So I am tempted to put the 0.28 seconds result among the dynamic allocation timings, even if technically it is not, because to the programmer the program feels, looks, and acts like dynamic allocation, it's just faster :-) In the end what counts is how well the program runs after the compiler has done its work.


>D's performance is unexpectedly bad, so much so that I suspect it might be using dynamic memory allocation anyway despite the 'scope' keyword. Java is clever in that it eliminates unnecessary dynamic memory allocations automatically.<

I think Java here is doing a bit more than just removing the dynamic allocation. I don't think D (compiled with LDC) is doing any allocation here. I'll ask on the LDC IRC channel.
I'll also take a look at the asm generated by the JavaVM (it's not easy to get at the asm generated by the JVM; you need to install a debug build of it... how stupid).

-------------------------

Sam Hu:

>I am sorry to hear that, really, really sorry.<

Wait, things may not be that bad. And even if they are bad, the developers of the LDC compiler may find ways to improve the situation.

-------------------------

Robert Fraser:

>It suggests that for dynamic allocation of many small objects via "new", Java is an order of magnitude faster than C++, which in turn is slightly faster than D.<

Yes, in such tiny benchmarks I have seen 10-12× higher allocation performance in Java compared to D1-DMD several times. But real programs don't spend all their time allocating and freeing memory...

Bye,
bearophile
June 04, 2009
bearophile wrote:
> Rainer Deyke:
>> D's performance is unexpectedly bad, so much so that I suspect it might be using dynamic memory allocation anyway despite the 'scope' keyword. Java is clever in that it eliminates unnecessary dynamic memory allocations automatically.<
> 
> I think Java here is doing a bit more than just removing the dynamic allocation. I don't think D (compiled with LDC) is doing any allocation here. I'll ask on the LDC IRC channel.

LDC actually still does a dynamic allocation there because it doesn't eliminate dynamic allocations in loops.
This is unfortunate, but I haven't yet had the time to figure out how to get the optimization passes to prove the allocation can't be live when reached again. (If multiple instances of memory allocated at the same allocation site may be reachable at the same time, it's not safe to use a stack allocation instead of a heap allocation)

It's on my to-do list, though.
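The distinction described here (one live instance per iteration vs. many instances live at once) can be sketched in C++ terms. `Item`, `sum_local`, and `collect` are invented names for illustration, not code from the benchmark, and the comments describe what an optimizer would have to prove, not what any particular compiler does.

```cpp
#include <vector>

struct Item { int value; };

// Each Item is dead before the next iteration allocates a new one, so a
// compiler (or Java's escape analysis) may reuse a single stack slot for
// every iteration: the allocation site is never "live when reached again".
int sum_local(int iters) {
    int sum = 0;
    for (int i = 0; i < iters; ++i) {
        Item item{i};               // never escapes its iteration
        sum += item.value;
    }
    return sum;
}

// Here every Item allocated at the same site is still reachable after the
// loop, so all the instances are live at once and need distinct heap
// storage; replacing this site with one stack slot would be unsafe.
std::vector<Item*> collect(int iters) {
    std::vector<Item*> out;
    for (int i = 0; i < iters; ++i)
        out.push_back(new Item{i}); // escapes into 'out'
    return out;
}
```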
June 04, 2009
Frits van Bommel:
> LDC actually still does a dynamic allocation there because it doesn't eliminate dynamic allocations in loops.

I have compiled the loop in foo() with LDC:

class AllocationItem {
    int value;
    this(int v) { this.value = v; }
}
int foo(int iters) {
    int sum = 0;
    for (int i = 0; i < iters; ++i) {
        scope auto item = new AllocationItem(i);
        sum += item.value;
    }
    return sum;
}

The asm of the core of the loop:

.LBB2_2:
    movl    $_D11gc_test2b_d14AllocationItem6__vtblZ, 8(%esp)
    movl    $0, 12(%esp)
    movl    %edi, 16(%esp)
    movl    %ebx, (%esp)
    call    _d_callfinalizer
    incl    %edi
    cmpl    %esi, %edi
    jne .LBB2_2

I can see a call to the finalizer, but not the allocation?


> This is unfortunate, but I haven't yet had the time to figure out how to get the optimization passes to prove the allocation can't be live when reached again. (If multiple instances of memory allocated at the same allocation site may be reachable at the same time, it's not safe to use a stack allocation instead of a heap allocation)

The new JavaVM with the option I have shown is clearly able to do such things.
Can't you take a look at the source code of the JavaVM? :-)
There's a huge amount of NIH in open source :-)

Bye,
bearophile
June 04, 2009
bearophile wrote:
> Frits van Bommel:
>> LDC actually still does a dynamic allocation there because it doesn't eliminate dynamic allocations in loops.
> 
> I have compiled the loop in foo() with LDC:
> 
[snip]
>         scope auto item = new AllocationItem(i);
[snip]
> 
> The asm of the core of the loop:
> 
> .LBB2_2:
>     movl    $_D11gc_test2b_d14AllocationItem6__vtblZ, 8(%esp)
>     movl    $0, 12(%esp)
>     movl    %edi, 16(%esp)
>     movl    %ebx, (%esp)
>     call    _d_callfinalizer
>     incl    %edi
>     cmpl    %esi, %edi
>     jne .LBB2_2
> 
> I can see a call to finalizer, but not the allocation?

Sorry, I thought we were talking about the code without 'scope'. Of course the class is indeed stack-allocated if you use scope.

(The following:

>> This is unfortunate, but I haven't yet had the time to figure out how to get the optimization passes to prove the allocation can't be live when reached again. (If multiple instances of memory allocated at the same allocation site may be reachable at the same time, it's not safe to use a stack allocation instead of a heap allocation)

only applies when 'scope' was not used, and the compiler therefore initially heap-allocated it)

> The new JavaVM with the option I have shown is clearly able to do such things.
> Can't you take a look at the source code of the JavaVM? :-)
> There's a huge amount of NIH in the open source :-)

I suspect the Java VM uses a different internal representation of the code than LLVM does...