Message passing between threads: Java 4 times faster than D
Feb 09, 2012
Nicolae Mihalache
February 09, 2012
Hello,

I'm a complete newbie in D and am trying to compare it with Java. I implemented a simple test for measuring the throughput of message passing between threads. I see that Java can pass about 4 million messages/sec while D only achieves 1 million/sec. I thought that D should be faster.

The messages are simply integers (which are converted to Integer in Java).

The two programs are attached. I tried compiling the D version with both dmd and gdc and various optimization flags.

mache


February 09, 2012
On 09.02.2012 at 10:06, Nicolae Mihalache <xpromache@gmail.com> wrote:

> Hello,
>
> I'm a complete newbie in D and trying to compare with Java. I
> implemented  a simple test for measuring the throughput in message
> passing between threads. I see that Java can pass about 4mil
> messages/sec while D only achieves 1mil/sec. I thought that D should
> be faster.
>
> The messages are simply integers (which are converted to Integer in Java).
>
> The two programs are attached. I tried compiling the D version with
> both dmd and gdc and various optimization flags.
>
> mache

I cannot give you an explanation; I just want to point out that a message in std.concurrency also uses a wrapper (a 'Variant') plus a type field (standard, priority, linkDead). So you effectively have no optimization for int, but the same situation as in Java.
The second thing I notice is that std.concurrency uses a doubly linked list implementation, while you use an array in the Java version, which results in no additional node allocations.
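
To make the array-versus-linked-list point concrete, here is a minimal Java sketch of this kind of throughput test. It is not the attached program: the class name `ThroughputSketch`, the queue capacity and the message count are all assumptions made for illustration. A producer thread sends the integers 0..n-1 and the calling thread sums them.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical throughput sketch; not the program attached to the original post.
class ThroughputSketch {
    // Sends 0..n-1 from a producer thread to the calling thread; returns the sum received.
    static long run(BlockingQueue<Integer> queue, int n) {
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) {
                    queue.put(i); // the int is boxed to an Integer here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        long sum = 0;
        try {
            for (int i = 0; i < n; i++) {
                sum += queue.take();
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while receiving", e);
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 1_000_000; // assumed message count
        long start = System.nanoTime();
        long sum = run(new ArrayBlockingQueue<>(1024), n); // assumed capacity
        long msec = Math.max(1, (System.nanoTime() - start) / 1_000_000);
        System.out.println("received " + n + " messages in " + msec
                + " msec sum=" + sum + " speed=" + (1000L * n / msec) + " msg/sec");
    }
}
```

An `ArrayBlockingQueue` stores elements in a preallocated array, so the only per-message allocation is the boxed `Integer`. Passing a `java.util.concurrent.LinkedBlockingQueue` to `run` instead adds one node allocation per message, which is closer to the linked-list situation described above.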
February 09, 2012
"Nicolae Mihalache" <xpromache@gmail.com> wrote:
> Hello,
>
> I'm a complete newbie in D and trying to compare with Java. I implemented  a simple test for measuring the throughput in message passing between threads. I see that Java can pass about 4mil messages/sec while D only achieves 1mil/sec. I thought that D should be faster.
>
> The messages are simply integers (which are converted to Integer in Java).
>
> The two programs are attached. I tried compiling the D version with both dmd and gdc and various optimization flags.
>
> mache

Hi, I downloaded your two programs. I didn't run them, but I noticed that in 'mp.d' you have n set to 100_000_000, while in 'ThroughputMpTest.java' n is set to 10_000_000, so with this the D code is 10/4 = 2.5 times faster :)


February 09, 2012
Sorry, my mistake. It's strange to have different 'n', but you measure speed as 1000*n/time, so it doesn't matter if n is 10 times bigger.
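
A small sketch of why n drops out of the comparison, assuming time is measured in milliseconds. The class name `MessageRate` and the elapsed times below are made up for illustration; they are not measurements from either program.

```java
// Hypothetical helper illustrating the 1000*n/time rate formula.
class MessageRate {
    // Messages per second from a message count and an elapsed time in milliseconds.
    static long perSecond(long n, long elapsedMsec) {
        return 1000L * n / elapsedMsec;
    }

    public static void main(String[] args) {
        // Ten times the messages in ten times the time gives the same rate.
        System.out.println(perSecond(10_000_000L, 2_500L));   // prints 4000000
        System.out.println(perSecond(100_000_000L, 25_000L)); // prints 4000000
    }
}
```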


February 09, 2012
That would be funny, but it's not true. I tested with different values; that's why I ended up uploading different versions.

The programs print the computed message rate, which takes the number of messages into account.

mache





On Thu, Feb 9, 2012 at 11:57 AM, Alex_Dovhal <alex_dovhal@yahoo.com> wrote:
> Hi, I downloaded your two programs. I didn't run them, but I noticed that in 'mp.d' you have n set to 100_000_000, while in 'ThroughputMpTest.java' n is set to 10_000_000, so with this the D code is 10/4 = 2.5 times faster :)
February 09, 2012
Generally, D's message passing is implemented in a quite easy-to-use
way, but it is far from being fast.
I dislike the Variant structure, because it adds a huge overhead. I'd
rather have a templated message passing system with a type-safe message
queue, so no Variant is necessary.
In specific cases, messages can be polymorphic objects. This will be
way faster than Variant.

On Thu, Feb 9, 2012 at 3:12 PM, Alex_Dovhal <alex_dovhal@yahoo.com> wrote:
> Sorry, my mistake. It's strange to have different 'n', but you measure speed as 1000*n/time, so it doesn't matter if n is 10 times bigger.
>
>



-- 
Bye,
Gor Gyolchanyan.
February 09, 2012
So a queue per message type?  How would ordering be preserved? Also, how would this work for interprocess messaging?  An array-based queue is an option however (though it would mean memmoves on receive), as are free-lists for nodes, etc.  I guess the easiest thing there would be a lock-free shared slist for the node free-list, though I couldn't weigh the chance of cache misses from using old memory blocks vs. just expecting the allocator to be fast.
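
A rough sketch of the node free-list idea, written in Java only for illustration. It deliberately uses a plain lock instead of the lock-free slist mentioned above, and the class name `RecyclingQueue` and its `allocations` counter are hypothetical. Nodes removed on receive are pushed onto a free-list and reused on the next send, so a steady send/receive pattern stops allocating.

```java
import java.util.NoSuchElementException;

// Hypothetical sketch: a linked FIFO queue that recycles its nodes through a free-list.
// Uses a plain lock, not a lock-free list.
class RecyclingQueue {
    private static final class Node {
        int value;
        Node next;
    }

    private Node head;       // oldest queued node, or null when empty
    private Node tail;       // newest queued node, or null when empty
    private Node freeList;   // stack of nodes available for reuse
    private int allocations; // number of nodes ever created, for illustration

    synchronized void put(int value) {
        Node node = freeList;
        if (node != null) {
            freeList = node.next; // reuse a recycled node
        } else {
            node = new Node();    // allocate only when the free-list is empty
            allocations++;
        }
        node.value = value;
        node.next = null;
        if (tail == null) {
            head = node;
        } else {
            tail.next = node;
        }
        tail = node;
    }

    synchronized int take() {
        Node node = head;
        if (node == null) {
            throw new NoSuchElementException("queue is empty");
        }
        head = node.next;
        if (head == null) {
            tail = null;
        }
        int value = node.value;
        node.next = freeList; // push the node onto the free-list
        freeList = node;
        return value;
    }

    synchronized int allocations() {
        return allocations;
    }
}
```

Whether reusing old nodes beats a fast allocator depends on the cache behaviour raised above; this sketch only shows the bookkeeping, not a measurement.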

On Feb 9, 2012, at 6:10 AM, Gor Gyolchanyan <gor.f.gyolchanyan@gmail.com> wrote:

> Generally, D's message passing is implemented in a quite easy-to-use
> way, but it is far from being fast.
> I dislike the Variant structure, because it adds a huge overhead. I'd
> rather have a templated message passing system with a type-safe message
> queue, so no Variant is necessary.
> In specific cases, messages can be polymorphic objects. This will be
> way faster than Variant.
February 09, 2012
I wonder how much it helps to just optimize the GC a little.  How much does the performance gap close when you use DMD 2.058 beta instead of 2.057?  This upcoming release has several new garbage collector optimizations.  If the GC is the bottleneck, then it's not surprising that anything that relies heavily on it is slow because D's GC is still fairly naive.

On Thursday, 9 February 2012 at 15:44:59 UTC, Sean Kelly wrote:
> So a queue per message type?  How would ordering be preserved? Also, how would this work for interprocess messaging?  An array-based queue is an option however (though it would mean memmoves on receive), as are free-lists for nodes, etc.  I guess the easiest thing there would be a lock-free shared slist for the node free-list, though I couldn't weigh the chance of cache misses from using old memory blocks vs. just expecting the allocator to be fast.


February 09, 2012
On Thu, Feb 9, 2012 at 9:22 AM, dsimcha <dsimcha@yahoo.com> wrote:

> I wonder how much it helps to just optimize the GC a little.  How much does the performance gap close when you use DMD 2.058 beta instead of 2.057?  This upcoming release has several new garbage collector optimizations.  If the GC is the bottleneck, then it's not surprising that anything that relies heavily on it is slow because D's GC is still fairly naive.
>
dmd 2.057:
received 100000000 messages in 192034 msec sum=4999999950000000
speed=520741 msg/sec
received 100000000 messages in 84118 msec sum=4999999950000000
speed=1188806 msg/sec
received 100000000 messages in 88274 msec sum=4999999950000000
speed=1132836 msg/sec

dmd 2.058 beta:
received 100000000 messages in 93539 msec sum=4999999950000000
speed=1069072 msg/sec
received 100000000 messages in 96422 msec sum=4999999950000000
speed=1037107 msg/sec
received 100000000 messages in 203961 msec sum=4999999950000000
speed=490289 msg/sec

Both versions would inexplicably run at approximately half the speed sometimes. I have no idea what is up with that. I have no Java development environment to test for comparison. This machine has 4 cores and is running Windows.

Regards,
Brad Anderson


February 09, 2012
On 2/9/12 6:10 AM, Gor Gyolchanyan wrote:
> Generally, D's message passing is implemented in a quite easy-to-use
> way, but it is far from being fast.
> I dislike the Variant structure, because it adds a huge overhead. I'd
> rather have a templated message passing system with a type-safe message
> queue, so no Variant is necessary.
> In specific cases, messages can be polymorphic objects. This will be
> way faster than Variant.

cc Sean Kelly

I haven't looked at the implementation, but one possible liability is that large messages don't fit in a Variant and must use dynamic allocation under the hood. There are a number of ways to avoid that, such as parallel arrays (one array per type for data and one for the additional tags).

We must make sure the message passing subsystem does not use any memory allocation in the quiescent state. If we're doing one allocation per message passed, that might explain the 4x performance difference (I have no trouble believing Java's allocator is that much faster than D's).
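
As a rough illustration of the parallel-array idea, here is a Java sketch of a fixed-capacity mailbox that allocates only at construction time. Everything in it is an assumption for illustration: the class name `ParallelArrayMailbox`, the `Handler` interface, and the simplification to a single payload type (`int`) instead of one array per type. The tag names are borrowed from the message types mentioned earlier in the thread.

```java
// Hypothetical sketch: a bounded mailbox backed by two parallel primitive arrays,
// one for tags and one for int payloads. No allocation happens after construction.
class ParallelArrayMailbox {
    // Tag values named after the message types mentioned in the thread.
    static final byte STANDARD = 0, PRIORITY = 1, LINK_DEAD = 2;

    // Receives one message without wrapping it in an object.
    interface Handler {
        void handle(byte tag, int payload);
    }

    private final byte[] tags;    // tag of the message in each slot
    private final int[] payloads; // payload of the message in each slot
    private int head;             // index of the oldest message
    private int size;             // number of queued messages

    ParallelArrayMailbox(int capacity) {
        tags = new byte[capacity];
        payloads = new int[capacity];
    }

    // Returns false instead of growing when the mailbox is full.
    synchronized boolean offer(byte tag, int payload) {
        if (size == tags.length) {
            return false;
        }
        int slot = (head + size) % tags.length;
        tags[slot] = tag;
        payloads[slot] = payload;
        size++;
        return true;
    }

    // Delivers the oldest message to the handler; returns false if the mailbox is empty.
    boolean poll(Handler handler) {
        byte tag;
        int payload;
        synchronized (this) {
            if (size == 0) {
                return false;
            }
            tag = tags[head];
            payload = payloads[head];
            head = (head + 1) % tags.length;
            size--;
        }
        handler.handle(tag, payload); // called outside the lock
        return true;
    }
}
```

A bounded mailbox trades the unbounded queue for a fixed memory footprint; a real design would also need a policy for what the sender does when `offer` returns false.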


Andrei