Thread overview
std.concurrency, speed, etc.
Feb 04, 2011
Adam Conner-Sax
Feb 04, 2011
bearophile
Feb 05, 2011
Sean Kelly
Feb 05, 2011
Adam Conner-Sax
Feb 05, 2011
Jonathan M Davis
February 04, 2011
Attached (over this and the next post) are 3 D files and sample output (compiled with "dmd -O -release -inline Queue_Tester.d Queue_Examples.d PrettyPrint.d" on my system, OSX 10.6.6, 2x2.8GHz quad-core Xeon) from a multi-threaded queuing test-bed.

I wrote the tester as an exercise in learning D.  The language is great; perfect for me as someone who loved generics in C++ but found that all the cool things you could do got ugly and messy fast.

The tester loops over a few different sorts of queues (a locking queue using a lock in the enqueue/dequeue functions, a lock-free queue using hazard pointers, and std.concurrency message passing) and a few different test scenarios (number of threads, number of messages, message rates), and measures the statistics of the latencies (I stamp the time when the message is formed and enqueued, and again when it's dequeued).
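To make the measurement concrete, here is a minimal sketch of the stamping and the statistics step. It is not the attached tester: the names Message, latencyMicros, Stats and summarize are hypothetical, it uses core.time.MonoTime from current druntime rather than systime(), and it assumes a non-empty latency array and a population (divide by n) standard deviation.

```d
import core.time : MonoTime;
import std.algorithm : map, sum;
import std.math : sqrt;
import std.stdio : writefln;

// Hypothetical message type: the sender stamps it just before enqueueing.
struct Message
{
    int id;
    MonoTime enqueuedAt;
}

// Latency in microseconds between the enqueue stamp and the dequeue time.
double latencyMicros(in Message m, MonoTime dequeuedAt)
{
    return (dequeuedAt - m.enqueuedAt).total!"nsecs" / 1000.0;
}

struct Stats
{
    double mean;
    double sd;
}

// Mean and population standard deviation; assumes latencies is non-empty.
Stats summarize(const double[] latencies)
{
    immutable n = latencies.length;
    immutable mean = latencies.sum / n;
    immutable variance = latencies.map!(x => (x - mean) ^^ 2).sum / n;
    return Stats(mean, sqrt(variance));
}

void main()
{
    auto m = Message(1, MonoTime.currTime);
    auto latency = latencyMicros(m, MonoTime.currTime);
    auto stats = summarize([latency, 1.0, 3.0]);
    writefln("mean %s us, sd %s us", stats.mean, stats.sd);
}
```

Taking the difference in nanoseconds and dividing by 1000.0 keeps sub-microsecond resolution where the clock provides it.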

Anyway, after writing and debugging this, I'm left with some questions (note, I imagine the answers to all these may be that I coded things wrong or badly. I welcome that answer as long as it comes with a hint for how to do better!)

1) I couldn't get the synchronized class version (as opposed to using synchronized statements in the functions) to run.  It would hang in odd ways.  This may be related to a bug I reported earlier (and Sean was helpful enough to fix!) so this may be moot.

2) Message passing is slow in my tests.  Often an order of magnitude or more slower than the fastest (lock-free queue).  I expect to pay some price for the convenience, etc. but that seems excessive.

3) I've built and run on Windows and Linux.  The Windows version works fine but the Linux version seems to have some issue with the timestamping (using systime()): often the latencies come through as 0 (and I'm using toMicroseconds!double() so I should see any ticks at all).  Is there a known Linux bug or issue with systime()?

4) I still don't totally understand shared.  It does what I expect when variables are static.  That's why all the queues are static objects.  But that doesn't scale so well (I know I could set up static factories for static objects but that seems like it shouldn't be necessary).  When I put an unshared variable in a non-static class and then use the class from multiple threads, the variable acts shared.  Is that a bug or a feature?

Thanks for any and all thoughts.

Adam
February 04, 2011
Adam Conner-Sax:

> I wrote the tester as an exercise in learning D.  The language is great; perfect for me as someone who loved generics in C++ but found that all the cool things you could do got ugly and messy fast.

A few notes on the form of your code:
- I suggest using all-lowercase module names

Some alternative ways to write some of your code:

auto sum = reduce!("a+b")(0.0,latency_data);
==>
auto sum = reduce!q{a + b}(0.0, latencyData);


auto sd = reduce!(f)(0.0,latency_data);
==>
auto sd = reduce!f(0.0, latencyData);

  Test_Parameters[] tests;
  tests ~= Test_Parameters(1,10,0,0);
  tests ~= Test_Parameters(4,10,0,0);
...
==>
  auto tests = [TestParameters(1,10,0,0),
                TestParameters(4,10,0,0), ...

immutable int[] widths = [11,10,10,10,10,10,10,10];
==>
enum int[] widths = [11, 10, 10, 10, 10, 10, 10, 10];


debug (5) { printf("Rec'd: (pkt %i) %.*s\n",received,QT.package_tostring(p)); }
==>
debug(5) printf("Rec'd: (pkt %i) %.*s\n", received, QT.packageToString(p));

Bye,
bearophile
February 05, 2011
Adam Conner-Sax Wrote:
> 
> 1) I couldn't get the synchronized class version (as opposed to using
> synchronized statements in the functions) to run.  It would hang in odd ways.
>  This may be related to a bug I reported earlier (and Sean was helpful enough
> to fix!) so this may be moot.

'synchronized' as a class label may not be implemented in the compiler yet.  I'd stick to explicitly labeling methods as 'synchronized' for now.

> 2) Message passing is slow in my tests.  Often an order of magnitude or more slower than the fastest (lock-free queue).  I expect to pay some price for the convenience, etc. but that seems excessive.

The limiting factor at this point is the cost of copying the Message struct around during processing.  I've eliminated nearly all copies by passing by ref internally, but I believe an unnecessary copy or two may still remain.  I'll see about tuning this further.  Tuning the ctor and copy ops in Variant and Tuple would help as well, since nearly all the time spent is in those routines.  For what it's worth, it's fairly easy to time this by building with -profile and having the main thread send messages to itself (since -profile doesn't yet work in multithreaded apps).
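The self-send timing setup described above might look roughly like the sketch below. The function name roundTrip and the message count are hypothetical, and the code assumes a current Phobos. Build it with "dmd -profile" and look at the generated trace.log to see where the time goes.

```d
import std.concurrency : receive, send, thisTid;
import std.stdio : writeln;

// Sends `count` ints from the calling thread to its own mailbox and
// receives each one back immediately, so everything stays on one thread
// and the -profile instrumentation can see it.
int roundTrip(int count)
{
    int received = 0;
    foreach (i; 0 .. count)
    {
        send(thisTid, i);
        receive((int value) { received++; });
    }
    return received;
}

void main()
{
    // Hypothetical message count; large enough to dominate startup cost.
    writeln(roundTrip(100_000));
}
```

Because sender and receiver are the same thread, this measures only the packing, copying and matching work, not any cross-thread wakeup.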

> 4) I still don't totally understand shared.  It does what I expect when variables are static.  That's why all the queues are static objects.  But that doesn't scale so well (I know I could set up static factories for static objects but that seems like it shouldn't be necessary).  When I put an unshared variable in a non-static class and then use the class from multiple threads, the variable acts shared.  Is that a bug or a feature?

Maybe you're just lucky?  It's hard to reason about behavior without an example.
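For reference, a small sketch of one possible explanation, under the assumption that the class instance was reached from the other threads through an ordinary reference: thread-local storage applies to module-level and static variables, while instance fields live inside the object on the heap, so every thread holding the reference sees the same field. The names tlsCounter, Box, Observed and runWorker are hypothetical, and core.thread.Thread is used because it does not check for unshared data the way std.concurrency.spawn does.

```d
import core.thread : Thread;
import std.stdio : writeln;

int tlsCounter; // module-level: each thread gets its own copy

class Box
{
    int value; // instance field: stored in the heap object itself
}

struct Observed
{
    int boxValue;
    int tlsValue;
}

Observed runWorker()
{
    auto box = new Box;
    auto worker = new Thread({
        box.value = 7;   // writes to the single heap object
        tlsCounter = 42; // writes to the worker thread's own copy
    });
    worker.start();
    worker.join();
    // Read both back from the calling thread.
    return Observed(box.value, tlsCounter);
}

void main()
{
    auto r = runWorker();
    writeln(r.boxValue, " ", r.tlsValue); // prints "7 0"
}
```

Nothing here makes the unsynchronized field access safe in general; the sketch only shows where the data lives.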
February 05, 2011
== Quote from Sean Kelly (sean@invisibleduck.org)'s article
> Adam Conner-Sax Wrote:
> >
> > 1) I couldn't get the synchronized class version (as opposed to using
> > synchronized statements in the functions) to run.  It would hang in odd ways.
> >  This may be related to a bug I reported earlier (and Sean was helpful enough
> > to fix!) so this may be moot.
> 'synchronized' as a class label may not be implemented in the compiler yet.  I'd stick to explicitly labeling methods as 'synchronized' for now.

I couldn't get that to work either.  What does work is a "synchronized" block of code.  That seems potentially more efficient also.
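For readers following along, this is roughly what the "synchronized block" form looks like. It is a minimal sketch, not the attached Queue_Examples.d: LockingQueue and tryDequeue are hypothetical names, and the queue is only exercised from one thread here.

```d
import std.stdio : writeln;

// A plain class whose methods guard the shared state with a
// synchronized statement on the object's own monitor.
class LockingQueue(T)
{
    private T[] items;

    void enqueue(T item)
    {
        synchronized (this)
        {
            items ~= item;
        }
    }

    // Returns false when the queue is empty instead of blocking.
    bool tryDequeue(out T item)
    {
        synchronized (this)
        {
            if (items.length == 0)
                return false;
            item = items[0];
            items = items[1 .. $];
            return true;
        }
    }
}

void main()
{
    auto q = new LockingQueue!int;
    q.enqueue(1);
    q.enqueue(2);
    int value;
    while (q.tryDequeue(value))
        writeln(value);
}
```

A block also lets the lock cover only the critical section rather than the whole method body, which is where any efficiency gain would come from.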

> > 2) Message passing is slow in my tests.  Often an order of magnitude or more slower than the fastest (lock-free queue).  I expect to pay some price for the convenience, etc. but that seems excessive.
> The limiting factor at this point is the cost of copying the Message struct around during processing.  I've eliminated nearly all copies by passing by ref internally, but I believe an unnecessary copy or two may still remain.  I'll see about tuning this further.  Tuning the ctor and copy ops in Variant and Tuple would help as well, since nearly all the time spent is in those routines.  For what it's worth, it's fairly easy to time this by building with -profile and having the main thread send messages to itself (since -profile doesn't yet work in multithreaded apps).

Right.  I've run into the multithreaded profiling issue.  What you're describing makes sense: message passing has a much higher minimum time (7-8 us) than any of the others (1-2 us).  That could be copying.  I had thought it was some sort of wakeup to the receiver.  The other methods just have a while loop waiting on new data rather than the blocking "receive", so I imagined there was some cost to waking up the receive thread.

> > 4) I still don't totally understand shared.  It does what I expect when variables are static.  That's why all the queues are static objects.  But that doesn't scale so well (I know I could set up static factories for static objects but that seems like it shouldn't be necessary).  When I put an unshared variable in a non-static class and then use the class from multiple threads, the variable acts shared.  Is that a bug or a feature?
> Maybe you're just lucky?  It's hard to reason about behavior without an example.

Maybe.  I'd rather it not work this way (sharing even though not marked shared). Then I could put the queues into non-static structures and get the shared and TLS the way I expect.  That would make using them from a spawned function a bit trickier but I think that could be handled.

Thanks for the thoughts.

Adam
February 05, 2011
On Friday 04 February 2011 16:09:08 Sean Kelly wrote:
> Adam Conner-Sax Wrote:
> > 1) I couldn't get the synchronized class version (as opposed to using
> > synchronized statements in the functions) to run.  It would hang in odd
> > ways.  This may be related to a bug I reported earlier (and Sean was
> > helpful enough to fix!) so this may be moot.
> 
> 'synchronized' as a class label may not be implemented in the compiler yet.
> I'd stick to explicitly labeling methods as 'synchronized' for now.

IIRC, according to TDPL, it's supposed to be the whole class or none of it, not a per-function thing. So, if that's not how it works at the moment, then it's another of the things that hasn't been fixed to match TDPL yet.

- Jonathan M Davis