January 12, 2010
Robert Jacques wrote:
> On Tue, 12 Jan 2010 17:27:09 -0500, Andrei Alexandrescu <andrei at erdani.com> wrote:
>> Sean Kelly wrote:
>>> There are a bunch of things still missing from the current API that will need to be addressed at some point though, like linking threads (so one will be notified if the other exits), a Tid registery, and probably a bunch of other things I'm forgetting offhand.  These aren't necessary for a first cut, but they'll have to be added soon after.
>>
>> Yah, please do mention everything you remember. I want to describe a reasonably complete API.
>>
>> Andrei
>> _______________________________________________
>> dmd-concurrency mailing list
>> dmd-concurrency at puremagic.com
>> http://lists.puremagic.com/mailman/listinfo/dmd-concurrency
> 
> On that note, will we be able to create message boxes/queues that are separate from threads?

Good question. I had been thinking about it; in particular, that means the built-in mailbox is just a hidden global. I haven't been able to think of compelling scenarios though, and Erlang doesn't seem to need them.

Andrei

January 12, 2010
On Tue, 12 Jan 2010 20:55:09 -0500, Andrei Alexandrescu <andrei at erdani.com> wrote:
> Robert Jacques wrote:
>> On Tue, 12 Jan 2010 17:27:09 -0500, Andrei Alexandrescu <andrei at erdani.com> wrote:
>>> Sean Kelly wrote:
>>>> There are a bunch of things still missing from the current API that will need to be addressed at some point though, like linking threads (so one will be notified if the other exits), a Tid registery, and probably a bunch of other things I'm forgetting offhand.  These aren't necessary for a first cut, but they'll have to be added soon after.
>>>
>>> Yah, please do mention everything you remember. I want to describe a reasonably complete API.
>>>
>>> Andrei
>>> _______________________________________________
>>> dmd-concurrency mailing list
>>> dmd-concurrency at puremagic.com
>>> http://lists.puremagic.com/mailman/listinfo/dmd-concurrency
>>  On that note, will we be able to create message boxes/queues that are
>> separate from threads?
>
> Good question. I had been thinking about it; in particular, that means the built-in mailbox is just a hidden global. I haven't been able to think of compelling scenarios though, and Erlang doesn't seem to need them.
>
> Andrei

In think that a lot of of the need for this in Erlang is satisfied by having ultra-lightweight processes and its particular language niche. Erlang processes often have the same 'heft' as classes do in other languages. So, in terms of granularity, Erlang supports one mailbox per class, which is a lot finer than one per thread. As for use cases, one is CSP. Or one mailbox per client session on a server. Or one mailbox per GUI element. Or a master thread having one mailbox per worker. In the end, it's all about encapsulation, composition and compatibility. Yes, I can build all this functionality on top of send/receive, but then I end up with a giant muxing/de-muxing switch statement. This breaks encapsulation and makes it harder to maintain code. But worse, it prevents me from mixing two different libraries together, since they both use their own message formats, etc. With separate mailboxes, I can give each library their own communication channel and avoid those issues. Also, since the API is supposed to allow the free interchange of thread mailboxes with network mailboxes, I assume non-thread-base mailboxes would be fairly simple to include.

With regard to CSP, although there are some high-level things which can be left for after TDPL and built on top of other primitives, some things, like mailbox state (closed, last message, poisoned, blackhole) and alternatives/selects seem like very useful low level primitives which could be included efficiently.



January 13, 2010
On Tue, Jan 12, 2010 at 3:12 PM, Steve Schveighoffer <schveiguy at yahoo.com>wrote:

> The other possibility is to only allow so many messages to be queued before the queue is "full".  Unsure what you want to do at that point though, either block or return an EAGAIN type error (maybe you can do both depending on how you want to send the message).  This solution is at least more likely to show logic errors, since with the first solution you may forget to close the handle.  But it also could cause deadlocks or force you to write handling code for full queue conditions.
>
> How does erlang handle this?
>
> -Steve
>
>
Not sure what erlang does, but...

I've tried to implement request / response on top of a system that happens to use blocking queues.  Come to discover, that this pattern results in deadlock.  A 'server' actor can fill up with messages on its input queue and its output queue.  If any of the actors writing requests into the server are also blocked waiting for responses, you can get a deadlock where A is writing a response but B can't get it because as part of processing another new incoming request, it has blocked on writing another request to A.

I think blocking is kind of like reference counted memory -- you don't want to do it unless there is a way to break the cycles in the graph.  In this case that would mean a few queues that are not capable of blocking (they would be the 'ever expanding' kind), in the right places.  (I'm not sure if this is enough.  Even if it is enough in theory it might not be enough in practice as the system might grind to a halt if things expand too much.)

An idea that I think is likely to work better is to prioritize actors reading from large input queues over actors reading from small ones, or else prioritize readers from large queues over writers to them.  Depending on circularity in the design, it might be possible to prioritize "later" processes over "earlier" ones.  None of this is fool proof of course.

Suppose you have three processes, A, B, and C, that process each piece of data in that order, and C is slower than the others, and you have at least four CPUs working on the problem.

My hunch is that you will not be able to avoid an ever expanding queue between B and C, or else wasted CPU time, no matter what you do at the framework level.  If there is circularity as well, you may get an ever expanding queue, wasted CPU, or deadlock.  If there is enough circularity and/or general complexity in the overall message passing 'graph' you may not even be able to reason about what is going to happen.

You could send messages "backward" along the chain saying "please stop until I tell you otherwise".  That approach could be used to resist this problem. (NOTE: If you have these meta-messages, be sure to check for them before checking for more incoming elephants that need to be turned into hot dogs.)

--> This assumes you can tell how full your own mailbox is!

And one downside of this application-controlled volume throttling is that the system can probably no longer reason about deadlocks at all.  Since the messages going "backward" are always part of a two-way flow, there is no way for the system to automatically tell which end of the animal is the head and which is the tail.

This suggests that different policies might be available for each queue... (does anyone have experience with a system that has such configurable queues?)

My advice in general is that this kind of message passing system can't be completely encapsulated.  There are failure states, the application designer *needs* to be able to measure the system to gauge whether you are about to stumble into one, and in general, designing the message passing control and recovery behavior is part of designing your system.

And it is probably one of the next big "it's hard to get right" topics for concurrent programming.

Kevin
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.puremagic.com/pipermail/dmd-concurrency/attachments/20100113/8f53c912/attachment.htm>
January 13, 2010
On Jan 12, 2010, at 8:49 PM, Robert Jacques wrote:

> On Tue, 12 Jan 2010 20:55:09 -0500, Andrei Alexandrescu <andrei at erdani.com> wrote:
>> 
>> 
>> Good question. I had been thinking about it; in particular, that means the built-in mailbox is just a hidden global. I haven't been able to think of compelling scenarios though, and Erlang doesn't seem to need them.
> 
> In think that a lot of of the need for this in Erlang is satisfied by having ultra-lightweight processes and its particular language niche. Erlang processes often have the same 'heft' as classes do in other languages. So, in terms of granularity, Erlang supports one mailbox per class, which is a lot finer than one per thread. As for use cases, one is CSP. Or one mailbox per client session on a server. Or one mailbox per GUI element. Or a master thread having one mailbox per worker. In the end, it's all about encapsulation, composition and compatibility. Yes, I can build all this functionality on top of send/receive, but then I end up with a giant muxing/de-muxing switch statement. This breaks encapsulation and makes it harder to maintain code. But worse, it prevents me from mixing two different libraries together, since they both use their own message formats, etc. With separate mailboxes, I can give each library their own communication channel and avoid those issues. Also, since the API is supposed to allow the free interchange of thread mailboxes with network mailboxes, I assume non-thread-base mailboxes would be fairly simple to include.

All good points.  Sounds like we should at least provide the option to create ad-hoc mailboxes.  The only weird bit about multiple mailboxes per thread is that if one does a blocking wait for a message it could suspend the entire thread.  I guess this is even more reason to consider using fibers as the real return type of spawn(), though I guess GUI programs might not use spawn() at all?

> With regard to CSP, although there are some high-level things which can be left for after TDPL and built on top of other primitives, some things, like mailbox state (closed, last message, poisoned, blackhole) and alternatives/selects seem like very useful low level primitives which could be included efficiently.

I definitely want it to be possible for CSP to be layered on top of this package, so if there's something we need to add for this to happen, we will.  Linking of threads, for example, will definitely happen.  It may be that some things aren't practical to integrate directly though, and would have to be handled at the protocol level by such a package.  It's been a while since I looked at CSP in detail, so if you have any suggestions here they'd be very welcome.
January 13, 2010
On Jan 13, 2010, at 12:48 AM, Kevin Bealer wrote:
> 
> I've tried to implement request / response on top of a system that happens to use blocking queues.  Come to discover, that this pattern results in deadlock.  A 'server' actor can fill up with messages on its input queue and its output queue.  If any of the actors writing requests into the server are also blocked waiting for responses, you can get a deadlock where A is writing a response but B can't get it because as part of processing another new incoming request, it has blocked on writing another request to A.

We'll have a timeout feature that will cause a blocking receive to abort after a set interval.  This is necessary to allow other work to be done by a thread anyway.

> I think blocking is kind of like reference counted memory -- you don't want to do it unless there is a way to break the cycles in the graph.  In this case that would mean a few queues that are not capable of blocking (they would be the 'ever expanding' kind), in the right places.  (I'm not sure if this is enough.  Even if it is enough in theory it might not be enough in practice as the system might grind to a halt if things expand too much.)
> 
> An idea that I think is likely to work better is to prioritize actors reading from large input queues over actors reading from small ones, or else prioritize readers from large queues over writers to them.  Depending on circularity in the design, it might be possible to prioritize "later" processes over "earlier" ones.  None of this is fool proof of course.

Good point.  And for people who really care about deadlocks, the CSP algebra (and others) makes it possible to mathematically prove that a messaging network is deadlock-free. I'm not 100% positive that this algebra works for an asynchronous model that spans a network (CSP does mean "communicating /sequential/ processes"), but last I checked there was work being done in this area.

> Suppose you have three processes, A, B, and C, that process each piece of data in that order, and C is slower than the others, and you have at least four CPUs working on the problem.
> 
> My hunch is that you will not be able to avoid an ever expanding queue between B and C, or else wasted CPU time, no matter what you do at the framework level.  If there is circularity as well, you may get an ever expanding queue, wasted CPU, or deadlock.  If there is enough circularity and/or general complexity in the overall message passing 'graph' you may not even be able to reason about what is going to happen.
> 
> You could send messages "backward" along the chain saying "please stop until I tell you otherwise".  That approach could be used to resist this problem.  (NOTE: If you have these meta-messages, be sure to check for them before checking for more incoming elephants that need to be turned into hot dogs.)

Yeah, one option is to handle this at the protocol level.  I agree that we'll likely need some options for mailbox maintenance as well though.  It's easy enough to clear out a queue by looping on receive((Variant) {...}), but you're right that it may be nice to suspend a queue, etc, in some instances.  This sort of thing is often necessary in network server programming, so there's some justification for it in practice as well.  My primary interest now is finding a solid set of primitives that can be built upon as need dictates.  It's easy enough to add a feature later, but having to change one just stinks.

> --> This assumes you can tell how full your own mailbox is!

Hm... I guess it may be worth looking at MQueue packages for tips as well.  We're mostly building a transport mechanism here, but there obviously may be a few features we could add to simplify implementing things on top of it.  I'll need to check Erlang again as well to see how they handle this.

> And one downside of this application-controlled volume throttling is that the system can probably no longer reason about deadlocks at all.  Since the messages going "backward" are always part of a two-way flow, there is no way for the system to automatically tell which end of the animal is the head and which is the tail.
> 
> This suggests that different policies might be available for each queue... (does anyone have experience with a system that has such configurable queues?)

Not I, though MQueue would be someplace to look.  I don't want to get too fancy about queue implementation though, if we can avoid it.  One obvious issue with throwing away arbitrary messages is that it becomes impossible to reason about the behavior of the system.  But since this would be under user control, I guess that's their choice and their problem :-p

> My advice in general is that this kind of message passing system can't be completely encapsulated.  There are failure states, the application designer *needs* to be able to measure the system to gauge whether you are about to stumble into one, and in general, designing the message passing control and recovery behavior is part of designing your system.
> 
> And it is probably one of the next big "it's hard to get right" topics for concurrent programming.

You're probably right.
January 13, 2010
Le 2010-01-13 ? 10:13, Sean Kelly a ?crit :
> On Jan 12, 2010, at 8:49 PM, Robert Jacques wrote:
>> In think that a lot of of the need for this in Erlang is satisfied by having ultra-lightweight processes and its particular language niche. Erlang processes often have the same 'heft' as classes do in other languages. So, in terms of granularity, Erlang supports one mailbox per class, which is a lot finer than one per thread. As for use cases, one is CSP. Or one mailbox per client session on a server. Or one mailbox per GUI element. Or a master thread having one mailbox per worker. In the end, it's all about encapsulation, composition and compatibility. Yes, I can build all this functionality on top of send/receive, but then I end up with a giant muxing/de-muxing switch statement. This breaks encapsulation and makes it harder to maintain code. But worse, it prevents me from mixing two different libraries together, since they both use their own message formats, etc. With separate mailboxes, I can give each library their own communication channel and avoid those issues. Also!
>> , since the API is supposed to allow the free interchange of thread mailboxes with network mailboxes, I assume non-thread-base mailboxes would be fairly simple to include.
> 
> All good points.  Sounds like we should at least provide the option to create ad-hoc mailboxes.  The only weird bit about multiple mailboxes per thread is that if one does a blocking wait for a message it could suspend the entire thread.

Why is it a problem? There's already many blocking calls you can make in any thread: you could open a file, make an HTTP request over the network, read from standard input or wait for a network packet. How is all this different from waiting on a message box? Should we prevent all of this from blocking a thread?


> I guess this is even more reason to consider using fibers as the real return type of spawn(), though I guess GUI programs might not use spawn() at all?

How would that work?

I'm not sure why GUI programs would not use spawn to create threads, but surely GUI programs shouldn't use receive(), otherwise it'd block the GUI event loop.

GUI frameworks already have their own message passing system (for events), and their own event loops. Could there be a way to attach a customized "passer" function which would make the sending thread pass messages in a special way? This could allow messages to be sent to GUI threads by wrapping them in a GUI event. It'd simplify things from the worker thread point of view, which wouldn't have to mess with GUI stuff.


-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/



January 13, 2010
Michel Fortin wrote:
> Le 2010-01-13 ? 10:13, Sean Kelly a ?crit :
>> On Jan 12, 2010, at 8:49 PM, Robert Jacques wrote:
>>> In think that a lot of of the need for this in Erlang is satisfied by having ultra-lightweight processes and its particular language niche. Erlang processes often have the same 'heft' as classes do in other languages. So, in terms of granularity, Erlang supports one mailbox per class, which is a lot finer than one per thread. As for use cases, one is CSP. Or one mailbox per client session on a server. Or one mailbox per GUI element. Or a master thread having one mailbox per worker. In the end, it's all about encapsulation, composition and compatibility. Yes, I can build all this functionality on top of send/receive, but then I end up with a giant muxing/de-muxing switch statement. This breaks encapsulation and makes it harder to maintain code. But worse, it prevents me from mixing two different libraries together, since they both use their own message formats, etc. With separate mailboxes, I can give each library their own communication channel and avoid those issues. Als
o!
>>> , since the API is supposed to allow the free interchange of thread mailboxes with network mailboxes, I assume non-thread-base mailboxes would be fairly simple to include.
>> All good points.  Sounds like we should at least provide the option to create ad-hoc mailboxes.  The only weird bit about multiple mailboxes per thread is that if one does a blocking wait for a message it could suspend the entire thread.
> 
> Why is it a problem? There's already many blocking calls you can make in any thread: you could open a file, make an HTTP request over the network, read from standard input or wait for a network packet. How is all this different from waiting on a message box? Should we prevent all of this from blocking a thread?

The difference is that waiting on the wrong mailbox could block you forever.

>> I guess this is even more reason to consider using fibers as the real return type of spawn(), though I guess GUI programs might not use spawn() at all?
> 
> How would that work?
> 
> I'm not sure why GUI programs would not use spawn to create threads, but surely GUI programs shouldn't use receive(), otherwise it'd block the GUI event loop.
> 
> GUI frameworks already have their own message passing system (for events), and their own event loops. Could there be a way to attach a customized "passer" function which would make the sending thread pass messages in a special way? This could allow messages to be sent to GUI threads by wrapping them in a GUI event. It'd simplify things from the worker thread point of view, which wouldn't have to mess with GUI stuff.

A GUI might have a different thread that calls receive() in a loop, converts messages to GUI events, and PostMessage() them to the GUI.

I wouldn't want to complicate things in not-experimented ways. If Erlang didn't need "passers" in 25 years, I guess there wasn't an overwhelming need for them.


Andrei
January 13, 2010
On Wed, 13 Jan 2010 12:41:32 -0500, Andrei Alexandrescu <andrei at erdani.com> wrote:
>>> I guess this is even more reason to consider using fibers as the real return type of spawn(), though I guess GUI programs might not use spawn() at all?
>>  How would that work?
>>  I'm not sure why GUI programs would not use spawn to create threads,
>> but surely GUI programs shouldn't use receive(), otherwise it'd block
>> the GUI event loop.
>>  GUI frameworks already have their own message passing system (for
>> events), and their own event loops. Could there be a way to attach a
>> customized "passer" function which would make the sending thread pass
>> messages in a special way? This could allow messages to be sent to GUI
>> threads by wrapping them in a GUI event. It'd simplify things from the
>> worker thread point of view, which wouldn't have to mess with GUI stuff.
>
> A GUI might have a different thread that calls receive() in a loop, converts messages to GUI events, and PostMessage() them to the GUI.
>
> I wouldn't want to complicate things in not-experimented ways. If Erlang didn't need "passers" in 25 years, I guess there wasn't an overwhelming need for them.
>
>
> Andrei

Erlang uses "passers" all the time. The difference is spinning off an Erlang process is no big deal, so you don't think about it. Spinning off an OS thread, on the other hand, is a big deal.

I was also thinking about OS/windowing implementors (i.e. Plan9) as D is a systems programming language. I think being able to write a windowing system is a major use case for D's concurrency system. Yes, it probably won't happen, but it will make sure we cover all our bases.
January 13, 2010
Robert Jacques wrote:
> Erlang uses "passers" all the time. The difference is spinning off an Erlang process is no big deal, so you don't think about it. Spinning off an OS thread, on the other hand, is a big deal.

Point taken. BTW we should reserve the possibility to implement abstract threads in D as well. I wonder how Erlang achieves its good performance numbers.

At a minimum, I was thinking of a respawn() function that would work like spawn() but may revive an existing thread or create a new one.

> I was also thinking about OS/windowing implementors (i.e. Plan9) as D is a systems programming language. I think being able to write a windowing system is a major use case for D's concurrency system. Yes, it probably won't happen, but it will make sure we cover all our bases.

I agree.


Andrei
January 13, 2010
On Jan 13, 2010, at 9:07 AM, Michel Fortin wrote:

> Le 2010-01-13 ? 10:13, Sean Kelly a ?crit :
>> 
>> 
>> All good points.  Sounds like we should at least provide the option to create ad-hoc mailboxes.  The only weird bit about multiple mailboxes per thread is that if one does a blocking wait for a message it could suspend the entire thread.
> 
> Why is it a problem? There's already many blocking calls you can make in any thread: you could open a file, make an HTTP request over the network, read from standard input or wait for a network packet. How is all this different from waiting on a message box? Should we prevent all of this from blocking a thread?

Fair enough.

>> I guess this is even more reason to consider using fibers as the real return type of spawn(), though I guess GUI programs might not use spawn() at all?
> 
> How would that work?

Multiplex fibers on kernel threads and yield inside the receive call.

> I'm not sure why GUI programs would not use spawn to create threads, but surely GUI programs shouldn't use receive(), otherwise it'd block the GUI event loop.

Oops, you're right.

> GUI frameworks already have their own message passing system (for events), and their own event loops. Could there be a way to attach a customized "passer" function which would make the sending thread pass messages in a special way? This could allow messages to be sent to GUI threads by wrapping them in a GUI event. It'd simplify things from the worker thread point of view, which wouldn't have to mess with GUI stuff.

For GUI events, it's more likely that you'd have a thread waiting for those messages and then repackaging and resending them using the D model.  I've thought about trying to have some way to use a custom encoding layer, but haven't come up with a good solution yet.