Thread overview
Getting started with threads in D
Jun 17, 2012
Jonathan M Davis
Jun 17, 2012
Russel Winder
Jun 22, 2012
Sean Kelly
June 17, 2012
Hi again!

I have looked around a little at what D offers, but I don't really know what I should use, since D offers several ways to use threads, some more high-level than others. I also don't really know which one would be suitable for me.

A little background could help. I am a game developer and during my semester I want to experiment with making games in D. I use threads to separate some tasks that can easily work in parallel with each other. The most common being a Logic/Graphics separation. But as development progresses I usually add more threads like inside graphics I can end up with 2 or 3 more threads.

I want to avoid Amdahl's law as much as possible and keep the synchronization nodes as small as possible. The data exchange should be as basic as possible but still leave room for improvements and future additions.

The concurrency library looked very promising, but it felt like the synchronization wouldn't be that nice, although it would provide random access to the data in your code. Correct me, of course, if I am wrong. Is there a good thread pool system that could be used? Does that system also handle solving dependencies in the workflow? This is more or less what we use at my work.

In the worst case I will just use the basic thread class and implement my own system on top of it. Then there is the question: are there any pitfalls in the current library that I should be aware of?
June 17, 2012
On Sunday, June 17, 2012 03:15:44 Henrik Valter Vogelius Hansson wrote:
> Hi again!
> 
> I have looked around a little with what D offers but don't know really what I should use since D offers several ways to use threads. Some more high level than others. Don't really also know which one would be suitable for me.
> 
> A little background could help. I am a game developer and during my semester I want to experiment with making games in D. I use threads to separate some tasks that can easily work in parallel with each other. The most common being a Logic/Graphics separation. But as development progresses I usually add more threads like inside graphics I can end up with 2 or 3 more threads.
> 
> I want to avoid Amdahl's law as much as possible and have as small synchronization nodes. The data exchange should be as basic as possible but still have room for improvements and future additions.
> 
> The Concurrency library looked very promising but felt like the synchronization wouldn't be that nice but it would provide a random-access to the data in your code. Correct me of course if I am wrong. Is there a good thread pool system that could be used? Does that system also handle solving dependencies in the work-flow? This is what we use at my work more or less.
> 
> In worst case scenario I will just use the basic thread class and implement my own system above that. Then there is the question, is there any pitfalls in the current library that I should be aware of?

For starters, read this:

http://www.informit.com/articles/article.aspx?p=1609144

And look at these modules in the standard library:

http://dlang.org/phobos/std_concurrency.html
http://dlang.org/phobos/std_parallelism.html

- Jonathan M Davis
June 17, 2012
On Sun, 2012-06-17 at 03:15 +0200, Henrik Valter Vogelius Hansson wrote:
> Hi again!
> 
> I have looked around a little with what D offers but don't know really what I should use since D offers several ways to use threads. Some more high level than others. Don't really also know which one would be suitable for me.

My take on this is that as soon as an applications programmer talks about using threads in their program, they have admitted they are working at the wrong level.  Applications programmers do not manage their control stacks, applications programmers do not manage their heaps, why on earth manage your threads. Threads are an implementation resource best managed by an abstraction.

Using processes and message passing (over a thread pool, as you are heading towards in comments below) has proven over the last 30+ years to be the only scalable way of managing parallelism, so use it as a concurrency technique as well and get parallelism as near as for free as it is possible to get.

Ancient models and techniques such as actors, dataflow, CSP, data parallelism are making a resurgence exactly because explicit shared memory multi-threading is an inappropriate technique. It has just taken the world 15+ years to appreciate this.

> A little background could help. I am a game developer and during my semester I want to experiment with making games in D. I use threads to separate some tasks that can easily work in parallel with each other. The most common being a Logic/Graphics separation. But as development progresses I usually add more threads like inside graphics I can end up with 2 or 3 more threads.

I can only repeat the above: don't think in terms of threads and shared memory, think in terms of processes and messages passed between them.

> I want to avoid Amdahl's law as much as possible and have as small synchronization nodes. The data exchange should be as basic as possible but still have room for improvements and future additions.

Isn't the current hypothesis that you can't avoid Amdahl's Law? If what you mean is that you want to ensure you have an embarrassingly parallel solution so that speed up is linear that seems entirely reasonable, but then D has a play in this game with the std.parallelism module.  It uses the term "task" rather than process or thread to try and enforce an algorithm-focused view. cf. http://dlang.org/phobos/std_parallelism.html
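
As a rough sketch of the embarrassingly parallel case with std.parallelism, a parallel foreach over independent elements might look like the following. The function name `updateAll` and the position/velocity data are made up for illustration; only `parallel` comes from the library.

```d
import std.parallelism : parallel;
import std.stdio : writeln;

// Hypothetical per-entity update. Each element is independent of the others,
// so the loop body can run on any worker thread without locking.
void updateAll(float[] positions, const(float)[] velocities, float dt)
{
    // parallel() hands chunks of the array to the default task pool and
    // returns once every element has been processed.
    foreach (i, ref p; parallel(positions))
        p += velocities[i] * dt;
}

void main()
{
    auto positions = new float[](1000);
    positions[] = 0.0f;               // float defaults to NaN in D
    auto velocities = new float[](1000);
    velocities[] = 2.0f;

    updateAll(positions, velocities, 0.5f);
    writeln(positions[0], " ", positions[$ - 1]);
}
```

Nothing here names a thread; the pool decides where each chunk runs, which is the algorithm-focused view described above.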

> The Concurrency library looked very promising but felt like the synchronization wouldn't be that nice but it would provide a random-access to the data in your code. Correct me of course if I am wrong. Is there a good thread pool system that could be used? Does that system also handle solving dependencies in the work-flow? This is what we use at my work more or less.

What makes you say synchronization is not that nice?

Random access, data, threads and parallelism in the same paragraph raises a red flag of warning!

std.concurrency is a realization of actors so there is effectively a variety of thread pool involved. std.parallelism has task pools explicitly.
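
For a feel of the actor style in std.concurrency, here is a minimal request/reply sketch. The worker function `squarer`, the `Stop` message type and the helper `squareViaWorker` are assumptions invented for this example; `spawn`, `send`, `receive`, `receiveOnly` and `ownerTid` are the library calls.

```d
import std.concurrency : Tid, ownerTid, receive, receiveOnly, send, spawn;
import std.stdio : writeln;

// Hypothetical control message. A distinct struct type is used so that it
// cannot be mistaken for a data message when handlers are matched.
struct Stop {}

// Hypothetical worker: squares every int it receives and replies to the
// thread that spawned it, until it is told to stop.
void squarer()
{
    bool running = true;
    while (running)
    {
        receive(
            (int n) { send(ownerTid, n * n); },
            (Stop _) { running = false; }
        );
    }
}

int squareViaWorker(int n)
{
    Tid worker = spawn(&squarer);   // no mutable data is shared with the new thread
    send(worker, n);
    int result = receiveOnly!int(); // blocks until the reply arrives
    send(worker, Stop());           // ask the worker to exit
    return result;
}

void main()
{
    writeln(squareViaWorker(7));
}
```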

> In worst case scenario I will just use the basic thread class and implement my own system above that. Then there is the question, is there any pitfalls in the current library that I should be aware of?

I am sure D's current offerings are not perfect but they do represent a good part of the right direction to be travelling.  What is missing is a module for dataflow processing(*) and one for CSP.  Sadly I haven't had time to get stuck into doing an implementation as I had originally planned 18 months or so ago: most of my time is now in the Python and Groovy arena as that is where the income comes from.  cf. GPars (http://gpars.codehaus.org) and Python-CSP – though the latter has stopped moving due to planning a whole new Python framework for concurrency and parallelism.


(*) People who talk about "you can implement dataflow with actors and
vice versa" miss the point about provision of appropriate abstractions
with appropriate performance characteristics.

-- 
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder@ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel@winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder


June 22, 2012
On Sunday, 17 June 2012 at 07:23:38 UTC, Russel Winder wrote:
> On Sun, 2012-06-17 at 03:15 +0200, Henrik Valter Vogelius Hansson wrote:
>> Hi again!
>> 
>> I have looked around a little with what D offers but don't know really what I should use since D offers several ways to use threads. Some more high level than others. Don't really also know which one would be suitable for me.
>
> My take on this is that as soon as an applications programmer talks
> about using threads in their program, they have admitted they are
> working at the wrong level.  Applications programmers do not manage
> their control stacks, applications programmers do not manage their
> heaps, why on earth manage your threads. Threads are an implementation
> resource best managed by an abstraction.
>
> Using processes and message passing (over a thread pool, as you are
> heading towards in comments below) has proven over the last 30+ years to
> be the only scalable way of managing parallelism, so use it as a
> concurrency technique as well and get parallelism as near as for free as
> it is possible to get.
>
> Ancient models and techniques such as actors, dataflow, CSP, data
> parallelism are making a resurgence exactly because explicit shared
> memory multi-threading is an inappropriate technique. It has just taken
> the world 15+ years to appreciate this.
>
>> A little background could help. I am a game developer and during my semester I want to experiment with making games in D. I use threads to separate some tasks that can easily work in parallel with each other. The most common being a Logic/Graphics separation. But as development progresses I usually add more threads like inside graphics I can end up with 2 or 3 more threads.
>
> I can only repeat the above: don't think in terms of threads and shared
> memory, think in terms of processes and messages passed between them.
>
>> I want to avoid Amdahl's law as much as possible and have as small synchronization nodes. The data exchange should be as basic as possible but still have room for improvements and future additions.
>
> Isn't the current hypothesis that you can't avoid Amdahl's Law? If what
> you mean is that you want to ensure you have an embarrassingly parallel
> solution so that speed up is linear that seems entirely reasonable, but
> then D has a play in this game with the std.parallelism module.  It uses
> the term "task" rather than process or thread to try and enforce an
> algorithm-focused view. cf. http://dlang.org/phobos/std_parallelism.html
>
>> The Concurrency library looked very promising but felt like the synchronization wouldn't be that nice but it would provide a random-access to the data in your code. Correct me of course if I am wrong. Is there a good thread pool system that could be used? Does that system also handle solving dependencies in the work-flow? This is what we use at my work more or less.
>
> What makes you say synchronization is not that nice?
>
> Random access, data, threads and parallelism in the same paragraph
> raises a red flag of warning!
>
> std.concurrency is a realization of actors so there is effectively a
> variety of thread pool involved. std.parallelism has task pools
> explicitly.
>
>> In worst case scenario I will just use the basic thread class and implement my own system above that. Then there is the question, is there any pitfalls in the current library that I should be aware of?
>
> I am sure D's current offerings are not perfect but they do represent a
> good part of the right direction to be travelling.  What is missing is a
> module for dataflow processing(*) and one for CSP.  Sadly I haven't had
> time to get stuck into doing an implementation as I had originally
> planned 18 months or so ago: most of my time is now in the Python and
> Groovy arena as that is where the income comes from.  cf. GPars
> (http://gpars.codehaus.org) and Python-CSP – though the latter has
> stopped moving due to planning a whole new Python framework for
> concurrency and parallelism.
>
>
> (*) People who talk about "you can implement dataflow with actors and
> vice versa" miss the point about provision of appropriate abstractions
> with appropriate performance characteristics.
> 

Aight, I've been reading a lot about it now. I'm interested in the TaskPool, but there is a problem, which is also why I have to think about threads: OpenGL/DirectX contexts are only valid for one thread at a time. And with the task pool I can't control which thread is used for a given task, right? At least I couldn't find a way. So that's out of the question. The concurrency library is... I don't know. I usually do a very fast synchronization swap (just swapping two pointers), while the concurrency library seems like it would halt both threads for a longer time. Or am I viewing this from the wrong direction? Should I maybe do it like lazy evaluation? If you need code examples of what I am talking about, I can give you that, though I don't know the code tag for this message board.

I will still use the task pool I think though all OpenGL calls will have to be routed so they are all done on the same thread somehow.
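
One possible shape for that routing, sketched with std.concurrency: a single spawned thread owns the context and everything else only sends it commands. `DrawCommand`, `Shutdown`, `renderThread` and `runFrame` are hypothetical names for this sketch, and the actual OpenGL calls are replaced by a counter so the example runs without a graphics context.

```d
import std.concurrency : Tid, ownerTid, receive, receiveOnly, send, spawn;
import std.stdio : writeln;

// Hypothetical message types; plain value types can be sent between threads.
struct DrawCommand { int meshId; float x, y; }
struct Shutdown {}

// The only thread that would ever touch the graphics context.
void renderThread()
{
    // A real program would create the context and make it current here.
    int executed = 0;
    bool running = true;
    while (running)
    {
        receive(
            (DrawCommand cmd) { ++executed; },   // stand-in for the actual GL calls
            (Shutdown _) { running = false; }
        );
    }
    send(ownerTid, executed);   // report how many commands were handled
}

// Queues `commandCount` draw commands, then shuts the renderer down.
int runFrame(int commandCount)
{
    Tid renderer = spawn(&renderThread);
    foreach (i; 0 .. commandCount)
        send(renderer, DrawCommand(i, 0.0f, 0.0f));
    send(renderer, Shutdown());
    return receiveOnly!int();
}

void main()
{
    writeln(runFrame(5));
}
```

Sending one message per draw call is only for illustration; batching a frame's worth of commands into one message would probably be the first thing to profile.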

The message box for the threads in concurrency, are they thread safe? Let's say we have two logic tasks running in parallel and both are sending messages to the graphics thread. Would that result in undefined behavior or does the concurrency library handle this kind of scenario for you?
June 22, 2012
On Jun 22, 2012, at 11:17 AM, Henrik Valter Vogelius Hansson wrote:
> 
> Aight been reading a lot now about it. I'm interested in the TaskPool but there is a problem and also why I have to think about threads. OpenGL/DirectX contexts are only valid for one thread at a time. And with the task pool I can't control what thread to be used with the specified task right?

That's pretty much the entire point of a thread pool--it aims for optimal task completion time, and does this via an opaque scheduling mechanism.

> At least from what I could find I couldn't. So that's out of the question. The concurrency library is... I don't know. I most usually do a very fast synchronization swap(just swap two pointers) while the concurrency library seems like it would halt both threads for a longer time. Or am I viewing this from the wrong direction? Should I do it like lazy evaluation maybe? If you need code examples of what I am talking about I can give you that. Though I don't know the code-tag for this message board.

Games are an odd bird in that performance comes at the expense of much else, and that it really isn't easy to parallelize the main loop.  That said, the only time the concurrency library would halt a thread is if you do a receive() with no timeout and the message you want isn't in the queue.  So you can bypass this by using a timeout of 0 (basically a peek operation), and changing the code path based on whether the desired message was received.
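
A minimal sketch of that zero-timeout peek, assuming int messages for simplicity. The helper name `pollForInt` is made up; `receiveTimeout` and `Duration.zero` are the library pieces.

```d
import core.time : Duration;
import std.concurrency : receiveTimeout, send, thisTid;
import std.stdio : writeln;

// Non-blocking check of the current thread's mailbox for an int message.
// Returns true and fills `value` if one was waiting, false otherwise.
bool pollForInt(out int value)
{
    int got;
    bool found = receiveTimeout(Duration.zero, (int v) { got = v; });
    value = got;
    return found;
}

void main()
{
    int v;
    writeln(pollForInt(v));   // mailbox is empty at this point

    send(thisTid, 42);        // a thread may post to its own mailbox
    if (pollForInt(v))
        writeln("got ", v);
    else
        writeln("nothing yet, carry on with the frame");
}
```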

> I will still use the task pool I think though all OpenGL calls will have to be routed so they are all done on the same thread somehow.

I think that will net you worse performance than if the main thread just did everything.  You still have synchronous execution but thread synchronization on top of that.  Can ownership of an OpenGL/DirectX context be passed between threads?  Can you maybe just give every thread its own context and let it process whatever task you give to it, or is a context necessarily linked with some set of operations?

> The message box for the threads in concurrency, are they thread safe? Let's say we have two logic tasks running in parallel and both are sending messages to the graphics thread. Would that result in undefined behavior or does the concurrency library handle this kind of scenario for you?

Since it's a concurrency library, of course the API is thread safe :-)  Basically, how receive() works is it first looks in a thread-local queue for the desired message.  If one wasn't found it acquires a lock on that thread's shared message queue, moves the shared queue elements into the local queue, and releases the mutex.  Then it scans the new elements in the list for a match.  If it still doesn't find one, it re-acquires the mutex on the shared queue, and does the same thing.  If the shared queue is ever empty during this process, receive() will block on a condition variable up to the supplied timeout value.
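
To illustrate the two-senders case from the question, here is a small sketch in which two spawned threads post to the same mailbox at the same time and the receiver counts what arrives. `logicTask` and `countMessages` are hypothetical names; no locking appears in user code because the mailbox handles it.

```d
import std.concurrency : ownerTid, receive, send, spawn;
import std.stdio : writeln;

// Hypothetical logic task: posts its own id to the owner `count` times.
void logicTask(int id, int count)
{
    foreach (i; 0 .. count)
        send(ownerTid, id);
}

// Spawns two senders and tallies the messages received from each.
int[2] countMessages(int perTask)
{
    spawn(&logicTask, 0, perTask);
    spawn(&logicTask, 1, perTask);

    int[2] received;
    foreach (_; 0 .. 2 * perTask)
        receive((int id) { ++received[id]; });
    return received;
}

void main()
{
    writeln(countMessages(1000));
}
```

Messages from one sender arrive in the order they were sent, but how the two senders interleave is not predictable.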

The only performance issue with the concurrency API right now is that it allocates a struct to wrap each sent message, so there is some GC load.  I experimented with using a shared free list instead, however, and it didn't really help performance in my test cases.  I suspect I'd either have to go to a lock-free free list or some other fairly fancy approach.  Beyond that, I've experimented with using ref and not using ref attributes for parameters everywhere applicable, etc.  The current implementation is as fast as I could get things.

For future directions, I really want to add inter-process messaging.  That means serialization support and a scalable socket implementation though.  Not to mention free time.  I've considered just hacking together the implementation and limiting inter-process messages to concrete variables as a proof of concept.  That would need just free time.
June 22, 2012
> Games are an odd bird in that performance comes at the expense of much else, and that it really isn't easy to parallelize the main loop.  That said, the only time the concurrency library would halt a thread is if you do a receive() with no timeout and the message you want isn't in the queue.  So you can bypass this by using a timeout of 0 (basically a peek operation), and changing the code path based on whether the desired message was received.

Well, it also depends on how you do the receive. Right now I am thinking of something like lazy evaluation, so I only try to receive the messages (with a timeout) where I expect to use them instead of doing it all in the same place. The same goes for the other end. I might be overthinking it, because it's starting to sound more and more like how I used to work before I tried task pools. And I guess it won't be added in the near future that you can specify thread IDs for a task? Like, all OpenGL-related tasks get a specific thread, while for all others it doesn't matter.

> Can ownership of an OpenGL/DirectX context be passed between threads?  Can you maybe just give every thread its own context and let it process whatever task you give to it, or is a context necessarily linked with some set of operations?

Ownership cannot be passed between threads. Giving every thread its own context is possible but bothersome, because, for instance, the different contexts would have different states (backface culling, depth settings, and so on). Plus it would be pretty slow, because I would have to call glFlush or similar to force the drivers to make sure all texture data has been updated for all threads, and so on. Most of these problems with contexts and threads I have learned the hard way :P

If you have an opinion on what would be the best way to do this, then I am interested, even if it is single-threading it. But I'd want the reasoning behind it, of course. Otherwise I'll just go with the concurrency library and the lazy evaluation idea. I'll probably profile a little and weigh what is easiest to work with and to expand on later as well.