March 25, 2011
On 3/24/2011 9:05 PM, Sönke Ludwig wrote:
>> This may not be an issue in the std.parallelism design. A TaskPool task
>> can safely wait on other tasks. What prevents this from causing a
>> deadlock is that calling yieldForce, spinForce, or waitForce on a task
>> that has not started executing yet will execute the task immediately in
>> the thread that tried to force it, regardless of where it is in the
>> queue.
>
> Indeed this pattern solves the problem of waiting for the completion of a
> specific task. It also avoids a huge potential for deadlocks that a
> general yield() that does not take a task would have. However, it will
> not solve the general problem of one task waiting for another, which
> could be via a condition variable or just a mutex that is used
> in the middle of the task execution.
>

Can you elaborate and/or provide an example of the "general" problem? I'm not quite sure what you're getting at.

>> Can you elaborate on this? The whole point of executeInNewThread() was
>> supposed to be that a TaskPool is not needed for simple cases.
>
> Well, OK, if the only purpose is to provide a shortcut for (new
> Thread(&fun)).start, then my suggestion may not make much sense.
> However, I have some doubts that the advantage to have this shortcut
> here justifies the functional duplication of core.thread. Is there some
> specific use case where you would use a Task object but not a ThreadPool?

executeInNewThread() is useful where you only have a few Tasks, rather than needing to map a large number of Tasks onto a small number of threads.  Using core.thread here doesn't cut it, because core.thread doesn't automate passing the function's arguments to the new thread. Also, I figured out that some of the conditions for @safe tasks can be relaxed a little if they run in a dedicated thread instead of a TaskPool.
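
To illustrate, here's a minimal example with the API as it stands (fib is just a stand-in for any expensive computation):

import std.parallelism, std.stdio;

// Stand-in for any expensive computation.
ulong fib(uint n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }

void main()
{
    // No TaskPool needed; the arguments are captured and forwarded
    // to the new thread automatically.
    auto t = task!fib(40);
    t.executeInNewThread();

    // ... do other work in this thread ...

    writeln(t.yieldForce());  // block until the result is ready
}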

>
> But what I wanted to say is, even if it may be difficult to implement
> such thread caching now, putting the means to execute a Task in its own
> thread into the ThreadPool now allows for such an optimization later (it
> could even exist while still keeping Task.executeInNewThread()).

I can't really comment because I still don't understand this very well.
March 25, 2011
On 3/24/2011 9:15 PM, Sönke Ludwig wrote:
> Am 24.03.2011 13:03, schrieb Michel Fortin:
>> On 2011-03-24 03:00:01 -0400, Sönke Ludwig
>> <ludwig@informatik.uni-luebeck.de> said:
>>
>>> Am 24.03.2011 05:32, schrieb dsimcha:
>>>> In addition to improving the documentation, I added
>>>> Task.executeInNewThread() to allow Task to be useful without a
>>>> TaskPool.
>>>> (Should this have a less verbose name?)
>>>
>>> The threading system I designed for the company I work for uses
>>> priority per task to control which tasks can overtake others. A
>>> special priority is out-of-band (the name may be debatable), which
>>> will guarantee that the task will run in its own thread so it can
>>> safely wait for other tasks. However, those threads that process OOB
>>> tasks are also cached in the thread pool and reused for new OOB tasks.
>>> Only if the number of parallel OOB tasks goes over a specific number
>>> will new threads be created and destroyed. This can save quite a bit
>>> of time for those tasks.
>>>
>>> Both kinds of priority have been very useful, and I would suggest
>>> putting at least the executeInNewThread() method into ThreadPool to be
>>> later able to make such an optimization.
>>>
>>> The task priority thing in general may only be necessary for complex
>>> applications with user interaction, where you have to satisfy certain
>>> interactivity needs. I wouldn't be too sad if this is not implemented
>>> now, but it would be good to keep it in mind as a possible improvement
>>> for later.
>>
>> Do you think having multiple task pools each with a different thread
>> priority would do the trick? Simply put tasks in the task pool with the
>> right priority... I had a similar use case in mind and this is what I
>> proposed in the previous discussion.
>>
>
> Yes, that may actually be enough, because although you would normally
> want to avoid the overhead of the additional threads running in
> parallel, in the scenarios I have in mind you always have unrelated
> things in different priority classes. And for these different tasks,
> running in parallel should be the exception (otherwise using
> priorities would be strange in the first place).
>
> The only thing that is a bit of a pity is that now you have to manage
> multiple thread pools instead of simply using the one singleton instance
> in the whole application. And this could really cause some headaches if
> you have a lot of different types of workload that may all have
> different priorities but may also have the same - you would somehow have
> to share several thread pools across those types of workload.
>
> (type of workload = copying files, computing preview images, computing
> some physics calcs etc)

My main concern here is that these kinds of use cases are getting far beyond the scope of std.parallelism.  By definition (at least as I understand it) parallelism is focused on throughput, not responsiveness/latency and is about utilizing as many execution resources as possible for useful work.  (This article, originally posted here by Andrei, describes the distinction nicely: http://existentialtype.wordpress.com/2011/03/17/parallelism-is-not-concurrency/)  If you're implementing parallelism, then it is correct to only use one thread on a single-core machine (std.parallelism does this by default), since one thread will utilize all execution resources.  If you're implementing concurrency, this is not correct.  Concurrency is used to implement parallelism, but that's different from saying concurrency _is_ parallelism.

When you start talking about application responsiveness, prioritization, etc., you're getting beyond _parallelism_ and into general-case concurrency.  I have neither the expertise nor the desire to build a general case concurrency library. D already has a general case concurrency library (std.concurrency), and this might be a better place to implement suggestions dealing with general-case concurrency.

std.parallelism was designed from the ground up to focus on parallelism, not general-case concurrency.  I don't mind implementing features useful to general-case concurrency if they're trivial in both interface and implementation, but I'd rather not do any that require major changes to the interface or implementation.

March 25, 2011
Am 25.03.2011 02:17, schrieb dsimcha:
> On 3/24/2011 9:05 PM, Sönke Ludwig wrote:
>>> This may not be an issue in the std.parallelism design. A TaskPool task
>>> can safely wait on other tasks. What prevents this from causing a
>>> deadlock is that calling yieldForce, spinForce, or waitForce on a task
>>> that has not started executing yet will execute the task immediately in
>>> the thread that tried to force it, regardless of where it is in the
>>> queue.
>>
>> Indeed this pattern solves the problem of waiting for the completion of a
>> specific task. It also avoids a huge potential for deadlocks that a
>> general yield() that does not take a task would have. However, it will
>> not solve the general problem of one task waiting for another, which
>> could be via a condition variable or just a mutex that is used
>> in the middle of the task execution.
>>
>
> Can you elaborate and/or provide an example of the "general" problem?
> I'm not quite sure what you're getting at.

I have one very specific scenario that I can only sketch. Suppose you have some kind of complex computation going on in the ThreadPool. This computation is done by a large set of tasks where each task depends on the result of one or more other tasks. One task is responsible for coordinating the work - it spawns tasks and waits for their completion so it can spawn new tasks for which the results are now available.

The first thing here is that you do not want to do the waitForce() kind of waiting in the coordinator task, because this might leave the coordinator busy with an expensive task while it could already be spawning new tasks, since in the meantime some other tasks may have finished.

However, if you wait on a condition variable instead (which is signaled after each finished task) and if you can have multiple computations of this kind running in parallel, you can immediately run into the situation where the thread pool is crowded with coordinator tasks that are all waiting on condition variables that will never be signaled because no worker tasks can be executed.
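
To make the failure mode concrete, here is a deliberately broken sketch (invented names, condensed to the essentials; the shared counter is oversimplified, which doesn't matter here since no worker ever gets to run):

import core.atomic, core.sync.condition, core.sync.mutex, std.parallelism;

shared uint finished;
__gshared Mutex mtx;
__gshared Condition done;
enum jobsPerCoordinator = 10;

void worker()
{
    // ... compute one piece of the result ...
    atomicOp!"+="(finished, 1);
    synchronized (mtx) done.notifyAll();
}

void coordinator()
{
    foreach (i; 0 .. jobsPerCoordinator)
        taskPool.put(task!worker());

    // Blocks this pool thread outright; unlike yieldForce, wait()
    // never executes a queued dependency inline.
    synchronized (mtx)
        while (atomicLoad(finished) < jobsPerCoordinator)
            done.wait();
}

void main()
{
    mtx  = new Mutex;
    done = new Condition(mtx);

    // Fill every pool thread with a coordinator...
    foreach (i; 0 .. taskPool.size)
        taskPool.put(task!coordinator());

    // ...then force one more: main executes it inline and blocks in
    // wait() too. The queued workers can never be dequeued: deadlock.
    auto last = task!coordinator();
    taskPool.put(last);
    last.yieldForce();
}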

This is only one example; basically, this problem can arise in any case where one task depends on another task through some form of waiting that does not execute the dependency the way waitForce() does.

I also have a completely different example involving the main thread (doing GPU work) which is much more difficult, but I don't think I can make that clear with text alone.

>
>>> Can you elaborate on this? The whole point of executeInNewThread() was
>>> supposed to be that a TaskPool is not needed for simple cases.
>>
>> Well, OK, if the only purpose is to provide a shortcut for (new
>> Thread(&fun)).start, then my suggestion may not make much sense.
>> However, I have some doubts that the advantage to have this shortcut
>> here justifies the functional duplication of core.thread. Is there some
>> specific use case where you would use a Task object but not a ThreadPool?
>
> executeInNewThread() is useful where you only have a few Tasks, rather
> than needing to map a large number of Tasks onto a small number of
> threads. Using core.thread here doesn't cut it, because core.thread
> doesn't automate passing the function's arguments to the new thread.
> Also, I figured out that some of the conditions for @safe tasks can be
> relaxed a little if they run in a dedicated thread instead of a TaskPool.

I see.

>
>>
>> But what I wanted to say is, even if it may be difficult to implement
>> such thread caching now, putting the means to execute a Task in its own
>> thread into the ThreadPool now allows for such an optimization later (it
>> could even exist while still keeping Task.executeInNewThread()).
>
> I can't really comment because I still don't understand this very well.

I hope I could make it a little clearer what I mean. The problem is just that the system I'm talking about is quite complex and it's not easy to find good and simple examples in that system. The problems, of course, arise only in the most complex paths of execution.

What I'm not sure about is whether executeInNewThread() is supposed to be useful just because it is sometimes nice to have the fine-grained parallelism of the OS scheduler as opposed to task granularity, or whether the advantage is supposed to be the efficiency gained because the thread pool is not created. In the latter case, caching some threads to be reused for an executeInOwnThread() method should lead to better performance in almost any case where thread creation overhead is relevant.

.. OK, my writing skills are degrading rapidly as I fight to keep my eyes open.. will go to sleep now
March 25, 2011
Am 25.03.2011 02:51, schrieb dsimcha:
> On 3/24/2011 9:15 PM, Sönke Ludwig wrote:
>> Am 24.03.2011 13:03, schrieb Michel Fortin:
>>> On 2011-03-24 03:00:01 -0400, Sönke Ludwig
>>> <ludwig@informatik.uni-luebeck.de> said:
>>>
>>>> Am 24.03.2011 05:32, schrieb dsimcha:
>>>>> In addition to improving the documentation, I added
>>>>> Task.executeInNewThread() to allow Task to be useful without a
>>>>> TaskPool.
>>>>> (Should this have a less verbose name?)
>>>>
>>>> The threading system I designed for the company I work for uses
>>>> priority per task to control which tasks can overtake others. A
>>>> special priority is out-of-band (the name may be debatable), which
>>>> will guarantee that the task will run in its own thread so it can
>>>> safely wait for other tasks. However, those threads that process OOB
>>>> tasks are also cached in the thread pool and reused for new OOB tasks.
>>>> Only if the number of parallel OOB tasks goes over a specific number
>>>> will new threads be created and destroyed. This can save quite a bit
>>>> of time for those tasks.
>>>>
>>>> Both kinds of priority have been very useful, and I would suggest
>>>> putting at least the executeInNewThread() method into ThreadPool to be
>>>> later able to make such an optimization.
>>>>
>>>> The task priority thing in general may only be necessary for complex
>>>> applications with user interaction, where you have to satisfy certain
>>>> interactivity needs. I wouldn't be too sad if this is not implemented
>>>> now, but it would be good to keep it in mind as a possible improvement
>>>> for later.
>>>
>>> Do you think having multiple task pools each with a different thread
>>> priority would do the trick? Simply put tasks in the task pool with the
>>> right priority... I had a similar use case in mind and this is what I
>>> proposed in the previous discussion.
>>>
>>
>> Yes, that may actually be enough, because although you would normally
>> want to avoid the overhead of the additional threads running in
>> parallel, in the scenarios I have in mind you always have unrelated
>> things in different priority classes. And for these different tasks,
>> running in parallel should be the exception (otherwise using
>> priorities would be strange in the first place).
>>
>> The only thing that is a bit of a pity is that now you have to manage
>> multiple thread pools instead of simply using the one singleton instance
>> in the whole application. And this could really cause some headaches if
>> you have a lot of different types of workload that may all have
>> different priorities but may also have the same - you would somehow have
>> to share several thread pools across those types of workload.
>>
>> (type of workload = copying files, computing preview images, computing
>> some physics calcs etc)
>
> My main concern here is that these kinds of use cases are getting far
> beyond the scope of std.parallelism. By definition (at least as I
> understand it) parallelism is focused on throughput, not
> responsiveness/latency and is about utilizing as many execution
> resources as possible for useful work. (This article, originally posted
> here by Andrei, describes the distinction nicely:
> http://existentialtype.wordpress.com/2011/03/17/parallelism-is-not-concurrency/)
> If you're implementing parallelism, then it is correct to only use one
> thread on a single-core machine (std.parallelism does this by default),
> since one thread will utilize all execution resources. If you're
> implementing concurrency, this is not correct. Concurrency is used to
> implement parallelism, but that's different from saying concurrency _is_
> parallelism.
>
> When you start talking about application responsiveness, prioritization,
> etc., you're getting beyond _parallelism_ and into general-case
> concurrency. I have neither the expertise nor the desire to build a
> general case concurrency library. D already has a general case
> concurrency library (std.concurrency), and this might be a better place
> to implement suggestions dealing with general-case concurrency.
>
> std.parallelism was designed from the ground up to focus on parallelism,
> not general-case concurrency. I don't mind implementing features useful
> to general-case concurrency if they're trivial in both interface and
> implementation, but I'd rather not do any that require major changes to
> the interface or implementation.
>

Well, what can I say.. things can become more complex and you cannot always say this is parallelism and this is concurrency or something. It's just nice when the library does not get in the way when you are in a situation where e.g. throughput and responsiveness or whatever else matter. Sometimes it can be a small change that makes or breaks the deal. It's sad that I'm not able to get my points across.. But I'm too tired right now to go into the parallelism/concurrency discussion.
March 25, 2011
On 3/24/2011 10:31 PM, Sönke Ludwig wrote:
> Well, what can I say.. things can become more complex and you cannot
> always say this is parallelism and this is concurrency or something.
> It's just nice when the library does not get in the way when you are in a
> situation where e.g. throughput and responsiveness or whatever else
> matter. Sometimes it can be a small change that makes or breaks the
> deal.

Agreed.  I'm not trying to be pedantic here, and I'm certainly willing to make **small** changes even if they stretch the scope somewhat into general concurrency.  It's just that I don't want to make big changes, especially if they will make the interface more complex, reduce efficiency and/or lock me into certain implementations.  (For example, using a priority queue precludes supporting work stealing later without breaking the priority feature.)
March 25, 2011
On 3/24/2011 10:21 PM, Sönke Ludwig wrote:
>>> Indeed this pattern solves the problem of waiting for the completion of a
>>> specific task. It also avoids a huge potential for deadlocks that a
>>> general yield() that does not take a task would have. However, it will
>>> not solve the general problem of one task waiting for another, which
>>> could be via a condition variable or just a mutex that is used
>>> in the middle of the task execution.
>>>
>>
>> Can you elaborate and/or provide an example of the "general" problem?
>> I'm not quite sure what you're getting at.
>
> I have one very specific scenario that I can only sketch. Suppose
> you have some kind of complex computation going on in the ThreadPool.
> This computation is done by a large set of tasks where each task
> depends on the result of one or more other tasks. One task is
> responsible for coordinating the work - it spawns tasks and waits
> for their completion so it can spawn new tasks for which the results
> are now available.
>

As I've said before in related discussions, you are _probably_ better off using one of the high level primitives instead of using tasks directly in these cases.  If not, I'd prefer to improve the higher level primitives and/or create new ones if possible.  (Feel free to suggest one if you can think of it.)  Tasks are, IMHO, too low level for anything except basic future/promise parallelism and implementing higher level primitives.  Incidentally the situation you describe (a coordinator task creating lots of worker tasks) is exactly how amap(), reduce() and parallel foreach work under the hood.  This complexity is completely encapsulated, though.
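
For reference, this is the level I'd rather see people work at; all of the coordinator/worker machinery is hidden inside the pool:

import std.math, std.parallelism;

double timesTwo(double x) { return x * 2; }

void main()
{
    auto nums = new double[100_000];

    // Parallel foreach: worker tasks are created, scheduled, and
    // waited on entirely inside the pool.
    foreach (i, ref x; taskPool.parallel(nums))
        x = sqrt(cast(double) i);

    // Same story for amap() and reduce().
    auto doubled = taskPool.amap!timesTwo(nums);
    auto total   = taskPool.reduce!"a + b"(nums);
}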

> The first thing here is that you do not want to do the waitForce() kind
> of waiting in the coordinator task, because this might leave the
> coordinator busy with an expensive task while it could already be
> spawning new tasks, since in the meantime some other tasks may have finished.

I assume you mean yieldForce().

>
> However, if you wait on a condition variable instead (which is signaled
> after each finished task) and if you can have multiple computations of
> this kind running in parallel, you can immediately run into the
> situation where the thread pool is crowded with coordinator tasks that
> are all waiting on condition variables that will never be signaled
> because no worker tasks can be executed.

I assume you're talking about a condition variable other than the one yieldForce() uses.  As mentioned previously, in the specific case of yieldForce() this is a solved problem.  In the general case I can see the problem.

>
> This is only one example; basically, this problem can arise in any
> case where one task depends on another task through some form of
> waiting that does not execute the dependency the way waitForce() does.

Hmm, ok, I definitely understand the problem now.

>>> But what I wanted to say is, even if it may be difficult to implement
>>> such thread caching now, putting the means to execute a Task in its own
>>> thread into the ThreadPool now allows for such an optimization later (it
>>> could even exist while still keeping Task.executeInNewThread()).
>>
>> I can't really comment because I still don't understand this very well.
>
>> I hope I could make it a little clearer what I mean. The problem is
>> just that the system I'm talking about is quite complex and it's not
>> easy to find good and simple examples in that system. The problems, of
>> course, arise only in the most complex paths of execution.
>
>> What I'm not sure about is whether executeInNewThread() is supposed to be
>> useful just because it is sometimes nice to have the fine-grained
>> parallelism of the OS scheduler as opposed to task granularity, or whether
>> the advantage is supposed to be the efficiency gained because the thread
>> pool is not created. In the latter case, caching some threads to be
>> reused for an executeInOwnThread() method should lead to better
>> performance in almost any case where thread creation overhead is relevant.

Ok, now I'm starting to understand this.  Please correct me (once you've gotten a good night's sleep and can think again) wherever this is wrong:

1.  As is currently the case, executeInNewThread() is _guaranteed_ to start the task immediately.  There is never a queue involved.

2.  Unlike the current implementation, executeInNewThread() may use a cached thread.  It will **NOT**, however, put the task on a queue or otherwise delay its execution.  If no cached thread is available, it will create a new one and possibly destroy it when the task is done.

Thanks for this suggestion.  Now that I (think I) understand it, it makes sense in principle.  The devil may be in the details, though.

1.  How many threads should be cached?  I guess this could just be configurable with some reasonable default.

2.  Should the cache be lazily or eagerly created?  I'd assume lazily.

3.  Where would these threads be stored?  I think they probably belong in some kind of thread-safe global data structure, **NOT** in a TaskPool instance.

4.  How would we explain to people what the cache is good for and how to use it?  The fact that you're proposing it and even you find this difficult to convey makes me skeptical that this feature is worth the weight it adds to the API.  Maybe you'll find it easier once you get some sleep.   (I understand the problem it solves at an abstract level but have never encountered a concrete use case.  It also took me a long time to understand it.)

5.  It would break some relaxations of what @safe tasks can do when started via executeInNewThread().

6.  This whole proposal might fit better in std.concurrency, by using a thread cache for spawn().
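
For concreteness, here's a very rough sketch of what I picture (invented names, untested; it keeps the guarantees of summary points 1 and 2 above and answers questions 1-3 one possible way):

import core.sync.condition, core.sync.mutex, core.thread;

// An idle cached thread is reused when available; otherwise a new
// thread is created.  Work is never queued (summary point 2).
private final class CachedThread
{
    private Mutex mtx;
    private Condition cond;
    private void delegate() job;   // null means idle

    this()
    {
        mtx  = new Mutex;
        cond = new Condition(mtx);
        auto t = new Thread(&run);
        t.isDaemon = true;
        t.start();
    }

    void post(void delegate() work)
    {
        synchronized (mtx) { job = work; cond.notify(); }
    }

    private void run()
    {
        while (true)
        {
            void delegate() work;
            synchronized (mtx)
            {
                while (job is null) cond.wait();
                work = job;
                job  = null;
            }
            work();
            if (!recycle(this)) return;   // cache full: let the thread die
        }
    }
}

// Question 3: global, not per-TaskPool.
private __gshared CachedThread[] idle;
private __gshared Mutex cacheMtx;
private enum maxCached = 2;   // question 1: would be configurable

shared static this() { cacheMtx = new Mutex; }

private bool recycle(CachedThread t)
{
    synchronized (cacheMtx)
    {
        if (idle.length < maxCached) { idle ~= t; return true; }
        return false;
    }
}

// Guaranteed to start `work` immediately (summary point 1); the
// cache grows lazily (question 2).
void executeInOwnThread(void delegate() work)
{
    CachedThread t;
    synchronized (cacheMtx)
    {
        if (idle.length) { t = idle[$ - 1]; idle = idle[0 .. $ - 1]; }
    }
    if (t is null) t = new CachedThread;
    t.post(work);
}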
March 25, 2011
Am 25.03.2011 04:32, schrieb dsimcha:
> On 3/24/2011 10:31 PM, Sönke Ludwig wrote:
>> Well, what can I say.. things can become more complex and you cannot
>> always say this is parallelism and this is concurrency or something.
>> It's just nice when the library does not get in the way when you are in a
>> situation where e.g. throughput and responsiveness or whatever else
>> matter. Sometimes it can be a small change that makes or breaks the
>> deal.
>
> Agreed. I'm not trying to be pedantic here, and I'm certainly willing to
> make **small** changes even if they stretch the scope somewhat into
> general concurrency. It's just that I don't want to make big changes,
> especially if they will make the interface more complex, reduce
> efficiency and/or lock me into certain implementations. (For example,
> using a priority queue precludes supporting work stealing later without
> breaking the priority feature.)

I agree, priorities can be important in some cases, but most of the time they are not really _necessary_ in that sense. And in most of those cases where someone would like to have them, Michel's suggestion to create a second thread pool with a different priority may be just fine.

The more important aspect was the OOB part with the cached threads for something like executeInNewThread. And even that is not a real deal-breaker.
March 25, 2011
Am 25.03.2011 05:14, schrieb dsimcha:
> On 3/24/2011 10:21 PM, Sönke Ludwig wrote:
>>>
>>> Can you elaborate and/or provide an example of the "general" problem?
>>> I'm not quite sure what you're getting at.
>>
>> I have one very specific scenario that I can only sketch. Suppose
>> you have some kind of complex computation going on in the ThreadPool.
>> This computation is done by a large set of tasks where each task
>> depends on the result of one or more other tasks. One task is
>> responsible for coordinating the work - it spawns tasks and waits
>> for their completion so it can spawn new tasks for which the results
>> are now available.
>>
>
> As I've said before in related discussions, you are _probably_ better
> off using one of the high level primitives instead of using tasks
> directly in these cases. If not, I'd prefer to improve the higher level
> primitives and/or create new ones if possible. (Feel free to suggest one
> if you can think of it.) Tasks are, IMHO, too low level for anything
> except basic future/promise parallelism and implementing higher level
> primitives. Incidentally the situation you describe (a coordinator task
> creating lots of worker tasks) is exactly how amap(), reduce() and
> parallel foreach work under the hood. This complexity is completely
> encapsulated, though.

I would certainly agree that this belongs in a higher-level structure. This structure would basically get a set of special tasks, where each of those tasks has a list of all the tasks it depends on. All tasks would then be executed in parallel on a thread pool in an order that satisfies their dependencies - possibly with some form of cost function that controls which task should come first if multiple orders are possible.
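
As a rough sketch of what such a structure could look like (invented names, untested; it leans on yieldForce executing a not-yet-started dependency inline, so the waiting can never deadlock the pool; the cost function is omitted):

import std.parallelism;

// A job in the dependency graph (assumed acyclic).
class Job
{
    Job[] deps;             // jobs this one depends on
    void delegate() work;   // the job's own work
    Task!(runJob, Job)* t;  // set once the job has been submitted
}

void runJob(Job j)
{
    // yieldForce executes a dependency inline if it has not started
    // yet, so this wait cannot deadlock the pool.
    foreach (d; j.deps)
        d.t.yieldForce();
    j.work();
}

// Submit a job and, recursively, everything it depends on.
void submit(Job j)
{
    foreach (d; j.deps)
        if (d.t is null) submit(d);
    j.t = task!runJob(j);
    taskPool.put(j.t);
}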

One problem here, for example, is that in the system I have, I need to execute several tasks in the main thread by sending a message to it (the main thread executes window messages in a loop). Specifically, this is for tasks that use OpenGL or a similar API that is bound to a single thread - and the main thread is the most efficient one to use because it already has an OpenGL context.

The important thing is to either support such things or to make the design general enough to let the user add them from the outside. Otherwise, if you really need such things, the only option is to use a completely custom thread pool, and this means no parallel foreach, map, reduce, or whatever else might be added later.
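
For the main-thread case, what I have in mind is roughly this (sketch only, using std.concurrency; assumes main() sets mainTid = thisTid at startup and calls runPending() between window messages):

import core.time, std.concurrency;

// Tid of the thread that owns the OpenGL context.
__gshared Tid mainTid;

// Any thread can marshal work onto the main thread.  Function
// pointers (unlike delegates) are safe to send between threads.
void runOnMainThread(void function() work)
{
    send(mainTid, work);
}

// Called by the main thread between window messages: drain and run
// whatever work has been posted, without blocking.
void runPending()
{
    while (receiveTimeout(dur!"msecs"(0),
                          (void function() w) { w(); })) {}
}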

>
>> The first thing here is that you do not want to do the waitForce() kind
>> of waiting in the coordinator task, because this might leave the
>> coordinator busy with an expensive task while it could already be
>> spawning new tasks, since in the meantime some other tasks may have finished.
>
> I assume you mean yieldForce().

Yes, sorry, got the names mixed up.

>
>>
>> However, if you wait on a condition variable instead (which is signaled
>> after each finished task) and if you can have multiple computations of
>> this kind running in parallel, you can immediately run into the
>> situation where the thread pool is crowded with coordinator tasks that
>> are all waiting on condition variables that will never be signaled
>> because no worker tasks can be executed.
>
> I assume you're talking about a condition variable other than the one
> yieldForce() uses. As mentioned previously, in the specific case of
> yieldForce() this is a solved problem. In the general case I can see the
> problem.
>

Yes, just the general problem with other condition variables.

>>
>> This is only one example; basically, this problem can arise in any
>> case where one task depends on another task through some form of
>> waiting that does not execute the dependency the way waitForce() does.
>
> Hmm, ok, I definitely understand the problem now.
>
>>>> But what I wanted to say is, even if it may be difficult to implement
>>>> such thread caching now, putting the means to execute a Task in its own
>>>> thread into the ThreadPool now allows for such an optimization later (it
>>>> could even exist while still keeping Task.executeInNewThread()).
>>>
>>> I can't really comment because I still don't understand this very well.
>>
>> I hope I could make it a little clearer what I mean. The problem is
>> just that the system I'm talking about is quite complex and it's not
>> easy to find good and simple examples in that system. The problems, of
>> course, arise only in the most complex paths of execution.
>>
>> What I'm not sure about is whether executeInNewThread() is supposed to be
>> useful just because it is sometimes nice to have the fine-grained
>> parallelism of the OS scheduler as opposed to task granularity, or whether
>> the advantage is supposed to be the efficiency gained because the thread
>> pool is not created. In the latter case, caching some threads to be
>> reused for an executeInOwnThread() method should lead to better
>> performance in almost any case where thread creation overhead is relevant.
>
> Ok, now I'm starting to understand this. Please correct me (once you've
> gotten a good night's sleep and can think again) wherever this is wrong:
>
> 1. As is currently the case, executeInNewThread() is _guaranteed_ to
> start the task immediately. There is never a queue involved.
>
> 2. Unlike the current implementation, executeInNewThread() may use a
> cached thread. It will **NOT**, however, put the task on a queue or
> otherwise delay its execution. If no cached thread is available, it will
> create a new one and possibly destroy it when the task is done.

Exactly.

>
> Thanks for this suggestion. Now that I (think I) understand it, it makes
> sense in principle. The devil may be in the details, though.
>
> 1. How many threads should be cached? I guess this could just be
> configurable with some reasonable default.

A configurable minimum number of threads sounds reasonable. The default could probably be a fixed small number like 1 or 2.

>
> 2. Should the cache be lazily or eagerly created? I'd assume lazily.

Lazy sounds good.

>
> 3. Where would these threads be stored? I think they probably belong in
> some kind of thread-safe global data structure, **NOT** in a TaskPool
> instance.

That's a good question.. ThreadPool would be nice because it is the class you may already be dragging an instance of through your code. A global would certainly work.

>
> 4. How would we explain to people what the cache is good for and how to
> use it? The fact that you're proposing it and even you find this
> difficult to convey makes me skeptical that this feature is worth the
> weight it adds to the API. Maybe you'll find it easier once you get some
> sleep. (I understand the problem it solves at an abstract level but have
> never encountered a concrete use case. It also took me a long time to
> understand it.)

I would basically just say it's a faster alternative to creating a new thread each time you start a task. Use it whenever you need a task to run outside of the thread pool's threads - candidates are tasks that wait a lot, either because of I/O or because of waiting primitives other than the ones present in ThreadPool (message queues, condition variables).
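
In usage it would look exactly like executeInNewThread() does today, just cheaper when a thread gets reused (executeInOwnThread() being the hypothetical method from above):

import std.parallelism;

// Stand-in for any task that mostly blocks (I/O, waiting on events).
string fetchConfig(string url) { /* ...blocking I/O... */ return url; }

void main()
{
    auto t = task!fetchConfig("http://example.com/conf");
    t.executeInOwnThread();   // hypothetical; today: executeInNewThread()
    // ... pool threads stay free for CPU-bound work in the meantime ...
    auto conf = t.yieldForce();
}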

But please don't make the mistake of dismissing the problem because it is complex. Being complex and maybe rare does not mean it cannot be important. It's like a bug that deletes your data, but only in very rare and complex use cases of the application. You would not want to ignore it just for those reasons.

Also, I'm not sure whether using the primitives of std.concurrency is allowed within a Task - maybe not. But if it is, it would be really easy to construct a higher-level example without condition variables and the like.

>
> 5. It would break some relaxations of what @safe tasks can do when
> started via executeInNewThread().

You mean because of TLS that is not reinitialized each time? I have to admit that I can't really gauge the impact of this.

>
> 6. This whole proposal might fit better in std.concurrency, by using a
> thread cache for spawn().

But isn't the previous problem (5.) even more relevant in std.concurrency? Putting it near ThreadPool could be a good idea because it still is some sort of thread pool in the abstract sense. It could also be something that std.concurrency uses for its spawn().

Anyway, I would be happy if a place were allocated for this, wherever it fits. If it is std.concurrency, that's fine as long as std.concurrency and std.parallelism play well together. One problem with std.concurrency is that it also does not really work well when you need to wait on primitives other than a message. Something like WaitForMultipleObjects is critical in many cases, but that's another topic.

Having said all this, I just want to make sure that you don't get this wrong. I certainly do not want to push in complex changes for no reason - or any complex changes at all, for that matter. And in this case I see how it is _functionally_ independent of the ThreadPool itself.

My problem here is just that there is an executeInNewThread() function that really should not be used for _many_ tasks, and maybe in most other cases it would be cleaner to use spawn() - it may still have its place if you factor in the @safe implications, but I would also like to see a function that supports the same threading guarantees but is suitable for many tasks, if only to avoid bad usage patterns of executeInNewThread().

However, it is definitely in the gray area between std.concurrency and std.parallelism.
March 25, 2011
Am 25.03.2011 10:33, schrieb Sönke Ludwig:
>yadda-yadda
>

Apart from all this - I just want to make this a known problem; what you (or maybe Andrei for std.concurrency) decide is up to you, and I'm fine with any outcome for my personal stuff, because I do not have such a complex system outside of work (C++).

And I would like to say that I do like this module in general very much. It also matches the design I have in my libraries almost 99% ;)
March 25, 2011
On 3/25/2011 5:42 AM, Sönke Ludwig wrote:
> Am 25.03.2011 10:33, schrieb Sönke Ludwig:
>> yadda-yadda
>  >
>
> Apart from all this - I just want to make this a known problem; what you
> (or maybe Andrei for std.concurrency) decide is up to you, and I'm fine
> with any outcome for my personal stuff, because I do not have such a
> complex system outside of work (C++).

You've done a good job of explaining a complex problem.  I appreciate it.  I think we should make this a long-term todo, like Michel Fortin's suggestion that std.concurrency should be able to create tasks or that std.parallelism should handle message passing.  You should probably file a Bugzilla enhancement request saying Phobos should support thread caching, so that this proposal doesn't get lost.

Your proposal is feasible and solves an important problem.  On the other hand, the design and implementation details are still vague and would require substantial discussion.  The feature is tangential to the purpose of std.parallelism and clearly crosses the line into general case concurrency.

Bottom line:  This proposal should not hold up the vote and adoption of std.parallelism, but it should not be discarded permanently either.