October 30, 2010
On Fri, 29 Oct 2010 07:13:46 -0400, tls <do@notha.ev> wrote:
> Robert Jacques Wrote:
>
>> Getting back to pure, one of the "big"
>> advantages of functional languages is their ability to automatically
>> parallelize themselves; and they use a work stealing runtime (aka tasks)
>> to do this.
>
> Which functional languages have tasks? I may switch to one in the meantime, while D2 is not stable enough for me to study this.

Every functional language can do this; it's part and parcel of being functional. It's really more a question of how mature their runtimes are than of language support.
November 01, 2010
On 29/10/2010 02:32, Robert Jacques wrote:
> On Thu, 28 Oct 2010 10:48:34 -0400, Bruno Medeiros
> <brunodomedeiros+spam@com.gmail> wrote:
>> On 26/10/2010 04:47, Robert Jacques wrote:
>>> On Mon, 25 Oct 2010 09:44:14 -0400, Bruno Medeiros
>>> <brunodomedeiros+spam@com.gmail> wrote:
>>>
>>>> On 23/09/2010 23:39, Robert Jacques wrote:
>>>>> On Thu, 23 Sep 2010 16:35:23 -0400, Tomek Sowiński <just@ask.me>
>>>>> wrote:
>>>>>>
>>>>>> On topic: this means a pure function can take a reference to data
>>>>>> that
>>>>>> can be mutated by
>>>>>> someone else. So we're giving up on the "can parallelize with no
>>>>>> dataraces" guarantee on
>>>>>> pure functions?
>>>>>>
>>>>>
>>>>> In short, no. In long: the proposal is for pure functions to be
>>>>> broken up into two groups (weak and strong) based on their function
>>>>> signatures.
>>>>> This division is internal to the compiler, and isn't expressed in the
>>>>> language in any way. Strongly-pure functions provide all the
>>>>> guarantees
>>>>> that pure does today and can be automatically parallelized or cached
>>>>> without consequence. Weakly-pure functions don't provide either of
>>>>> these guarantees, but allow a much larger number of functions to be
>>>>> strongly-pure. In order to guarantee a function is strongly pure, one
>>>>> would have to declare all its inputs immutable or use an appropriate
>>>>> template constraint.
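
For illustration, a minimal sketch of that distinction in D (the function names here are made up, and the weak/strong split itself never appears in the source; under the proposal the compiler infers it from the signatures):

pure void scaleAll(int[] data, int factor)
{
    // Weakly pure: no access to mutable global state, but the mutable
    // array parameter means the call reads and writes outside data, so
    // it cannot be safely cached or auto-parallelized.
    foreach (ref x; data)
        x *= factor;
}

pure int total(immutable(int)[] data)
{
    // Strongly pure: every parameter is immutable, so the result depends
    // only on the argument values and calls can be cached or run in
    // parallel without consequence.
    int sum = 0;
    foreach (x; data)
        sum += x;
    return sum;
}
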
>>>>
>>>> I think we need to be more realistic with what kinds of optimizations
>>>> we could expect from a D compiler and pure functions.
>>>> Caching might be done, but only in a temporary sense (caching under a
>>>> limited execution scope). I doubt we would ever have something like
>>>> memoization, which would incur memory costs (potentially quite big
>>>> ones), and so the compiler almost certainly would not be able to know
>>>> (without additional metadata/annotations or compile options) if that
>>>> trade-off is acceptable.
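
As an aside, opt-in memoization of this kind can be expressed in library code rather than by the compiler; a rough sketch using Phobos' std.functional.memoize, with a made-up expensive pure function:

import std.functional : memoize;

double slowSum(double x) pure nothrow
{
    // Stand-in for an expensive, strongly pure computation.
    double r = 0;
    foreach (i; 0 .. 10_000_000)
        r += x / (i + 1);
    return r;
}

void main()
{
    alias cached = memoize!slowSum;  // results kept in a hash table keyed on the argument
    auto a = cached(2.0);            // computed
    auto b = cached(2.0);            // looked up, not recomputed
    assert(a == b);
}
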
>>>>
>>>> Similarly for parallelism, how would the compiler know that it's ok to
>>>> spawn 10 or 100 new threads to parallelize the execution of some loop?
>>>> The consequences for program and whole-machine scheduling would not be
>>>> trivial and easy to understand. For this to happen, amongst other
>>>> things the compiler and OS would need to ensure that the spawned
>>>> threads would not starve the rest of the threads of that program.
>>>> I suspect all these considerations might be very difficult to
>>>> guarantee on a non-VM environment.
>>>>
>>>
>>> Ahem, it's trivial for the compiler to know if it's okay to spawn 10 or
>>> 100 _tasks_. Tasks, as opposed to threads or even thread pools, are
>>> extremely cheap (think on the order of function call overhead).
>>
>> What are these tasks you mention? I've never heard of them.
>>
>
> The programming language Cilk popularized the concept of parallelization
> through many small tasks combined with a work stealing runtime. Futures
> are essentially the same concept, but because futures were generally
> implemented with OS-threads, a thread pool or fibers/coroutines, that
> term is generally avoided. Like message passing, tasks are often
> implemented in libraries with Intel's threading building blocks probably
> being the most famous, though both Microsoft's Task Parallel Library and
> Apple's Grand Central are gaining mind-share. David Simcha currently has
> a task library in review for inclusion to phobos. Basically, the point
> of tasks is to provide parallelization with extremely low overhead (on
> average a Cilk spawn is less than 4 function calls). That way, instead
> of having a few coarse grain threads which neither scale nor load
> balance well, you're encouraged to use tasks everywhere and therefore
> reap the benefits of a balanced N-way scalable system.

Hum, I see what you mean now, but tasks only help with the *creation overhead* of otherwise spawning lots of OS threads; they don't solve the main problems I mentioned.
First, it may be fine to spawn 100 tasks, but there is still the issue of deciding how many OS threads the tasks will run on! Obviously, you won't run them in just one OS thread, otherwise you won't get any parallelism. Considering your program alone, ideally it would have as many OS threads as there are cores. But there is still the same issue of whether it's OK for your program to use up all the cores in your machine. The compiler doesn't know that. Could it be enough to have a global compiler option to specify that? I don't think so: what if you want some code of your program to use as many OS threads as possible, but not some other code?
Second, and perhaps more importantly, the very same issue occurs within the scope of your program alone. So, even if you use all OS threads and don't care about other programs, spawning 100 tasks for some loop might take time away from other, more important tasks of your program. The compiler/task-scheduler/whatever would not automatically know what is acceptable and what is not. (The only exception being if your program were logically single-threaded.)
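
To make the scenario concrete, here is roughly what "spawning 100 tasks for some loop" looks like against std.parallelism, the Phobos module that grew out of David Simcha's library mentioned above (someWork is a made-up stand-in): the 100 iterations are multiplexed onto a small worker pool, and the open question is only how that pool is sized.

import std.parallelism;

double someWork(int n)
{
    // Stand-in for real per-task work.
    double r = 0;
    foreach (i; 0 .. n)
        r += i;
    return r;
}

void main()
{
    // A future-like task: cheap to create, queued on the shared pool.
    auto t = task!someWork(1_000_000);
    taskPool.put(t);

    // A parallel loop: each chunk of iterations becomes a task scheduled
    // on the pool's worker threads (totalCPUs - 1 of them by default),
    // not on 100 separate OS threads.
    auto results = new double[](100);
    foreach (i, ref r; taskPool.parallel(results))
        r = someWork(cast(int) i);

    // Get the queued task's result: runs it here if no worker has picked
    // it up yet, otherwise waits for it to finish.
    auto v = t.yieldForce;
}
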

-- 
Bruno Medeiros - Software Engineer
November 02, 2010
On Mon, 01 Nov 2010 10:24:43 -0400, Bruno Medeiros <brunodomedeiros+spam@com.gmail> wrote:
> On 29/10/2010 02:32, Robert Jacques wrote:
[snip]
>> The programming language Cilk popularized the concept of parallelization
>> through many small tasks combined with a work stealing runtime. Futures
>> are essentially the same concept, but because futures were generally
>> implemented with OS-threads, a thread pool or fibers/coroutines, that
>> term is generally avoided. Like message passing, tasks are often
>> implemented in libraries with Intel's threading building blocks probably
>> being the most famous, though both Microsoft's Task Parallel Library and
>> Apple's Grand Central are gaining mind-share. David Simcha currently has
>> a task library in review for inclusion to phobos. Basically, the point
>> of tasks is to provide parallelization with extremely low overhead (on
>> average a Cilk spawn is less than 4 function calls). That way, instead
>> of having a few coarse grain threads which neither scale nor load
>> balance well, you're encouraged to use tasks everywhere and therefore
>> reap the benefits of a balanced N-way scalable system.
>
> Hum, I see what you mean now, but tasks only help with the *creation overhead* of otherwise spawning lots of OS threads; they don't solve the main problems I mentioned.
> First, it may be fine to spawn 100 tasks, but there is still the issue of deciding how many OS threads the tasks will run on! Obviously, you won't run them in just one OS thread, otherwise you won't get any parallelism. Considering your program alone, ideally it would have as many OS threads as there are cores. But there is still the same issue of whether it's OK for your program to use up all the cores in your machine. The compiler doesn't know that. Could it be enough to have a global compiler option to specify that? I don't think so: what if you want some code of your program to use as many OS threads as possible, but not some other code?
> Second, and perhaps more importantly, the very same issue occurs within the scope of your program alone. So, even if you use all OS threads and don't care about other programs, spawning 100 tasks for some loop might take time away from other, more important tasks of your program. The compiler/task-scheduler/whatever would not automatically know what is acceptable and what is not. (The only exception being if your program were logically single-threaded.)
>

Controlling the task runtime thread-pool size is trivial. Indeed, you'll often want to reduce the number of daemon threads by the number of active program threads. And if you need fine-grained control over pool sizes, you can always create separate pools and assign tasks to them. I think a reasonable default would be (# of cores - 2) daemons with automatic decreases/increases with every spawn/termination. But all those settings should be controllable at runtime.
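
For reference, a sketch of those knobs as they exist in std.parallelism, the eventual Phobos home of the library under review (square is a made-up stand-in; the automatic per-spawn adjustment described above would still have to be layered on top):

import std.parallelism;

int square(int n) { return n * n; }

void main()
{
    // Size the shared pool before it is first used, e.g. leave two cores
    // free for the program's own explicit threads.
    defaultPoolThreads = totalCPUs > 2 ? totalCPUs - 2 : 1;

    // Give one subsystem its own, separately sized pool.
    auto smallPool = new TaskPool(2);
    scope (exit) smallPool.finish();  // tell its workers to stop once the queue drains

    auto t = task!square(21);
    smallPool.put(t);                 // runs on smallPool's workers, not the shared pool's
    assert(t.yieldForce == 441);
}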

November 09, 2010
On 02/11/2010 03:26, Robert Jacques wrote:
> On Mon, 01 Nov 2010 10:24:43 -0400, Bruno Medeiros
> <brunodomedeiros+spam@com.gmail> wrote:
>> On 29/10/2010 02:32, Robert Jacques wrote:
> [snip]
>>> The programming language Cilk popularized the concept of parallelization
>>> through many small tasks combined with a work stealing runtime. Futures
>>> are essentially the same concept, but because futures were generally
>>> implemented with OS-threads, a thread pool or fibers/coroutines, that
>>> term is generally avoided. Like message passing, tasks are often
>>> implemented in libraries with Intel's threading building blocks probably
>>> being the most famous, though both Microsoft's Task Parallel Library and
>>> Apple's Grand Central are gaining mind-share. David Simcha currently has
>>> a task library in review for inclusion to phobos. Basically, the point
>>> of tasks is to provide parallelization with extremely low overhead (on
>>> average a Cilk spawn is less than 4 function calls). That way, instead
>>> of having a few coarse grain threads which neither scale nor load
>>> balance well, you're encouraged to use tasks everywhere and therefore
>>> reap the benefits of a balanced N-way scalable system.
>>
>> Hum, I see what you mean now, but tasks only help with the *creation
>> overhead* of otherwise spawning lots of OS threads; they don't solve
>> the main problems I mentioned.
>> First, it may be fine to spawn 100 tasks, but there is still the issue
>> of deciding how many OS threads the tasks will run on! Obviously, you
>> won't run them in just one OS thread, otherwise you won't get any
>> parallelism. Considering your program alone, ideally it would have as
>> many OS threads as there are cores. But there is still the same issue
>> of whether it's OK for your program to use up all the cores in your
>> machine. The compiler doesn't know that. Could it be enough to have a
>> global compiler option to specify that? I don't think so: what if you
>> want some code of your program to use as many OS threads as possible,
>> but not some other code?
>> Second, and perhaps more importantly, the very same issue occurs
>> within the scope of your program alone. So, even if you use all OS
>> threads and don't care about other programs, spawning 100 tasks for
>> some loop might take time away from other, more important tasks of
>> your program. The compiler/task-scheduler/whatever would not
>> automatically know what is acceptable and what is not. (The only
>> exception being if your program were logically single-threaded.)
>>
>
> Controlling the task runtime thread-pool size is trivial. Indeed, you'll
> often want to reduce the number of daemon threads by the number of
> active program threads. And if you need fine-grained control over pool
> sizes, you can always create separate pools and assign tasks to them. I
> think a reasonable default would be (# of cores - 2) daemons with
> automatic decreases/increases with every spawn/termination. But all
> those settings should be controllable at runtime.
>

I didn't say controlling pools/task-count/threads-count/priorities wasn't easy or trivial. I just claimed it should not be done automatically by the compiler, but rather "manually" in the code, and with fine-grained control.

-- 
Bruno Medeiros - Software Engineer