Threadpools, difference between DMD and LDC
August 03, 2014
I'm trying to grok message passing. That's my very first foray
into this, so I'm probably making every mistake in the book :-)

I wrote a small threadpool test, it's there:

http://dpaste.dzfl.pl/3d3a65a00425

I'm playing with the number of threads and the number of tasks,
and getting a feel about how message passing works. I must say I
quite like it: it's a bit like suddenly being able to safely
return different types from a function.
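
To give an idea of the kind of message passing I mean, here is a
minimal spawn/send/receive sketch (not the actual code from the dpaste
link; the worker and the message types are made up purely for
illustration):

    import std.concurrency;
    import std.stdio;

    void worker()
    {
        // Answer with a different type depending on the message received,
        // which is what makes it feel like "returning different types".
        receive(
            (int n, Tid owner)    { owner.send(n * 2); },
            (string s, Tid owner) { owner.send("got: " ~ s); }
        );
    }

    void main()
    {
        auto child = spawn(&worker);
        child.send(21, thisTid);
        writeln(receiveOnly!int());   // 42
    }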

What I don't get is the difference between DMD (I'm using 2.065)
and LDC (0.14-alpha1).

For DMD, I compile with -O -inline -noboundscheck
For LDC, I use -O3 -inline

LDC gives me smaller executables than DMD (also, 3 to 5 times
smaller than 0.13, good job!) but above all else incredibly,
astoundingly faster. I'm used to LDC producing 20-30% faster
programs, but here it's 1000 times faster!

8 threads, 1000 tasks: DMD:  4000 ms, LDC: 3 ms (!)

So my current hypothesis is a) I'm doing something wrong or b)
the tasks are optimized away or something.

Can someone confirm the results and tell me what I'm doing wrong?
August 03, 2014
On Sunday, 3 August 2014 at 19:52:42 UTC, Philippe Sigaud wrote:
>
> Can someone confirm the results and tell me what I'm doing wrong?

LDC is likely optimizing the summation:

    int sum = 0;
    foreach(i; 0..task.goal)
        sum += i;

To something like:

    int sum = cast(int)(cast(ulong)(task.goal-1)*task.goal/2);
August 03, 2014
On Sunday, 3 August 2014 at 22:24:22 UTC, safety0ff wrote:
> On Sunday, 3 August 2014 at 19:52:42 UTC, Philippe Sigaud wrote:
>>
>> Can someone confirm the results and tell me what I'm doing wrong?
>
> LDC is likely optimizing the summation:
>
>     int sum = 0;
>     foreach(i; 0..task.goal)
>         sum += i;
>
> To something like:
>
>     int sum = cast(int)(cast(ulong)(task.goal-1)*task.goal/2);

This is correct – the LLVM optimizer indeed gets rid of the loop completely.

Although I'd be more than happy to be able to claim a thousandfold speedup over DMD on real-world applications. ;)

Cheers,
David
August 04, 2014
> This is correct – the LLVM optimizer indeed gets rid of the loop completely.

OK, that's clever. But I get this even when I put a writeln("some msg") inside the task. I thought a write couldn't be optimized away like that, and that it's a slow operation?

Anyway, I discovered Thread.sleep() in core in the meantime, I'll use that. I just wanted to have tasks taking a different amount of time each time.

I have another question: it seems I can spawn hundreds of threads
(Heck, even 10_000 is accepted), even when I have 4-8 cores. Is there
a limit to the number of threads? I tried a threadpool
because in my application I feared having to spawn ~100-200 threads
but if that's not the case, I can drastically simplify my code.
Is spawning a thread a slow operation in general?

August 04, 2014
On Monday, 4 August 2014 at 05:14:22 UTC, Philippe Sigaud via Digitalmars-d-learn wrote:
>
> I have another question: it seems I can spawn hundreds of threads
> (Heck, even 10_000 is accepted), even when I have 4-8 cores. Is there
> a limit to the number of threads? I tried a threadpool
> because in my application I feared having to spawn ~100-200 threads
> but if that's not the case, I can drastically simplify my code.
> Is spawning a thread a slow operation in general?

Without going into much detail: Threads are heavy, and creating a thread is an expensive operation (which is partially why virtually every standard library includes a ThreadPool). Along with the cost of creating each thread, you also get the overhead of additional context switches for every thread you have actively running. Context switches are expensive and a significant waste of time: your CPU sits there doing effectively nothing while the OS decides which thread runs next and restores its context. With 10,000 threads, even if you don't run into a limit on how many threads you can have, this overhead becomes very significant.

I haven't looked at your code in detail, but consider using the TaskPool if you just want to schedule some tasks to run amongst a few threads, or potentially using Fibers (which are fairly light-weight) instead of Threads.
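
For example, here is a rough sketch of the Fiber idea using core.thread
(purely illustrative; the counts and the toy workload are made up):

    import core.thread : Fiber;
    import std.stdio;

    void main()
    {
        int counter = 0;

        // Thousands of fibers are cheap: each is just a small stack plus a
        // saved context, with no kernel thread or OS scheduling behind it.
        Fiber[] fibers;
        foreach (i; 0 .. 1_000)
            fibers ~= new Fiber({
                counter += 1;
                Fiber.yield();   // cooperatively hand control back to main
                counter += 1;
            });

        // "Schedule" them ourselves: resume each fiber until it terminates.
        foreach (f; fibers)
            while (f.state != Fiber.State.TERM)
                f.call();

        writeln(counter);   // 2000
    }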
August 04, 2014
On Monday, 4 August 2014 at 05:14:22 UTC, Philippe Sigaud via Digitalmars-d-learn wrote:
>> This is correct – the LLVM optimizer indeed gets rid of the loop completely.
>
> OK,that's clever. But I get this even when put a writeln("some msg")
> inside the task. I thought a write couldn't be optimized away that way
> and that it's a slow operation?

You need the _result_ of the computation for the writeln. LLVM's optimizer recognizes what the loop tries to compute, though, and replaces it with an equivalent expression for the sum of the series, as Trass3r alluded to.

Cheers,
David
August 04, 2014
> Without going into much detail: Threads are heavy, and creating a thread is an expensive operation (which is partially why virtually every standard library includes a ThreadPool).

> I haven't looked at your code in detail, but consider using the TaskPool if you just want to schedule some tasks to run amongst a few threads, or potentially using Fibers (which are fairly light-weight) instead of Threads.

OK, I get it. Just to be sure, there is no ThreadPool in Phobos or in
core, right?
IIRC, there are fibers somewhere in core, I'll have a look. I also
heard that vibe.d has them.
August 04, 2014
On Monday, 4 August 2014 at 12:05:31 UTC, Philippe Sigaud via Digitalmars-d-learn wrote:
> OK, I get it. Just to be sure, there is no ThreadPool in Phobos or in
> core, right?
> IIRC, there are fibers somewhere in core, I'll have a look. I also
> heard that vibe.d has them.

There is. It's called taskPool, though:

http://dlang.org/phobos/std_parallelism.html#.taskPool
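
A small usage sketch, mirroring the pattern from the docs (the work
function and the goal values are made up for illustration):

    import std.parallelism;
    import std.stdio;

    int work(int goal)
    {
        int sum = 0;
        foreach (i; 0 .. goal)
            sum += i;
        return sum;
    }

    void main()
    {
        // One-off task queued on the lazily created global pool
        // (roughly one worker thread per core by default).
        auto t = task!work(10_000);
        taskPool.put(t);
        writeln(t.yieldForce);   // blocks until a worker has run it

        // Or run a whole batch of goals over the pool in one call.
        writeln(taskPool.amap!work([1_000, 5_000, 10_000]));
    }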
August 04, 2014
On Mon, Aug 4, 2014 at 2:13 PM, Chris Cain via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com> wrote:

>> OK, I get it. Just to be sure, there is no ThreadPool in Phobos or in core, right?

> There is. It's called taskPool, though:
>
> http://dlang.org/phobos/std_parallelism.html#.taskPool

Ah, std.parallelism. I stoopidly searched in std.concurrency and core.* Thanks!
August 04, 2014
On Monday, 4 August 2014 at 05:14:22 UTC, Philippe Sigaud via Digitalmars-d-learn wrote:
> I have another question: it seems I can spawn hundreds of threads
> (Heck, even 10_000 is accepted), even when I have 4-8 cores. Is there
> a limit to the number of threads? I tried a threadpool
> because in my application I feared having to spawn ~100-200 threads
> but if that's not the case, I can drastically simplify my code.
> Is spawning a thread a slow operation in general?

Most likely those threads either do nothing or are short-lived, so you don't actually get 10,000 threads running simultaneously. In general you should expect your operating system to start stalling at a few thousand concurrent threads competing for context switches and system resources. Creating a new thread is a rather costly operation, though you may not spot it in synthetic snippets, only under actual load.

The modern default approach is to have a number of "worker" threads identical or close to the number of CPU cores, and to handle internal scheduling manually via fibers or some similar solution.
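
For instance, a rough sketch of that setup with std.parallelism (the
pool size and the toy task are illustrative assumptions):

    import std.parallelism : TaskPool, task, totalCPUs;
    import std.stdio;

    void main()
    {
        auto pool = new TaskPool(totalCPUs);   // one worker thread per core
        scope(exit) pool.finish(true);         // drain the queue, then shut down

        auto t = task!((int n) => n * n)(7);
        pool.put(t);                           // queue it on the fixed-size pool
        writeln(t.yieldForce);                 // 49
    }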

If you are totally new to the topic of concurrent services, getting familiar with http://en.wikipedia.org/wiki/C10k_problem may be useful :)