February 05, 2014
On Wednesday, 5 February 2014 at 10:56:23 UTC, Bienlein wrote:
> On Wednesday, 5 February 2014 at 01:02:37 UTC, Sean Kelly wrote:
>> Okay, just for fun, here are some results with the new scheduler.
>>  I injected periodic yields into the code to simulate the
>> yielding that would happen automatically if the code was using
>> send and receive.  First the code:
>
> Hi Sean,
>
> By "send and receive", do you mean adding to a channel and doing a blocking take on it? Just for me to build up an understanding.

Sort of. std.concurrency uses the actor model. So it's messaging, but not the CSP model used by Go. We should probably offer both, but for now it's just actors. And because you basically have one channel per thread, the limiting factor to date is how many threads you can sanely run simultaneously. Actor-oriented languages typically use green threads instead of kernel threads so the number of threads can scale. In Erlang, a "process" (i.e. a thread) is equivalent to a class in D, so there tend to be a lot of them.
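For reference, the basic messaging pattern looks something like this (a minimal sketch; each spawned thread gets its own mailbox, and receive blocks until a matching message arrives):

```d
// Minimal std.concurrency actor sketch: one mailbox per spawned thread,
// untyped asynchronous sends, pattern-matched receives.
import std.concurrency;
import std.stdio;

void worker()
{
    // Blocks until a message of a matching type arrives in this
    // thread's mailbox (under a fiber scheduler this would yield).
    receive((int n) { writeln("got ", n); });
}

void main()
{
    auto tid = spawn(&worker);  // spawn an "actor" with its own mailbox
    tid.send(42);               // asynchronous send; no explicit channel object
}
```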
February 05, 2014
On Wednesday, 5 February 2014 at 14:40:46 UTC, Sean Kelly wrote:
> Sort of. std.concurrency uses the actor model. So it's messaging, but not the CSP model used by Go. We should probably offer both, but for now it's just actors. And because you basically have one channel per thread, the limiting factor to date is how many threads you can sanely run simultaneously. Actor-oriented languages typically use green threads instead of kernel threads so the number of threads can scale. In Erlang, a "process" (i.e. a thread) is equivalent to a class in D, so there tend to be a lot of them.

On a very well equipped machine, 10,000 threads is about the maximum for the JVM. Now for D, 1,000,000 kernel threads are not a problem!? Well, I'm a D newbie and a bit confused now... I have to ask some questions while trying not to bug people. Apparently, a kernel thread in D is not an OS thread. Does D have its own threading model then? I couldn't see that from what I found on dlang.org. Is the measurement result for fibers that much better than for threads because fibers have less context-switching overhead? Will actors in D benefit from your FiberScheduler once it has been released? Do you know in which version of D your FiberScheduler is planned to be included?

In Go you can easily spawn 100,000 goroutines (aka green threads), probably several hundred thousand. Being able to spawn way more than 100,000 threads in D, with the low context-switching overhead of fibers, puts you basically in the same league as Go. And D is a really rich language, in contrast to Go. This looks cool :-)

February 05, 2014
On Wednesday, 5 February 2014 at 15:38:43 UTC, Bienlein wrote:
>
> On a very well equipped machine, 10,000 threads is about the maximum for the JVM. Now for D, 1,000,000 kernel threads are not a problem!? Well, I'm a D newbie and a bit confused now... I have to ask some questions while trying not to bug people. Apparently, a kernel thread in D is not an OS thread. Does D have its own threading model then? I couldn't see that from what I found on dlang.org. Is the measurement result for fibers that much better than for threads because fibers have less context-switching overhead? Will actors in D benefit from your FiberScheduler once it has been released? Do you know in which version of D your FiberScheduler is planned to be included?

Well, I spawned 1 million threads, but there's no guarantee that
1 million were running concurrently.  So I decided to run a test.
  I forced the code to block until all threads were started, and
when using kernel threads this hung with 2047 threads running
(this is on OSX).  So I think OSX has a hard internal limit of
2047 threads.  It's possible this can be extended somehow, but I
didn't investigate.  And since I don't currently have a great way
to block fibers, what I was doing there was a busy wait, which
was just slow going waiting for all the threads to spin up.

Next I just figured I'd keep a high water mark for concurrent
thread count for the code I posted yesterday.  Both fibers and
kernel threads topped out at about 10.  For fibers, this makes
perfect sense given the yield strategy (each client thread yields
10 times while running).  And I guess the scheduling for kernel
threads made that come out about the same.  So the fact that I
was able to spawn 1 million kernel threads doesn't actually mean
a whole lot.  I should have thought about that more yesterday.
Because of the added synchronization for counting threads, everything
slowed down a bit, so I reduced the number of threads to 100,000.
  Here are some timings:

$ time concurrency threads
numThreadsToSpawn = 100000, maxConcurrent = 12

real	1m8.573s
user	1m22.516s
sys	0m27.985s

$ time concurrency fibers
numThreadsToSpawn = 100000, maxConcurrent = 10

real	0m5.860s
user	0m3.493s
sys	0m2.361s
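Roughly, the counting looked something like this (a simplified sketch, not the exact benchmark code; the names and the placeholder work loop are illustrative):

```d
// Sketch of tracking a high-water mark for concurrently running threads.
import core.atomic;
import core.thread;      // for thread_joinAll
import std.concurrency;
import std.stdio;

shared int live;         // how many client threads are running right now
shared int highWater;    // the most we ever saw running at once

void client()
{
    auto n = atomicOp!"+="(live, 1);
    if (n > atomicLoad(highWater))
        atomicStore(highWater, n);   // racy max update, but close enough
                                     // for a rough high-water mark
    // ... simulated work with periodic yields would go here ...
    atomicOp!"-="(live, 1);
}

void main()
{
    enum numThreadsToSpawn = 1_000;  // far fewer than the 100,000 above
    foreach (i; 0 .. numThreadsToSpawn)
        spawn(&client);
    thread_joinAll();                // wait for every spawned thread
    writeln("maxConcurrent = ", atomicLoad(highWater));
}
```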

So in short, a "kernel thread" in D (which is equivalent to
instantiating a core.thread.Thread) is an OS thread.  The fibers
are user-space threads that context switch when explicitly
yielded and use core.thread.Fiber.
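A minimal illustration of that explicit-yield behavior with core.thread.Fiber (a fiber only gives up control where it says so):

```d
// core.thread.Fiber: context switches happen only at explicit yield
// points, unlike kernel threads, which the OS preempts at will.
import core.thread;
import std.stdio;

void main()
{
    auto fib = new Fiber({
        writeln("step 1");
        Fiber.yield();          // hand control back to the caller
        writeln("step 2");
    });

    fib.call();                 // runs until the first yield
    writeln("between yields");
    fib.call();                 // resumes after the yield, runs to completion
    assert(fib.state == Fiber.State.TERM);
}
```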

One thing to note about the FiberScheduler is that I haven't
sorted out a solution for thread-local storage.  So if you're
using the FiberScheduler and each "thread" is accessing some
global static data it expects to be exclusive to itself, you'll
end up with an undefined result.  Making D's "thread-local by
default" actually be fiber-local when using fibers is a pretty
hard problem to solve, and can be dealt with later if the need
arises.  My hope, however, is that by making the choice of
scheduler user-defined, the user can choose the appropriate
threading model for their application, and we can hopefully
sidestep the need to sort this out.  It was the main
issue blocking my doing this ages ago, and I didn't think of this
pluggable approach until recently.
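A tiny example of the caveat (illustrative only): two fibers multiplexed onto one kernel thread share the same "thread-local" global.

```d
// D globals are thread-local by default, but fibers running on the same
// kernel thread all see the same thread-local storage.
import core.thread;

int counter;  // thread-local, yet shared by every fiber on this thread

void main()
{
    auto a = new Fiber({ counter += 1; Fiber.yield(); counter += 1; });
    auto b = new Fiber({ counter += 10; });

    a.call();   // counter == 1, then a yields
    b.call();   // b sees a's update: counter == 11
    a.call();   // a resumes: counter == 12 -- the fibers interleaved
    assert(counter == 12);
}
```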

The obvious gain here is that std.concurrency is no longer
strictly limited by the overhead of kernel threads, and so can be
used more according to the actor model as was originally
intended.  I can imagine more complex schedulers multiplexing
fibers across a pool of kernel threads, for example.  The
FiberScheduler is more a proof of concept than anything.

As for when this will be available... I will have a pull request
sorted out shortly, so you could start playing with it soon.  It
being included in an actual release means a review and such, but
as this is really just a fairly succinct change to an existing
module, I hope it won't be terribly contentious.


> In Go you can easily spawn 100,000 goroutines (aka green threads), probably several hundred thousand. Being able to spawn way more than 100,000 threads in D, with the low context-switching overhead of fibers, puts you basically in the same league as Go. And D is a really rich language, in contrast to Go. This looks cool :-)

Yeah, I think it's exciting.  I had originally modeled
std.concurrency after Erlang and like the way the syntax worked
out, but using kernel threads is limiting.  I'm interested to see
how this scales once people start playing with it.  It's possible
that some tuning of when yields occur may be needed as time goes
on, but that really needs more eyes than my own and probably
multiple real world tests as well.

As some general background on actors vs. CSP in std.concurrency,
I chose actors for two reasons.  First, the communication model
for actors is unstructured, so it's adaptable to a lot of
different application designs.  If you want structure you can
impose it at the protocol level, but it isn't necessary to do
so--simply using std.concurrency requires practically no code at
all for the simple case.  And second, I wasn't terribly fond of
the "sequential" part of CSP.  I really want a messaging model
that scales horizontally across processes and across hosts, and
the CSP algebra doesn't work that way.  At the time, I found a
few algebras that were attempting to basically merge the two
approaches, but nothing really stood out.
February 06, 2014
> "How We Went from 30 Servers to 2: Go". Link: http://blog.iron.io/2013/03/how-we-went-from-30-servers-to-2-go.html

Heh, here is a more interesting interpretation of this article: http://versusit.org/go-vs-ruby
February 06, 2014
On Wednesday, 5 February 2014 at 20:37:44 UTC, Sean Kelly wrote:

> As for when this will be available... I will have a pull request
> sorted out shortly, so you could start playing with it soon.  It
> being included in an actual release means a review and such, but
> as this is really just a fairly succinct change to an existing
> module, I hope it won't be terribly contentious.

Sounds good. So I only need to watch the GitHub repo for Phobos and I will get notified? Or do I need to watch some other repo for D on GitHub? Just to be on the safe side, since I'm new to D and not familiar with the way things are split up.

> ... And second, I wasn't terribly fond of
> the "sequential" part of CSP.  I really want a messaging model
> that scales horizontally across processes and across hosts, and
> the CSP algebra doesn't work that way.

What is nice about CSP is that you can prove that your code is free of deadlocks. The Go guys have developed a tool that parses the code and then tells you what it has found.

> As some general background on actors vs. CSP in std.concurrency,
> I chose actors for two reasons.  First, the communication model
> for actors is unstructured, so it's adaptable to a lot of
> different application designs.

Yeah, I understand the reasoning. In its level of granularity, CSP sits somewhere between low-level locks/semaphores/etc. and high-level actors. I guess you can easily build actors on top of CSP. In the case of D, actors are not as blown up as, for example, in Scala or Akka. Creating an actor is mostly like spawning a thread, so actors in D are much less heavyweight than in Scala/Akka. Actors in D must also have a message queue, like channels in CSP, into which the message is inserted when some tid.send(...) is done; it's just not accessible from the outside.

> ...  It's possible this can be extended somehow, but I
> didn't investigate.  And since I don't currently have a great way
> to block fibers, what I was doing there was a busy wait, which
> was just slow going waiting for all the threads to spin up.

Goroutines in Go are also co-operative (i.e. not pre-emptive), though I'm not sure. They probably yield when a channel has run empty. Well, then they have to, in order to detach the thread that serves the channel and prevent the system from running out of threads. I guess they may have a strategy for when to yield based on how long other channels had to wait to get a thread attached to them. For that purpose, maybe there is a way to measure the traffic in the message queues of actors in D to get some effective yielding done. Just a thought; I'm not really an expert here.

> Heh, here is a more interesting interpretation of this article: http://versusit.org/go-vs-ruby

Thanks for the link. It seems like the whole success story with Go in this article is based on using goroutines and channels. So getting something similar accomplished in D would be important for D to be used for scalable/elastic server-side software. Rust is basically using the same approach as Go with regard to threading. There seems to be something to it.

Cheers, Bienlein
February 06, 2014
Here is a document about the scheduler design in Go:
https://docs.google.com/document/d/1TTj4T2JO42uD5ID9e89oa0sLKhJYD0Y_kqxDv3I3XMw/edit

The C sources for the Go scheduler are here:
http://code.google.com/p/go/source/browse/src/pkg/runtime/proc.c?r=01acf1dbe91f673f6308248b8f45ec0564b1d751

Maybe it could be useful, just in case... ;-)

February 06, 2014
> What is nice about CSP is that you can prove that your code is free of deadlocks. The Go guys have developed a tool that parses the code and then tells you what it has found.

Note that the Go race detector isn't a static analysis tool that identifies deadlocks at compile time; it instruments the code and then detects race conditions at runtime. It's based on the C/C++ ThreadSanitizer runtime library, so a similar thing could probably be implemented for D.

> Goroutines in Go are also co-operative, but I'm not sure (e.g. not pre-emptive).

The Go scheduler can perform a limited form of pre-emptive scheduling; from the version 1.2 release notes:
"In prior releases, a goroutine that was looping forever could starve out other goroutines on the same thread, a serious problem when GOMAXPROCS provided only one user thread. In Go 1.2, this is partially addressed: The scheduler is invoked occasionally upon entry to a function. This means that any loop that includes a (non-inlined) function call can be pre-empted, allowing other goroutines to run on the same thread."

> Rust is basically using the same approach as Go with regard to threading.

Rust is actually moving away from directly tying the language to one kind of threading, so that it's possible to choose between M:N threading (goroutines) or 1:1 threading (system threads). See this discussion: https://mail.mozilla.org/pipermail/rust-dev/2013-November/006550.html for the reasoning behind this.
February 06, 2014
On Thursday, 6 February 2014 at 13:00:51 UTC, logicchains wrote:

> Note that the Go race detector isn't a static analysis tool that identifies deadlocks at compile time; it instruments the code and then detects race conditions at runtime. It's based on the C/C++ ThreadSanitizer runtime library, so a similar thing could probably be implemented for D.

Thanks for pointing that out. I seem to have interpreted the information I had too optimistically.

> Rust is actually moving away from directly tying the language to one kind of threading, so that it's possible to choose between M:N threading (goroutines) or 1:1 threading (system threads). See this discussion: https://mail.mozilla.org/pipermail/rust-dev/2013-November/006550.html for the reasoning behind this.

Yes, I read an interview on infoq.com saying the same thing, which confused me a bit. M:N threading is still there, but is there still as much focus on it as with the Go people? Anyway, as long as D continues its own way with fibers... ;-)

February 06, 2014
On Wednesday, 5 February 2014 at 20:37:44 UTC, Sean Kelly wrote:
>
> As for when this will be available... I will have a pull request
> sorted out shortly, so you could start playing with it soon.  It
> being included in an actual release means a review and such, but
> as this is really just a fairly succinct change to an existing
> module, I hope it won't be terribly contentious.

https://github.com/D-Programming-Language/phobos/pull/1910
February 06, 2014
On Thursday, 6 February 2014 at 19:24:39 UTC, Sean Kelly wrote:
> https://github.com/D-Programming-Language/phobos/pull/1910

x-posted to vibe.d newsgroup. Awesome!