Thread overview
Let's talk about fibers
Jun 03, 2015
Liran Zvibel
Jun 04, 2015
Joakim
Jun 04, 2015
Liran Zvibel
Jun 04, 2015
Liran Zvibel
Jun 04, 2015
Ivan Timokhin
Jun 04, 2015
Jonathan M Davis
Jun 04, 2015
Marc Schütz
Jun 04, 2015
Dmitry Olshansky
Jun 04, 2015
Dan Olson
Jun 04, 2015
Jonathan M Davis
Jun 05, 2015
Chris
Jun 05, 2015
Chris
Jun 08, 2015
Paulo Pinto
Apr 16, 2016
maik klein
Apr 16, 2016
Dicebot
Jan 08, 2017
Suliman
Jan 08, 2017
Suliman
Jan 08, 2017
Chris Wright
Jan 08, 2017
Dicebot
Jan 08, 2017
Russel Winder
Jun 05, 2015
Dmitry Olshansky
Jun 05, 2015
Dmitry Olshansky
Jun 05, 2015
Dan Olson
Jun 06, 2015
Shachar Shemesh
Jun 05, 2015
Dicebot
Jun 05, 2015
Paolo Invernizzi
June 03, 2015
Hi,

We discussed (not) moving fibers between threads at DConf last week, and later it was discussed in the announce group. I think this matter is important enough to get a thread of its own.

Software fibers/coroutines were created to make asynchronous programming using a Reactor (or another "event loop i/o scheduler") more seamless.

For those unaware of the Reactor pattern, I advise reading [ http://en.wikipedia.org/wiki/Reactor_pattern ; http://www.dre.vanderbilt.edu/~schmidt/PDF/reactor-siemens.pdf ], and for some perspective on how other languages have addressed this I recommend watching Guido van Rossum's talk about asyncio and Python: https://www.youtube.com/watch?v=aurOB4qYuFM

The Reactor pattern has long been a widely accepted way to achieve low-latency async I/O, and it became famous thanks to the Web and the C10k problem. Using a reactor is the most efficient way to leverage current CPU architectures for lots of I/O, for many reasons outside of this scope.
Another very important quality of a reactor-based approach is that, since all event handlers serialize through a single I/O scheduler ("the reactor") on each thread, programmers who design for it correctly don't have to think about concurrency or worry about data races.

Another thing to note: when using the reactor pattern you have to make sure that no event handler ever blocks. Once an event handler blocks, the other handlers cannot run (the model is non-preemptive), effectively starving both themselves and the clients on the other side of the network.
Reactor implementations usually detect and report when an event handler takes too long before yielding control (the threshold is application-dependent, but it should be in the microsecond range on current hardware).

The downside of the reactor pattern used to be that the programmer had to manually keep the state/context of each event handler. Since each "logical" operation comprised many I/O transactions (some network protocol to track, perhaps a networked DB lookup for some data, reads/writes to local or remote files, etc.), the reactor kept a context for each callback and I/O event, and the programmer had to update that context and manually register new event handlers for every additional I/O transaction, in many cases changing callback registrations along the way.
This makes it more difficult to program for a reactor model, but since programmers don't have to think about races and concurrency issues (and then debug them...), our experience is that it is still more efficient than programming with general-purpose threads if you care about correctness/coherency.
One way to mitigate this complexity was the Proactor pattern -- implementing higher-level async I/O services over the reactor, thus sparing the programmer a lot of the low-level context headaches.

Up until now I did not say anything about Fibers/coroutines.

What fibers bring to the table is the ability to program within the reactor model without having to manually keep a context separate from the program logic, and without having to manually re/register callbacks for different I/O events.
D's fibers allowed us to create an async I/O library with support for network/file/disk operations and higher-level conditions (waiters, barriers, etc.) that lets the programmer write code as if it runs in its own thread (almost: sometimes fibers are explicitly "spawned", i.e. added to the reactor, and fiber conditions are slightly different from spawning and joining threads) without paying the huge correctness/coherence and performance penalties of the threading model.
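To make the suspend/resume mechanics concrete, here is a minimal sketch using D's core.thread.Fiber directly, with no reactor or I/O involved (the logging array is just illustrative): call() runs the fiber until its next yield(), and the next call() resumes it exactly where it left off, stack intact.

```d
import core.thread : Fiber;

// Runs a fiber in two steps and returns the order of events observed.
int[] runTwoSteps()
{
    int[] log;

    auto fib = new Fiber({
        log ~= 1;
        Fiber.yield();   // suspend; control returns to the caller
        log ~= 2;
    });

    fib.call();          // executes the fiber until the yield
    log ~= 10;           // we are back in the caller, between the steps
    fib.call();          // resumes right after the yield
    return log;
}

void main()
{
    assert(runTwoSteps() == [1, 10, 2]);
}
```

In a real reactor, the scheduler is the one issuing call(), and the fiber yields whenever it is waiting for I/O.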

There are two main reasons why it does not make sense to move fibers between threads:

1. You'll start having concurrency issues. Let's assume we have a main fiber that received some request, and it spawns 3 fibers looking into different DBs to get some info and update an array with the data. The array will probably be on the stack of the first fiber. If fibers don't move between threads, there is nothing to worry about (as expected by the model). If you start moving fibers across threads, you have to start guarding this array to make sure it stays coherent.
This is a simple example, but it shows that you're "losing" one of the biggest selling points of the whole reactor-based model.

2. Fibers and reactor-based I/O work well (read: make sense) when you have lots of concurrent, very small transactions (similar to the Web C10k problem or a storage machine). In that case, if one of the threads has more capacity than the rest, the I/O scheduler ("reactor") will just make sure to spawn the fibers accepting new transactions on that thread. If you are not in a situation where balancing can be done by placing new requests in the right place, then you probably should not use the reactor model at all, but a different one that suits your application better.
Currently we can spawn another reactor to take more load, but the load is balanced statically at a system-wide level. On previous projects we had several reactors running on different threads and providing very different functionality (with different handlers, naturally).
We never hit a situation where moving a fiber between threads made any sense.
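The shared-array scenario from point 1 can be sketched as follows (the names and the "3 DB lookups" are illustrative; real workers would issue async requests and yield while waiting). Because all fibers run on one thread and switch only at explicit points, the results array on the parent's stack needs no lock:

```d
import core.thread : Fiber;

int[3] collect()
{
    int[3] results;        // lives on the parent's stack, unguarded
    Fiber[] workers;

    static Fiber worker(size_t idx, int[] sink)
    {
        return new Fiber({
            // a real worker would issue an async DB request and
            // Fiber.yield() here while waiting for the reply
            sink[idx] = cast(int)(idx * 10);   // placeholder "DB result"
        });
    }

    foreach (i; 0 .. 3)
        workers ~= worker(i, results[]);

    // cooperative: each call() runs a worker on *this* thread until
    // it yields or finishes, so nothing races on `results`
    foreach (w; workers)
        w.call();

    return results;
}

void main()
{
    assert(collect() == [0, 10, 20]);
}
```

If any of these fibers could be resumed on another thread, every access to `results` would suddenly need synchronization.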

As we can see, there is nothing to gain and lots to lose by moving fibers between threads.

Now, if we want to make sure fibers are well supported in D there are several other things we should do:

1. Implement a good async I/O library that supports fiber-based programming. I don't know Vibe.d very well (i.e. at all); maybe we (Weka.IO) can help review it and suggest ways to make it into a general async I/O library (we have over 15 years of experience developing with the reactor model in many environments).

2. Add better compiler support. The one problem with fibers is that upon creation you have to know the stack size for that fiber, and different functions create different stack depths. It is very convenient to use the stack to hold all objects (recall Walter's first-day talk, for example), and it can serve as a very convenient way to "garbage collect" all resources acquired during the run of that fiber, but currently we don't leverage it to the max since we have no good way to know/limit the amount of memory used this way.
If the compiler were able to analyze stack usage by functions (recursively) and give us hints about upper bounds on stack usage, we would be able to use the stack more aggressively and utilize memory much better.
Also, I think such static analysis would be a big selling point of D for systems like ours.

I think now everything is written down, and we can move the discussion here.

Liran.
June 04, 2015
On Wednesday, 3 June 2015 at 18:34:34 UTC, Liran Zvibel wrote:
> There are two main reasons why it does not make sense to move fibers between threads:
>
> 1. You'll start having concurrency issues. Lets assume we have a main fiber that received some request, and it spawns 3 fibers looking into different DBs to get some info and update an array with the data. The array will probably be on the stack of the first fiber. If fibers don't move between threads, there is nothing to worry about (as expected by the model). If you start moving fibers across threads you have to start guarding this array now, to make sure it's still coherent.
> This is a simple example, but basically shows that you're "losing" one of the biggest selling point of the whole reactor based model.
>
> 2. Fibers and reactor based IO make work well (read: make sense) when you have a situation where you have lots of concurrent very small transactions (similar to the Web C10k problem or a storage machine). In this case, if one of the threads has more capacity than the rest, then the IO scheduler ("reactor") will just make sure to spawn new fibers accepting new transactions in that fiber. If you don't have a situation that balancing can be done via placing new requests in the right place, then probably you should not use the reactor model, but a different one that suits your application better.
> Currently we can spawn another reactor to take more load, but the load is balanced statically at a system-wide level. On previous projects we had several reactors running on different threads and providing very different functionality (with different handlers, naturally).
> We never got to a situation that moving a fiber between threads made any sense.
>
> As we see, there is nothing to gain and lots to lose by moving fibers between threads.

Your entire argument seems based on fibers moving between threads
breaking your reactor IO model.  If there was an option to
disable fibers moving or if you had to explicitly ask for a fiber
to move, your argument is moot.

I have no dog in this fight, just pointing out that your argument
is very specific to your use.
June 04, 2015
On Thursday, 4 June 2015 at 01:51:25 UTC, Joakim wrote:
> Your entire argument seems based on fibers moving between threads
> breaking your reactor IO model.  If there was an option to
> disable fibers moving or if you had to explicitly ask for a fiber
> to move, your argument is moot.
>
> I have no dog in this fight, just pointing out that your argument
> is very specific to your use.

This is not "my" reactor I/O model; this is the model that was popularized by ACE in the '90s (and since this is how I got to know it, this is what I call it), and it later became the asyncio programming model.
This model was important enough for Guido van Rossum to spend a lot of his time adding it to Python, and Google created a whole programming language around it [and I can give more references to that model if you like].

My point is that moving fibers between threads is difficult to implement and makes the model WEAKER. So you work hard and get less (or just never use that feature you worked hard on, as it breaks the model).

The main problem with adding flexibility is that initially it always sounds like a "good idea". I just want to stress the point that in this case it's actually not such a good idea.

If you can come up with another programming model that leverages fibers (and is popular), and moving fibers between threads makes sense in that model, then I think the discussion should be about how much stronger that other model is with fibers being able to move, and whether it's worth the effort.

Since I think you won't come up with a very good case for moving them between threads in that other popular programming model, and since it's difficult to implement, and since it already makes one popular programming model weaker -- I suggest not doing it.

Currently asyncio is supported well by D (Vibe.d and Weka.IO are using it) without this ability.

At the end of my post I suggested using the resources freed by not-moving-fibers differently: just endorse the asyncio programming model rather than add generic "flexibility" features.
June 04, 2015
On Thursday, 4 June 2015 at 07:24:48 UTC, Liran Zvibel wrote:
> Since I think you won't come up with a very good case to moving them between threads on that other popular programming model,

INCOMING WORKLOAD ("__" denotes yield+delay):

a____aaaaaaa
        b____bbbbbb
        c____cccccccc
        d____dddddd
        e____eeeeeee

SCHEDULING WITHOUT MIGRATION:

CORE 1: aaaaaaaa
CORE 2: bcdef___bbbbbbccccccccddddddeeeeeee


SCHEDULING WITH MIGRATION:

CORE 1: aaaaaaaacccccccceeeeeee
CORE 2: bcdef___bbbbbbdddddd

And this isn't even a worst case scenario. Please note that it is common to start a task by looking up global caches first. So this is a common pattern:

1. look up caches
2. wait for response
3. process
June 04, 2015
On Thu, Jun 04, 2015 at 07:24:47AM +0000, Liran Zvibel wrote:
> If you can come up with another programming model that leverages fibers (and is popular), and moving fibers between threads makes sense in that model, then I think the discussion should be how stronger that other model is with fibers being able to move, and whether it's worth the effort.

This might be relevant: https://channel9.msdn.com/Events/GoingNative/2013/Bringing-await-to-Cpp

Specifically slide 12 (~12:30 in the video), where he discusses implementation.
June 04, 2015
I mostly agree with what you wrote, but I'd like to point out that it's probably safe to move some kinds of fibers across threads:

If the fiber's main function is pure and its parameters have no mutable indirection (i.e. if the function is strongly pure), there should be no way to get data races.

Therefore I believe we could theoretically support moving such fibers. But currently I see no way most fibers could be made pure; after all, you want to do I/O in them. Of course, we could forego the purity requirement, but then the compiler can no longer support us.
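For illustration, here is the kind of code such a migratable fiber would be restricted to: a strongly pure function, i.e. `pure` with parameters carrying no mutable indirection (this toy function is my own example, not from any library). The compiler rejects any access to mutable global state inside it, which is exactly the property that would make cross-thread migration safe.

```d
// `pure` + value parameters (no mutable indirection) = strongly pure.
// The compiler rejects reads/writes of mutable globals in here, so
// code like this cannot observe which thread it is running on.
int sumSquares(int n) pure
{
    int total;
    foreach (i; 1 .. n + 1)
        total += i * i;
    return total;
}

void main()
{
    assert(sumSquares(3) == 14);   // 1 + 4 + 9
}
```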
June 04, 2015
On 03-Jun-2015 21:34, Liran Zvibel wrote:
> Hi,
>
[snip]

> There are two main reasons why it does not make sense to move fibers
> between threads:
>

For me, the language being TLS-by-default is enough reason to not even try this madness. If we allow moves, a typical fiber will see different "globals" depending on where it is scheduled next.

For instance, if a thread-local connection is used (inside some pool, presumably), then:

Socket socket;                  // thread-local, like all D globals by default

first_part = socket.read(...);  // assume this read yields the fiber
second_part = socket.read(...); // if resumed on another thread, this is
                                // a *different* thread's socket instance



-- 
Dmitry Olshansky
June 04, 2015
On 6/3/15 9:51 PM, Joakim wrote:

>
> Your entire argument seems based on fibers moving between threads
> breaking your reactor IO model.  If there was an option to
> disable fibers moving or if you had to explicitly ask for a fiber
> to move, your argument is moot.
>
> I have no dog in this fight, just pointing out that your argument
> is very specific to your use.

I plead complete ignorance and inexperience with fibers and thread scheduling.

But I think the sanest approach here is to NOT support moving fibers, and then add support if it becomes necessary. We can make the scheduler something that's parameterized, or hell, just edit your own runtime if you need it!

It may also be that fibers that move can't be statically checked to see if they will break on moving. That may simply be on you, like casting.

I think for the most part, the safest default is to have a fiber scheduler that cannot possibly create races. Let's build from there.

-Steve
June 04, 2015
On Thursday, 4 June 2015 at 08:43:31 UTC, Ola Fosheim Grøstad wrote:
> On Thursday, 4 June 2015 at 07:24:48 UTC, Liran Zvibel wrote:
>> Since I think you won't come up with a very good case to moving them between threads on that other popular programming model,
>
> INCOMING WORKLOAD ("__" denotes yield+delay):
>
> a____aaaaaaa
>         b____bbbbbb
>         c____cccccccc
>         d____dddddd
>         e____eeeeeee
>
> SCHEDULING WITHOUT MIGRATION:
>
> CORE 1: aaaaaaaa
> CORE 2: bcdef___bbbbbbccccccccddddddeeeeeee
>
>
> SCHEDULING WITH MIGRATION:
>
> CORE 1: aaaaaaaacccccccceeeeeee
> CORE 2: bcdef___bbbbbbdddddd
>
> And this isn't even a worst case scenario. Please note that it is common to start a task by looking up global caches first. So this is a common pattern:
>
> 1. look up caches
> 2. wait for response
> 3. process

Fibers are good when you get tons of new work constantly.

If you just have a few things that run forever, you're most probably better off with threads.

It's true that you can misuse fibers and then complain that things don't work well for you, but I don't think that should be supported by the language.

If you assume that new jobs always come in (and then you schedule new jobs onto the more-empty threads), there is no need to balance old jobs (they will finish very soon anyway).

If you have a blocking operation, it should not be in a fiber anyway.
We have a deferToThread mechanism with a thread pool that runs such functions (for when we want to do something that takes some time, or use an external library).
Fibers should never, ever block. If your fiber is blocking, you're violating the model.
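(Our deferToThread is internal to Weka's stack, but the shape of the idea can be sketched with std.parallelism's thread pool: the blocking call runs on a pool thread, never inside a fiber, and a real reactor would suspend the calling fiber until the result is ready. The names below are only a stand-in for that mechanism.)

```d
import std.parallelism : task, taskPool;
import core.thread : Thread;
import core.time : msecs;

// Stand-in for a blocking call (a syscall, an external library, ...).
int blockingWork(int x)
{
    Thread.sleep(10.msecs);   // simulate blocking; never do this in a fiber
    return x * 2;
}

void main()
{
    auto t = task!blockingWork(21);
    taskPool.put(t);          // runs on a pool thread, not in a fiber

    // In a reactor, the calling fiber would yield here and be resumed
    // on completion; in plain main() we simply block for the result.
    assert(t.yieldForce == 42);
}
```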

Fibers aren't some magic to solve every CS problem possible. There is a defined class of problems that work well for fibers, and there fibers should be utilized (and even then with great discipline). If your problem is not one of these -- use another form of concurrency/parallelism. One of my main arguments against Go is "If your only tool is a hammer, then every problem looks like a nail" -- D should not go that route.

Looking at your example: a good scheduler should have distributed a-e evenly across both cores to begin with. Then a good fiber programmer should yield() after each unit of work, so aaaaaaa won't be a valid state. Finally, the blocking code should have run outside the fiber I/O scheduler, with that fiber just waiting in suspended mode until it's runnable again, allowing other fibers to execute.

June 04, 2015
Dmitry Olshansky <dmitry.olsh@gmail.com> writes:

> On 03-Jun-2015 21:34, Liran Zvibel wrote:
>> Hi,
>>
> [snip]
>
>> There are two main reasons why it does not make sense to move fibers between threads:
>>
>
> For me language being TLS by default is enough to not even try this madness. If we allow moves a typical fiber will see different "globals" depending on where it is scheduled next.

There's the opposite problem too: with LLVM's TLS optimizations, the Fiber may keep accessing the same "global" even when yield() resumes on a different thread.

int someTls;             // optimizer caches this TLS address

auto fib = new Fiber({
    for (;;)
    {
        printf("%d fiber before yield\n", someTls);
        ++someTls;       // thread A's var
        Fiber.yield();
        ++someTls;       // resumed on thread B, but still A's var
        printf("%d fiber after yield\n", someTls);
    }
});