June 04, 2015
On Thursday, 4 June 2015 at 13:42:41 UTC, Liran Zvibel wrote:
> If you assume that new jobs always come in (and you schedule new jobs onto the least-loaded fibers), there is no need to balance old jobs (those will finish very soon anyway).

That assumes that the tasks don't do much work but just wait and wait and wait.


>
> If you have a blocking operation it should not be in fibers anyways.
> We have a deferToThread mechanism with a thread pool that waits for such functions (if we want to do something that takes some time, or use an external library).
> Fibers should never ever block. If your fiber is blocking, you're violating the model.
>
> Fibers aren't some magic to solve every CS problem possible.

Actually, co-routines have been basic concurrency building blocks since the 50s, and from a CS perspective the degree of parallelism is an implementation detail.
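As an aside, a minimal sketch of what such a deferToThread helper could look like (the name comes from the quote above; a raw Thread stands in for the real thread pool, and the polling yield() loop stands in for a proper event-loop wakeup):

import core.atomic : atomicLoad, atomicStore;
import core.thread : Fiber, Thread;

// Run a blocking callable on a worker thread while the calling fiber
// yields, so the scheduler keeps servicing other fibers meanwhile.
// Must be called from inside a fiber.
T deferToThread(T)(T delegate() blocking)
{
    shared bool done = false;
    T result;
    auto worker = new Thread({
        result = blocking();      // the slow, blocking part runs off-fiber
        atomicStore(done, true);
    });
    worker.start();
    while (!atomicLoad(done))
        Fiber.yield();            // cooperative: let other fibers run
    worker.join();
    return result;
}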

> Looking at your example -- a good scheduler should have distributed a-e evenly across both cores to begin with.

Nah, because that would require an a priori estimate of how long each job will take.

> Then a good fibers programmer should yield() after each unit of work, so aaaaaaa won't be a valid state.

Won't work when you call external libraries. Here is a likely pattern for an image scaling service:

1. check cache
2. request data if not found
3. process, save in cache and return

1____________2____________33333333

You can't just break up workload 3; you would run out of memory.
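Here is a sketch of that handler (all helper names are hypothetical, and scaleImage stands in for an opaque external-library call that cannot yield internally):

import core.thread : Fiber;

// Stubs standing in for the service's real I/O and the external library.
ubyte[] cacheLookup(string key) { Fiber.yield(); return null; } // 1
ubyte[] fetchData(string key)   { Fiber.yield(); return [1]; }  // 2
ubyte[] scaleImage(ubyte[] raw) { return raw; } // 3: opaque, cannot yield
void cacheStore(string key, ubyte[] img) {}
void respond(ubyte[] img) {}

void handleRequest(string key)
{
    auto cached = cacheLookup(key);  // yields while waiting on the cache
    if (cached !is null) { respond(cached); return; }

    auto raw = fetchData(key);       // yields while waiting on upstream
    auto scaled = scaleImage(raw);   // the whole 33333333 stretch: one
    cacheStore(key, scaled);         // external call, no yield() possible
    respond(scaled);
}

void main()
{
    auto f = new Fiber({ handleRequest("some-key"); });
    while (f.state != Fiber.State.TERM)
        f.call(); // a real event loop would resume on I/O readiness instead
}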
June 04, 2015
On Thursday, 4 June 2015 at 13:16:48 UTC, Steven Schveighoffer wrote:
> On 6/3/15 9:51 PM, Joakim wrote:
>
>>
>> Your entire argument seems based on fibers moving between threads
>> breaking your reactor IO model.  If there was an option to
>> disable fibers moving or if you had to explicitly ask for a fiber
>> to move, your argument is moot.
>>
>> I have no dog in this fight, just pointing out that your argument
>> is very specific to your use.
>
> I plead complete ignorance and inexperience with fibers and thread scheduling.
>
> But I think the sanest approach here is to NOT support moving fibers, and then add support if it becomes necessary. We can make the scheduler something that's parameterized, or hell, just edit your own runtime if you need it!
>
> It may also be that fibers that move can't be statically checked to see if they will break on moving. That may simply be on you, like casting.
>
> I think for the most part, the safest default is to have a fiber scheduler that cannot possibly create races. Let's build from there.

One thing that needs to be considered, which deadalnix pointed out at dconf, is that we _do_ have shared(Fiber), and we have to deal with that in some manner, even if we don't want to support moving fibers across threads (even if that simply means disallowing shared(Fiber)).
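For concreteness, this is the construct in question; the type system accepts it today even though its semantics are exactly what is undecided:

import core.thread : Fiber;

void main()
{
    // Compiles today: a shared(Fiber), which by its type claims to be
    // safely accessible from multiple threads.
    shared Fiber f = cast(shared) new Fiber({ /* some work */ });
}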

- Jonathan M Davis
June 04, 2015
On Wednesday, 3 June 2015 at 18:34:34 UTC, Liran Zvibel wrote:
> As we see, there is nothing to gain and lots to lose by moving fibers between threads.

Given that it sounds like LLVM _can't_ implement moving fibers (or if it can, it'll really hurt performance), I think that we need a really compelling reason to allow it. And I haven't heard one from anyone thus far.

Initially, at dconf, Walter asserted that we needed to make fibers moveable across threads, but I haven't really heard anyone give a reason why we need to. deadalnix talked about load balancing that way, but you gave good reasons as to why that didn't make sense, and that argument is the closest that I've seen to a reason why it would make sense to move fibers across threads.

Now, like Steven, I've never used a fiber in my life (I really should look into them one of these days), so I'm ill-suited for making a decision on this, but it sounds to me like we should start by having it be illegal to move fibers across threads and then add the ability later if someone comes up with a good enough reason. Certainly, it sounds questionable that it even _can_ be implemented, and costly if it can.

Another approach would be to make it so that shared(Fiber) could be moved across threads but Fiber couldn't be (or at least, it would be undefined behavior if you did, since the compiler would assume that you won't). Then, if the 3 major backends can all support moving fibers across threads (even in an inefficient fashion), we can implement that support for shared(Fiber) and say that folks are free to shoot themselves in the foot using it if they so desire, while Fiber stays more restrictive and doesn't take the performance hit incurred by allowing fibers to be passed across threads.
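A rough sketch of what that split could look like with std.concurrency message passing (hedged: the scheduler details are omitted, and resuming on the receiving thread is precisely the operation whose cost and safety are in question):

import core.thread : Fiber;
import std.concurrency : receiveOnly, send, spawn;

// Only the shared(Fiber) variant may legally cross a thread boundary;
// a plain Fiber would stay pinned to its creating thread.
void worker()
{
    auto f = receiveOnly!(shared Fiber)();
    (cast() f).call(); // cast shared away and resume here, at our own risk
}

void main()
{
    auto fib = new Fiber({ /* unit of work */ });
    auto tid = spawn(&worker);
    tid.send(cast(shared) fib);
}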

But if LLVM really can't support moving fibers across threads, then I think that the clear answer is that we shouldn't allow it at all (in which case, shared(Fiber) should probably be outright disallowed).

- Jonathan M Davis
June 05, 2015
For the record: I am fully with Liran on this one.
June 05, 2015
On Friday, 5 June 2015 at 06:03:13 UTC, Dicebot wrote:
> For the record: I am fully with Liran on this one.

+1 also for me.

At work we are using fibers when appropriate, and I see no advantages in moving them.

/P
June 05, 2015
On Thursday, 4 June 2015 at 22:28:52 UTC, Jonathan M Davis wrote:
> anyone give a reason why we need to. deadalnix talked about load balancing that way, but you gave good reasons as to why that didn't make sense,

What good reasons?

By the time you get a response from your shared memcache or database, the x86 level 1 cache (and possibly level 2) is cold. And level 3 is shared, so there is no cache penalty for switching cores. Add to this that cores share primary caches in pairs, so if you don't pair tasks that address the same memory you lose up to 10-20% performance, in addition to unused capacity and increased latency. Smart scheduling matters, both at the OS level and at the application level. That's not a controversial statement (only in these forums…)!
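For illustration, pinning a thread to a pair of cache-sharing logical cores on Linux might look like this (assuming druntime's core.sys.linux.sched bindings; which core IDs are hyper-thread siblings is machine-specific):

version (linux)
{
    import core.sys.linux.sched : CPU_SET, CPU_ZERO, cpu_set_t,
        sched_setaffinity;

    // Pin the calling thread to two logical cores, e.g. hyper-thread
    // siblings, so tasks touching the same memory share warm caches.
    void pinToCores(int a, int b)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(a, &set);
        CPU_SET(b, &set);
        sched_setaffinity(0, cpu_set_t.sizeof, &set); // 0 = calling thread
    }
}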

The only good reason for not switching is that you lack resources/know-how. But then you probably should not make it a language feature in the first place...?

There is no reason to pretend that synthetic performance benchmarks don't carry weight when people pick a language for production. That's just wishful thinking.
June 05, 2015
On 6/5/15 7:29 AM, Ola Fosheim Grøstad <ola.fosheim.grostad+dlang@gmail.com> wrote:
> On Thursday, 4 June 2015 at 22:28:52 UTC, Jonathan M Davis wrote:
>> anyone give a reason why we need to. deadalnix talked about load
>> balancing that way, but you gave good reasons as to why that didn't
>> make sense,
>
> What good reasons?
>
> By the time you get a response from your shared memcache or database,
> the x86 level 1 cache (and possibly level 2) is cold. And level 3 is
> shared, so there is no cache penalty for switching cores. Add to this
> that cores share primary caches in pairs, so if you don't pair tasks
> that address the same memory you lose up to 10-20% performance, in
> addition to unused capacity and increased latency.

I think I'll go with Liran's experience over your hypothetical anecdotes. You seem to have a lot of academic knowledge, but I'd rather see what actually happens. If you have that data, please share.

-Steve
June 05, 2015
On 05-Jun-2015 14:29, Ola Fosheim Grøstad <ola.fosheim.grostad+dlang@gmail.com> wrote:
> On Thursday, 4 June 2015 at 22:28:52 UTC, Jonathan M Davis wrote:
>> anyone give a reason why we need to. deadalnix talked about load
>> balancing that way, but you gave good reasons as to why that didn't
>> make sense,
>
> What good reasons?
>
> By the time you get a response from your shared memcache or database,
> the x86 level 1 cache (and possibly level 2) is cold.

Cache arguments are hard to get right without experiments. That "possibly" may be enough compared to a certainly cold cache.

However, I'll answer theoretically to an equally theoretical argument.

If there is affinity, and we assume that the OS schedules threads on the same cores*, then each core has its cache loaded with (some of) the stacks of its fibers. If we assume fibers are shared across all cores, then each core will have to cache the stacks of all fibers, which is wasteful.

So fiber affinity => that much less burden on each core's caches, making them that much hotter.

* You seem to assume the same. A fine assumption, given that the OS usually tries to keep the same cores working on the same threads, for similar reasons I believe.
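To make the affinity argument concrete, here is a minimal sketch of a scheduler with fiber affinity (hypothetical, no work stealing): each worker thread creates and resumes only its own fibers, so only those stacks compete for its core's cache.

import core.thread : Fiber, Thread;
import std.algorithm : remove;

final class Worker
{
    private void delegate()[] jobs;
    private Fiber[] queue;

    void post(void delegate() job) { jobs ~= job; } // before start only

    void run()
    {
        foreach (j; jobs)
            queue ~= new Fiber(j); // fibers are born on their home thread
        while (queue.length)
        {
            foreach (f; queue)
                f.call();          // and are only ever resumed here
            queue = queue.remove!(f => f.state == Fiber.State.TERM);
        }
    }
}

void main()
{
    auto w = new Worker;
    w.post({ /* unit of work */ });
    auto t = new Thread(&w.run);
    t.start();
    t.join();
}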

> Add to this that cores share primary caches in pairs, so if you don't
> pair tasks that address the same memory you lose up to 10-20% performance,
> in addition to unused capacity and increased latency. Smart scheduling
> matters, both at the OS level and at the application level. That's not
> a controversial statement (only in these forums…)!

Moving fibers across threads has no effect on any of the above, even if there is some truth to it. There is simply no way to control which core executes which thread to begin with; that assignment is OS territory.

>
> The only good reason for not switching is that you lack
> resources/know-how.

Reasons were presented, but there is nothing in your answer that acknowledges them.

> But then you probably should not make it a language
> feature in the first place...?

Then it's a good chance for you to prove your design by experimentation. That is, if we all accept the concurrency issues of moving fibers that violate some language guarantees.

-- 
Dmitry Olshansky
June 05, 2015
On Friday, 5 June 2015 at 13:44:16 UTC, Dmitry Olshansky wrote:
> If there is affinity, and we assume that the OS schedules threads on the same cores*, then each core has its cache loaded with (some of) the stacks of its fibers. If we assume fibers are shared across all cores, then each core will have to cache the stacks of all fibers, which is wasteful.

If you cannot control affinity then you can't take advantage of hyper-threading either? I need to think of this in terms of _smart_ scheduling and adaptive load balancing.

> Moving fibers across threads has no effect on any of the above, even if there is some truth to it.

In order to get benefits from hyper-threading you need to pay close attention to how you schedule, or you should turn it off.

> There is simply no way to control which core executes which thread to begin with; that assignment is OS territory.

If your OS does not support hyper-threading-level control, you should turn it off...

>> The only good reason for not switching is that you lack
>> resources/know-how.
>
> Reasons were presented, but there is nothing in your answer that acknowledges them.

No, there were no performance-related reasons, only TLS (which is a questionable feature to begin with).

> Then it's a good chance for you to prove your design by experimentation. That is, if we all accept the concurrency issues of moving fibers that violate some language guarantees.

There is nothing to prove. You either perform worse or better than a carefully scheduled event-based solution in C++. You either perform worse or better than Go 1.5 in scheduling and GC.

However, doing well in externally designed and executed benchmarks on _language_ _features_ is good marketing (even if that 10-20% edge does not matter in real world applications).

Right now, neither concurrency nor the GC is really a D language feature; they are more like library/runtime features. That makes it difficult to excel in those areas. In languages like Go, Erlang, and Pony, concurrency is a language feature.

June 05, 2015
On Friday, 5 June 2015 at 13:20:27 UTC, Steven Schveighoffer wrote:
> I think I'll go with Liran's experience over your hypothetical anecdotes. You seem to have a lot of academic knowledge, but I'd rather see what actually happens. If you have that data, please share.

There is absolutely no reason to make this personal. I address weak arguments when I see them. Liran claimed there were no benefits to migrating fibers; that's not true. He is speaking for his particular use case, and that is fine. It is easy to create a benchmark where locking fibers to a thread is beneficial, but that is completely orthogonal to my most likely D use case, which is low-latency web services.

There will be no data that benefits D until D makes itself look like a serious contender and does well in aggressive external benchmarking. You don't get the luxury of choosing which workload D's performance is benchmarked with!

D is an underdog compared to C++/Rust/Go. That means you need to get that 10-20% performance edge in benchmarks to make D look attractive.

If you want D to succeed, you need to figure out what D's main selling point is and make it a compiler-based feature. If it is a library-only solution, then any language can steal your thunder...