June 05, 2015
On Friday, 5 June 2015 at 14:17:35 UTC, Ola Fosheim Grøstad wrote:
> On Friday, 5 June 2015 at 13:20:27 UTC, Steven Schveighoffer wrote:
>> I think I'll go with Liran's experience over your hypothetical anecdotes. You seem to have a lot of academic knowledge, but I'd rather see what actually happens. If you have that data, please share.
>
> There is absolutely no reason to go personal. I address weak arguments when I see them. Liran claimed there were no benefits to migrating fibers. That's not true. He is speaking for his particular use case, and that is fine. It is easy to create a benchmark where locking fibers to a thread is beneficial. But it is completely orthogonal to my most likely D use case, which is low-latency web services.
>
> There will be no data that benefits D until D makes itself look like a serious contender and does well in aggressive external benchmarking. You don't get the luxury of choosing which workloads D's performance is benchmarked with!
>
> D is an underdog compared to C++/Rust/Go. That means you need to get that 10-20% performance edge in benchmarks to make D look attractive.

I agree, but I rather doubt that a slight performance edge will make the difference. There are loads of factors (knowledge base, infrastructure, complacency, C++ guruism, marketing, etc.) why D is an underdog.

> If you want D to succeed you need to figure out what is D's main selling point and make it a compiler-based feature. If it is a library only solution, then any language can steal your thunder...

The "problem" D has is that it has loads of selling points. Rust and Go were designed with very specific goals in mind, thus it's easy to sell them "You want X? We have X!". D has been developed over the years by a community not a committee. D is more like "You want X? Yeah, we have X, actually a slightly improved version of X we call it EX, and Y and Z on top of that. And A B C too! And templates!" - "Sorry, man! Too complicated for me! Can I just have a for-loop, please? Milk, no sugar, thanks." I know, as usual I simplify things and exaggerate! He he he. But programming languages are like everything else, only because something is good doesn't mean that people will buy it.

As regards compiler-based features: as soon as features are compiler-based, people will complain "Why is it built-in? That should be handled by a library! I want more freedom!" I know that for sure.
June 05, 2015
On 05-Jun-2015 17:04, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang@gmail.com> wrote:
> On Friday, 5 June 2015 at 13:44:16 UTC, Dmitry Olshansky wrote:
>> If there is affinity and we assume that the OS schedules threads on the
>> same cores*, then each core has its cache loaded with (some of) the
>> stacks of its fibers. If we assume sharing fibers across all cores,
>> then each core will have to cache the stacks of all fibers, which is
>> wasteful.
>
> If you cannot control affinity then you can't take advantage of
> hyper-threading either?

You choose to ignore the point about duplicating the same memory in each core's cache. To me it seems like throwing random CPU technologies at the discussion won't make your argument any stronger.

However, I stand corrected - there are syscalls to confine a thread to a specific subset of cores. The point about caches stands as is, since it only assumed that each thread prefers to run on the same core, not that it always runs on the same core.
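
For illustration, a rough sketch of what such a pinning call can look like from D on Linux. The declarations are hand-written assumptions to keep it self-contained; check the man pages and druntime before relying on them:

version (linux)
{
    // Hand-declared assumption: glibc's cpu_set_t is a 1024-bit mask
    // (16 ulongs) and sched_setaffinity takes (pid, size, mask).
    struct cpu_set_t { ulong[16] bits; }
    extern (C) int sched_setaffinity(int pid, size_t setsize, const cpu_set_t* mask);

    // Confine the calling thread to a single core.
    void pinToCore(uint core)
    {
        cpu_set_t set;
        set.bits[core / 64] |= 1UL << (core % 64);
        sched_setaffinity(0, cpu_set_t.sizeof, &set); // pid 0 = calling thread
    }
}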

> I need to think of this in terms of _smart_
> scheduling and adaptive load balancing.

Can't help you there, especially without a definition of the first.

Adaptive load-balancing is quite possible with fibers sticking to a thread and is a question of application design.

>> Moving fibers across threads has no effect on all of the above even
>> if there is some truth to it.
>
> In order to get benefits from hyper-threading you need to pay close
> attention to how you schedule, or you should turn it off.

I bet it still helps some workloads and hurts others without "me" scheduling anything. There are some things the OS can do just fine.

>> There is simply no way to control which core executes which thread to
>> begin with; this assignment is OS territory.
>
> If your OS is does not support hyper-threading level control you should
> turn it off...

Not sure if this is English, but I stand corrected in that one may set thread affinity for each thread manually. What I argued is that the default behavior is mostly the same, so the point stands as is.

>
>>> The only good reason for not switching is that you lack
>>> resources/know-how.
>>
>> Reasons were presented, but there is nothing in your answer that at
>> least acknowledges that.
>
> No, there were no performance related reasons,

I didn't say performance. Fast and incorrect is cheap.

> only TLS (which is a
> questionable feature to begin with).

Aye, no implicit data races by default is questionable design. What questions do you have?


-- 
Dmitry Olshansky
June 05, 2015
"Ola Fosheim "Grøstad\"" <ola.fosheim.grostad+dlang@gmail.com> writes:

> No, there were no performance related reasons, only TLS (which is a questionable feature to begin with).

On TLS and migrating fibers - these links were posted elsewhere, but I want to make sure that when you read about the TLS fiber problem here, it is understood to be something that could be solved by a compiler solution.

David has a good overview of the problem here:

https://github.com/ldc-developers/ldc/issues/666

And a Boost discussion showing D is not alone here:

http://www.crystalclearsoftware.com/soc/coroutine/coroutine/coroutine_thread.html
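
To make it concrete, here is a minimal D sketch of the pattern those links describe. The hazard is hypothetical in this form, since nothing below actually migrates the fiber:

import core.thread;

int counter; // module-level variables are thread-local (TLS) in D

void worker()
{
    counter = 1;   // writes the TLS slot of the thread running the fiber
    Fiber.yield(); // suspension point; a migrating scheduler could resume
                   // the fiber on another thread here
    counter += 1;  // if the compiler cached the address of `counter`
                   // across the yield, this would write into the *old*
                   // thread's TLS block after a migration
}

void main()
{
    auto f = new Fiber(&worker);
    f.call(); // runs until the yield
    f.call(); // resumes; the hazard only arises if this call were made
              // from a different thread
}
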
June 05, 2015
On Friday, 5 June 2015 at 14:51:05 UTC, Chris wrote:
> I agree, but I rather doubt that a slight performance edge will make the difference. There are loads of factors (knowledge base, infrastructure, complacency, C++ guruism, marketing, etc.) why D is an underdog.

But everybody loves the underdog when it catches up to the pack and beats it at the finish line. ;^)

I now follow Pony because of this self-provided benchmark:

http://ponylang.org/benchmarks_all.pdf

They are communicating a focus on a domain and a good understanding of their area, and it makes me want to give it a spin even at this early stage, where I obviously can't actually use it.

I am not saying Pony is good, but it makes a good case for itself IMO.

> no sugar, thanks." I know, as usual I simplify things and exaggerate! He he he. But programming languages are like everything else: just because something is good doesn't mean that people will buy it.

Sure, but it is also important to make people take notice. People take notice of benchmark leaders. And too often benchmarks measure throughput, while latency is just as important.

End users don't notice peak throughput (which shows up as little more than a blip in the cloud server instance-count logs); they notice reduced latency. So to me latency is the most important aspect of a web service (plus programmer productivity).

I don't find Go exciting, but they show concern for latency (concurrent GC, etc.). Communicating that concern is good, even before they reach whatever goals they have.

> As regards compiler-based features: as soon as features are compiler-based, people will complain "Why is it built-in? That should be handled by a library! I want more freedom!" I know that for sure.

Heh, not if it is getting you an edge; but if it is a second-class-citizen addition, yes, then I agree.

Cheers!
June 05, 2015
On Friday, 5 June 2015 at 15:18:59 UTC, Dan Olson wrote:
> On TLS and migrating fibers - these links were posted elsewhere, but I
> want to make sure that when you read about the TLS fiber problem here, it
> is understood to be something that could be solved by a compiler solution.

What I meant is that I don't have a use case for TLS in my own programs.

I think TLS is primarily useful for runtime-level concerns like thread-local allocators. I either read from global immutables or use lock-free data structures for sharing...
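
For the record, a tiny sketch of those two patterns (the names are made up):

import core.atomic;

// Global immutable: initialized once, freely readable from every thread.
immutable string[] endpoints = ["/health", "/api/v1"];

// Explicitly shared state updated with lock-free atomics - no TLS anywhere.
shared long requests;

void onRequest()
{
    atomicOp!"+="(requests, 1); // lock-free increment
}
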
June 05, 2015
On Friday, 5 June 2015 at 15:06:04 UTC, Dmitry Olshansky wrote:
> You choose to ignore the point about duplicating the same memory in each core's cache. To me it seems like throwing

Not sure what you mean by this. The third-level cache is shared. Die-level cache is shared. The primary caches are small and are shared between the sibling hyper-threads of a core. If a task has been suspended for 100 ms you can just assume the primary cache is cold.

> Adaptive load-balancing is quite possible with fibers sticking to a thread and is a question of application design.

Then you should not have fibers at all, since an event-based solution is even faster (but more work). Coroutines are a convenience feature, not a performance feature. You need control over workload scheduling to prevent third-level cache pollution. Random fine-grained scheduling is not good for memory-intensive workloads because you push data out of the caches prematurely.

> I bet it still helps some workloads and hurts others without "me" scheduling anything.

Hyper-threading only pays off when the two hardware threads sharing a core run suitable workloads at the same time. If not, you are better off just halting that extra logical core. The idea behind hyper-threading is that one thread fills holes in the pipeline while the other thread is stalled.

> Not sure if this is English,

When people pick on typos the debate is essentially over...

EOD
June 05, 2015
On Friday, 5 June 2015 at 17:28:39 UTC, Ola Fosheim Grøstad wrote:
> On Friday, 5 June 2015 at 14:51:05 UTC, Chris wrote:
>> I agree, but I rather doubt that a slight performance edge will make the difference. There are loads of factors (knowledge base, infrastructure, complacency, C++ guruism, marketing, etc.) why D is an underdog.
>
> I now follow Pony because of this self-provided benchmark:
>
> http://ponylang.org/benchmarks_all.pdf
>
> They are communicating a focus on a domain and a good understanding of their area, and it makes me want to give it a spin even at this early stage, where I obviously can't actually use it.
>
> I am not saying Pony is good, but it makes a good case for itself IMO.
>
> [...]

Thanks for showing me Pony. Languages like Nim and Pony keep popping up, which shows (a) how important native compilation is and (b) that there are still loads of issues in the established languages (C/C++/Python/Java/C#). But D is already there, it's already usable, and new languages often re-invent parts of D.
June 05, 2015
On 6/5/15 10:17 AM, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang@gmail.com> wrote:
> On Friday, 5 June 2015 at 13:20:27 UTC, Steven Schveighoffer wrote:
>> I think I'll go with Liran's experience over your hypothetical
>> anecdotes. You seem to have a lot of academic knowledge, but I'd
>> rather see what actually happens. If you have that data, please share.
>
> There is absolutely no reason to go personal.

I didn't, actually. Your arguments seem well crafted and persuasive, but I've seen so many arguments based on theory that don't always pan out. I like to see hard data. That's what Liran's experience provides. Perhaps you have it too? Please share if you do.

-Steve
June 06, 2015
On Friday, 5 June 2015 at 19:21:32 UTC, Steven Schveighoffer wrote:
> I didn't, actually. Your arguments seem well crafted and persuasive, but I've seen so many arguments based on theory that don't always pan out. I like to see hard data. That's what Liran's experience provides. Perhaps you have it too? Please share if you do.

I have absolutely no idea what you are talking about. Experience is data? Huh?

If you are talking about benchmarking: you do it by defining a baseline to measure against, then running a wide set of demanding workloads with increasing load until system performance collapses, and analyzing the outcome for each workload. One usually picks a best-of-breed "competitor" as the baseline; e.g. Nginx gained traction by benchmarking against Apache.
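
A minimal sketch of that ramp-up methodology in D; `measureP99` is a made-up placeholder for driving the system under test:

import std.stdio;

// Hypothetical placeholder: offer `rps` requests per second for a while
// and return the observed 99th-percentile latency in milliseconds.
double measureP99(int rps)
{
    // drive the system under test and the baseline the same way here
    return 0;
}

void main()
{
    foreach (rps; [100, 200, 400, 800, 1_600, 3_200])
    {
        auto p99 = measureP99(rps);
        writefln("%5d req/s -> p99 %.1f ms", rps, p99);
        if (p99 > 1_000) // latency has collapsed; end of the useful curve
            break;
    }
}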

If you are talking about multi-threading/fibers/event-based systems: you read the technical optimization manuals CPU vendors publish for each processor generation; they provide what you need to know when designing scheduling heuristics. The problem is how to give the scheduler meta-information. In event-based systems that is explicit; in D you could provide information through "yield", whether by profiling, analysis, or explicitly... but getting to event-based performance isn't all that easy...
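
Purely hypothetical, but something along these lines is what I mean by giving the scheduler meta-information through yield; no such API exists in druntime:

import core.thread : Fiber;

enum Hint { ioBound, cpuBound, memoryHeavy }

void smartYield(Hint hint)
{
    // A smarter scheduler could use `hint` to choose which fiber to run
    // next and on which core; plain druntime just suspends this fiber.
    Fiber.yield();
}
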
June 06, 2015
On 05/06/15 16:44, Dmitry Olshansky wrote:
>
> * You seem to assume the same. A fine assumption, given that the OS
> usually tries to keep the same cores working on the same threads, for
> similar reasons I believe.
>

I see that people have already raised the point that the OS does allow you to pin a thread to specific cores, so let's skip repeating that.

AFAIK, the reason the kernel tries to keep threads running on the same core they ran on before is that moving them requires so much locking, so many synchronizing instructions and barriers, that migrating threads between cores carries a huge cost.

Which turns out to be relevant to this discussion, because much of that will likely also be required in order to move fibers between threads.

A while back, a friend and I ran an (incomplete) research project where we tried reverting to the long-discarded "one thread per socket" model. It actually performed really well (much, much better than the "common wisdom" would have it perform), provided you did two things:
1. Use a thread pool; do not actually spawn a new thread each time a new incoming connection arrives,
and
2. Pin each pooled thread to a core; don't let it migrate.

Since we are talking about several tens of thousands of threads, each random fluctuation in the load made the kernel's scheduler want to migrate them, costing thousands of percent worth of performance. Once we locked the threads into place, we were, more or less, on par with micro-threading in terms of the overall load the server could take.
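
For the record, a greatly simplified D sketch of that model. The real experiment used tens of thousands of pooled threads, and `pinToCore` here is a hypothetical stand-in for the affinity call discussed earlier in the thread:

import core.thread;
import std.socket;

// Hypothetical stand-in; see the sched_setaffinity sketch earlier in the
// thread for one way to implement it on Linux.
void pinToCore(uint core) { /* platform-specific affinity call */ }

void serve(TcpSocket listener, uint core)
{
    pinToCore(core);
    for (;;)
    {
        auto conn = listener.accept();        // blocking accept
        scope (exit) conn.close();
        ubyte[4096] buf;
        ptrdiff_t n;
        while ((n = conn.receive(buf[])) > 0) // blocking per-connection I/O
            conn.send(buf[0 .. n]);           // echo; real work goes here
    }
}

// Building the delegate in a separate function gives each thread its own
// copy of `core`; capturing the foreach variable directly would not.
void delegate() makeServer(TcpSocket l, uint core)
{
    return { serve(l, core); };
}

void main()
{
    auto listener = new TcpSocket();
    listener.bind(new InternetAddress(8080));
    listener.listen(1024);
    foreach (core; 0 .. 4)                    // fixed pool of pinned threads
        new Thread(makeServer(listener, cast(uint) core)).start();
    thread_joinAll();
}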

Shachar