December 29, 2013
On Sunday, 29 December 2013 at 13:15:14 UTC, Ola Fosheim Grøstad wrote:
> I think neither Go, D or this language is as performant as (skilled use of) C/C++.

This is not true. Assuming skilled use and same compiler backend those are equally performant. D lacks some low-level control C has (which is important for embedded) but it is not directly related to performance.
December 29, 2013
On Sunday, 29 December 2013 at 13:46:07 UTC, Dicebot wrote:
> This is not true. Assuming skilled use and same compiler backend those are equally performant. D lacks some low-level control C has (which is important for embedded) but it is not directly related to performance.

That low-level control also matters for performance when you have hard deadlines. E.g. when the GC kicks in, it not only hogs all the threads that participate in GC, it also trashes the caches unless you have a GC implementation that bypasses the caches. Sustained trashing of caches is bad.

C has low-level, low resource usage defaults. While you can do the same in some other languages, they tend to default to more expensive use patterns. For example, D defaults to GC and thread-local storage. Defaults affect library design, which in turn affects performance. (Thread-local storage requires either an extra indirection through a register or multiple kernel-level page tables per process.)
December 29, 2013
On 12/29/13 5:15 AM, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang@gmail.com> wrote:
> On Sunday, 29 December 2013 at 06:00:31 UTC, Adam Wilson wrote:
>> I want to make a point here that many people come to D looking for
>> something that is as performant as C++ with the ease of C# or Java,
>> and for the most part (using LDC/GDC) you get exactly that. This
>> language could convince me to go back to C#.
>
> I think neither Go, D or this language is as performant as (skilled use
> of) C/C++.

Wait, what? Go excused itself out of the competition, and you'd need to bring some evidence that D is not as fast/tight as C++. I have accumulated quite a bit of evidence the other way without even trying.

This also smacks of "no true Scotsman" (http://en.wikipedia.org/wiki/No_true_Scotsman). Any inefficient C++ code (owing to hidden costs of features like unnecessary copying, rigidity of the language which discourages aggressive optimization refactoring, the many traps for the unwary that make the simplest and most intuitive code often be the least efficient) can be nicely swept under the rug as "unskilled" use. By that same argument there is a "skilled" use of D that avoids creating garbage in inner loops, uses allocating stdlib functions judiciously etc. etc.

Clearly there's work we need to do on improving particularly the standard library. But claiming that D code can't be efficient because of some stdlib artifacts is like claiming C++ code can't do efficient I/O because it must use iostreams (which are indeed objectively and undeniably horrifically slow). Neither argument has merit.


Andrei
December 29, 2013
On 12/29/13 6:35 AM, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang@gmail.com> wrote:
> On Sunday, 29 December 2013 at 13:46:07 UTC, Dicebot wrote:
>> This is not true. Assuming skilled use and same compiler backend those
>> are equally performant. D lacks some low-level control C has (which is
>> important for embedded) but it is not directly related to performance.
>
> That low-level control also matters for performance, when you have hard
> deadlines. E.g. when the GC kicks in, it not only hogs all the threads
> that participate in GC, it also trashes the caches unless you have a GC
> implementation that bypasses the caches. Sustained trashing of caches is
> bad.

Yeah how about using deterministic deallocation in the inner loops - that's the only place where it matters.

> C has low-level, low resource usage defaults. While you can do the same
> in some other languages they tend to default to more expensive use
> patterns. Like D defaults to stuff like GC and thread-local-storage.
> Defaults affect library design, which in turn affects performance.
> (thread local storage requires either an extra indirection through a
> register or multiple kernel level page tables per process)

It is my opinion that safety is the best default, at least here; global storage is very often an antipattern in single-threaded applications and almost always so in multithreaded ones. I think C got it wrong there and D is in better shape.


Andrei

December 29, 2013
On Sunday, 29 December 2013 at 14:35:44 UTC, Ola Fosheim Grøstad wrote:
> That low-level control also matters for performance when you have hard deadlines. E.g. when the GC kicks in, it not only hogs all the threads that participate in GC, it also trashes the caches unless you have a GC implementation that bypasses the caches. Sustained trashing of caches is bad.

Common misconception. For the absolute majority of programs it never makes a difference. I have some experience with programs where such a difference really matters, and I often argue about it on this NG. But applying it as a general performance criterion is an overstatement at best. In practice most user-space applications are likely to be faster in a higher-level garbage-collected language, because it lets you spend more time on architecture and algorithms, which are always the primary bottlenecks.

December 29, 2013
On Sunday, 29 December 2013 at 15:22:29 UTC, Andrei Alexandrescu wrote:
> Wait, what? Go excused itself out of the competition, and you'd

Agree. I consider Go to be a web-service language atm.

> need to bring some evidence that D is not as fast/tight as C++. I have accumulated quite a bit of evidence the other way without even trying.

One example: Performant C++ is actually C with a bit of C++ convenience, so you toss out exception handling, stack unwinding and even turn off stack frames. With that and allocation pools you can backtrack by simply setting the stack pointer and dropping the pool. C is so barebones that you can do your own coroutines without language support if you wish.

As long as you only call nothrow functions you can do this? So you can use slower C++ convenience for initialization and close-to-the-metal after that.
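The "drop the pool to backtrack" part can be sketched roughly as follows. This is a minimal bump-pointer pool, not code from any particular project: the `Pool` class and its `mark`/`rewind` names are made up for illustration, and it assumes the objects placed in the pool need no destructor calls. The stack-pointer half of the trick is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical bump-pointer pool over a caller-supplied buffer.
// Nothing here throws or calls malloc, so it fits a "nothrow only" region.
class Pool {
public:
    Pool(void* buffer, std::size_t size) noexcept
        : base_(static_cast<std::uint8_t*>(buffer)), size_(size), used_(0) {}

    // Returns nullptr when the pool is exhausted.
    void* alloc(std::size_t bytes,
                std::size_t align = alignof(std::max_align_t)) noexcept {
        assert(align != 0 && (align & (align - 1)) == 0);  // power of two
        const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(base_);
        const std::uintptr_t aligned =
            (base + used_ + align - 1) & ~(static_cast<std::uintptr_t>(align) - 1);
        const std::size_t start = static_cast<std::size_t>(aligned - base);
        if (start > size_ || bytes > size_ - start) return nullptr;
        used_ = start + bytes;
        return base_ + start;
    }

    // Remember the current position so later allocations can be discarded.
    std::size_t mark() const noexcept { return used_; }

    // Backtrack: everything allocated after `m` is dropped in O(1).
    // No destructors run, so only trivially destructible data belongs here.
    void rewind(std::size_t m) noexcept { assert(m <= used_); used_ = m; }

    // Drop the whole pool.
    void reset() noexcept { used_ = 0; }

    std::size_t used() const noexcept { return used_; }

private:
    std::uint8_t* base_;
    std::size_t size_;
    std::size_t used_;
};
```

Typical use would be to take a `mark()` before a speculative computation, allocate freely, and call `rewind()` with that mark if the attempt is abandoned.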

> the standard library. But claiming that D code can't be efficient because of some stdlib artifacts is like claiming C++ code can't do efficient I/O because it must use iostreams (which are indeed objectively and undeniably horrifically slow). Neither argument has merit.

Well, but people who care about real-time performance in C++ use libraries that stay clear of those areas. The C++ stdlib is more for medium-performance code sections than high-performance code.

The GC trashes caches when it kicks in. That affects real-time threads, where you basically have hard real-time requirements. That means you need more headroom (you can do less signal processing in an audio real-time thread).
December 29, 2013
On Sun, 29 Dec 2013 17:54:51 +0000, Ola Fosheim Grøstad wrote:

> C is so barebones that you can do your own coroutines without language support if you wish.

You can do that in D too. core.thread.Fiber is implemented in D (with a bit of inline assembly), without any special language support.

> The GC trashes caches when it kicks in.

It can only kick in on allocation. In parts of your code where latency is crucial, just avoid allocating from the garbage collected heap.
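The shape of that advice is language-neutral, so here is a rough sketch in C++, since that is the baseline being compared against. `GainStage` and its members are made-up names for illustration only. The point is just that the buffer is sized once up front and the latency-critical call only touches memory that already exists. In D the callback would likewise avoid `new`, array concatenation and other operations that allocate from the GC heap.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical audio-style processing stage; all names are illustrative.
class GainStage {
public:
    // Allocation happens once here, outside the latency-critical path.
    explicit GainStage(std::size_t max_frames) : scratch_(max_frames) {}

    // Latency-critical path: no allocation, only reads and writes to
    // memory that already exists. Returns the number of frames processed,
    // clamped to the capacity chosen at construction.
    std::size_t process(const float* in, float* out,
                        std::size_t frames, float gain) {
        assert(in != nullptr && out != nullptr);
        if (frames > scratch_.size()) frames = scratch_.size();
        for (std::size_t i = 0; i < frames; ++i) scratch_[i] = in[i] * gain;
        for (std::size_t i = 0; i < frames; ++i) out[i] = scratch_[i];
        return frames;
    }

    std::size_t capacity() const { return scratch_.size(); }

private:
    std::vector<float> scratch_;  // sized in the constructor, never resized
};
```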
December 29, 2013
On Sunday, 29 December 2013 at 15:26:35 UTC, Andrei Alexandrescu wrote:
> On 12/29/13 6:35 AM, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang@gmail.com> wrote:
>> On Sunday, 29 December 2013 at 13:46:07 UTC, Dicebot wrote:
>>> This is not true. Assuming skilled use and same compiler backend those
>>> are equally performant. D lacks some low-level control C has (which is
>>> important for embedded) but it is not directly related to performance.
>>
>> That low-level control also matters for performance, when you have hard
>> deadlines. E.g. when the GC kicks in, it not only hogs all the threads
>> that participate in GC, it also trashes the caches unless you have a GC
>> implementation that bypasses the caches. Sustained trashing of caches is
>> bad.
>
> Yeah how about using deterministic deallocation in the inner loops - that's the only place where it matters.

If this is indeed true then it sounds like a standard technique people should be aware of. I really hope it's stuck in Ali Cehreli's book (which is awesome) before it's considered completed and released. It would be very nice to have something to point the GC-hating crowd to as a technique and ask them to present examples where the technique isn't enough.
December 29, 2013
On Sunday, 29 December 2013 at 18:29:52 UTC, jerro wrote:
> You can do that in D too. core.thread.Fiber is implemented in D (with a
> bit of inline assembly), without any special language support.

Yes, coroutines were a bad example; you can probably do that in many stack-based languages. My point was more that the simple runtime of C is transparent enough that you can easily understand the consequences of such tricks. And the advantage of C(++) is that you can do focused low-level fine-tuning in one compilation unit while using a more standard feature set in the rest of your code, because the part of the runtime you have to consider is quite simple.

>> The GC trashes caches when it kicks in.
>
> It can only kick in on allocation. In parts of your code where latency is
> crucial, just avoid allocating from the garbage collected heap.

I understand that. In a real-time system you might enter such sections of your code maybe 120 times per second in a different thread which might be killed by the OS if it doesn't complete on time.

It is probably feasible to create a real-time friendly garbage collector that can cooperate with real-time threads, but it isn't trivial. To get good cache coherency, all cores have to "cooperate" on which memory areas they read from and write to when you enter timing-critical code sections. A GC jumps all over memory real fast, touching cacheline after cacheline and basically invalidating the cache (the effect depends on the GC/application/CPU/memory bus).

December 29, 2013
On 12/29/2013 5:46 AM, Dicebot wrote:
> D lacks some low-level control C has

For instance?

On the other hand, D has an inline assembler and C (without vendor extensions) does not. C doesn't even have (without vendor extensions) alignment control on struct fields.
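For anyone unsure what per-field alignment control looks like in practice, here is a small sketch using C++11's standard `alignas` specifier (D spells the same idea with an `align(N)` attribute on the field). The `Packet` struct is a made-up example, not taken from any real code.

```cpp
#include <cstddef>

// Hypothetical struct: one byte of tag followed by a field that is
// forced onto a 16-byte boundary, e.g. for SIMD loads.
struct Packet {
    char tag;
    alignas(16) float lanes[4];
};

// The compiler inserts 15 bytes of padding after `tag`.
static_assert(offsetof(Packet, lanes) == 16, "lanes starts on a 16-byte boundary");
static_assert(alignof(Packet) == 16, "struct inherits the strictest field alignment");
static_assert(sizeof(Packet) == 32, "16 bytes for tag plus padding, 16 for lanes");
```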