December 29, 2013
On Sunday, December 29, 2013 07:22:28 Andrei Alexandrescu wrote:
> Clearly there's work we need to do on improving particularly the standard library. But claiming that D code can't be efficient because of some stdlib artifacts is like claiming C++ code can't do efficient I/O because it must use iostreams (which are indeed objectively and undeniably horrifically slow). Neither argument has merit.

D's design pretty much guarantees that it's as fast as C++ as long as the application implementations and compiler implementations are comparable. It's too similar to C++ for it to be otherwise. And D adds several features that can make it faster than C++ fairly easily (e.g. slices or CTFE). The only D feature that I think is of any real concern for speed is the GC, and C++ doesn't even have that, so writing D code the same way that you would write C++ code avoids that problem entirely. You can also take advantage of the GC without seriously harming performance simply by being smart about how you use it. The main issue is what your implementation is doing, as it's easy enough to make a program slower in either language.
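For concreteness, here is a minimal sketch of the CTFE point (the function and names are purely illustrative): an ordinary D function gets evaluated by the compiler whenever its result is required at compile time, so the runtime pays nothing for it.

import std.stdio : writeln;

ulong factorial(ulong n)
{
    return n <= 1 ? 1 : n * factorial(n - 1);
}

// 'enum' forces compile-time evaluation; only the constant ends up in the binary.
enum ulong fact12 = factorial(12);

void main()
{
    writeln(fact12); // 479001600, computed entirely by the compiler
}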

I think that the real question at this point is how fast idiomatic D is vs idiomatic C++, as D's design pretty much guarantees that it's competitive performance-wise as far as the language itself goes. And if idiomatic D is slower than idiomatic C++, it's likely something that can and will be fixed by improving the standard library. The only risk there IMHO is if we happened to have picked a particular idiom that is just inherently slow (e.g. if ranges were slow by their very nature), and I don't think that we've done that. And D (particularly idiomatic D) is so much easier to use that the increase in programmer productivity likely outweighs whatever minor performance hit the current implementation might incur.
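As a concrete sketch of the ranges idiom in question (the pipeline itself is just an illustrative example, not a benchmark): a lazily evaluated chain from std.range/std.algorithm that allocates no intermediate arrays.

import std.algorithm : filter, map, reduce;
import std.range : iota;
import std.stdio : writeln;

void main()
{
    // Sum of the squares of the even numbers below 100, computed lazily.
    auto result = iota(100)
        .filter!(n => n % 2 == 0)
        .map!(n => n * n)
        .reduce!((a, b) => a + b);
    writeln(result); // 161700
}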

I agree that anyone who thinks that D is not competitive with C++ in terms of performance doesn't know what they're talking about. How you go about getting full performance out of each of them is not necessarily quite the same, but they're so similar (with D adding a number of improvements) that I don't see how D could be fundamentally slower than C++, and if it is, we screwed up big time.

- Jonathan M Davis
December 29, 2013
On Sunday, December 29, 2013 12:19:33 Walter Bright wrote:
> On 12/29/2013 5:46 AM, Dicebot wrote:
> > D lacks some low-level control C has
> 
> For instance?
> 
> On the other hand, D has an inline assembler and C (without vendor
> extensions) does not. C doesn't even have (without vendor extensions)
> alignment control on struct fields.

I would guess that he's referring to some sort of compiler extension that isn't standard C but which is generally available (e.g. __restrict, which was discussed here a while back). And we don't necessarily have all of the stray stuff like that in any of the current D compilers. But since I never use that sort of thing in C/C++, I'm not very familiar with all of the various extensions that are available, let alone what people who really care about performance frequently use. However, with regards to the language itself, I think that we're definitely on par with (if not better than) C/C++ with regards to low-level control.
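To make Walter's alignment point concrete, a minimal sketch using D's align attribute on struct fields (the struct and its fields are purely illustrative):

struct Packet
{
align(1):              // pack the following fields with no padding
    ubyte  kind;
    uint   payloadLength;
    ushort checksum;
}

static assert(Packet.sizeof == 7);   // 1 + 4 + 2, no padding bytes inserted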

- Jonathan M Davis
December 29, 2013
On 12/29/2013 11:15 AM, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang@gmail.com> wrote:
> It is probably feasible to create a real-time friendly garbage collector that
> can cooperate with realtime threads, but it isn't trivial. To get good cache
> coherency all cores have to "cooperate" on what memory areas they write/read to
> when you enter timing critical code sections. GC jumps all over memory real fast
> touching cacheline after cacheline basically invalidating the cache (the effect
> depends on the GC/application/cpu/memorybus).

I'll reiterate that the GC will NEVER EVER pause your program unless you are actually calling the GC to allocate memory. A loop that does not GC allocate WILL NEVER PAUSE.

Secondly, you can write C code in D. You can restrict yourself to calls to C's standard library. It WILL NEVER PAUSE. You can do everything you can do in C. You can malloc/free. You don't have to throw exceptions. You don't have to use closures.

This fear and loathing of the GC is, in my opinion, wildly overblown.

Granted, you have to know what you're doing to write performant D code. You have to know the patterns of memory allocation happening in the code. Organizing your data structures for best caching is required. But you need to have that expertise to write fast C code, too.
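A minimal sketch of that "write C in D" style, assuming nothing beyond core.stdc (the buffer size and contents are arbitrary):

import core.stdc.stdlib : malloc, free;
import core.stdc.stdio  : printf;

void main()
{
    enum n = 1024;
    double* buf = cast(double*) malloc(n * double.sizeof);
    if (buf is null) return;
    scope(exit) free(buf);    // deterministic cleanup, no collector involved

    foreach (i; 0 .. n)
        buf[i] = i * 0.5;     // tight loop that never touches the GC

    printf("last = %f\n", buf[n - 1]);
}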

December 29, 2013
On Sunday, 29 December 2013 at 20:36:27 UTC, Walter Bright wrote:
> I'll reiterate that the GC will NEVER EVER pause your program unless you are actually calling the GC to allocate memory. A loop that does not GC allocate WILL NEVER PAUSE.

That's fine, except when you have real-time threads.

So unless you use non-temporal loads/stores in your GC traversal (e.g. on x86 there are SSE instructions that bypass the cache), your GC might trash the cache for other cores that run real-time threads initiated as callbacks from the OS.

These callbacks might happen 120+ times per second, and your runtime cannot control them; they run at the highest user-level priority.

Granted, the latest CPUs have a fair amount of level 3 cache, and the most expensive ones might have a big level 4 cache, but I still think it is a concern. Level 1 and 2 caches are small: 64KB/128KB.
December 29, 2013
On Sunday, 29 December 2013 at 20:19:35 UTC, Walter Bright wrote:
> On 12/29/2013 5:46 AM, Dicebot wrote:
>> D lacks some low-level control C has
>
> For instance?
>
> On the other hand, D has an inline assembler and C (without vendor extensions) does not. C doesn't even have (without vendor extensions) alignment control on struct fields.

I have nothing to add beyond what was already discussed in the http://forum.dlang.org/post/mppphhuomfpxyfxsyusp@forum.dlang.org and http://forum.dlang.org/post/mailman.479.1386854234.3242.digitalmars-d@puremagic.com threads.
December 29, 2013
On 12/29/13 12:47 PM, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang@gmail.com> wrote:
> On Sunday, 29 December 2013 at 20:36:27 UTC, Walter Bright wrote:
>> I'll reiterate that the GC will NEVER EVER pause your program unless
>> you are actually calling the GC to allocate memory. A loop that does
>> not GC allocate WILL NEVER PAUSE.
>
> That's fine, except when you have real-time threads.
>
> So unless you use non-temporal loads/stores in your GC traversal (e.g. on
> x86 there are SSE instructions that bypass the cache), your GC might
> trash the cache for other cores that run real-time threads initiated
> as callbacks from the OS.
>
> These callbacks might happen 120+ times per second, and your runtime
> cannot control them; they run at the highest user-level priority.
>
> Granted, the latest CPUs have a fair amount of level 3 cache, and the
> most expensive ones might have a big level 4 cache, but I still think it
> is a concern. Level 1 and 2 caches are small: 64KB/128KB.

I think you and others are talking about different things. Walter was referring to never invoking a GC collection, not to the performance of the GC process once it is in progress.

Andrei

December 29, 2013
On 12/29/2013 12:47 PM, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang@gmail.com> wrote:
> On Sunday, 29 December 2013 at 20:36:27 UTC, Walter Bright wrote:
>> I'll reiterate that the GC will NEVER EVER pause your program unless you are
>> actually calling the GC to allocate memory. A loop that does not GC allocate
>> WILL NEVER PAUSE.
>
> That's fine, except when you have real-time threads.
>
> So unless you use non-temporal loads/stores in your GC traversal (e.g. on x86
> there are SSE instructions that bypass the cache), your GC might trash the cache
> for other cores that run real-time threads initiated as callbacks from
> the OS.
>
> These callbacks might happen 120+ times per second, and your runtime cannot
> control them; they run at the highest user-level priority.
>
> Granted, the latest CPUs have a fair amount of level 3 cache, and the most
> expensive ones might have a big level 4 cache, but I still think it is a
> concern. Level 1 and 2 caches are small: 64KB/128KB.

Since you can control if and when the GC runs fairly simply, this is not any sort of blocking issue.
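For concreteness, a minimal sketch of controlling when collection happens, using druntime's core.memory.GC API (the frame loop is just a stand-in for a timing-critical section):

import core.memory : GC;

void frame()
{
    // ... timing-critical work that must not be interrupted by a collection ...
}

void main()
{
    GC.disable();          // no collections will be triggered implicitly
    foreach (i; 0 .. 1000)
        frame();
    GC.enable();
    GC.collect();          // run the collection at a point we choose
}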
December 29, 2013
On Sunday, 29 December 2013 at 21:39:52 UTC, Walter Bright wrote:
> Since you can control if and when the GC runs fairly simply, this is not any sort of blocking issue.

I agree, it is not a blocking issue. It is a cache-trashing issue. So unless the GC is cache-friendly I am concerned about using D for audio-visual apps. Granted, GC would be great for managing graphs in application logic (game AI, music structures etc). So I am not anti-GC per se.

Let's assume 4 cores, 4MB of level 3 cache, and 512+ MB of AI/game-world data structures. Let's assume that 50% of the CPU is spent on graphics, 20% on audio, 10% on texture/mesh loading/building, 10% on AI, and 10% is headroom (OS etc.).

OK, so for simplicity assume the following threads:

thread 1, no GC: audio realtime hardware
thread 2/3, no GC: opengl "realtime" (designed to keep the GPU from starving)
thread 4, GC: texture/mesh loading/building and game logic

Thread 4 is halted during GC, but threads 1-3 keep running, consuming 70% of the CPU. Threads 1-3 are tuned to keep most of their working set in the level 3 cache.

However, when the GC kicks in it will start to load 512+ MB over the memory bus at a fast pace. If there is one pointer per 32 bytes, you touch every possible cache line. So the memory bus is under strain, and this pollutes the level 3 cache, which wipes out the look-up tables used by threads 1 and 2, which then have to be loaded back into the cache over the memory bus... Threads 1-3 miss their deadlines, you get some audio-visual defects, and the audio/graphics systems compensate by cutting down on audio-visual features. After a while the audio-visual system detects that the CPU is under-utilized and turns the high-quality features back on. But I feel there is a high risk of getting disturbing, noticeable glitches; if this happens every 10 seconds it is going to be pretty annoying.

I think you need to take care, and have a cache-friendly GC-strategy tuned for real time. It is possible though. I don't deny it.
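A rough back-of-the-envelope check of those numbers (using the figures assumed in the scenario above, not measurements):

// 512 MB of scannable data, 64-byte cache lines, 4 MB of shared L3.
enum ulong dataBytes      = 512UL * 1024 * 1024;
enum ulong cacheLineBytes = 64;
enum ulong l3Bytes        = 4UL * 1024 * 1024;

enum ulong linesTouched = dataBytes / cacheLineBytes; // 8,388,608 cache lines
enum ulong l3Lines      = l3Bytes / cacheLineBytes;   //    65,536 cache lines

// A full scan touches ~128x more cache lines than the L3 can hold, so the
// hot working sets of threads 1-3 get evicted many times over per collection.
static assert(linesTouched / l3Lines == 128);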
December 29, 2013
On 12/29/2013 2:10 PM, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang@gmail.com> wrote:
> On Sunday, 29 December 2013 at 21:39:52 UTC, Walter Bright wrote:
>> Since you can control if and when the GC runs fairly simply, this is not any
>> sort of blocking issue.

Your reply doesn't take into account that you can control if and when the GC runs fairly simply. So you can run it at a time when it won't matter to the cache.

December 29, 2013
On 29.12.2013 14:15, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang@gmail.com> wrote:
> D/Rust/Go/this-C#-language all claim to be systems programming
> languages. I think they are not: as long as C/C++ is a better solution
> for embedded programming, it will remain THE systems programming
> language. Which is kind of odd, considering that embedded systems would
> benefit a lot from a safe programming language (due to the
> cost/difficulty of updating software installed on deployed hardware).
>
> It doesn't matter if it is possible to write C++-like code in a language
> if the library support and knowhow isn't dominant in the ecosystem.
> (Like assuming a GC or trying too hard to be cross-platform).


Any language that can be used to write a full OS stack, excluding the usual bits that can only be done in assembly, like the boot loader and
device driver <-> DMA operations, is a systems-level programming language.

Don't forget that many things people assume are C features are actually
non-portable compiler extensions.

--
Paulo