December 07, 2013
On Saturday, 7 December 2013 at 00:40:52 UTC, Manu wrote:
> Assuming a comparison to C++, you know perfectly well that D has a severe
> disadvantage. Unless people micro-manage final (I've never seen anyone do
> this to date), then classes will have significantly inferior performance to
> C++.
> C++ coders don't write virtual on everything. Especially not trivial
> accessors which must be inlined.

I concur with Manu, and if D gains more adoption we can only expect more calls that should not be virtual to be virtual. Removing unnecessary virtual calls in a C++ codebase gives significant performance improvements in my experience.

But it's not even so much about virtual calls being slower as it is about the myth that every function being redefinable in a sub-class _by default_ is somehow a good thing. I don't think it is at all.
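
A minimal sketch of the cost in question (a hypothetical class, not from any real codebase): in D a class method is virtual unless marked final, so even a trivial accessor goes through the vtable and cannot be inlined at a non-devirtualized call site.

    class Point
    {
        private int _x, _y;

        int x() { return _x; }        // virtual by default: indirect call, hard to inline
        final int y() { return _y; }  // final: direct call the compiler can inline
    }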

December 07, 2013
On Sat, 07 Dec 2013 03:12:02 +0400,
Dmitry Olshansky <dmitry.olsh@gmail.com> wrote:

> On 07-Dec-2013 02:20, Walter Bright wrote:
> >
> > "there is no way proper C code can be slower than those languages."
> 
> > 3. Function inlining has generally been shown to be of tremendous value in optimization. D has access to all the source code in the program, or at least as much as you're willing to show it, and can inline across modules.
> 
> Uh-oh. I'd avoid advertising this particular point until after a
> critical bug is fixed:
> https://d.puremagic.com/issues/show_bug.cgi?id=10985
> Applies to all 3 compilers.
> 
> Otherwise - it's spot on. D has many ways to be typically "faster then Cee" ;)
> 

But cross-module inlining can't be done if you have shared libraries, because you cannot know whether the other module is in a shared library, right?

If you inlined such code, it wouldn't get updated when the shared library was updated, and you'd have two versions of the code around...

I see only two solutions to avoid this:

(1) If the source files are compiled at once, it's safe to assume they
    must be part of the same library, so inlining is safe (see the
    sketch below).
(2) The linker of course knows how the objects fit together, so LTO
    (link-time optimization) can handle it.
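
A minimal sketch of option (1), with two hypothetical modules handed to the compiler in a single invocation so that both function bodies are visible:

    // mathutil.d (hypothetical module)
    module mathutil;
    int square(int x) { return x * x; }

    // app.d, compiled together with the above, e.g. `dmd -O -inline app.d mathutil.d`
    module app;
    import mathutil;

    void main()
    {
        // Seen in the same compilation, the call below can be inlined.
        // If mathutil instead lived in a separately updated shared
        // library, inlining it here would bake in a stale copy.
        auto s = square(21);
        assert(s == 441);
    }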

December 07, 2013
On 12/07/2013 10:03 AM, Walter Bright wrote:
> On 12/7/2013 1:30 AM, Marco Leise wrote:
>> How is that easier in Java? When whole-program analysis finds
>> that there is no class extending C, it could devirtualize all
>> methods of C, but(!) you can load and unload new derived
>> classes at runtime, too.
>
> This can be done by noting what new derived classes are introduced by
> runtime loading, and re-JITing any functions that devirtualized base
> classes of it.
>
> I don't know if this is actually done, but I don't see an obvious
> problem with it.

It is actually done, eg. by the HotSpot JVM.
December 07, 2013
On 07/12/13 09:14, Walter Bright wrote:
> There are several D projects which show faster runs than C. If your goal is to
> pragmatically write faster D code than in C, you can do it without too much
> effort. If your goal is to find problem(s) with D, you can certainly do that, too.

Well, as the author of a D library which outperforms the C library that inspired it (at least within the limits of its much smaller range of functionality; it's been a bit neglected of late and needs more input) ...

... the practical experience I've had is that more than an outright performance comparison, what it often comes down to is effort vs. results, and the cleanliness/maintainability of the resulting code.  This is particularly true when it comes to C code that is designed to be "safe", with all the resulting boilerplate.  It's typically possible to match or exceed the performance of a C program with much more concise and easy to follow D code.

Another factor that's important here is that C and D in general seem to lead to different design solutions.  Even if one has an exact example in C to compare to, the natural thing to do in D is often something different, and that leads to subtle and not-so-subtle implementation differences that in turn affect performance.

Example: in the C library that was my inspiration, there's a function which requires the user to pass a buffer, to which it writes a certain set of values which are calculated from the underlying data.  I didn't much like the idea of compelling the user to pass a buffer, so when I wrote my D equivalent I used stuff from std.range and std.algorithm to make the function return a lazily-evaluated range that would offer the same values as the C code stored in the buffer array.
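
Roughly, the two shapes of API look like this (names are hypothetical, not taken from the actual library):

    import std.algorithm : map;
    import std.range : iota;

    // C-style: the caller supplies a buffer that the function fills.
    void fillValues(double[] buffer, double scale)
    {
        foreach (i, ref v; buffer)
            v = i * scale;
    }

    // D-style: return a lazily evaluated range; values are computed
    // only as the caller iterates, and no buffer has to be passed in.
    auto lazyValues(size_t n, double scale)
    {
        return iota(n).map!(i => i * scale);
    }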

I assumed this might lead to a small overall performance hit because the C program could just write once to a buffer and re-use the buffer, whereas I might be lazily calculating and re-calculating.  Unfortunately it turned out that for whatever reason, my lazily-calculated range was somehow responsible for lots of micro allocations, which slowed things down a lot.  (I tried it out again earlier this morning, just to refresh my memory, and it looks like this may no longer be the case; so perhaps something has been fixed here...)

So, that in turn led me to another solution again, where instead of an external buffer being passed in, I created an internal cache which could be written to once and re-used again and again and again, never needing to recalculate unless the internal data was changed.
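
Something along these lines (a simplified sketch with hypothetical names, with x * 2 standing in for the real calculation):

    struct Series
    {
        private double[] data;
        private double[] cache;
        private bool dirty = true;

        void set(size_t i, double v)
        {
            data[i] = v;
            dirty = true;                // invalidate the cache on any modification
        }

        // Recalculates only when the underlying data has changed;
        // otherwise just returns a slice of the cached values.
        const(double)[] values()
        {
            if (dirty)
            {
                cache.length = data.length;
                foreach (i, x; data)
                    cache[i] = x * 2;    // placeholder for the real calculation
                dirty = false;
            }
            return cache;
        }
    }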

Now, _that_ turned out to be significantly faster than the C program, which was almost certainly doing unnecessary recalculation of the buffer -- because it recalculated every time the function was called, whereas my program could rely on the cache, calculate once, and after that just return the slice of calculated values.  On the other hand, if I tweaked the internals of the function so that every call _always_ involved recalculating and rewriting to the cache, it was slightly slower than the C -- probably because now it was the C code that was doing less recalculation, because code that was calling the function was calling it once and then using the buffer, rather than calling it multiple times.

TL;DR the point is that writing in D gave me the opportunity to spend mental and programming time exploring these different choices and focusing on algorithms and data structures, rather than all the effort and extra LOC required to get a _particular_ idea running in C.  That's where the real edge arises.
December 07, 2013
On 07/12/13 02:10, Walter Bright wrote:
> On 12/6/2013 4:40 PM, Manu wrote:
>> you know perfectly well that D has a severe
>> disadvantage. Unless people micro-manage final (I've never seen anyone do this
>> to date), then classes will have significantly inferior performance to C++.
>> C++ coders don't write virtual on everything. Especially not trivial accessors
>> which must be inlined.
>
> I know well that people used to C++ will likely do this. However, one can get in
> the habit of by default adding "final:" as the first line in a class definition,
> and then the compiler will tell you which ones need to be made virtual.

The disadvantage of this approach is that, if one forgets to add that "final", it doesn't just produce a performance hit -- it means that it may be impossible to correct without breaking downstream code, because users may have overridden class methods that weren't meant to be virtual.

By contrast, if you have final-by-default and accidentally leave the "virtual" keyword off a class or method, that can be fixed without hurting anyone.
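
For reference, the habit Walter describes looks like this (a hypothetical class): everything after the label is non-virtual, and anything a derived class genuinely needs to override gets moved above it once the compiler complains about overriding a final method.

    class Widget
    {
        private int _w, _h;

        // Methods that genuinely need to be overridable stay above the label.
        void draw() { }

    final:
        // Everything from here down is non-virtual and can be inlined;
        // a derived class that tries to override width() gets a compile error.
        int width()  { return _w; }
        int height() { return _h; }
    }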
December 07, 2013
On 12/07/2013 09:41 AM, Walter Bright wrote:
>>
>
> "there is no way proper C code can be slower than those languages."
>...

I think that statement is correct, but fully irrelevant.
http://en.wikipedia.org/wiki/No_true_Scotsman

> It's the qualifier "proper". You say that means theoretically possible,
> I disagree. I suggest that proper C code means code that is presentable,
> maintainable and professionally written using commonly accepted best
> practices. ...

I suggest what is meant by "proper" is "faster than any implementation in those languages". :)
December 07, 2013
On 07/12/13 09:41, Walter Bright wrote:
> For another, how many times have you seen bubble sort reimplemented in C code?
> How about the obvious implementation of string searching? etc.? I've seen that
> stuff a lot. But in D, using a best-of-breed implementation of quicksort is easy
> as pie, same with searching, etc. These kinds of things also make D faster. I've
> translated C code into D before and gotten it to run faster by doing these sorts
> of plug-in algorithm replacements.

Conversely, where it seems necessary, it's always possible to write D code in a "C-like", very detailed imperative style that takes fine-grained control of how something is implemented.  However, that can usually be hidden away inside a function so that the end user doesn't need to be bothered by it.

With C you're pretty much obliged to write complicated code in many situations.  With D, even where you need to write like this it's usually _less_ complicated (less boilerplate etc.) and you only have to break it out where it's really, really necessary.
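
To make the quoted point concrete, a trivial sketch (not taken from any of the projects mentioned): the plug-in replacements are usually one-liners in D.

    import std.algorithm : canFind, sort;

    void main()
    {
        int[] data = [5, 2, 9, 1];
        sort(data);                                // library sort instead of a hand-rolled bubble sort
        bool hit = "hello world".canFind("lo w");  // library substring search
        assert(data == [1, 2, 5, 9] && hit);
    }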
December 07, 2013
Am Sat, 07 Dec 2013 10:34:53 +0100
schrieb Timon Gehr <timon.gehr@gmx.ch>:

> On 12/07/2013 10:03 AM, Walter Bright wrote:
> > On 12/7/2013 1:30 AM, Marco Leise wrote:
> >> How is that easier in Java? When whole-program analysis finds that there is no class extending C, it could devirtualize all methods of C, but(!) you can load and unload new derived classes at runtime, too.
> >
> > This can be done by noting what new derived classes are introduced by runtime loading, and re-JITing any functions that devirtualized base classes of it.
> >
> > I don't know if this is actually done, but I don't see an obvious problem with it.
> 
> It is actually done, eg. by the HotSpot JVM.

Nice! I thought the overhead of tracking these details for a JIT optimization might be considered excessive.

-- 
Marco

December 07, 2013
On Saturday, 7 December 2013 at 00:26:34 UTC, H. S. Teoh wrote:
> On Sat, Dec 07, 2013 at 01:09:00AM +0100, John Colvin wrote:
>> On Friday, 6 December 2013 at 23:56:39 UTC, H. S. Teoh wrote:
>> >
>> >It would be nice to decouple Phobos modules more. A *lot* more.
>> 
>> Why? I've seen this point made several times and I can't understand
>> why this is an important concern.
>> 
>> I see the interplay between phobos modules as good, it saves
>> reinventing the wheel all over the place, making for a smaller,
>> cleaner standard library.
>> 
>> Am I missing something fundamental here?
>
> It's not that it's bad to reuse code. The problem is the dependency is
> too coarse-grained, so that if you want to, say, print "hello world", it
> pulls in all sorts of stuff, like algorithms for sorting arrays (just an
> example, not the actual case), or floating-point format parsers (may
> actually be the case), which aren't *needed* to perform that particular
> task. If printing "hello world" requires pulling in file locking code,
> then by all means, pull that in. But it shouldn't pull in, say,
> std.complex just because some obscure corner of writeln's implementation
> makes a reference to std.complex.
>
>
> T

Ok, so that describes what over-dependency is, but not why it's a problem we should care about.
December 07, 2013
On Saturday, 7 December 2013 at 09:46:11 UTC, Joseph Rushton Wakeling wrote:
> On 07/12/13 09:14, Walter Bright wrote:
>> There are several D projects which show faster runs than C. If your goal is to
>> pragmatically write faster D code than in C, you can do it without too much
>> effort. If your goal is to find problem(s) with D, you can certainly do that, too.
>
> Well, as the author of a D library which outperforms the C library that inspired it (at least within the limits of its much smaller range of functionality; it's been a bit neglected of late and needs more input) ...
>
> ... the practical experience I've had is that more than an outright performance comparison, what it often comes down to is effort vs. results, and the cleanliness/maintainability of the resulting code.  This is particularly true when it comes to C code that is designed to be "safe", with all the resulting boilerplate.  It's typically possible to match or exceed the performance of a C program with much more concise and easy to follow D code.
>
> Another factor that's important here is that C and D in general seem to lead to different design solutions.  Even if one has an exact example in C to compare to, the natural thing to do in D is often something different, and that leads to subtle and not-so-subtle implementation differences that in turn affect performance.
>
> Example: in the C library that was my inspiration, there's a function which requires the user to pass a buffer, to which it writes a certain set of values which are calculated from the underlying data.  I didn't much like the idea of compelling the user to pass a buffer, so when I wrote my D equivalent I used stuff from std.range and std.algorithm to make the function return a lazily-evaluated range that would offer the same values as the C code stored in the buffer array.
>
> I assumed this might lead to a small overall performance hit because the C program could just write once to a buffer and re-use the buffer, whereas I might be lazily calculating and re-calculating.  Unfortunately it turned out that for whatever reason, my lazily-calculated range was somehow responsible for lots of micro allocations, which slowed things down a lot.  (I tried it out again earlier this morning, just to refresh my memory, and it looks like this may no longer be the case; so perhaps something has been fixed here...)
>
> So, that in turn led me to another solution again, where instead of an external buffer being passed in, I created an internal cache which could be written to once and re-used again and again and again, never needing to recalculate unless the internal data was changed.
>
> Now, _that_ turned out to be significantly faster than the C program, which was almost certainly doing unnecessary recalculation of the buffer -- because it recalculated every time the function was called, whereas my program could rely on the cache, calculate once, and after that just return the slice of calculated values.  On the other hand, if I tweaked the internals of the function so that every call _always_ involved recalculating and rewriting to the cache, it was slightly slower than the C -- probably because now it was the C code that was doing less recalculation, because code that was calling the function was calling it once and then using the buffer, rather than calling it multiple times.
>
> TL;DR the point is that writing in D gave me the opportunity to spend mental and programming time exploring these different choices and focusing on algorithms and data structures, rather than all the effort and extra LOC required to get a _particular_ idea running in C.  That's where the real edge arises.

This is exactly how I see it too. Well said.