December 07, 2013
On Friday, 6 December 2013 at 23:56:39 UTC, H. S. Teoh wrote:
>
> It would be nice to decouple Phobos modules more. A *lot* more.

Why? I've seen this point made several times and I can't understand why this is an important concern.

I see the interplay between Phobos modules as a good thing: it saves reinventing the wheel all over the place, making for a smaller, cleaner standard library.

Am I missing something fundamental here?
December 07, 2013
On Friday, 6 December 2013 at 23:59:46 UTC, H. S. Teoh wrote:
> On Sat, Dec 07, 2013 at 12:40:35AM +0100, bearophile wrote:
> [...]
>> Regarding Java performance matters, from my experience another
>> significant source of optimization in the JavaVM that is often
>> overlooked is that the JavaVM is able to partially unroll even loops
>> with a statically-unknown number of cycles. Currently I think
>> GCC/DMD/LDC2 are not able or willing to do this.
> [...]
>
> Really? I've seen gcc/gdc unroll loops with unknown number of
> iterations, esp. when you're using -O3. It just unrolls into something
> like:
>
> 	loop_start:
> 		if (!loopCondition) goto end;
> 		loopBody();
> 		if (!loopCondition) goto end;
> 		loopBody();
> 		if (!loopCondition) goto end;
> 		loopBody();
> 		if (!loopCondition) goto end;
> 		loopBody();
> 		goto loop_start;
> 	end:	...
>
> I'm pretty sure I've seen gcc/gdc do this before.
>
>
> T

LLVM is also able to generate Duff's device on the fly for loops where it makes sense.
December 07, 2013
On Sat, Dec 07, 2013 at 12:56:48AM +0100, bearophile wrote:
> H. S. Teoh:
> 
> >(if your tree is 1 million nodes, then it has to do 1 million free's, right then, right there,
> 
> In practice real C programs use arenas and pools to allocate the nodes from. This sometimes doubles the performance of C code that has to allocate many nodes of a tree data structure.

The problem with this in C is that the code has to be designed to work with the particular arena/pool implementation that you're using. This makes interoperating between libraries a pain, and usually it means you can't use a lot of libraries, and you have to reinvent a lot of code just so it will work with the pool implementation.
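
A minimal sketch of the coupling described above, written in D rather than C. The names Node, NodeArena and insert are made up for illustration, and the backing block comes from the GC purely to keep the example short. Note how insert has to take the arena as a parameter: any library that builds trees would need to know about this exact type.

```d
// Illustrative bump-pointer arena for tree nodes (all names are hypothetical).
struct Node
{
    int value;
    Node* left;
    Node* right;
}

struct NodeArena
{
    private Node[] storage; // one big block, allocated once
    private size_t used;    // number of slots handed out so far

    this(size_t capacity) { storage = new Node[](capacity); }

    // Hand out the next free slot; there is no per-node allocation call.
    Node* make(int value)
    {
        assert(used < storage.length, "arena exhausted");
        Node* n = &storage[used++];
        *n = Node(value);
        return n;
    }

    // Release every node at once instead of walking the tree.
    void reset() { used = 0; }

    size_t count() const { return used; }
}

// Insert into an unbalanced binary search tree whose nodes live in the arena.
Node* insert(ref NodeArena arena, Node* root, int value)
{
    if (root is null)
        return arena.make(value);
    if (value < root.value)
        root.left = insert(arena, root.left, value);
    else
        root.right = insert(arena, root.right, value);
    return root;
}

unittest
{
    auto arena = NodeArena(16);
    Node* root = null;
    foreach (v; [5, 3, 8])
        root = insert(arena, root, v);
    assert(arena.count() == 3);
    arena.reset(); // "frees" the whole tree in one step
    assert(arena.count() == 0);
}
```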


> A simple example:
> 
> http://rosettacode.org/wiki/Self-referential_sequence#Faster_Low-level_Version
> 
> Some of such code will become useless once Phobos has Andrei allocators :-)
> 
> In C sometimes you also use hierarchical memory allocation, to simplify the memory management (http://swapped.cc/?_escaped_fragment_=/halloc#!/halloc ), not currently supported by Andrei allocators.
[...]

Yes, but again, this requires the code to be written to use hierarchical memory allocation. So you can't use a library that doesn't support it (well, you can, but it will not have good performance). There are a lot of advantages to having a standard memory allocation scheme built into the language (or at least, endorsed by the language). People don't often think about this, but a lot of overhead comes from interfacing between libraries with incompatible APIs / memory allocation schemes. Having a common scheme for everybody helps a lot, by eliminating the need for interfacing between them, or the need to reinvent the wheel because some library is incompatible with your custom memory allocator.


T

-- 
By understanding a machine-oriented language, the programmer will tend to use a much more efficient method; it is much closer to reality. -- D. Knuth
December 07, 2013
On Sat, Dec 07, 2013 at 01:09:00AM +0100, John Colvin wrote:
> On Friday, 6 December 2013 at 23:56:39 UTC, H. S. Teoh wrote:
> >
> >It would be nice to decouple Phobos modules more. A *lot* more.
> 
> Why? I've seen this point made several times and I can't understand why this is an important concern.
> 
> I see the interplay between Phobos modules as a good thing: it saves reinventing the wheel all over the place, making for a smaller, cleaner standard library.
> 
> Am I missing something fundamental here?

It's not that it's bad to reuse code. The problem is the dependency is too coarse-grained, so that if you want to, say, print "hello world", it pulls in all sorts of stuff, like algorithms for sorting arrays (just an example, not the actual case), or floating-point format parsers (may actually be the case), which aren't *needed* to perform that particular task. If printing "hello world" requires pulling in file locking code, then by all means, pull that in. But it shouldn't pull in, say, std.complex just because some obscure corner of writeln's implementation makes a reference to std.complex.


T

-- 
People tell me I'm stubborn, but I refuse to accept it!
December 07, 2013
H. S. Teoh:

> I've seen gcc/gdc unroll loops with unknown number of
> iterations, esp. when you're using -O3. It just unrolls into something
> like:
>
> 	loop_start:
> 		if (!loopCondition) goto end;
> 		loopBody();
> 		if (!loopCondition) goto end;
> 		loopBody();
> 		if (!loopCondition) goto end;
> 		loopBody();
> 		if (!loopCondition) goto end;
> 		loopBody();
> 		goto loop_start;
> 	end:	...
>
> I'm pretty sure I've seen gcc/gdc do this before.


deadalnix:

> LLVM is also able to generate Duff's device on the fly for loops where it makes sense.


I have not seen this optimization done on my code (both ldc2 and gcc), but I am glad to be wrong on this.

The Oracle JVM uses a very different unrolling strategy: it splits the loop into two loops. The first is unrolled 2, 4 (or sometimes 8) times and contains no tests besides one at the start and one at the end. It is followed by a second, normal (not unrolled) loop that runs the remaining iterations (n % 8 of them for 8x unrolling).
Applying this strategy manually in D with ldc2, I was able to reach the same performance as Java.
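
A minimal sketch of that two-loop strategy applied by hand, assuming a 4x unroll factor and a simple array sum as the loop body (sumUnrolled is a made-up name, not code from any benchmark):

```d
// Sum an array with a manually 4x-unrolled main loop plus a remainder loop.
long sumUnrolled(const(int)[] data)
{
    long total = 0;
    size_t i = 0;

    // Main loop: whole groups of 4 elements, one loop test per group.
    immutable size_t limit = data.length - data.length % 4;
    for (; i < limit; i += 4)
    {
        total += data[i];
        total += data[i + 1];
        total += data[i + 2];
        total += data[i + 3];
    }

    // Remainder loop: the leftover data.length % 4 elements, not unrolled.
    for (; i < data.length; ++i)
        total += data[i];

    return total;
}

unittest
{
    assert(sumUnrolled([1, 2, 3, 4, 5, 6, 7]) == 28); // 4 unrolled + 3 leftover
    assert(sumUnrolled(null) == 0);                   // empty input skips both loops
}
```

Whether this beats what the optimizer produces on its own depends on the compiler and flags, so it is worth measuring both versions.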

Bye,
bearophile
December 07, 2013
On 7 December 2013 08:52, Walter Bright <newshound2@digitalmars.com> wrote:

> On 12/6/2013 2:40 PM, bearophile wrote:
>
>> And when a D compiler, because of separate compilation, can't de-virtualize a virtual class method call.
>>
>
> Can C devirtualize function calls? Nope.
>

Assuming a comparison to C++, you know perfectly well that D has a severe
disadvantage. Unless people micro-manage final (I've never seen anyone do
this to date), then classes will have significantly inferior performance to
C++.
C++ coders don't write virtual on everything. Especially not trivial
accessors which must be inlined.


December 07, 2013
Manu:

> Assuming a comparison to C++, you know perfectly well that D has a severe
> disadvantage. Unless people micro-manage final (I've never seen anyone do
> this to date), then classes will have significantly inferior performance to C++.

Although D has two kinds of purity (currently three), const/immutable, and will hopefully have scope for function arguments, a lot of D programmers will not add those annotations to their code (the D code I see in D.learn usually doesn't have them), so the speed gains of D could be more theoretical than real.

So const/immutable/static/scope/@safe should be the default for a modern language, for efficiency, safety, code understandability and testing.

If you are a new D programmer, and the local variables in your functions (including foreach loop variables) are immutable by default, you learn very quickly to add "mut" or "var" when you want to mutate them. And you will add that annotation only to the variables that you need to mutate. This avoids mutating variables by mistake, and mutating just copies by mistake, as in a foreach on an array of structs. So this avoids some bugs.
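
A small sketch of the foreach pitfall, in D as it is today, where mutable is the default and "mut"/"var" do not exist. Point, scaleCopies, scaleInPlace and sumX are names invented for the example:

```d
struct Point { int x; }

// Bug-prone: p is a copy of each element, so the array is left unchanged.
void scaleCopies(Point[] pts)
{
    foreach (p; pts)
        p.x *= 10;
}

// Intended behaviour: ref makes p refer to the element itself.
void scaleInPlace(Point[] pts)
{
    foreach (ref p; pts)
        p.x *= 10;
}

// Opting in to immutability by hand: the loop variable cannot be modified.
int sumX(const(Point)[] pts)
{
    int total = 0;
    foreach (immutable p; pts)
    {
        // p.x *= 10; // would not compile: p is immutable
        total += p.x;
    }
    return total;
}

unittest
{
    auto pts = [Point(1), Point(2)];
    scaleCopies(pts);
    assert(pts[0].x == 1);  // silently did nothing
    scaleInPlace(pts);
    assert(pts[0].x == 10); // actually mutated
}
```

With immutable loop variables as the default, the first function would be rejected at compile time instead of silently doing nothing.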

Bye,
bearophile
December 07, 2013
On 12/6/2013 4:40 PM, Manu wrote:
> Assuming a comparison to C++,

This is a comparison to C; a comparison to C++ is something else.

> you know perfectly well that D has a severe
> disadvantage. Unless people micro-manage final (I've never seen anyone do this
> to date), then classes will have significantly inferior performance to C++.
> C++ coders don't write virtual on everything. Especially not trivial accessors
> which must be inlined.

I know well that people used to C++ will likely do this. However, one can get in the habit of adding "final:" by default as the first line in a class definition, and then the compiler will tell you which methods need to be made virtual.
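
A minimal sketch of that habit, with made-up Shape and Square classes. One assumption to flag: since a "final:" label cannot be undone further down in the class, the sketch declares the one overridable method above the label rather than placing "final:" on the very first line.

```d
class Shape
{
    private string _name;

    this(string name) { _name = name; }

    // Declared above the label, so it stays virtual and can be overridden.
    double area() const { return 0.0; }

final:
    // Everything from here on is non-virtual, so calls can be inlined.
    string name() const { return _name; }
}

class Square : Shape
{
    private double _side;

    this(double side)
    {
        super("square");
        _side = side;
    }

    override double area() const { return _side * _side; }

    // Uncommenting the next line makes the compiler reject the override,
    // which is how you find out that name() would need to be virtual:
    // override string name() const { return "sq"; }
}

unittest
{
    Shape s = new Square(3.0);
    assert(s.area() == 9.0);      // virtual dispatch reaches Square.area
    assert(s.name() == "square"); // final method from Shape
}
```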
December 07, 2013
On 12/06/2013 02:52 PM, Walter Bright wrote:

> On 12/6/2013 2:40 PM, bearophile wrote:
>> I think in your list you have missed the point 8, that is templates
>> allow for
>> data specialization, or for specialization based on compile-time values.
>>
>> The common example of the first is the C sort() function compared to
>> the type
>> specialized one.
>
> That's a good example.

Bjarne Stroustrup has the article "Learning Standard C++ as a New Language" where he demonstrates bearophile's point, as well as how C++ is a better language than C for novices:

  http://www.stroustrup.com/new_learning.pdf
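
For comparison, a minimal D sketch of the same idea. The helper sortedBy is a made-up name; std.algorithm.sort is the real Phobos function. Both the element type and the comparison are compile-time parameters, so each instantiation is specialized, whereas C's qsort() calls its comparator through a function pointer on untyped memory.

```d
import std.algorithm : sort;

// The predicate `less` and the element type T are fixed at compile time,
// so the comparison can be inlined into the sorting code.
T[] sortedBy(alias less, T)(T[] values)
{
    auto copy = values.dup; // leave the caller's array untouched
    sort!less(copy);
    return copy;
}

unittest
{
    assert(sortedBy!((a, b) => a > b)([3, 1, 2]) == [3, 2, 1]);
    assert(sortedBy!((a, b) => a < b)(["pear", "apple"]) == ["apple", "pear"]);
}
```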

Ali

December 07, 2013
07-Dec-2013 03:55, H. S. Teoh wrote:
> On Fri, Dec 06, 2013 at 03:19:24PM -0800, Walter Bright wrote:
>> On 12/6/2013 3:02 PM, Maxim Fomin wrote:
>>> - phobos snowball - one invocation of some function in standard
>>> library leads to dozens template instantiations and invocations of
>>> pretty much stuff

> One low-hanging fruit that comes to mind is to use local imports instead
> of module-wide imports. If the local imports are inside templated
> functions, I *think* it would prevent pulling in the imports until the
> function is actually used, which would have the desired effect. (Right?)
> Much of Phobos was written before we had this feature, but since we have
> it now, might as well make good use of it.

A major point is to decouple the featherweight "traits" part of a module from its API part (preferably also split by category).
Then a given Phobos module could do something like this:

import std.regex.traits;

auto dirEntries(C, RegEx)(in C[] path, RegEx re)
	if(isSomeChar!C && isRegexFor!(RegEx, C))
{
	import std.regex; //full package
	...
}
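
The module std.regex.traits and the isRegexFor trait above are a proposal, not something that exists. A runnable sketch of the same pattern using only existing modules might look like this (countWords is a made-up function): the module-level import is just the lightweight trait needed by the constraint, and the heavier import is local to the template.

```d
import std.traits : isSomeChar; // lightweight: only needed for the constraint

size_t countWords(C)(const(C)[] text)
    if (isSomeChar!C)
{
    // Local import: only looked at when countWords is actually instantiated.
    import std.array : split;
    return split(text).length; // split() with no separator splits on whitespace
}

unittest
{
    assert(countWords("hello big world") == 3);
}
```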


-- 
Dmitry Olshansky