February 14, 2013
On Thursday, February 14, 2013 14:31:36 Steven Schveighoffer wrote:
> On Thu, 14 Feb 2013 13:45:30 -0500, Rob T <alanb@ucora.com> wrote:
> > When I look at the std.container source code, it seems that the payload element is passed by value multiple times unnecessarily, so to minimize copy construction you'll have to implement element as a class and implement a dup function for it.
> > 
> > I expect performance will increase substantially even with the extra heap allocations.
> > 
> > Alternatively, you can implement your own version of RedBlackTree with pass by ref optimizations for insertion of struct elements.
> 
> If it was pass by ref, then rbt.insert(5) would not work. And in fact, I wouldn't want things passed by ref if the element is int. I have to admit, I did not consider expensive postblits when I designed it. Almost all my testing is with integer types.

And Walter and Andrei both seem to think that having expensive postlbits is a design mistake. The range stuff sort of tries to support it with the move* primitives, but Andrei's been wanting to argue for just assuming that postblits are cheap. And Walter was actually arguing at one point for making it illegal (probably by getting rid of postblit all together - I don't remember the details). Plenty of the rest of us don't agree, but to some extent, it is true that having an expensive postblit is asking for it. On a related note, we still need a solution for dealing with const postblits (probably be introducing copy constructors - IIRC Andrei was considering phasing out postblits in favor of copy constructors, which would be unnecessary, but we do need a way to deal with deep copying const structs which hold reference types which need to be deep copied.

> Unfortunately, D is in this state where taking a ref parameter means strictly lvalue. So passing rvalues will not work. D does not have the notion that C++ has where const type& can accept both rvalue and lvalue. We desperately need something similar.

auto ref is the current solution for templated functions, but we do need a more general solution. DIP25 got created in part due to discussions on that, but it probably needs to be fully sorted out before we can fully sort out the const& solution.

- Jonathan M Davis
February 14, 2013
A small but nice intentioned advice of me is this article: http://dlang.org/dstyle.html
"The names of user-defined types should be PascalCased, which is the same as camelCased except that the first letter is uppercase."
February 14, 2013
On Thursday, 14 February 2013 at 20:44:13 UTC, bearophile wrote:
> Ivan Kazmenko:
>
>> Thank you all for assistance, I've finally pinned it down!
>
> This is great. The first step to solve a problem is to find it. The second step is to formalize it. So I suggest you to create a small benchmark that shows what you mean (it should not include RedBlackTree) and put it in Bugzilla, with some timings.
>
> Bye,
> bearophile


I've already begun investigating, and am now benching my solution.

But filing issues never hurt anyone.
February 14, 2013
On Thu, 14 Feb 2013 15:33:19 -0500, Ivan Kazmenko <gassa@mail.ru> wrote:

> Rob T wrote:
>> When I look at the std.container source code, it seems that the payload element is passed by value multiple times unnecessarily, so to minimize copy construction you'll have to implement element as a class and implement a dup function for it.
>
> Thank you all for assistance, I've finally pinned it down!  It's the default magic "a < b" that devours the most copies of my precious structs!  Once I change the declaration:
>
> -----
> alias RedBlackTree !(element) container;
> -----
>
> to
>
> -----
> bool lessfun (ref element a, ref element b)
> {
> 	return a < b;
> }
>
> alias RedBlackTree !(element, lessfun) container;
> -----
>
> ... I get only exactly 3 * LIMIT postblit constructor calls, which is 300,000 instead of 11,389,556.  While I still want to find where the excess 200,000 calls originate from, this is definitely asymptotically better than before: O (n) versus O (n log n) calls.  And it is now less of a bottleneck in my original program.
>
> This raises the question to the default behavior of std.functional.binaryFun.  Sure, for integers, class references and small types, the current way is the best.  On the other hand, for large structs and maybe some other complex objects that obey value semantics, it seems it would be beneficial to, given a string like "a < b", produce a function taking arguments by reference and not by value.  Maybe there is some simple criterion (size of type greater than size_t - seems bad? type is a struct - better?) which can be used with static if to generate "(ref a, ref b)" instead of "(a, b)" version by default.  Of course, all that could only work if there is a more detailed hierarchy of binary functions, at least a binaryFunConst taking const arguments.

It is great that you found this!

Again, the issue here is D has restrictions on ref parameters.  There are no such restrictions on pass-by-value parameters.  So by default, the predicate tries by-value.

In this case, RedBlackTree is always assured of having lvalues for the predicate, so I can probably make the default use pass-by-ref.  If you could file a bug, that would be most helpful.

http://d.puremagic.com/issues/enter_bug.cgi?product=D

-Steve
February 14, 2013
On Thu, 14 Feb 2013 15:33:19 -0500, Ivan Kazmenko <gassa@mail.ru> wrote:

> ... I get only exactly 3 * LIMIT postblit constructor calls, which is 300,000 instead of 11,389,556.  While I still want to find where the excess 200,000 calls originate from, this is definitely asymptotically better than before

As I said elsewhere, the issue is D not having a good solution for accepting both rvalues and lvalues by ref.  Jonathan mentioned auto ref, but it's not exactly implemented correctly.

In C++, it just takes all parameters as const T&, and that works for both lvalues and rvalues.

The extra 200k copies are from the implementation taking all parameters by value.  If we didn't do that, sans a working auto ref, things like tree.insert(element(5)) would fail to compile (cannot pass an rvalue by reference)

-Steve
February 14, 2013
> As I said elsewhere, the issue is D not having a good solution for accepting both rvalues and lvalues by ref.  Jonathan mentioned auto ref, but it's not exactly implemented correctly.
>
> In C++, it just takes all parameters as const T&, and that works for both lvalues and rvalues.
>
> The extra 200k copies are from the implementation taking all parameters by value.  If we didn't do that, sans a working auto ref, things like tree.insert(element(5)) would fail to compile (cannot pass an rvalue by reference)
>
> -Steve

And Jonathan said also that 'auto ref' is not the solution for non-template functions.
Probably 'auto ref' will never work for non-template functions. So a general solution must be found first.
February 15, 2013
Steven Schveighoffer wrote:

> Again, the issue here is D has restrictions on ref parameters.  There are no such restrictions on pass-by-value parameters.  So by default, the predicate tries by-value.

I am starting to understand the difficulty with ref being too restrictive.  Some kind of "const ref" in the language would have helped indeed.  But it would probably cause some allocation difficulties for actual constants... or worse?

> In this case, RedBlackTree is always assured of having lvalues for the predicate, so I can probably make the default use pass-by-ref.  If you could file a bug, that would be most helpful.

Done! http://d.puremagic.com/issues/show_bug.cgi?id=9513
February 15, 2013
Namespace wrote:
> A small but nice intentioned advice of me is this article: http://dlang.org/dstyle.html
> "The names of user-defined types should be PascalCased, which is the same as camelCased except that the first letter is uppercase."

Thanks for pointing that out.  Indeed, PascalCase seems
convenient to use with D structs.  On the other side, I still
have aesthetic considerations against camelCase and prefer
underscores_between_words for variables in the programs I write
for personal use.
February 15, 2013
> I am starting to understand the difficulty with ref being too restrictive.  Some kind of "const ref" in the language would have helped indeed.  But it would probably cause some allocation difficulties for actual constants... or worse?

Search for 'auto ref' or right after 'rvalue references' and read some of the discussions. It's instructive and then you can understand the whole topic maybe a little more.
No thread comes to a solution and unfortunately the answer of Walter and Andrei are very very thin on the subject.
February 15, 2013
On Thursday, 14 February 2013 at 20:33:21 UTC, Ivan Kazmenko wrote:
> Rob T wrote:
>> When I look at the std.container source code, it seems that the payload element is passed by value multiple times unnecessarily, so to minimize copy construction you'll have to implement element as a class and implement a dup function for it.
>
> Thank you all for assistance, I've finally pinned it down!  It's the default magic "a < b" that devours the most copies of my precious structs!  Once I change the declaration:
>
> -----
> alias RedBlackTree !(element) container;
> -----
>
> to
>
> -----
> bool lessfun (ref element a, ref element b)
> {
> 	return a < b;
> }
>
> alias RedBlackTree !(element, lessfun) container;
> -----
>
> ... I get only exactly 3 * LIMIT postblit constructor calls, which is 300,000 instead of 11,389,556.  While I still want to find where the excess 200,000 calls originate from, this is definitely asymptotically better than before: O (n) versus O (n log n) calls.  And it is now less of a bottleneck in my original program.
>
> This raises the question to the default behavior of std.functional.binaryFun.  Sure, for integers, class references and small types, the current way is the best.  On the other hand, for large structs and maybe some other complex objects that obey value semantics, it seems it would be beneficial to, given a string like "a < b", produce a function taking arguments by reference and not by value.  Maybe there is some simple criterion (size of type greater than size_t - seems bad? type is a struct - better?) which can be used with static if to generate "(ref a, ref b)" instead of "(a, b)" version by default.  Of course, all that could only work if there is a more detailed hierarchy of binary functions, at least a binaryFunConst taking const arguments.
>
> -----
> Ivan Kazmenko.

Also keep in mind that "a < b" is implemented as two calls to "a.opCmp(b)", and "a.opCmp(b)" is itself usually implemented as two calls to "something < something else" (!)

as convenient as opCmp is for the implementation implementation, it is not always the best in terms of raw power.

If you have can write a straight up less(S1, S2) function, such as "a.i < b.i", then using that can buy you a hefty amount of time.