std.container.RedBlackTree versus C++ std::set (page 2)

On Thursday, 14 February 2013 at 10:23:18 UTC, Namespace wrote:
>> I agree. There are cases where structs make a lot of sense, usually when they are very simple and contain no pointers or references, otherwise structs should be avoided in favor of classes to avoid doing copy/move constructors and to avoid concerns over performance optimizations. With classes, only certain points in your code require that a duplicate copy be made of the class instance, the majority of code need only to pass around a reference which is the default behavior - easy and fast!
>>
>> --rt
>
> It sounds like Java philosophy: Objects are always better.
> Or have I misunderstood?
> In any case, an intensive use of classes / objects, instead of structs would be also an enormous heap effort.
> I usually try to use as few classes as possible.

It's a matter of balance. If you start having really complex objects (and big, e.g. > 100 bytes), then classes tend to scale better. If having a class is *really* too much overhead, but your objects start getting too big to pass around by value, you can just new them on the heap, and you'll get the "best" (or worst?) of both worlds.

Another good balance are stack based struct pointer wrappers to implementation: You can pass them by value, but they carry a complex payload. The advantage to doing this over a naked array is the static type. The *dis*-advantage is that D has no standard default initialization scheme (classes do though). Most things in Phobos use this scheme.

The point I (we?) are trying to get across is that *usually* (not a hard rule) copying things in D is *expected* to be trivial and cheap. If this is not the case, then the tools you'll interface with will not work optimally.

> Another good balance are stack based struct pointer wrappers to implementation: You can pass them by value, but they carry a complex payload.

I'm not sure what that is.
Can you give a small example?

On Thursday, 14 February 2013 at 10:44:22 UTC, Namespace wrote:
>> Another good balance are stack based struct pointer wrappers to implementation: You can pass them by value, but they carry a complex payload.
>
> I'm not sure what that is.
> Can you give a small example?

struct S
{
    static struct Payload
    {
        //Tons of data here
    }
    Payload* _p;

    //functions
}

Ref counted is implemented that way. Most of the containers are also implemented that way. Associative arrays are also implemented that way under the hood.
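A slightly fuller sketch of the same pattern may help show what "pass by value, but carry a complex payload" means in practice. The names `Handle`, `ensure`, `push`, `length` and `sharedLength` are hypothetical and not taken from Phobos; the lazy allocation is one assumed way to cope with structs having no default constructor.

```d
import std.stdio : writeln;

// Hypothetical names throughout; this is an illustration, not library code.
struct Handle
{
    // The heavy state lives on the heap; the struct itself is one pointer wide.
    static struct Payload
    {
        int[] data;
    }

    private Payload* _p;

    // Structs have no default constructor, so the payload is created lazily.
    private void ensure()
    {
        if (_p is null)
            _p = new Payload;
    }

    void push(int x)
    {
        ensure();
        _p.data ~= x;
    }

    size_t length() const
    {
        return _p is null ? 0 : _p.data.length;
    }
}

// Copies an initialized handle and mutates the copy; both see the change.
size_t sharedLength()
{
    Handle a;
    a.push(1);      // allocates the payload
    Handle b = a;   // copies only the pointer, not the data
    b.push(2);      // appends through the shared payload
    return a.length;
}

void main()
{
    writeln(sharedLength());
}
```

Copying a `Handle` costs one pointer copy regardless of how large the payload grows, which is the trade described above: the construction cost (a heap allocation) is paid once, and copies become cheap and share state.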

> struct S
> {
>     static struct Payload
>     {
>         //Tons of data here
>     }
>     Payload* _p;
>
>     //functions
> }
>
> Ref counted is implemented that way. Most of the containers are also implemented that way. Associative arrays are also implemented that way under the hood.

But you have to allocate '_p' again on the heap.
I see no advantage over classes, except that these structures are just not null by default. Is that the advantage?

February 14, 2013

Re: std.container.RedBlackTree versus C++ std::set

Posted by Ivan Kazmenko
in reply to Jonathan M Davis


First, thank you all for the replies!

Jonathan M Davis wrote:
> Also, it could make a big difference if you use gdc or ldc rather than dmd.

Thank you for the suggestion, I'll check that.  However, I don't expect the GCC optimizer to reduce the number of calls in this specific test, since it did not kick in in the more obvious case as I mentioned.

Right now, I am more concerned with the performance of the front-end (RedBlackTree): if it currently can't be tuned to perform fast enough, I'll just know that I should implement my own version of a binary search tree container for my needs.

Rob T wrote:
> You can check if disabling the GC just before the insert process
> improves the performance. You may see 3x performance improvement.
> Disabling is safe provided you re-enable, this can be done
> reliably with scope(exit) or something similar.

Hmm, it indeed performed faster, though not 3x in my setup.  However, the number of copy constructor calls stayed the same.
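For reference, Rob T's suggestion could be written roughly as follows. This is a minimal sketch: `fillTree` is a hypothetical helper, the element type is a plain `int` rather than the struct from the test, and no timing is measured here.

```d
import core.memory : GC;
import std.container : RedBlackTree;
import std.stdio : writeln;

// Hypothetical helper: inserts n distinct ints with GC collections paused.
size_t fillTree(int n)
{
    GC.disable();
    // Runs on every exit path, including exceptions, so the GC
    // cannot stay disabled by accident.
    scope (exit) GC.enable();

    auto tree = new RedBlackTree!int;
    foreach (i; 0 .. n)
        tree.insert(i);
    return tree.length;
}

void main()
{
    writeln(fillTree(1000));
}
```

Pausing the collector affects only when collections run during the insert loop; it does not change how many times elements are copied, which matches the observation that the number of copy constructor calls stayed the same.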

Steven Schveighoffer wrote:
> I find the number of postblit calls excessive too.
> I will have to look into why that happens.
> I can say that once an element is allocated, it is not moved or copied.

Could it be easily enforced in the library?  Something like this: when accessing data only for internal use in the data structure, use an interface that disables assignment?  Can such cases be distinguished from the ones when the library user actively wants to copy data?

Steven Schveighoffer wrote:
> I will note that std.container.RedBlackTree is a port of dcollections'
> RBTree implementation.

Thank you for pointing that out.  Perhaps I'll have to try the test with that implementation, too.

monarch_dodra wrote:
> Keep in mind that C++ and D have very different philosophies
> regarding copy construction.
> ...
> The conclusion is that the comparison is not fair: D's pass by
> value is not *very* different from C++'s pass by ref (in amount
> of data copied).

Well, perhaps I didn't make myself clear enough.  The comparison I posted is not intended to be a fair comparison of languages in the general case!  It is just my use case stripped to a minimal example that still shows the number of calls correctly.  In the actual use case, I have something like

-----
struct element
{
	long value;
	int [] one;
	int [] two;

	this (this)
	{
		one = one.dup;
		two = two.dup;
	}
}
-----

Sure I could think of some other way to represent the data I need as an object, this one just seems the most intuitive.

monarch_dodra wrote:
> If you *do* want a fair-er comparison, then I'd suggest you
> implement a ".dup" on your object, and see how many times THAT
> gets called. Guess what: 0.

Right, but it's the copy constructor that gets called when I copy a struct, and in these cases, I actually want it to copy the arrays as well.  Otherwise, the element object may end up in a bad state.

Do you imply that I should implement dup for every struct I create and then leave the copy constructor empty?  As the library makes use of the copy constructor and not dup, I fear it would be an incorrect design, since the library will make broken copies of my struct and pass them around.
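To make the distinction concrete, here is a small sketch that counts postblit calls on a simplified version of the struct above. The static counter `postblits` and the functions `byValue`, `byRef` and `countCopies` are hypothetical additions for illustration only.

```d
import std.stdio : writeln;

struct Element
{
    long value;
    int[] one;

    static int postblits;   // hypothetical counter, for illustration only

    this(this)
    {
        one = one.dup;      // deep copy, as in the struct from the post
        ++postblits;
    }
}

// Passing by value copies the argument, so the postblit runs.
long byValue(Element e) { return e.value; }

// Passing by ref does not copy, so the postblit does not run.
long byRef(ref const Element e) { return e.value; }

int countCopies()
{
    Element.postblits = 0;
    auto a = Element(1, [1, 2, 3]);  // construction, no postblit
    byRef(a);                        // no copy
    byValue(a);                      // one copy
    Element b = a;                   // one more copy
    return Element.postblits;
}

void main()
{
    writeln(countCopies());
}
```

Every by-value copy pays for the array duplication, whether the caller is user code or a library; an explicit `dup` method would only run where it is called by name.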

monarch_dodra wrote:
> I'm not saying D's approach is perfect, just that D's library is
> optimized for D-like types.

and Rob T wrote:
> I agree. There are cases where structs make a lot of sense,
> usually when they are very simple and contain no pointers
> or references, otherwise structs should be avoided in favor of
> classes to avoid doing copy/move constructors and to avoid
> concerns over performance optimizations. With classes, only
> certain points in your code require that a duplicate copy be made
> of the class instance, the majority of code need only to pass
> around a reference which is the default behavior - easy and fast!

So, you suggest passing things by reference once they grow beyond several bytes in size?  Wouldn't that mean having to constantly pay attention to some counter-intuitive code when I actually want my objects to behave like values, not references (e.g., matrices or other complex math objects)?  In my example case with arrays one and two, I do want my whole object to obey value semantics.
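The difference in question can be shown with a minimal sketch. `MatS`, `MatC` and `demo` are hypothetical names; the "matrix" is reduced to two cells to keep the example short.

```d
import std.stdio : writeln;

struct MatS { int[2] cells; }   // value type: assignment copies the data
class MatC { int[2] cells; }    // reference type: assignment copies the reference

int[2] demo()
{
    MatS s1;
    MatS s2 = s1;
    s2.cells[0] = 7;            // s1 is untouched

    auto c1 = new MatC;
    auto c2 = c1;
    c2.cells[0] = 7;            // c1 sees the change, both names refer to one object

    return [s1.cells[0], c1.cells[0]];
}

void main()
{
    writeln(demo());
}
```

With the class version, getting an independent copy requires an explicit step at every point where value behaviour is wanted, which is the extra attention the paragraph above is worried about.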

Once again, thank you all for your attention.

-----
Ivan Kazmenko.

On Thursday, 14 February 2013 at 10:58:19 UTC, Namespace wrote:
>> struct S
>> {
>>     static struct Payload
>>     {
>>         //Tons of data here
>>     }
>>     Payload* _p;
>>
>>     //functions
>> }
>>
>> Ref counted is implemented that way. Most of the containers are also implemented that way. Associative arrays are also implemented that way under the hood.
>
> But you have to allocate '_p' again on the heap.

Well, yeah, that's the point. I'm not saying it's a best fit for everything. You are basically trading construction costs for copy costs.

> I see no advantage over classes, except that these structures are just not null by default.

Actually, (and IMO, this is a very big problem), these structures are *always* null by default. There is no easy way to "default initialize" structs in D :(

> Is that the advantage?

The advantage is deterministic RAII. With classes you only get non-deterministic RAII. For example: File. File will close the underlying FILE* when the last File is destroyed. But no later.
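A minimal sketch of what "deterministic" means here, using a plain struct destructor rather than the reference counting that File uses. `Resource`, `events`, `useResource` and `run` are hypothetical names introduced for this illustration.

```d
import std.stdio : writeln;

string[] events;   // hypothetical event log, for illustration only

struct Resource
{
    string name;

    // Runs as soon as the struct goes out of scope.
    ~this() { events ~= "release " ~ name; }
}

void useResource()
{
    auto r = Resource("file");
    events ~= "use " ~ r.name;
}   // r's destructor runs here, before control returns to the caller

string[] run()
{
    events = null;
    useResource();
    events ~= "after scope";
    return events;
}

void main()
{
    writeln(run());
}
```

The release is recorded before "after scope", at a point fixed by the program's structure. A class instance allocated with `new` would instead be finalized whenever the GC happens to collect it, if at all.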

14-Feb-2013 03:22, Ivan Kazmenko wrote:
> Hi!
>
> I'm learning to use D collections properly, and I'm looking for a sorted
> data structure with logarithmic access time (i.e., a binary search tree
> will do, but a hash table would not help). As far as I can see,
> std.container.RedBlackTree is exactly what I need. However, I am not
> sure if I use it as intended as its performance seems inferior to a C++
> STL solution (e.g., std::set).
>
> To be more specific, right now I wonder what is the best (or intended)
> way to store an object in the RedBlackTree: should it be a class
> reference, or a struct (passed by value), or something quirkier like an
> integer pointing into an array or a simple pointer. The rest of my
> program suggested to use structs, but the whole thing turned out to be
> rather slow, and the profiler told me that these structs are being
> copied around much more than I anticipated.
>
> And so I wrote a minimalistic test program to check the number of copy
> (postblit) constructor calls. Here is the D version:
[snip]
>
> And the results are:
> D2 (DMD 2.059, -O): 11,389,556

I'd add: -release -inline, or it may not inline away temporary copies.

--
Dmitry Olshansky

On Thu, 14 Feb 2013 09:47:31 -0500, Namespace <rswhite4@googlemail.com> wrote:
> If I use (as you do)
> ----
> this(ref this) {
> ----
> I get 0 as output. (dmd 2.062 beta1)
> If I remove the 'ref' I get 11389556.

this(ref this) is not a postblit. That it would even compile is a bug.

-Steve

> If I use (as you do)
> ----
> this(ref this) {
> ----
> I get 0 as output. (dmd 2.062 beta1)
> If I remove the 'ref' I get 11389556.

Ouch, sorry! That's a copy & paste bug I introduced when posting here. Actually, I use this (this) of course. I tried this (ref this) at some point; surprisingly, it does compile, but it's not what gets called on copying (and thus is of no use).
