Jump to page: 1 27  
Page
Thread overview
std.container.RedBlackTree versus C++ std::set
Feb 13, 2013
Ivan Kazmenko
Feb 13, 2013
Ivan Kazmenko
Feb 13, 2013
Jonathan M Davis
Feb 14, 2013
Ivan Kazmenko
Feb 14, 2013
Rob T
Feb 14, 2013
Rob T
Feb 14, 2013
Namespace
Feb 14, 2013
Rob T
Feb 14, 2013
Namespace
Feb 14, 2013
Jonathan M Davis
Feb 15, 2013
Dan
Feb 15, 2013
Namespace
Feb 15, 2013
Jonathan M Davis
Feb 15, 2013
Namespace
Feb 15, 2013
Jonathan M Davis
Feb 15, 2013
Jonathan M Davis
Feb 15, 2013
Namespace
Feb 15, 2013
Jonathan M Davis
Feb 15, 2013
Namespace
Feb 16, 2013
Dan
Feb 15, 2013
Jonathan M Davis
Feb 15, 2013
Jonathan M Davis
Feb 15, 2013
Dan
Feb 15, 2013
Jonathan M Davis
Feb 14, 2013
Ivan Kazmenko
Feb 14, 2013
bearophile
Feb 14, 2013
monarch_dodra
Feb 14, 2013
Namespace
Feb 15, 2013
Ivan Kazmenko
Feb 15, 2013
Ivan Kazmenko
Feb 15, 2013
Namespace
Feb 14, 2013
Namespace
Feb 15, 2013
monarch_dodra
Feb 15, 2013
Dan
Feb 15, 2013
monarch_dodra
Feb 15, 2013
H. S. Teoh
Feb 15, 2013
Jonathan M Davis
Feb 15, 2013
bearophile
Feb 16, 2013
Dan
Feb 15, 2013
monarch_dodra
Feb 14, 2013
Rob T
Feb 14, 2013
FG
Feb 14, 2013
Rob T
Feb 14, 2013
monarch_dodra
Feb 14, 2013
Rob T
Feb 14, 2013
Namespace
Feb 14, 2013
monarch_dodra
Feb 14, 2013
Namespace
Feb 14, 2013
monarch_dodra
Feb 14, 2013
Namespace
Feb 14, 2013
monarch_dodra
Feb 14, 2013
Jonathan M Davis
Feb 14, 2013
Dmitry Olshansky
Feb 14, 2013
Ivan Kazmenko
Feb 14, 2013
Namespace
Feb 14, 2013
Ivan Kazmenko
February 13, 2013
Hi!

I'm learning to use D collections properly, and I'm looking for a sorted data structure with logarithmic access time (i.e., a binary search tree will do, but a hash table would not help). As far as I can see, std.container.RedBlackTree is exactly what I need. However, I am not sure if I use it as intended as its performance seems inferior to a C++ STL solution (e.g., std::set).

To be more specific, right now I wonder what is the best (or intended) way to store an object in the RedBlackTree: should it be a class reference, or a struct (passed by value), or something quirkier like an integer pointing into an array or a simple pointer. The rest of my program suggested to use structs, but the whole thing turned out to be rather slow, and the profiler told me that these structs are being copied around much more than I anticipated.

And so I wrote a minimalistic test program to check the number of copy (postblit) constructor calls. Here is the D version:

-----
import std.container;
import std.stdio;

immutable int LIMIT = 100000;

struct element
{
	static int postblit_counter;

	long x;

	int opCmp (ref element other)
	{
		return (x > other.x) - (x < other.x);
	}

	this (long nx)
	{
		x = nx;
	}

	this (ref this)
	{
		assert (x == x);
		postblit_counter++;
	}
}

alias RedBlackTree !(element) container;

void main ()
{
	auto f = new container ();
	element.postblit_counter = 0;
	foreach (i; 0..LIMIT)
	{
		f.insert (element (i));
	}
	writefln ("%s", element.postblit_counter);
}
-----

And now here is a C++ equivalent:

-----
#include <cstdio>
#include <set>
#include <stdint.h>

const int LIMIT = 100000;

using namespace std;

struct element
{
	static int postblit_counter;

	int64_t x;

	bool operator < (const element other) const
	{
		return (x < other.x);
	}

	element (int64_t nx)
	{
		x = nx;
	}

	element (const element & other)
	{
		postblit_counter++;
	}
};

int element::postblit_counter;

typedef set <element> container;

int main (void)
{
	container f;
	element::postblit_counter = 0;
	for (int i = 0; i < LIMIT; i++)
	{
		f.insert (element (i));
	}
	printf ("%d\n", element::postblit_counter);
	return 0;
}
-----

And the results are:
D2 (DMD 2.059, -O):             11,389,556
C++ (MinGW GCC 4.7.2, -O2):      3,072,387

As you can see, in order to insert 100,000 elements, D needs a few times more copy constructor calls than C++. However, as far as I know, the internal structure used by C++ std::set is the very same red-black tree! Am I doing something wrong? And if not, what is this cost paid for, are there any features that RedBlackTree possesses and STL set doesn't?

Personally, I don't see why at all we should call the copy constructor more than once per element. I mean, if we intend to build a generic data structure, we sure need an internal node object with some extra bytes (internal references and counters) per each element, at least in the case of a red-black tree. So why don't we just bind each element to that internal node once and for all, and then, as long as the node is in the structure, use the data contained in it only by reference? What do we achieve if we choose to use it by value somewhere?

And if the intended design is "use class references", will that create an overhead for garbage collector later on?

...well, thank you for reading it to this point.

-----
Ivan Kazmenko.
February 13, 2013
P.S. More on C++ version:

> Personally, I don't see why at all we should call the copy constructor more than once per element. I mean, if we intend to build a generic data structure, we sure need an internal node object with some extra bytes (internal references and counters) per each element, at least in the case of a red-black tree. So why don't we just bind each element to that internal node once and for all, and then, as long as the node is in the structure, use the data contained in it only by reference? What do we achieve if we choose to use it by value somewhere?

Well, when I just changed
"bool operator < (const element other) const"
to
"bool operator < (const element & other) const"
in the C++ version of the program, it gave me the exact 100,000 copy constructor calls which justifies the above paragraph. Ouch... I had hopes GCC -O2 optimizes such obvious cases; perhaps it's not that obvious from the compiler point of view.

-----
Ivan Kazmenko.
February 13, 2013
On Thursday, February 14, 2013 00:33:26 Ivan Kazmenko wrote:
> P.S. More on C++ version:
> > Personally, I don't see why at all we should call the copy constructor more than once per element. I mean, if we intend to build a generic data structure, we sure need an internal node object with some extra bytes (internal references and counters) per each element, at least in the case of a red-black tree. So why don't we just bind each element to that internal node once and for all, and then, as long as the node is in the structure, use the data contained in it only by reference? What do we achieve if we choose to use it by value somewhere?
> 
> Well, when I just changed
> "bool operator < (const element other) const"
> to
> "bool operator < (const element & other) const"
> in the C++ version of the program, it gave me the exact 100,000
> copy constructor calls which justifies the above paragraph.
> Ouch... I had hopes GCC -O2 optimizes such obvious cases; perhaps
> it's not that obvious from the compiler point of view.

The biggest performance problem is the GC. The nodes in the RBT are allocated with the GC, so depending on what how many elements you're inserting, how often you're removing them, etc., it could have performance issues. A better GC is in the works but obviously hasn't made it in yet. Also, work is being done on designing custom allocators that std.container can use, but that also has a ways to go. So, the main items which would help make std.container's RBT perform on par with std::set are in the works but not ready yet. Also, it could make a big difference if you use gdc or ldc rather than dmd. dmd compiles code much faster than they do, but its optimizer is much worse - gdc and ldc consistently produce faster code, and if you're compiling the C++ code with gcc, then gdc is a much better comparison anyway, since then both the C++ and D code are using the same compiler backend.

- Jonathan M Davis
February 14, 2013
You can check if disabling the GC just before the insert process improves the performance. You may see 3x performance improvement. Disabling is safe provided you re-enable, this can be done reliably with scope(exit) or something similar.

import core.memory;

// ...


void main ()
{
	auto f = new container ();
	element.postblit_counter = 0;
        GC.disable;
	foreach (i; 0..LIMIT)
	{
		f.insert (element (i));
	}
        GC.enable;
	writefln ("%s", element.postblit_counter);
}

--rt
February 14, 2013
On 2013-02-14 01:09, Rob T wrote:
> You can check if disabling the GC just before the insert process improves the
> performance. You may see 3x performance improvement. Disabling is safe provided
> you re-enable, this can be done reliably with scope(exit) or something similar.

How did you know? It was 3x in my case. :)
Well, even more but the program had to wait 2 seconds at the end to collect.
With LIMIT at 10M, g++: 5.0s, gdc: 27.0s and 8.7s with GC.disable.
Internal memory handling by containers - yeah, can't wait to see that happen!


February 14, 2013
On Wed, 13 Feb 2013 18:22:02 -0500, Ivan Kazmenko <gassa@mail.ru> wrote:

> Hi!
>
> I'm learning to use D collections properly, and I'm looking for a sorted data structure with logarithmic access time (i.e., a binary search tree will do, but a hash table would not help). As far as I can see, std.container.RedBlackTree is exactly what I need. However, I am not sure if I use it as intended as its performance seems inferior to a C++ STL solution (e.g., std::set).
>
> To be more specific, right now I wonder what is the best (or intended) way to store an object in the RedBlackTree: should it be a class reference, or a struct (passed by value), or something quirkier like an integer pointing into an array or a simple pointer. The rest of my program suggested to use structs, but the whole thing turned out to be rather slow, and the profiler told me that these structs are being copied around much more than I anticipated.
>
> And so I wrote a minimalistic test program to check the number of copy (postblit) constructor calls. Here is the D version:
>
> -----
> import std.container;
> import std.stdio;
>
> immutable int LIMIT = 100000;
>
> struct element
> {
> 	static int postblit_counter;
>
> 	long x;
>
> 	int opCmp (ref element other)
> 	{
> 		return (x > other.x) - (x < other.x);
> 	}
>
> 	this (long nx)
> 	{
> 		x = nx;
> 	}
>
> 	this (ref this)
> 	{
> 		assert (x == x);
> 		postblit_counter++;
> 	}
> }
>
> alias RedBlackTree !(element) container;
>
> void main ()
> {
> 	auto f = new container ();
> 	element.postblit_counter = 0;
> 	foreach (i; 0..LIMIT)
> 	{
> 		f.insert (element (i));
> 	}
> 	writefln ("%s", element.postblit_counter);
> }
> -----
>
> And now here is a C++ equivalent:
>
> -----
> #include <cstdio>
> #include <set>
> #include <stdint.h>
>
> const int LIMIT = 100000;
>
> using namespace std;
>
> struct element
> {
> 	static int postblit_counter;
>
> 	int64_t x;
>
> 	bool operator < (const element other) const
> 	{
> 		return (x < other.x);
> 	}
>
> 	element (int64_t nx)
> 	{
> 		x = nx;
> 	}
>
> 	element (const element & other)
> 	{
> 		postblit_counter++;
> 	}
> };
>
> int element::postblit_counter;
>
> typedef set <element> container;
>
> int main (void)
> {
> 	container f;
> 	element::postblit_counter = 0;
> 	for (int i = 0; i < LIMIT; i++)
> 	{
> 		f.insert (element (i));
> 	}
> 	printf ("%d\n", element::postblit_counter);
> 	return 0;
> }
> -----
>
> And the results are:
> D2 (DMD 2.059, -O):             11,389,556
> C++ (MinGW GCC 4.7.2, -O2):      3,072,387
>
> As you can see, in order to insert 100,000 elements, D needs a few times more copy constructor calls than C++. However, as far as I know, the internal structure used by C++ std::set is the very same red-black tree! Am I doing something wrong? And if not, what is this cost paid for, are there any features that RedBlackTree possesses and STL set doesn't?
>
> Personally, I don't see why at all we should call the copy constructor more than once per element. I mean, if we intend to build a generic data structure, we sure need an internal node object with some extra bytes (internal references and counters) per each element, at least in the case of a red-black tree. So why don't we just bind each element to that internal node once and for all, and then, as long as the node is in the structure, use the data contained in it only by reference? What do we achieve if we choose to use it by value somewhere?

I find the number of postblit calls excessive too.  I will have to look into why that happens.  I can say that once an element is allocated, it is not moved or copied.  That is, the red black algorithm does not copy the data around inside the tree, it rotates by adjusting pointers.  I required this because it makes cursors sane (they always point at the same element).

I will note that std.container.RedBlackTree is a port of dcollections' RBTree implementation.  As std.container's API is unfinished, there certainly is room for improvement on the std.container version.  Dcollections' API is more complete, and may perform better.

Specifically, RBTree inside dcollections uses a custom allocator that should significantly reduce runtime.

>
> And if the intended design is "use class references", will that create an overhead for garbage collector later on?

It is not designed only for classes, your usage above should be perfectly valid.

-Steve
February 14, 2013
On Wednesday, 13 February 2013 at 23:22:03 UTC, Ivan Kazmenko wrote:
> Hi!
> -----
> Ivan Kazmenko.

Keep in mind that C++ and D have very different philosophies regarding copy construction.

C++ has "strong ownership", so for example, whenever you copy a string/vector, or pass it by value, the whole damn thing has to be completely duplicated. It's so expansive that C++ has gone out of its way to use pass-by-reference semantics, as well (now) RValue references.

D, on the other hand (being GC), has a "shallow ownership" philosopy: Basically, when you make a copy, nothing happens, appart from the binary copy of the object. You want to pass a string by value? that's 8 bytes of data. Period. The *most* complex CC I have *ever* seen in D merely increments a reference counter. That's it. 90% (99%?) of the time, there isn't even a postblit implemented. Because of this, D has embraced pass by value.

Not to mention, if you are using classes, those are already pointer types. Why pass them by ref?

--------

The conclusion is that the comparison is not fair: D's pass by value is not *very* different from C++'s pass by ref (in amount of data copied).

If you *do* want a fair-er comparison, then I'd suggest you implement a ".dup" on your object, and see how many times THAT gets called. Guess what: 0.

I'm not saying D's approach is perfect, just that D's library is optimized for D-like types.

Just the same, a lot the stl is not properlly optimized for little POD types, as they get passed around by ref, when their actual size is the same as the pointer referencing them...
February 14, 2013
On Thursday, 14 February 2013 at 00:25:15 UTC, FG wrote:
> On 2013-02-14 01:09, Rob T wrote:
>> You can check if disabling the GC just before the insert process improves the
>> performance. You may see 3x performance improvement. Disabling is safe provided
>> you re-enable, this can be done reliably with scope(exit) or something similar.
>
> How did you know? It was 3x in my case. :)
> Well, even more but the program had to wait 2 seconds at the end to collect.
> With LIMIT at 10M, g++: 5.0s, gdc: 27.0s and 8.7s with GC.disable.
> Internal memory handling by containers - yeah, can't wait to see that happen!

Oh yes, I forgot to mention you can expect ~2 second additional delay from re-enabling the CG.

How did I know? Read this ...
http://forum.dlang.org/thread/waarzqtfcxuzhzdelhtt@forum.dlang.org

So it seems that we can get close to g++ performance with some manual tweaking of the GC, which is good, but you have to know this and know where to make the adjustments.

What we really need is a better GC with manual fine tuning control over how it operates, with an ability to gain feed back from it to understand where and when it may be doing something stupid, and even better than that is if we could use D completely without the GC, which may be a requirement for certain types of applications. Unfortunately, a certain subset of D depends on there being a method of automated garbage collection in place, so currently you cannot use all of D without a GC.

--rt
February 14, 2013
On Thursday, 14 February 2013 at 06:56:38 UTC, monarch_dodra wrote:
> On Wednesday, 13 February 2013 at 23:22:03 UTC, Ivan Kazmenko wrote:
>> Hi!
>> -----
>> Ivan Kazmenko.
>
> Keep in mind that C++ and D have very different philosophies regarding copy construction.
>
> C++ has "strong ownership", so for example, whenever you copy a string/vector, or pass it by value, the whole damn thing has to be completely duplicated. It's so expansive that C++ has gone out of its way to use pass-by-reference semantics, as well (now) RValue references.
>
> D, on the other hand (being GC), has a "shallow ownership" philosopy: Basically, when you make a copy, nothing happens, appart from the binary copy of the object. You want to pass a string by value? that's 8 bytes of data. Period. The *most* complex CC I have *ever* seen in D merely increments a reference counter. That's it. 90% (99%?) of the time, there isn't even a postblit implemented. Because of this, D has embraced pass by value.
>
> Not to mention, if you are using classes, those are already pointer types. Why pass them by ref?
>
> --------
>
> The conclusion is that the comparison is not fair: D's pass by value is not *very* different from C++'s pass by ref (in amount of data copied).
>
> If you *do* want a fair-er comparison, then I'd suggest you implement a ".dup" on your object, and see how many times THAT gets called. Guess what: 0.
>
> I'm not saying D's approach is perfect, just that D's library is optimized for D-like types.
>
> Just the same, a lot the stl is not properlly optimized for little POD types, as they get passed around by ref, when their actual size is the same as the pointer referencing them...

I agree. There are cases where structs make a lot of sense, usually when they are very simple simple and contain no pointers or references, otherwise structs should be avoided in favor of classes to avoid doing copy/move constructors and to avoid concerns over performance optimizations. With classes, only certain points in your code require that a duplicate copy be made of the class instance, the majority of code need only to pass around a reference which is the default behavior - easy and fast!

--rt
February 14, 2013
> I agree. There are cases where structs make a lot of sense, usually when they are very simple simple and contain no pointers or references, otherwise structs should be avoided in favor of classes to avoid doing copy/move constructors and to avoid concerns over performance optimizations. With classes, only certain points in your code require that a duplicate copy be made of the class instance, the majority of code need only to pass around a reference which is the default behavior - easy and fast!
>
> --rt

It sounds like Java philosophy: Objects are always better.
Or have I misunderstood?
In any case, a intensive use of classes / objects, instead of structs would be also an enormous heap effort.
I usually try to use as few classes as possible.
« First   ‹ Prev
1 2 3 4 5 6 7