December 14, 2012
Significant GC performance penalty
I created a D library wrapper for sqlite3 that uses a dynamically 
constructed result list for returned records from a SELECT 
statement. It works in a similar way to a C++ version that I 
wrote a while back.

The D code is genuine D code, not a clone of my earlier C++ 
code, so it makes use of many of the features of D, one of 
which is the garbage collector.

When running comparison tests between the C++ version and the D 
version, both compiled with performance optimization flags, the 
C++ version ran 3x faster than the D version, which was very 
unexpected. If anything, I was hoping for a performance boost 
from D, or at least the same performance level.

I remembered reading about people having performance problems 
with the GC, so I tried a quick fix: disable the GC before the 
SELECT is run and re-enable it afterwards. The result was a 3x 
performance boost, making the DMD-compiled version run almost as 
fast as the C++ version. The DMD-compiled version is now only 2 
seconds slower on my stress-test runs of a SELECT that returns 
200,000+ records with 14 fields. Not too bad! I may get 
identical performance if I compile with gdc, but that will have 
to wait until it is updated to 2.061.
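Concretely, the quick fix amounts to something like this (a sketch; the row-fetching body is a stand-in for the wrapper's actual sqlite3 calls, not its real API):

```d
import core.memory : GC;

// Stand-in for materializing a SELECT result: each row is a freshly
// allocated array of field strings, as in the wrapper.
string[][] fetchAllRows(size_t nRows, size_t nFields)
{
    GC.disable();             // suspend collections during the hot loop
    scope(exit) GC.enable();  // re-enable even if an exception is thrown

    auto rows = new string[][](nRows);
    foreach (i; 0 .. nRows)
    {
        rows[i] = new string[](nFields);
        foreach (j; 0 .. nFields)
            rows[i][j] = "field";
    }
    return rows;
}
```

The `scope(exit)` guard matters: without it, an exception thrown mid-query would leave the GC permanently disabled.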

Fixing this was a major relief, since the code is expected to be 
used in a commercial setting. I'm wondering, though, why the GC 
causes such a large penalty, and what negative effects, if any, 
there will be from disabling the GC temporarily. I know that 
memory won't be reclaimed until the GC is re-enabled, but is 
there anything else to worry about?

I feel it's worth commenting on my experience as feedback for 
the D developers and anyone else starting off with D.

Coming from C++, I *really* did not like having the GC; it made 
me very nervous. But now that I'm used to having it, I've come 
to like it, up to a point. It really does change the way you 
think and code. However, as I've discovered, you still always 
have to be thinking about memory management, because the GC can 
impose a huge performance penalty in certain situations. I also 
NEED to know that I can always go fully manual where necessary. 
There's no way I would want to give up that kind of control.

The trade-off with having a GC seems to be that, by default, C++ 
apps will perform considerably faster than equivalent D apps 
out of the box, simply because the manual memory management is 
fine-tuned by the programmer as development proceeds. With D, 
when you simply let the GC take care of business, you are not 
necessarily fine-tuning as you go along, and if you do not take 
the resulting performance hit into consideration, your apps will 
likely perform poorly compared to a C++ equivalent. However, 
building the equivalent app in D is a much more pleasant 
experience in terms of programming productivity. The code is 
simpler to deal with, and there's less to worry about with 
pointers and other memory management issues.

What I have not yet had the opportunity to explore is using D in 
full manual memory management mode. My understanding is that if 
I take that route, then I cannot use certain parts of the std 
lib, and will also lose a few of the nice features of D that 
make it fun to work with. I'm not fully clear, though, on what 
to expect, so if there's any detailed information to look at, it 
would be a big help.

I wonder what can be done to allow a programmer to go fully 
manual, while not losing any of the nice features of D?

Also, I think everyone agrees we really need a better GC, and I 
wonder once we do get a better GC, what kind of overall 
improvements we can expect to see?

Thanks for listening.

--rt
December 14, 2012
Re: Significant GC performance penalty
Allocating memory is simply slow. The same is true in C++ where 
you will see performance hits if you allocate memory too often. 
The GC makes things worse, but if you really care about 
performance then you'll avoid allocating memory so often.

Try to pre-allocate as much as possible, and use the stack 
instead of the heap where possible. Fixed size arrays and structs 
are your friend.
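To illustrate the stack-versus-heap point (a generic sketch, not tied to the sqlite code in question):

```d
// Heap version: every call performs a GC allocation.
int sumHeap(size_t n)
{
    auto buf = new int[](n);            // GC heap allocation per call
    foreach (i, ref b; buf) b = cast(int) i;
    int total = 0;
    foreach (v; buf) total += v;
    return total;
}

// Stack version: a fixed-size array never touches the GC.
int sumStack()
{
    int[256] buf;                       // lives on the stack, no GC
    foreach (i, ref b; buf) b = cast(int) i;
    int total = 0;
    foreach (v; buf) total += v;
    return total;
}
```

In a tight loop the stack version costs nothing beyond the stack frame, while the heap version adds allocator work and eventual collection pressure on every call.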

I avoid using the GC when using D and I feel like I still have a 
lot of freedom of expression, but maybe I'm just used to it.
December 14, 2012
Re: Significant GC performance penalty
Rob T:

> I wonder what can be done to allow a programmer to go fully 
> manual, while not losing any of the nice features of D?

Even the Rust language, which has a more powerful type system 
than D, with region analysis and more, sometimes needs localized 
reference counting (or a localized per-thread GC) to allow the 
use of its full features. So I don't think you can have all the 
nice features of D without its GC.

I believe the D design has bet too much on its (not precise) GC. 
Now the design of Phobos & D needs to show more love for stack 
allocations (see Variable Length arrays, array literals, etc), 
for some alternative allocators like reaps 
(http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.7.6505 
), and so on. Someone implemented a stack-like data manager for 
D, but the voting didn't allow it into Phobos.

Bye,
bearophile
December 14, 2012
Re: Significant GC performance penalty
On Friday, 14 December 2012 at 18:46:52 UTC, Peter Alexander 
wrote:
> Allocating memory is simply slow. The same is true in C++ where 
> you will see performance hits if you allocate memory too often. 
> The GC makes things worse, but if you really care about 
> performance then you'll avoid allocating memory so often.
>
> Try to pre-allocate as much as possible, and use the stack 
> instead of the heap where possible. Fixed size arrays and 
> structs are your friend.

In my situation, I can think of some ways to mitigate the memory 
allocation problem, however it's a bit tricky when SELECT 
statement results have to be dynamically generated, since the 
number of rows returned and the size and type of the rows always 
differ depending on the query and the data stored in the 
database. It's just not practical to custom-fit each SELECT to a 
pre-allocated array or list; it would be far too much manual 
effort.

I could consider generating a free list of pre-allocated record 
components that is re-used rather than destroyed and 
reallocated. However, knowing how many records to pre-allocate 
is tricky: I could run out, or waste tons of RAM for nothing 
most of the time. At the end of the day, I may be better off 
digging into the GC source code itself to look for solutions.
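A minimal sketch of the free-list idea, for what it's worth (names and structure are hypothetical, not from the wrapper; the pool grows on demand, so there is no fixed count to guess up front):

```d
// Recycles record field buffers instead of allocating a fresh one
// per row. Released buffers go back on the free list.
struct RecordPool
{
    string[][] free;  // recycled field buffers

    string[] acquire(size_t nFields)
    {
        if (free.length)
        {
            auto buf = free[$ - 1];
            free = free[0 .. $ - 1];
            buf.length = nFields;  // reuses capacity where possible
            return buf;
        }
        return new string[](nFields);  // pool empty: allocate on demand
    }

    void release(string[] buf)
    {
        free ~= buf;  // hand the buffer back for reuse
    }
}
```

Because the pool only allocates when empty, it never over-reserves; the worst case is simply the peak number of records alive at once.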

> I avoid using the GC when using D and I feel like I still have 
> a lot of freedom of expression, but maybe I'm just used to it.

I'd like to do that too and wish I had your experience with how 
you are going about it. My fear is that I'll end up allocating 
without knowing and having my apps silently eat up memory over 
time.

At the end of the day, there's just no point in having a GC if 
you don't want to use it, so the big question is whether a GC 
can be made to work much better than what we have. Supposedly 
yes, but will the improvements really matter? I somehow doubt it 
will.

When I look at GC-based apps, what they all seem to have in 
common is that they tend to eat up vast amounts of RAM for 
nothing and perform poorly. I'm speaking mostly about Java apps; 
they are terrible with performance and memory footprint in 
general. But C++ apps that use an add-on GC tend to have similar 
issues as well.

It may be that the GC concept works far better in theory than in 
practice, although thanks to the workarounds for the performance 
penalty, you may end up writing better-performing apps because 
of it. However, that's NOT the intention of having a GC!

--rt
December 14, 2012
Re: Significant GC performance penalty
On Friday, 14 December 2012 at 18:27:29 UTC, Rob T wrote:
> I created a D library wrapper for sqlite3 that uses a 
> dynamically constructed result list for returned records from a 
> SELECT statement. It works in a similar way to a C++ version 
> that I wrote a while back.
>
> [...]
>
> Thanks for listening.
>
> --rt

I have lots of experience with GC-enabled languages, even for 
systems programming (Oberon & Active Oberon).

I think there are a few issues to consider:

- D's GC still has a lot of room to improve, so some of the 
issues you have found might eventually get fixed;

- Having GC support does not mean calling new like crazy; one 
still needs to think about how to code in a GC-friendly way;

- Make proper use of weak references where they are available;

- GC-enabled language runtimes usually offer ways to peek into 
the runtime and let the developer understand how the GC is 
working and what might be improved.

The benefit of having a GC is a safer way to manage memory 
across multiple modules, especially when ownership is not clear.

Even in C++ I seldom do manual memory management nowadays, if 
working on new codebases. Of course, others will have a different 
experience.

Other than that, thanks for sharing your experience.

--
Paulo
December 14, 2012
Re: Significant GC performance penalty
On Friday, 14 December 2012 at 19:24:39 UTC, Rob T wrote:
> In my situation, I can think of some ways to mitigate the 
> memory allocation  problem, however it's a bit tricky when 
> SELECT statement results have to be dynamically generated, 
> since the number of rows returned and size and type of the rows 
> are always different depending on the query and the data stored 
> in the database. It's just not at all practical to custom fit 
> for each SELECT to a pre-allocated array or list, it'll just be 
> far too much manual effort.

Maybe I have misunderstood, but it sounds to me like you could
get away with a single allocation there. Just reducing the number
of allocations will improve things a lot.
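One way to sketch that (illustrative only, and assuming the row count is known or can be estimated up front, which a SELECT doesn't always give you): grow a single buffer with reserved capacity instead of allocating per row.

```d
import std.array : appender;

// Collect all rows into one growing buffer instead of allocating a
// fresh list node per record.
string[] collectRows(size_t expectedRows)
{
    auto rows = appender!(string[])();
    rows.reserve(expectedRows);  // one up-front allocation when the
                                 // count is known (or a good estimate)
    foreach (i; 0 .. expectedRows)
        rows.put("record");      // appends without reallocating
    return rows.data;
}
```

Even when the estimate is off, the appender's geometric growth keeps the allocation count logarithmic in the row count rather than linear.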

>> I avoid using the GC when using D and I feel like I still have 
>> a lot of freedom of expression, but maybe I'm just used to it.
>
> I'd like to do that too and wish I had your experience with how 
> you are going about it. My fear is that I'll end up allocating 
> without knowing and having my apps silently eat up memory over 
> time.

This shouldn't be a problem. I occasionally recompile druntime
with a printf inside the allocation function just to make sure,
but normally I can tell if memory allocations are going on
because of the sudden GC pauses.

> At the end of the day, there's just no point in having a GC if 
> you don't want to use it, so the big question is if a GC can be 
> made to work much better than what we have? Supposedly yes, but 
> will the improvements really matter? I somehow doubt it will.

D's GC has a lot of headroom for improvement. A generational GC
will likely improve things a lot.

> When I look at GC based apps, what they all seem to have in 
> common, is that they tend to eat up vast amounts of RAM for 
> nothing and perform poorly. I'm speaking mostly about Java 
> apps, they are terrible with performance and memory foot print 
> in general. But also C++ apps that use built in GC tend to have 
> similar issues.

The problem with Java is not just because of the GC. Java eats up
huge amounts of memory because it has no value-types, so
*everything* has to be allocated on the heap, and every object
has 16 bytes of overhead (on 64-bit systems) in addition to
memory manager overhead.

This is a great presentation on the subject of Java memory
efficiency:
http://www.cs.virginia.edu/kim/publicity/pldi09tutorials/memory-efficient-java-tutorial.pdf

> It may be that the GC concept works far better in theory than 
> in practice, although due to the performance penalty 
> work-a-rounds, you may end up writing better performing apps 
> because of it, however that's NOT the intention of having a GC!

When it comes to performance, there is always a compromise with
usability. Even malloc performs poorly compared to more manual
memory management. Even automatic register allocation by the
compiler can lead to poor performance.

The only question is where you want to draw the line between
usability and performance.
December 14, 2012
Re: Significant GC performance penalty
On Fri, Dec 14, 2012 at 08:27:46PM +0100, Paulo Pinto wrote:
[...]
> - Having GC support, does not mean to do call new like crazy, one
> still needs to think how to code in a GC friendly way;

It makes me think, though, that perhaps there is some way of optimizing
the GC for recursive data structures where you only ever keep a
reference to the head node, so that they can be managed in a much more
efficient way than a structure where there may be an arbitrary number
of references to anything inside. I think this is a pretty common case,
at least in the kind of code I encounter frequently.

Also, coming from C/C++, I have to say that my coding style has been
honed over the years to think in terms of single-ownership structures,
so even when coding in D I tend to write code that way. However, having
the GC available means that there are some cases where using multiple
references to stuff will actually improve GC (and overall) performance
by eliminating the need to deep-copy stuff everywhere.


> - GC enabled languages runtimes usually offer ways to peek into the
> runtime, somehow, and allow the developer to understand how GC is
> working and what might be improved;
[...]

Yeah, I think for most applications, it's probably good enough to use
the functions in core.memory (esp. enable, disable, collect, and
minimize) to exercise some control over the GC so that you can use
manual memory management in the important hotspots, and just let the GC
do its thing in less important parts of the program. I think
core.memory.minimize will solve the OP's concern about GC'd apps having
bad memory footprints.
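A sketch of that pattern using the real core.memory API (the hotspot body here is a stand-in for the program's actual allocation-heavy work):

```d
import core.memory : GC;

// Hot section: suspend collections, then clean up afterwards at a
// point where a pause is acceptable.
size_t runHotspot()
{
    GC.disable();             // no collection pauses inside the hotspot
    scope(exit) GC.enable();  // re-enabled even if an exception is thrown

    size_t total = 0;
    foreach (i; 0 .. 1_000)
        total += new int[](16).length;  // stand-in allocation-heavy work
    return total;
}

void betweenBatches()
{
    GC.collect();    // reclaim garbage now, when a pause is acceptable
    GC.minimize();   // return unused physical memory to the OS
}
```

Calling `GC.minimize` between batches is what addresses the footprint concern: collected pools are actually handed back to the operating system instead of being retained by the runtime.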


T

-- 
Computers aren't intelligent; they only think they are.
December 14, 2012
Re: Significant GC performance penalty
On Fri, Dec 14, 2012 at 09:08:16PM +0100, Peter Alexander wrote:
> On Friday, 14 December 2012 at 19:24:39 UTC, Rob T wrote:
[...]
> >It may be that the GC concept works far better in theory than in
> >practice, although due to the performance penalty work-a-rounds,
> >you may end up writing better performing apps because of it,
> >however that's NOT the intention of having a GC!
> 
> When it comes to performance, there is always a compromise with
> usability. Even malloc performs poorly compared to more manual
> memory management. Even automatic register allocation by the
> compiler can lead to poor performance.
> 
> The only question is where you want to draw the line between
> usability and performance.

Yeah. If you want to squeeze out every last drop of juice your CPU's got
to offer you, you could code directly in assembler, and no optimizing
compiler, GC or no GC, will be able to beat that.

But people stopped writing entire apps in assembler a long time ago. :-)

(I actually did that once, many years ago, for a real app that actually
made a sale or two. It was a good learning experience, and helped me
improve my coding skills just from knowing how the machine actually
works under the hood, as well as learning why it's so important to write
code in a well-structured way -- you have no choice when doing
large-scale coding in assembler, 'cos otherwise your assembly code
quickly devolves into a spaghetti paste soup that no human can possibly
comprehend. So I'd say it was a profitable, even rewarding experience.
But I wouldn't do it again today, given the choice.)


T

-- 
Ruby is essentially Perl minus Wall.
December 14, 2012
Re: Significant GC performance penalty
On Friday, 14 December 2012 at 20:33:33 UTC, H. S. Teoh wrote:
>
> (I actually did that once, many years ago, for a real app that
> actually made a sale or two. It was a good learning experience, and
> helped me improve my coding skills just from knowing how the machine
> actually works under the hood, as well as learning why it's so
> important to write code in a well-structured way -- you have no
> choice when doing large-scale coding in assembler, 'cos otherwise
> your assembly code quickly devolves into a spaghetti paste soup that
> no human can possibly comprehend. So I'd say it was a profitable,
> even rewarding experience. But I wouldn't do it again today, given
> the choice.)
>
> T

Yeah, I did that too, long ago, and I'm happy to have learned 
the skills, because it's the ultimate coding experience 
imaginable. If you don't do it very carefully, it goes all to 
hell, just like you say. Best to let the machines do it these 
days; even if I could do it 10x better, it would take me 
hundreds of years to do what I can do now in a day.

Everyone, thanks for the responses. I got some great ideas to 
try out. I think at the end of the day, my code will perform 
better than my old C++ version, simply because I will be 
considering the costs of memory allocations, which is something 
I never thought about much before. I guess that's a positive 
side effect of the GC's negative side effect. I agree, as many 
of you have commented, that having a GC is a trade-off: positive 
in some ways, but not all. Optimize only where you need to, and 
let the GC deal with the rest.

--rt
December 14, 2012
Re: Significant GC performance penalty
On Friday, 14 December 2012 at 20:33:33 UTC, H. S. Teoh wrote:
> On Fri, Dec 14, 2012 at 09:08:16PM +0100, Peter Alexander wrote:
> [...]
>> The only question is where you want to draw the line between
>> usability and performance.
>
> Yeah. If you want to squeeze out every last drop of juice your CPU's
> got to offer you, you could code directly in assembler, and no
> optimizing compiler, GC or no GC, will be able to beat that.

I think it depends on what you're trying to achieve.

If you are coding for resource-constrained processors, or taking 
advantage of special SIMD instructions, then I agree.

On the other hand, if you're targeting processors with multiple 
execution units, instruction re-ordering, multiple cache levels, 
NUMA, and so on, then it is a whole other game trying to beat 
the compiler. And when you win, it will be for one specific 
combination of processor, motherboard, and memory.

Usually the compiler is way better at keeping track of all 
possible instruction combinations for certain scenarios.

Well, this is just my opinion with my compiler-design aficionado 
hat on; some guys here might prove me wrong.

--
Paulo

--
Paulo