Significant GC performance penalty
December 14, 2012
I created a D library wrapper for sqlite3 that uses a dynamically constructed result list for returned records from a SELECT statement. It works in a similar way to a C++ version that I wrote a while back.

The D code is genuinely D code, not a clone of my earlier C++ code, so it makes use of many of D's features, one of them being the garbage collector.

When running comparison tests between the C++ version and the D version, both compiled with performance optimization flags, the C++ version ran 3x faster than the D version, which was very unexpected. If anything, I was hoping for a performance boost out of D, or at least the same performance level.

I remembered reading about people having performance problems with the GC, so I tried a quick fix: disabling the GC before the SELECT is run and re-enabling it afterwards. The result was a 3x performance boost, making the DMD-compiled version run almost as fast as the C++ version. The DMD-compiled version is now only 2 seconds slower on my stress-test runs of a SELECT that returns 200,000+ records with 14 fields. Not too bad! I may get identical performance if I compile using gdc, but that will have to wait until it is updated to 2.061.

Fixing this was a major relief since the code is expected to be used in a commercial setting. I'm wondering, though, why the GC causes such a large penalty, and what negative effects, if any, there will be from disabling the GC temporarily. I know that memory won't be reclaimed until the GC is re-enabled, but is there anything else to worry about?
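
For reference, the quick fix looks roughly like this (Database and runSelect here are hypothetical stand-ins for my actual wrapper types, not its real API):

import core.memory : GC;

auto fetchAll(Database db, string sql)
{
    GC.disable();              // suspend collections while rows are built
    scope (exit) GC.enable();  // re-enabled even if an exception is thrown
    return db.runSelect(sql);  // allocations still happen, just no sweeps
}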

I feel it's worth commenting on my experience as feedback for the D developers and anyone else starting off with D.

Coming from C++ I *really* did not like having the GC; it made me very nervous. But now that I'm used to having it, I've come to like it, up to a point. It really does change the way you think and code. However, as I've discovered, you still always have to be thinking about memory management issues, because the GC can impose a huge performance penalty in certain situations. I also NEED to know that I can always go full manual where necessary. There's no way I would want to give up that kind of control.

The trade-off with having a GC seems to be that, by default, C++ apps will perform considerably faster than equivalent D apps out of the box, simply because manual memory management is fine-tuned by the programmer as development proceeds. With D, when you simply let the GC take care of business, you are not necessarily fine-tuning as you go along, and if you don't take the resulting performance hit into consideration, your apps will likely perform poorly compared to a C++ equivalent. However, building the equivalent app in D is a much more pleasant experience in terms of programming productivity. The code is simpler to deal with, and there's less to worry about with pointers and other memory management issues.

What I have not yet had the opportunity to explore is using D in full manual memory management mode. My understanding is that if I take that route, I cannot use certain parts of the standard library and will also lose a few of the nice features of D that make it fun to work with. I'm not fully clear on what to expect, though, so if there's any detailed information to look at, it would be a big help.

I wonder what can be done to allow a programmer to go fully manual, while not losing any of the nice features of D?

Also, I think everyone agrees we really need a better GC, and I wonder once we do get a better GC, what kind of overall improvements we can expect to see?

Thanks for listening.

--rt
December 14, 2012
Allocating memory is simply slow. The same is true in C++ where you will see performance hits if you allocate memory too often. The GC makes things worse, but if you really care about performance then you'll avoid allocating memory so often.

Try to pre-allocate as much as possible, and use the stack instead of the heap where possible. Fixed size arrays and structs are your friend.
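
For example, something along these lines never touches the GC heap for the rows themselves (Row and process are just made-up names for the sketch):

struct Row
{
    int id;          // struct fields live inline, no per-field allocation
    double value;
}

void process()
{
    Row[64] buffer;  // fixed-size array: lives on the stack
    foreach (i, ref r; buffer)
    {
        r.id = cast(int) i;
        r.value = i * 0.5;
    }

    int[] dynamic = new int[](64);  // by contrast, this allocates on the GC heap
}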

I avoid using the GC when using D and I feel like I still have a lot of freedom of expression, but maybe I'm just used to it.
December 14, 2012
Rob T:

> I wonder what can be done to allow a programmer to go fully manual, while not losing any of the nice features of D?

Even the Rust language, which has a more powerful type system than D, with region analysis and more, sometimes needs localized reference counting (or a localized per-thread GC) to allow the usage of its full features. So I don't think you can have all the nice features of D without its GC.

I believe the D design has bet too much on its (imprecise) GC. Now the design of Phobos & D needs to show more love for stack allocation (see variable-length arrays, array literals, etc.), for alternative allocators like reaps (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.7.6505 ), and so on. Someone implemented a stack-like data manager for D, but the voting didn't allow it into Phobos.

Bye,
bearophile
December 14, 2012
On Friday, 14 December 2012 at 18:46:52 UTC, Peter Alexander wrote:
> Allocating memory is simply slow. The same is true in C++ where you will see performance hits if you allocate memory too often. The GC makes things worse, but if you really care about performance then you'll avoid allocating memory so often.
>
> Try to pre-allocate as much as possible, and use the stack instead of the heap where possible. Fixed size arrays and structs are your friend.

In my situation, I can think of some ways to mitigate the memory allocation problem, but it's a bit tricky when SELECT results have to be dynamically generated, since the number of rows returned and the size and type of the rows always differ depending on the query and the data stored in the database. It's just not practical to custom-fit each SELECT to a pre-allocated array or list; it would be far too much manual effort.

I could consider generating a free list of pre-allocated record components that is re-used rather than destroyed and reallocated. However, knowing how many records to pre-allocate is tricky: I could run out, or waste tons of RAM for nothing most of the time. At the end of the day, I may be better off digging into the GC source code itself and looking for solutions.
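
If I went that route, the free list would look something like this (Record and the 14-field layout are just stand-ins for my real result types):

class Record
{
    string[14] fields;
    Record next;  // intrusive link for the free list
}

struct RecordPool
{
    private Record freeList;

    // Reuse a recycled Record if one is available; allocate only on a miss.
    Record acquire()
    {
        if (freeList is null)
            return new Record;
        auto r = freeList;
        freeList = r.next;
        r.next = null;
        return r;
    }

    // Push a finished Record back onto the list instead of dropping it.
    void release(Record r)
    {
        r.next = freeList;
        freeList = r;
    }
}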

> I avoid using the GC when using D and I feel like I still have a lot of freedom of expression, but maybe I'm just used to it.

I'd like to do that too and wish I had your experience with how you are going about it. My fear is that I'll end up allocating without knowing and having my apps silently eat up memory over time.

At the end of the day, there's just no point in having a GC if you don't want to use it, so the big question is whether a GC can be made to work much better than what we have. Supposedly yes, but will the improvements really matter? I somehow doubt it will.

When I look at GC-based apps, what they all seem to have in common is that they tend to eat up vast amounts of RAM for nothing and perform poorly. I'm speaking mostly about Java apps; they are terrible with performance and memory footprint in general. But C++ apps that use a built-in GC tend to have similar issues.

It may be that the GC concept works far better in theory than in practice, although due to the performance-penalty workarounds you may end up writing better-performing apps because of it; however, that's NOT the intention of having a GC!

--rt
December 14, 2012
On Friday, 14 December 2012 at 18:27:29 UTC, Rob T wrote:
> [...]

I have lots of experience with GC-enabled languages, even for systems programming (Oberon & Active Oberon).

I think there are a few issues to consider:

- D's GC still has a lot of room for improvement, so some of the issues you have found might eventually be fixed;

- Having GC support does not mean calling new like crazy; one still needs to think about how to code in a GC-friendly way;

- Make proper use of weak references, where available;

- GC-enabled language runtimes usually offer ways to peek into the runtime and allow the developer to understand how the GC is working and what might be improved.

The main benefit of having a GC is a safer way to manage memory across multiple modules, especially when ownership is not clear.

Even in C++ I seldom do manual memory management nowadays, at least when working on new codebases. Of course, others will have a different experience.

Other than that, thanks for sharing your experience.

--
Paulo
December 14, 2012
On Friday, 14 December 2012 at 19:24:39 UTC, Rob T wrote:
> In my situation, I can think of some ways to mitigate the memory allocation problem, but it's a bit tricky when SELECT results have to be dynamically generated, since the number of rows returned and the size and type of the rows always differ depending on the query and the data stored in the database. It's just not practical to custom-fit each SELECT to a pre-allocated array or list; it would be far too much manual effort.

Maybe I have misunderstood, but it sounds to me like you could
get away with a single allocation there. Just reducing the number
of allocations will improve things a lot.
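
Something like this, assuming the row count can be found up
front (e.g. via a COUNT(*) query) and the field count via
sqlite3_column_count; allocateResult is just a name for the
sketch:

string[] allocateResult(size_t rowCount, size_t fieldCount)
{
    // one flat GC allocation holds every field of every row;
    // row i, field j lives at data[i * fieldCount + j]
    auto data = new string[](rowCount * fieldCount);
    return data;
}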

>> I avoid using the GC when using D and I feel like I still have a lot of freedom of expression, but maybe I'm just used to it.
>
> I'd like to do that too and wish I had your experience with how you are going about it. My fear is that I'll end up allocating without knowing and having my apps silently eat up memory over time.

This shouldn't be a problem. I occasionally recompile druntime
with a printf inside the allocation function just to make sure,
but normally I can tell if memory allocations are going on
because of the sudden GC pauses.

> At the end of the day, there's just no point in having a GC if you don't want to use it, so the big question is whether a GC can be made to work much better than what we have. Supposedly yes, but will the improvements really matter? I somehow doubt it will.

D's GC has a lot of headroom for improvement. A generational GC
will likely improve things a lot.

> When I look at GC-based apps, what they all seem to have in common is that they tend to eat up vast amounts of RAM for nothing and perform poorly. I'm speaking mostly about Java apps; they are terrible with performance and memory footprint in general. But C++ apps that use a built-in GC tend to have similar issues.

The problem with Java is not just the GC. Java eats up
huge amounts of memory because it has no value types, so
*everything* has to be allocated on the heap, and every
object has 16 bytes of overhead (on 64-bit systems) in
addition to memory manager overhead.

This is a great presentation on the subject of Java memory
efficiency:
http://www.cs.virginia.edu/kim/publicity/pldi09tutorials/memory-efficient-java-tutorial.pdf

> It may be that the GC concept works far better in theory than in practice, although due to the performance-penalty workarounds you may end up writing better-performing apps because of it; however, that's NOT the intention of having a GC!

When it comes to performance, there is always a compromise with
usability. Even malloc performs poorly compared to more manual
memory management. Even automatic register allocation by the
compiler can lead to poor performance.

The only question is where you want to draw the line between
usability and performance.
December 14, 2012
On Fri, Dec 14, 2012 at 08:27:46PM +0100, Paulo Pinto wrote: [...]
> - Having GC support does not mean calling new like crazy; one still needs to think about how to code in a GC-friendly way;

It makes me think, though, that perhaps there is some way of optimizing the GC for recursive data structures where you only ever keep a reference to the head node, so that they can be managed much more efficiently than a structure where there may be an arbitrary number of references to anything inside. I think this is a pretty common case, at least in the kind of code I encounter frequently.

Also, coming from C/C++, I have to say that my coding style has been honed over the years to think in terms of single-ownership structures, so even when coding in D I tend to write code that way. However, having the GC available means that there are some cases where using multiple references to stuff will actually improve GC (and overall) performance by eliminating the need to deep-copy stuff everywhere.


> - GC enabled languages runtimes usually offer ways to peak into the runtime, somehow, and allow the developer to understand how GC is working and what might be improved;
[...]

Yeah, I think for most applications, it's probably good enough to use the functions in core.memory (esp. enable, disable, collect, and minimize) to exercise some control over the GC so that you can use manual memory management in the important hotspots, and just let the GC do its thing in less important parts of the program. I think core.memory.minimize will solve the OP's concern about GC'd apps having bad memory footprints.
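
Something like this sketch, using only what core.memory already provides (hotspot is just a placeholder name):

import core.memory : GC;

void hotspot()
{
    GC.disable();
    scope (exit)
    {
        GC.enable();
        GC.collect();   // reclaim the garbage built up during the hot section
        GC.minimize();  // release free pages back to the OS, shrinking footprint
    }
    // ... allocation-heavy work runs here without collection pauses ...
}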


T

-- 
Computers aren't intelligent; they only think they are.
December 14, 2012
On Fri, Dec 14, 2012 at 09:08:16PM +0100, Peter Alexander wrote:
> On Friday, 14 December 2012 at 19:24:39 UTC, Rob T wrote:
[...]
> >It may be that the GC concept works far better in theory than in practice, although due to the performance-penalty workarounds you may end up writing better-performing apps because of it; however, that's NOT the intention of having a GC!
> 
> When it comes to performance, there is always a compromise with usability. Even malloc performs poorly compared to more manual memory management. Even automatic register allocation by the compiler can lead to poor performance.
> 
> The only question is where you want to draw the line between usability and performance.

Yeah. If you want to squeeze out every last drop of juice your CPU's got to offer you, you could code directly in assembler, and no optimizing compiler, GC or no GC, will be able to beat that.

But people stopped writing entire apps in assembler a long time ago. :-)

(I actually did that once, many years ago, for a real app that actually made a sale or two. It was a good learning experience, and helped me improve my coding skills just from knowing how the machine actually works under the hood, as well as learning why it's so important to write code in a well-structured way -- you have no choice when doing large-scale coding in assembler, 'cos otherwise your assembly code quickly devolves into a spaghetti paste soup that no human can possibly comprehend. So I'd say it was a profitable, even rewarding experience. But I wouldn't do it again today, given the choice.)


T

-- 
Ruby is essentially Perl minus Wall.
December 14, 2012
On Friday, 14 December 2012 at 20:33:33 UTC, H. S. Teoh wrote:
>
> (I actually did that once, many years ago, for a real app that actually
> made a sale or two. It was a good learning experience, and helped me
> improve my coding skills just from knowing how the machine actually
> works under the hood, as well as learning why it's so important to write
> code in a well-structured way -- you have no choice when doing
> large-scale coding in assembler, 'cos otherwise your assembly code
> quickly devolves into a spaghetti paste soup that no human can possibly
> comprehend. So I'd say it was a profitable, even rewarding experience.
> But I wouldn't do it again today, given the choice.)
>
>
> T

Yeah, I did that too, long ago, and I'm happy to have learned the skills because it's the ultimate coding experience imaginable. If you don't do it very carefully, it all goes to hell just like you say. Best to let the machines do it these days; even if I could do it 10x better, it would take me hundreds of years to do what I can now do in a day.

Everyone, thanks for the responses. I've already got some great ideas to try out. I think at the end of the day, my code will perform better than my old C++ version simply because I will be considering the costs of memory allocations, which is something I never thought about much before. I guess that's the positive side effect of the negative side effect of using a GC. As many of you have commented, having a GC is a trade-off, positive in some ways but not all. Optimize only where you need to, and let the GC deal with the rest.

--rt

December 14, 2012
On Friday, 14 December 2012 at 20:33:33 UTC, H. S. Teoh wrote:
> [...]
>
> Yeah. If you want to squeeze out every last drop of juice your CPU's got
> to offer you, you could code directly in assembler, and no optimizing
> compiler, GC or no GC, will be able to beat that.
>

I think it depends on what you're trying to achieve.

If you're coding for resource-constrained processors, or taking advantage
of special SIMD instructions, then I agree.

On the other hand, if you're targeting processors with multiple execution
units, instruction reordering, multiple cache levels, NUMA, ..., then it is
a whole different game trying to beat the compiler. And when you win, it
will be for one specific processor + motherboard + memory combination.

Usually the compiler is way better at keeping track of all possible
instruction combinations for certain scenarios.

Well, this is just my opinion with my compiler-design-aficionado hat on; some guys here might prove me wrong.

--
Paulo