Thread overview
GC performance: collection frequency
Sep 14, 2015
H. S. Teoh
Sep 14, 2015
Adam D. Ruppe
Sep 14, 2015
Jonathan M Davis
Sep 14, 2015
Jonathan M Davis
Sep 17, 2015
Dmitry Olshansky
September 14, 2015
Over in the d.learn forum, somebody posted a question about poor
performance in a text-parsing program. After a bit of profiling I
discovered that reducing GC collection frequency (i.e., GC.disable()
then manually call GC.collect() at some interval) improved program
performance by about 20%.
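
For concreteness, a minimal sketch of this workaround using only the existing core.memory API (the 10_000-iteration interval and the parseLine helper are placeholders, not the actual program):

import core.memory : GC;

void processFile(string[] lines)
{
    GC.disable();               // suspend automatic collections
    scope(exit) GC.enable();    // restore normal behavior when done

    foreach (i, line; lines)
    {
        parseLine(line);        // allocation-heavy work (placeholder)

        // Collect on our own schedule instead of the GC's.
        if (i % 10_000 == 0)
            GC.collect();
    }
}

void parseLine(string line)
{
    // ... allocation-heavy parsing here ...
}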

This isn't the first time I encountered this.  Some time ago (late last year IIRC) I found that in one of my own CPU-intensive programs, manually scheduling GC collection cycles won me about 30-40% performance improvement.

While two data points is hardly statistically significant, these two do seem to suggest that perhaps part of the GC's perceived poor performance may stem from an overly-zealous collection schedule.

Since asking users to implement their own GC collection schedule can be a bit onerous (not to mention greatly uglifying user code), would it be a good idea to make the GC collection schedule configurable?  At least that way, people can just call GC.collectSchedule(/*some value*/) as a first stab at improving overall performance, without needing to rewrite a whole bunch of code to avoid the GC, or go all-out @nogc.

We could also reduce the default collection frequency, of course, but lacking sufficient data I wouldn't know what value to set it to.


T

-- 
Computers shouldn't beep through the keyhole.
September 14, 2015
On Monday, 14 September 2015 at 18:51:36 UTC, H. S. Teoh wrote:
> We could also reduce the default collection frequency, of course, but lacking sufficient data I wouldn't know what value to set it to.

Definitely. I think it hits a case where the heap is right at the edge of the collection threshold and you are allocating a small amount.

Say the limit is 1,000 bytes. You are at 980 and ask it to allocate 30. So it runs a collection cycle, frees the 30 bytes from the previous loop iteration, then allocates them again... so for the whole loop the heap stays right at that edge and collections run very often.

Of course, the GC has to scan everything to ensure it is safe to free those 30 bytes, so its cost is way out of proportion to the memory it reclaims.

Maybe we can make the GC detect this somehow and bump up the size. I don't actually know the implementation that well though.
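
To illustrate the pattern, a sketch of the kind of loop that hits this edge (the names are made up; the point is the small per-iteration garbage):

void hotLoop(string[] inputs)
{
    foreach (line; inputs)
    {
        // A small temporary each iteration. With the pool sitting just
        // under its limit, this allocation can trigger a full collection
        // that scans the entire heap only to reclaim the previous
        // iteration's temporary, and the next iteration does it again.
        auto tmp = new ubyte[](30);

        // ... use tmp, then drop it on the floor ...
    }
}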
September 14, 2015
On Monday, 14 September 2015 at 18:58:45 UTC, Adam D. Ruppe wrote:
> On Monday, 14 September 2015 at 18:51:36 UTC, H. S. Teoh wrote:
>> We could also reduce the default collection frequency, of course, but lacking sufficient data I wouldn't know what value to set it to.
>
> Definitely. I think it hits a case where the heap is right at the edge of the collection threshold and you are allocating a small amount.
>
> Say the limit is 1,000 bytes. You are at 980 and ask it to allocate 30. So it runs a collection cycle, frees the 30 bytes from the previous loop iteration, then allocates them again... so for the whole loop the heap stays right at that edge and collections run very often.
>
> Of course, the GC has to scan everything to ensure it is safe to free those 30 bytes, so its cost is way out of proportion to the memory it reclaims.
>
> Maybe we can make the GC detect this somehow and bump up the size. I don't actually know the implementation that well though.

My first inclination would be to make it just allocate more memory and not run a collection if the last collection was too recent, but there are bound to be papers and studies on this sort of thing already. And the exact strategy to use likely depends heavily on the type of GC. For example, if our GC were updated to be concurrent like we've talked about for a while now, then triggering a concurrent collection at 80% heap usage could keep the program from actually running out of memory while still not slowing it down much (just long enough to fork for the concurrent collection), whereas if we don't have a concurrent GC (like now), then triggering at 80% would just make things worse.
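
A rough sketch of that first inclination (the struct, its name, and the 100 ms threshold are purely illustrative, not druntime code):

import core.time : MonoTime, dur;

struct CollectionPolicy
{
    MonoTime lastCollection;
    enum minInterval = dur!"msecs"(100);   // arbitrary illustrative threshold

    // Called when an allocation would normally trigger a collection.
    // Returns false if the last collection was too recent, in which case
    // the allocator should just grow the heap instead of collecting.
    bool shouldCollect()
    {
        const now = MonoTime.currTime;
        if (now - lastCollection < minInterval)
            return false;
        lastCollection = now;
        return true;
    }
}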

- Jonathan M Davis
September 14, 2015
On Monday, 14 September 2015 at 18:51:36 UTC, H. S. Teoh wrote:
> Over in the d.learn forum, somebody posted a question about poor
> performance in a text-parsing program. After a bit of profiling I
> discovered that reducing GC collection frequency (i.e., GC.disable()
> then manually call GC.collect() at some interval) improved program
> performance by about 20%.
>
> This isn't the first time I encountered this.  Some time ago (late last year IIRC) I found that in one of my own CPU-intensive programs, manually scheduling GC collection cycles won me about 30-40% performance improvement.
>
> While two data points is hardly statistically significant, these two do seem to suggest that perhaps part of the GC's perceived poor performance may stem from an overly-zealous collection schedule.
>
> Since asking users to implement their own GC collection schedule can be a bit onerous (not to mention greatly uglifying user code), would it be a good idea to make the GC collection schedule configurable?  At least that way, people can just call GC.collectSchedule(/*some value*/) as a first stab at improving overall performance, without needing to rewrite a whole bunch of code to avoid the GC, or go all-out @nogc.
>
> We could also reduce the default collection frequency, of course, but lacking sufficient data I wouldn't know what value to set it to.

Isn't there some amount of configuration that can currently be done via environment variables? Or was that just something that someone had done in one of the GC-related dconf talks that never made it into druntime proper? It definitely seemed like a good idea in any case.

- Jonathan M Davis
September 17, 2015
On 14-Sep-2015 21:47, H. S. Teoh via Digitalmars-d wrote:
> Over in the d.learn forum, somebody posted a question about poor
> performance in a text-parsing program. After a bit of profiling I
> discovered that reducing GC collection frequency (i.e., GC.disable()
> then manually call GC.collect() at some interval) improved program
> performance by about 20%.
>

One thing that any remotely production-quality GC does is analyze the result of a collection against a minimum headroom target of X% (typically 30-50%). If the collection freed only Y% of the heap, where Y < X, then the GC should extend the heap so that free space reaches the X% mark in the extended heap.
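
A sketch of that heuristic with a 30% headroom target (the function and the numbers are illustrative only). For example, with a 100 MB heap where a collection freed only 10 MB, the 90 MB of live data should end up occupying 70% of the new heap, so the heap grows to roughly 129 MB:

// Illustrative only; not druntime's actual policy.
size_t targetHeapSize(size_t heapSize, size_t freedBytes,
                      double headroom = 0.30)
{
    const freedFraction = cast(double) freedBytes / heapSize;
    if (freedFraction >= headroom)
        return heapSize;    // enough was reclaimed; keep the current size

    // Grow so that the live data occupies only (1 - headroom) of the new
    // heap, leaving the desired headroom free after the collection.
    const liveBytes = heapSize - freedBytes;
    return cast(size_t)(liveBytes / (1.0 - headroom));
}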


-- 
Dmitry Olshansky