Thread overview
Can D's Garbage Collector be Very Fast?
May 08, 2019
Vijay Nayar
May 08, 2019
Nicholas Wilson
May 08, 2019
Vijay Nayar
May 08, 2019
sarn
May 09, 2019
Per Nordlöw
May 09, 2019
sarn
May 08, 2019
Alex
May 10, 2019
user1234
May 08, 2019
This is a very interesting read that talks about Go's evolution of their garbage collector.  They have benchmarks showing a "stop the world" pause of between 5 and 10 milliseconds consistently.

https://blog.golang.org/ismmkeynote

> So what about write barriers? The write barrier is on only during the GC. At other times the compiled code loads a global variable and looks at it. Since the GC was typically off the hardware correctly speculates to branch around the write barrier. When we are inside the GC that variable is different, and the write barrier is responsible for ensuring that no reachable objects get lost during the tri-color operations.

There are obviously cases where even such a pause is not permitted, but these are the cases where custom allocators will likely be written in any language. Given that the write-barrier is only on when the Garbage Collector is being run, could D also do something similar to have a very fast garbage collector, but also allow even that small performance penalty of a write barrier during GC collections to be avoided entirely while using @nogc?
May 08, 2019
On Wednesday, 8 May 2019 at 10:27:43 UTC, Vijay Nayar wrote:
> This is a very interesting read that talks about Go's evolution of their garbage collector.  They have benchmarks showing a "stop the world" pause of between 5 and 10 milliseconds consistently.
>
> https://blog.golang.org/ismmkeynote
>
>> So what about write barriers? The write barrier is on only during the GC. At other times the compiled code loads a global variable and looks at it. Since the GC was typically off the hardware correctly speculates to branch around the write barrier. When we are inside the GC that variable is different, and the write barrier is responsible for ensuring that no reachable objects get lost during the tri-color operations.
>
> There are obviously cases where even such a pause is not permitted, but these are the cases where custom allocators will likely be written in any language. Given that the write-barrier is only on when the Garbage Collector is being run, could D also do something similar to have a very fast garbage collector, but also allow even that small performance penalty of a write barrier during GC collections to be avoided entirely while using @nogc?

Thanks to Rainers we now have parallel mark phase of the mark & sweep.
May 08, 2019
On Wednesday, 8 May 2019 at 10:41:54 UTC, Nicholas Wilson wrote:
> On Wednesday, 8 May 2019 at 10:27:43 UTC, Vijay Nayar wrote:
>
> Thanks to Rainers we now have parallel mark phase of the mark & sweep.

That sounds great! Do any benchmarks exist to give an approximate idea of how long D GC events take?  Is it on the order of seconds, hundreds of milliseconds, tens of milliseconds, or less?


May 08, 2019
On Wednesday, 8 May 2019 at 10:27:43 UTC, Vijay Nayar wrote:
> This is a very interesting read that talks about Go's evolution of their garbage collector.  They have benchmarks showing a "stop the world" pause of between 5 and 10 milliseconds consistently.
>
> https://blog.golang.org/ismmkeynote
>
>> So what about write barriers? The write barrier is on only during the GC. At other times the compiled code loads a global variable and looks at it. Since the GC was typically off the hardware correctly speculates to branch around the write barrier. When we are inside the GC that variable is different, and the write barrier is responsible for ensuring that no reachable objects get lost during the tri-color operations.
>
> There are obviously cases where even such a pause is not permitted, but these are the cases where custom allocators will likely be written in any language. Given that the write-barrier is only on when the Garbage Collector is being run, could D also do something similar to have a very fast garbage collector, but also allow even that small performance penalty of a write barrier during GC collections to be avoided entirely while using @nogc?

D's GC is based on ancient ideas, it could be improved... it is not if but who. It takes work and no one is currently willing to put in the necessary work to make it happen.

D's GC could, for example, use a parallel thread which determines orphans and then release them. One does not have to stop the world if it knows a memory block is an orphan.

That is, if no pointers access a memory block then one knows absolutely it is orphaned(within proper program design). Reference counting is effectively tracking this.

The idea is to offload much of the work as possible to a parallel process that is provably safe and leave the work that requires stop the world to that process.

That is just one of many things that can be done. One of the problems is that D does not maintain contextual information that it could derive from the program structure itself to also reduce these problems. For example, not all memory is allocated as equal but treating it as such forces one in to the same algorithm to release it.

There has been some work on D's GC but the GC has not really been a focal point.

One can use custom memory management when needed and deal along with other patterns and it ideally reduces the GC dependence.

The worse thing about D's GC is it's implicit use in some operations such as AA's. It makes it much more difficult to reason about what really is going on in a program and where and forces one to use the GC even if it truly is not necessary.


May 08, 2019
On Wednesday, 8 May 2019 at 11:48:43 UTC, Vijay Nayar wrote:
> That sounds great! Do any benchmarks exist to give an approximate idea of how long D GC events take?  Is it on the order of seconds, hundreds of milliseconds, tens of milliseconds, or less?

I just posted this a couple of weeks ago:
https://theartofmachinery.com/2019/04/26/bpftrace_d_gc.html

It was <0.5ms typical, ~1ms max for collections on a dscanner workload (8GB RAM laptop), but maybe you want to do your own benchmarks.  Even if you can't use bpftrace to get the full timing histogram, the built-in GC profiler can easily give you the max pause time for a D program.  (Read the full post for details.)
May 09, 2019
On Wednesday, 8 May 2019 at 22:43:54 UTC, sarn wrote:
> It was <0.5ms typical, ~1ms max for collections on a dscanner workload (8GB RAM laptop), but maybe you want to do your own benchmarks.  Even if you can't use bpftrace to get the full timing histogram, the built-in GC profiler can easily give you the max pause time for a D program.  (Read the full post for details.)

Where these tests run using DMD or LDC?

It would be very interesting to rerun these tests with DMD 2.087 (or current master), having a GC with a parallel marking phase, when it has been released.
May 09, 2019
On Thursday, 9 May 2019 at 08:20:18 UTC, Per Nordlöw wrote:
> On Wednesday, 8 May 2019 at 22:43:54 UTC, sarn wrote:
>> It was <0.5ms typical, ~1ms max for collections on a dscanner workload (8GB RAM laptop), but maybe you want to do your own benchmarks.  Even if you can't use bpftrace to get the full timing histogram, the built-in GC profiler can easily give you the max pause time for a D program.  (Read the full post for details.)
>
> Where these tests run using DMD or LDC?

dscanner was built with DMD (version 2.081, I think).

> It would be very interesting to rerun these tests with DMD 2.087 (or current master), having a GC with a parallel marking phase, when it has been released.

I might do more profiling one day.  Remember: the built-in GC profiler requires no setup at all, so if you're interested enough to do a quick comparison of worst-case timings, it's very easy to just do it.
May 10, 2019
On Wednesday, 8 May 2019 at 10:27:43 UTC, Vijay Nayar wrote:
> This is a very interesting read that talks about Go's evolution of their garbage collector.  They have benchmarks showing a "stop the world" pause of between 5 and 10 milliseconds consistently.
>
> https://blog.golang.org/ismmkeynote
>
>> So what about write barriers? The write barrier is on only during the GC. At other times the compiled code loads a global variable and looks at it. Since the GC was typically off the hardware correctly speculates to branch around the write barrier. When we are inside the GC that variable is different, and the write barrier is responsible for ensuring that no reachable objects get lost during the tri-color operations.
>
> There are obviously cases where even such a pause is not permitted, but these are the cases where custom allocators will likely be written in any language. Given that the write-barrier is only on when the Garbage Collector is being run, could D also do something similar to have a very fast garbage collector, but also allow even that small performance penalty of a write barrier during GC collections to be avoided entirely while using @nogc?

I'm not sure if I understand your concept of "write barrier". During collection it's not just a write barrier, it's a complete world that's stopped and the only thing that can run is a of low level thread written in a C style and allocating in a C style. Everything else would crash the program or not would not work at all.