Thread overview
Eclipse OMR project provides a reusable Garbage Collector
Dec 03, 2016
Dibyendu Majumdar
Dec 03, 2016
Dibyendu Majumdar
Dec 03, 2016
Chris Wright
Dec 03, 2016
Dibyendu Majumdar
Dec 03, 2016
Dibyendu Majumdar
Dec 03, 2016
Chris Wright
Dec 03, 2016
Chris Wright
Dec 04, 2016
Chris Wright
Dec 03, 2016
Dibyendu Majumdar
December 03, 2016
Hi

Not sure if I am repeating information already known here - if so I do apologise!

IBM has open sourced the J9 Garbage Collector as part of Eclipse OMR project. Perhaps D could use this. It is claimed that the GC implementation is relatively straight forward to reuse.

https://github.com/eclipse/omr

Regards
Dibyendu
December 03, 2016
On Saturday, 3 December 2016 at 02:23:13 UTC, Dibyendu Majumdar wrote:
>
> IBM has open sourced the J9 Garbage Collector as part of Eclipse OMR project. Perhaps D could use this. It is claimed that the GC implementation is relatively straight forward to reuse.
>
> https://github.com/eclipse/omr
>

The documentation is a bit sparse but here is some additional info:

http://www.slideshare.net/craiglehmann/the-omr-gc-talk-ruby-kaigi-2015



December 03, 2016
On Sat, 03 Dec 2016 02:23:13 +0000, Dibyendu Majumdar wrote:
> IBM has open sourced the J9 Garbage Collector as part of Eclipse OMR project. Perhaps D could use this. It is claimed that the GC implementation is relatively straight forward to reuse.

That's awesome!

Unfortunately, it looks like incorporating it would take a fair bit of effort, and upstreaming it will not happen. I'm going to take a deeper look this weekend, but it's not looking good so far.

Druntime is under the Boost public license and will remain under it. OMR is dual licensed Apache2 / Eclipse Public License.

Druntime's GC system is pluggable, but the GC instance is set in a private variable in the `gc.proxy` module. You need to modify the runtime to add another GC implementation -- it's not something an application can just insert, and it's not something you can override just by linking in another library. (As far as I can determine. Corrections welcome.) And that makes the licensing problem much harder to work around.

That said, let's look at the API:

omrobjectptr_t OMR_GC_Allocate(OMR_VMThread * omrVMThread, uintptr_t allocationCategory, uintptr_t size, uintptr_t allocateFlags);

omrobjectptr_t OMR_GC_AllocateNoGC(OMR_VMThread * omrVMThread, uintptr_t allocationCategory, uintptr_t size, uintptr_t allocateFlagss);

omr_error_t OMR_GC_SystemCollect(OMR_VMThread* omrVMThread, uint32_t gcCode);

First parameter is a pointer to an OMR thread. I'd have to rewrite core.thread to use OMR's thread system, or maybe I could fake OMR threads with enough effort. That's another barrier.

The `allocationCategory` element is basically an object that describes the object model and lets OMR access type information. Looks like it expects us to be able to get the size of an object from a pointer to it. D's runtime *does* let you get this information for heap-allocated objects, but for arrays and heap-allocated structs, it does so by accessing the GC's internal data structures...yeah.

In D, you can malloc memory or memory map a file and put pointers in there. You manage this with the runtime by calling `GC.addRoot`. There doesn't appear to be an equivalent in OMR's GC. (Hard to tell, since it clocks in at 60KLOC.)

I *think* the AllocateNoGC function allocates but guarantees it won't collect memory. Not entirely certain. D's current runtime rather requires that.
December 03, 2016
On Saturday, 3 December 2016 at 06:26:22 UTC, Chris Wright wrote:
> On Sat, 03 Dec 2016 02:23:13 +0000, Dibyendu Majumdar wrote:
>> IBM has open sourced the J9 Garbage Collector as part of Eclipse OMR project. Perhaps D could use this. It is claimed that the GC implementation is relatively straight forward to reuse.
>
> That's awesome!
>
> Unfortunately, it looks like incorporating it would take a fair bit of effort, and upstreaming it will not happen. I'm going to take a deeper look this weekend, but it's not looking good so far.
>

I would imagine that some effort is necessary ... but the payoff is huge isn't it ? To have a robust GC would change the game for D.

> That said, let's look at the API:
>
> omrobjectptr_t OMR_GC_Allocate(OMR_VMThread * omrVMThread, uintptr_t allocationCategory, uintptr_t size, uintptr_t allocateFlags);
>
> omrobjectptr_t OMR_GC_AllocateNoGC(OMR_VMThread * omrVMThread, uintptr_t allocationCategory, uintptr_t size, uintptr_t allocateFlagss);
>
> omr_error_t OMR_GC_SystemCollect(OMR_VMThread* omrVMThread, uint32_t gcCode);
>
> First parameter is a pointer to an OMR thread. I'd have to rewrite core.thread to use OMR's thread system, or maybe I could fake OMR threads with enough effort. That's another barrier.
>

I think all GCs need to interact with threads so there has to be some linkage between the thread sub-system and GC.

I saw a slide deck or some docs that explained the requirements to use OMR GC. Unfortunately I did not save a link and cannot locate this now. I will post a link if I find it again.

It said that the OMR expects objects to be allocated on 8-byte aligned memory and it requires one GC byte - which is assumed to be at the start of the allocated object but can be changed if required.

Regards
Dibyendu

December 03, 2016
On Saturday, 3 December 2016 at 10:29:02 UTC, Dibyendu Majumdar wrote:
>> That said, let's look at the API:
>>
>> omrobjectptr_t OMR_GC_Allocate(OMR_VMThread * omrVMThread, uintptr_t allocationCategory, uintptr_t size, uintptr_t allocateFlags);
>>
>> omrobjectptr_t OMR_GC_AllocateNoGC(OMR_VMThread * omrVMThread, uintptr_t allocationCategory, uintptr_t size, uintptr_t allocateFlagss);
>>
>> omr_error_t OMR_GC_SystemCollect(OMR_VMThread* omrVMThread, uint32_t gcCode);
>>
>> First parameter is a pointer to an OMR thread. I'd have to rewrite core.thread to use OMR's thread system, or maybe I could fake OMR threads with enough effort. That's another barrier.
>>
>
> I think all GCs need to interact with threads so there has to be some linkage between the thread sub-system and GC.
>
> I saw a slide deck or some docs that explained the requirements to use OMR GC. Unfortunately I did not save a link and cannot locate this now. I will post a link if I find it again.
>
> It said that the OMR expects objects to be allocated on 8-byte aligned memory and it requires one GC byte - which is assumed to be at the start of the allocated object but can be changed if required.
>

Here is the video that describes how to use the GC. Says initial integration should take <100 lines of code! GC details are about 30 minutes into the talk.

https://developer.ibm.com/open/videos/eclipse-omr-tech-talk/

December 03, 2016
On Saturday, 3 December 2016 at 02:23:13 UTC, Dibyendu Majumdar wrote:
> IBM has open sourced the J9 Garbage Collector as part of Eclipse OMR project. Perhaps D could use this. It is claimed that the GC implementation is relatively straight forward to reuse.
>
> https://github.com/eclipse/omr
>

BTW the .Net Core CLR also appears to be reusable - and maybe even easier to integrate. There is a standalone sample - and the integration interface seems well defined.

https://github.com/dotnet/coreclr/tree/master/src/gc
December 03, 2016
On Sat, 03 Dec 2016 10:29:02 +0000, Dibyendu Majumdar wrote:
> I would imagine that some effort is necessary ... but the payoff is huge isn't it ? To have a robust GC would change the game for D.

It's possible that ldc and gdc will switch their runtimes to use OMR, but dmd will not.

I might be able to provide an alternate runtime with dub. Will have to investigate. If that works, we might end up splitting the runtime in two -- a lot of the runtime won't need to change, and I'd rather not have the maintenance overhead for typeinfo and unicode handling just to provide a better GC.

> I think all GCs need to interact with threads so there has to be some linkage between the thread sub-system and GC.

True. If this project were designed to be used a la carte, it would have a pluggable mechanism for identifying thread stacks and pausing threads. That would make it much easier to port a runtime to OMR, but it would make OMR more complex -- and it's already 775kLOC.

It adds more work to my plate, but I can deal.

> I saw a slide deck or some docs that explained the requirements to use OMR GC. Unfortunately I did not save a link and cannot locate this now. I will post a link if I find it again.

Thanks! Watching it now.

> It said that the OMR expects objects to be allocated on 8-byte aligned memory and it requires one GC byte - which is assumed to be at the start of the allocated object but can be changed if required.

One byte for the GC, or one byte of overhead for the glue layer to use?

Assuming that all structs are padded to word boundaries, one byte would allow a 256-word struct. I can create a struct with 10,000 fields, and each can be more than one word.
December 03, 2016
On Sat, 03 Dec 2016 15:21:02 +0000, Chris Wright wrote:

> On Sat, 03 Dec 2016 10:29:02 +0000, Dibyendu Majumdar wrote:
>> I would imagine that some effort is necessary ... but the payoff is
>> huge isn't it ? To have a robust GC would change the game for D.
>> It said that the OMR expects objects to be allocated on 8-byte aligned
>> memory and it requires one GC byte - which is assumed to be at the
>> start of the allocated object but can be changed if required.
> 
> One byte for the GC, or one byte of overhead for the glue layer to use?
> 
> Assuming that all structs are padded to word boundaries, one byte would allow a 256-word struct. I can create a struct with 10,000 fields, and each can be more than one word.

Also, OMR assumes there are no unaligned pointers to GC memory. So if I wrote the following (perfectly valid) D code, it would crash:

class Foo {
  bool[4] bools;
}
auto a = &new Foo().bools[1];
December 04, 2016
On Sat, 03 Dec 2016 15:45:00 +0000, Chris Wright wrote:
> Also, OMR assumes there are no unaligned pointers to GC memory.

Also no pointers-to-members, but I have a way around both these problems. Obvious in hindsight.

The memory overhead is O(#allocations). The time overhead is O(lg #allocations) per pointer reference per scan.

So we won't see numbers as impressive as Java's, unfortunately -- not unless we forbid pointers-to-members and rework array slicing.


The details:

When I allocate a thing, I insert an entry for the thing into an ordered set of allocated ranges. The entry will be a (begin, end) pointer pair for the allocation. I can get this info because allocations tell me how much to allocate, and OMR's GC gives me a pointer.

When inserting that allocated range into the set, I will have to erase any overlapping elements. This shouldn't be difficult -- as long as the no-overlapping-ranges invariant was maintained before this call, I only have to look one element to the left, and it's quick to determine how far to the right I need to look.

When scanning an object, we find the pointers in that object as we currently do. Then, instead of enqueueing those pointers to be scanned, we look them up in the allocation set and enqueue the allocation's `begin` pointer.

Pretty much what we do today. It's just replacing a little less of the GC than we would prefer.