November 16, 2014 A different, precise TLS garbage collector? | ||||
|---|---|---|---|---|
| ||||
I always wondered why we would use the shared keyword on GC allocations if only the stack can be optimized for TLS Storage.
After thinking about how shared objects should work with the GC, it's become obvious that the GC should be optimized for local data. Anything shared would have to be manually managed, because the biggest slowdown of all is stopping the world to facilitate concurrency.
With a precise GC on the way, it's become easy to filter out allocations of shared objects. Simply proxy them through malloc and get rid of the locks. Make the GC thread-local, and you can expect it to scale with the number of processors.
Any thread-local data already has to be duplicated into a shared object to be used from another thread, and the lifetime is easy to manage manually.
SomeTLS variable = new SomeTLS("Data");
shared SomeTLS variable2 = cast(shared) variable.dupShared();
Tid tid = spawn(&doSomething, variable2);
variable = receive!variable2(tid).dupLocal();
delete variable2;
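For illustration, here is a minimal sketch of how the same round trip could look with facilities that exist today (`core.stdc.stdlib.malloc`/`free` and `std.concurrency`). The names `Payload`, `dupShared`, `dupLocal`, `worker` and `roundTrip` are hypothetical stand-ins for the pseudocode above, and nothing here makes the GC thread-local; it only shows the shared copy living outside the GC heap and being freed by hand.

```d
import core.stdc.stdlib : malloc, free;
import std.concurrency : Tid, receiveOnly, send, spawn, thisTid;

// Hypothetical payload type; stands in for SomeTLS from the post.
struct Payload
{
    int value;
}

// Hypothetical helper: copy a thread-local value into malloc'd memory,
// which the GC neither owns nor scans, and type the result as shared.
shared(Payload)* dupShared(ref const Payload local)
{
    auto p = cast(Payload*) malloc(Payload.sizeof);
    assert(p !is null, "malloc failed");
    *p = local;
    return cast(shared(Payload)*) p;
}

// Hypothetical helper: copy shared data back into a thread-local value.
Payload dupLocal(shared(Payload)* s)
{
    return *cast(Payload*) s;
}

// Runs in the spawned thread. Ownership is handed over by message passing,
// so only one thread touches the data at any time.
void worker(Tid owner, shared(Payload)* data)
{
    auto p = cast(Payload*) data;
    p.value *= 2;
    send(owner, data);
}

int roundTrip(int v)
{
    auto local = Payload(v);
    auto sharedCopy = dupShared(local);      // manually managed copy
    spawn(&worker, thisTid, sharedCopy);
    auto result = receiveOnly!(shared(Payload)*)();
    local = dupLocal(result);                // back to thread-local data
    free(cast(void*) result);                // manual lifetime management
    return local.value;
}

void main()
{
    import std.stdio : writeln;
    writeln(roundTrip(21));
}
```

The casts to and from `shared` are safe here only under the assumption that the sender does not touch the data between `spawn` and `receiveOnly`.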
Programming with a syntax that makes use of shared objects, and forces manual management on those, seems to make "stop the world" a thing of the past. Any thoughts?
| ||||
November 16, 2014 Re: A different, precise TLS garbage collector? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Etienne | We'll have to change the way "immutable" is treated for allocations. Which I think is a good thing. Just because something can be shared doesn't mean that I intend to share it. | |||
November 16, 2014 Re: A different, precise TLS garbage collector? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Etienne | On Sunday, 16 November 2014 at 13:58:19 UTC, Etienne wrote:
>
> I always wondered why we would use the shared keyword on GC allocations if only the stack can be optimized for TLS Storage.
>
> After thinking about how shared objects should work with the GC, it's become obvious that the GC should be optimized for local data. Anything shared would have to be manually managed, because the biggest slowdown of all is stopping the world to facilitate concurrency.
>
> With a precise GC on the way, it's become easy to filter out allocations of shared objects. Simply proxy them through malloc and get rid of the locks. Make the GC thread-local, and you can expect it to scale with the number of processors.
>
> Any thread-local data already has to be duplicated into a shared object to be used from another thread, and the lifetime is easy to manage manually.
>
> SomeTLS variable = new SomeTLS("Data");
> shared SomeTLS variable2 = cast(shared) variable.dupShared();
> Tid tid = spawn(&doSomething, variable2);
> variable = receive!variable2(tid).dupLocal();
> delete variable2;
>
> Programming with a syntax that makes use of shared objects, and forces manual management on those, seems to make "stop the world" a thing of the past. Any thoughts?
How about immutable data which is implicitly shareable? Granted you can destroy/free the data asynchronously, but you would still need to check all threads for references to that data.
| |||
November 16, 2014 Re: A different, precise TLS garbage collector? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Xinok | On 2014-11-16 10:21, Xinok wrote:
> How about immutable data which is implicitly shareable? Granted you can
> destroy/free the data asynchronously, but you would still need to check
> all threads for references to that data.
Immutable data would be proxied through malloc and would not be scanned, since it can only contain other immutable data, which can be neither deleted nor scanned.
This is also shared by every thread without any locking. Currently, immutable data is global in storage but may be local in access rights I think? I would have assumed it would automatically be in the .rdata process segments.
| |||
November 16, 2014 Re: A different, precise TLS garbage collector? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Sean Kelly | On 2014-11-16 10:20, Sean Kelly wrote:
> We'll have to change the way "immutable" is treated for allocations.
> Which I think is a good thing. Just because something can be shared
> doesn't mean that I intend to share it.
Exactly, I'm not sure how DMD currently handles immutable but it should automatically be mangled in the global namespace in the application data.
If this seems feasible to everyone I wouldn't mind forking the precise GC into a thread-local library, without any "stop the world" slowdown.
A laptop with 4 cores in a multi-threaded application would (theoretically) run through the marking/collect process 4 times faster, and allocate unbelievably faster due to no locks :)
The only problem is having to manually allocate shared objects, which seems fine because most of the time they'd be deallocated in shared static ~this anyways.
| |||
November 16, 2014 Re: A different, precise TLS garbage collector? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Etienne Cimon | This GC model also seems to work fine for locally-allocated __gshared objects. Since they're registered locally but available globally, they'll be collected once the thread that created them is gone. Also, when an object is cast(shared) before being sent to another thread, it's usually still in scope once the other thread returns. So there seems to be only a very slim chance that existing code will break with a thread-local GC. | |||
November 16, 2014 Re: A different, precise TLS garbage collector? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Etienne | On Sunday, 16 November 2014 at 13:58:19 UTC, Etienne wrote:
> After thinking about how shared objects should work with the GC, it's become obvious that the GC should be optimized for local data. Anything shared would have to be manually managed, because the biggest slowdown of all is stopping the world to facilitate concurrency.
If you go for thread-local garbage collection, then there is no reason not to be more general and support per-data-structure garbage collection as well. That's more useful: it can be used for collecting cycles in graphs. Just let the application initiate collection when there are no references pointing into it.
But keep in mind that you also have to account for fibers that move between threads.
| |||
November 16, 2014 Re: A different, precise TLS garbage collector? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Ola Fosheim Grøstad | On Sunday, 16 November 2014 at 17:38:54 UTC, Ola Fosheim Grøstad wrote:
>
> But keep in mind that you also have to account for fibers that move between threads.
Yes. There are a lot of little "gotchas" with thread-local allocation.
| |||
November 16, 2014 Re: A different, precise TLS garbage collector? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Sean Kelly | On Sunday, 16 November 2014 at 17:40:30 UTC, Sean Kelly wrote:
> On Sunday, 16 November 2014 at 17:38:54 UTC, Ola Fosheim Grøstad wrote:
>>
>> But keep in mind that you also have to account for fibers that move between threads.
>
> Yes. There are a lot of little "gotchas" with thread-local allocation.
I can't even think of a situation when this would be necessary. It sounds like all I would need is to take the precise GC and store each instance in the thread data, I'll probably only need the rtinfo to see if it's shared during allocation to proxy towards malloc. Am I missing something?
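As a rough sketch of that routing idea, the check can be written against the static type. This is an assumption made for illustration: the post talks about consulting rtinfo at run time inside the GC, while the code below uses a compile-time type test in a hypothetical front end. `bypassesGC` and `allocate` are invented names, not druntime APIs; `GC.malloc`, `GC.addrOf`, `malloc` and `free` are the existing calls.

```d
import core.memory : GC;
import core.stdc.stdlib : free, malloc;

// Hypothetical routing rule taken from the thread: shared data (and, per the
// earlier replies, immutable data) bypasses the collector.
enum bool bypassesGC(T) = is(T == shared) || is(T == immutable);

// Hypothetical allocator front end. Returns uninitialized storage for one T.
T* allocate(T)()
{
    static if (bypassesGC!T)
        void* raw = malloc(T.sizeof);     // manually managed, never scanned
    else
        void* raw = GC.malloc(T.sizeof);  // owned by the collector
    assert(raw !is null, "allocation failed");
    return cast(T*) raw;
}

void main()
{
    auto local = allocate!int();
    auto sh = allocate!(shared int)();

    // GC.addrOf returns null for memory the collector did not allocate.
    assert(GC.addrOf(local) !is null);
    assert(GC.addrOf(cast(void*) sh) is null);

    free(cast(void*) sh);                 // shared data is freed by hand
}
```

A real implementation would also have to decide what happens when thread-local data is cast to `shared` after allocation, which this sketch does not address.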
| |||
November 17, 2014 Re: A different, precise TLS garbage collector? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Etienne | On Sunday, 16 November 2014 at 19:13:27 UTC, Etienne wrote:
> I can't even think of a situation when this would be necessary. It sounds like all I would need is to take the precise GC and store each instance in the thread data, I'll probably only need the rtinfo to see if it's shared during allocation to proxy towards malloc. Am I missing something?
There is a reason why "elegant" GC languages pick one primary type of concurrency.
If you say that all code is running on a fiber and that there is no such thing as thread local, then you can tie the local GC partition to the fiber and collect it on any thread.
If you say that functions called from a fiber sometimes call into global statespace, sometimes into thread statespace and sometimes into fiber statespace… then you need to figure out ownership on all allocations. Does the allocated object belong to a global database, a thread local database or a fiber cache which is flushed automatically when moving to a new thread? Or is it an extension of the fiber statespace that should be transparent to threads?
| |||
Copyright © 1999-2021 by the D Language Foundation