September 23, 2014
On Tuesday, 23 September 2014 at 01:58:50 UTC, Rikki Cattermole wrote:
> In short, I dislike pretty much all changes to __gshared/shared. They break too many things.
> At least with Cmsed (I'm evil here), I use __gshared essentially as a read-only variable that is modifiable at startup (modifying requires synchronized, reading doesn't).
>

Yeah, these changes break many things, and so are not suitable for D2. My intention was only to point out how expensive it is for the GC to deal with shared memory.

Thinking about it a little more: what if each thread could have its own GC, but by default all threads use the current GC (this would require minimal changes to druntime)? "__gshared", "shared" and "immutable" continue to work as they do now, which does not break anything. If I, as a programmer, take care of managing (either manually or through reference counting) all of the shared memory ("__gshared", "shared" or "immutable") that can be referenced from multiple threads, I could replace the global GC in my program with an individual thread GC.

I'll try to implement a GC optimized for a single thread and try that solution.
September 23, 2014
On Tuesday, 23 September 2014 at 15:28:30 UTC, Marc Schütz wrote:
> On Tuesday, 23 September 2014 at 15:23:16 UTC, Kagamin wrote:
>> And what GC does? Pins the allocated blocks for another thread?
>
> Assuming there is one thread-local GC per thread, it transfers responsibility of the allocated data from the sender to the receiver. This means, the old GC doesn't need to scan it any more, but the new one does.

Yes. A mechanism for transfer of responsibility and pins would be needed.

Basically, we have to assume that a thread GC only looks for roots in its own stack/registers and managed memory, and may move managed objects during a collection, so a reference held by another thread may become invalid for that thread at any time.
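
To make the hazard concrete, here is a toy model in Python (not D, and not how druntime works). `MovingHeap`, its integer "addresses", and the root names are all hypothetical. It only illustrates that a moving collector fixes up the roots it knows about, so an address copied to another thread goes stale.

```python
class MovingHeap:
    """Hypothetical thread-local heap with a moving (copying) collector."""

    def __init__(self):
        self.memory = {}    # address -> value
        self.roots = {}     # root name -> address (this thread's roots only)
        self.next_addr = 0

    def allocate(self, root, value):
        addr = self.next_addr
        self.next_addr += 1
        self.memory[addr] = value
        self.roots[root] = addr
        return addr

    def collect(self):
        # Copy every object reachable from this thread's roots to a fresh
        # address and update those roots. Nothing else is updated.
        new_memory = {}
        forwarding = {}     # old address -> new address
        for root, old in self.roots.items():
            if old not in forwarding:
                forwarding[old] = self.next_addr
                new_memory[self.next_addr] = self.memory[old]
                self.next_addr += 1
            self.roots[root] = forwarding[old]
        self.memory = new_memory


heap = MovingHeap()
# Pretend this raw address was handed to another thread.
foreign_ref = heap.allocate("buf", "audio frame")
heap.collect()
stale = foreign_ref not in heap.memory
still_valid_locally = heap.memory[heap.roots["buf"]] == "audio frame"
```

After `collect()`, the owning thread still reaches its object through its root, while `foreign_ref` no longer names anything. That is why pinning or a transfer of responsibility would be needed.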
September 23, 2014
On Tuesday, 23 September 2014 at 16:47:09 UTC, David Nadlinger wrote:
> On Tuesday, 23 September 2014 at 10:38:29 UTC, Kagamin wrote:
>> The question is how thread-local GC will account for data passed to another thread.
>
> I was briefly discussing this with Andrei at (I think) DConf 2013. I suggested moving data to a separate global GC heap on casting stuff to shared. Assigning types with indirections to a __gshared variable might also trigger this, unless we can find a better design. IIRC, Andrei dismissed this as impractical due to the overhead and need for precise scanning. I still like to think that it would be worth it, though, even if I can't spare the time for looking into an implementation right now.
>
> David

Yes, it could be a palliative measure, and yes, it requires precise scanning. I do not think that is easy to implement for the stack.

And in any case, I believe the problem is having multiple references to the same object from different threads, which forces you to "stop-the-world". That problem still exists.
September 24, 2014
On Tuesday, 23 September 2014 at 10:38:29 UTC, Kagamin wrote:
> The question is how thread-local GC will account for data passed to another thread.

I don't think you clearly understand what thread local means.

Also, there is a reason why I'm beating the drum to get an
ownership type qualifier: so you can transfer ownership.
September 24, 2014
On Tuesday, 23 September 2014 at 18:53:04 UTC, Oscar Martin wrote:
> On Tuesday, 23 September 2014 at 15:28:30 UTC, Marc Schütz wrote:
>> On Tuesday, 23 September 2014 at 15:23:16 UTC, Kagamin wrote:
>>> And what GC does? Pins the allocated blocks for another thread?
>>
>> Assuming there is one thread-local GC per thread, it transfers responsibility of the allocated data from the sender to the receiver. This means, the old GC doesn't need to scan it any more, but the new one does.
>
> Yes. A mechanism for transfer of responsibility and pins would be needed.
>
> Basically, we have to assume that a thread GC only looks for roots in its own stack/registers and managed memory, and may move managed objects during a collection, so a reference held by another thread may become invalid for that thread at any time.

Physically moving the objects is not necessary; only the responsibility needs to "move". With some work, it might even be possible to move the objects' metadata from the old to the new heap, so that each GC would only need to access its own heap during a scan, which avoids synchronization with the other threads.
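
As a rough illustration of "moving" only the responsibility, here is a small Python model (not D). `LocalHeap`, `transfer`, and the block names are hypothetical, and a real implementation would need synchronization around the hand-over, which this sketch leaves out.

```python
class LocalHeap:
    """Hypothetical per-thread heap: tracks the blocks this thread's GC must scan."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}    # block id -> metadata (here just the size in bytes)

    def allocate(self, block_id, size):
        self.blocks[block_id] = size

    def scan_set(self):
        # A collection only walks the blocks this heap is responsible for.
        return set(self.blocks)


def transfer(block_id, sender, receiver):
    """Move a block's metadata to another heap; the payload itself is not copied."""
    receiver.blocks[block_id] = sender.blocks.pop(block_id)


worker = LocalHeap("worker")
main = LocalHeap("main")
worker.allocate("dataset", 1 << 30)   # the big result built in the background
worker.allocate("scratch", 4096)      # temporary data that stays with the worker
transfer("dataset", worker, main)
```

After the transfer, the worker's scan set contains only `scratch` and the main heap's contains only `dataset`, so each collector looks at its own metadata and nothing is copied.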
September 24, 2014
On Tuesday, 23 September 2014 at 18:39:09 UTC, Oscar Martin wrote:
> On Tuesday, 23 September 2014 at 01:58:50 UTC, Rikki Cattermole wrote:
>> In short, I dislike pretty much all changes to __gshared/shared. They break too many things.
>> At least with Cmsed (I'm evil here), I use __gshared essentially as a read-only variable that is modifiable at startup (modifying requires synchronized, reading doesn't).
>>
>
> Yeah, these changes break many things, and so are not suitable for D2. My intention was only to point out how expensive it is for the GC to deal with shared memory.
>
> Thinking about it a little more: what if each thread could have its own GC, but by default all threads use the current GC (this would require minimal changes to druntime)? "__gshared", "shared" and "immutable" continue to work as they do now, which does not break anything. If I, as a programmer, take care of managing (either manually or through reference counting) all of the shared memory ("__gshared", "shared" or "immutable") that can be referenced from multiple threads, I could replace the global GC in my program with an individual thread GC.
>
> I'll try to implement a GC optimized for a single thread and try that solution.

There can also be a shared _and_ a local GC at the same time, and a thread could opt out of the shared GC (or choose not to opt in by not allocating from the shared heap).
September 24, 2014
On Tuesday, 23 September 2014 at 16:47:09 UTC, David Nadlinger wrote:
> I was briefly discussing this with Andrei at (I think) DConf 2013. I suggested moving data to a separate global GC heap on casting stuff to shared.

Yes, that sounds expensive. A real example from my work: a client receives a big dataset (~1GB) from the server in a background thread, builds and checks constraints and indexes (which is fairly expensive too; RBTree), and hands it over to the main thread. The client machine is not powerful enough for frequent marshaling of such a big dataset; handling it at all is enough of a problem. If you copy it twice, you have a 3GB working set, and the GC needs roughly a 2x reserve, raising memory requirements to 6GB; without the dup, requirements are 1-2GB. Also, when you trigger a collection while copying to the shared GC, what does it do? Stop the world again?
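
The arithmetic behind those numbers, as a small Python sketch. The function and parameter names are made up for illustration, and the 2x reserve is the rough estimate from this post, not a measured figure.

```python
def peak_memory_gb(dataset_gb, copies, gc_reserve_factor):
    """Estimate peak memory: the original plus its copies, times the GC's reserve."""
    working_set = dataset_gb * (1 + copies)
    return working_set * gc_reserve_factor


with_two_copies = peak_memory_gb(1, copies=2, gc_reserve_factor=2)   # 3GB working set, doubled
without_copies = peak_memory_gb(1, copies=0, gc_reserve_factor=2)    # upper end of the 1-2GB range
```

With two copies the estimate is 6GB. With no copies it is 2GB, or 1GB if the reserve factor is taken as 1.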
September 24, 2014
On Wednesday, 24 September 2014 at 11:59:52 UTC, Kagamin wrote:
> On Tuesday, 23 September 2014 at 16:47:09 UTC, David Nadlinger wrote:
>> I was briefly discussing this with Andrei at (I think) DConf 2013. I suggested moving data to a separate global GC heap on casting stuff to shared.
>
> Yes, that sounds expensive. A real example from my work: a client receives a big dataset (~1GB) from the server in a background thread, builds and checks constraints and indexes (which is fairly expensive too; RBTree), and hands it over to the main thread. The client machine is not powerful enough for frequent marshaling of such a big dataset; handling it at all is enough of a problem. If you copy it twice, you have a 3GB working set, and the GC needs roughly a 2x reserve, raising memory requirements to 6GB; without the dup, requirements are 1-2GB. Also, when you trigger a collection while copying to the shared GC, what does it do? Stop the world again?

Large allocations are the easy case, as the allocation lives in its own pool and you can just move the entire pool.  Copying objects is the tricky part...
September 24, 2014
On Wednesday, 24 September 2014 at 08:13:15 UTC, Marc Schütz wrote:
> On Tuesday, 23 September 2014 at 18:39:09 UTC, Oscar Martin wrote:
>> On Tuesday, 23 September 2014 at 01:58:50 UTC, Rikki Cattermole wrote:
>>> In short, I dislike pretty much all changes to __gshared/shared. They break too many things.
>>> At least with Cmsed (I'm evil here), I use __gshared essentially as a read-only variable that is modifiable at startup (modifying requires synchronized, reading doesn't).
>>>
>>
>> Yeah, these changes break many things, and so are not suitable for D2. My intention was only to point out how expensive it is for the GC to deal with shared memory.
>>
>> Thinking about it a little more: what if each thread could have its own GC, but by default all threads use the current GC (this would require minimal changes to druntime)? "__gshared", "shared" and "immutable" continue to work as they do now, which does not break anything. If I, as a programmer, take care of managing (either manually or through reference counting) all of the shared memory ("__gshared", "shared" or "immutable") that can be referenced from multiple threads, I could replace the global GC in my program with an individual thread GC.
>>
>> I'll try to implement a GC optimized for a single thread and try that solution.
>
> There can also be a shared _and_ a local GC at the same time, and a thread could opt out of the shared GC (or choose not to opt in by not allocating from the shared heap).

Yes, a shared GC should be a possibility, but how do you avoid the "stop-the-world" phase for that GC?

Obviously this pause can be minimized by doing most of the work outside that phase, but after seeing other people's tests of advanced GCs (Java, .NET) on the internet, I do not think it's enough for some programs.

But hey, I guess it's enough to cover most cases. My goal is to start by implementing the thread GC. Then I will test performance and pauses (my program has to handle audio every 10 ms), and then I might dare to implement the shared GC, which is obviously more complex if you want to minimize the pauses. We'll see what the outcome is.
September 24, 2014
On Wednesday, 24 September 2014 at 11:59:52 UTC, Kagamin wrote:
> On Tuesday, 23 September 2014 at 16:47:09 UTC, David Nadlinger wrote:
>> I was briefly discussing this with Andrei at (I think) DConf 2013. I suggested moving data to a separate global GC heap on casting stuff to shared.
>
> Yes, that sounds expensive. A real example from my work: a client receives a big dataset (~1GB) from the server in a background thread, builds and checks constraints and indexes (which is fairly expensive too; RBTree), and hands it over to the main thread. The client machine is not powerful enough for frequent marshaling of such a big dataset; handling it at all is enough of a problem. If you copy it twice, you have a 3GB working set, and the GC needs roughly a 2x reserve, raising memory requirements to 6GB; without the dup, requirements are 1-2GB. Also, when you trigger a collection while copying to the shared GC, what does it do? Stop the world again?

Yes, that's the problem I see with the shared GC. But I think cases like this should be solved "easily" with a mechanism for transferring responsibility between thread GCs. The truly problematic cases are shared objects with roots in several threads.