February 24, 2015 Re: D GC theory | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Ola Fosheim Grøstad | On Tuesday, 24 February 2015 at 15:14:03 UTC, Ola Fosheim Grøstad wrote:
> On Tuesday, 24 February 2015 at 14:45:17 UTC, Marc Schütz wrote:
>> It's possible to (ab)use the MMU as a write barrier.
>
> Uhm...
>
> The throughput for L/SFENCE is 5 cycles and SFENCE 33 cycles... The cost of a page miss is 1000 cycles + overhead for copying.
>
> And that's assuming that you have enough memory and that the TLB isn't affected...
Yes, but if the program has a small working set, you pay that price only a few times (once per page), and in any case, only during a running background scan, rather than always.
| |||
February 24, 2015 Re: D GC theory | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Marc Schütz | On Tuesday, 24 February 2015 at 16:05:06 UTC, Marc Schütz wrote: > On Tuesday, 24 February 2015 at 15:14:03 UTC, Ola Fosheim Grøstad wrote: >> On Tuesday, 24 February 2015 at 14:45:17 UTC, Marc Schütz wrote: >>> It's possible to (ab)use the MMU as a write barrier. >> >> Uhm... >> >> The throughput for L/SFENCE is 5 cycles and SFENCE 33 (Typo, MFENCE is 33...) >> cycles... The cost of a page miss is 1000 cycles + overhead for copying. >> >> And that's assuming that you have enough memory and that the TLB isn't affected... > > Yes, but if the program has a small working set, you pay that price only a few times (once per page), and in any case, only during a running background scan, rather than always. But if you can control where the code is running when you run the GC scan then you might as well have a separate code path for concurrent GC too (i.e. two versions of the code). One with fences and one without... If you loose the TLB, then you also loose all caches too IIRC. | |||
February 24, 2015 Re: D GC theory | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Ola Fosheim Grøstad | On Tuesday, 24 February 2015 at 17:07:03 UTC, Ola Fosheim Grøstad wrote:
> But if you can control where the code is running when you run the GC scan then you might as well have a separate code path for concurrent GC too (i.e. two versions of the code). One with fences and one without...
I'd say this is impractical. You could only reasonably expect this to work at certain checkpoints, say, an event loop with short calls into the "actual" application code. You cannot simply switch right in the middle of a deep call.
And with Sociomantic's GC, you don't have any more "control" over the GC than you have with the current one in druntime.
| |||
February 24, 2015 Re: D GC theory | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Kagamin | On Tuesday, 24 February 2015 at 08:39:02 UTC, Kagamin wrote:
> On Tuesday, 24 February 2015 at 00:30:43 UTC, ketmar wrote:
>> On Mon, 23 Feb 2015 21:11:22 +0000, Sativa wrote:
>>
>>> How hard would it be to modify D's GC to do the following two things:
>>>
>>> 1. Scan the heap in the BG on another thread/cpu for compactification.
>>
>> needs read/write barriers added to generated code. a major slowdown for
>> ALL memory access.
>
> Only modifications of pointers, which introduce new cross-block dependencies (so that GC knows to recheck the new dependency). Other memory access goes without slowdown.
That is not quite true. You don't know before hand which one are these, so you always need to, at least, check if you need to do the work.
| |||
February 24, 2015 Re: D GC theory | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Sativa Attachments: | On Tue, 24 Feb 2015 15:22:01 +0000, Sativa wrote:
> Free pointers are more difficult as they can, say, be randomly initiated and point to anywhere on the heap and have to be looked in a locked way. (to prevent them changing in the middle of some GC operation)
and they can live in CPU register... ooops.
| |||
February 24, 2015 Re: D GC theory | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Ola Fosheim Grøstad | On Tuesday, 24 February 2015 at 15:14:03 UTC, Ola Fosheim Grøstad wrote:
> On Tuesday, 24 February 2015 at 14:45:17 UTC, Marc Schütz wrote:
>> It's possible to (ab)use the MMU as a write barrier.
>
> Uhm...
>
> The throughput for L/SFENCE is 5 cycles and SFENCE 33 cycles... The cost of a page miss is 1000 cycles + overhead for copying.
>
> And that's assuming that you have enough memory and that the TLB isn't affected...
That's why this is used for immutable, or mostly immutable data, but generally avoided in the case of mutable data.
| |||
February 24, 2015 Re: D GC theory | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Marc Schütz Attachments: | On Tue, 24 Feb 2015 14:45:16 +0000, Marc Schütz wrote:
> It's possible to (ab)use the MMU as a write barrier. Sociomantic's GC implementation does this by forking and running the scan in the child, then communicating information back about what can be freed. Video:
sure, it's possible sometimes. i'd like to see forking GC in druntime.
but all in all it's just a clever hack. concurrent GC in the same process can't be efficiently done with MMU only: MMU protection has too big granularity and page faults are too slow.
| |||
February 24, 2015 Re: D GC theory | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Sativa | On Tuesday, 24 February 2015 at 15:22:02 UTC, Sativa wrote:
>> Only modifications of pointers, which introduce new cross-block dependencies (so that GC knows to recheck the new dependency). Other memory access goes without slowdown.
>
> But this type of thinking is the reason why the current GC is in the state it is.
>
There are limitation you have in a system language, that you don't in a VM language. That being said, D has a type system that allow for various optimization when it come to the GC, so I don't think it is hopeless, but we need to plug the remaining holes.
| |||
Copyright © 1999-2021 by the D Language Foundation
Permalink
Reply