February 05, 2014 Re: Idea #1 on integrating RC with GC | ||||
---|---|---|---|---|
| ||||
Posted in reply to Dicebot | On Wednesday, 5 February 2014 at 15:46:21 UTC, Dicebot wrote:
> Which does not fix a single problem mentioned in embedded/gamedev threads. Difference between "somewhat less" and "reliably constrained" is beyond measure.
If you write code in a way that does not create many cycles, you don't have to call the GC at all. So getting the GC out of the implicit allocations the language makes might be the most important thing, but how much memory is wasted over an hour that way?
A game should perhaps run for 1 hour without a hiccup; ARC might be good enough if RC collects 98% of all garbage.
A real time audio application should run for 12 hours without a hiccup… you probably want a GC-free audio callback.
A real time server that monitors some vital resource should run for hours without a hiccup... You either want a real time GC or no GC.
Different scenarios have different needs.
|
February 05, 2014 Re: Idea #1 on integrating RC with GC | ||||
---|---|---|---|---|
| ||||
Posted in reply to Ola Fosheim Grøstad
| On 6 February 2014 01:32, Ola Fosheim Grøstad <ola.fosheim.grostad+dlang@gmail.com> wrote:
> On Wednesday, 5 February 2014 at 15:01:27 UTC, Manu wrote:
>
>> Or I wonder if there's opportunity to pinch a single bit from pointers to mark that it is raw or RC allocated? Probably fine on 64bit.
>>
>
> Yes, on 64 bit that is ok. I think current x86 map something like 53 bits, the top and bottom half of the 64 bit address space. The middle is unused. Anyway, you could have two heaps if the OS allows you to.
>
>> 32bit probably needs to match a bit pattern or something.
>>
>
> Or one could forget about 32 bit for ARC.
>
The applications I describe where it is a necessity will often be 32bit systems.
>> Aligned data is a challenge. I have often wondered if it would be feasible to access the RC via a pointer hash or something, and keep it in a side table... sounds tricky, but I wonder if it's possible.
>>
>
> If you have your own allocator you probably could? Segment memory regions into allocations of a particular size and have a RC-count index at the start indexed by the masked MSBs of the pointer address and have smart pointers that know the object size. Kind of tricky to get acceptable speed, but possible to do. The problem is that you will get a latency of perhaps 200+ cycles on a cache miss. Then again, you could probably make do with 32 bit counters and they are probably accessed in proximity if they hold the same type of object. One cache line is 64 bytes, so you get 16 counters per cache line. With smart allocation you might get good cache locality of the counters (8MB of L3 cache is quite a bit).
>
Cache locality is a problem that can easily be refined. It would just need lots of experimental data.
> (I guess alignment is primarily a problem when you want 4KiB alignment (page size), maybe not worth worrying about.)
>
>> Rather than (allocated_base, offset), I suspect (pointer, offset_from_base) would be better; typical dereferences would have no penalty.
>>
>
> Yes, probably. I was thinking about avoiding GC of internal pointers too. I think scanning might be easier if all pointers point to the allocation base. That way the GC does not have to consider offsets.
>
>> You mean like the smart pointer double indirection? I really don't like the double indirection, but it's a possibility. Or did I misunderstand?
>>
>
> Yes:
>
> struct {
>     void* object_ptr;
>     /* offset */
>     uint weakcounter;
>     uint strongcounter;
> }
>
> The advantage is that it works well with RC ignorant collectors/allocators. "if in doubt just set the pointer to null and forget about freeing".
>
I see. Well, I don't like it :) ... but it's an option.
>> I'm sure there's a clever solution out there which would allow the ARC to detect if it's a raw C pointer or not...
>>
>
> Well, for a given OS/architecture you could probably always allocate your heap in a fixed memory range on 64 bit systems then test against that range.
>
|
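A minimal sketch in C of the double-indirection control block quoted in this post, assuming single-threaded use and plain malloc/free as the allocator. The post only gives the three fields; the names rc_block, rc_new, rc_retain, rc_release, rc_weak_retain, rc_weak_release and rc_get are hypothetical and not part of any D or druntime API, and uint32_t stands in for D's uint. It illustrates the "set the pointer to null" behaviour: when the strong count reaches zero the object is released and object_ptr is nulled, while the block itself survives until the last weak reference is gone.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical control block with the three fields from the post. */
typedef struct {
    void    *object_ptr;     /* NULL once the object has been released */
    uint32_t weakcounter;    /* non-owning references to this block */
    uint32_t strongcounter;  /* owning references to the object */
} rc_block;

/* Allocate an object of `size` bytes plus its control block (one strong ref). */
static rc_block *rc_new(size_t size) {
    rc_block *b = malloc(sizeof *b);
    if (!b) return NULL;
    b->object_ptr = malloc(size);
    if (!b->object_ptr) { free(b); return NULL; }
    b->weakcounter = 0;
    b->strongcounter = 1;
    return b;
}

static void rc_retain(rc_block *b)      { b->strongcounter++; }
static void rc_weak_retain(rc_block *b) { b->weakcounter++; }

/* Every access goes through the block: this is the double indirection. */
static void *rc_get(const rc_block *b)  { return b->object_ptr; }

/* Drop a strong reference. Returns 1 if the block itself was freed. */
static int rc_release(rc_block *b) {
    if (--b->strongcounter == 0) {
        free(b->object_ptr);
        b->object_ptr = NULL;            /* weak holders now observe NULL */
        if (b->weakcounter == 0) { free(b); return 1; }
    }
    return 0;
}

/* Drop a weak reference. Returns 1 if the block itself was freed. */
static int rc_weak_release(rc_block *b) {
    if (--b->weakcounter == 0 && b->strongcounter == 0) { free(b); return 1; }
    return 0;
}
```

The cost Manu objects to is visible in rc_get: every dereference takes an extra load through the block before it reaches the object.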
February 05, 2014 Re: Idea #1 on integrating RC with GC | ||||
---|---|---|---|---|
| ||||
Posted in reply to Dicebot
| On 6 February 2014 01:46, Dicebot <public@dicebot.lv> wrote:
> On Wednesday, 5 February 2014 at 15:25:27 UTC, Michel Fortin wrote:
>
>> In general ARC+GC would release memory faster so you need less memory overall. Fewer garbage memory blocks floating around might make processor caches more efficient. And less memory pressure means the GC itself runs less often, thus fewer pauses, and shorter pauses since there is less memory to scan.
>>
>
> Which does not fix a single problem mentioned in embedded/gamedev threads. Difference between "somewhat less" and "reliably constrained" is beyond measure.
>
The problem is completely solved; you turn the backing GC off. Devs are responsible for correct weak pointer attribution.
|
February 05, 2014 Re: Idea #1 on integrating RC with GC | ||||
---|---|---|---|---|
| ||||
Posted in reply to Ola Fosheim Grøstad | On Wednesday, 5 February 2014 at 15:57:51 UTC, Ola Fosheim Grøstad wrote:
> If you write code in a way that does not create many cycles, you don't have to call the GC at all. So getting the GC out of the implicit allocations the language makes might be the most important thing, but how much memory is wasted over an hour that way?
It is up to the programmer to decide. Right now he does not have a choice, and sometimes you can't afford to have the GC in your program at all (as in can't have it linked into the binary), not just can't call collection cycles. Having a sane fallback is very desirable. The proposed solution does not seem to save you from uncontrollably long collection cycles anyway, as it still uses the same memory pool, so I don't see how it can help even games, let alone more demanding applications.
> A game should perhaps run for 1 hour without a hiccup; ARC might be good enough if RC collects 98% of all garbage.
>
> A real time audio application should run for 12 hours without a hiccup… you probably want a GC-free audio callback.
>
> A real time server that monitors some vital resource should run for hours without a hiccup... You either want a real time GC or no GC.
>
> Different scenarios have different needs.
Haven't you just basically confirmed my opinion? :)
|
February 05, 2014 Re: Idea #1 on integrating RC with GC | ||||
---|---|---|---|---|
| ||||
Posted in reply to Manu | On Wednesday, 5 February 2014 at 16:03:56 UTC, Manu wrote:
> The problem is completely solved; you turn the backing GC off. Devs are
> responsible for correct weak pointer attribution.
What does it give you over the current situation with the GC switched off and RefCounted used everywhere? Language features will still leak GC memory.
|
February 05, 2014 Re: Idea #1 on integrating RC with GC | ||||
---|---|---|---|---|
| ||||
Posted in reply to Dicebot
| On 6 February 2014 02:05, Dicebot <public@dicebot.lv> wrote:
> On Wednesday, 5 February 2014 at 16:03:56 UTC, Manu wrote:
>
>> The problem is completely solved; you turn the backing GC off. Devs are responsible for correct weak pointer attribution.
>>
>
> What does it give you over the current situation with the GC switched off and RefCounted used everywhere? Language features will still leak GC memory.
>
Huh? Why would they? They don't create cycles, and would clean up reliably.
|
February 05, 2014 Re: Idea #1 on integrating RC with GC | ||||
---|---|---|---|---|
| ||||
Posted in reply to Manu | On Wednesday, 5 February 2014 at 16:13:33 UTC, Manu wrote:
> Huh? Why would they? They don't create cycles, and would clean up reliably.
Because they still return T* and not RC!T? Andrei's post speaks purely about an extra library type and does not mention the possibility of making it the default allocation type for the language.
Or it is just silently assumed?
|
February 05, 2014 Re: Idea #1 on integrating RC with GC | ||||
---|---|---|---|---|
| ||||
Posted in reply to Dicebot | On Wednesday, 5 February 2014 at 16:04:06 UTC, Dicebot wrote:
> It is up to the programmer to decide. Right now he does not have a choice, and sometimes you can't afford to have the GC in your program at all (as in can't have it linked into the binary), not just can't call collection cycles. Having a sane fallback is very desirable.
Yes, if D is going to be a system level programming language then there is no other option.
> The proposed solution does not seem to save you from uncontrollably long collection cycles anyway, as it still uses the same memory pool, so I don't see how it can help even games, let alone more demanding applications.
Well, for games and game servers I think a 100ms delay once or twice per hour is inconsequential in terms of impact. If you can reduce the GC load by various means it might work out for most applications.
1. Reduce the set considered for GC by having the GC skip paths that are known to be covered by RC.
2. Improve the speed of the GC by avoiding interior pointers etc.
3. Reduce the number of calls to the GC by having RC take care of the majority of the memory releases.
4. Have a local GC by collecting roots of nodes that are known to create cycles.
I don't think ARC is an option for OS-level development and critical applications anyway. :-)
>> Different scenarios have different needs.
>
> Haven't you just basically confirmed my opinion? :)
In a way. :-) But what if the question is this: how can you, in a pragmatic way, come up with a solution that covers most soft real time applications?
A compiler switch that defaults to RC (i.e. turns standard GC features into standard RC features) could in theory get you pretty close, but I think clever RC/GC memory management requires whole program analysis…
|
February 05, 2014 Re: Idea #1 on integrating RC with GC | ||||
---|---|---|---|---|
| ||||
Posted in reply to Manu | On 2014-02-05 15:52:36 +0000, Manu <turkeyman@gmail.com> said:
> On 6 February 2014 01:22, Michel Fortin <michel.fortin@michelf.ca> wrote:
>
>> On 2014-02-05 15:01:04 +0000, Manu <turkeyman@gmail.com> said:
>>
>> Since we're talking about adding reference counts to GC-allocated memory, you could use the GC to find the base address of the memory block. What is the cost of that?
>
> Can you elaborate? How would the GC know this?
How do you think the GC tracks internal pointers today? ;-) Just call addrOf if you need to know:
http://dlang.org/phobos/core_memory.html#.GC.addrOf
We'd have to call it too for incrementing/decrementing the reference count. It'd work, even though it seems rather heavyweight. The slow part of that function is the call to findPool, which does a binary search:
https://github.com/D-Programming-Language/druntime/blob/master/src/gc/gc.d#L1581
That said, the GC is already doing that for every word of memory it scans, so it might not be as heavyweight as it seems (especially if the GC has less to scan later because of ARC). See the mark function:
https://github.com/D-Programming-Language/druntime/blob/master/src/gc/gc.d#L2274
I'd tend to say that if you're in control of the reference count system, the code generation, the allocation pools, as well as the GC algorithm, you can probably do something that'll work well enough, by architecting things to work well together. But it requires an integrated approach.
--
Michel Fortin
michel.fortin@michelf.ca
http://michelf.ca
|
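The lookup described in this post (find the pool that contains a pointer by binary search, then round down to the start of the block) can be sketched in C as follows. This is an illustration under simplifying assumptions, not druntime's code: the pool struct, find_pool and addr_of are hypothetical names, pools are assumed to be sorted by address and non-overlapping, and each pool is assumed to hold blocks of one fixed size.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pool descriptor: one contiguous range of equal-size blocks. */
typedef struct {
    uintptr_t base;        /* first byte of the pool */
    uintptr_t top;         /* one past the last byte of the pool */
    size_t    block_size;  /* size of every block in this pool (assumption) */
} pool;

/* Binary search over pools sorted by address.
   Returns NULL when p does not point into any pool (e.g. a raw C pointer). */
static const pool *find_pool(const pool *pools, size_t n, uintptr_t p) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (p < pools[mid].base)       hi = mid;      /* search lower half */
        else if (p >= pools[mid].top)  lo = mid + 1;  /* search upper half */
        else                           return &pools[mid];
    }
    return NULL;
}

/* Base address of the block containing p, or 0 if p is outside every pool.
   An interior pointer is rounded down to the start of its block. */
static uintptr_t addr_of(const pool *pools, size_t n, uintptr_t p) {
    const pool *pl = find_pool(pools, n, p);
    if (!pl) return 0;
    return p - (p - pl->base) % pl->block_size;
}
```

In this sketch a retain or release on an interior pointer would pay one such search per operation, which is the "rather heavyweight" cost mentioned above. The NULL result for pointers outside every pool is also one way to tell raw pointers from managed ones.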
February 05, 2014 Re: Idea #1 on integrating RC with GC | ||||
---|---|---|---|---|
| ||||
Posted in reply to Manu | On Wednesday, 5 February 2014 at 16:01:09 UTC, Manu wrote:
> On 6 February 2014 01:32, Ola Fosheim Grøstad <ola.fosheim.grostad+dlang@gmail.com> wrote:
> The applications I describe where it is a necessity will often be 32bit systems.
You mean for embedded? Mobile CPUs are going 64 bit…
> Cache locality is a problem that can easily be refined. It would just need lots of experimental data.
Well, in that case the math is not so difficult. You could have 1 index every 4MiB, and if your smallest allocation unit is 256 bytes then you get a counter index of 16384 uint32 (64KiB). The access would be easy and something like (probably not 100% correct):
counter_addr = (ptr&~0xffff) + ( (ptr>>12)&0xfffc )
>> Yes:
>>
>> struct {
>>     void* object_ptr;
>>     /* offset */
>>     uint weakcounter;
>>     uint strongcounter;
>> }
>>
>> The advantage is that it works well with RC ignorant collectors/allocators. "if in doubt just set the pointer to null and forget about freeing".
>
> I see. Well, I don't like it :) ... but it's an option.
The aesthetics aren't great, it is not a minimalist approach, but consider the versatility: you could put in a function pointer to a deallocator (C malloc, Qt, GTK or some other library deallocator etc.) and other kinds of meta information that make you able to treat reference counting in a uniform manner even for external resources. With the right semantics you can have pointers to cached objects that suddenly disappear etc. (by using the weak counter in a clever way).
|
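Working the side-table numbers from this post through in C: with one counter table per 4 MiB region and a 256-byte allocation unit there are 4 MiB / 256 B = 16384 slots, and 16384 four-byte counters occupy 64 KiB. The sketch below assumes that regions are aligned to 4 MiB and that the table sits at the start of its region; region_base, slot_index and counter_addr are hypothetical helper names. Under those assumptions the mask and shift come out slightly different from the expression in the post, which is marked there as approximate.

```c
#include <stdint.h>

#define REGION_SIZE  (UINT64_C(4) << 20)               /* 4 MiB, assumed size-aligned */
#define UNIT_SHIFT   8                                 /* 256-byte allocation unit */
#define SLOTS        (REGION_SIZE >> UNIT_SHIFT)       /* 16384 counters per region */
#define TABLE_BYTES  (SLOTS * sizeof(uint32_t))        /* 65536 bytes = 64 KiB */

/* Start of the 4 MiB region that contains ptr (the table lives here). */
static uint64_t region_base(uint64_t ptr) {
    return ptr & ~(REGION_SIZE - 1);                   /* ptr & ~0x3fffff */
}

/* Index of the 256-byte unit inside its region: 0 .. 16383. */
static uint64_t slot_index(uint64_t ptr) {
    return (ptr & (REGION_SIZE - 1)) >> UNIT_SHIFT;
}

/* Address of the 32-bit counter for the unit that contains ptr.
   Equivalent closed form: (ptr & ~0x3fffff) + ((ptr >> 6) & 0xfffc). */
static uint64_t counter_addr(uint64_t ptr) {
    return region_base(ptr) + slot_index(ptr) * sizeof(uint32_t);
}
```

With this layout the first 64 KiB of every region holds the table itself, so the first 256 slots would never be handed out as allocations. Neighbouring units map to neighbouring counters, 16 to a 64-byte cache line, which matches the locality argument made earlier in the thread.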