February 05, 2014
On Wednesday, 5 February 2014 at 12:40:13 UTC, Manu wrote:
> Counter question; why approach it this way?
> Is there a reason that it needs to be of one kind or the other?

Sure, you could make all allocations with RC enabled make space for a counter at a negative offset (ptr-offset), but that would not work with C structs or internal pointers and aligned data might hog a bit of extra space.

You also need to handle internal pointers to embedded objects and arrays of objects (not pointers to objects). How do you ref count those? I guess you could switch all pointers into (allocated_base,offset) tuples and special case (ptr,0).

You could do like C++, have the ref counter be a separate object. Then record the properties of the pointer (such as offset), then have special magic checks for deallocation: testing for internal reference then decrease the real ref counter of the parent rather than deallocate. This is quite compatible with having a GC, you could free the object only when proven safe and just ignore it and leave it for GC when you cannot prove it.


IMHO you proabably need to redesign the language in order to support transparent or efficient automatic memory handling. If you retain C-legacy, you also retain manual memory handling or a limited set of opportunites for automatic garbage collection.

February 05, 2014
On 6 February 2014 00:04, <"Ola Fosheim Grøstad\" <ola.fosheim.grostad+dlang@gmail.com>"@puremagic.com> wrote:

> On Wednesday, 5 February 2014 at 12:40:13 UTC, Manu wrote:
>
>> Counter question; why approach it this way?
>> Is there a reason that it needs to be of one kind or the other?
>>
>
> Sure, you could make all allocations with RC enabled make space for a counter at a negative offset (ptr-offset), but that would not work with C structs or internal pointers and aligned data might hog a bit of extra space.
>

Perhaps match a very particular and unlikely bit pattern in the negative
offset to know if it is RC or not?
Or I wonder if there's opportunity to pinch a single bit from pointers to
mark that it is raw or RC allocated? Probably fine on 64bit. 32bit probably
needs to match a bit pattern or something.

Aligned data is a challenge. I have often wondered if it would be feasible to access the RC via a pointer hash or something, and keep it in a side table... sounds tricky, but I wonder if it's possible.

You also need to handle internal pointers to embedded objects and arrays of
> objects (not pointers to objects). How do you ref count those? I guess you
> could switch all pointers into (allocated_base,offset) tuples and special
> case (ptr,0).
>

That's a good point. This is probably the trickiest detail. Maybe a clever
way to make any pointer within the allocated range hash to the right index
in the side table I referred to above. That sounds like a tricky hashing
function, and probably slow.
Fat pointers might be necessary. That's a bit annoying. Hmmm...

Rather than (allocated_base, offset), I suspect (pointer, offset_from_base)
would be better; typical dereferences would have no penalty.

You could do like C++, have the ref counter be a separate object. Then
> record the properties of the pointer (such as offset), then have special magic checks for deallocation: testing for internal reference then decrease the real ref counter of the parent rather than deallocate. This is quite compatible with having a GC, you could free the object only when proven safe and just ignore it and leave it for GC when you cannot prove it.
>

You mean like the smart pointer double indirection? I really don't like the double indirection, but it's a possibility. Or did I misunderstand?

IMHO you proabably need to redesign the language in order to support
> transparent or efficient automatic memory handling. If you retain C-legacy, you also retain manual memory handling or a limited set of opportunites for automatic garbage collection.
>

I'm sure there's a clever solution out there which would allow the ARC to detect if it's a raw C pointer or not...


February 05, 2014
On Tuesday, 4 February 2014 at 23:51:35 UTC, Andrei Alexandrescu wrote:
> ...
>
> Destroy.
>
> Andrei

I simply don't understand what problem this is trying to solve. I can't remember anyone wanting RC just of the sake of RC - it is just a compromise to enable language features that require allocation with undeterministic lifetime in absense of GC.

Only thing having RC on top of GC may help with is reducing overall memory footprint but it is not an issue anyway.

I must be missing something, but this looks perfectly useless.
February 05, 2014
On 2014-02-05 15:01:04 +0000, Manu <turkeyman@gmail.com> said:

> Aligned data is a challenge. I have often wondered if it would be feasible
> to access the RC via a pointer hash or something, and keep it in a side
> table... sounds tricky, but I wonder if it's possible.

That's what Apple is doing (as seen in the CoreFoundation source code). They actually have eight such tables on OS X, each protected by a spin lock. The right table is chosen according to a few bits of the pointer.

The obvious advantage is you can put immutable data in read-only memory without making the reference count immutable. The downside is that it's more complicated to access the counter.


> Fat pointers might be necessary. That's a bit annoying. Hmmm...

Since we're talking about adding reference counts to GC-allocated memory, you could use the GC to find the base address of the memory block. What is the cost of that?


> I'm sure there's a clever solution out there which would allow the ARC to
> detect if it's a raw C pointer or not...

Ask the GC for the base address of the memory block. If it does not come from a GC block, there's no reference counter to update.


-- 
Michel Fortin
michel.fortin@michelf.ca
http://michelf.ca

February 05, 2014
On 2014-02-04 23:51:35 +0000, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:

> Consider we add a library slice type called RCSlice!T. It would have the same primitives as T[] but would use reference counting through and through. When the last reference count is gone, the buffer underlying the slice is freed. The underlying allocator will be the GC allocator.
> 
> Now, what if someone doesn't care about the whole RC thing and aims at convenience? There would be a method .toGC that just detaches the slice and disables the reference counter (e.g. by setting it to uint.max/2 or whatever).
> 
> Then people who want reference counting say
> 
> auto x = fun();
> 
> and those who don't care say:
> 
> auto x = fun().toGC();
> 
> 
> Destroy.

I don't think it makes much sense. ARC when used for D constructs should be treated an alternate GC algorithm, not a different kind of pointer.

There's another possible use for ARC which is to manage reference-counted external objects not allocated by the D GC that are using reference counting (such as COM objects, or Objective-C objects). This could justify a different kind of pointer. But that's a separate issue from the GC algorithm used for D constructs.

-- 
Michel Fortin
michel.fortin@michelf.ca
http://michelf.ca

February 05, 2014
On 2014-02-05 12:25:48 +0000, "Francesco Cattoglio" <francesco.cattoglio@gmail.com> said:

> On Wednesday, 5 February 2014 at 12:12:01 UTC, Paulo Pinto wrote:
>> Yes, the GC just needs to check roots for already released blocks, if I am not mistaken.
> 
> Perhaps I lost some explanation in the discussions, but does this improve performance in any way (other than allowing you to disable the GC)?

In general ARC+GC would release memory faster so you need less memory overall. Less garbage memory blocks floating around might make processor caches more efficients. And less memory pressure means the GC itself runs less often, thus less pauses, and shorter pauses since there is less memory to scan.

On the other hand, if you allocate everything you need from the start and then just shuffle pointers around, ARC will make things slower.

So there's some good and some bad. It's hard to know exactly how much of each in practice without an implementation to take some measurements with real world programs. But what's certain is that with ARC the overhead is more evenly distributed in time.

-- 
Michel Fortin
michel.fortin@michelf.ca
http://michelf.ca

February 05, 2014
On 2/5/14, 12:03 AM, Nick Sabalausky wrote:
> IIUC, it sounds like it'd work like this:
>
> // Library author provides either one or both of these:
> T[] getFooGC() {...}
> RCSlice!T getFooARC() {...}
>
> And then if, for whatever reason, you have a RCSlice!T and need to pass
> it to something that expects a T[], then you can cancel the RC-ing via
> toGC.

Yah. We'd then need to do something similar on the parameter side, e.g. extend isSomeString to comprehend the RC variety as well.

> If that's so, then I'd think lib authors could easily provide APIs that
> offer GC by default and ARC as an opt-in choice with templating:
>
> enum WantARC { Yes, No }
> auto getFoo(WantARC arc = WantARC.No)() {
>      static if(arc == WantARC.No)
>          return getFoo().toGC();
>      else {
>      RCSlice!T x = ...;
>          return x;
>      }
> }

Nice. Though I'd say just return RC strings and let the client call toGC. They'd need to specify WantArc.No anyway so one way or another the user has to do something.


Andrei
February 05, 2014
On Wednesday, 5 February 2014 at 15:01:27 UTC, Manu wrote:
> Or I wonder if there's opportunity to pinch a single bit from pointers to
> mark that it is raw or RC allocated? Probably fine on 64bit.

Yes, on 64 bit that is ok. I think current x86 map something like 53 bits, the top and bottom half of the 64 bit address space. The middle is unused. Anyway, you could have two heaps if the OS allows you to.

> 32bit probably needs to match a bit pattern or something.

Or one could forget about 32 bit for ARC.

> Aligned data is a challenge. I have often wondered if it would be feasible
> to access the RC via a pointer hash or something, and keep it in a side
> table... sounds tricky, but I wonder if it's possible.

If you have your own allocator you probably could? Segment memory regions into allocations of a particular size and have a RC-count index at the start indexed by the masked MSBs of the pointer address and have smart pointers that know the object size. Kind of tricky to get acceptable speed, but possible to do. The problem is that you will get a latency of perhaps 200+ cycles on a cache miss. Then again, you could probably make do with 32 bit counters and they are probably accessed in proximity if they hold the same type of object. One cache line is 64 bytes, so you get 16 counters per cache line. With smart allocation you might get good cache locality of the counters (8MB of L3 cache is quite a bit).

(I guess alignment is primarily a problem when you want 4KiB alignment (page size), maybe not worth worrying about.)

> Rather than (allocated_base, offset), I suspect (pointer, offset_from_base)
> would be better; typical dereferences would have no penalty.

Yes, probably. I was thinking about avoiding GC of internal pointers too. I think scanning might be easier if all pointers point to the allocation base. That way the GC does not have to consider offsets.

> You mean like the smart pointer double indirection? I really don't like the
> double indirection, but it's a possibility. Or did I misunderstand?

Yes:

struct {
    void* object_ptr;
    /* offset */
    uint weakcounter;
    uint strongcounter;
}

The advantage is that it works well with RC ignorant collectors/allocators. "if in doubt just set the pointer to null and forget about freeing".

> I'm sure there's a clever solution out there which would allow the ARC to
> detect if it's a raw C pointer or not...

Well, for a given OS/architecture you could probably always allocate your heap in a fixed memory range on 64 bit systems then test against that range.
February 05, 2014
On Wednesday, 5 February 2014 at 15:25:27 UTC, Michel Fortin wrote:
> In general ARC+GC would release memory faster so you need less memory overall. Less garbage memory blocks floating around might make processor caches more efficients. And less memory pressure means the GC itself runs less often, thus less pauses, and shorter pauses since there is less memory to scan.

Which does not fix a single problem mentioned in embedded/gamedev threads. Difference between "somewhat less" and "reliably constrained" is beyond measure.
February 05, 2014
On 6 February 2014 01:22, Michel Fortin <michel.fortin@michelf.ca> wrote:

> On 2014-02-05 15:01:04 +0000, Manu <turkeyman@gmail.com> said:
>
>  Aligned data is a challenge. I have often wondered if it would be feasible
>> to access the RC via a pointer hash or something, and keep it in a side table... sounds tricky, but I wonder if it's possible.
>>
>
> That's what Apple is doing (as seen in the CoreFoundation source code). They actually have eight such tables on OS X, each protected by a spin lock. The right table is chosen according to a few bits of the pointer.
>
> The obvious advantage is you can put immutable data in read-only memory without making the reference count immutable. The downside is that it's more complicated to access the counter.


Indeed. Good to know someone else is doing it. Sounds like a realistic option then :)


 Fat pointers might be necessary. That's a bit annoying. Hmmm...
>>
>
> Since we're talking about adding reference counts to GC-allocated memory, you could use the GC to find the base address of the memory block. What is the cost of that?


Can you elaborate? How would the GC know this?


 I'm sure there's a clever solution out there which would allow the ARC to
>> detect if it's a raw C pointer or not...
>>
>
> Ask the GC for the base address of the memory block. If it does not come from a GC block, there's no reference counter to update.
>
>
> --
> Michel Fortin
> michel.fortin@michelf.ca
> http://michelf.ca
>
>