October 10, 2013
On 6/25/2013 2:47 PM, Steven Schveighoffer wrote:
> I'm not sure exactly what is required for ARC to guarantee proper memory management (whether it requires flow-analysis or not), but it seems to work quite well for Objective-C. I think it helps minimize the expensive release/retain calls when you can just say "oh, someone else will clean that up later", just like you can with a GC.
>
> It might be good for someone who knows the ARC eliding techniques that clang uses to explain how they work.  We certainly shouldn't ignore those techniques.
>

Also remember that O-C doesn't guarantee memory safety, so they are freed from some of the constraints we operate under. They can say "don't do that", we can't.

C++ shared_ptr<> is memory safe as long as you don't escape a pointer - and no C++ compiler checks for that.
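To make the "escaped pointer" case concrete, here is a minimal C++ sketch. The `Widget` type and the function name are made up for illustration; the point is only that a raw pointer taken with `get()` is invisible to the reference count, and the compiler does not complain.

```cpp
#include <cassert>
#include <memory>

// Hypothetical demo type; not from the thread.
struct Widget { int value = 42; };

// Returns true if a raw pointer taken from the shared_ptr outlived the object.
// The dangling pointer is never dereferenced or inspected after the object
// dies, because doing so would not be safe.
bool rawPointerOutlivesOwner() {
    auto owner = std::make_shared<Widget>();
    std::weak_ptr<Widget> observer = owner;   // lets us ask "is it still alive?"
    Widget* escaped = owner.get();            // escapes the reference count
    assert(escaped->value == 42);             // fine: owner still holds the object
    bool hadEscaped = (escaped != nullptr);

    owner.reset();                            // last strong reference goes away
    // `escaped` still holds the old address, but the Widget has been destroyed.
    return hadEscaped && observer.expired();
}
```

Nothing in the language stops `escaped` from being stored or used after `owner.reset()`, which is the gap being described.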

COM is also memory safe as long as you carefully follow the conventions - and again, no C++ compiler checks it.
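As a rough illustration of those conventions, the sketch below uses a portable stand-in class rather than the real `IUnknown` (which lives in Windows headers). `Counted` and `Holder` are hypothetical names; the holder simply applies "AddRef when you keep a pointer, Release when you drop it" mechanically.

```cpp
#include <cassert>

// Portable stand-in for a COM-style object (assumption: not the real IUnknown).
class Counted {
public:
    explicit Counted(bool* destroyedFlag) : destroyed_(destroyedFlag) {}
    unsigned long AddRef() { return ++refs_; }
    unsigned long Release() {
        assert(refs_ > 0);
        unsigned long left = --refs_;
        if (left == 0) { *destroyed_ = true; delete this; }
        return left;
    }
private:
    ~Counted() = default;      // only Release() may destroy the object
    unsigned long refs_ = 1;   // the creator starts with one reference
    bool* destroyed_;
};

// Hypothetical RAII holder that follows the convention on every copy.
class Holder {
public:
    explicit Holder(Counted* p) : p_(p) { if (p_) p_->AddRef(); }
    Holder(const Holder& o) : p_(o.p_) { if (p_) p_->AddRef(); }
    Holder& operator=(const Holder& o) {
        if (o.p_) o.p_->AddRef();   // retain the new value first (self-assignment safe)
        if (p_) p_->Release();
        p_ = o.p_;
        return *this;
    }
    ~Holder() { if (p_) p_->Release(); }
private:
    Counted* p_;
};

bool conventionKeepsObjectAlive() {
    bool destroyed = false;
    Counted* raw = new Counted(&destroyed);   // refs == 1, owned by this function
    {
        Holder a(raw);                        // refs == 2
        Holder b = a;                         // refs == 3
        raw->Release();                       // drop our own reference: refs == 2
        if (destroyed) return false;          // must still be alive here
    }                                         // b and a go away: refs == 1, then 0
    return destroyed;
}
```

A single forgotten `AddRef` or an extra `Release` anywhere breaks the scheme, and the compiler has no way to notice.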

October 10, 2013
> On 6/25/2013 2:47 PM, Steven Schveighoffer wrote:
>> I'm not sure exactly what is required for ARC to guarantee proper memory management (whether it requires flow-analysis or not), but it seems to work quite well for Objective-C. I think it helps minimize the expensive release/retain calls when you can just say "oh, someone else will clean that up later", just like you can with a GC.
>>
>> It might be good for someone who knows the ARC eliding techniques that clang uses to explain how they work.  We certainly shouldn't ignore those techniques.
>>
>
> Also remember that O-C doesn't guarantee memory safety, so they are freed from some of the constraints we operate under. They can say "don't do that", we can't.

I'm not sure that it doesn't.  At least when we are talking about object references.

The only thing clang complains about is when you try to call any memory management manually, or if you disobey the naming conventions.

-Steve
October 10, 2013
Michel Fortin wrote:

## Some general comments

While it's a start, this is hardly enough for Objective-C. Mostly for legacy reasons, most Objective-C methods return autoreleased objects (deferred release using an autorelease pool) based on a naming convention. Also, Objective-C objects can't be allocated from the D heap, so to avoid cycles we need weak pointers. More on Objective-C later.

While it's good that a direct call to AddRef/Release is forbidden in @safe code, I think it should be forbidden in @system code too. The reason is that if the compiler is inserting calls to these automatically and you're also adding your own explicitly in the same function, it becomes practically impossible to reason about the reference counts, short of looking at the assembly. Instead, I think you should create a @noarc attribute for functions: it'll prevent the compiler from inserting any of those calls, so it becomes the responsibility of the author to make those calls (which are then allowed). @noarc would be incompatible with @safe, obviously.

Finally, that's a nitpick but I wish you'd use function names that fit D better, such as opRetain and opRelease. Then you can add a "final void opRetain() { AddRef(); }" function to the IUnknown COM interface and we could do the same for Objective-C.

## Objective-C autoreleased objects

Objective-C is a special case. In Objective-C we need to know whether the returned object of a function is already retained or if it is deferred released (autoreleased). This is easily deduced from the naming convention. Occasionally, we might need to create autorelease pools too, but that can probably stay @system.

(Note: all this idea of autoreleased objects might sound silly, but it was a great help before ARC, and Objective-C ARC has to be compatible with legacy code so it conforms to those conventions.)

You can easily implement ARC for COM on top of an implementation of ARC for Objective-C. The reverse is not true, because COM does not have this (old but still needed) concept of autorelease pools and deferred release, where you need to know at each function boundary whether returned values (including those returned through pointer arguments) are expected to be retained or not.

If I were you Walter, I would just not care about Objective-C idioms while implementing this feature at first. It'll have to be special cased anyway. Here's how I expect that'll be done:

What will need to be done later when adding Objective-C support is to add an internal "autoreleasedReturn" flag to a function that'll make codegen call "autorelease" in the callee when returning an object and "retain" in the caller where it receives an object from a function with that flag. Also, the same flag and behaviour are needed for out parameters (to mimic those cases where an object is returned by pointer). That flag will then be set automatically internally depending on the function name (only for Objective-C member functions), and it should be possible to override it explicitly with an attribute or a pragma of some sort. This is what Clang is doing, and we must match that to allow things to work.
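The callee/caller split described here can be modelled with a toy autorelease pool. This is a sketch in C++ under simplifying assumptions: `Obj`, `AutoreleasePool`, and the function names are invented, and freeing the object at count zero is left out so only the counts are visible.

```cpp
#include <cassert>
#include <vector>

struct Obj { int refs = 1; };   // toy object, created holding one reference

void retain(Obj* o)  { ++o->refs; }
void release(Obj* o) { assert(o->refs > 0); --o->refs; }   // deallocation omitted

// Toy autorelease pool: remembers objects whose release is deferred to drain().
struct AutoreleasePool {
    std::vector<Obj*> pending;
    Obj* autorelease(Obj* o) { pending.push_back(o); return o; }
    void drain() {
        for (Obj* o : pending) release(o);
        pending.clear();
    }
};

// Callee flagged "autoreleasedReturn": gives up its reference through the pool.
Obj* calleeReturningAutoreleased(AutoreleasePool& pool, Obj* owned) {
    return pool.autorelease(owned);
}

// Caller side: retain on receipt, so the object survives the pool draining.
int refsHeldByCallerAfterDrain() {
    AutoreleasePool pool;
    Obj object;                                            // refs == 1 (callee's)
    Obj* result = calleeReturningAutoreleased(pool, &object);
    retain(result);                                        // refs == 2
    pool.drain();                                          // deferred release: refs == 1
    return result->refs;
}
```

Without the `retain` in the caller, the count would reach zero when the pool drains, which is why both halves of the flag's behaviour are needed together.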

Checking for null is redundant in the Objective-C case: that check is done by the runtime. That's of minor importance, but it might impact performance and should probably be special-cased.

## Optimizations

With Apple's implementation of reference counting (using global hash tables protected by spin locks), it is more efficient to update many counters in one operation. The codegen for Objective-C ARC upon assignment to a variable calls "objc_storeStrong(id *object, id value)", incrementing and decrementing the two counters presumably in one operation (as well as replacing the content of the variable pointed to by the first argument with the new value).

Ideally, the codegen for Objective-C ARC in D would call the same functions so we have the same performance. This means that codegen should make a call "objc_retain" when first initializing a variable, "objc_storeStrong" when doing an assignment, and "objc_release" when destructing a variable.
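The three codegen points (initialization, assignment, destruction) can be sketched in C++ against a toy object. The ordering inside `storeStrong` follows the semantics Clang's ARC documentation gives for `objc_storeStrong` (retain the new value, store it, release the old one); `Obj`, `retain`, `release`, and `storeStrong` here are stand-ins, not the runtime functions.

```cpp
#include <cassert>

struct Obj { int refs = 0; };   // toy object; real Objective-C objects are `id`

Obj* retain(Obj* o)  { if (o) ++o->refs; return o; }
void release(Obj* o) { if (o) { assert(o->refs > 0); --o->refs; } }

// Stand-in for objc_storeStrong: retain new, store, then release old.
void storeStrong(Obj** slot, Obj* value) {
    Obj* old = *slot;
    retain(value);
    *slot = value;
    release(old);
}

bool storeStrongDemo() {
    Obj a, b;
    Obj* var = retain(&a);       // initialization: one retain, a.refs == 1
    storeStrong(&var, &b);       // assignment: a.refs == 0, b.refs == 1
    storeStrong(&var, var);      // self-assignment stays safe: retain precedes release
    bool balanced = (a.refs == 0 && b.refs == 1);
    release(var);                // destruction of the variable: b.refs == 0
    return balanced && b.refs == 0;
}
```

Retaining before releasing matters: with the opposite order, a self-assignment could drop the count to zero before the retain happens.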

As for returning autoreleased objects, there are two functions to choose from depending on whether the object needs to be retained at the same time or not. (In general, the object needs to be retained prior to autoreleasing if it comes from a variable that is not part of the function's stack frame.)

Here's Clang's documentation for how it implements ARC:
http://clang.llvm.org/docs/AutomaticReferenceCounting.html

## Objective-C weak pointers

Weak pointers are essential in order to break retain cycles in Objective-C where there is no GC. They are implemented with the same kind of function calls as strong pointers. Unfortunately, Apple's Objective-C implementation won't sit well with D the way it works now.
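The retain-cycle problem is not specific to Objective-C, so it can be shown with standard C++ smart pointers as an analogy. In this sketch the node types and the counter are invented for the demo; the strong version is cleaned up manually at the end so the example itself does not leak.

```cpp
#include <cassert>
#include <memory>

static int liveNodes = 0;   // number of nodes currently alive

struct StrongNode {
    std::shared_ptr<StrongNode> other;   // strong back-reference: forms a cycle
    StrongNode() { ++liveNodes; }
    ~StrongNode() { --liveNodes; }
};

struct WeakNode {
    std::weak_ptr<WeakNode> other;       // weak back-reference: no cycle
    WeakNode() { ++liveNodes; }
    ~WeakNode() { --liveNodes; }
};

// Builds two nodes that point at each other, drops all local references,
// and reports how many nodes are still alive afterwards.
template <class Node>
int survivorsAfterScope() {
    liveNodes = 0;
    std::weak_ptr<Node> watch;           // observer used only for cleanup
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->other = b;
        b->other = a;
        watch = a;
    }
    int survivors = liveNodes;
    // Break a surviving cycle by hand so the demo does not leak memory.
    if (auto a = watch.lock()) a->other.reset();
    return survivors;
}
```

With strong references both nodes keep each other alive after the scope ends; with weak references both are destroyed as soon as the locals go away.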

Weak pointers are implemented in Objective-C by registering the address of the pointer with the runtime. This means that when a pointer is moved from one location to another, the runtime needs to be notified of that through a call to objc_moveWeak. This breaks one assumption of D: that you can move memory at will without calling anything.

While we could still implement a working weak pointer with a template struct, that struct would have to allocate a pointer on the heap (where it is guaranteed not to move) so it can store the true weak pointer recognized by the runtime. I'm not sure that would be acceptable, but at least it would work.
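The heap-allocated slot idea might look roughly like this in C++. Everything here is a model: `weakSlots`, `registerWeak`, `unregisterWeak`, and `clearWeakRefs` stand in for the runtime's weak table and are not real runtime calls. What the sketch shows is that moving the wrapper never changes the address that was registered.

```cpp
#include <cassert>
#include <memory>
#include <unordered_set>
#include <utility>

struct Obj {};   // toy object

// Toy stand-in for the runtime's weak table: the set of registered slot addresses.
static std::unordered_set<Obj**> weakSlots;

void registerWeak(Obj** slot)   { weakSlots.insert(slot); }
void unregisterWeak(Obj** slot) { weakSlots.erase(slot); }

// What the "runtime" does when `dead` is deallocated: zero every slot aimed at it.
void clearWeakRefs(Obj* dead) {
    for (Obj** slot : weakSlots)
        if (*slot == dead) *slot = nullptr;
}

// Wrapper whose registered slot lives on the heap, so the wrapper can be moved freely.
class Weak {
public:
    explicit Weak(Obj* target) : slot_(new Obj*(target)) { registerWeak(slot_.get()); }
    Weak(Weak&&) = default;      // moves ownership of the slot; the slot stays put
    ~Weak() { if (slot_) unregisterWeak(slot_.get()); }
    Obj* get() const { return *slot_; }
private:
    std::unique_ptr<Obj*> slot_;
};

bool weakSurvivesMove() {
    Obj target;
    Weak w1(&target);
    Weak w2(std::move(w1));                  // wrapper moved, registration untouched
    bool seesTarget = (w2.get() == &target);
    clearWeakRefs(&target);                  // simulate the target being deallocated
    return seesTarget && w2.get() == nullptr;
}
```

The cost is one extra heap allocation and one extra indirection per weak reference, which is the trade-off being questioned above.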

## More on reference counting

I feel like I should share some of my thoughts here about a broader use of reference counting in D.

First, we don't have to assume the reference counter has to be part of the object. Apple implements reference counting using global hash tables where the key is the address. It works very well.

If we added a hash table like this for all memory allocated from the GC, we'd just have to find the base address of any memory block to get to its reference counter. I know you were designing with only classes in mind, but I want to point out that it is possible to reference-count everything the GC allocates if we want to.

The downside is that every assignment to a pointer anywhere has to call a function. While this is some overhead, it is more predictable than overhead from a GC scan and would be preferred in some situations (games, I guess). Another downside is that if an object is kept alive only by being present on the stack frame of a C function, it'd have to be explicitly retained from elsewhere.

As for pointers not pointing to GC memory, the generic addRef/release functions can ignore those pointers just like the GC ignores them today when it does its scan.
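A side-table counter that ignores untracked pointers could be sketched as follows. This is a single-threaded C++ model with invented names (`sideTable`, `trackBlock`, `addRef`, `release`), and it assumes the caller already passes the block's base address; a real GC would first have to map interior pointers to that base and would need locking.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Toy side table: block base address -> reference count.
static std::unordered_map<const void*, std::size_t> sideTable;

// Hypothetical allocator hook: start tracking a freshly allocated block.
void trackBlock(const void* base) { sideTable[base] = 0; }

// Pointers that are not in the table (stack, malloc, static data) are ignored.
void addRef(const void* p) {
    auto it = sideTable.find(p);
    if (it != sideTable.end()) ++it->second;
}

// Returns true when the count dropped to zero and the block can be reclaimed.
bool release(const void* p) {
    auto it = sideTable.find(p);
    if (it == sideTable.end()) return false;   // untracked: nothing to do
    assert(it->second > 0);
    if (--it->second == 0) {
        sideTable.erase(it);
        return true;
    }
    return false;
}

bool sideTableDemo() {
    int tracked = 0, untracked = 0;            // stand-ins for two memory blocks
    trackBlock(&tracked);
    addRef(&tracked);
    addRef(&tracked);
    addRef(&untracked);                        // silently ignored
    bool ignored  = !release(&untracked);
    bool stillRef = !release(&tracked);        // 2 -> 1
    bool freed    = release(&tracked);         // 1 -> 0, entry removed
    return ignored && stillRef && freed && sideTable.empty();
}
```

The object layout stays untouched, at the price of a hash lookup on every count update.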

Finally, cycles can still be reclaimed by having the GC scan for them. Those scans should be less frequent however since most of the memory can be reclaimed through reference counting.

October 10, 2013
Michel Fortin wrote:

Le 25-juin-2013 à 17:20, Steven Schveighoffer a écrit :

> On Jun 25, 2013, at 4:44 PM, Walter Bright wrote:
>
>> On 6/25/2013 1:19 PM, Steven Schveighoffer wrote:
>>>
>>> Would this be a mechanism that's worth putting in? I think it goes really well with something like TempAlloc.  I don't think we should use convention, though...
>>
>> I agree with not relying on convention. But also reserving the new*, init*, alloc* and copy* namespaces seems excessive for D.
>>
>> As for autoreleasepool, it is relying on a convention that its fields are not leaking. I don't see how we can enforce this.
>
> I don't think the autoreleasepool is relying on convention, it's simply giving the compiler a way to elide careful tracking of temporaries' reference counts.

Not at all. Autorelease pools were useful at a time before ARC so you wouldn't have to think about manually releasing every object that called functions returned to you. Instead, most functions would return autoreleased objects and you'd only have to retain those objects you were storing elsewhere.

Nowadays, with ARC, we still have them, but that's mostly for interoperability with already existing code. Most functions still return autoreleased objects because that's the convention, and breaking that convention would cause objects to be retained or released too many times. So we still need autorelease pools. But ARC is hard at work[^1] behind the scenes to reduce the number of those autoreleased objects.

So no, we shouldn't introduce autorelease pools to D... well, except maybe for the part where we want interoperability with Objective-C (because we have no choice).

And finally, there's nothing unsafe with autorelease pools as long as you don't keep an unretained reference to an autoreleased object when the pool drains. Making sure you have no unretained reference is ARC's job, so with ARC it should not be a problem.

[^1]: One clever trick ARC does happens inside the implicit call to objc_autoreleaseReturnValue that it adds at the end of an autoreleasing function: the runtime checks whether the return address points to an instruction that'll call objc_retain on that same pointer. If that's the case, it skips the autorelease, also skips objc_retain, and goes to the next instruction directly. Of course, if the convention was always to return objects retained, none of this would be needed. I saw that explained in a WWDC video a couple of years back.
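The effect of that trick can be approximated with a toy model in C++. The real runtime inspects the caller's code at the return address; here a plain `bool callerWillRetain` parameter stands in for that inspection, and `Runtime`, `autoreleaseReturnValue`, and `retainReturnedValue` are invented names for this sketch only.

```cpp
#include <cassert>
#include <vector>

struct Obj { int refs = 1; };   // toy object; the callee owns one reference

struct Runtime {
    std::vector<Obj*> pool;            // toy autorelease pool
    bool handedOverRetained = false;   // set when the fast path was taken
    int poolInsertions = 0;
    int retains = 0;

    // Callee side. `callerWillRetain` replaces the return-address inspection.
    Obj* autoreleaseReturnValue(Obj* o, bool callerWillRetain) {
        if (callerWillRetain) {        // fast path: skip the autorelease
            handedOverRetained = true;
            return o;
        }
        pool.push_back(o);
        ++poolInsertions;
        return o;
    }

    // Caller side: skips its retain if the callee already handed over ownership.
    Obj* retainReturnedValue(Obj* o) {
        if (handedOverRetained) {
            handedOverRetained = false;
            return o;
        }
        ++o->refs;
        ++retains;
        return o;
    }

    void drain() {
        for (Obj* o : pool) --o->refs;
        pool.clear();
    }
};

// Counts the retain and pool operations needed to hand one object to the caller.
int operationsNeeded(bool fastPath) {
    Runtime rt;
    Obj o;
    Obj* r = rt.retainReturnedValue(rt.autoreleaseReturnValue(&o, fastPath));
    rt.drain();
    assert(r->refs == 1);              // either way the caller owns one reference
    return rt.retains + rt.poolInsertions;
}
```

Both paths leave the caller owning exactly one reference; the fast path just gets there without touching the pool or the counter.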

October 10, 2013
Steven Schveighoffer wrote:

On Jun 25, 2013, at 9:31 PM, Michel Fortin wrote:

> Le 25-juin-2013 à 17:20, Steven Schveighoffer a écrit :
>
>> I don't think the autoreleasepool is relying on convention, it's simply giving the compiler a way to elide careful tracking of temporaries' reference counts.
>
> Not at all. Autorelease pools were useful at a time before ARC so you wouldn't have to think about manually releasing every object that called functions returned to you. Instead, most functions would return autoreleased objects and you'd only have to retain those objects you were storing elsewhere.

Having used MRC, I appreciate what autoreleasepool did, but I also thought of it as a kind of blanket way to allow the compiler to remove extra retains/releases in ARC.

Is it not advantageous to release a whole pool of objects vs. releasing them individually during execution?  All releases and retains are atomic, so I figured one could do some optimization when it's all lumped together.

I find the autorelease pools very GC-like -- you don't have to worry who uses or forgets the reference, it's kept in memory until you don't need it.

Anyway, everything I know about Obj-C ARC I learned from my iOS 5 book :)  So don't take me as an expert.

-Steve
October 10, 2013
Michel Fortin wrote:

Le 25-juin-2013 à 21:40, Steven Schveighoffer a écrit :

> On Jun 25, 2013, at 9:31 PM, Michel Fortin wrote:
>
>> Not at all. Autorelease pools were useful at a time before ARC so you wouldn't have to think about manually releasing every object that called functions returned to you. Instead, most functions would return autoreleased objects and you'd only have to retain those objects you were storing elsewhere.
>
> Having used MRC, I appreciate what autoreleasepool did, but I also thought of it as a kind of blanket way to allow the compiler to remove extra retains/releases in ARC.
>
> Is it not advantageous to release a whole pool of objects vs. releasing them individually during execution?  All releases and retains are atomic, so I figured one could do some optimization when it's all lumped together.

I haven't done any benchmarking, but I'd have to assume it is more advantageous to just return objects retained since Apple went to great lengths to make sure this can happen even when the convention is to return autoreleased.

There's no question it also simplifies the compiler. It's much easier to reason about pairs of retain/release than retain/autorelease.

> I find the autorelease pools very GC-like -- you don't have to worry who uses or forgets the reference, it's kept in memory until you don't need it.


The concept was truly great, no doubt about that.
October 10, 2013
On 6/25/2013 6:31 PM, Michel Fortin wrote:
> And finally, there's nothing unsafe with autorelease pools as long as you don't keep an unretained reference to an autoreleased object when the pool drains.

Well, that's exactly the issue - an escaping reference.

October 10, 2013
Walter Bright:

Where are the benchmarks that show that this is a good idea in some real situations? This is the essential first step.


> If a class contains the following methods, in either itself or a base class, it is
> an RC class:
>
>
>     T AddRef();
>     T Release();

What if a programmer adds only one of those two? Currently if you add only part of the hashing protocol (or you bork a function signature) the compiler often gives no errors.
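One way a compiler or library could catch the half-implemented case is sketched below as a C++ analogy using compile-time detection. All names (`HasAddRef`, `HasRelease`, `RcCheck`, and the three sample types) are illustrative and not part of any proposal in this thread.

```cpp
#include <type_traits>
#include <utility>

// Detects a callable member AddRef() via SFINAE.
template <class T, class = void>
struct HasAddRef : std::false_type {};
template <class T>
struct HasAddRef<T, decltype(void(std::declval<T&>().AddRef()))> : std::true_type {};

// Detects a callable member Release() the same way.
template <class T, class = void>
struct HasRelease : std::false_type {};
template <class T>
struct HasRelease<T, decltype(void(std::declval<T&>().Release()))> : std::true_type {};

template <class T>
struct RcCheck {
    // An RC class needs both methods.
    static constexpr bool isRc = HasAddRef<T>::value && HasRelease<T>::value;
    // Exactly one of the two is almost certainly a mistake worth reporting.
    static constexpr bool halfImplemented = HasAddRef<T>::value != HasRelease<T>::value;
};

struct Full  { void AddRef() {} void Release() {} };
struct Half  { void AddRef() {} };   // Release() forgotten
struct Plain {};

static_assert(RcCheck<Full>::isRc, "Full provides both methods");
static_assert(RcCheck<Half>::halfImplemented, "Half would deserve a diagnostic");
static_assert(!RcCheck<Plain>::isRc && !RcCheck<Plain>::halfImplemented,
              "Plain is an ordinary class");
```

Treating "exactly one of the two" as an error rather than silently falling back to a non-RC class would address the concern raised here.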

What are the plans for coalescing and optimizing away some reference count updates?

Bye,
bearophile
October 10, 2013
On Wednesday, 9 October 2013 at 22:31:34 UTC, Walter Bright wrote:
> ==Arrays==
>
> Built-in arrays have no place to put a reference count. Ref counted arrays would hence
> become a library type, based on a ref counted class with overloaded operators for
> the array operations.
>

You'll soon hit a roadblock here with templates and type qualifiers. const(A!T) and A!const(T) have nothing to do with each other as far as the compiler is concerned, and no, we have no way around this at the moment.
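The same mismatch exists in C++, which makes for a compact analogy (this is C++, not D, and `RcArray` is a hypothetical placeholder for a library array type). A const-qualified instantiation and an instantiation over a const element type are unrelated types, whereas built-in pointers convert between the two forms implicitly.

```cpp
#include <type_traits>

// Hypothetical shell of a library array type; only the type identity matters here.
template <class T>
struct RcArray { T* data = nullptr; };

// const RcArray<int> plays the role of const(A!T);
// RcArray<const int> plays the role of A!const(T).
static_assert(!std::is_same<const RcArray<int>, RcArray<const int>>::value,
              "the two instantiations are distinct types");
static_assert(!std::is_convertible<RcArray<int>, RcArray<const int>>::value,
              "no implicit conversion exists between them");

// Built-in pointers, by contrast, convert implicitly to the const-element form.
static_assert(std::is_convertible<int*, const int*>::value,
              "built-in types get this conversion for free");
```

A library array type would have to spell out such conversions itself, which is the roadblock being pointed at.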
October 10, 2013
Steven Schveighoffer wrote:

On Jun 25, 2013, at 11:04 PM, Walter Bright wrote:

>
> On 6/25/2013 6:31 PM, Michel Fortin wrote:
>> And finally, there's nothing unsafe with autorelease pools as long as you don't keep an unretained reference to an autoreleased object when the pool drains.
>
> Well, that's exactly the issue - an escaping reference.
>

Read the next sentence after your quote though:

"Making sure you have no unretained reference is ARC's job, so with ARC it should not be a problem."

So with ARC, it's not unsafe.  I think that was the ultimate point.

-Steve