Jump to page: 1 2
Thread overview
Improving DIP74: functions borrow by default, retain only if needed
Feb 27, 2015
H. S. Teoh
Feb 27, 2015
Zach the Mystic
Feb 27, 2015
Michel Fortin
Feb 27, 2015
Zach the Mystic
Feb 28, 2015
Zach the Mystic
Feb 27, 2015
Michel Fortin
Feb 27, 2015
deadalnix
Feb 28, 2015
Michel Fortin
Mar 06, 2015
John McCall
Feb 27, 2015
Zach the Mystic
February 27, 2015
DIP74's function call protocol for RCOs has the caller insert opAddRef for each RCO passed by value. Then the callee has the responsibility to call opRelease (or defer that to another entity). This choice of protocol mimics the constructor/destructor protocol and probably shows our C++ bias.

However, ARC does not do that. Instead, it implicitly assumes the callee is a borrower of the reference. Only if the callee wants to copy the parameter to a member or a global (i.e. save it beyond the duration of the call), a new call to retain() (= opAddRef) is inserted. That way, functions that only need to look at the object but not store it incur no reference call overhead.

So I was thinking of changing DIP74 as follows:

* Caller does NOT insert an opAddRef for byval RCOs

* Callee does NOT insert an opRelease for its byval RCO parameters

It seems everything will just work with this change (including all move scenarios), but it is simple enough to make me worry I'm missing something. Thoughts?


Andrei
February 27, 2015
On Fri, Feb 27, 2015 at 10:24:26AM -0800, Andrei Alexandrescu via Digitalmars-d wrote:
> DIP74's function call protocol for RCOs has the caller insert opAddRef for each RCO passed by value. Then the callee has the responsibility to call opRelease (or defer that to another entity). This choice of protocol mimics the constructor/destructor protocol and probably shows our C++ bias.
> 
> However, ARC does not do that. Instead, it implicitly assumes the callee is a borrower of the reference. Only if the callee wants to copy the parameter to a member or a global (i.e. save it beyond the duration of the call), a new call to retain() (= opAddRef) is inserted. That way, functions that only need to look at the object but not store it incur no reference call overhead.
> 
> So I was thinking of changing DIP74 as follows:
> 
> * Caller does NOT insert an opAddRef for byval RCOs
> 
> * Callee does NOT insert an opRelease for its byval RCO parameters

So if the callee assigns the RCO to something else, that's when opAddRef/retain will get called, but if the callee doesn't do that, then no call is inserted? Sounds reasonable.


> It seems everything will just work with this change (including all move scenarios), but it is simple enough to make me worry I'm missing something.  Thoughts?
[...]

As long as there is no sharing of the RCO between threads, this looks like it should work. (But I'm no ARC expert, so don't take my word for it.) But if there's sharing, the story becomes drastically more complex.


T

-- 
Lottery: tax on the stupid. -- Slashdotter
February 27, 2015
On Friday, 27 February 2015 at 18:24:27 UTC, Andrei Alexandrescu wrote:
> DIP74's function call protocol for RCOs has the caller insert opAddRef for each RCO passed by value. Then the callee has the responsibility to call opRelease (or defer that to another entity). This choice of protocol mimics the constructor/destructor protocol and probably shows our C++ bias.
>
> However, ARC does not do that. Instead, it implicitly assumes the callee is a borrower of the reference. Only if the callee wants to copy the parameter to a member or a global (i.e. save it beyond the duration of the call), a new call to retain() (= opAddRef) is inserted. That way, functions that only need to look at the object but not store it incur no reference call overhead.
>
> So I was thinking of changing DIP74 as follows:
>
> * Caller does NOT insert an opAddRef for byval RCOs
>
> * Callee does NOT insert an opRelease for its byval RCO parameters
>
> It seems everything will just work with this change (including all move scenarios), but it is simple enough to make me worry I'm missing something. Thoughts?

I think it's fine. I couldn't even figure out the original motive for wanting to add those calls -- I thought it must have something to do with threads or exceptions or something, but even then I couldn't figure it out. Any reference argument will, by definition, outlive its function -- it can't possibly die within the function itself, since the caller still thinks it's a valid reference.

Another thing is that local references in general need not participate in reference counting. They will retain and release the reference automatically when they go in and out of scope. I'm really no expert (except that I like to study and think and by thinking become somewhat expert it appears), but if all ARC could be confined to global/heap <=> global/heap copies, you'd get the most efficient code. And I'm not trying to advertise a reference tracking system :-), but the real hiccup is that global reference can go *through* the stack and land back at a global... and you would need to keep track of that.
February 27, 2015
On 2/27/15 1:24 PM, Andrei Alexandrescu wrote:
> DIP74's function call protocol for RCOs has the caller insert opAddRef
> for each RCO passed by value. Then the callee has the responsibility to
> call opRelease (or defer that to another entity). This choice of
> protocol mimics the constructor/destructor protocol and probably shows
> our C++ bias.
>
> However, ARC does not do that. Instead, it implicitly assumes the callee
> is a borrower of the reference. Only if the callee wants to copy the
> parameter to a member or a global (i.e. save it beyond the duration of
> the call), a new call to retain() (= opAddRef) is inserted. That way,
> functions that only need to look at the object but not store it incur no
> reference call overhead.
>
> So I was thinking of changing DIP74 as follows:
>
> * Caller does NOT insert an opAddRef for byval RCOs
>
> * Callee does NOT insert an opRelease for its byval RCO parameters
>
> It seems everything will just work with this change (including all move
> scenarios), but it is simple enough to make me worry I'm missing
> something. Thoughts?

I recall saying something like this, and someone came up with a reason why you still have to add the calls. I'll see if I can dig it up.

OK, I found the offending issue. It's when you pass a parameter, the only reference holding onto it may be also passed as well. Something like:

void foo(C c, C2 c2)
{
   c2.c = null; // this destroys 'c' unless you opAddRef it before passing
   c.someFunc(); // crash
}

void main()
{
C c = new C; // ref counted class
C2 c2 = new C2; // another ref counted class
c2.c = c;
foo(c, c2);
}

How does the compiler know in this case that it *does* have to opAddRef c before calling? Maybe your ARC expert can explain how that works.

BTW, Michel Fortin is who pointed this out.

-Steve
February 27, 2015
On 2/27/15 3:30 PM, Steven Schveighoffer wrote:

> void main()
> {
> C c = new C; // ref counted class
> C2 c2 = new C2; // another ref counted class
> c2.c = c;
> foo(c, c2);
> }

Bleh, that was dumb.

void main()
{
   C2 c2 = new C2;
   c2.c = new C;
   foo(c2.c, c2);
}

Still same question. The issue here is how do you know that the reference that you are sure is keeping the thing alive is not going to release it through some back door.

-Steve
February 27, 2015
On 2015-02-27 20:34:08 +0000, Steven Schveighoffer said:

> On 2/27/15 3:30 PM, Steven Schveighoffer wrote:
> 
>> void main()
>> {
>> C c = new C; // ref counted class
>> C2 c2 = new C2; // another ref counted class
>> c2.c = c;
>> foo(c, c2);
>> }
> 
> Bleh, that was dumb.
> 
> void main()
> {
>     C2 c2 = new C2;
>     c2.c = new C;
>     foo(c2.c, c2);
> }
> 
> Still same question. The issue here is how do you know that the reference that you are sure is keeping the thing alive is not going to release it through some back door.

You have to retain 'c' for the duration of the call unless you can prove somehow that calling the function will not cause it to be released. You can prove it in certain situations:

- you are passing a local variable as a parameter and nobody has taken a mutable reference (or pointer) to that variable, or to the stack frame (be wary of nested functions accessing the stack frame)
- you are passing a global variable as a parameter to a pure function and aren't giving to that pure function a mutable reference to that variable.
- you are passing a member variable as a parameter to a pure function and aren't giving to that pure function a mutable reference to that variable or its class.

There are surely other cases, but you get the idea. These three situations are probably the most common, especially the first one. For instance, inside a member function, 'this' is a local variable and you will never pass it to another function by ref, so it's safe to call 'this.otherFunction()' without retaining 'this' first.

-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/

February 27, 2015
On Friday, 27 February 2015 at 20:30:20 UTC, Steven Schveighoffer wrote:
> OK, I found the offending issue. It's when you pass a parameter, the only reference holding onto it may be also passed as well. Something like:
>
> void foo(C c, C2 c2)
> {
>    c2.c = null; // this destroys 'c' unless you opAddRef it before passing
>    c.someFunc(); // crash
> }
>
> void main()
> {
> C c = new C; // ref counted class
> C2 c2 = new C2; // another ref counted class
> c2.c = c;
> foo(c, c2);
> }
>
> How does the compiler know in this case that it *does* have to opAddRef c before calling? Maybe your ARC expert can explain how that works.

Split-passing nested ref-counted classes with null loads! How insidious!
February 27, 2015
On 2/27/15 12:34 PM, Steven Schveighoffer wrote:
> On 2/27/15 3:30 PM, Steven Schveighoffer wrote:
>
>> void main()
>> {
>> C c = new C; // ref counted class
>> C2 c2 = new C2; // another ref counted class
>> c2.c = c;
>> foo(c, c2);
>> }
>
> Bleh, that was dumb.
>
> void main()
> {
>     C2 c2 = new C2;
>     c2.c = new C;
>     foo(c2.c, c2);
> }
>
> Still same question. The issue here is how do you know that the
> reference that you are sure is keeping the thing alive is not going to
> release it through some back door.

Thanks! In ARC, there are autorelease pools that keep at least one reference to the objects they own. I think that's what they are for.

So let me add a complete example:

class C {
   void someFunc();
   void opAddRef();
   void opRelease();
}

class C2 {
   C c;
   void opAddRef();
   void opRelease();
}

void foo(C c, C2 c2) {
   c2.c = null;
   c.someFunc(); // crash
}

void main() {
   C2 c2 = new C2;
   c2.c = new C;
   foo(c2.c, c2);
}

Distinguishing these is an interesting problem. In fact we can reduce the matter to one class only:

class C {
   C c;
   void someFunc();
   void opAddRef();
   void opRelease();
}

void foo(C c1, C c2) {
   c2.c = null;
   c1.someFunc(); // crash
}

void main() {
   C obj = new C;
   obj.c = new C;
   foo(obj.c, obj);
}


Andrei

February 27, 2015
On 2/27/15 1:02 PM, Michel Fortin wrote:
> On 2015-02-27 20:34:08 +0000, Steven Schveighoffer said:
>
>> On 2/27/15 3:30 PM, Steven Schveighoffer wrote:
>>
>>> void main()
>>> {
>>> C c = new C; // ref counted class
>>> C2 c2 = new C2; // another ref counted class
>>> c2.c = c;
>>> foo(c, c2);
>>> }
>>
>> Bleh, that was dumb.
>>
>> void main()
>> {
>>     C2 c2 = new C2;
>>     c2.c = new C;
>>     foo(c2.c, c2);
>> }
>>
>> Still same question. The issue here is how do you know that the
>> reference that you are sure is keeping the thing alive is not going to
>> release it through some back door.
>
> You have to retain 'c' for the duration of the call unless you can prove
> somehow that calling the function will not cause it to be released. You
> can prove it in certain situations:
>
> - you are passing a local variable as a parameter and nobody has taken a
> mutable reference (or pointer) to that variable, or to the stack frame
> (be wary of nested functions accessing the stack frame)
> - you are passing a global variable as a parameter to a pure function
> and aren't giving to that pure function a mutable reference to that
> variable.
> - you are passing a member variable as a parameter to a pure function
> and aren't giving to that pure function a mutable reference to that
> variable or its class.
>
> There are surely other cases, but you get the idea. These three
> situations are probably the most common, especially the first one. For
> instance, inside a member function, 'this' is a local variable and you
> will never pass it to another function by ref, so it's safe to call
> 'this.otherFunction()' without retaining 'this' first.

Thanks. So it seems we continue as we were with DIP74 and leave the rest to the implementation.

Andrei

February 27, 2015
On 2/27/15 4:15 PM, Andrei Alexandrescu wrote:
> On 2/27/15 12:34 PM, Steven Schveighoffer wrote:
>> On 2/27/15 3:30 PM, Steven Schveighoffer wrote:
>>
>>> void main()
>>> {
>>> C c = new C; // ref counted class
>>> C2 c2 = new C2; // another ref counted class
>>> c2.c = c;
>>> foo(c, c2);
>>> }
>>
>> Bleh, that was dumb.
>>
>> void main()
>> {
>>     C2 c2 = new C2;
>>     c2.c = new C;
>>     foo(c2.c, c2);
>> }
>>
>> Still same question. The issue here is how do you know that the
>> reference that you are sure is keeping the thing alive is not going to
>> release it through some back door.
>
> Thanks! In ARC, there are autorelease pools that keep at least one
> reference to the objects they own. I think that's what they are for.

No, that's not what they are for. They are for returning data without having to worry about the receiving function needing to release. This was really important for manual reference counting. Otherwise, you would always have to return a value with it's count at 1, and put the onus on the receiving function to store the result and release it after using. One-liners would turn into huge nests of temporary variables.

I believe autorelease pools are not needed for ARC, but are maintained because much Objective-C code contains MRC, and that protocol needs to be supported. I used them a lot in my Objective-C code. With ARC you can create autorelease pools, but you never did any autoRetain manually, the ARC system did it. Creating the pool just allows you to scope where data should be released. Otherwise, you are adding it to the event loop pool which is only released after an event is done processing.

-Steve
« First   ‹ Prev
1 2