RAII

On Mon, 26 Aug 2002 11:39:00 -0700 Russell Lewis <spamhole-2001-07-16@deming-os.org> wrote:

> I seem to remember that "auto" had some meaning back in the early C

It is in the ANSI C/C++ standard and means a local, non-static variable. Since local variables are non-static by default, it's never used, but it is still there (and that is why I proposed it).

> days.  I wonder if it isn't a bad choice.  Maybe, going off of Patrick's point 2 here, we could use the keyword "stack":
> 
>      stack Foo a;

I also like "counted", since the object is not actually on the stack: it can outlive the function in which it was created if you pass it outside (thanks to refcounting). Maybe "counted" is a better idea, then? Still, I like "auto"... =)

> Which might make it clear that it is a stack variable, not an ordinary reference.  I like Patrick's idea that you wouldn't need to "new" a copy.

Alternatively, it could new it by default, and where you don't want that, you can initialize it to null:

	auto File a;		// default = new File();
	auto File b = null;	// no object created.

August 26, 2002

Re: RAII

Posted by Mac Reiter in reply to Walter

>> >And the same problem arises:
>> >In practice, RAII is not the property of the instance, but of the class, isn't it?
>> >Then you need to specify this at the class declaration of "A", so it cannot be a storage class.
>> >Thoughts?
>
>Making it a property of the class, rather than the instance, leads to much implementation grief. For example, pulling on that thread a bit <g>, it seems to lead to needing to implement two versions of each class - one counted, one not.

To avoid constantly saying "scoped or reference counted", I'm just going to say "dof'ed" (for "deterministic object finalization", or something like that).

I cannot offhand think of any classes that would need both a dof'ed and normal implementation.  Most standard library classes certainly don't need dof behavior.  Of the dof'able classes I can consider (Locks, Files, Ports), they should always be dof'ed.  If I am finished with a Lock, it needs to be released now, or it may deadlock the program.  If I am finished with a File or Port, it should be released now so that other applications can work with it.

If someone really does need both versions of a class, it isn't all that difficult to do:

class Foo{}
dof class DofFoo : Foo{}

(I am going to go slightly off my primary point here because I want to head off any comments about having to remember which version of the class to use...) This is no better or worse than the instance property approach for this case, but for the more common case of a dof'ed-only class, it saves the user from having to remember the extra keyword each time they use the class.  The number of polymorphic C++ designs that have blown up because a member function somewhere in the class hierarchy left off the "virtual" keyword should be a clear enough example of why it would be preferable to make dof a class property rather than requiring a keyword at each instance.

The quick point is that I personally cannot imagine a class that needs both dof'ed and normal implementations, but even if such a class arises it can be handled with inheritance with no more effort cost than the instance property form would force on all dof'ed instances.
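In C++, where destructors already give deterministic finalization, the inheritance route can be sketched as below. `Foo`, `DofFoo`, and the `open_handles` counter are hypothetical names used only for illustration; the point is that the release call is written once, in the derived class, instead of at every instance.

```cpp
#include <cassert>

// Hypothetical count of open resources, so releases can be observed.
int open_handles = 0;

// Plain class: callers must release the resource by calling close().
class Foo {
public:
    Foo() { ++open_handles; }
    virtual ~Foo() = default;
    void close() {
        if (open_) {
            assert(open_handles > 0);
            open_ = false;
            --open_handles;
        }
    }
private:
    bool open_ = true;
};

// The "dof" variant adds deterministic finalization once, in the class
// definition, so no individual use site can forget it.
class DofFoo : public Foo {
public:
    ~DofFoo() override { close(); }
};
```

A `DofFoo` releases its handle when it goes out of scope; a plain `Foo` does so only if someone remembers to call `close()`.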

>> If it is doable, I would prefer that 'auto' be a class property (auto class Lock{}).  If flexibility is a problem, and it becomes necessary for a particular instance of an 'auto class' to outlive its scope (unlikely, but it is possible), and if un-autoing the class is simply not an acceptable solution, perhaps a keyword like 'collected' or 'non_auto' could be added (it doesn't have to look nice -- it will be used VERY infrequently and should only be used by people who really know what they are doing) as a storage class that disables 'auto'ness for a particular instance.
>
>While in C++ it is common to have a destructor (to manage memory), in D having a resource that needs cleaning up should be the exception, not the rule.

I think this comment is suggesting that dof'ing is an exception, so it's OK to require a keyword at each instance.  While a dof'ed instance may be an exception compared to the broader scope of programming (see below for my doubts), for classes that need dof the dof'ed instance is definitely the rule rather than the exception.  Consider the all-too-common Lock.  Lock would be dof'ed at least 99.9% of the time.  Any non-dof usage of Lock would almost certainly be better handled by performing the lock/unlock function on the underlying Mutex/Semaphore/CriticalSection, rather than bundling it up in a useless Lock. Lock's very existence is to provide dof services.  Lock doesn't even need to have any member functions -- its constructor locks the sync object, and its destructor/finalizer unlocks it.  It has no other user interface.  If such services can be disabled by forgetting to add a keyword, then they might as well not exist.
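A minimal C++ rendering of such a Lock, assuming `std::mutex` as the underlying sync object. Standard C++ already ships this exact pattern as `std::lock_guard`; the class is spelled out here only to mirror the description, and `guarded_increment` is a hypothetical caller.

```cpp
#include <cassert>
#include <mutex>

// No user interface at all: the constructor locks, the destructor unlocks.
class Lock {
public:
    explicit Lock(std::mutex& m) : m_(m) { m_.lock(); }
    ~Lock() { m_.unlock(); }
    Lock(const Lock&) = delete;             // copying would unlock twice
    Lock& operator=(const Lock&) = delete;
private:
    std::mutex& m_;
};

// Hypothetical caller: the mutex is released on every exit path,
// including the early return.
int guarded_increment(std::mutex& m, int& counter, bool bail_out) {
    Lock lock(m);
    if (bail_out) return counter;
    return ++counter;
}
```

Because the unlock lives in the destructor, the early `return` cannot leave the mutex held.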

When using classes that have dof behavior, dof is the rule, and non-dof is either non-existent or a rare exception.  This suggests that dof classes should not require a keyword for the common dof'ed instances, and possibly have a keyword for the exceptional non-dof'ed instances.

Slightly off topic, since I know that you recognize the importance of dof (otherwise you would not have offered up an implementation, right?).  The statement that the use of dof is the exceptional case bothers me.  I suspect that that is not the case.  How many people have stayed away from Java, C#, and D specifically because of the lack of dof?  Once a programmer becomes familiar with RAII-style programming, relying on some form of dof, it can very quickly permeate designs.  This is not simply syntactic sugar, but is a fundamental design practice that helps make the resulting implementation stabler, more correct, and more robust.  A programmer that is used to RAII would never even consider attempting to make a multithreaded program without RAII.  I realize that D's synchronized keyword and inherent multithreading simplify this process, but there are always other resources that are commonly used, and misused, that can benefit from dof/RAII behaviors.

>> Or, to flog a fatally wounded horse one last time, you could implement reference counting and get the RAII that you originally described: "RAII is a programming paradigm where resources are automatically released when an object is no longer referenced."  C++'s method is a limited subset of what RAII really should be.  If you can automatically implement the pseudo-finally to perform a delete for auto instances, you can automatically implement a pseudo-finally to decrement the reference count for a counted instance.  The only reason for
[clip]
>Reference counting involves far, far more than just a pseudo-finally.

Granted.  But in one of my previous refcounting posts it was mentioned that the primary difficulty was in maintaining the reference counting invariant in the presence of exceptions.  Well, you have a mechanism for handling arbitrarily complicated stuff in the presence of exceptions, so it shouldn't be any harder to maintain the invariant with exceptions than it would have been without exceptions.
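The pseudo-finally idea maps directly onto C++ destructors, which also run during stack unwinding. A sketch, with `Resource`, `Handle`, and `use_and_throw` as hypothetical names and assignment deliberately left out to keep it short:

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical resource with an intrusive count; 'finalized' records
// when the last reference went away.
struct Resource {
    int refs = 0;
    bool finalized = false;
};

// The destructor is the "pseudo-finally": it runs on normal scope exit
// and during exception unwinding alike, so the count stays correct.
class Handle {
public:
    explicit Handle(Resource& r) : r_(&r) { ++r_->refs; }
    Handle(const Handle& other) : r_(other.r_) { ++r_->refs; }
    Handle& operator=(const Handle&) = delete;  // omitted for brevity
    ~Handle() {
        assert(r_->refs > 0);
        if (--r_->refs == 0) r_->finalized = true;
    }
private:
    Resource* r_;
};

void use_and_throw(Resource& r) {
    Handle a(r);
    Handle b(a);  // two references now
    throw std::runtime_error("failure mid-scope");
}  // both destructors run while the exception propagates
```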

>> Reference counting also solves the:
>>
>> {
>> auto A a = new A; // A1
>> a = new A;        // A2
>> }
>>
>> where A2 gets deleted but A1 waits for the GC to get around to it (which is not RAII or deterministic).  Refcounting would have to watch assignments, and modify the refcount for the old object as well as the new.  But when it did that, it would notice that A1 was no longer used and could finalize it (I defer to the masses on whether it should be deleted or not, but it definitely should be finalized).
>
>At least in D the resource would eventually get cleaned up on the GC pass, whereas in C++ it is a memory leak that will never get cleaned up.

It would eventually get cleaned up as long as some thread remained active to pump the GC.  Back to Locks and threads: if I do not write a separate thread whose sole job is to run this code:

while (!done)
{
gc.Collect();
}

then it is possible to get in a state where all threads are locked on a Lock instance that has been lost.  _If_ the GC could ever get cycles to run, it would notice this object and finalize it.  But since all of the threads are locked on this object, the GC never gets a chance to run, so the object never gets finalized, and the application sits there forever.  Unlikely, but the first rule in multithreaded programming is that (to plagiarize shamelessly from Terry Pratchett's Discworld books) "1 in a million chances happen 9 times out of 10". "Should" isn't "will", and unless it "will" be collected, it isn't safe.
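For the quoted A1/A2 case, the assignment handling that refcounting needs can be sketched in C++ as follows (hypothetical names; an intrusive count stored in `A`). Retaining the new target before releasing the old one is what makes `a = a` safe, and A1 is finalized at the moment of reassignment instead of whenever a collector gets to run:

```cpp
#include <cassert>

// Hypothetical count of finalized objects, so the behaviour is observable.
int finalized_count = 0;

// Hypothetical heap object carrying its own (intrusive) reference count.
struct A {
    int refs = 0;
    ~A() { ++finalized_count; }  // stands in for finalization
};

class CountedRef {
public:
    explicit CountedRef(A* p) : p_(p) { retain(p_); }
    CountedRef(const CountedRef& other) : p_(other.p_) { retain(p_); }
    ~CountedRef() { release(p_); }

    // Assignment changes the counts of both the old and the new target.
    // Retain first, then release: on self-assignment the count never drops to 0.
    CountedRef& operator=(const CountedRef& other) {
        retain(other.p_);
        release(p_);
        p_ = other.p_;
        return *this;
    }

private:
    static void retain(A* p) { if (p) ++p->refs; }
    static void release(A* p) {
        if (!p) return;
        assert(p->refs > 0);
        if (--p->refs == 0) delete p;  // last reference gone: finalize now
    }
    A* p_;
};
```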

>> As for whether the suggested 'counted' should be an instance or class property, I suspect the argument given above for class property still holds.  Classes should be counted, not instances.  Part of the reason for this is that a
[clip]
>> Because derived types can be assigned into base type references, it may be necessary to consider the interaction of inheritance and the 'counted' class specifier.  Are all children of a counted class automatically counted?
[clip]
>> require a runtime check.  Perhaps a compile time restraint that simply says that counted classes are not allowed to participate in polymorphic assignments at all would be a better first solution...
>
>Suppose you have a printing function that takes an object of type Object. Object is not counted, so the counted class is cast to Object. To support this, it becomes necessary to 1) extend counting to every object 2) ignore the possibility of bugs from dangling references 3) disallow conversions to Object.
>
>1) has too many penalties for D as whole
>2) is a similar bug to handing off a reference to an 'auto' instance, but I think worse because it will happen more often
>3) requires creation of two versions of most things in the library, one to handle counted and one for non-counted

This is why I suggested that 'counted' classes not be allowed to participate in polymorphic actions (conversion to Object, in your example).  I certainly don't want everything in D to be refcounted.  I have mentioned several times that refcounting adds a very noticeable performance penalty, and thus should be limited only to those things that need it.  I also agree that "Ignoring the possibility of bugs" is never a wise choice of action.  That is why I went for disabling type conversions for counted classes.  I actually went further than your 3), because I suggested disabling _any_ conversion, up or down, to any level of the class hierarchy.  The compiler simply disallows casting a counted class, implicitly or explicitly, to anything else.  Maybe someday, when more experience is gained, some method of safely handling conversions may be found. But I don't think a moratorium on 'counted' conversions would cause any problems.  Let me explain why:

(I will keep using thread synchronization primitives and the Lock class as examples, because they are my main experience with an RAII use that does not work just as well with GC.)

Say you have some system that needs to be able to manage a collection of Locks, some of which will lock Mutexes, others lock Semaphores, etc.  You might think that you need polymorphism here so that you can treat all the different types of Locks as the same type.  But that is a flawed design.  There is only _one_ Lock type.  What is changing is not the type of lock, but the type of object being locked.  Lock has a private data member, and that member will be a polymorphic base class pointer/reference to the base class of all of the synchronization objects:

class SyncObject{}
class Mutex : SyncObject{}
class Semaphore : SyncObject{}
counted class Lock
{
SyncObject* lockableObject;
}
Lock[] MyLocks;
bool HandleLock(Lock TheLock);

Note that Lock does not derive from anything, and nothing derives from Lock, yet you can still maintain a collection of Locks, each of which holds a polymorphic SyncObject internally.  You don't need any implicit or explicit conversions, so the 'counted' on Lock doesn't cause any problems.
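The same design in C++, as a sketch: `SyncObject`, `Mutex`, and `Semaphore` here are hypothetical stand-ins that only track state (real ones would wrap OS primitives and block), and `Lock` is made movable but not copyable so that a collection of them can live in a `std::vector`.

```cpp
#include <cassert>
#include <vector>

// Polymorphism lives in the synchronization objects...
class SyncObject {
public:
    virtual ~SyncObject() = default;
    virtual void acquire() = 0;
    virtual void release() = 0;
};

class Mutex : public SyncObject {
public:
    bool held = false;
    void acquire() override { assert(!held); held = true; }
    void release() override { held = false; }
};

class Semaphore : public SyncObject {
public:
    explicit Semaphore(int n) : available(n) {}
    int available;
    void acquire() override { assert(available > 0); --available; }  // a real one would block
    void release() override { ++available; }
};

// ...while Lock is one concrete type with no base class and no subclasses.
class Lock {
public:
    explicit Lock(SyncObject& obj) : target_(&obj) { target_->acquire(); }
    Lock(Lock&& other) noexcept : target_(other.target_) { other.target_ = nullptr; }
    Lock(const Lock&) = delete;
    Lock& operator=(const Lock&) = delete;
    Lock& operator=(Lock&&) = delete;
    ~Lock() { if (target_) target_->release(); }  // moved-from Locks do nothing
private:
    SyncObject* target_;
};
```

No Lock is ever converted to another type; only the `SyncObject*` member is used polymorphically.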

I already talked about the "requires creation of two versions of most things in the library, one to handle counted and one for non-counted" issue above, and why I don't think it is an issue.

>> Yes, there are complications with 'counted' classes.  But realistically, either compiler writers or programmers are going to experience similar complications with 'auto' classes (and worse problems with 'auto' instances).  If the problems are roughly the same, I would vote for the more flexible and complete solution.
>> Of course, I am somewhat biased on the issue...
>
>No, I believe the 'auto' approach has an order of magnitude less implementation effort than ref counting. For example, it won't be necessary to handle arrays of counted objects, counted objects as members of structs, counted objects as members of non-counted objects, assignment overloading, etc.

1. arrays of counted objects
Pragmatically, this means that the array is counted.  The slightly tricky bit is
that the array doesn't have a refcount of its own.  Its refcount is the largest
refcount of any of its elements.  For pathological cases, this could be
expensive to check, but you do get to stop as soon as you hit any non-zero
refcount, so it's only a problem for large arrays of counted objects, where all
but the last object have already been "freed".  Because of that, I would
recommend that when you construct the pseudo-finally, you decrement refcounts
for the array elements starting at the back, so that each check will see the
positive refcount in the first array element.  Then when you finish by
decrementing the refcount of the first element, the array refcount check will
make one big sweep through and free up the memory.  (Or, if you write the count
checking loop as a back-to-front loop, then do the decrements from
front-to-back.)  Of course, static analysis could be used in more advanced
implementations to notice when the refcount would never exceed 1 and just
finalize the whole array, bypassing the whole refcounting proceedings, at scope
exit.  But it doesn't have to be that fancy to begin with.  I also don't see
really big arrays of counted objects, but just 'coz I don't see it doesn't mean
somebody won't try to do it...
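The cost difference between the two decrement orders can be checked with a small simulation. It models only the counts (a hypothetical `std::vector<int>` of per-element refcounts, each starting at 1), not real objects; `array_alive` is the front-to-back "any nonzero?" check described above, and `release_all` reports how many elements were inspected in total.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The array is "alive" while any element has a nonzero refcount.
// Scans front-to-back and stops at the first live element.
bool array_alive(const std::vector<int>& counts, std::size_t& steps) {
    for (int c : counts) {
        ++steps;
        if (c != 0) return true;
    }
    return false;
}

// Scope-exit release of every element, in the chosen order.
// Returns the total number of elements inspected by all the checks.
std::size_t release_all(std::vector<int> counts, bool back_to_front, bool& freed) {
    std::size_t steps = 0;
    freed = false;
    const std::size_t n = counts.size();
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t idx = back_to_front ? n - 1 - i : i;
        assert(counts[idx] > 0);
        --counts[idx];
        if (!array_alive(counts, steps)) freed = true;  // final sweep found all zero
    }
    return steps;
}
```

For four elements, releasing back-to-front inspects 7 elements in total (one per check, plus one full sweep at the end), while front-to-back inspects 13. In general the first grows linearly (2n - 1) and the second roughly quadratically.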

2. counted objects as members of structs
I assume this is referring to the lack of destructors for structs, which would
mean that there was nowhere to perform the refcount decrement.  My simple answer
for this is the same as it would be for (keep reading):

3. counted objects as members of non-counted objects
Similar to the "Lock isn't polymorphic, it just contains a polymorphic member"
example above, I suspect that it is a bad design to have a counted object as a
member of a non-counted object, whether it is a struct or a normal class.  I
think that the composite class should also be counted.  But I also believe that
you shouldn't walk around with pointers to members of objects, so maybe I'm a
little too strict.  My simple answer is to disallow it.  Compiler error.  It
would have to wait until the symbol table is available during actual compilation
to detect, given the "property of a class" nature of the counted keyword.  Since
nobody but the compiler particularly needs to know about the counted/non-counted
nature of classes, I don't think that's a problem.

4. assignment overloading
I think that this is addressed by the "disallow conversions" and "store a flag
in the symbol table to know that you need additional prolog/epilog code for
assignment" topics.  It would not surprise me, however, to discover that I am
simply overlooking something.

Reference counting _will_ eventually be implemented.  It's been done several times in C++, and C++ already had dof.  No matter how hard it is to do in the compiler, it is even harder to do correctly from source-code-land.  Sometimes it is even impossible to do correctly from out here, due to optimizations the compiler can perform, which can result in out-of-order operations that make the classes think the last reference was removed before another reference was created.
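Since C++11, source-level code can at least forbid that reordering through `std::atomic`. A common pattern, sketched here with a hypothetical `Shared` object, increments with relaxed ordering and decrements with acquire-release ordering, so that writes made before a reference is dropped cannot be moved past the decrement and are visible to whichever thread performs the final cleanup:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared object; starts with one reference held by its creator.
struct Shared {
    std::atomic<int> refs{1};
    bool destroyed = false;  // set by whichever call drops the last reference
};

// Taking another reference needs atomicity but no ordering: the caller
// already owns a reference, so the count cannot be zero here.
void retain(Shared& s) {
    s.refs.fetch_add(1, std::memory_order_relaxed);
}

// Dropping a reference needs ordering. The release half keeps this thread's
// earlier writes from moving below the decrement; the acquire half lets the
// thread that sees the count reach zero observe all of them before cleanup.
// Returns true if this call dropped the last reference.
bool release(Shared& s) {
    const int previous = s.refs.fetch_sub(1, std::memory_order_acq_rel);
    assert(previous > 0);
    if (previous == 1) {
        s.destroyed = true;  // a real implementation would delete here
        return true;
    }
    return false;
}
```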

If I had to pick someone I trust to implement difficult things correctly, I would rather trust you (Walter) than someone who decides to provide an add-on library that "does reference counting, mostly, as long as you don't ever do <list of not entirely uncommon things>".  Yes, it does add to the complexity of the compiler.  But I suspect that once you did it you would find some simplifications and tricks that made it considerably easier.

Having rambled on for so long, I will close with this.  I would _really_ prefer the dof mechanism to be reference counting, but if scoping is all anyone else needs, then that is still almost infinitely better than no dof at all.  Most dof usage will be fine with scoping.  The problem of "losing" instances when their controlling reference switches to a new instance (A1 and A2 example) is a little disturbing, but probably not common in practice.  C++ avoids it by removing the reference indirection -- RAII in C++ only applies to local objects, not to local pointers/references to heap objects.  I suppose you could try to do something similar in D, but only for dof'ed instances.  This doesn't mean it has to come from the stack -- SmallEiffel knows the difference between objects and references to objects, but all of them come from the heap.  It's just that objects can't ever be attached to anything except the particular instance that was created for them at scope entry.  Or you could do what was suggested in one of the earlier posts, and when an assignment happens to a dof'ed variable, you finalize the previous occupant (after making sure that they haven't done something foolish like "a = a;").
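The last option, finalizing the previous occupant on assignment, is a single-owner scheme rather than refcounting. A C++ sketch with hypothetical `Res` and `Scoped` types (standard C++ provides the same finalize-on-replace behaviour through `std::unique_ptr::reset`):

```cpp
#include <cassert>

// Hypothetical count of finalized resources.
int finalized_count = 0;

struct Res {
    ~Res() { ++finalized_count; }
};

// A "dof'ed variable" that owns at most one object. Assigning a new object
// finalizes the old one immediately; leaving scope finalizes the current one.
class Scoped {
public:
    explicit Scoped(Res* p = nullptr) : p_(p) {}
    Scoped(const Scoped&) = delete;
    Scoped& operator=(const Scoped&) = delete;
    ~Scoped() { delete p_; }

    void assign(Res* p) {
        if (p == p_) return;  // the "a = a" case: nothing to finalize
        delete p_;
        p_ = p;
    }
    Res* get() const { return p_; }

private:
    Res* p_;
};
```

Unlike refcounting, nothing else may keep a reference to the object, which is what lets the holder finalize it unconditionally.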

Hoping that I'm not being too annoying,
Mac
