Jump to page: 1 2
Thread overview
Memory Safety without a GC or Ref Counting
Jan 21, 2013
F i L
Jan 21, 2013
H. S. Teoh
Jan 21, 2013
F i L
Jan 21, 2013
F i L
Jan 21, 2013
H. S. Teoh
Jan 21, 2013
Paulo Pinto
Jan 22, 2013
H. S. Teoh
Jan 25, 2013
F i L
Jan 25, 2013
TommiT
Jan 25, 2013
F i L
Jan 25, 2013
TommiT
Jan 25, 2013
TommiT
Jan 25, 2013
F i L
Jan 21, 2013
Arne
Jan 21, 2013
Jacob Carlborg
Jan 22, 2013
Michel Fortin
Jan 22, 2013
Jacob Carlborg
January 21, 2013
So I've been thinking about this problem off and on for awhile, and I think I have a rather simple solution. I highly doubt I'm the first to think of it, but haven't found any info about a similar idea on the net. So in the interest of science I'm posting it here so anyone interested can attempt to poke holes in the idea, which you folks have been so good at in the past.

This isn't really D related, but I usually like the feedback I get from this community, plus, if there's any value in the idea, maybe it would be a worth while discussion for future directions D might make use of.

I'll be illustrating with Pseudo-code. The basic idea spawns from conceptually separating variables as either "memory owners" or "memory references". Basically, the pseudo-code has three types:

var - Memory Owner, never null, deleted at end of scope**
ref - Memory Reference, can't "own" memory
ptr - Unsafe manually managed memory pointer (like C pointers)

So, 'ptr's aside for now, 'var's would be the only way to allocate memory, and memory is automatically allocated to them at their definition**. Vars would be deleted at the end of their scope (or when they're removed from dynamic-arrays, etc) unless they're returned, in which case they're "injected" into the callers scope, and deleted at the end of that scope. Without too much lengthy explanation, here's some code:

    type Foo
    {
        var bar = 0
    }

    func main
    {
        ref f : Foo
        ref b : int

        scope
        {
            var foo = Foo # allocates Foo

            f => foo      # f points to foo
            f.bar = 1     # foo.bar = 1

            b => f.bar    # b points to foo.bar
            b = 3         # foo.bar = 3

            # auto 'clean up code' at end of scope:
            #
            #     Memory.delete(foo)
            #     f => null
            #
        }

        f.bar = 4 # ERROR: f is null

        var n = 6

        b => n
        b = 7     # n = 7

        # No clean-up code needed, 'n' is stack-allocated
        #
        # Also, notice we didn't need to set 'b' to 'null' at
        # the end of the scope, since it wasn't accessed
        # directly after the end of the scope.
    }

So this is sort of like ref-counted scopes (smart scopes), except the compiler has a higher degree of guarantee on the lifetime of memory, and there's a lot of static analysis that can happen to optimize the clean up code (like how we didn't need to assign 'b' to null in the example above). All "ref counting" would be done through static-analysis at compile-time, and injected by the compiler, so it wouldn't have the runtime overhead of ref-counting, nor would there be a need for "weak references" since references are guaranteed to never "own" memory.

There's more to it of course. For instance, in order to have expanding memory, you'd need the concepts of a dynamic-array and type inheritance. You'd also need to not clean up memory that was returned... so...

    type Foo
    {
        var name : text
        init { name = "Foo" }
        func write { Console.write(name) }
    }

    type Bar : Foo
    {
        init { name = "Bar" }
    }

    ref global_foo : Foo

    func main
    {
        var foos = Foo[]

        foos += Foo     # allocation
        foos += Foo     # allocation
        foos += Bar     # allocation
        foos[].write()  # prints: 'Foo', 'Foo', 'Bar'

        foos -= foos[1] # deallocation
        foos[].write()  # prints: 'Foo', 'Bar'

        # ---

        func getFoo
        {
            var f = Foo
            f.name = "Fip"
            return f
            # no cleanup, since 'f' is returned
        }

        global_foo = getFoo()
        global_foo.write() # prints: 'Fip'

        # auto cleanup:
        #
        #    Memory.delete(foos)
        #    Memory.delete(global_foo)
        #    global_foo => null
        #
    }

Okay, so with that you could have global var arrays, and assign global refs to function return values, etc. If you need extra control, there's also 'ptr' types, which would require manual clean-up. However, 'ptr's would be automatically cleaned when the program exist, so they would be safe to use for global references which only sometimes needed to point to usable memory.

Anyways.. comments and criticisms would be great. Sorry this isn't more D related. Like I said, I hope it does have value D might make use of at some point.
January 21, 2013
On Mon, Jan 21, 2013 at 06:37:36PM +0100, F i L wrote:
> So I've been thinking about this problem off and on for awhile, and I think I have a rather simple solution. I highly doubt I'm the first to think of it, but haven't found any info about a similar idea on the net. So in the interest of science I'm posting it here so anyone interested can attempt to poke holes in the idea, which you folks have been so good at in the past.
> 
> This isn't really D related, but I usually like the feedback I get from this community, plus, if there's any value in the idea, maybe it would be a worth while discussion for future directions D might make use of.
> 
> I'll be illustrating with Pseudo-code. The basic idea spawns from conceptually separating variables as either "memory owners" or "memory references". Basically, the pseudo-code has three types:
> 
> var - Memory Owner, never null, deleted at end of scope**
> ref - Memory Reference, can't "own" memory
> ptr - Unsafe manually managed memory pointer (like C pointers)
[...]

I invented this scheme (independently -- I'm sure the concept has been thought of before) way back when I was coding in C and C++.

Of course, I didn't have language/compiler support, but I did adopt a convention of marking every pointer with [O] (for owner) and [R] (for reference) in comments, in all my code. Owner pointers can never be assigned a reference pointer, and owner pointers can only be destructively copied (the original must be set to NULL after copying). Owner pointers must be free()'d if they are non-NULL and going out of scope (unless they are returned from the function, in which case I consider it as a conceptual destructive copy followed by NULLing). I did allow owner pointers to be NULL -- mainly because it is required for destructive copy to be sound.

In newer versions of C++ (I invented this scheme back in the 90's when templates and STL were only starting to emerge), owner pointers are essentially equivalent to STL's auto_ptr<T>.

This was essentially as far as I developed the concept -- and probably it's as far as it will get, because the concept has limitations:

1) It required code to be manually written to deal with owner pointers correctly (auto_ptr mostly made this automatic, but it is certainly NOT a drop-in replacement for pointers);

2) It can't deal with cyclic structures correctly;

3) It is still unsafe: you can have a dangling reference to owned memory, because the owner pointer goes out of scope and the memory gets deallocated, but there can still be references lingering around somewhere.

IOW, it doesn't solve the problem of memory management. It is one level up from having no distinction whatsoever between pointers (and scarily enough, I've seen "enterprise" code that was obviously written without any such distinction in mind), for sure, but it's not much more than that.


T

-- 
MACINTOSH: Most Applications Crash, If Not, The Operating System Hangs
January 21, 2013
On Monday, 21 January 2013 at 17:37:38 UTC, F i L wrote:
> This isn't really D related, but I usually like the feedback I get from this community, plus, if there's any value in the idea, maybe it would be a worth while discussion for future directions D might make use of.
>

I haven´t found any hole or improvement suggestion yet, my only comment is, I`d really love to use a system like this.
January 21, 2013
On 2013-01-21 18:37, F i L wrote:
> So I've been thinking about this problem off and on for awhile, and I
> think I have a rather simple solution. I highly doubt I'm the first to
> think of it, but haven't found any info about a similar idea on the net.
> So in the interest of science I'm posting it here so anyone interested
> can attempt to poke holes in the idea, which you folks have been so good
> at in the past.

Seems a bit like ARC (Automatic Reference Counting) that Clang has implemented for Objective-C (and some C libraries on Mac OS X). The compiler automatically inserts calls to "retain" and "release" where appropriate.

-- 
/Jacob Carlborg
January 21, 2013
F i L wrote:
> [...code...]
>     global_foo = getFoo()
> [...code...]

Typo, that should have been:

    global_foo => getFoo()




H. S. Teoh wrote:
> [...] because the concept has limitations:
>
> 1) It required code to be manually written to deal with owner pointers
> correctly (auto_ptr mostly made this automatic, but it is certainly NOT
> a drop-in replacement for pointers);

I don't do a whole lot of C++, so I don't know a lot about their smart_ptr types, however, I doubt you could do anything like my idea in C++ (without a lot of extra syntax like you described) since it requires a lot of automatic per-scope bookkeeping I doubt C++ types alone could accomplish.


> 2) It can't deal with cyclic structures correctly;

That's why there would be a 'ptr' type :)


> 3) It is still unsafe: you can have a dangling reference to owned
> memory, because the owner pointer goes out of scope and the memory gets
> deallocated, but there can still be references lingering around
> somewhere.

This is why there would need to be a lot of magic happening in the compiler. When the compiler can't know, for sure, weather a reference that was assigned to a local-scope var is *still* referencing that var at the end of the scope (pretty common), then it asks before setting assigning to null. If the compiler can't determine, at all, which refs will be pointing to a particular local-var, it falls back to regular ref-counting (for the var in question).

I've given it a bit of thought, and while runtime checks and even regular ref-counting wouldn't be too uncommon, due to it's var/ref design there would be a great many places where the compiler could optimize away all the runtime checks and you normally wouldn't have to deal with the cyclic-ref problem ref-counting memory-management usually bring to the table (granted ptr's aren't much better). Also, I can't remember exactly, but because var's are tied to scopes, I think there's more optimization that can happen with ref-counting because it happens at the scope level... more thoughts needed here though.

One benefit of this I forgot to mention is that, like ref-counting, memory is deterministic (with optional late-collection). That wou


Thanks for your feedback!




Arne wrote:
> I haven´t found any hole or improvement suggestion yet, my only comment is, I`d really love to use a system like this.

Me too! Let me know if you think of any flaws.




Jacob Carlborg wrote:
> Seems a bit like ARC (Automatic Reference Counting) that Clang has implemented for Objective-C (and some C libraries on Mac OS X). The compiler automatically inserts calls to "retain" and "release" where appropriate.

Interesting, I will have to look into that. Thanks for the info.
January 21, 2013
F i L wrote:
> ... Also, I can't remember exactly, but because var's are tied to scopes, I think there's more optimization that can happen with ref-counting because it happens at the scope level... more thoughts needed here though.

Ah right I remember. Since vars only ever get collected at the end of a scope, you don't need to check for zero-refs on every dereference, only at the end of the var's scope.

January 21, 2013
On Mon, Jan 21, 2013 at 09:27:20PM +0100, F i L wrote: [...]
> H. S. Teoh wrote:
[...]
> >3) It is still unsafe: you can have a dangling reference to owned memory, because the owner pointer goes out of scope and the memory gets deallocated, but there can still be references lingering around somewhere.
> 
> This is why there would need to be a lot of magic happening in the compiler. When the compiler can't know, for sure, weather a reference that was assigned to a local-scope var is *still* referencing that var at the end of the scope (pretty common), then it asks before setting assigning to null. If the compiler can't determine, at all, which refs will be pointing to a particular local-var, it falls back to regular ref-counting (for the var in question).
[...]

Hmm. So you're essentially saying that the compiler will do compile-time analysis of variable usage, and based on that choose the simplest memory management model for it?

That's not a bad idea. You can have a series of increasingly complex management techniques, and the compiler could choose the least complex one that it can statically prove is safe. So if something can be proven not to escape the current scope, it can even be stack-allocated (or if it's big, put it on the manual heap and delete it on scope exit). If it is returned with no other references to it, then it can be a single-reference pointer (i.e. your owner type). If it has cyclic references, use a reference-counted pointer. If all else fails, fall back to the GC.

The problem is that quite often, the compiler will not be able to prove much about the data structure, esp. when you have non-trivial manipulation of pointers -- it requires solving the halting problem to achieve that level of analysis on arbitrary code -- so it will probably fall back to the GC quite often, depending on what the code does.


T

-- 
My program has no bugs! Only undocumented features...
January 21, 2013
Am 21.01.2013 22:25, schrieb H. S. Teoh:
> On Mon, Jan 21, 2013 at 09:27:20PM +0100, F i L wrote:
> [...]
>> H. S. Teoh wrote:
> [...]
>>> 3) It is still unsafe: you can have a dangling reference to owned
>>> memory, because the owner pointer goes out of scope and the memory
>>> gets deallocated, but there can still be references lingering around
>>> somewhere.
>>
>> This is why there would need to be a lot of magic happening in the
>> compiler. When the compiler can't know, for sure, weather a reference
>> that was assigned to a local-scope var is *still* referencing that var
>> at the end of the scope (pretty common), then it asks before setting
>> assigning to null. If the compiler can't determine, at all, which refs
>> will be pointing to a particular local-var, it falls back to regular
>> ref-counting (for the var in question).
> [...]
>
> Hmm. So you're essentially saying that the compiler will do compile-time
> analysis of variable usage, and based on that choose the simplest memory
> management model for it?
>
> That's not a bad idea. You can have a series of increasingly complex
> management techniques, and the compiler could choose the least complex
> one that it can statically prove is safe. So if something can be proven
> not to escape the current scope, it can even be stack-allocated (or if
> it's big, put it on the manual heap and delete it on scope exit). If it
> is returned with no other references to it, then it can be a
> single-reference pointer (i.e. your owner type). If it has cyclic
> references, use a reference-counted pointer. If all else fails, fall
> back to the GC.
>
> The problem is that quite often, the compiler will not be able to prove
> much about the data structure, esp. when you have non-trivial
> manipulation of pointers -- it requires solving the halting problem to
> achieve that level of analysis on arbitrary code -- so it will probably
> fall back to the GC quite often, depending on what the code does.
>
>
> T
>


I see how this could work in a JIT environment, but not in a static compiler with modules compiled in separate ways.

--
Paulo
January 22, 2013
On 2013-01-21 19:22:04 +0000, Jacob Carlborg <doob@me.com> said:

> On 2013-01-21 18:37, F i L wrote:
>> So I've been thinking about this problem off and on for awhile, and I
>> think I have a rather simple solution. I highly doubt I'm the first to
>> think of it, but haven't found any info about a similar idea on the net.
>> So in the interest of science I'm posting it here so anyone interested
>> can attempt to poke holes in the idea, which you folks have been so good
>> at in the past.
> 
> Seems a bit like ARC (Automatic Reference Counting) that Clang has implemented for Objective-C (and some C libraries on Mac OS X). The compiler automatically inserts calls to "retain" and "release" where appropriate.

And I'll confirm that it's no magic either. Avoiding cyclic references isn't that easy when you start mixing implicitly retained pointers with Objective-C's blocks (lambas in C++ or delegate literals in D).

I'm a little sad Apple decided to deprecate its Objective-C garbage collector. Having the choice between GC and reference counting was great (although it surely sucks for library developers who need to support both modes).

-- 
Michel Fortin
michel.fortin@michelf.ca
http://michelf.ca/

January 22, 2013
On Mon, Jan 21, 2013 at 11:36:58PM +0100, Paulo Pinto wrote:
> Am 21.01.2013 22:25, schrieb H. S. Teoh:
[...]
> >Hmm. So you're essentially saying that the compiler will do compile-time analysis of variable usage, and based on that choose the simplest memory management model for it?
> >
> >That's not a bad idea. You can have a series of increasingly complex management techniques, and the compiler could choose the least complex one that it can statically prove is safe. So if something can be proven not to escape the current scope, it can even be stack-allocated (or if it's big, put it on the manual heap and delete it on scope exit). If it is returned with no other references to it, then it can be a single-reference pointer (i.e. your owner type). If it has cyclic references, use a reference-counted pointer. If all else fails, fall back to the GC.
> >
> >The problem is that quite often, the compiler will not be able to prove much about the data structure, esp. when you have non-trivial manipulation of pointers -- it requires solving the halting problem to achieve that level of analysis on arbitrary code -- so it will probably fall back to the GC quite often, depending on what the code does.
[...]
> I see how this could work in a JIT environment, but not in a static compiler with modules compiled in separate ways.
[...]

It can be made to work with separately-compiled modules if there's language support (say in the form of exported pointer attributes). But probably not feasible in the current version of D.


T

-- 
Designer clothes: how to cover less by paying more.
« First   ‹ Prev
1 2