View mode: basic / threaded / horizontal-split · Log in · Help
January 21, 2013
Memory Safety without a GC or Ref Counting
So I've been thinking about this problem off and on for awhile, 
and I think I have a rather simple solution. I highly doubt I'm 
the first to think of it, but haven't found any info about a 
similar idea on the net. So in the interest of science I'm 
posting it here so anyone interested can attempt to poke holes in 
the idea, which you folks have been so good at in the past.

This isn't really D related, but I usually like the feedback I 
get from this community, plus, if there's any value in the idea, 
maybe it would be a worth while discussion for future directions 
D might make use of.

I'll be illustrating with Pseudo-code. The basic idea spawns from 
conceptually separating variables as either "memory owners" or 
"memory references". Basically, the pseudo-code has three types:

var - Memory Owner, never null, deleted at end of scope**
ref - Memory Reference, can't "own" memory
ptr - Unsafe manually managed memory pointer (like C pointers)

So, 'ptr's aside for now, 'var's would be the only way to 
allocate memory, and memory is automatically allocated to them at 
their definition**. Vars would be deleted at the end of their 
scope (or when they're removed from dynamic-arrays, etc) unless 
they're returned, in which case they're "injected" into the 
callers scope, and deleted at the end of that scope. Without too 
much lengthy explanation, here's some code:

    type Foo
    {
        var bar = 0
    }

    func main
    {
        ref f : Foo
        ref b : int

        scope
        {
            var foo = Foo # allocates Foo

            f => foo      # f points to foo
            f.bar = 1     # foo.bar = 1

            b => f.bar    # b points to foo.bar
            b = 3         # foo.bar = 3

            # auto 'clean up code' at end of scope:
            #
            #     Memory.delete(foo)
            #     f => null
            #
        }

        f.bar = 4 # ERROR: f is null

        var n = 6

        b => n
        b = 7     # n = 7

        # No clean-up code needed, 'n' is stack-allocated
        #
        # Also, notice we didn't need to set 'b' to 'null' at
        # the end of the scope, since it wasn't accessed
        # directly after the end of the scope.
    }

So this is sort of like ref-counted scopes (smart scopes), except 
the compiler has a higher degree of guarantee on the lifetime of 
memory, and there's a lot of static analysis that can happen to 
optimize the clean up code (like how we didn't need to assign 'b' 
to null in the example above). All "ref counting" would be done 
through static-analysis at compile-time, and injected by the 
compiler, so it wouldn't have the runtime overhead of 
ref-counting, nor would there be a need for "weak references" 
since references are guaranteed to never "own" memory.

There's more to it of course. For instance, in order to have 
expanding memory, you'd need the concepts of a dynamic-array and 
type inheritance. You'd also need to not clean up memory that was 
returned... so...

    type Foo
    {
        var name : text
        init { name = "Foo" }
        func write { Console.write(name) }
    }

    type Bar : Foo
    {
        init { name = "Bar" }
    }

    ref global_foo : Foo

    func main
    {
        var foos = Foo[]

        foos += Foo     # allocation
        foos += Foo     # allocation
        foos += Bar     # allocation
        foos[].write()  # prints: 'Foo', 'Foo', 'Bar'

        foos -= foos[1] # deallocation
        foos[].write()  # prints: 'Foo', 'Bar'

        # ---

        func getFoo
        {
            var f = Foo
            f.name = "Fip"
            return f
            # no cleanup, since 'f' is returned
        }

        global_foo = getFoo()
        global_foo.write() # prints: 'Fip'

        # auto cleanup:
        #
        #    Memory.delete(foos)
        #    Memory.delete(global_foo)
        #    global_foo => null
        #
    }

Okay, so with that you could have global var arrays, and assign 
global refs to function return values, etc. If you need extra 
control, there's also 'ptr' types, which would require manual 
clean-up. However, 'ptr's would be automatically cleaned when the 
program exist, so they would be safe to use for global references 
which only sometimes needed to point to usable memory.

Anyways.. comments and criticisms would be great. Sorry this 
isn't more D related. Like I said, I hope it does have value D 
might make use of at some point.
January 21, 2013
Re: Memory Safety without a GC or Ref Counting
On Mon, Jan 21, 2013 at 06:37:36PM +0100, F i L wrote:
> So I've been thinking about this problem off and on for awhile, and
> I think I have a rather simple solution. I highly doubt I'm the
> first to think of it, but haven't found any info about a similar
> idea on the net. So in the interest of science I'm posting it here
> so anyone interested can attempt to poke holes in the idea, which
> you folks have been so good at in the past.
> 
> This isn't really D related, but I usually like the feedback I get
> from this community, plus, if there's any value in the idea, maybe
> it would be a worth while discussion for future directions D might
> make use of.
> 
> I'll be illustrating with Pseudo-code. The basic idea spawns from
> conceptually separating variables as either "memory owners" or
> "memory references". Basically, the pseudo-code has three types:
> 
> var - Memory Owner, never null, deleted at end of scope**
> ref - Memory Reference, can't "own" memory
> ptr - Unsafe manually managed memory pointer (like C pointers)
[...]

I invented this scheme (independently -- I'm sure the concept has been
thought of before) way back when I was coding in C and C++.

Of course, I didn't have language/compiler support, but I did adopt a
convention of marking every pointer with [O] (for owner) and [R] (for
reference) in comments, in all my code. Owner pointers can never be
assigned a reference pointer, and owner pointers can only be
destructively copied (the original must be set to NULL after copying).
Owner pointers must be free()'d if they are non-NULL and going out of
scope (unless they are returned from the function, in which case I
consider it as a conceptual destructive copy followed by NULLing). I did
allow owner pointers to be NULL -- mainly because it is required for
destructive copy to be sound.

In newer versions of C++ (I invented this scheme back in the 90's when
templates and STL were only starting to emerge), owner pointers are
essentially equivalent to STL's auto_ptr<T>.

This was essentially as far as I developed the concept -- and probably
it's as far as it will get, because the concept has limitations:

1) It required code to be manually written to deal with owner pointers
correctly (auto_ptr mostly made this automatic, but it is certainly NOT
a drop-in replacement for pointers);

2) It can't deal with cyclic structures correctly;

3) It is still unsafe: you can have a dangling reference to owned
memory, because the owner pointer goes out of scope and the memory gets
deallocated, but there can still be references lingering around
somewhere.

IOW, it doesn't solve the problem of memory management. It is one level
up from having no distinction whatsoever between pointers (and scarily
enough, I've seen "enterprise" code that was obviously written without
any such distinction in mind), for sure, but it's not much more than
that.


T

-- 
MACINTOSH: Most Applications Crash, If Not, The Operating System Hangs
January 21, 2013
Re: Memory Safety without a GC or Ref Counting
On Monday, 21 January 2013 at 17:37:38 UTC, F i L wrote:
> This isn't really D related, but I usually like the feedback I 
> get from this community, plus, if there's any value in the 
> idea, maybe it would be a worth while discussion for future 
> directions D might make use of.
>

I haven´t found any hole or improvement suggestion yet, my only 
comment is, I`d really love to use a system like this.
January 21, 2013
Re: Memory Safety without a GC or Ref Counting
On 2013-01-21 18:37, F i L wrote:
> So I've been thinking about this problem off and on for awhile, and I
> think I have a rather simple solution. I highly doubt I'm the first to
> think of it, but haven't found any info about a similar idea on the net.
> So in the interest of science I'm posting it here so anyone interested
> can attempt to poke holes in the idea, which you folks have been so good
> at in the past.

Seems a bit like ARC (Automatic Reference Counting) that Clang has 
implemented for Objective-C (and some C libraries on Mac OS X). The 
compiler automatically inserts calls to "retain" and "release" where 
appropriate.

-- 
/Jacob Carlborg
January 21, 2013
Re: Memory Safety without a GC or Ref Counting
F i L wrote:
> [...code...]
>     global_foo = getFoo()
> [...code...]

Typo, that should have been:

    global_foo => getFoo()




H. S. Teoh wrote:
> [...] because the concept has limitations:
>
> 1) It required code to be manually written to deal with owner 
> pointers
> correctly (auto_ptr mostly made this automatic, but it is 
> certainly NOT
> a drop-in replacement for pointers);

I don't do a whole lot of C++, so I don't know a lot about their 
smart_ptr types, however, I doubt you could do anything like my 
idea in C++ (without a lot of extra syntax like you described) 
since it requires a lot of automatic per-scope bookkeeping I 
doubt C++ types alone could accomplish.


> 2) It can't deal with cyclic structures correctly;

That's why there would be a 'ptr' type :)


> 3) It is still unsafe: you can have a dangling reference to 
> owned
> memory, because the owner pointer goes out of scope and the 
> memory gets
> deallocated, but there can still be references lingering around
> somewhere.

This is why there would need to be a lot of magic happening in 
the compiler. When the compiler can't know, for sure, weather a 
reference that was assigned to a local-scope var is *still* 
referencing that var at the end of the scope (pretty common), 
then it asks before setting assigning to null. If the compiler 
can't determine, at all, which refs will be pointing to a 
particular local-var, it falls back to regular ref-counting (for 
the var in question).

I've given it a bit of thought, and while runtime checks and even 
regular ref-counting wouldn't be too uncommon, due to it's 
var/ref design there would be a great many places where the 
compiler could optimize away all the runtime checks and you 
normally wouldn't have to deal with the cyclic-ref problem 
ref-counting memory-management usually bring to the table 
(granted ptr's aren't much better). Also, I can't remember 
exactly, but because var's are tied to scopes, I think there's 
more optimization that can happen with ref-counting because it 
happens at the scope level... more thoughts needed here though.

One benefit of this I forgot to mention is that, like 
ref-counting, memory is deterministic (with optional 
late-collection). That wou


Thanks for your feedback!




Arne wrote:
> I haven´t found any hole or improvement suggestion yet, my only 
> comment is, I`d really love to use a system like this.

Me too! Let me know if you think of any flaws.




Jacob Carlborg wrote:
> Seems a bit like ARC (Automatic Reference Counting) that Clang 
> has implemented for Objective-C (and some C libraries on Mac OS 
> X). The compiler automatically inserts calls to "retain" and 
> "release" where appropriate.

Interesting, I will have to look into that. Thanks for the info.
January 21, 2013
Re: Memory Safety without a GC or Ref Counting
F i L wrote:
> ... Also, I can't remember exactly, but because var's are tied 
> to scopes, I think there's more optimization that can happen 
> with ref-counting because it happens at the scope level... more 
> thoughts needed here though.

Ah right I remember. Since vars only ever get collected at the 
end of a scope, you don't need to check for zero-refs on every 
dereference, only at the end of the var's scope.
January 21, 2013
Re: Memory Safety without a GC or Ref Counting
On Mon, Jan 21, 2013 at 09:27:20PM +0100, F i L wrote:
[...]
> H. S. Teoh wrote:
[...]
> >3) It is still unsafe: you can have a dangling reference to owned
> >memory, because the owner pointer goes out of scope and the memory
> >gets deallocated, but there can still be references lingering around
> >somewhere.
> 
> This is why there would need to be a lot of magic happening in the
> compiler. When the compiler can't know, for sure, weather a reference
> that was assigned to a local-scope var is *still* referencing that var
> at the end of the scope (pretty common), then it asks before setting
> assigning to null. If the compiler can't determine, at all, which refs
> will be pointing to a particular local-var, it falls back to regular
> ref-counting (for the var in question).
[...]

Hmm. So you're essentially saying that the compiler will do compile-time
analysis of variable usage, and based on that choose the simplest memory
management model for it?

That's not a bad idea. You can have a series of increasingly complex
management techniques, and the compiler could choose the least complex
one that it can statically prove is safe. So if something can be proven
not to escape the current scope, it can even be stack-allocated (or if
it's big, put it on the manual heap and delete it on scope exit). If it
is returned with no other references to it, then it can be a
single-reference pointer (i.e. your owner type). If it has cyclic
references, use a reference-counted pointer. If all else fails, fall
back to the GC.

The problem is that quite often, the compiler will not be able to prove
much about the data structure, esp. when you have non-trivial
manipulation of pointers -- it requires solving the halting problem to
achieve that level of analysis on arbitrary code -- so it will probably
fall back to the GC quite often, depending on what the code does.


T

-- 
My program has no bugs! Only undocumented features...
January 21, 2013
Re: Memory Safety without a GC or Ref Counting
Am 21.01.2013 22:25, schrieb H. S. Teoh:
> On Mon, Jan 21, 2013 at 09:27:20PM +0100, F i L wrote:
> [...]
>> H. S. Teoh wrote:
> [...]
>>> 3) It is still unsafe: you can have a dangling reference to owned
>>> memory, because the owner pointer goes out of scope and the memory
>>> gets deallocated, but there can still be references lingering around
>>> somewhere.
>>
>> This is why there would need to be a lot of magic happening in the
>> compiler. When the compiler can't know, for sure, weather a reference
>> that was assigned to a local-scope var is *still* referencing that var
>> at the end of the scope (pretty common), then it asks before setting
>> assigning to null. If the compiler can't determine, at all, which refs
>> will be pointing to a particular local-var, it falls back to regular
>> ref-counting (for the var in question).
> [...]
>
> Hmm. So you're essentially saying that the compiler will do compile-time
> analysis of variable usage, and based on that choose the simplest memory
> management model for it?
>
> That's not a bad idea. You can have a series of increasingly complex
> management techniques, and the compiler could choose the least complex
> one that it can statically prove is safe. So if something can be proven
> not to escape the current scope, it can even be stack-allocated (or if
> it's big, put it on the manual heap and delete it on scope exit). If it
> is returned with no other references to it, then it can be a
> single-reference pointer (i.e. your owner type). If it has cyclic
> references, use a reference-counted pointer. If all else fails, fall
> back to the GC.
>
> The problem is that quite often, the compiler will not be able to prove
> much about the data structure, esp. when you have non-trivial
> manipulation of pointers -- it requires solving the halting problem to
> achieve that level of analysis on arbitrary code -- so it will probably
> fall back to the GC quite often, depending on what the code does.
>
>
> T
>


I see how this could work in a JIT environment, but not in a static 
compiler with modules compiled in separate ways.

--
Paulo
January 22, 2013
Re: Memory Safety without a GC or Ref Counting
On 2013-01-21 19:22:04 +0000, Jacob Carlborg <doob@me.com> said:

> On 2013-01-21 18:37, F i L wrote:
>> So I've been thinking about this problem off and on for awhile, and I
>> think I have a rather simple solution. I highly doubt I'm the first to
>> think of it, but haven't found any info about a similar idea on the net.
>> So in the interest of science I'm posting it here so anyone interested
>> can attempt to poke holes in the idea, which you folks have been so good
>> at in the past.
> 
> Seems a bit like ARC (Automatic Reference Counting) that Clang has 
> implemented for Objective-C (and some C libraries on Mac OS X). The 
> compiler automatically inserts calls to "retain" and "release" where 
> appropriate.

And I'll confirm that it's no magic either. Avoiding cyclic references 
isn't that easy when you start mixing implicitly retained pointers with 
Objective-C's blocks (lambas in C++ or delegate literals in D).

I'm a little sad Apple decided to deprecate its Objective-C garbage 
collector. Having the choice between GC and reference counting was 
great (although it surely sucks for library developers who need to 
support both modes).

-- 
Michel Fortin
michel.fortin@michelf.ca
http://michelf.ca/
January 22, 2013
Re: Memory Safety without a GC or Ref Counting
On Mon, Jan 21, 2013 at 11:36:58PM +0100, Paulo Pinto wrote:
> Am 21.01.2013 22:25, schrieb H. S. Teoh:
[...]
> >Hmm. So you're essentially saying that the compiler will do
> >compile-time analysis of variable usage, and based on that choose the
> >simplest memory management model for it?
> >
> >That's not a bad idea. You can have a series of increasingly complex
> >management techniques, and the compiler could choose the least
> >complex one that it can statically prove is safe. So if something can
> >be proven not to escape the current scope, it can even be
> >stack-allocated (or if it's big, put it on the manual heap and delete
> >it on scope exit). If it is returned with no other references to it,
> >then it can be a single-reference pointer (i.e. your owner type). If
> >it has cyclic references, use a reference-counted pointer. If all
> >else fails, fall back to the GC.
> >
> >The problem is that quite often, the compiler will not be able to
> >prove much about the data structure, esp. when you have non-trivial
> >manipulation of pointers -- it requires solving the halting problem
> >to achieve that level of analysis on arbitrary code -- so it will
> >probably fall back to the GC quite often, depending on what the code
> >does.
[...]
> I see how this could work in a JIT environment, but not in a static
> compiler with modules compiled in separate ways.
[...]

It can be made to work with separately-compiled modules if there's
language support (say in the form of exported pointer attributes). But
probably not feasible in the current version of D.


T

-- 
Designer clothes: how to cover less by paying more.
« First   ‹ Prev
1 2
Top | Discussion index | About this forum | D home