April 15, 2012
I'm making good progress on an allocator design. If things come together as I hope, it'll kick some serious ass.

I'm currently looking at four allocation models:

* straight GC, safe (no deallocation)
* GC + the ability to free memory
* malloc/free, unsafe, buyer beware (STL-level safety)
* reference counted (based on either malloc or GC+free)
* region (scoped)

I need to kink out a few details, most important being - is safety part of the allocator or an extricable property that's a different policy? For now I have a related question about orphan ranges.

Consider this motivating example:

void main()
{
    int[] x;
    {
        auto b = new int[20];
        x = b[5 .. $ - 5];
    }
    ... use x ...
}

The range x is a 10-elements range that originates in a 20-element array. There is no safe way to access the original array again, so in a sense the other 10 elements are "lost". That's why I call x an orphan range - a range of which original container is gone.

Built-in arrays work routinely like that, and in fact the originating arrays are not distinguished by type in any way from their ranges (be they orphan or not). The question is, what should std.container do about orphan ranges in general? Should it allow them, disallow them, or leave the decision open (e.g. to be made by the allocator)? Depending on what way we go, the low-level design would be quite different.


Thanks,

Andrei
April 15, 2012
On Sunday, 15 April 2012 at 16:34:12 UTC, Andrei Alexandrescu wrote:
> I'm making good progress on an allocator design. If things come together as I hope, it'll kick some serious ass.
>
> I'm currently looking at four allocation models:
>
> * straight GC, safe (no deallocation)
> * GC + the ability to free memory
> * malloc/free, unsafe, buyer beware (STL-level safety)
> * reference counted (based on either malloc or GC+free)
> * region (scoped)
>
> I need to kink out a few details, most important being - is safety part of the allocator or an extricable property that's a different policy? For now I have a related question about orphan ranges.
>
> Consider this motivating example:
>
> void main()
> {
>     int[] x;
>     {
>         auto b = new int[20];
>         x = b[5 .. $ - 5];
>     }
>     ... use x ...
> }
>
> The range x is a 10-elements range that originates in a 20-element array. There is no safe way to access the original array again, so in a sense the other 10 elements are "lost". That's why I call x an orphan range - a range of which original container is gone.
>
> Built-in arrays work routinely like that, and in fact the originating arrays are not distinguished by type in any way from their ranges (be they orphan or not). The question is, what should std.container do about orphan ranges in general? Should it allow them, disallow them, or leave the decision open (e.g. to be made by the allocator)? Depending on what way we go, the low-level design would be quite different.
>
>
> Thanks,
>
> Andrei

Wow, cool!

To flat out disallow Orphan ranges is imho too restrictive, especially considering we already have a safe solution, what is missing is an unsafe(no overhead) version.

If the design can handle safe/unsafe as a policy for a stack(scoped) based allocator... that would be kick ass, indeed.

April 15, 2012
On 2012-04-15 18:34, Andrei Alexandrescu wrote:
> I'm making good progress on an allocator design. If things come together
> as I hope, it'll kick some serious ass.
>
> I'm currently looking at four allocation models:
>
> * straight GC, safe (no deallocation)
> * GC + the ability to free memory
> * malloc/free, unsafe, buyer beware (STL-level safety)
> * reference counted (based on either malloc or GC+free)
> * region (scoped)
>
> I need to kink out a few details, most important being - is safety part
> of the allocator or an extricable property that's a different policy?
> For now I have a related question about orphan ranges.
>
> Consider this motivating example:
>
> void main()
> {
> int[] x;
> {
> auto b = new int[20];
> x = b[5 .. $ - 5];
> }
> ... use x ...
> }
>
> The range x is a 10-elements range that originates in a 20-element
> array. There is no safe way to access the original array again, so in a
> sense the other 10 elements are "lost". That's why I call x an orphan
> range - a range of which original container is gone.
>
> Built-in arrays work routinely like that, and in fact the originating
> arrays are not distinguished by type in any way from their ranges (be
> they orphan or not). The question is, what should std.container do about
> orphan ranges in general? Should it allow them, disallow them, or leave
> the decision open (e.g. to be made by the allocator)? Depending on what
> way we go, the low-level design would be quite different.
>
>
> Thanks,
>
> Andrei

As you say, built-in arrays work like this. Therefore at least an allocator using the GC should allow this.

-- 
/Jacob Carlborg
April 15, 2012
On 04/15/2012 12:34 PM, Andrei Alexandrescu wrote:
> I'm making good progress on an allocator design. If things come together
> as I hope, it'll kick some serious ass.
>
> I'm currently looking at four allocation models:
>
> * straight GC, safe (no deallocation)
> * GC + the ability to free memory
> * malloc/free, unsafe, buyer beware (STL-level safety)
> * reference counted (based on either malloc or GC+free)
> * region (scoped)
>
> I need to kink out a few details, most important being - is safety part
> of the allocator or an extricable property that's a different policy?
> For now I have a related question about orphan ranges.
>
> ...
> Thanks,
>
> Andrei

How about
* User-defined allocator
??

At the very least, I think we should be able to choose from the above strategies for a given container.  If this isn't done, then there will always be people wanting to re-implement the containers just to accomplish silly stuff like allocating a little differently.

As for safety:
I would probably want to be able to tweak the ways my containers are safe/unsafe.  I'd want safety by default, with the ability to choose my vulnerabilities.  "Safety" is a fairly vague notion, since there are a number of things that could cause memory corruption or undefined behavior.  Allowing me to decide which risk factors to take allows me to decide which causes of corruption/undefined behavior I want to expose myself to.  Once there's a sufficiently large panel of twiddley knobs and toggle switches, then it might make sense to create "policies" like "safe" vs "blazing fast".

As for orphan ranges:
I've always written under the assumption that an array like 'x' would keep 'b' around.  It would be nice if large amounts of memory wasted from orphans could be reused in some way, but not at the expense of a lot of speed.  If large waste from orphans sticks around, a warning in the docs would be nice.

Other note:
For the longest time I've thought that D should do reference counting for types that can be proven to have no reference cycles.  I don't understand why this is difficult, but even if it is it might be worth it.  I don't actually need it personally, but I constantly hear complaints about "oh, D requires garbage collection because you can't use most of Phobos without it".  What they really want, I bet, is deterministic memory management for realtime systems that does not cause thread contention/world stops.  Add reference counting and this will suddenly be doable for what I'd image would be a lot of important stuff in Phobos (think CoW string functions--probably reference counter friendly).  At this point the "reference counted" option gets subsumed into the "straight GC" / "GC+free" options, because the language/compiler would decide at compile-time which strategy to use for a range.

I see D's current coverage of memory management techniques looking like this:

- GC garbage collection: allocation for objects with complicated lifespans, with the caveat of complex allocation/deallocation code (check, not %100 featureful)

- malloc/free for mediocre-speed allocation of objects whose lifespans can't be easily calculated but are known by the programmer (check)

- custom allocators for when everything else fails (check, in most cases)

- memory reuse with slices/immutability which gives us an awesome zero-overhead solution in a number of situations (check)

- reference counting for almost-deterministic allocation of objects with easily calculated lifespans (**D does not cover this case!**)

- stack for fast allocation of short-lived objects (check)
April 15, 2012
On 4/15/2012 9:34 AM, Andrei Alexandrescu wrote:
> Built-in arrays work routinely like that, and in fact the originating arrays are
> not distinguished by type in any way from their ranges (be they orphan or not).
> The question is, what should std.container do about orphan ranges in general?
> Should it allow them, disallow them, or leave the decision open (e.g. to be made
> by the allocator)? Depending on what way we go, the low-level design would be
> quite different.

I suspect that disallowing them would be surprising behavior, as D with array slices supports it well.

April 15, 2012
On 4/15/12 12:50 PM, Chad J wrote:
> How about
> * User-defined allocator
> ??

Yah, the allocator would be a generic parameter. Any allocator that supports a compatible interface with the archetypes used would work. But making e.g. file-backed allocators or whatnot may be difficult.

> As for safety:
> I would probably want to be able to tweak the ways my containers are
> safe/unsafe. I'd want safety by default, with the ability to choose my
> vulnerabilities. "Safety" is a fairly vague notion, since there are a
> number of things that could cause memory corruption or undefined
> behavior. Allowing me to decide which risk factors to take allows me to
> decide which causes of corruption/undefined behavior I want to expose
> myself to. Once there's a sufficiently large panel of twiddley knobs and
> toggle switches, then it might make sense to create "policies" like
> "safe" vs "blazing fast".

Makes sense.

> As for orphan ranges:
> I've always written under the assumption that an array like 'x' would
> keep 'b' around. It would be nice if large amounts of memory wasted from
> orphans could be reused in some way, but not at the expense of a lot of
> speed. If large waste from orphans sticks around, a warning in the docs
> would be nice.

So orphans should be allowed. Wonder if that should be an option or just "it".

> Other note:
> For the longest time I've thought that D should do reference counting
> for types that can be proven to have no reference cycles. I don't
> understand why this is difficult, but even if it is it might be worth
> it. I don't actually need it personally, but I constantly hear
> complaints about "oh, D requires garbage collection because you can't
> use most of Phobos without it". What they really want, I bet, is
> deterministic memory management for realtime systems that does not cause
> thread contention/world stops. Add reference counting and this will
> suddenly be doable for what I'd image would be a lot of important stuff
> in Phobos (think CoW string functions--probably reference counter
> friendly). At this point the "reference counted" option gets subsumed
> into the "straight GC" / "GC+free" options, because the
> language/compiler would decide at compile-time which strategy to use for
> a range.

Agreed. RC is not that difficult, and implementing it in terms of containers should be doable pretty nicely.

> I see D's current coverage of memory management techniques looking like
> this:
>
> - GC garbage collection: allocation for objects with complicated
> lifespans, with the caveat of complex allocation/deallocation code
> (check, not %100 featureful)
>
> - malloc/free for mediocre-speed allocation of objects whose lifespans
> can't be easily calculated but are known by the programmer (check)
>
> - custom allocators for when everything else fails (check, in most cases)
>
> - memory reuse with slices/immutability which gives us an awesome
> zero-overhead solution in a number of situations (check)
>
> - reference counting for almost-deterministic allocation of objects with
> easily calculated lifespans (**D does not cover this case!**)
>
> - stack for fast allocation of short-lived objects (check)

Sounds great. There's quite a few difficulties along the road, but I think the roadmap is good.


Andrei
April 16, 2012
On 4/15/2012 11:43 AM, Walter Bright wrote:
> On 4/15/2012 9:34 AM, Andrei Alexandrescu wrote:
>> Built-in arrays work routinely like that, and in fact the originating arrays are
>> not distinguished by type in any way from their ranges (be they orphan or not).
>> The question is, what should std.container do about orphan ranges in general?
>> Should it allow them, disallow them, or leave the decision open (e.g. to be made
>> by the allocator)? Depending on what way we go, the low-level design would be
>> quite different.
>
> I suspect that disallowing them would be surprising behavior, as D with array
> slices supports it well.
>

I'd add that unless there's a pretty compelling reason not to, the collections should support orphan ranges.
April 16, 2012
On 15.04.2012 20:34, Andrei Alexandrescu wrote:
> I'm making good progress on an allocator design. If things come together
> as I hope, it'll kick some serious ass.
>
> I'm currently looking at four allocation models:
>
> * straight GC, safe (no deallocation)
> * GC + the ability to free memory
> * malloc/free, unsafe, buyer beware (STL-level safety)
> * reference counted (based on either malloc or GC+free)
> * region (scoped)
>
Nice overview. I believe there should adapters on top of that.
Say a pool allocator on top of any of these. Same goes for free-list which is arguably a limited form of pool allocator.

> I need to kink out a few details, most important being - is safety part
> of the allocator or an extricable property that's a different policy?

That was the main Q apparently. Potential answers follow.

> For now I have a related question about orphan ranges.
>
> Consider this motivating example:
>
> void main()
> {
> int[] x;
> {
> auto b = new int[20];

Imagine it's an Array!int(20) from here on. And x is a slice type for it.

> x = b[5 .. $ - 5];
> }
> ... use x ...
> }

Here, obviously the ability to use x not thinking at all of b is largely a property of GC allocator. I'd call it "any reference suffice" property, meaning that no extra effort needed to keep container alive.

Otherwise if say used refcounted scheme then Array *itself* has to be aware of the fact its allocator is !any_reference_suffice thus put a an extra reference into it's slice.

Now scale-up from this low-level issue. All of the above was written in the usual memory-safety-not-negotiable perspective. Call it a policy and enact it by default. Now kill all of extra safety hooks and call it unsafe-speed-comes-first policy.

The answer then in my opinion that it's *both*. Allocator tells what kind of guarantees it has and container then observes if it has to do some extra work to ensure his policy stands (e.g. add extra refs to allow memory safety for orphan ranges). Because by the end of day it's container that provides ranges and thus responsible for this decision.


-- 
Dmitry Olshansky
April 16, 2012
On Sun, 15 Apr 2012 12:34:11 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> I'm making good progress on an allocator design. If things come together as I hope, it'll kick some serious ass.
>
> I'm currently looking at four allocation models:
>
> * straight GC, safe (no deallocation)
> * GC + the ability to free memory
> * malloc/free, unsafe, buyer beware (STL-level safety)
> * reference counted (based on either malloc or GC+free)
> * region (scoped)

* pool/freelist based.  Dcollections has such an allocator (Called a chunk allocator), which I hope would be applicable.  It's in fact the default allocator.

I saw you responded later that it will be a generic parameter.  While I support that, and it is the same model in dcollections, it leaves a lot to be desired.  For instance, if I have a RedBlackTree of SList!int, will each SList have it's own copy of an allocator?  Sure you could use a singleton allocator to avoid overhead, but what if I simply want only the SLists in my RedBlackTree to share an allocator, and not with all SList!int?

I've been occasionally putting some thought into that for dcollections, and I haven't really come up with a good solution yet, but it feels withink reach with some dedicated effort.  I'm interested if you have a solution for this in mind.

> I need to kink out a few details, most important being - is safety part of the allocator or an extricable property that's a different policy? For now I have a related question about orphan ranges.

It's definitely an allocator property.  For example, malloc/free is unsafe, there is no way to make it safe.

> Consider this motivating example:
>
> void main()
> {
>      int[] x;
>      {
>          auto b = new int[20];
>          x = b[5 .. $ - 5];
>      }
>      ... use x ...
> }
>
> The range x is a 10-elements range that originates in a 20-element array. There is no safe way to access the original array again, so in a sense the other 10 elements are "lost". That's why I call x an orphan range - a range of which original container is gone.
>
> Built-in arrays work routinely like that, and in fact the originating arrays are not distinguished by type in any way from their ranges (be they orphan or not). The question is, what should std.container do about orphan ranges in general? Should it allow them, disallow them, or leave the decision open (e.g. to be made by the allocator)? Depending on what way we go, the low-level design would be quite different.

I'd like to slightly correct your statement ;)  b is a 20-element range that originates in a 31-element array (b.capacity is 31).  The GC is the allocator and owner of the array type, which has no formal type.

But I understand the question.  and I think the situation is somewhat different -- most ranges will not be able to be attached spontaneously to different container instances.  Nor will they rely on the allocator to generate extra instances at will to appease the requests.  Built-in arrays/slices are like that.

However, I think absolutely we need orphan range support.  I think it's really a property of the allocator, not the container, as to whether orphan ranges exist.  For proof, look no further than your list of possible allocators at the malloc/free version.  Clearly orphan ranges are not implementable using that back-end, since once you lose your original container reference, the memory is leaked.

If we make it container policy, any container would have to disallow that allocator.  I think that's too restrictive, and makes the malloc/free allocator somewhat useless.

In terms of "allowance", I don't see how you disallow them.  All I think you can do is make using orphaned ranges defined behavior or not.

-Steve
April 16, 2012
On Sunday, April 15, 2012 11:34:11 Andrei Alexandrescu wrote:
> I'm making good progress on an allocator design. If things come together as I hope, it'll kick some serious ass.
> 
> I'm currently looking at four allocation models:
> 
> * straight GC, safe (no deallocation)
> * GC + the ability to free memory
> * malloc/free, unsafe, buyer beware (STL-level safety)
> * reference counted (based on either malloc or GC+free)
> * region (scoped)
> 
> I need to kink out a few details, most important being - is safety part of the allocator or an extricable property that's a different policy? For now I have a related question about orphan ranges.
> 
> Consider this motivating example:
> 
> void main()
> {
> int[] x;
> {
> auto b = new int[20];
> x = b[5 .. $ - 5];
> }
> ... use x ...
> }
> 
> The range x is a 10-elements range that originates in a 20-element array. There is no safe way to access the original array again, so in a sense the other 10 elements are "lost". That's why I call x an orphan range - a range of which original container is gone.
> 
> Built-in arrays work routinely like that, and in fact the originating arrays are not distinguished by type in any way from their ranges (be they orphan or not). The question is, what should std.container do about orphan ranges in general? Should it allow them, disallow them, or leave the decision open (e.g. to be made by the allocator)? Depending on what way we go, the low-level design would be quite different.

So, the problem that we have is basically

void main()
{
 Range r;
 {
 Container c = //allocated using allocator, however that works
 r = c[];
 }
 //Container may or may not exist in memory, depending on the allocator,
 //but the range does exist and may or may not be valid.
}

How is this different from an iterator or range being invalidated after a function is called on the container which alters its state? You could theoretically add plumbing to the range to keep track of whether it's valid or not, but we're not doing that or planning to do that with std.container are we? And if we're not doing that, why would we go out of our way to do so in the case where a custom allocator causes a container to exist for a shorter lifetime than it would normally?

- Jonathan M Davis
« First   ‹ Prev
1 2
Top | Discussion index | About this forum | D home