View mode: basic / threaded / horizontal-split · Log in · Help
April 15, 2012
Orphan ranges
I'm making good progress on an allocator design. If things come together 
as I hope, it'll kick some serious ass.

I'm currently looking at four allocation models:

* straight GC, safe (no deallocation)
* GC + the ability to free memory
* malloc/free, unsafe, buyer beware (STL-level safety)
* reference counted (based on either malloc or GC+free)
* region (scoped)

I need to kink out a few details, most important being - is safety part 
of the allocator or an extricable property that's a different policy? 
For now I have a related question about orphan ranges.

Consider this motivating example:

void main()
{
    int[] x;
    {
        auto b = new int[20];
        x = b[5 .. $ - 5];
    }
    ... use x ...
}

The range x is a 10-elements range that originates in a 20-element 
array. There is no safe way to access the original array again, so in a 
sense the other 10 elements are "lost". That's why I call x an orphan 
range - a range of which original container is gone.

Built-in arrays work routinely like that, and in fact the originating 
arrays are not distinguished by type in any way from their ranges (be 
they orphan or not). The question is, what should std.container do about 
orphan ranges in general? Should it allow them, disallow them, or leave 
the decision open (e.g. to be made by the allocator)? Depending on what 
way we go, the low-level design would be quite different.


Thanks,

Andrei
April 15, 2012
Re: Orphan ranges
On Sunday, 15 April 2012 at 16:34:12 UTC, Andrei Alexandrescu 
wrote:
> I'm making good progress on an allocator design. If things come 
> together as I hope, it'll kick some serious ass.
>
> I'm currently looking at four allocation models:
>
> * straight GC, safe (no deallocation)
> * GC + the ability to free memory
> * malloc/free, unsafe, buyer beware (STL-level safety)
> * reference counted (based on either malloc or GC+free)
> * region (scoped)
>
> I need to kink out a few details, most important being - is 
> safety part of the allocator or an extricable property that's a 
> different policy? For now I have a related question about 
> orphan ranges.
>
> Consider this motivating example:
>
> void main()
> {
>     int[] x;
>     {
>         auto b = new int[20];
>         x = b[5 .. $ - 5];
>     }
>     ... use x ...
> }
>
> The range x is a 10-elements range that originates in a 
> 20-element array. There is no safe way to access the original 
> array again, so in a sense the other 10 elements are "lost". 
> That's why I call x an orphan range - a range of which original 
> container is gone.
>
> Built-in arrays work routinely like that, and in fact the 
> originating arrays are not distinguished by type in any way 
> from their ranges (be they orphan or not). The question is, 
> what should std.container do about orphan ranges in general? 
> Should it allow them, disallow them, or leave the decision open 
> (e.g. to be made by the allocator)? Depending on what way we 
> go, the low-level design would be quite different.
>
>
> Thanks,
>
> Andrei

Wow, cool!

To flat out disallow Orphan ranges is imho too restrictive, 
especially considering we already have a safe solution, what is 
missing is an unsafe(no overhead) version.

If the design can handle safe/unsafe as a policy for a 
stack(scoped) based allocator... that would be kick ass, indeed.
April 15, 2012
Re: Orphan ranges
On 2012-04-15 18:34, Andrei Alexandrescu wrote:
> I'm making good progress on an allocator design. If things come together
> as I hope, it'll kick some serious ass.
>
> I'm currently looking at four allocation models:
>
> * straight GC, safe (no deallocation)
> * GC + the ability to free memory
> * malloc/free, unsafe, buyer beware (STL-level safety)
> * reference counted (based on either malloc or GC+free)
> * region (scoped)
>
> I need to kink out a few details, most important being - is safety part
> of the allocator or an extricable property that's a different policy?
> For now I have a related question about orphan ranges.
>
> Consider this motivating example:
>
> void main()
> {
> int[] x;
> {
> auto b = new int[20];
> x = b[5 .. $ - 5];
> }
> ... use x ...
> }
>
> The range x is a 10-elements range that originates in a 20-element
> array. There is no safe way to access the original array again, so in a
> sense the other 10 elements are "lost". That's why I call x an orphan
> range - a range of which original container is gone.
>
> Built-in arrays work routinely like that, and in fact the originating
> arrays are not distinguished by type in any way from their ranges (be
> they orphan or not). The question is, what should std.container do about
> orphan ranges in general? Should it allow them, disallow them, or leave
> the decision open (e.g. to be made by the allocator)? Depending on what
> way we go, the low-level design would be quite different.
>
>
> Thanks,
>
> Andrei

As you say, built-in arrays work like this. Therefore at least an 
allocator using the GC should allow this.

-- 
/Jacob Carlborg
April 15, 2012
Re: Orphan ranges
On 04/15/2012 12:34 PM, Andrei Alexandrescu wrote:
> I'm making good progress on an allocator design. If things come together
> as I hope, it'll kick some serious ass.
>
> I'm currently looking at four allocation models:
>
> * straight GC, safe (no deallocation)
> * GC + the ability to free memory
> * malloc/free, unsafe, buyer beware (STL-level safety)
> * reference counted (based on either malloc or GC+free)
> * region (scoped)
>
> I need to kink out a few details, most important being - is safety part
> of the allocator or an extricable property that's a different policy?
> For now I have a related question about orphan ranges.
>
> ...
> Thanks,
>
> Andrei

How about
* User-defined allocator
??

At the very least, I think we should be able to choose from the above 
strategies for a given container.  If this isn't done, then there will 
always be people wanting to re-implement the containers just to 
accomplish silly stuff like allocating a little differently.

As for safety:
I would probably want to be able to tweak the ways my containers are 
safe/unsafe.  I'd want safety by default, with the ability to choose my 
vulnerabilities.  "Safety" is a fairly vague notion, since there are a 
number of things that could cause memory corruption or undefined 
behavior.  Allowing me to decide which risk factors to take allows me to 
decide which causes of corruption/undefined behavior I want to expose 
myself to.  Once there's a sufficiently large panel of twiddley knobs 
and toggle switches, then it might make sense to create "policies" like 
"safe" vs "blazing fast".

As for orphan ranges:
I've always written under the assumption that an array like 'x' would 
keep 'b' around.  It would be nice if large amounts of memory wasted 
from orphans could be reused in some way, but not at the expense of a 
lot of speed.  If large waste from orphans sticks around, a warning in 
the docs would be nice.

Other note:
For the longest time I've thought that D should do reference counting 
for types that can be proven to have no reference cycles.  I don't 
understand why this is difficult, but even if it is it might be worth 
it.  I don't actually need it personally, but I constantly hear 
complaints about "oh, D requires garbage collection because you can't 
use most of Phobos without it".  What they really want, I bet, is 
deterministic memory management for realtime systems that does not cause 
thread contention/world stops.  Add reference counting and this will 
suddenly be doable for what I'd image would be a lot of important stuff 
in Phobos (think CoW string functions--probably reference counter 
friendly).  At this point the "reference counted" option gets subsumed 
into the "straight GC" / "GC+free" options, because the 
language/compiler would decide at compile-time which strategy to use for 
a range.

I see D's current coverage of memory management techniques looking like 
this:

- GC garbage collection: allocation for objects with complicated 
lifespans, with the caveat of complex allocation/deallocation code 
(check, not %100 featureful)

- malloc/free for mediocre-speed allocation of objects whose lifespans 
can't be easily calculated but are known by the programmer (check)

- custom allocators for when everything else fails (check, in most cases)

- memory reuse with slices/immutability which gives us an awesome 
zero-overhead solution in a number of situations (check)

- reference counting for almost-deterministic allocation of objects with 
easily calculated lifespans (**D does not cover this case!**)

- stack for fast allocation of short-lived objects (check)
April 15, 2012
Re: Orphan ranges
On 4/15/2012 9:34 AM, Andrei Alexandrescu wrote:
> Built-in arrays work routinely like that, and in fact the originating arrays are
> not distinguished by type in any way from their ranges (be they orphan or not).
> The question is, what should std.container do about orphan ranges in general?
> Should it allow them, disallow them, or leave the decision open (e.g. to be made
> by the allocator)? Depending on what way we go, the low-level design would be
> quite different.

I suspect that disallowing them would be surprising behavior, as D with array 
slices supports it well.
April 15, 2012
Re: Orphan ranges
On 4/15/12 12:50 PM, Chad J wrote:
> How about
> * User-defined allocator
> ??

Yah, the allocator would be a generic parameter. Any allocator that 
supports a compatible interface with the archetypes used would work. But 
making e.g. file-backed allocators or whatnot may be difficult.

> As for safety:
> I would probably want to be able to tweak the ways my containers are
> safe/unsafe. I'd want safety by default, with the ability to choose my
> vulnerabilities. "Safety" is a fairly vague notion, since there are a
> number of things that could cause memory corruption or undefined
> behavior. Allowing me to decide which risk factors to take allows me to
> decide which causes of corruption/undefined behavior I want to expose
> myself to. Once there's a sufficiently large panel of twiddley knobs and
> toggle switches, then it might make sense to create "policies" like
> "safe" vs "blazing fast".

Makes sense.

> As for orphan ranges:
> I've always written under the assumption that an array like 'x' would
> keep 'b' around. It would be nice if large amounts of memory wasted from
> orphans could be reused in some way, but not at the expense of a lot of
> speed. If large waste from orphans sticks around, a warning in the docs
> would be nice.

So orphans should be allowed. Wonder if that should be an option or just 
"it".

> Other note:
> For the longest time I've thought that D should do reference counting
> for types that can be proven to have no reference cycles. I don't
> understand why this is difficult, but even if it is it might be worth
> it. I don't actually need it personally, but I constantly hear
> complaints about "oh, D requires garbage collection because you can't
> use most of Phobos without it". What they really want, I bet, is
> deterministic memory management for realtime systems that does not cause
> thread contention/world stops. Add reference counting and this will
> suddenly be doable for what I'd image would be a lot of important stuff
> in Phobos (think CoW string functions--probably reference counter
> friendly). At this point the "reference counted" option gets subsumed
> into the "straight GC" / "GC+free" options, because the
> language/compiler would decide at compile-time which strategy to use for
> a range.

Agreed. RC is not that difficult, and implementing it in terms of 
containers should be doable pretty nicely.

> I see D's current coverage of memory management techniques looking like
> this:
>
> - GC garbage collection: allocation for objects with complicated
> lifespans, with the caveat of complex allocation/deallocation code
> (check, not %100 featureful)
>
> - malloc/free for mediocre-speed allocation of objects whose lifespans
> can't be easily calculated but are known by the programmer (check)
>
> - custom allocators for when everything else fails (check, in most cases)
>
> - memory reuse with slices/immutability which gives us an awesome
> zero-overhead solution in a number of situations (check)
>
> - reference counting for almost-deterministic allocation of objects with
> easily calculated lifespans (**D does not cover this case!**)
>
> - stack for fast allocation of short-lived objects (check)

Sounds great. There's quite a few difficulties along the road, but I 
think the roadmap is good.


Andrei
April 16, 2012
Re: Orphan ranges
On 4/15/2012 11:43 AM, Walter Bright wrote:
> On 4/15/2012 9:34 AM, Andrei Alexandrescu wrote:
>> Built-in arrays work routinely like that, and in fact the originating arrays are
>> not distinguished by type in any way from their ranges (be they orphan or not).
>> The question is, what should std.container do about orphan ranges in general?
>> Should it allow them, disallow them, or leave the decision open (e.g. to be made
>> by the allocator)? Depending on what way we go, the low-level design would be
>> quite different.
>
> I suspect that disallowing them would be surprising behavior, as D with array
> slices supports it well.
>

I'd add that unless there's a pretty compelling reason not to, the collections 
should support orphan ranges.
April 16, 2012
Re: Orphan ranges
On 15.04.2012 20:34, Andrei Alexandrescu wrote:
> I'm making good progress on an allocator design. If things come together
> as I hope, it'll kick some serious ass.
>
> I'm currently looking at four allocation models:
>
> * straight GC, safe (no deallocation)
> * GC + the ability to free memory
> * malloc/free, unsafe, buyer beware (STL-level safety)
> * reference counted (based on either malloc or GC+free)
> * region (scoped)
>
Nice overview. I believe there should adapters on top of that.
Say a pool allocator on top of any of these. Same goes for free-list 
which is arguably a limited form of pool allocator.

> I need to kink out a few details, most important being - is safety part
> of the allocator or an extricable property that's a different policy?

That was the main Q apparently. Potential answers follow.

> For now I have a related question about orphan ranges.
>
> Consider this motivating example:
>
> void main()
> {
> int[] x;
> {
> auto b = new int[20];

Imagine it's an Array!int(20) from here on. And x is a slice type for it.

> x = b[5 .. $ - 5];
> }
> ... use x ...
> }

Here, obviously the ability to use x not thinking at all of b is largely 
a property of GC allocator. I'd call it "any reference suffice" 
property, meaning that no extra effort needed to keep container alive.

Otherwise if say used refcounted scheme then Array *itself* has to be 
aware of the fact its allocator is !any_reference_suffice thus put a an 
extra reference into it's slice.

Now scale-up from this low-level issue. All of the above was written in 
the usual memory-safety-not-negotiable perspective. Call it a policy and 
enact it by default. Now kill all of extra safety hooks and call it 
unsafe-speed-comes-first policy.

The answer then in my opinion that it's *both*. Allocator tells what 
kind of guarantees it has and container then observes if it has to do 
some extra work to ensure his policy stands (e.g. add extra refs to 
allow memory safety for orphan ranges). Because by the end of day it's 
container that provides ranges and thus responsible for this decision.


-- 
Dmitry Olshansky
April 16, 2012
Re: Orphan ranges
On Sun, 15 Apr 2012 12:34:11 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail@erdani.org> wrote:

> I'm making good progress on an allocator design. If things come together  
> as I hope, it'll kick some serious ass.
>
> I'm currently looking at four allocation models:
>
> * straight GC, safe (no deallocation)
> * GC + the ability to free memory
> * malloc/free, unsafe, buyer beware (STL-level safety)
> * reference counted (based on either malloc or GC+free)
> * region (scoped)

* pool/freelist based.  Dcollections has such an allocator (Called a chunk  
allocator), which I hope would be applicable.  It's in fact the default  
allocator.

I saw you responded later that it will be a generic parameter.  While I  
support that, and it is the same model in dcollections, it leaves a lot to  
be desired.  For instance, if I have a RedBlackTree of SList!int, will  
each SList have it's own copy of an allocator?  Sure you could use a  
singleton allocator to avoid overhead, but what if I simply want only the  
SLists in my RedBlackTree to share an allocator, and not with all  
SList!int?

I've been occasionally putting some thought into that for dcollections,  
and I haven't really come up with a good solution yet, but it feels  
withink reach with some dedicated effort.  I'm interested if you have a  
solution for this in mind.

> I need to kink out a few details, most important being - is safety part  
> of the allocator or an extricable property that's a different policy?  
> For now I have a related question about orphan ranges.

It's definitely an allocator property.  For example, malloc/free is  
unsafe, there is no way to make it safe.

> Consider this motivating example:
>
> void main()
> {
>      int[] x;
>      {
>          auto b = new int[20];
>          x = b[5 .. $ - 5];
>      }
>      ... use x ...
> }
>
> The range x is a 10-elements range that originates in a 20-element  
> array. There is no safe way to access the original array again, so in a  
> sense the other 10 elements are "lost". That's why I call x an orphan  
> range - a range of which original container is gone.
>
> Built-in arrays work routinely like that, and in fact the originating  
> arrays are not distinguished by type in any way from their ranges (be  
> they orphan or not). The question is, what should std.container do about  
> orphan ranges in general? Should it allow them, disallow them, or leave  
> the decision open (e.g. to be made by the allocator)? Depending on what  
> way we go, the low-level design would be quite different.

I'd like to slightly correct your statement ;)  b is a 20-element range  
that originates in a 31-element array (b.capacity is 31).  The GC is the  
allocator and owner of the array type, which has no formal type.

But I understand the question.  and I think the situation is somewhat  
different -- most ranges will not be able to be attached spontaneously to  
different container instances.  Nor will they rely on the allocator to  
generate extra instances at will to appease the requests.  Built-in  
arrays/slices are like that.

However, I think absolutely we need orphan range support.  I think it's  
really a property of the allocator, not the container, as to whether  
orphan ranges exist.  For proof, look no further than your list of  
possible allocators at the malloc/free version.  Clearly orphan ranges are  
not implementable using that back-end, since once you lose your original  
container reference, the memory is leaked.

If we make it container policy, any container would have to disallow that  
allocator.  I think that's too restrictive, and makes the malloc/free  
allocator somewhat useless.

In terms of "allowance", I don't see how you disallow them.  All I think  
you can do is make using orphaned ranges defined behavior or not.

-Steve
April 16, 2012
Re: Orphan ranges
On Sunday, April 15, 2012 11:34:11 Andrei Alexandrescu wrote:
> I'm making good progress on an allocator design. If things come together
> as I hope, it'll kick some serious ass.
> 
> I'm currently looking at four allocation models:
> 
> * straight GC, safe (no deallocation)
> * GC + the ability to free memory
> * malloc/free, unsafe, buyer beware (STL-level safety)
> * reference counted (based on either malloc or GC+free)
> * region (scoped)
> 
> I need to kink out a few details, most important being - is safety part
> of the allocator or an extricable property that's a different policy?
> For now I have a related question about orphan ranges.
> 
> Consider this motivating example:
> 
> void main()
> {
> int[] x;
> {
> auto b = new int[20];
> x = b[5 .. $ - 5];
> }
> ... use x ...
> }
> 
> The range x is a 10-elements range that originates in a 20-element
> array. There is no safe way to access the original array again, so in a
> sense the other 10 elements are "lost". That's why I call x an orphan
> range - a range of which original container is gone.
> 
> Built-in arrays work routinely like that, and in fact the originating
> arrays are not distinguished by type in any way from their ranges (be
> they orphan or not). The question is, what should std.container do about
> orphan ranges in general? Should it allow them, disallow them, or leave
> the decision open (e.g. to be made by the allocator)? Depending on what
> way we go, the low-level design would be quite different.

So, the problem that we have is basically

void main()
{
Range r;
{
Container c = //allocated using allocator, however that works
r = c[];
}
//Container may or may not exist in memory, depending on the allocator,
//but the range does exist and may or may not be valid.
}

How is this different from an iterator or range being invalidated after a 
function is called on the container which alters its state? You could 
theoretically add plumbing to the range to keep track of whether it's valid or 
not, but we're not doing that or planning to do that with std.container are 
we? And if we're not doing that, why would we go out of our way to do so in 
the case where a custom allocator causes a container to exist for a shorter 
lifetime than it would normally?

- Jonathan M Davis
« First   ‹ Prev
1 2
Top | Discussion index | About this forum | D home