Thread overview
Array slices and copy on write
Apr 03, 2011
simendsjo
Apr 03, 2011
spir
Apr 03, 2011
simendsjo
Apr 03, 2011
spir
Apr 03, 2011
Jonathan M Davis
Apr 04, 2011
simendsjo
April 03, 2011
D will copy the original content if a slice expands, but is the following behavior implementation specific, or part of the specification?

	auto a = [0,1,2];
	auto b = a[0..2];
	a.length = 2; // is a[3] already marked for collection by the gc?
	assert(a.ptr == b.ptr);
	b.length += 1; // as b cannot reuse it even though noone is using it
	assert(a.ptr == b.ptr); // ooops, fails.. b is reallocated
April 03, 2011
On 04/03/2011 02:29 PM, simendsjo wrote:
> D will copy the original content if a slice expands, but is the following behavior
> implementation specific, or part of the specification?
>
> 	auto a = [0,1,2];
> 	auto b = a[0..2];
> 	a.length = 2; // is a[3] already marked for collection by the gc?
> 	assert(a.ptr == b.ptr);
> 	b.length += 1; // as b cannot reuse it even though noone is using it
> 	assert(a.ptr == b.ptr); // ooops, fails.. b is reallocated

As I understand it (there is no real spec for D), this is consistent with the often commented design of D slices.
People often explain that slices are like views on (implicite or explicit) ordinary arrays. Then, change on either is seen by both, just like references --*except* when change require resizing.
Another point of view is that slices somewhat work like copy-on-write: there is no actual copy until one variable value is changed, then only copy happens --expect when change can happen on place.

Denis
-- 
_________________
vita es estrany
spir.wikidot.com

April 03, 2011
On 03.04.2011 14:55, spir wrote:
> On 04/03/2011 02:29 PM, simendsjo wrote:
>> D will copy the original content if a slice expands, but is the
>> following behavior
>> implementation specific, or part of the specification?
>>
>> auto a = [0,1,2];
>> auto b = a[0..2];
>> a.length = 2; // is a[3] already marked for collection by the gc?
>> assert(a.ptr == b.ptr);
>> b.length += 1; // as b cannot reuse it even though noone is using it
>> assert(a.ptr == b.ptr); // ooops, fails.. b is reallocated
>
> As I understand it (there is no real spec for D), this is consistent
> with the often commented design of D slices.
> People often explain that slices are like views on (implicite or
> explicit) ordinary arrays. Then, change on either is seen by both, just
> like references --*except* when change require resizing.
> Another point of view is that slices somewhat work like copy-on-write:
> there is no actual copy until one variable value is changed, then only
> copy happens --expect when change can happen on place.
>
> Denis

I know the dangers of slices and copy on write. I was thinking of a very specific example. I'll try to explain better.

auto a = [0,1,2];
auto b = a[0..2];
a.length = 2;
// a[3] should now not be referenced, but my guess is it's not collected by the gc
assert(a.ptr == b.ptr);
b.length += 1;
// b could potentionally use the last part of a instead of reallocating
// Could another implementation reuse a's old memory without reallocating?

April 03, 2011
On 04/03/2011 04:29 PM, simendsjo wrote:
> On 03.04.2011 14:55, spir wrote:
>> On 04/03/2011 02:29 PM, simendsjo wrote:
>>> D will copy the original content if a slice expands, but is the
>>> following behavior
>>> implementation specific, or part of the specification?
>>>
>>> auto a = [0,1,2];
>>> auto b = a[0..2];
>>> a.length = 2; // is a[3] already marked for collection by the gc?
>>> assert(a.ptr == b.ptr);
>>> b.length += 1; // as b cannot reuse it even though noone is using it
>>> assert(a.ptr == b.ptr); // ooops, fails.. b is reallocated
>>
>> As I understand it (there is no real spec for D), this is consistent
>> with the often commented design of D slices.
>> People often explain that slices are like views on (implicite or
>> explicit) ordinary arrays. Then, change on either is seen by both, just
>> like references --*except* when change require resizing.
>> Another point of view is that slices somewhat work like copy-on-write:
>> there is no actual copy until one variable value is changed, then only
>> copy happens --expect when change can happen on place.

I was trying to explain my views on D's slice semantics, thus showing how/why the above behaviour matches it.

> I know the dangers of slices and copy on write. I was thinking of a very
> specific example. I'll try to explain better.
>
> auto a = [0,1,2];
> auto b = a[0..2];
> a.length = 2;
> // a[3] should now not be referenced, but my guess is it's not collected by the gc
> assert(a.ptr == b.ptr);
> b.length += 1;
> // b could potentionally use the last part of a instead of reallocating
> // Could another implementation reuse a's old memory without reallocating?

I think this is how Go does slices (their cap(acity) extends to the end of the original array). For me, this would not match D's semantics, but since there is no spec AFAIK...

Denis
-- 
_________________
vita es estrany
spir.wikidot.com

April 03, 2011
On 2011-04-03 07:29, simendsjo wrote:
> On 03.04.2011 14:55, spir wrote:
> > On 04/03/2011 02:29 PM, simendsjo wrote:
> >> D will copy the original content if a slice expands, but is the
> >> following behavior
> >> implementation specific, or part of the specification?
> >> 
> >> auto a = [0,1,2];
> >> auto b = a[0..2];
> >> a.length = 2; // is a[3] already marked for collection by the gc?
> >> assert(a.ptr == b.ptr);
> >> b.length += 1; // as b cannot reuse it even though noone is using it
> >> assert(a.ptr == b.ptr); // ooops, fails.. b is reallocated
> > 
> > As I understand it (there is no real spec for D), this is consistent
> > with the often commented design of D slices.
> > People often explain that slices are like views on (implicite or
> > explicit) ordinary arrays. Then, change on either is seen by both, just
> > like references --*except* when change require resizing.
> > Another point of view is that slices somewhat work like copy-on-write:
> > there is no actual copy until one variable value is changed, then only
> > copy happens --expect when change can happen on place.
> > 
> > Denis
> 
> I know the dangers of slices and copy on write. I was thinking of a very specific example. I'll try to explain better.
> 
> auto a = [0,1,2];
> auto b = a[0..2];
> a.length = 2;
> // a[3] should now not be referenced, but my guess is it's not collected
> by the gc
> assert(a.ptr == b.ptr);
> b.length += 1;
> // b could potentionally use the last part of a instead of reallocating
> // Could another implementation reuse a's old memory without reallocating?

What the GC does and doesn't choose to collect is implementation-specific and doesn't necessarily have anything to do with the compiler. Also, remember that memory allocation is done in blocks, so there's pretty much no way that the last element of an array would be collected simply because you shrunk the length of the array by one. And you may not want it to anyway. There is at least a chance that b.length += 1 would cause b to use the now unreferenced 3rd element of a, so giving that one element's worth of memory to the OS could be very ineffecient. Regardless, whether b.length +=1 manages to avoid a reallocation is implementation-dependent.

You should _not_ generally rely on when you are or aren't going to get a memory reallocation with an array. If you want to guarantee that a reallocation takes place, use dup or idup. If you want to guarantee that one doesn't take place, then don't increase the size of any array/slice which points to a particular block of memory. Beyond that, you don't really have any guarantees about reallocation. druntime will guarantee that you don't end up with your arrays stomping on each other, and it will try and minimize reallocations, but how it does all that is implementation-dependent.

- Jonathan M  Davis
April 04, 2011
On Sun, 03 Apr 2011 08:29:47 -0400, simendsjo <simen.endsjo@pandavre.com> wrote:

> D will copy the original content if a slice expands, but is the following behavior
> implementation specific, or part of the specification?
>
> 	auto a = [0,1,2];
> 	auto b = a[0..2];
> 	a.length = 2; // is a[3] already marked for collection by the gc?
> 	assert(a.ptr == b.ptr);
> 	b.length += 1; // as b cannot reuse it even though noone is using it
> 	assert(a.ptr == b.ptr); // ooops, fails.. b is reallocated

There are two incorrect assumptions here:

1. The D array runtime code cannot possibly know how many references there are to a certain memory block, so your assertion that a[3] is unreferenced is not something that the compiler or runtime can know without horrendous performance implications.
2. The D GC collects entire memory blocks, not portions of them.  There is no way for the GC to "collect" a[3].  It can only collect the entire memory block which contains a.

The array runtime code does it's best effort to expand an array into its own memory block.  Because slicing an array or copying an array reference is a very fast, efficient event, it is not hooked by the array runtime code.  Therefore, the array runtime code cannot know all the array slices that point to the memory block.  The assumption it MUST make is that somebody somewhere has a pointer to the data it considers valid.  If you think about it, the bookkeeping necessary to track how many pointers/arrays point to each byte of an array would dwarf the cost of storing and using the array in the first place.

You must remember, copy-on-expansion (note, copy-on-write is not the right term, that implies changing any data in the array makes a copy), or rather the lack of copying, is an optimization.  In most cases, you should expect expansion can make a copy, but you shouldn't rely on that behavior.

For those cases where you want to *force* the runtime to consider data not referenced by an array as being 'free', you can use assumeSafeAppend:

b.assumeSafeAppend(); // marks b[3] as being unused

b.length += 1; // no copy made.

-Steve
April 04, 2011
On 04.04.2011 15:53, Steven Schveighoffer wrote:
> On Sun, 03 Apr 2011 08:29:47 -0400, simendsjo
> <simen.endsjo@pandavre.com> wrote:
>
>> D will copy the original content if a slice expands, but is the
>> following behavior
>> implementation specific, or part of the specification?
>>
>> auto a = [0,1,2];
>> auto b = a[0..2];
>> a.length = 2; // is a[3] already marked for collection by the gc?
>> assert(a.ptr == b.ptr);
>> b.length += 1; // as b cannot reuse it even though noone is using it
>> assert(a.ptr == b.ptr); // ooops, fails.. b is reallocated
>
> There are two incorrect assumptions here:
>
> 1. The D array runtime code cannot possibly know how many references
> there are to a certain memory block, so your assertion that a[3] is
> unreferenced is not something that the compiler or runtime can know
> without horrendous performance implications.

Ah, of course. Makes sense.

> 2. The D GC collects entire memory blocks, not portions of them. There
> is no way for the GC to "collect" a[3]. It can only collect the entire
> memory block which contains a.

Given the above, I guessed so.
Then you must copy data to a new array manually in case you want to free excess memory..?

int[] a;
a.length = 10_000_000; // my initial guess
// fill a
a.length = 100; // only 100 used. GC won't ever free the memory not used
auto b = a[0..100].dup; // copy
a = null; // and destroy

Or is it possible to mark part this for the GC as unused?


> The array runtime code does it's best effort to expand an array into its
> own memory block. Because slicing an array or copying an array reference
> is a very fast, efficient event, it is not hooked by the array runtime
> code. Therefore, the array runtime code cannot know all the array slices
> that point to the memory block. The assumption it MUST make is that
> somebody somewhere has a pointer to the data it considers valid. If you
> think about it, the bookkeeping necessary to track how many
> pointers/arrays point to each byte of an array would dwarf the cost of
> storing and using the array in the first place.

Makes sense.

> You must remember, copy-on-expansion (note, copy-on-write is not the
> right term, that implies changing any data in the array makes a copy),
> or rather the lack of copying, is an optimization. In most cases, you
> should expect expansion can make a copy, but you shouldn't rely on that
> behavior.

Copy on expansion is a much better term, yes.

> For those cases where you want to *force* the runtime to consider data
> not referenced by an array as being 'free', you can use assumeSafeAppend:
>
> b.assumeSafeAppend(); // marks b[3] as being unused
>
> b.length += 1; // no copy made.
>
> -Steve

I cannot find assumeSafeAppend.

Thanks a lot for your very thorough and detailed answer. Cleared up a couple of blind spots.
April 04, 2011
On Mon, 04 Apr 2011 14:30:34 -0400, simendsjo <simen.endsjo@pandavre.com> wrote:

> On 04.04.2011 15:53, Steven Schveighoffer wrote:
>> 2. The D GC collects entire memory blocks, not portions of them. There
>> is no way for the GC to "collect" a[3]. It can only collect the entire
>> memory block which contains a.
>
> Given the above, I guessed so.
> Then you must copy data to a new array manually in case you want to free excess memory..?
>
> int[] a;
> a.length = 10_000_000; // my initial guess
> // fill a
> a.length = 100; // only 100 used. GC won't ever free the memory not used
> auto b = a[0..100].dup; // copy
> a = null; // and destroy\

-or-

a = a[0..100].dup;

Yes, the GC/array code does not move the array to a smaller block when it could (note in the example above, the array code again cannot know that nothing has a reference to the full 10,000,000 elements).  Simply because the array runtime cannot know what you intend to do with that array.  It has that block allocated, it's not going to spend extra cycles trying to optimize storage for it just in case that's what you wanted.

People who shrink an array down to 100 then build it back up would (rightfully) be complaining if the runtime reallocated to smaller blocks on shrinking.

> Or is it possible to mark part this for the GC as unused?

The way the GC works, it *could* do this for arrays that span multiple pages (i.e. give back some of the pages to the GC) without moving the data, but there is no mechanism that I know of to do that.  Re-allocating is a good enough solution, I guess nobody has yet had a very good need to do anything else.

>> For those cases where you want to *force* the runtime to consider data
>> not referenced by an array as being 'free', you can use assumeSafeAppend:
>>
>> b.assumeSafeAppend(); // marks b[3] as being unused
>>
>> b.length += 1; // no copy made.
>
> I cannot find assumeSafeAppend.

It's in object.di (along with the documentation for it):

http://www.digitalmars.com/d/2.0/phobos/object.html#assumeSafeAppend

> Thanks a lot for your very thorough and detailed answer. Cleared up a couple of blind spots.

No problem.  Arrays are a very tricky business in D, and I'm still finding new ways of looking at them :)  Having modified most of the array append code, I can say they are certainly one of the more deceptively complex parts of the runtime.

-Steve