View mode: basic / threaded / horizontal-split · Log in · Help
April 03, 2011
Array slices and copy on write
D will copy the original content if a slice expands, but is the following behavior
implementation specific, or part of the specification?

	auto a = [0,1,2];
	auto b = a[0..2];
	a.length = 2; // is a[3] already marked for collection by the gc?
	assert(a.ptr == b.ptr);
	b.length += 1; // as b cannot reuse it even though noone is using it
	assert(a.ptr == b.ptr); // ooops, fails.. b is reallocated
April 03, 2011
Re: Array slices and copy on write
On 04/03/2011 02:29 PM, simendsjo wrote:
> D will copy the original content if a slice expands, but is the following behavior
> implementation specific, or part of the specification?
>
> 	auto a = [0,1,2];
> 	auto b = a[0..2];
> 	a.length = 2; // is a[3] already marked for collection by the gc?
> 	assert(a.ptr == b.ptr);
> 	b.length += 1; // as b cannot reuse it even though noone is using it
> 	assert(a.ptr == b.ptr); // ooops, fails.. b is reallocated

As I understand it (there is no real spec for D), this is consistent with the 
often commented design of D slices.
People often explain that slices are like views on (implicite or explicit) 
ordinary arrays. Then, change on either is seen by both, just like references 
--*except* when change require resizing.
Another point of view is that slices somewhat work like copy-on-write: there is 
no actual copy until one variable value is changed, then only copy happens 
--expect when change can happen on place.

Denis
-- 
_________________
vita es estrany
spir.wikidot.com
April 03, 2011
Re: Array slices and copy on write
On 03.04.2011 14:55, spir wrote:
> On 04/03/2011 02:29 PM, simendsjo wrote:
>> D will copy the original content if a slice expands, but is the
>> following behavior
>> implementation specific, or part of the specification?
>>
>> auto a = [0,1,2];
>> auto b = a[0..2];
>> a.length = 2; // is a[3] already marked for collection by the gc?
>> assert(a.ptr == b.ptr);
>> b.length += 1; // as b cannot reuse it even though noone is using it
>> assert(a.ptr == b.ptr); // ooops, fails.. b is reallocated
>
> As I understand it (there is no real spec for D), this is consistent
> with the often commented design of D slices.
> People often explain that slices are like views on (implicite or
> explicit) ordinary arrays. Then, change on either is seen by both, just
> like references --*except* when change require resizing.
> Another point of view is that slices somewhat work like copy-on-write:
> there is no actual copy until one variable value is changed, then only
> copy happens --expect when change can happen on place.
>
> Denis

I know the dangers of slices and copy on write. I was thinking of a very 
specific example. I'll try to explain better.

auto a = [0,1,2];
auto b = a[0..2];
a.length = 2;
// a[3] should now not be referenced, but my guess is it's not collected 
by the gc
assert(a.ptr == b.ptr);
b.length += 1;
// b could potentionally use the last part of a instead of reallocating
// Could another implementation reuse a's old memory without reallocating?
April 03, 2011
Re: Array slices and copy on write
On 04/03/2011 04:29 PM, simendsjo wrote:
> On 03.04.2011 14:55, spir wrote:
>> On 04/03/2011 02:29 PM, simendsjo wrote:
>>> D will copy the original content if a slice expands, but is the
>>> following behavior
>>> implementation specific, or part of the specification?
>>>
>>> auto a = [0,1,2];
>>> auto b = a[0..2];
>>> a.length = 2; // is a[3] already marked for collection by the gc?
>>> assert(a.ptr == b.ptr);
>>> b.length += 1; // as b cannot reuse it even though noone is using it
>>> assert(a.ptr == b.ptr); // ooops, fails.. b is reallocated
>>
>> As I understand it (there is no real spec for D), this is consistent
>> with the often commented design of D slices.
>> People often explain that slices are like views on (implicite or
>> explicit) ordinary arrays. Then, change on either is seen by both, just
>> like references --*except* when change require resizing.
>> Another point of view is that slices somewhat work like copy-on-write:
>> there is no actual copy until one variable value is changed, then only
>> copy happens --expect when change can happen on place.

I was trying to explain my views on D's slice semantics, thus showing how/why 
the above behaviour matches it.

> I know the dangers of slices and copy on write. I was thinking of a very
> specific example. I'll try to explain better.
>
> auto a = [0,1,2];
> auto b = a[0..2];
> a.length = 2;
> // a[3] should now not be referenced, but my guess is it's not collected by the gc
> assert(a.ptr == b.ptr);
> b.length += 1;
> // b could potentionally use the last part of a instead of reallocating
> // Could another implementation reuse a's old memory without reallocating?

I think this is how Go does slices (their cap(acity) extends to the end of the 
original array). For me, this would not match D's semantics, but since there is 
no spec AFAIK...

Denis
-- 
_________________
vita es estrany
spir.wikidot.com
April 03, 2011
Re: Array slices and copy on write
On 2011-04-03 07:29, simendsjo wrote:
> On 03.04.2011 14:55, spir wrote:
> > On 04/03/2011 02:29 PM, simendsjo wrote:
> >> D will copy the original content if a slice expands, but is the
> >> following behavior
> >> implementation specific, or part of the specification?
> >> 
> >> auto a = [0,1,2];
> >> auto b = a[0..2];
> >> a.length = 2; // is a[3] already marked for collection by the gc?
> >> assert(a.ptr == b.ptr);
> >> b.length += 1; // as b cannot reuse it even though noone is using it
> >> assert(a.ptr == b.ptr); // ooops, fails.. b is reallocated
> > 
> > As I understand it (there is no real spec for D), this is consistent
> > with the often commented design of D slices.
> > People often explain that slices are like views on (implicite or
> > explicit) ordinary arrays. Then, change on either is seen by both, just
> > like references --*except* when change require resizing.
> > Another point of view is that slices somewhat work like copy-on-write:
> > there is no actual copy until one variable value is changed, then only
> > copy happens --expect when change can happen on place.
> > 
> > Denis
> 
> I know the dangers of slices and copy on write. I was thinking of a very
> specific example. I'll try to explain better.
> 
> auto a = [0,1,2];
> auto b = a[0..2];
> a.length = 2;
> // a[3] should now not be referenced, but my guess is it's not collected
> by the gc
> assert(a.ptr == b.ptr);
> b.length += 1;
> // b could potentionally use the last part of a instead of reallocating
> // Could another implementation reuse a's old memory without reallocating?

What the GC does and doesn't choose to collect is implementation-specific and 
doesn't necessarily have anything to do with the compiler. Also, remember that 
memory allocation is done in blocks, so there's pretty much no way that the 
last element of an array would be collected simply because you shrunk the 
length of the array by one. And you may not want it to anyway. There is at 
least a chance that b.length += 1 would cause b to use the now unreferenced 
3rd element of a, so giving that one element's worth of memory to the OS could 
be very ineffecient. Regardless, whether b.length +=1 manages to avoid a 
reallocation is implementation-dependent.

You should _not_ generally rely on when you are or aren't going to get a 
memory reallocation with an array. If you want to guarantee that a 
reallocation takes place, use dup or idup. If you want to guarantee that one 
doesn't take place, then don't increase the size of any array/slice which 
points to a particular block of memory. Beyond that, you don't really have any 
guarantees about reallocation. druntime will guarantee that you don't end up 
with your arrays stomping on each other, and it will try and minimize 
reallocations, but how it does all that is implementation-dependent.

- Jonathan M  Davis
April 04, 2011
Re: Array slices and copy on write
On Sun, 03 Apr 2011 08:29:47 -0400, simendsjo <simen.endsjo@pandavre.com>  
wrote:

> D will copy the original content if a slice expands, but is the  
> following behavior
> implementation specific, or part of the specification?
>
> 	auto a = [0,1,2];
> 	auto b = a[0..2];
> 	a.length = 2; // is a[3] already marked for collection by the gc?
> 	assert(a.ptr == b.ptr);
> 	b.length += 1; // as b cannot reuse it even though noone is using it
> 	assert(a.ptr == b.ptr); // ooops, fails.. b is reallocated

There are two incorrect assumptions here:

1. The D array runtime code cannot possibly know how many references there  
are to a certain memory block, so your assertion that a[3] is unreferenced  
is not something that the compiler or runtime can know without horrendous  
performance implications.
2. The D GC collects entire memory blocks, not portions of them.  There is  
no way for the GC to "collect" a[3].  It can only collect the entire  
memory block which contains a.

The array runtime code does it's best effort to expand an array into its  
own memory block.  Because slicing an array or copying an array reference  
is a very fast, efficient event, it is not hooked by the array runtime  
code.  Therefore, the array runtime code cannot know all the array slices  
that point to the memory block.  The assumption it MUST make is that  
somebody somewhere has a pointer to the data it considers valid.  If you  
think about it, the bookkeeping necessary to track how many  
pointers/arrays point to each byte of an array would dwarf the cost of  
storing and using the array in the first place.

You must remember, copy-on-expansion (note, copy-on-write is not the right  
term, that implies changing any data in the array makes a copy), or rather  
the lack of copying, is an optimization.  In most cases, you should expect  
expansion can make a copy, but you shouldn't rely on that behavior.

For those cases where you want to *force* the runtime to consider data not  
referenced by an array as being 'free', you can use assumeSafeAppend:

b.assumeSafeAppend(); // marks b[3] as being unused

b.length += 1; // no copy made.

-Steve
April 04, 2011
Re: Array slices and copy on write
On 04.04.2011 15:53, Steven Schveighoffer wrote:
> On Sun, 03 Apr 2011 08:29:47 -0400, simendsjo
> <simen.endsjo@pandavre.com> wrote:
>
>> D will copy the original content if a slice expands, but is the
>> following behavior
>> implementation specific, or part of the specification?
>>
>> auto a = [0,1,2];
>> auto b = a[0..2];
>> a.length = 2; // is a[3] already marked for collection by the gc?
>> assert(a.ptr == b.ptr);
>> b.length += 1; // as b cannot reuse it even though noone is using it
>> assert(a.ptr == b.ptr); // ooops, fails.. b is reallocated
>
> There are two incorrect assumptions here:
>
> 1. The D array runtime code cannot possibly know how many references
> there are to a certain memory block, so your assertion that a[3] is
> unreferenced is not something that the compiler or runtime can know
> without horrendous performance implications.

Ah, of course. Makes sense.

> 2. The D GC collects entire memory blocks, not portions of them. There
> is no way for the GC to "collect" a[3]. It can only collect the entire
> memory block which contains a.

Given the above, I guessed so.
Then you must copy data to a new array manually in case you want to free 
excess memory..?

int[] a;
a.length = 10_000_000; // my initial guess
// fill a
a.length = 100; // only 100 used. GC won't ever free the memory not used
auto b = a[0..100].dup; // copy
a = null; // and destroy

Or is it possible to mark part this for the GC as unused?


> The array runtime code does it's best effort to expand an array into its
> own memory block. Because slicing an array or copying an array reference
> is a very fast, efficient event, it is not hooked by the array runtime
> code. Therefore, the array runtime code cannot know all the array slices
> that point to the memory block. The assumption it MUST make is that
> somebody somewhere has a pointer to the data it considers valid. If you
> think about it, the bookkeeping necessary to track how many
> pointers/arrays point to each byte of an array would dwarf the cost of
> storing and using the array in the first place.

Makes sense.

> You must remember, copy-on-expansion (note, copy-on-write is not the
> right term, that implies changing any data in the array makes a copy),
> or rather the lack of copying, is an optimization. In most cases, you
> should expect expansion can make a copy, but you shouldn't rely on that
> behavior.

Copy on expansion is a much better term, yes.

> For those cases where you want to *force* the runtime to consider data
> not referenced by an array as being 'free', you can use assumeSafeAppend:
>
> b.assumeSafeAppend(); // marks b[3] as being unused
>
> b.length += 1; // no copy made.
>
> -Steve

I cannot find assumeSafeAppend.

Thanks a lot for your very thorough and detailed answer. Cleared up a 
couple of blind spots.
April 04, 2011
Re: Array slices and copy on write
On Mon, 04 Apr 2011 14:30:34 -0400, simendsjo <simen.endsjo@pandavre.com>  
wrote:

> On 04.04.2011 15:53, Steven Schveighoffer wrote:
>> 2. The D GC collects entire memory blocks, not portions of them. There
>> is no way for the GC to "collect" a[3]. It can only collect the entire
>> memory block which contains a.
>
> Given the above, I guessed so.
> Then you must copy data to a new array manually in case you want to free  
> excess memory..?
>
> int[] a;
> a.length = 10_000_000; // my initial guess
> // fill a
> a.length = 100; // only 100 used. GC won't ever free the memory not used
> auto b = a[0..100].dup; // copy
> a = null; // and destroy\

-or-

a = a[0..100].dup;

Yes, the GC/array code does not move the array to a smaller block when it  
could (note in the example above, the array code again cannot know that  
nothing has a reference to the full 10,000,000 elements).  Simply because  
the array runtime cannot know what you intend to do with that array.  It  
has that block allocated, it's not going to spend extra cycles trying to  
optimize storage for it just in case that's what you wanted.

People who shrink an array down to 100 then build it back up would  
(rightfully) be complaining if the runtime reallocated to smaller blocks  
on shrinking.

> Or is it possible to mark part this for the GC as unused?

The way the GC works, it *could* do this for arrays that span multiple  
pages (i.e. give back some of the pages to the GC) without moving the  
data, but there is no mechanism that I know of to do that.  Re-allocating  
is a good enough solution, I guess nobody has yet had a very good need to  
do anything else.

>> For those cases where you want to *force* the runtime to consider data
>> not referenced by an array as being 'free', you can use  
>> assumeSafeAppend:
>>
>> b.assumeSafeAppend(); // marks b[3] as being unused
>>
>> b.length += 1; // no copy made.
>
> I cannot find assumeSafeAppend.

It's in object.di (along with the documentation for it):

http://www.digitalmars.com/d/2.0/phobos/object.html#assumeSafeAppend

> Thanks a lot for your very thorough and detailed answer. Cleared up a  
> couple of blind spots.

No problem.  Arrays are a very tricky business in D, and I'm still finding  
new ways of looking at them :)  Having modified most of the array append  
code, I can say they are certainly one of the more deceptively complex  
parts of the runtime.

-Steve
Top | Discussion index | About this forum | D home