Thread overview
Re: Array types not treated uniformly when passed as ranges
Feb 15, 2011
Jonathan M Davis
Feb 15, 2011
Jonathan M Davis
Feb 15, 2011
spir
Feb 15, 2011
Jonathan M Davis
February 15, 2011
On Monday 14 February 2011 18:18:39 jam wrote:
> Hi all,
> 
> Just curious as to the difference in the built-in variable length array vs. the std.container.Array and fixed length arrays when it comes to using them in functions that take Ranges.
> 
> For instance the following does not compile:
> 
> import std.algorithm;
> import std.stdio;
> import std.range;
> import std.conv;
> import std.container;
> import std.array;
> 
> void main() {
> 
>     int[5] builtin_fixed;
>     int[] builtin_variable;
>     Array!(int) con_array;
> 
>     con_array.length(5);
>     builtin_variable.length = 5;
> 
>     fill(builtin_variable, 9); //ok, no error
>     isSorted(builtin_variable); //ditto
> 
>     //The following 4 statements produce errors
>     fill(builtin_fixed, 9);
>     fill(con_array, 9);
> 
>     isSorted(con_array);
>     isSorted(builtin_fixed);
> 
> }
> 
> The errors are variations on:
> 
> Error: template std.algorithm.fill(Range,Value) if
> (isForwardRange!(Range) && is(typeof(range.front = filler))) does not
> match any function template declaration
> Error: template std.algorithm.fill(Range,Value) if
> (isForwardRange!(Range) && is(typeof(range.front = filler))) cannot
> deduce template function from argument types !()(int[5LU],int)
> 
> If I change those 4 statements to:
> 
>     fill(builtin_fixed[], 9);
>     fill(con_array[], 9);
> 
>     isSorted(con_array[]);
>     isSorted(builtin_fixed[]);
> 
> effectively passing ranges (std.container.Array!(int).Array.Range in the case of con_array, and int[] for builtin_fixed) which then works as expected.  This all makes sense, and it's easy enough to write wrappers,   but I would (well and I did) expect the first way to just work.   This may just be a nitpick I guess, but being new to the language this little detour involved quite a bit of time research (not a bad thing, I did learn quite a bit in the process), but makes me wonder if I am missing something fundamental regarding when I should be using these different array types.

It's because an Array is not a range. Dynamic arrays are a bit special in that they're both a container and a range. An Array is just a container. But honestly, you wouldn't really want it to work.

When you pass a dynamic array, it creates a new array which references the old one. That means that as long as you don't increase the size of either array or assign another array to one of them, they will continue to point to the same data and altering one's elements will alter the others, since their elements are the same. However, increasing the size of either array might cause a reallocation (it might not if there's room passed the end of the array) and they would then point to two separate sets of data. This means that you can pass an array to a range-based function and it can continually use popFront or popBack to take a smaller and smaller slice of the array. The elements can be altered if that's what the algorithm does, but altering the size of the passed in array (which popFront and popBack would obviously do) does _not_ alter the size of the original array. So, it's safe for the range-based function to treat the passed in array as a range without altering the original array.

However, this won't work for containers in general. If you pass an Array, you're passing the container. The passed in Array is identical to the original (currently, Array is a struct with reference semantics thanks to internal reference counting, but soon it will become a class and more obviously have reference semantics). As such, if you alter the passed in Array, you alter the original. So, if you were to continuously pop the front off of the passed in Array, you would be continually popping the front off of the original Array. So, a function like find would actually remove the front part of your Array as it processed it, and if what you were trying to find wasn't there, you'd have an empty Array. The way to fix this is to not have Array be a range. Instead, you have a helper type which is a range over it. You get that with the slice operator.

You don't have the problem with arrays that you'd have with user-defined container types, because the semantics of arrays are a bit odd when you copy them. So, if anything were faulty, it would be the built in arrays, not user- defined container types like Array. It is quite handy to have arrays work how they work, however, so that's not likely to change.

So, if you want consistency, use [] even when using arrays. It's the same thing really. The resulting code probably isn't even any different, and you have to do that with static arrays anyway. So, it's just plain more generic to always slice when passing to a range function.

- Jonathan M Davis
February 15, 2011
On Mon, 14 Feb 2011 22:59:52 -0500, Jonathan M Davis <jmdavisProg@gmx.com> wrote:


> It's because an Array is not a range. Dynamic arrays are a bit special in that
> they're both a container and a range. An Array is just a container. But
> honestly, you wouldn't really want it to work.

Dynamic arrays are not containers.  They do not own the data they reference, they just reference that data.  In fact, the owner of the data is really the runtime.

The naming of Array makes this a difficult thing to understand.  An Array owns its data, it manages the data, creates it, destroys it, and there is no way to "slice" the Array into a smaller Array.  It's a true reference type.  A builtin array is not really a container, so it really should be named differently.  But there's no way to change that.

> However, this won't work for containers in general. If you pass an Array, you're
> passing the container. The passed in Array is identical to the original
> (currently, Array is a struct with reference semantics thanks to internal
> reference counting, but soon it will become a class and more obviously have
> reference semantics). As such, if you alter the passed in Array, you alter the
> original. So, if you were to continuously pop the front off of the passed in
> Array, you would be continually popping the front off of the original Array. So,
> a function like find would actually remove the front part of your Array as it
> processed it, and if what you were trying to find wasn't there, you'd have an
> empty Array. The way to fix this is to not have Array be a range. Instead, you
> have a helper type which is a range over it. You get that with the slice
> operator.

This is a good way to explain it.

> You don't have the problem with arrays that you'd have with user-defined
> container types, because the semantics of arrays are a bit odd when you copy
> them. So, if anything were faulty, it would be the built in arrays, not user-
> defined container types like Array. It is quite handy to have arrays work how
> they work, however, so that's not likely to change.

What is faulty is that they are called arrays.  They are slices.

-Steve
February 15, 2011
On Tuesday, February 15, 2011 06:10:57 Steven Schveighoffer wrote:
> On Mon, 14 Feb 2011 22:59:52 -0500, Jonathan M Davis <jmdavisProg@gmx.com>
> 
> wrote:
> > It's because an Array is not a range. Dynamic arrays are a bit special
> > in that
> > they're both a container and a range. An Array is just a container. But
> > honestly, you wouldn't really want it to work.
> 
> Dynamic arrays are not containers.  They do not own the data they reference, they just reference that data.  In fact, the owner of the data is really the runtime.
> 
> The naming of Array makes this a difficult thing to understand.  An Array owns its data, it manages the data, creates it, destroys it, and there is no way to "slice" the Array into a smaller Array.  It's a true reference type.  A builtin array is not really a container, so it really should be named differently.  But there's no way to change that.
> 
> > However, this won't work for containers in general. If you pass an
> > Array, you're
> > passing the container. The passed in Array is identical to the original
> > (currently, Array is a struct with reference semantics thanks to internal
> > reference counting, but soon it will become a class and more obviously
> > have
> > reference semantics). As such, if you alter the passed in Array, you
> > alter the
> > original. So, if you were to continuously pop the front off of the
> > passed in
> > Array, you would be continually popping the front off of the original
> > Array. So,
> > a function like find would actually remove the front part of your Array
> > as it
> > processed it, and if what you were trying to find wasn't there, you'd
> > have an
> > empty Array. The way to fix this is to not have Array be a range.
> > Instead, you
> > have a helper type which is a range over it. You get that with the slice
> > operator.
> 
> This is a good way to explain it.
> 
> > You don't have the problem with arrays that you'd have with user-defined
> > container types, because the semantics of arrays are a bit odd when you
> > copy
> > them. So, if anything were faulty, it would be the built in arrays, not
> > user-
> > defined container types like Array. It is quite handy to have arrays
> > work how
> > they work, however, so that's not likely to change.
> 
> What is faulty is that they are called arrays.  They are slices.

Yeah. I guess that arrays aren't really technically containers in that they don't actually own the memory that they refer to. Arrays are just a bit weird overall (not necessarily badly designed, just a bit weird). And there really is no difference in D between a slice and an array as far as arrays go, which may not be clear to everyone...

The overal result is that arrays are actually pretty easy to use and definitely very powerful, but discussing and explaining them can get a bit entertaining.

- Jonathan M Davis
February 15, 2011
On 02/15/2011 03:10 PM, Steven Schveighoffer wrote:
> On Mon, 14 Feb 2011 22:59:52 -0500, Jonathan M Davis <jmdavisProg@gmx.com> wrote:
>
>
>> It's because an Array is not a range. Dynamic arrays are a bit special in that
>> they're both a container and a range. An Array is just a container. But
>> honestly, you wouldn't really want it to work.
>
> Dynamic arrays are not containers. They do not own the data they reference,
> they just reference that data. In fact, the owner of the data is really the
> runtime.
>
> The naming of Array makes this a difficult thing to understand. An Array owns
> its data, it manages the data, creates it, destroys it, and there is no way to
> "slice" the Array into a smaller Array. It's a true reference type. A builtin
> array is not really a container, so it really should be named differently. But
> there's no way to change that.

> [snip]

>> You don't have the problem with arrays that you'd have with user-defined
>> container types, because the semantics of arrays are a bit odd when you copy
>> them. So, if anything were faulty, it would be the built in arrays, not user-
>> defined container types like Array. It is quite handy to have arrays work how
>> they work, however, so that's not likely to change.
>
> What is faulty is that they are called arrays. They are slices.

It took me some time to really understand builtin dynamic arrays, but since then have had no issue with them. Instead, I now appreciate their "bastard" semantics/behaviour ;-) But I agree they're weird, hard to explain, and their relation to ranges maybe still lacks integration.

How would speak about static arrays, from this point of view? Do you also consider them rather slices (view upon external data?) than containers in the strict sense of the term?

denis
-- 
_________________
vita es estrany
spir.wikidot.com

February 15, 2011
On Tuesday, February 15, 2011 13:15:22 spir wrote:
> On 02/15/2011 03:10 PM, Steven Schveighoffer wrote:
> > On Mon, 14 Feb 2011 22:59:52 -0500, Jonathan M Davis <jmdavisProg@gmx.com>
wrote:
> >> It's because an Array is not a range. Dynamic arrays are a bit special in that they're both a container and a range. An Array is just a container. But honestly, you wouldn't really want it to work.
> > 
> > Dynamic arrays are not containers. They do not own the data they reference, they just reference that data. In fact, the owner of the data is really the runtime.
> > 
> > The naming of Array makes this a difficult thing to understand. An Array owns its data, it manages the data, creates it, destroys it, and there is no way to "slice" the Array into a smaller Array. It's a true reference type. A builtin array is not really a container, so it really should be named differently. But there's no way to change that.
> > 
> > [snip]
> > 
> >> You don't have the problem with arrays that you'd have with user-defined container types, because the semantics of arrays are a bit odd when you copy them. So, if anything were faulty, it would be the built in arrays, not user- defined container types like Array. It is quite handy to have arrays work how they work, however, so that's not likely to change.
> > 
> > What is faulty is that they are called arrays. They are slices.
> 
> It took me some time to really understand builtin dynamic arrays, but since then have had no issue with them. Instead, I now appreciate their "bastard" semantics/behaviour ;-) But I agree they're weird, hard to explain, and their relation to ranges maybe still lacks integration.
> 
> How would speak about static arrays, from this point of view? Do you also consider them rather slices (view upon external data?) than containers in the strict sense of the term?

No, they _are_ containers. They're value types. If they go away, their memory goes away. They _do_ own their data. That's why you have to slice them (and therefore get a dynamic array to them) when you pass them to range functions.

- Jonathan M Davis