Slices and Dynamic Arrays (page 3)

January 02, 2018

Re: Slices and Dynamic Arrays

Posted by Ali Çehreli
in reply to Jonathan M Davis

Permalink

Ali Çehreli

Posted in reply to Jonathan M Davis

Permalink

First, I'm in complete agreement with Steve on this. I wrote a response to you yesterday, which I decided to not send after counting to ten because despite being much more difficult, I see that your view can also be aggreable.

On 01/02/2018 10:02 AM, Jonathan M Davis wrote:
> On Tuesday, January 02, 2018 07:53:00 Steven Schveighoffer via Digitalmars-
> d-learn wrote:
>> On 1/1/18 12:18 AM, Jonathan M Davis wrote:
>>> A big problem with the term slice though is that it means more than just
>>> dynamic arrays - e.g. you slice a container to get a range over it, so
>>> that range is a slice of the container even though no arrays are
>>> involved at all. So, you really can't rely on the term slice meaning
>>> dynamic array. Whether it does or not depends on the context. That
>>> means that the fact that a number of folks have taken to using the term
>>> slice to mean T[] like the D Slices article talks about tends to create
>>> confusion when the context is not clear. IMHO, the D Slices article
>>> should be updated to use the correct terminology, but I don't think
>>> that the author is willing to do that.
>> The problem with all of this is that dynamic array is a defined term
>> *outside* of D [1]. And it doesn't mean exactly what D calls dynamic
>> arrays.
>>
>> This is why it's confusing to outsiders, because they are expecting the
>> same thing as a C++ std::vector, or a Java/.Net ArrayList, etc.

My view as well.

>> And D
>> "array slices" (the proper term IMO) are not the same.

Exactly!

>> I'm willing to change the article to mention "Array slices" instead of
>> just "slices", because that is a valid criticism. But I don't want to
>> change it from slices to dynamic arrays, since the whole article is
>> written around the subtle difference. I think the difference is important.
>>
>> -Steve
>>
>> [1] https://en.wikipedia.org/wiki/Dynamic_array
>
> I completely agree that the distinction between the dynamic array and the
> memory that backs it is critical to understanding the semantics when copying
> arrays around, and anyone who thinks that the dynamic array itself directly
> controls and owns the memory is certainly going to have some problems
> understanding the full semantics, but I don't agree that it's required to
> talk about the underlying GC-allocated memory buffer as being the dynamic
> array for that to be understood - especially when the dynamic array can be
> backed with other memory to begin with and still have the same semantics
> (just with a capacity of 0 and thus guaranteed reallocation upon appending
> or calling reserve). That distinction can be made just fine using the
> official D terminology.

As soon as we call it "dynamic array", I can't help but think "adding elements". Since GC is in the picture when that happens, it's essential to think GC when adding an element is involved.

Further, evident from your description it's a "slice" until you add elements because the underlying memory e.g. can be a stack-allocated fixed-length array.

For these reasons, the interface that the program is using is a "slice". Dynamic array is a different concept owned and implemented by the GC.

> I also don't agree that the way that D uses the term dynamic array
> contradicts the wikipedia article. What it describes is very much how D's
> dynamic arrays behave. It's just that D's dynamic arrays are a bit special
> in that they let the GC manage the memory instead of encapsulating it all in
> the type itself, and copying them slices the memory instead of copying it
> and thus causing an immediate reallocation like you would get with
> std::vector or treating it as a full-on reference type like Java does. But
> the semantics of what happens when you append to a D dynamic array are the
> same as appending to something like std::vector save for the fact that you
> might end up having the capacity filled sooner, because another dynamic
> array referring to the same memory grew into that space, resulting in a
> reallocation - but std::vector would have reallocated as soon as you copied
> it. So, some of the finer details get a bit confusing if you expect a
> dynamic array to behave _exactly_ like std::vector, but at a high level, the
> semantics are basically the same.

You seem to anchor your view of array slices on appending elements to them. I see them mainly as accessors into existing elements. Add to that the fact that a slice does not have instruments itself to manage its memory, it remains a slice for me. Again, dynamic array is a GC thing that works behind the scenes.

I can understand your point of view but I find it more confusing.

> On the basis that you seem to be arguing that D's dynamic arrays aren't
> really dynamic arrays, I could see someone arguing that std::vector isn't a
> dynamic array, because unlike ArrayList, it isn't a reference type and thus
> appending to the copy doesn't append to the original - or the other way
> around; ArrayList isn't a dynamic array, because appending to a "copy"
> affects the original. The semantics of what happens when copying the array
> around are secondary to what being a dynamic array actually means, much as
> they obviously have a significant effect on how you write your code. The
> critical bits are how the memory is continguous and how appending is
> amortized to O(1). The semantics of copying clearly vary considerably
> depending on the exact implementation even if you ignore what D has done.
>
> I think that your article has been a great help, and the fact that you do a
> good job of describing the distinction between T[] and the memory behind it
> is critical. I just disagree with the terminology that you used when you did
> it, and I honestly think that the terminology used has a negative effect on
> understanding and dealing with dynamic arrays backed by non-GC-allocated
> memory,

Is there really such a thing? D's dynamic arrays sit on GC-allocated memory, no?

> because the result seems to be that folks think that there's
> something different about them and how they behave (since they don't point
> to a "dynamic array" as your article uses the term), when in reality,
> there's really no difference in the semantics aside from the fact that their
> capacity is guaranteed to be 0 and thus reallocation is guaranteed upon
> appending or calling reserve, whereas for GC-backed dynamic arrays, capacity
> could be other numbers, and immediate reallocation is not guaranteed any
> more than it's guaranteed not to happen; it depends on the capacity, but you
> can always know when a reallocation is going to occur based on the capacity,
> regardless of what memory backs it.

That description gives the false impression that there are non-GC-backed dynamic arrays as if e.g. they can grow into their larger stack space like std.experimental.allocator.StackFront provides. D's dynamic arrays are not like that at all.

> Yes, if you're dealing with dynamic arrays backed by malloc-ed memory or a
> static array, you're going to have to worry about the lifetime of those
> dynamic arrays differently if you don't want @safety problems, and for
> malloc-ed memory, you're going to have to keep track of a pointer to the
> original memory so that it can be freed later,

What you're describing is a slice into manually managed dynamic array. As soon as you append to it, now there is a GC-owned dynamic array.

> since the GC won't do it for
> you, but all of the semantics of the dynamic array itself are the same.
> regardless of what memory backs it. Now, maybe that's hard enough to
> understand

Yes, it is hard to understand.

> that lots of folks would be misunderstanding that and thinking
> that GC-backed dynamic arrays are inherently different even if your article
> used the terms in the official manner, but I'm convinced that the way that
> it refers to dynamic arrays as being the memory buffer

Of course dynamic arrays cannot be just the memory buffer. There is also the behavior that comes with it, one of which being appending.

> rather than T[]
> itself makes that misunderstanding worse

I've always disaggreed with this.

> as the article as a whole clears up
> other misunderstandings.
>
> Regardless, given that the term slice is a rather overloaded term in D,
> having the article consistently use the term array slice rather than simply
> slice would be an improvement.

I completely agree that "array slice" would be better but I disagree that slice is an overused term. Array slices are the same concept as slices of elements of other container types.

>
> - Jonathan M Davis
>

Regarding your reference to how both the language and the spec use "dynamic array", given how much confusion can be on these concepts, I have no trouble accepting that the terms used in the language and the spec are somewhat arbitrary; they could have easily used "slice" at the time. So, no matter what terms they used, there would be some disagreement. In this case, you're one who is happy with the current terms in the language and the spec but I would be happy if they were changed because I think the current terms are confusing to people who are new to D.

Ali

On Tuesday, January 02, 2018 10:37:17 Ali Çehreli via Digitalmars-d-learn wrote:
> As soon as we call it "dynamic array", I can't help but think "adding elements". Since GC is in the picture when that happens, it's essential to think GC when adding an element is involved.
>
> Further, evident from your description it's a "slice" until you add elements because the underlying memory e.g. can be a stack-allocated fixed-length array.
>
> For these reasons, the interface that the program is using is a "slice". Dynamic array is a different concept owned and implemented by the GC.

Except that from the standpoint of the API, T[] _is_ the dynamic array - just like std::vector is the dynamic array and not whatever its guts are - and the semantics are the same whether it's backed by the GC or by a static array or by malloc-ed memory or whatever. Appending works exactly the same. Reallocation works the same. None of that changes based on whether the dynamic array is backed by GC-allocated memory or not. It's just that the capacity is guaranteed to be 0 if it isn't GC-allocated and so the first append operation is guaranteed to reallocate. The semantics of T[] itself don't change regardless, and most code doesn't need to care one whit about what kind of memory backs the dynamic array. No matter what memory backed it to start with, you get the same appending semantics. You get the same semantics when accessing the data. You get the same semantics when passing the dynamic array around. None of that depends on what kind of memory the dynamic array is a slice of. T[] functions as a dynamic array regardless of what memory backed it to start with, and as such, I completely agree with the spec calling it the dynamic array.

And as soon as you start talking about T[] not being a dynamic array, you get this weird situation where T[] has all of the operations and semantics of a dynamic array, but you're not calling it a dynamic array simply because it happens to be a slice of memory that wasn't GC-allocated. So, you have this type in the type system whose semantics don't care what memory currently backs it and where code will act on it identically whether it's GC-backed or not, but folks want to then act like it's something different and treat it differently just because it happens to not be GC-backed at the moment - and the same function could be called with both GC-backed and non-GC-backed dynamic arrays. The type and its semantics are the same regardless.

Of course, understanding how and when reallocation occurs matters if you want to understand the exact semantics of copying a dynamic array around or when appending or reserve is going to result in a reallocation, but that doesn't necessitate calling the GC-managed buffer the dynamic array. It just requires understanding how it's the GC that manages capacity, reserve, and appending rather than the dynamic array itself. But the API is that of a dynamic array regardless. If it weren't, you couldn't append to T[] any more than you can append to an arbitrary range. As soon as you insist on calling them slices, you're basically talking about them as if they were simply ranges rather that than the container/range hybrid that they are.

Regardless, the fact that they're a container/range hybrid is what makes this such a mess to understand. The semantics actually work fantastically if you understand them, but it sure makes understanding them annoyingly difficult.

- Jonathan M Davis

Forums