April 30, 2009
On Thu, 30 Apr 2009 15:21:52 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> Robert Jacques wrote:
>> On Thu, 30 Apr 2009 14:40:42 -0400, Rainer Deyke <rainerd@eldwood.com> wrote:
>>> In addition, there's this: suppose you have a struct containing a
>>> (mutable) array.  When you make a copy of that struct, you will almost
>>> always want to make a copy of the contained array.  Therefore value
>>> semantics should be the default, because it simplifies the most common
>>> use case.
>>  That's what struct ctors/dtors/opAssign are for. Personally, I think letting the struct designer decide the behaviour is better.
>
> The problem is what to do for the containers in Phobos.
>
> Andrei

Yes. I agree. Sorry, looking back, I interpreted what Rainer was talking about as applying to all structs and not as a supporting argument. It would have helped my understanding if the argument had been concrete, i.e. grounded in Rainer's actual experiences, rather than a broad over-generalization.

Now, back on topic. I actually make heavy use of what amounts to structs containing arrays and don't want a deep copy >90% of the time. In fact, I haven't used any of the (my array-my array) deep copy functions and I only do copying when I have to convert my arrays to D style arrays or other container types. Also, D arrays are structs that contain C style arrays and I almost never dup them in final code. When I debug/prototype I do tend to use []= and dup more often. So I'd disagree that you'd almost always want to make a copy. Second, I don't think we should simplify the most common use case, but instead the majority of use cases. This is partly semantics, but partly not: everybody uses containers differently.

Also, Andrei, I would have preferred it if you had separated the discussion on arrays from Phobos containers. Many languages have arrays and a separate container library, and I feel the decision of whether D's arrays should behave like Phobos' containers would be best left until after the best decision for Phobos is made.
April 30, 2009
On Thu, 30 Apr 2009 17:23:19 -0400, Rainer Deyke <rainerd@eldwood.com> wrote:
> Robert Jacques wrote:
>> On Thu, 30 Apr 2009 14:40:42 -0400, Rainer Deyke <rainerd@eldwood.com>
>> wrote:
>>> In addition, there's this: suppose you have a struct containing a
>>> (mutable) array.  When you make a copy of that struct, you will almost
>>> always want to make a copy of the contained array.  Therefore value
>>> semantics should be the default, because it simplifies the most common
>>> use case.
>>
>> That's what struct ctors/dtors/opAssign are for. Personally, I think
>> letting the struct designer decide the behaviour is better.
>
> I hope you're not suggesting writing ctor/dtor/opAssign for every single
> struct that uses arrays as value types.

I wasn't _suggesting_ anything. This is how D currently works, and it is how any D library you're using that has structs using arrays as value types works. Being surprised at my response gives me the feeling that you've never needed to code what you're talking about. If you have, your concrete example would be helpful as a use case in this discussion.

> It's possible to write a reference wrapper around a value type.  It's
> also possible to write a value wrapper around a reference type.
> However, the former is both easier and more efficient than the latter.

Yes and no. Yes, the deep copy mixin is more complex to write the first time, but after that it's easy to use. And second, it's just as efficient as the latter, since the deep copy mixin generates the same code as a value type.
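
Something like this is what I have in mind (a rough sketch, not library code; the names are mine):

mixin template DeepCopy()
{
    this(this)
    {
        // dup every mutable dynamic array member so copies don't share storage
        foreach (ref field; this.tupleof)
        {
            static if (is(typeof(field) : E[], E))
                field = field.dup;
        }
    }
}

struct Matrix
{
    double[] data;
    size_t rows, cols;
    mixin DeepCopy;   // postblit generated once, reused everywhere
}

Write it once, mix it in wherever you want value semantics.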

April 30, 2009
Robert Jacques wrote:
> Now, back on topic. I actually make heavy use of what amounts to structs containing arrays and don't want a deep copy >90% of the time. In fact, I haven't used any of the (my array-my array) deep copy functions and I only do copying when I have to convert my arrays to D style arrays or other container types. Also, D arrays are structs that contain C style arrays and I almost never dup them in final code. When I debug/prototype I do tend to use []= and dup more often. So I'd disagree that you'd almost always want to make a copy. Second, I don't think we should simplify the most common use case, but instead the majority of use cases. This is partly semantics, but partly not: everybody uses containers differently.

It must be a matter of style. Having used both semantics extensively, I personally think one never looks back after using value containers. When you fail to pass by reference there's a performance hit and a logic bug (changes are not effected). Both are local issues that manifest immediately, close to the place of the mistake, and are trivial to fix. But undue sharing is a global, long-distance issue - one much harder to tackle.
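
To make the contrast concrete, here is a minimal sketch (a toy wrapper of my own, not a proposed Phobos type):

struct ValArr
{
    int[] data;
    this(this) { data = data.dup; }   // value semantics via postblit
}

void scale(ValArr a) { a.data[] *= 2; }   // forgot "ref": caller sees no change
void scale(int[] a)  { a[] *= 2; }        // caller's data changes from afar

void main()
{
    auto v = ValArr([1, 2, 3]);
    scale(v);              // v.data is still [1, 2, 3]: a slowdown and a logic
                           // bug, but both show up right here at the call site
    int[] r = [1, 2, 3];
    auto elsewhere = r;    // some distant part of the program holds this too
    scale(r);              // elsewhere is now [2, 4, 6] - the sharing surprise
}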

I agree that reference semantics is what people got used to in Java and other languages. For entity types, reference semantics make a ton of sense, but I think containers are not entity classes most of the time. But I also agree that through experience one can design around undue sharing. But it should be remembered that it took years to shake off security bugs in the JDK caused by undue sharing. (One that comes to mind was a sensitive linked list returned by a system API. User code could maliciously change it to mess with the system's security manager. Solution? They returned a compulsory copy of the list.)

So I'm not sure where this leaves us. I hear two good arguments:

* Some aren't bothered by reference semantics
* We are used to reference semantics from other languages

These are good arguments, but don't quite end the discussion.

> Also, Andrei, I would had preferred it if you had separated the discussion on arrays from Phobos containers. Many languages have arrays and a separate container library and I feel the decision of whether D's arrays should behave like Phobos' containers would be best left to after the best decision for Phobos is made.

*If* we add a new array built-in type, that will fundamentally affect Phobos because Phobos will design all containers to be like the pair array-slice. If we don't add a new built-in type, then I only have Phobos to discuss.

Also something that wasn't discussed that much is the connection of whatever design we devise with the GC. I am mightily excited by the possibility of operating without a GC or with tight reference counting, and I thought many around here would share that excitement. If we go for non-GC ways of memory management, that will probably affect container design.


Andrei
April 30, 2009
Andrei Alexandrescu wrote:
> A design that has one container type and several slice types that crawl it in various ways is very compelling, and I think it will be a huge selling point for D. It would be odd, not natural, to have arrays as a singularly distinct container that is in fact its own range. We get away with container == range for arrays partly because arrays are a simple structure, but that only blurs thinking and understanding of more complex containers. In fact, ranges do predict that any attempt to grow a range will cause topological issues in the owning container; that's why SList wisely (IMHO :o)) includes a template parameter that tells whether its topology is fixed or flexible.

It sounds like you’re suggesting that arrays become like all containers, and that slices become a kind of range over the array.

Functions that now take arrays (which are possibly slices) and write to them but don’t shrink or append to them, should therefore be rewritten to take the range that covers the array.

B.T.W.:  What’s the syntax for a range that accesses the entire container in index order?

—Joel Salomon
April 30, 2009
Rainer Deyke Wrote:

> Michel Fortin wrote:
> > On 2009-04-29 23:01:39 -0400, Rainer Deyke <rainerd@eldwood.com> said:
> >> Andrei Alexandrescu wrote:
> >>> 3. Reference semantics
> >> I'm strongly opposed to this option.  Either of the other options would be acceptable.
> > 
> > Andrei mentioned a couple of reasons against 3, which are yours?
> 
> I agree with all of Andrei's reasons.
> 
> In addition, there's this: suppose you have a struct containing a (mutable) array.  When you make a copy of that struct, you will almost always want to make a copy of the contained array.  Therefore value semantics should be the default, because it simplifies the most common use case.

I don't think copying an array member in a struct is as automatic as you imply. We really should figure out use cases that would or would not benefit from a copy. I'm sure we can find lazy range examples where copying would be a performance killer. Also, how deeply should copying go? Does it make sense to do a shallow copy of an array of reference types? What about passing an array into a function?
May 01, 2009
On Thu, 30 Apr 2009 15:57:17 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> Robert Jacques wrote:
>> On Wed, 29 Apr 2009 20:45:23 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
>>>
>>> It has become increasingly clear we'll need to support arrays in
>>> addition to slices.
>>  No, Andrei, it hasn't. A detailed paragraph (or more) explaining why you think so should be included in the full RFC.
>
> There are several converging pieces of evidence. Each can be debated in separation, yet together they make for a rather strong case.
>
> People have complained about the incredibly bad, unsafe, and slow behavior of appending. Even before the considerable recent slowdown of ~=, there were speed concerns. Worse than that, the completely weird behavior of slices that grow to stomp on other slices cannot be defended.

Actually, this is a problem due to an optimization in the current GC. Fixing it is rather trivial: simply store the maximum requested array end at either the beginning or the end of the memory block.
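
For concreteness, this is the behavior in question, with the fix I'm suggesting in the comments (a sketch of what the current runtime does; the exact result depends on the block's capacity):

void main()
{
    int[] a = [1, 2, 3, 4];
    int[] b = a[0 .. 2];   // b views the same memory block as a
    b ~= 99;               // the block still has room past b's end, so the
                           // runtime appends in place - and stomps a[2]
    // If the block also recorded the maximum requested array end (just past
    // a[3]), the runtime could see that b does not end there and reallocate
    // instead of writing over live data.
}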

> The idiom:
>
> array.length = n;
> array.length = 0;
> // append to array up to n...
>
> is rather disingenuous and looks odd to the casual reader.

My view is that this is an argument for a reserve function:
array.reserve = n;
// append to array up to n...

> As predicted by the "range theory" which predicates that a range will never grow, only shrink, slices should always shrink. Growing is a foreign operation for a slice. This is a big problem that goes beyond a need for purity or cleanliness - it's just terrible to have a slice stomping around, and then uncontrollably being rebound elsewhere and so on. People pass slices into functions, functions modify them and also append to them, and depending on the phase of the moon, some updates are being seen in the caller.

I agree this is a problem, but the bugs are bugs in the implementation, not design. (see above)
But the bigger issue [ that (x ~ y) may or may not create a new array ], resulting in either unseen intended updates or seen unintended updates, affects both array and slice types. Note that with the bugs fixed, this would only occur when concatenating to the logically complete array, which is why I say it affects array and slice types equally. Value semantics might help, though.
I understand where you're coming from with "range theory", though I don't personally see that slices are impure: I view (x~y) as creating a logically new array and returning a view of it.

> So people have defined various types (such as ArrayCreator or ArrayAppender or whatnot) that take care of that aspect. But really what we're looking at is an Array type.

An Array type doesn't address the needs of ArrayCreator/ArrayAppender. The reason ArrayAppender exists is to optimize finding the array capacity. While this can be added to an array type, it can just as easily be added to a slice type. The main reason this enhancement got shot down long ago was that the D community thought making arrays 3 words long, instead of 2, would result in a net performance loss. Personally, I'd love to see some real numbers proving or disproving this.
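
To illustrate, a minimal appender amounts to this (just the idea, not the actual ArrayAppender API):

struct Appender(T)
{
    private T[] buffer;    // allocated storage
    private size_t used;   // how much of it holds real elements

    void put(T value)
    {
        if (used == buffer.length)
            buffer.length = buffer.length ? buffer.length * 2 : 4;   // grow
        buffer[used++] = value;   // no capacity query on the common path
    }

    T[] data() { return buffer[0 .. used]; }
}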

> Another issue is that we can get (partly) away with disowned slices because of the garbage collector: whenever a slice is to be rebound, whatever chunk it was bound to is left behind and will be duly collected. If, however, we want to support other programming disciplines that exercise stricter control over memory, the "parent" arrays must become an explicit type that code can keep around and manipulate directly.

I'm a little sad to hear you say this, as both Daniel Keep and I posted sample pseudocode on how to do slice types properly in both reference-counted and manually managed memory systems. To reiterate:

// (pseudocode: "T[]" stands for the built-in slice type, grown by one word)
struct T[] {
    T* ptr;
    size_t length;
    MemInfo* memInfo;   // points at the header of the owning memory block
}

struct MemInfo {
    size_t refCount;    // not included if not reference counting
    size_t capacity;
}

With MemInfo located at the start of the array's memory block (i.e. COM style).
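
And a rough sketch of how ~= could consult such a header (the "used" field and the names are my additions for illustration, not the exact layout above):

struct MemInfo
{
    size_t refCount;   // drop if not reference counting
    size_t capacity;   // how many elements the block can hold
    size_t used;       // maximum requested array end, as suggested earlier
}

struct Slice(T)
{
    T* ptr;
    size_t length;
    MemInfo* memInfo;  // header sits at the start of the block

    void opCatAssign(T value)
    {
        T* blockData = cast(T*)(memInfo + 1);   // data follows the header
        bool endsAtUsed = ptr + length == blockData + memInfo.used;
        if (endsAtUsed && memInfo.used < memInfo.capacity)
        {
            ptr[length++] = value;   // safe in-place append, no stomping
            ++memInfo.used;
        }
        else
        {
            // another slice owns the bytes past our end, or the block is
            // full: allocate a fresh block, copy, and rebind (omitted)
            assert(0, "reallocation path omitted in this sketch");
        }
    }
}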

> A design that has one container type and several slice types that crawl it in various ways is very compelling, and I think it will be a huge selling point for D. It would be odd, not natural, to have arrays as a singularly distinct container that is in fact its own range. We get away with container == range for arrays partly because arrays are a simple structure, but that only blurs thinking and understanding of more complex containers. In fact, ranges do predict that any attempt to grow a range will cause topological issues in the owning container; that's why SList wisely (IMHO :o)) includes a template parameter that tells whether its topology is fixed or flexible.

I'd disagree with regard to a selling point/compelling. My lab maintains an open source C++ array/slice type pair as part of a larger robotics package. It's laid out with an ArrayBase containing all the implementation. The Array type simply extends ArrayBase and adds memory management, and Slice extends ArrayBase, only adding binding capabilities. Save for memory allocation, they are identical, i.e. the container and range are interchangeable.

> (Last but not least, .NET has arrays and slices. D's slices that actually have an array testis mesh real bad with .NET because in .NET you can't have a disowned slice.)

I've been following the blog. I was surprised that Cristian didn't just make all D arrays .NET ArraySegments and be done with it, but there are probably issues I don't know about. I'd like to hear more on what exactly the issue was.
May 01, 2009
Joel C. Salomon wrote:
> Andrei Alexandrescu wrote:
>> A design that has one container type and several slice types that crawl
>> it in various ways is very compelling, and I think it will be a huge
>> selling point for D. It would be odd, not natural, to have arrays as a
>> singularly distinct container that is in fact its own range. We get away
>> with container == range for arrays partly because arrays are a simple
>> structure, but that only blurs thinking and understanding of more
>> complex containers. In fact, ranges do predict that any attempt to grow
>> a range will cause topological issues in the owning container; that's
>> why SList wisely (IMHO :o)) includes a template parameter that tells
>> whether its topology is fixed or flexible.
> 
> It sounds like you’re suggesting that arrays become like all containers,
> and that slices become a kind of range over the array.

Yes. I think that that would be a good design.

> Functions that now take arrays (which are possibly slices) and write to
> them but don’t shrink or append to them, should therefore be rewritten
> to take the range that covers the array.

To clarify: T[] is a slice. It has one array testis in the form of the infamous ~=. And it has the problems that I described because of that gender ambiguity.

Most functions manipulate ranges (and therefore slices), not full array objects. So the addition of arrays would not be disruptive.

> B.T.W.:  What’s the syntax for a range that accesses the entire
> container in index order?

To get the "default" range for a container c, you write c[]. foreach automatically calls it. For a T[], asking a = b[] is an identity function. For a T[5], it happens to do the right thing.

Allow me to bring up the std.range.SListRange example again, because it's a very good example outside of arrays. An approach used e.g. in LISP is to conflate lists as containers with lists as iterators in containers. So when you pass a list into a function, it can not only look at and change the elements that the list contains, but also rewire the nodes in the list any way it wants. The root of the list itself is a bit special because it is usually passed "by value", so the caller will hold the root even if the callee wants to change that. This is not a huge trouble for LISP, as lists are the focal data structure, but the behavior comes off as a bit odd for someone who'd like to manipulate lists and other structures in uniform ways.

So there comes the notion that the container is concerned with topology: how slots "sit" in memory, how they are arranged with respect to each other, etc. The exact ways to manipulate topology and the tradeoffs involved are container-specific (for example, it's cheap to prepend to a list but not to an array). Ranges, on the other hand, do not allow topological changes - they only offer uniform primitives for accessing the elements of containers.

So a SListRange!(int, Topology.flexible) would be a container and offer topological changes such as insertAfter. When such a list is asked for a range (via c[]), it will return a SListRange!(int, Topology.fixed). That guy will know how to move about the host list, but would be unable to effect any changes to its topology. As such, SListRange!(int, Topology.fixed) is as usable as any other forward range.
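
In compressed form, the idea looks roughly like this (a sketch, not the real std.range code; Node and the exact primitives are mine):

enum Topology { fixed, flexible }

private struct SListNode(T) { T value; SListNode!T* next; }

struct SListRange(T, Topology top)
{
    private SListNode!T* head;

    // range primitives, available to both flavors
    bool empty() { return head is null; }
    T front() { return head.value; }
    void popFront() { head = head.next; }

    static if (top == Topology.flexible)
    {
        // topological operations live only on the container flavor
        void insertAfter(T value)
        {
            assert(!empty);
            head.next = new SListNode!T(value, head.next);
        }

        // c[] hands out a range that can walk the list but not rewire it
        SListRange!(T, Topology.fixed) opSlice()
        {
            return SListRange!(T, Topology.fixed)(head);
        }
    }
}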

Does any of this make sense? :o)


Andrei
May 01, 2009
Robert Jacques wrote:
> On Thu, 30 Apr 2009 15:57:17 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
>> People have complained about the incredibly bad, unsafe, and slow behavior of appending. Even before the considerable recent slowdown of ~=, there were speed concerns. Worse than that, the completely weird behavior of slices that grow to stomp on other slices cannot be defended.
> 
> Actually, this is a problem due to an optimization in the current GC. Fixing it is rather trivial: simply store the maximum requested array end at either the beginning or end of the memory block.

How does an arbitrary slice get to the beginning/end of the memory block? I guess that's not a fat pointer anymore, it's an obese pointer.

> But the bigger [ (x ~ y) may or may not create a new array ]

x ~ y always creates a new array.

> I'm a little sad to hear you say this, as both Daniel Keep and I posted sample pseudo code on how to do slice types properly in both reference counted and manual managed memory systems.

No need to be sad. The code is obvious. What I was saying is that people actually need to get their hand on the container so they can control its scope.

>> A design that has one container type and several slice types that crawl it in various ways is very compelling, and I think it will be a huge selling point for D. It would be odd, not natural, to have arrays as a singularly distinct container that is in fact its own range. We get away with container == range for arrays partly because arrays are a simple structure, but that only blurs thinking and understanding of more complex containers. In fact, ranges do predict that any attempt to grow a range will cause topological issues in the owning container; that's why SList wisely (IMHO :o)) includes a template parameter that tells whether its topology is fixed or flexible.
> 
> I'd disagree with regard to a selling point/compelling. My lab maintains an open source C++ array/slice type pair as part of a larger robotics package. It's laid out with an ArrayBase containing all the implementation. The Array type simply extends ArrayBase and adds memory management, and Slice extends ArrayBase, only adding binding capabilities. Save for memory allocation, they are identical, i.e. the container and range are interchangeable.

How about a hashtable? A tree? Are the ranges crawling them identical with the containers? Not at all.

Arrays are probably the worst example to use as an archetype for a design because they are very simple. Pretty much any design looks a bit baroque when only applied to arrays.

>> (Last but not least, .NET has arrays and slices. D's slices that actually have an array testis mesh real bad with .NET because in .NET you can't have a disowned slice.)
> 
> I've been following the blog. I was surprised that Cristian didn't just make all D arrays .NET ArraySegments and be done with it, but there's probably issues I don't know about. I'd like to hear more on what exactly the issue was.

We discussed it at length. Without being an expert in .NET, I can't tell what the problem is, but I do know that if there were a simpler solution, Cristi would probably have found it.


Andrei
May 01, 2009
Andrei Alexandrescu:
> Does any of this make sense? :o)

Yes. I think problems aren't finished yet and more thinking is expected, but you are getting somewhere (I can tell it also because I am starting to understand. I am not bright. When even I understand it often means that the design is meaningful).
Regarding the foreach on the items I don't know/understand the syntactic need for the [] to scan.

Bye,
bearophile
May 01, 2009
bearophile wrote:
> Andrei Alexandrescu:
>> Does any of this make sense? :o)
> 
> Yes. I think problems aren't finished yet and more thinking is
> expected, but you are getting somewhere (I can tell it also because I
> am starting to understand. I am not bright.

Bright is only one :o).

> When even I understand it
> often means that the design is meaningful). Regarding the foreach on
> the items I don't know/understand the syntactic need for the [] to
> scan.

Oh, that's simple. A while ago people asked, how do I iterate a container (assuming the container is distinct from range). So I said, here's how I think:

Container!int c;
...
foreach (element; c.all)
{
   ...use element...
}

People said, that sucks! With opApply I don't need to append the .all thingie! So I told Walter to automatically call [] against the container. So people can now write:

foreach (element; c)
{
   ...use element...
}

All the container has to do is define opSlice() to return its "all" range.
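
In other words, something along these lines (Container and its range are hypothetical, not Phobos types; the [] is written out explicitly here, the point being that foreach (element; c) would lower to it):

struct Container(T)
{
    private T[] items;

    static struct Range
    {
        private T[] slice;
        bool empty() { return slice.length == 0; }
        T front() { return slice[0]; }
        void popFront() { slice = slice[1 .. $]; }
    }

    Range opSlice() { return Range(items); }   // the "all" range
}

void main()
{
    auto c = Container!int([1, 2, 3]);
    foreach (element; c[])
    {
        // ...use element...
    }
}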


Andrei