April 30, 2009
On 2009-04-29 23:01:39 -0400, Rainer Deyke <rainerd@eldwood.com> said:

> Andrei Alexandrescu wrote:
>> 2. Value semantics with reference counting
> 
> I like this optimization and use it all the time in my own code, but I'm
> not convinced that it should be the default.  It's also problematic in
> multithreaded situations.
> 
> I think a generic CopyOnWrite wrapper over arbitrary value types would
> be more useful.  CopyOnWrite!(int[]).

But don't forget that D2 is going to have the shared keyword, possibly with thread-local heaps. So multithreading won't be much of a problem: if the variable is shared, the compiler should force locks as needed or use atomic operations; if the variable is local to a thread, locks should become no-ops and atomic operations regular ones, improving performance.

Or at least that's what I'm getting from the hints we currently have.


>> 3. Reference semantics
> 
> I'm strongly opposed to this option.  Either of the other options would
> be acceptable.

Andrei mentioned a couple of reasons against 3, which are yours?


-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/

April 30, 2009
Rainer Deyke wrote:
> Andrei Alexandrescu wrote:
>> 1. Value semantics (arrays are like int)
> 
> Don't forget:
>  + Supports timely destruction of contents (i.e. RAII).

These are collections. RAII doesn't matter when you have a garbage collector and the only resource you have is memory, unless you're quite strapped for memory.
April 30, 2009
On Wed, 29 Apr 2009 20:45:23 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
>
> It has become increasingly clear we'll need to support arrays in
> addition to slices.

No, Andrei it hasn't. A detailed paragraph (or more) explaining we you think so should be included in the full RFC.

> One big question mark for me is, should we define
> for arrays (and associative arrays and containers in general) value
> semantics, reference counted value semantics, or reference semantics? It
> is not clear to me yet what the best option is. Think of arrays as an
> archetypal example.

Here are some additional pros/cons:

1. Value semantics (arrays are like int)
 +  can return static arrays from functions
 -  can be emulated with slice assign and .dup, where needed
 -  requires arrays and slices to be separate types
 +- requires opImplicitCast from container to slice type
 +- requires template argument deduction to be opImplicitCast/alias this aware
 -  If you forget to pass by reference (when you meant to), logical bugs are produced (i.e. my mutation doesn't get published).
 -  Dis-allows inheritance from all containers (i.e. implemented as structs)

I think clarifying 2 as being copy-on-write is useful.
2. Copy-on-write (Value semantics with reference counting)
 +- The added trade-offs of value semantics
 -  arrays may no longer be passed in registers (performance reduction)

3. Reference semantics
 ++ Allows arrays and slices to be the same type
 +  How D's arrays/slices work today
 +  Familiar to C/C++ refugees (Similar to plain old arrays)
 +  Familiar to most OO language refugees: containers are objects
 +  Allows inheritance from most containers (i.e. implemented as objects, except for arrays)

> - Hard to reason about: squirrel an array into an object, and way later
> and farther it modifies the array and effects a change at long distance
> - Long-distance effects may as well be some of the toughest bugs after
> memory errors
> - Potentially inefficient as people might start sprinkling copies in the
> hope the eliminate bugs or chances of bugs
> + const system helps: if a function doesn't change something just mark
> it as const and you're home free
> + immutable also helps: immutable data can never change so it can be
> shared no problem. Not sure how interesting immutable containers are though.

All these issues also exist with objects, which people are familiar with.

> - Implementing fast/unsafe semantics with reference semantics
> effectively forces reference counting to be part of containers, which
> complicates implementation.

I'm drawing a blank on this. My thinking is this;
1) garbage-collected:   reference counting isn't needed
2) reference counted:   reference counting is expected and built into objects in general.
3) "unsafe" management: safety is explicitly not expected, so reference counting isn't needed

My vote is for reference semantics. It's fast, familiar, easy to reason about (i.e. like objects) and doesn't require seperate array and slice types.

April 30, 2009
Michel Fortin wrote:
> On 2009-04-29 23:01:39 -0400, Rainer Deyke <rainerd@eldwood.com> said:
>> Andrei Alexandrescu wrote:
>>> 3. Reference semantics
>> I'm strongly opposed to this option.  Either of the other options would be acceptable.
> 
> Andrei mentioned a couple of reasons against 3, which are yours?

I agree with all of Andrei's reasons.

In addition, there's this: suppose you have a struct containing a (mutable) array.  When you make a copy of that struct, you will almost always want to make a copy of the contained array.  Therefore value semantics should be the default, because it simplifies the most common use case.


-- 
Rainer Deyke - rainerd@eldwood.com
April 30, 2009
Christopher Wright wrote:
> Rainer Deyke wrote:
>> Don't forget:
>>  + Supports timely destruction of contents (i.e. RAII).
> 
> These are collections. RAII doesn't matter when you have a garbage collector and the only resource you have is memory, unless you're quite strapped for memory.

It also doesn't matter if the resource deallocation pixie deallocates all my non-memory resources the moment they are no longer being used, which is about as likely as only having memory resources in arrays.


-- 
Rainer Deyke - rainerd@eldwood.com
April 30, 2009
On Thu, 30 Apr 2009 14:40:42 -0400, Rainer Deyke <rainerd@eldwood.com> wrote:
> In addition, there's this: suppose you have a struct containing a
> (mutable) array.  When you make a copy of that struct, you will almost
> always want to make a copy of the contained array.  Therefore value
> semantics should be the default, because it simplifies the most common
> use case.

That's what struct ctors/dtors/opAssign are for. Personally, I think letting the struct designer decide the behaviour is better.
April 30, 2009
Robert Jacques wrote:
> On Thu, 30 Apr 2009 14:40:42 -0400, Rainer Deyke <rainerd@eldwood.com> wrote:
>> In addition, there's this: suppose you have a struct containing a
>> (mutable) array.  When you make a copy of that struct, you will almost
>> always want to make a copy of the contained array.  Therefore value
>> semantics should be the default, because it simplifies the most common
>> use case.
> 
> That's what struct ctors/dtors/opAssign are for. Personally, I think letting the struct designer decide the behaviour is better.

The problem is what to do for the containers in Phobos.

Andrei
April 30, 2009
Robert Jacques wrote:
> On Wed, 29 Apr 2009 20:45:23 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
>>
>> It has become increasingly clear we'll need to support arrays in
>> addition to slices.
> 
> No, Andrei it hasn't. A detailed paragraph (or more) explaining we you think so should be included in the full RFC.

There are several converging pieces of evidence. Each can be debated in separation, yet together they make for a rather strong case.

People have complained about the incredibly bad, unsafe, and slow behavior of appending. Even before the considerable recent slowdown of ~=, there were speed concerns. Worse than that, the completely weird behavior of slices that grow to stomp on other slices cannot be defended.

The idiom:

array.length = n;
array.length = 0;
// append to array up to n...

is rather disingenuous and looks odd to the casual reader.

As predicted by the "range theory" which predicates that a range will never grow, only shrink, slices should always shrink. Growing is a foreign operation for a slice. This is a big problem that goes beyond a need for purity or cleanliness - it's just terrible to have a slice stomping around, and then uncontrollably being rebound elsewhere and so on. People pass slices into functions, functions modify them and also append to them, and depending on the phase of the moon, some updates are being seen in the caller.

So people have defined various types (such as ArrayCreator or ArrayAppender or whatnot) that take care of that aspect. But really what we're looking at is an Array type.

Another issue is that we can get (partly) away with disowned slices because of the garbage collector: whenever a slice is to be rebound, whatever chunk it was bound to is left behind and will be duly collected. If, however, we want to support other programming disciplines that exercise stricter control over memory, the "parent" arrays must become an explicit type that code can keep around and manipulate directly.

A design that has one container type and several slice types that crawl it in various ways is very compelling, and I think it will be a huge selling point for D. It would be odd, not natural, to have arrays as a singularly distinct container that is in fact its own range. We get away with container == range for arrays partly because arrays are a simple structure, but that only blurs thinking and understanding of more complex containers. In fact, ranges do predict that any attempt to grow a range will cause topological issues in the owning container; that's why SList wisely (IMHO :o)) includes a template parameter that tells whether its topology is fixed or flexible.

(Last but not least, .NET has arrays and slices. D's slices that actually have an array testis mesh real bad with .NET because in .NET you can't have a disowned slice.)


Andrei
April 30, 2009
On Thu, 30 Apr 2009 23:57:17 +0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> Robert Jacques wrote:
>> On Wed, 29 Apr 2009 20:45:23 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
>>>
>>> It has become increasingly clear we'll need to support arrays in
>>> addition to slices.
>>  No, Andrei it hasn't. A detailed paragraph (or more) explaining we you think so should be included in the full RFC.
>
> There are several converging pieces of evidence. Each can be debated in separation, yet together they make for a rather strong case.
>
> People have complained about the incredibly bad, unsafe, and slow behavior of appending. Even before the considerable recent slowdown of ~=, there were speed concerns. Worse than that, the completely weird behavior of slices that grow to stomp on other slices cannot be defended.
>
> The idiom:
>
> array.length = n;
> array.length = 0;
> // append to array up to n...
>
> is rather disingenuous and looks odd to the casual reader.
>
> As predicted by the "range theory" which predicates that a range will never grow, only shrink, slices should always shrink. Growing is a foreign operation for a slice. This is a big problem that goes beyond a need for purity or cleanliness - it's just terrible to have a slice stomping around, and then uncontrollably being rebound elsewhere and so on. People pass slices into functions, functions modify them and also append to them, and depending on the phase of the moon, some updates are being seen in the caller.
>
> So people have defined various types (such as ArrayCreator or ArrayAppender or whatnot) that take care of that aspect. But really what we're looking at is an Array type.
>
> Another issue is that we can get (partly) away with disowned slices because of the garbage collector: whenever a slice is to be rebound, whatever chunk it was bound to is left behind and will be duly collected. If, however, we want to support other programming disciplines that exercise stricter control over memory, the "parent" arrays must become an explicit type that code can keep around and manipulate directly.
>
> A design that has one container type and several slice types that crawl it in various ways is very compelling, and I think it will be a huge selling point for D. It would be odd, not natural, to have arrays as a singularly distinct container that is in fact its own range. We get away with container == range for arrays partly because arrays are a simple structure, but that only blurs thinking and understanding of more complex containers. In fact, ranges do predict that any attempt to grow a range will cause topological issues in the owning container; that's why SList wisely (IMHO :o)) includes a template parameter that tells whether its topology is fixed or flexible.
>
> (Last but not least, .NET has arrays and slices. D's slices that actually have an array testis mesh real bad with .NET because in .NET you can't have a disowned slice.)
>
>
> Andrei

I'm all for introducing Array type, but what would using it look like? Is it a plain Array!(T), or is there any shortcut planned. One of the selling points of slices is their ease of use. T[] is freaking awesome!

I also would like to see what would Phobos look like after introducing Arrays.

A good thing is that it's quite easy start implementing Array right now - all you need is to disallow ~ and ~= operations on slices. Then, go through Phobos code and see what got broken and start fixing it using newly written Array template (compiler doesn't have to know anything about it). I beleve it will become immediately clear whether D Arrays should be a reference or value type.

April 30, 2009
Robert Jacques wrote:
> On Thu, 30 Apr 2009 14:40:42 -0400, Rainer Deyke <rainerd@eldwood.com> wrote:
>> In addition, there's this: suppose you have a struct containing a (mutable) array.  When you make a copy of that struct, you will almost always want to make a copy of the contained array.  Therefore value semantics should be the default, because it simplifies the most common use case.
> 
> That's what struct ctors/dtors/opAssign are for. Personally, I think letting the struct designer decide the behaviour is better.

I hope you're not suggesting writing ctor/dtor/opAssign for every single struct that uses arrays as value types.

It's possible to write a reference wrapper around a value type.  It's also possible to write a value wrapper around a reference type. However, the former is both easier and more efficient than the latter.


-- 
Rainer Deyke - rainerd@eldwood.com