October 19, 2009
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail@erdani.org)'s article
> > 3.  A T[new] should be implicitly convertible to a slice.  For example:
> >
> > auto foo = someFunctionThatReturnsTnew();
> > // foo is a T[new].
> > T[] bar = someFunctionThatReturnsTnew();
> > // Works.  bar is a T[].  The T[new] went into oblivion.
> >
> > This solves the problem of slices not being closed over .dup and ~.
> Check.

So then why is slices not being closed over .dup, ~, etc. still a problem?  With implicit conversion, they for all practical purposes are.

> > 4.  It should be guaranteed that no block of memory is ever referenced by more than one T[new] instance.  This is needed to guarantee safety when appending to immutable arrays, etc.
> That doesn't go with reference semantics. Uncheck.
> > 5.  Assigning a T[new] to another T[new] should be by reference, just like assigning a class instance to another class instance.
> Check. (BTW contradicts 4)

There was a slight misunderstanding here.  When I say an instance, I mean one T[new] heap object == one instance, no matter how many pointers/references to this instance you have.  The assumption is that T[new] objects live entirely on the heap and have semantics similar to classes.  For example:

uint[new] foo = new uint[100];
uint[new] bar = foo;  // Really just a pointer assignment.

Now, both foo and bar point to the same uint[new] instance.  If anything is modified via foo, it is seen when viewed from bar and vice-versa.

To clarify, no *main block of memory that holds the data in the array* is ever referenced by more than one T[new] *heap object*.

> > Assigning a T[] to a T[new]
> > should duplicate the memory block referenced by the T[] because this is probably
> > the only way to guarantee (4).
> No check, but could have been done.

Actually, this can even be done by COW.  Initially, assigning a T[] to a T[new] could just be a pointer assignment and setting a flag, as long as a copy of the data is made when you try to increase the length of the T[new] or append to it. Basically, as long as you use the T[new] as if it were a T[], no copying needs to be done.

> > 6.  Since T[new] guarantees unique access to a memory block, it should have an assumeUnique() method that returns an immutable slice and sets the T[new]'s reference to the memory block to null.  This solves the problem of building immutable arrays without the performance penalty of not being able to pre-allocate or the unsafeness of having to cowboy cast it to immutable.
> Uncheck.

Now that I've explained my uniqueness thing better, does this sound feasible?

> > 7.  As long as the GC is conservative, there absolutely *must* be a method of manually freeing the memory block referenced by a T[new] provided that the GC supports this operation, though it doesn't have to be particularly pretty.  In general, since D is a systems language, T[new] should not be too opaque.  A good way to do this might be to make all of the fields of the T[new] public but undocumented.  If you *really* want to mess with it, you'll read the source code and figure it out.
> Check while delete still exists. Please use malloc for that stuff.

As long as I can get at the pointer to the memory block held by T[new] I can pass it to GC.free.  All that would really be needed is a .ptr property that does something like what .ptr does for T[]s.

> So now: every place I've said "check" means there was implementation and
> book-quality illustrated documentation that Walter and I have done with
> the sweat of our brow. At the end we looked at the result and concluded
> we should throw away all that work.
> Andrei

This begs the question:  Why?  Walter's post on the subject was rather brief and I can't understand for the life of me why you guys would throw away such an elegant solution.  Given that we already agree that a T[new] can be implicitly cast to a T[], the lack of closure under ~, .dup, etc. seems like a non-issue.  When I was a beginner, before I got into templates, a major attraction to D was that it was a fast, compiled language where basic things like arrays (mostly) just worked.  To take away all the syntactic sugar for appending and lengthening from arrays and push this into a separate library type that doesn't feel like a first class object or (I assume) even support most array semantics would be a massive, massive kludge no matter how well-implemented that library type was.

October 19, 2009
dsimcha wrote:
> == Quote from Andrei Alexandrescu (SeeWebsiteForEmail@erdani.org)'s article
>>> 3.  A T[new] should be implicitly convertible to a slice.  For example:
>>>
>>> auto foo = someFunctionThatReturnsTnew();
>>> // foo is a T[new].
>>> T[] bar = someFunctionThatReturnsTnew();
>>> // Works.  bar is a T[].  The T[new] went into oblivion.
>>>
>>> This solves the problem of slices not being closed over .dup and ~.
>> Check.
> 
> So then why is slices not being closed over .dup, ~, etc. still a problem?  With
> implicit conversion, they for all practical purposes are.

The problems are with auto and template argument deduction.

>>> 4.  It should be guaranteed that no block of memory is ever referenced by more
>>> than one T[new] instance.  This is needed to guarantee safety when appending to
>>> immutable arrays, etc.
>> That doesn't go with reference semantics. Uncheck.
>>> 5.  Assigning a T[new] to another T[new] should be by reference, just like
>>> assigning a class instance to another class instance.
>> Check. (BTW contradicts 4)
> 
> There was a slight misunderstanding here.  When I say an instance, I mean one
> T[new] heap object == one instance, no matter how many pointers/references to this
> instance you have.  The assumption is that T[new] objects live entirely on the
> heap and have semantics similar to classes.  For example:
> 
> uint[new] foo = new uint[100];
> uint[new] bar = foo;  // Really just a pointer assignment.
> 
> Now, both foo and bar point to the same uint[new] instance.  If anything is
> modified via foo, it is seen when viewed from bar and vice-versa.
> 
> To clarify, no *main block of memory that holds the data in the array* is ever
> referenced by more than one T[new] *heap object*.

Oh yah. Check that sucker.

>>> Assigning a T[] to a T[new]
>>> should duplicate the memory block referenced by the T[] because this is probably
>>> the only way to guarantee (4).
>> No check, but could have been done.
> 
> Actually, this can even be done by COW.  Initially, assigning a T[] to a T[new]
> could just be a pointer assignment and setting a flag, as long as a copy of the
> data is made when you try to increase the length of the T[new] or append to it.
> Basically, as long as you use the T[new] as if it were a T[], no copying needs to
> be done.
> 
>>> 6.  Since T[new] guarantees unique access to a memory block, it should have an
>>> assumeUnique() method that returns an immutable slice and sets the T[new]'s
>>> reference to the memory block to null.  This solves the problem of building
>>> immutable arrays without the performance penalty of not being able to pre-allocate
>>> or the unsafeness of having to cowboy cast it to immutable.
>> Uncheck.
> 
> Now that I've explained my uniqueness thing better, does this sound feasible?

Nah, there are still problems with Ts that themselves are elaborate types containing references, aliasing etc. assumeUnique makes a rather strong claim.

>>> 7.  As long as the GC is conservative, there absolutely *must* be a method of
>>> manually freeing the memory block referenced by a T[new] provided that the GC
>>> supports this operation, though it doesn't have to be particularly pretty.  In
>>> general, since D is a systems language, T[new] should not be too opaque.  A good
>>> way to do this might be to make all of the fields of the T[new] public but
>>> undocumented.  If you *really* want to mess with it, you'll read the source code
>>> and figure it out.
>> Check while delete still exists. Please use malloc for that stuff.
> 
> As long as I can get at the pointer to the memory block held by T[new] I can pass
> it to GC.free.  All that would really be needed is a .ptr property that does
> something like what .ptr does for T[]s.

Alrighty. Check that.

>> So now: every place I've said "check" means there was implementation and
>> book-quality illustrated documentation that Walter and I have done with
>> the sweat of our brow. At the end we looked at the result and concluded
>> we should throw away all that work.
>> Andrei
> 
> This begs the question:  Why?  Walter's post on the subject was rather brief and I
> can't understand for the life of me why you guys would throw away such an elegant
> solution.

Here's what I wrote to Walter:

====================
I'm going to suggest something terrible - let's get rid of T[new]. I know it's difficult to throw away work you've already done, but really things with T[new] start to look like a Pyrrhic victory. Here are some issues:

* The abstraction doesn't seem to come off as crisp and clean as we both wanted;

* There are efficiency issues, such as the two allocations that you valiantly tried to eliminate in a subset of cases;

* Explaining two very similar but subtly different types to newcomers is excruciatingly difficult (I'll send you a draft of the chapter - it looks like a burn victim who didn't make it);

* Furthermore, explaining people when to use one vs. the other is much more difficult than it seems. On the surface, it goes like this: "If you need to append stuff, use T[new]. If not, use T[]." Reality is much more subtle. For one thing, T[new] does not allow contraction from the left, whereas T[] does. That puts T[] at an advantage. So if you want to append stuff and also contract from the left, there's nothing our abstractions can help you with.

Instead of all T[new] stuff, I suggest the following:

1. We stay with T[] and we define a struct ArrayBuilder that replaces T[new] with a much more clear name and charter. Phobos already has Appender which works very well. We can beef that up to allow array-like primitives.

2. Assigning to a slice's .length allocates a new slice if growth is needed.

3. Disallow ~= for slices. ArrayBuilder will define it.

4. That's it.

Java got away with a similar approach using StringBuilder:

http://java.sun.com/j2se/1.5.0/docs/api/java/lang/StringBuilder.html

Scala has something very similar called ArrayBuffer:

http://www.nabble.com/ArrayList-and-ArrayBuffer-td15448842.html

And guess what, C# stole Java's StringBuilder as well:

http://msdn.microsoft.com/en-us/library/2839d5h5%28VS.71%29.aspx

So it looks like many programmers coming from other languages will already be familiar with the idea that you use a "builder" to grow an array, and then you use a non-growable array. One thing that Appender has and is really cool is that it can grow an already-existing slice. So you can grow a slice, play with it for a while, and then grow it again at low cost. I don't think the other languages allow that.

I understand how you must feel about having implemented T[new] and all, but really please please try to detach for a minute and think back. Does what we've got now with T[new] make D a much better place? Between the increase of the language, the difficulty to explain the minute subtleties, and the annoying corner cases and oddities, I think it might be good to reconsider.
================

> Given that we already agree that a T[new] can be implicitly cast to a
> T[], the lack of closure under ~, .dup, etc. seems like a non-issue.  When I was a
> beginner, before I got into templates, a major attraction to D was that it was a
> fast, compiled language where basic things like arrays (mostly) just worked.  To
> take away all the syntactic sugar for appending and lengthening from arrays and
> push this into a separate library type that doesn't feel like a first class object
> or (I assume) even support most array semantics would be a massive, massive kludge
> no matter how well-implemented that library type was.

I think the language will be in considerably better shape without T[new] than with it. I understand this is a subjective point. You can argue against on virtually every one of my objections.


Andrei
October 19, 2009
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail@erdani.org)'s article
> dsimcha wrote:
> > == Quote from Andrei Alexandrescu (SeeWebsiteForEmail@erdani.org)'s article
> >>> 3.  A T[new] should be implicitly convertible to a slice.  For example:
> >>>
> >>> auto foo = someFunctionThatReturnsTnew();
> >>> // foo is a T[new].
> >>> T[] bar = someFunctionThatReturnsTnew();
> >>> // Works.  bar is a T[].  The T[new] went into oblivion.
> >>>
> >>> This solves the problem of slices not being closed over .dup and ~.
> >> Check.
> >
> > So then why is slices not being closed over .dup, ~, etc. still a problem?  With implicit conversion, they for all practical purposes are.
> The problems are with auto and template argument deduction.

Ok, now I can see how this would be a legitimate problem, but only in a few corner cases:

void doStuff(T)(T someArrayLikeObject) {
    someArrayLikeObject = someArrayLikeObject[0..$ - 1];
}

int[] foo = someFunction();
int[] bar = someFunction();
auto baz = foo ~ bar;

// baz is an int[new].  This doesn't usually matter, since you can use it
// just like an int[] and if you care about the small performance difference,
// you should be more careful.

doStuff(baz);

// Are the effects of doStuff() visible in this scope?
// Depends if baz is an int[] or an int[new].

However, this is really, really, *really* a corner case.  Any reasonable template would either slice the array to make sure it owns the slice information or pass by reference to make sure the change is propagated, so that there is no ambiguity.

> Here's what I wrote to Walter:
> ====================
> I'm going to suggest something terrible - let's get rid of T[new]. I
> know it's difficult to throw away work you've already done, but really
> things with T[new] start to look like a Pyrrhic victory. Here are some
> issues:
> * The abstraction doesn't seem to come off as crisp and clean as we both
> wanted;
> * There are efficiency issues, such as the two allocations that you
> valiantly tried to eliminate in a subset of cases;

Once you've opened the can of worms of having to perform allocations, one versus two allocations isn't very important.

> * Explaining two very similar but subtly different types to newcomers is excruciatingly difficult (I'll send you a draft of the chapter - it looks like a burn victim who didn't make it);

This is admittedly a legitimate concern.  However, you can get off the ground in
D, or any language for that matter, without being a full-fledged language lawyer.
 I frankly don't think it's important for beginners to understand every subtlety
and corner case.  Heck, I've been using D for a while and have done some
non-trivial projects in it and there are definitely dark corners that I don't
understand.  (Complex numbers come to mind.)  I just don't care because I figure
I'll learn them if/when I need to know them.

> * Furthermore, explaining people when to use one vs. the other is much more difficult than it seems. On the surface, it goes like this: "If you need to append stuff, use T[new]. If not, use T[]." Reality is much more subtle. For one thing, T[new] does not allow contraction from the left, whereas T[] does. That puts T[] at an advantage. So if you want to append stuff and also contract from the left, there's nothing our abstractions can help you with.

Why can't a T[new] contract from the left?  As far as I can tell, you could do something like:

struct TNew(T) {
    typeof(this) opAssign(T[] slice) {
        if(slice.ptr >= this.ptr && slice.ptr < this.ptr + capacity) {
            // Then we own this block of memory and can assign
            // a slice by reference.

            // Adjust effective capacity.
            capacity -= cast(size_t) (slice.ptr - this.ptr);
            length = slice.length;
            this.ptr = slice.ptr;
       } else {
           // Assign the slice by copying.
       }
    }

    // Other stuff.
}

You would then contract from the left the same way as for a slice:

int[new] foo = new int[5];
foo = foo[1..$];

It would simply require a few integer/pointer comparisons and still be reasonably efficient.

> Instead of all T[new] stuff, I suggest the following:
> 1. We stay with T[] and we define a struct ArrayBuilder that replaces
> T[new] with a much more clear name and charter. Phobos already has
> Appender which works very well. We can beef that up to allow array-like
> primitives.
> 2. Assigning to a slice's .length allocates a new slice if growth is needed.
> 3. Disallow ~= for slices. ArrayBuilder will define it.
> 4. That's it.
> Java got away with a similar approach using StringBuilder:
> http://java.sun.com/j2se/1.5.0/docs/api/java/lang/StringBuilder.html
> Scala has something very similar called ArrayBuffer:
> http://www.nabble.com/ArrayList-and-ArrayBuffer-td15448842.html
> And guess what, C# stole Java's StringBuilder as well:
> http://msdn.microsoft.com/en-us/library/2839d5h5%28VS.71%29.aspx
> So it looks like many programmers coming from other languages will
> already be familiar with the idea that you use a "builder" to grow an
> array, and then you use a non-growable array. One thing that Appender
> has and is really cool is that it can grow an already-existing slice. So
> you can grow a slice, play with it for a while, and then grow it again
> at low cost. I don't think the other languages allow that.
> I understand how you must feel about having implemented T[new] and all,
> but really please please try to detach for a minute and think back. Does
> what we've got now with T[new] make D a much better place? Between the
> increase of the language, the difficulty to explain the minute
> subtleties, and the annoying corner cases and oddities, I think it might
> be good to reconsider.

Yes, Java, etc. get away with this because Java has no pretense of being a make simple things simple kind of language, but one big thing that D has over Java, etc. is that it can be used almost like a scripting language for simple stuff.  It is about the only language that scales well all the way from simple scripts to uber-complicated metaprogramming.  If we get rid of nice builtin arrays that support everything with clean syntax, we're throwing out making simple things simple in exchange for fixing a few corner cases.  IMHO this is a terrible tradeoff.  If you and Walter *really* despise T[new], I would prefer to see builtin arrays kept exactly the way they are, bugs and all, and for array builders/appenders/whatever to be improved but still be considered purely a performance hack.  The bugs in the current arrays are pretty nasty from a theoretical safety/purity point of view (esp. the one that's a hole in immutability), but are seldom run into in practice.
October 19, 2009
Walter Bright wrote:
> The purpose of T[new] was to solve the problems T[] had with passing T[] to a function and then the function resizes the T[]. What happens with the original?
> 
> The solution we came up with was to create a third array type, T[new], which was a reference type.
> 
> Andrei had the idea that T[new] could be dispensed with by making a "builder" library type to handle creating arrays by doing things like appending, and then delivering a finished T[] type. This is similar to what std.outbuffer and std.array.Appender do, they just need a bit of refining.
> 
> The .length property of T[] would then become an rvalue only, not an lvalue, and ~= would no longer be allowed for T[].
> 
> We both feel that this would simplify D, make it more flexible, and remove some awkward corner cases like the inability to say a.length++.
> 
> What do you think?

I think ArrayBuilder can work almost entirely transparently provided the following conditions are met:

1) when trying to cat to an array, the first array automatically promotes to ArrayBuilder

2) Setting .length is, depending on new length, either a slice or a cat operation with the length difference.

2) ArrayBuilders implicitly cast to string.

This should allow a syntax that is almost entirely identical to the one used in D1, aside from typeof("a"~"b").stringof :)

Is this feasible in D2?
October 19, 2009
== Quote from downs (default_357-line@yahoo.de)'s article
> Walter Bright wrote:
> > The purpose of T[new] was to solve the problems T[] had with passing T[] to a function and then the function resizes the T[]. What happens with the original?
> >
> > The solution we came up with was to create a third array type, T[new], which was a reference type.
> >
> > Andrei had the idea that T[new] could be dispensed with by making a "builder" library type to handle creating arrays by doing things like appending, and then delivering a finished T[] type. This is similar to what std.outbuffer and std.array.Appender do, they just need a bit of refining.
> >
> > The .length property of T[] would then become an rvalue only, not an lvalue, and ~= would no longer be allowed for T[].
> >
> > We both feel that this would simplify D, make it more flexible, and remove some awkward corner cases like the inability to say a.length++.
> >
> > What do you think?
> I think ArrayBuilder can work almost entirely transparently provided the
following conditions are met:
> 1) when trying to cat to an array, the first array automatically promotes to
ArrayBuilder
> 2) Setting .length is, depending on new length, either a slice or a cat
operation with the length difference.
> 2) ArrayBuilders implicitly cast to string.
> This should allow a syntax that is almost entirely identical to the one used in
D1, aside from typeof("a"~"b").stringof :)
> Is this feasible in D2?

Once you throw in all the implicit conversions to ArrayBuilder and having the core language know about ArrayBuilder, aren't you, for all practical purposes, implementing T[new] but calling it ArrayBuilder?
October 19, 2009
Andrei Alexandrescu Wrote:

> grauzone wrote:
> > Walter Bright wrote:
> >> We both feel that this would simplify D, make it more flexible, and remove some awkward corner cases like the inability to say a.length++.
> > 
> > How would this remove this corner case?
> 
> Can't increment length if it's read-only.
> 
> Andrei

removing syntactic sugar doesn't really remove corner cases. T[new] is being replaced with an array builder. What usage semantics are given up with an array builder? How will implicit conversions be handled? (alias this, etc...). How will the Phobos array buiilder support user specified lengths (for efficiency)?
October 19, 2009
Jason House wrote:
> Andrei Alexandrescu Wrote:
> 
>> grauzone wrote:
>>> Walter Bright wrote:
>>>> We both feel that this would simplify D, make it more flexible,
>>>> and remove some awkward corner cases like the inability to say
>>>> a.length++.
>>> How would this remove this corner case?
>> Can't increment length if it's read-only.
>> 
>> Andrei
> 
> removing syntactic sugar doesn't really remove corner cases. T[new]
> is being replaced with an array builder. What usage semantics are
> given up with an array builder? How will implicit conversions be
> handled? (alias this, etc...). How will the Phobos array buiilder
> support user specified lengths (for efficiency)?

I think ArrayBuilder could a logic similar to Appender for "~=" and otherwise offer regular array primitives. I don't think implicit conversion to T[] is an essential feature, but it can be done.

Andrei
October 19, 2009
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail@erdani.org)'s article
> Jason House wrote:
> > Andrei Alexandrescu Wrote:
> >
> >> grauzone wrote:
> >>> Walter Bright wrote:
> >>>> We both feel that this would simplify D, make it more flexible, and remove some awkward corner cases like the inability to say a.length++.
> >>> How would this remove this corner case?
> >> Can't increment length if it's read-only.
> >>
> >> Andrei
> >
> > removing syntactic sugar doesn't really remove corner cases. T[new] is being replaced with an array builder. What usage semantics are given up with an array builder? How will implicit conversions be handled? (alias this, etc...). How will the Phobos array buiilder support user specified lengths (for efficiency)?
> I think ArrayBuilder could a logic similar to Appender for "~=" and otherwise offer regular array primitives. I don't think implicit conversion to T[] is an essential feature, but it can be done. Andrei

So basically, ArrayBuilder would be like T[new], except:

1.  It would be a plain old library type, and the core language would know nothing
about it.
2.  T[].dup, T[] ~ T[], new T[3], etc. would return T[], not ArrayBuilder/T[new].

Is this basically correct?
October 19, 2009
dsimcha wrote:
> == Quote from Andrei Alexandrescu (SeeWebsiteForEmail@erdani.org)'s article
>> Jason House wrote:
>>> Andrei Alexandrescu Wrote:
>>>
>>>> grauzone wrote:
>>>>> Walter Bright wrote:
>>>>>> We both feel that this would simplify D, make it more flexible,
>>>>>> and remove some awkward corner cases like the inability to say
>>>>>> a.length++.
>>>>> How would this remove this corner case?
>>>> Can't increment length if it's read-only.
>>>>
>>>> Andrei
>>> removing syntactic sugar doesn't really remove corner cases. T[new]
>>> is being replaced with an array builder. What usage semantics are
>>> given up with an array builder? How will implicit conversions be
>>> handled? (alias this, etc...). How will the Phobos array buiilder
>>> support user specified lengths (for efficiency)?
>> I think ArrayBuilder could a logic similar to Appender for "~=" and
>> otherwise offer regular array primitives. I don't think implicit
>> conversion to T[] is an essential feature, but it can be done.
>> Andrei
> 
> So basically, ArrayBuilder would be like T[new], except:
> 
> 1.  It would be a plain old library type, and the core language would know nothing
> about it.
> 2.  T[].dup, T[] ~ T[], new T[3], etc. would return T[], not ArrayBuilder/T[new].
> 
> Is this basically correct?

Yes, that's the plan. I hope you're not setting me up or something :o).

Andrei
October 19, 2009
Andrei Alexandrescu Wrote:

> dsimcha wrote:
> > == Quote from Andrei Alexandrescu (SeeWebsiteForEmail@erdani.org)'s article
> >> Jason House wrote:
> >>> Andrei Alexandrescu Wrote:
> >>>
> >>>> grauzone wrote:
> >>>>> Walter Bright wrote:
> >>>>>> We both feel that this would simplify D, make it more flexible, and remove some awkward corner cases like the inability to say a.length++.
> >>>>> How would this remove this corner case?
> >>>> Can't increment length if it's read-only.
> >>>>
> >>>> Andrei
> >>> removing syntactic sugar doesn't really remove corner cases. T[new] is being replaced with an array builder. What usage semantics are given up with an array builder? How will implicit conversions be handled? (alias this, etc...). How will the Phobos array buiilder support user specified lengths (for efficiency)?
> >> I think ArrayBuilder could a logic similar to Appender for "~=" and otherwise offer regular array primitives. I don't think implicit conversion to T[] is an essential feature, but it can be done. Andrei
> > 
> > So basically, ArrayBuilder would be like T[new], except:
> > 
> > 1.  It would be a plain old library type, and the core language would know nothing
> > about it.
> > 2.  T[].dup, T[] ~ T[], new T[3], etc. would return T[], not ArrayBuilder/T[new].
> > 
> > Is this basically correct?
> 
> Yes, that's the plan. I hope you're not setting me up or something :o).
> 
> Andrei

I was ;) Well, only sort of. I'm undecided if array-like syntactic sugar makes sense for array builder. Afterall, AA's fit into a similar category. I did notice you wanted to make array builder a struct which implies it won't be used in the same manner as T[new]