October 19, 2009
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail@erdani.org)'s article
> dsimcha wrote:
> > == Quote from Andrei Alexandrescu (SeeWebsiteForEmail@erdani.org)'s article
> >> Jason House wrote:
> >>> Andrei Alexandrescu Wrote:
> >>>
> >>>> grauzone wrote:
> >>>>> Walter Bright wrote:
> >>>>>> We both feel that this would simplify D, make it more flexible, and remove some awkward corner cases like the inability to say a.length++.
> >>>>> How would this remove this corner case?
> >>>> Can't increment length if it's read-only.
> >>>>
> >>>> Andrei
> >>> removing syntactic sugar doesn't really remove corner cases. T[new] is being replaced with an array builder. What usage semantics are given up with an array builder? How will implicit conversions be handled? (alias this, etc...). How will the Phobos array buiilder support user specified lengths (for efficiency)?
> >> I think ArrayBuilder could a logic similar to Appender for "~=" and otherwise offer regular array primitives. I don't think implicit conversion to T[] is an essential feature, but it can be done. Andrei
> >
> > So basically, ArrayBuilder would be like T[new], except:
> >
> > 1.  It would be a plain old library type, and the core language would know nothing
> > about it.
> > 2.  T[].dup, T[] ~ T[], new T[3], etc. would return T[], not ArrayBuilder/T[new].
> >
> > Is this basically correct?
> Yes, that's the plan. I hope you're not setting me up or something :o). Andrei

Given the issues surrounding T[new] and how ugly some cases of the implicit conversion can get, you've convinced me that this may be a good solution as long as:

1.  ArrayBuilder implements full array semantics and is usable as a "real" array
as you suggest.  I wasn't aware that this was what was being proposed, as
ArrayBuilder implied to me that it's only good for building arrays as opposed to
being a full-fledged array library type.  Ideally, for convenience it should be
implicitly convertible to T[].
2.  The semantics are such that bug 2093
(http://d.puremagic.com/issues/show_bug.cgi?id=2093) is fixed, while still
allowing appending to arrays of immutables.  This would probably require some
casting under the hood, but the standard lib. is only one step above the core
language, so it's acceptable.  The semantics of the length property would probably
have to be reference semantics, any block of array memory would have to be owned
exclusively by one ArrayBuilder, etc. just like with T[new].
October 19, 2009
dsimcha wrote:
> == Quote from downs (default_357-line@yahoo.de)'s article
>> Walter Bright wrote:
>>> The purpose of T[new] was to solve the problems T[] had with passing T[] to a function and then the function resizes the T[]. What happens with the original?
>>>
>>> The solution we came up with was to create a third array type, T[new], which was a reference type.
>>>
>>> Andrei had the idea that T[new] could be dispensed with by making a "builder" library type to handle creating arrays by doing things like appending, and then delivering a finished T[] type. This is similar to what std.outbuffer and std.array.Appender do, they just need a bit of refining.
>>>
>>> The .length property of T[] would then become an rvalue only, not an lvalue, and ~= would no longer be allowed for T[].
>>>
>>> We both feel that this would simplify D, make it more flexible, and remove some awkward corner cases like the inability to say a.length++.
>>>
>>> What do you think?
>> I think ArrayBuilder can work almost entirely transparently provided the
> following conditions are met:
>> 1) when trying to cat to an array, the first array automatically promotes to
> ArrayBuilder
>> 2) Setting .length is, depending on new length, either a slice or a cat
> operation with the length difference.
>> 2) ArrayBuilders implicitly cast to string.
>> This should allow a syntax that is almost entirely identical to the one used in
> D1, aside from typeof("a"~"b").stringof :)
>> Is this feasible in D2?
> 
> Once you throw in all the implicit conversions to ArrayBuilder and having the core language know about ArrayBuilder, aren't you, for all practical purposes, implementing T[new] but calling it ArrayBuilder?

Yes but changing the semantics of a library object is easier than changing the semantics of a language feature. Besides, I think the builder works differently internally than T[new], although I'm not sure - storing lists or trees of sub-arrays and concatenating them lazily? At least that's what I would do :p
October 19, 2009
dsimcha wrote:
> The bugs in the current arrays are pretty nasty from a
> theoretical safety/purity point of view (esp. the one that's a hole in
> immutability), but are seldom run into in practice.

We'd like to get D to the point where guarantees can be offered. This really sets it apart from C++, which offers no guarantees, and safety only happens if the programmer carefully follows convention. We need to do something about those nasty corner cases.

Another issue is that D currently has two array types. Adding T[new] makes for 3 array types, and that starts to look like we couldn't figure out what we were doing.

And lastly, the reason to make a type a built-in one is in order to offer substantial advantages over a library one. If a library one can do  a good job, it is better to push it into the library. Making for a smaller core language makes it easier for people to get into D and feel comfortable with it.

I was getting hard pressed to find significant advantages for T[new] over a library type. In particular, I find few uses for resizeable arrays as opposed to slices which are everywhere. When resizeable arrays are needed, they are usually isolated in one function, and when the function returns the resizeable array no longer needs to be resized - it can be converted to a slice type.

Another interesting advantage to the library type for T[new] is it solves the problem of creating a safe immutable string. When the library T[new] is converted to an immutable string, the T[new] contents are removed - hence it won't be possible to create a mutable alias of an immutable string without a user-applied cast, which would not be allowed in safe mode.
October 19, 2009
Walter Bright wrote:
> The purpose of T[new] was to solve the problems T[] had with passing T[] to a function and then the function resizes the T[]. What happens with the original?
> 
> The solution we came up with was to create a third array type, T[new], which was a reference type.
> 
> Andrei had the idea that T[new] could be dispensed with by making a "builder" library type to handle creating arrays by doing things like appending, and then delivering a finished T[] type. This is similar to what std.outbuffer and std.array.Appender do, they just need a bit of refining.
> 
> The .length property of T[] would then become an rvalue only, not an lvalue, and ~= would no longer be allowed for T[].
> 
> We both feel that this would simplify D, make it more flexible, and remove some awkward corner cases like the inability to say a.length++.
> 
> What do you think?

I remember seeing a lot of CTFE code that created a dynamic array and then appended stuff to it, like for example to build a list of prime numbers. Would that still work with ArrayBuilder?
October 19, 2009
Ary Borenszweig wrote:
> I remember seeing a lot of CTFE code that created a dynamic array and then appended stuff to it, like for example to build a list of prime numbers. Would that still work with ArrayBuilder?

Probably not. But you can rewrite:

  a ~= stuff;

as:

  a = a ~ stuff;

to make it work.
October 19, 2009
On Mon, Oct 19, 2009 at 2:48 AM, Walter Bright <newshound1@digitalmars.com> wrote:
> Ary Borenszweig wrote:
>>
>> I remember seeing a lot of CTFE code that created a dynamic array and then appended stuff to it, like for example to build a list of prime numbers. Would that still work with ArrayBuilder?
>
> Probably not. But you can rewrite:
>
>  a ~= stuff;
>
> as:
>
>  a = a ~ stuff;
>
> to make it work.

Ouch.  Gives me Matlab nightmares.

--bb
October 19, 2009
On Mon, 19 Oct 2009 05:16:45 +0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> dsimcha wrote:
>> == Quote from Andrei Alexandrescu (SeeWebsiteForEmail@erdani.org)'s article
>>>> 3.  A T[new] should be implicitly convertible to a slice.  For example:
>>>>
>>>> auto foo = someFunctionThatReturnsTnew();
>>>> // foo is a T[new].
>>>> T[] bar = someFunctionThatReturnsTnew();
>>>> // Works.  bar is a T[].  The T[new] went into oblivion.
>>>>
>>>> This solves the problem of slices not being closed over .dup and ~.
>>> Check.
>>  So then why is slices not being closed over .dup, ~, etc. still a problem?  With
>> implicit conversion, they for all practical purposes are.
>
> The problems are with auto and template argument deduction.
>

Could you please bring any example? You talk a lot about corner cases and inconsistencies, but didn't bring a single example.

>>  This begs the question:  Why?  Walter's post on the subject was rather brief and I
>> can't understand for the life of me why you guys would throw away such an elegant
>> solution.
>
> Here's what I wrote to Walter:
>
> ====================
> I'm going to suggest something terrible - let's get rid of T[new]. I know it's difficult to throw away work you've already done, but really things with T[new] start to look like a Pyrrhic victory. Here are some issues:
>
> * The abstraction doesn't seem to come off as crisp and clean as we both wanted;
>
> * There are efficiency issues, such as the two allocations that you valiantly tried to eliminate in a subset of cases;
>
> * Explaining two very similar but subtly different types to newcomers is excruciatingly difficult (I'll send you a draft of the chapter - it looks like a burn victim who didn't make it);
>
> * Furthermore, explaining people when to use one vs. the other is much more difficult than it seems. On the surface, it goes like this: "If you need to append stuff, use T[new]. If not, use T[]." Reality is much more subtle. For one thing, T[new] does not allow contraction from the left, whereas T[] does. That puts T[] at an advantage. So if you want to append stuff and also contract from the left, there's nothing our abstractions can help you with.
>

An Array!(T) is really just a different name to a T[new]. You'll have the same problem explaining difference between Array!(T) and T[].

But you are also creating a nightmare for CTFE. Since you can't use "a ~= b;" anymore, you'll have to use "a = a ~ b;" which *always* allocates. Not only it is syntactically less pleasant, this way you render this function useless at run-time - who in the sane mind will use such an inefficient stuff?

> Instead of all T[new] stuff, I suggest the following:
>
> 1. We stay with T[] and we define a struct ArrayBuilder that replaces T[new] with a much more clear name and charter. Phobos already has Appender which works very well. We can beef that up to allow array-like primitives.
>
> 2. Assigning to a slice's .length allocates a new slice if growth is needed.
>
> 3. Disallow ~= for slices. ArrayBuilder will define it.
>
> 4. That's it.
>
> Java got away with a similar approach using StringBuilder:
>
> http://java.sun.com/j2se/1.5.0/docs/api/java/lang/StringBuilder.html
>
> Scala has something very similar called ArrayBuffer:
>
> http://www.nabble.com/ArrayList-and-ArrayBuffer-td15448842.html
>
> And guess what, C# stole Java's StringBuilder as well:
>
> http://msdn.microsoft.com/en-us/library/2839d5h5%28VS.71%29.aspx
>

This is not a fair comparison. StringBuilders only exists because strings are immutable and any operation on them create a new object. They are just a performance optimization, nothing else. StringBuilder is great, but its functionality doesn't really overlap with a general-purpose array.

> So it looks like many programmers coming from other languages will already be familiar with the idea that you use a "builder" to grow an array, and then you use a non-growable array. One thing that Appender has and is really cool is that it can grow an already-existing slice. So you can grow a slice, play with it for a while, and then grow it again at low cost. I don't think the other languages allow that.
>
> I understand how you must feel about having implemented T[new] and all, but really please please try to detach for a minute and think back. Does what we've got now with T[new] make D a much better place? Between the increase of the language, the difficulty to explain the minute subtleties, and the annoying corner cases and oddities, I think it might be good to reconsider.
> ================
>
>> Given that we already agree that a T[new] can be implicitly cast to a
>> T[], the lack of closure under ~, .dup, etc. seems like a non-issue.  When I was a
>> beginner, before I got into templates, a major attraction to D was that it was a
>> fast, compiled language where basic things like arrays (mostly) just worked.  To
>> take away all the syntactic sugar for appending and lengthening from arrays and
>> push this into a separate library type that doesn't feel like a first class object
>> or (I assume) even support most array semantics would be a massive, massive kludge
>> no matter how well-implemented that library type was.
>
> I think the language will be in considerably better shape without T[new] than with it. I understand this is a subjective point. You can argue against on virtually every one of my objections.
>
>

It is a shame that D will have built-in associative arrays and no dynamic arrays. After all, which one of them do you use more frequently?
October 19, 2009
This discussion originated in the T[new] thread, but I think it deserves its own thread.

== Quote from Denis Koroskin (2korden@gmail.com)'s article
> An Array!(T) is really just a different name to a T[new]. You'll have the
> same problem explaining difference between Array!(T) and T[].
> But you are also creating a nightmare for CTFE. Since you can't use "a ~=
> b;" anymore, you'll have to use "a = a ~ b;" which *always* allocates. Not
> only it is syntactically less pleasant, this way you render this function
> useless at run-time - who in the sane mind will use such an inefficient
> stuff?

Maybe what we need is a version(ctfe) statement.  Stuff inside such a block would be executed only if a function is being compile time evaluated.  When code is generated for runtime evaluation the else block would be used.  This would allow problems like this to be solved in a well-encapsulated way.  Example:

uint[] findPrimes(uint maxPrime) {
    version(ctfe) {
        uint[] ret;
    } else {
        ArrayBuilder!uint ret;
    }

    foreach(i; 0..maxPrime) {
        if(!isPrime(i)) {
             continue;
        }

        version(ctfe) {
            ret = ret ~ i;
        } else {
            ret ~= i;
        }
    }
}

Given that CTFE will likely never support everything that is supported at runtime, this will likely make it much more useful.
October 19, 2009
On Mon, 19 Oct 2009 17:13:46 +0400, dsimcha <dsimcha@yahoo.com> wrote:

> This discussion originated in the T[new] thread, but I think it deserves its own
> thread.
>
> == Quote from Denis Koroskin (2korden@gmail.com)'s article
>> An Array!(T) is really just a different name to a T[new]. You'll have the
>> same problem explaining difference between Array!(T) and T[].
>> But you are also creating a nightmare for CTFE. Since you can't use "a ~=
>> b;" anymore, you'll have to use "a = a ~ b;" which *always* allocates. Not
>> only it is syntactically less pleasant, this way you render this function
>> useless at run-time - who in the sane mind will use such an inefficient
>> stuff?
>
> Maybe what we need is a version(ctfe) statement.  Stuff inside such a block would
> be executed only if a function is being compile time evaluated.  When code is
> generated for runtime evaluation the else block would be used.  This would allow
> problems like this to be solved in a well-encapsulated way.  Example:
>
> uint[] findPrimes(uint maxPrime) {
>     version(ctfe) {
>         uint[] ret;
>     } else {
>         ArrayBuilder!uint ret;
>     }
>
>     foreach(i; 0..maxPrime) {
>         if(!isPrime(i)) {
>              continue;
>         }
>
>         version(ctfe) {
>             ret = ret ~ i;
>         } else {
>             ret ~= i;
>         }
>     }
> }
>
> Given that CTFE will likely never support everything that is supported at runtime,
> this will likely make it much more useful.

It was suggested before and IIRC Walter said it is next to impossible to implement.

What you suggest is almost the same as writing 2 different functions. The killer feature of CTFE is that you write it just once and use both at compile and run-time without modifications.

I'd like to write it as

immutable(uint)[] findPrimes(uint maxPrime) {
    ArrayOfInt ret;

    foreach(i; 0..maxPrime) {
        if(!isPrime(i)) {
             continue;
        }

        ret ~= i;
    }

    return assumeUnique(ret[]);
}

ArrayOfInt may be an alias to Array!(int), int[] or int[new], or whatever else. All I care is it should be valid and efficient for both run- and compile-time cases.
October 19, 2009
dsimcha wrote:
> This discussion originated in the T[new] thread, but I think it deserves its own
> thread.
> 
> == Quote from Denis Koroskin (2korden@gmail.com)'s article
>> An Array!(T) is really just a different name to a T[new]. You'll have the
>> same problem explaining difference between Array!(T) and T[].
>> But you are also creating a nightmare for CTFE. Since you can't use "a ~=
>> b;" anymore, you'll have to use "a = a ~ b;" which *always* allocates. Not
>> only it is syntactically less pleasant, this way you render this function
>> useless at run-time - who in the sane mind will use such an inefficient
>> stuff?
> 
> Maybe what we need is a version(ctfe) statement.  Stuff inside such a block would
> be executed only if a function is being compile time evaluated.  When code is
> generated for runtime evaluation the else block would be used.  This would allow
> problems like this to be solved in a well-encapsulated way.  Example:
> 
> uint[] findPrimes(uint maxPrime) {
>     version(ctfe) {
>         uint[] ret;
>     } else {
>         ArrayBuilder!uint ret;
>     }
> 
>     foreach(i; 0..maxPrime) {
>         if(!isPrime(i)) {
>              continue;
>         }
> 
>         version(ctfe) {
>             ret = ret ~ i;
>         } else {
>             ret ~= i;
>         }
>     }
> }
> 
> Given that CTFE will likely never support everything that is supported at runtime,
> this will likely make it much more useful.

version(ctfe) would be helpful in many other situations (e.g. asm-optimized routines etc.)

I thought more about it and realized that ~= can be still implemented for arrays if the GC cooperates.

For relatively large chunks of memory, the GC keeps a control block. We could add a member size_t requestedSize that keeps the size that was requested with an allocation/reallocation call. The GC can take initiative in overallocating memory and expose primitives such as requestedSize (how much did the program ask for?) and capacity (how much is really allocated?).  With that API, it is possible to implement ~= efficiently.


Andrei