The demise of T[new]

Re: version(ctfe)
Oct 19, 2009 Andrei Alexandrescu
Oct 19, 2009 Leandro Lucarella
Oct 19, 2009 Andrei Alexandrescu
Oct 19, 2009 dsimcha
Oct 19, 2009 Andrei Alexandrescu
Oct 19, 2009 Fawzi Mohamed
Oct 19, 2009 Leandro Lucarella

Oct 19, 2009

Leandro Lucarella

Oct 19, 2009

Oct 19, 2009

Oct 19, 2009

Oct 19, 2009

Oct 19, 2009

Oct 19, 2009

Oct 19, 2009

Oct 19, 2009

Oct 19, 2009

Oct 19, 2009

Oct 19, 2009

Oct 19, 2009

Oct 19, 2009

Oct 19, 2009

Oct 19, 2009

Oct 19, 2009

Oct 19, 2009

Chris Nicholson-Sauls

Oct 19, 2009

Oct 19, 2009

Oct 19, 2009

Oct 19, 2009

Oct 19, 2009

Oct 19, 2009

Oct 19, 2009

Oct 19, 2009

Oct 19, 2009

Oct 19, 2009

Oct 20, 2009

Oct 19, 2009

Oct 19, 2009

Oct 19, 2009

Oct 19, 2009

Oct 19, 2009

Oct 19, 2009

Oct 20, 2009

Oct 20, 2009

Oct 20, 2009

Oct 19, 2009

Oct 20, 2009

Oct 20, 2009

Oct 20, 2009

Oct 20, 2009

Oct 20, 2009

Oct 20, 2009

Oct 20, 2009

Oct 20, 2009

Oct 20, 2009

Oct 20, 2009

Oct 20, 2009

Oct 20, 2009

Oct 20, 2009

Oct 20, 2009

Oct 20, 2009

Oct 30, 2009

October 18, 2009

Posted by Walter Bright

Permalink

Walter Bright

Permalink

The purpose of T[new] was to solve the problems T[] had with passing T[] to a function and then the function resizes the T[]. What happens with the original?

The solution we came up with was to create a third array type, T[new], which was a reference type.

Andrei had the idea that T[new] could be dispensed with by making a "builder" library type to handle creating arrays by doing things like appending, and then delivering a finished T[] type. This is similar to what std.outbuffer and std.array.Appender do, they just need a bit of refining.

The .length property of T[] would then become an rvalue only, not an lvalue, and ~= would no longer be allowed for T[].

We both feel that this would simplify D, make it more flexible, and remove some awkward corner cases like the inability to say a.length++.

What do you think?

October 18, 2009

Re: The demise of T[new]

Posted by bearophile
in reply to Walter Bright

Permalink

bearophile

Posted in reply to Walter Bright

Permalink

Walter Bright:

> Andrei had the idea that T[new] could be dispensed with by making a "builder" library type to handle creating arrays by doing things like appending, and then delivering a finished T[] type. This is similar to what std.outbuffer and std.array.Appender do, they just need a bit of refining.

I like how Andrei keeps trying to invent new possible ideas.
Sometimes situations can be improved adding things, and some other times removing things.
So arrays will keep being kinda values, so if inside a function you use an appender to add items to an array, outside the function, when the function returns, you will keep seeing the original shorter array. This is not fully intuitive (but you can get used to it just because you use arrays all the time).

The following code is not possible any more because you can't change the length:

import std.stdio: writefln;
void main() {
    int[] a1 = new int[5];
    a1[] = 1;
    int[] a2 = a1[1..3];
    writefln(a1, " ", a2); // [1,1,1,1,1] [1,1]
    a2.length = a2.length + 4;
    writefln(a1, " ", a2); // [1,1,1,1,1] [1,1,0,0,0,0]
}

But how do you change the length of an array (because appending many single items isn't always what you need to do)? And how does that interacts with the slicing, as in this code example?

Bye,
bearophile

October 18, 2009

Re: The demise of T[new]

Posted by Denis Koroskin
in reply to Walter Bright

Permalink

Denis Koroskin

Posted in reply to Walter Bright

Permalink

On Mon, 19 Oct 2009 01:05:39 +0400, Walter Bright <newshound1@digitalmars.com> wrote:

> The purpose of T[new] was to solve the problems T[] had with passing T[] to a function and then the function resizes the T[]. What happens with the original?
>
> The solution we came up with was to create a third array type, T[new], which was a reference type.
>
> Andrei had the idea that T[new] could be dispensed with by making a "builder" library type to handle creating arrays by doing things like appending, and then delivering a finished T[] type. This is similar to what std.outbuffer and std.array.Appender do, they just need a bit of refining.
>
> The .length property of T[] would then become an rvalue only, not an lvalue, and ~= would no longer be allowed for T[].
>
> We both feel that this would simplify D, make it more flexible, and remove some awkward corner cases like the inability to say a.length++.
>
> What do you think?

Well, I personally don't feel much of a need for T[new], and I agree T[] needs to be "fixed" the way you describe (i.e. make a shrink-only range out of it).
But I believe a change like this coupled with a demise of T[new] would be way too restricting. Given a

T[] array;

What type would result of these operations be?

1) auto x = array.dup;
2) auto y = array ~ array;

T[] is a view into someone else's container. But when you create a new array (by dup'ing or concatenating), you get a fresh copy. It points to a beginning of data sequence, and there is nothing wrong to expand it.

I believe it should be T[new]. A reference type, yes, a type that would allow appending and would render Appender unnecessary.

October 18, 2009

Re: The demise of T[new]

Posted by grauzone
in reply to Walter Bright

Permalink

grauzone

Posted in reply to Walter Bright

Permalink

Walter Bright wrote:
> We both feel that this would simplify D, make it more flexible, and remove some awkward corner cases like the inability to say a.length++.

How would this remove this corner case?

> What do you think?

Whether T[new] is bultin (mostly implemented in the runtime), or a library type (implemented in Phobos) doesn't make much a difference; it's pretty much the same, isn't it? I don't understand what the exact reasons for this decision.

If anything, the array type should be available without additional Phobos imports (i.e. it should be declared in object.d). Ease of access and ease of use is the key thing here.

October 18, 2009

Re: The demise of T[new]

Posted by Andrei Alexandrescu
in reply to Denis Koroskin

Permalink

Andrei Alexandrescu

Posted in reply to Denis Koroskin

Permalink

Denis Koroskin wrote:
> On Mon, 19 Oct 2009 01:05:39 +0400, Walter Bright <newshound1@digitalmars.com> wrote:
> 
>> The purpose of T[new] was to solve the problems T[] had with passing T[] to a function and then the function resizes the T[]. What happens with the original?
>>
>> The solution we came up with was to create a third array type, T[new], which was a reference type.
>>
>> Andrei had the idea that T[new] could be dispensed with by making a "builder" library type to handle creating arrays by doing things like appending, and then delivering a finished T[] type. This is similar to what std.outbuffer and std.array.Appender do, they just need a bit of refining.
>>
>> The .length property of T[] would then become an rvalue only, not an lvalue, and ~= would no longer be allowed for T[].
>>
>> We both feel that this would simplify D, make it more flexible, and remove some awkward corner cases like the inability to say a.length++.
>>
>> What do you think?
> 
> Well, I personally don't feel much of a need for T[new], and I agree T[] needs to be "fixed" the way you describe (i.e. make a shrink-only range out of it).
> But I believe a change like this coupled with a demise of T[new] would be way too restricting. Given a
> 
> T[] array;
> 
> What type would result of these operations be?
> 
> 1) auto x = array.dup;
> 2) auto y = array ~ array;
> 
> T[] is a view into someone else's container. But when you create a new array (by dup'ing or concatenating), you get a fresh copy. It points to a beginning of data sequence, and there is nothing wrong to expand it.
> 
> I believe it should be T[new]. A reference type, yes, a type that would allow appending and would render Appender unnecessary.

That was the exact plan. I even have half a chapter written about it with nice figures and all, that I can make available. The problem was, it sucked. Returning a distinct type from .dup and ~ makes slices not closed over these operations, a source of complication, confusion, and bloating.

Andrei

October 18, 2009

Re: The demise of T[new]

Posted by Andrei Alexandrescu
in reply to grauzone

Permalink

Andrei Alexandrescu

Posted in reply to grauzone

Permalink

grauzone wrote:
> Walter Bright wrote:
>> We both feel that this would simplify D, make it more flexible, and remove some awkward corner cases like the inability to say a.length++.
> 
> How would this remove this corner case?

Can't increment length if it's read-only.

Andrei

October 18, 2009

Re: The demise of T[new]

Posted by Denis Koroskin
in reply to Andrei Alexandrescu

Permalink

Denis Koroskin

Posted in reply to Andrei Alexandrescu

Permalink

On Mon, 19 Oct 2009 01:55:28 +0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> Denis Koroskin wrote:
>> On Mon, 19 Oct 2009 01:05:39 +0400, Walter Bright <newshound1@digitalmars.com> wrote:
>>
>>> The purpose of T[new] was to solve the problems T[] had with passing T[] to a function and then the function resizes the T[]. What happens with the original?
>>>
>>> The solution we came up with was to create a third array type, T[new], which was a reference type.
>>>
>>> Andrei had the idea that T[new] could be dispensed with by making a "builder" library type to handle creating arrays by doing things like appending, and then delivering a finished T[] type. This is similar to what std.outbuffer and std.array.Appender do, they just need a bit of refining.
>>>
>>> The .length property of T[] would then become an rvalue only, not an lvalue, and ~= would no longer be allowed for T[].
>>>
>>> We both feel that this would simplify D, make it more flexible, and remove some awkward corner cases like the inability to say a.length++.
>>>
>>> What do you think?
>>  Well, I personally don't feel much of a need for T[new], and I agree T[] needs to be "fixed" the way you describe (i.e. make a shrink-only range out of it).
>> But I believe a change like this coupled with a demise of T[new] would be way too restricting. Given a
>>  T[] array;
>>  What type would result of these operations be?
>>  1) auto x = array.dup;
>> 2) auto y = array ~ array;
>>  T[] is a view into someone else's container. But when you create a new array (by dup'ing or concatenating), you get a fresh copy. It points to a beginning of data sequence, and there is nothing wrong to expand it.
>>  I believe it should be T[new]. A reference type, yes, a type that would allow appending and would render Appender unnecessary.
>
> That was the exact plan. I even have half a chapter written about it with nice figures and all, that I can make available. The problem was, it sucked. Returning a distinct type from .dup and ~ makes slices not closed over these operations, a source of complication, confusion, and bloating.
>
> Andrei

What's problem with implicit cast of T[new] to T[]?

T[new] container = [1, 2, 3].dup;
T[] range1 = container; // implicit container[] call via alias this
T[] range2 = [1, 2, 3].dup; // same here

An other option would be to declare

// duplicates a range
T[] rdup(T[] range)
{
    return range.dup[];
}

but it's less likely to happen.

October 18, 2009

Re: The demise of T[new]

Posted by dsimcha
in reply to Walter Bright

Permalink

dsimcha

Posted in reply to Walter Bright

Permalink

== Quote from Walter Bright (newshound1@digitalmars.com)'s article
> The purpose of T[new] was to solve the problems T[] had with passing T[]
> to a function and then the function resizes the T[]. What happens with
> the original?
> The solution we came up with was to create a third array type, T[new],
> which was a reference type.
> Andrei had the idea that T[new] could be dispensed with by making a
> "builder" library type to handle creating arrays by doing things like
> appending, and then delivering a finished T[] type. This is similar to
> what std.outbuffer and std.array.Appender do, they just need a bit of
> refining.
> The .length property of T[] would then become an rvalue only, not an
> lvalue, and ~= would no longer be allowed for T[].
> We both feel that this would simplify D, make it more flexible, and
> remove some awkward corner cases like the inability to say a.length++.
> What do you think?

This is ridiculous.  The status quo works well most of the time and just has a few really ugly corner cases.  As long as you're only appending to one array at a time, appending isn't even that slow anymore now that bug 2900 (http://d.puremagic.com/issues/show_bug.cgi?id=2900) is fixed.

I frankly think it's absurd to make working with arrays an order of magnitude harder and less elegant in the 90+% of cases where they work fine just to fix a few corner case bugs.  Don't get me wrong, the corner case bugs should be fixed because they're pretty nasty safety issues.  They just shouldn't be fixed in a way that makes arrays substantially harder to use, or even syntactically uglier, in the cases where they already work well.  As good a programmer as Andrei is, I'm sure whatever he comes up with will be much less syntactially pleasing and easy to use than something the core language understands.

Here's my proposal for how T[new] should work:

1.  It should be a reference type to be consistent with slices.  Yes, slices are kind of a hybrid, but they're more semantically similar to reference types than value types.  If you can't modify the length of a slice anymore, then for all practical purposes it will be a reference type.

2.  A T[new] should support all the same operations as a T[] with semantics as similar as common sense will allow, including indexing, ~, .dup, .idup, slice assign, etc.  Basically, it should have, to the greatest degree possible without defeating the purpose, the same compile time interface.

3.  A T[new] should be implicitly convertible to a slice.  For example:

auto foo = someFunctionThatReturnsTnew();
// foo is a T[new].
T[] bar = someFunctionThatReturnsTnew();
// Works.  bar is a T[].  The T[new] went into oblivion.

This solves the problem of slices not being closed over .dup and ~.

4.  It should be guaranteed that no block of memory is ever referenced by more than one T[new] instance.  This is needed to guarantee safety when appending to immutable arrays, etc.

5.  Assigning a T[new] to another T[new] should be by reference, just like assigning a class instance to another class instance.  Assigning a T[] to a T[new] should duplicate the memory block referenced by the T[] because this is probably the only way to guarantee (4).

6.  Since T[new] guarantees unique access to a memory block, it should have an assumeUnique() method that returns an immutable slice and sets the T[new]'s reference to the memory block to null.  This solves the problem of building immutable arrays without the performance penalty of not being able to pre-allocate or the unsafeness of having to cowboy cast it to immutable.

7.  As long as the GC is conservative, there absolutely *must* be a method of manually freeing the memory block referenced by a T[new] provided that the GC supports this operation, though it doesn't have to be particularly pretty.  In general, since D is a systems language, T[new] should not be too opaque.  A good way to do this might be to make all of the fields of the T[new] public but undocumented.  If you *really* want to mess with it, you'll read the source code and figure it out.

8.  The first call to opSlice on a T[new] should set a flag that indicates that there may be multiple pointers to the underlying memory block.  Before that flag is set, appends to a T[new] should result in calls to GC.free() to free the old block whenever it needs to be expanded (since we can guarantee that we own it exclusively).  This will help deal with false pointer issues, since D's GC looks like it will remain conservative for the foreseeable future.

October 18, 2009

Re: The demise of T[new]

Posted by dsimcha
in reply to dsimcha

Permalink

dsimcha

Posted in reply to dsimcha

Permalink

== Quote from dsimcha (dsimcha@yahoo.com)'s article
> 8.  The first call to opSlice on a T[new] should set a flag that indicates that there may be multiple pointers to the underlying memory block.  Before that flag is set, appends to a T[new] should result in calls to GC.free() to free the old block whenever it needs to be expanded (since we can guarantee that we own it exclusively).  This will help deal with false pointer issues, since D's GC looks like it will remain conservative for the foreseeable future.

Scratch this one.  It breaks on the following:

T[new] stuff = somethingThatReturnsTnew();
T* ptrToStuffMemory = &(stuff[8]);
stuff ~= otherStuff;  // May delete memory referred to by ptrToStuffMemory.

October 19, 2009

Re: The demise of T[new]

Posted by Andrei Alexandrescu
in reply to dsimcha

Permalink

Andrei Alexandrescu

Posted in reply to dsimcha

Permalink

dsimcha wrote:
> Here's my proposal for how T[new] should work:
> 
> 1.  It should be a reference type to be consistent with slices.  Yes, slices are
> kind of a hybrid, but they're more semantically similar to reference types than
> value types.  If you can't modify the length of a slice anymore, then for all
> practical purposes it will be a reference type.

Check.

> 2.  A T[new] should support all the same operations as a T[] with semantics as
> similar as common sense will allow, including indexing, ~, .dup, .idup, slice
> assign, etc.  Basically, it should have, to the greatest degree possible without
> defeating the purpose, the same compile time interface.

Check.

> 3.  A T[new] should be implicitly convertible to a slice.  For example:
> 
> auto foo = someFunctionThatReturnsTnew();
> // foo is a T[new].
> T[] bar = someFunctionThatReturnsTnew();
> // Works.  bar is a T[].  The T[new] went into oblivion.
> 
> This solves the problem of slices not being closed over .dup and ~.

Check.

> 4.  It should be guaranteed that no block of memory is ever referenced by more
> than one T[new] instance.  This is needed to guarantee safety when appending to
> immutable arrays, etc.

That doesn't go with reference semantics. Uncheck.

> 5.  Assigning a T[new] to another T[new] should be by reference, just like
> assigning a class instance to another class instance.

Check. (BTW contradicts 4)

> Assigning a T[] to a T[new]
> should duplicate the memory block referenced by the T[] because this is probably
> the only way to guarantee (4).

No check, but could have been done.

> 6.  Since T[new] guarantees unique access to a memory block, it should have an
> assumeUnique() method that returns an immutable slice and sets the T[new]'s
> reference to the memory block to null.  This solves the problem of building
> immutable arrays without the performance penalty of not being able to pre-allocate
> or the unsafeness of having to cowboy cast it to immutable.

Uncheck.

> 7.  As long as the GC is conservative, there absolutely *must* be a method of
> manually freeing the memory block referenced by a T[new] provided that the GC
> supports this operation, though it doesn't have to be particularly pretty.  In
> general, since D is a systems language, T[new] should not be too opaque.  A good
> way to do this might be to make all of the fields of the T[new] public but
> undocumented.  If you *really* want to mess with it, you'll read the source code
> and figure it out.

Check while delete still exists. Please use malloc for that stuff.

> 8.  The first call to opSlice on a T[new] should set a flag that indicates that
> there may be multiple pointers to the underlying memory block.  Before that flag
> is set, appends to a T[new] should result in calls to GC.free() to free the old
> block whenever it needs to be expanded (since we can guarantee that we own it
> exclusively).  This will help deal with false pointer issues, since D's GC looks
> like it will remain conservative for the foreseeable future.

Uncheck.

So now: every place I've said "check" means there was implementation and book-quality illustrated documentation that Walter and I have done with the sweat of our brow. At the end we looked at the result and concluded we should throw away all that work.


Andrei

Top | Forum index | About this forum

Forums