T[new] (page 2)

August 09, 2009

Re: T[new]

Posted by Andrei Alexandrescu
in reply to Brad Roberts

Brad Roberts wrote:
> Walter Bright wrote:
>> Brad Roberts wrote:
>>> Yay.  What will happen with slices over a T[new] when the underlying
>>> T[new] must
>>> be moved due to resizing?  The behavior needs to be at least well
>>> specified, if
>>> not well defined.
>> Great question! It's no different from what happens with any container
>> when its contents change and there are references to the old content.
>>
>> What won't happen is the slice won't be pointing to unallocated memory,
>> thanks to the gc. Resizing the T[new] will not cause the old contents to
>> be deleted.
>>
>> The slice will either point to the old content, if the resize caused a
>> move, or the new content, if it was resized in place. So it will still
>> be implementation-defined behavior.
>>
>> But getting caught by such behavior is much less
>> likely with T[new], as one can clearly see in the code where the
>> resizeables are, rather than the current situation where any slice could
>> be resized by anyone.
>>
>> I think that taking a slice of a T[new], then resizing the T[new], will
>> be rare (or at least should be). Normally, a T[new] will be built, and
>> when it is all done, it is converted to a T[] and there is no longer any
>> way to resize it.
> 
> As expected, but like I said.. this needs to be clear in the spec.  Because
> something like this is just confusing (and yes, I know it's not new behavior):
> 
>     auto a = new int[10];
>     a[0] = 1;
>     auto s1 = a[0..1];
>     a.length = <some value large enough to force a move>;
>     auto s2 = a[0..1];
>     s2[0] = 2;
>     assert(s1[0] == s2[0]); // fail
> 
> Later,
> Brad
> 

Well yah but in all languages this is the case. C++'s iterators not only invalidate at the drop of a hat, but you can't really tell whether they're invalid. Java iterators can also be invalidated and may throw an exception.

When resizing a D array, its underlying slices may be *orphaned*. That means they are still valid, they just don't refer to the same data as the array. I think that's a reasonable tradeoff between efficiency and safety.


Andrei
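
A minimal sketch of the "orphaned slice" behavior described above, in current D with plain `int[]` arrays rather than the proposed `T[new]`. The function name `sliceOrphanedByGrowth` is made up for illustration, and whether a given resize moves the data is up to the runtime, so the code handles both outcomes instead of assuming one.

```d
// Returns true if the resize moved the array and left the earlier slice
// "orphaned" (still valid, but no longer referring to the array's data).
bool sliceOrphanedByGrowth(size_t newLength)
{
    auto a = new int[10];
    a[0] = 1;
    auto s1 = a[0 .. 1];    // slice taken before the resize
    a.length = newLength;   // may extend in place or reallocate
    auto s2 = a[0 .. 1];    // slice taken after the resize
    s2[0] = 2;

    if (s1.ptr is s2.ptr)
    {
        // Resized in place: both slices alias the same element.
        assert(s1[0] == 2);
        return false;
    }

    // Moved: the GC keeps the old block alive, so s1 still reads the old value.
    assert(s1[0] == 1);
    return true;
}
```

Calling it with a large length such as `1_000_000` will usually report a move, but that is not guaranteed; in either case `s1` never points at freed memory.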

bearophile wrote:
> Walter Bright:
>
> I like this general proposal, it sounds like something that can improve D a little. There are many other things that have to be improved in D, but baby steps are enough to go somewhere :-)
>
>> Slices will retain the old:
>>     T[] slice;
>> syntax. Resizeable arrays will be declared as:
>>     T[new] array;
>
> Such syntaxes have to be chosen wisely. I don't fully understand that syntax. And maybe I don't fully like it.

Why am I not surprised :o).

> I think the default (simpler) syntax has to be the most flexible and safer construct, that is resizable arrays. Slices can be seen as an optimization, so they can have a bit longer syntax.

Slices will remain ubiquitous, arrays probably not so much.

Andrei

Reply to bearophile,

> Walter Bright:
>
>> 2. make arrays implementable on .net
>>
> I don't care of such thing. dotnet already has C# and C# is probably
> better than D,

I beg to differ. D is much better than C# in many ways (D's abstraction tools are WAY better; C#'s end at reflection and generics, both runtime-only) and not as good in a few others (tools, libs, runtime reflection).

> and it's similar anyway. So I don't think people will
> use D on dotnet.

This I agree with, because D loses many if not all of its advantages when forced into the .NET/CLI/managed-code world.

Now there are four ways to create/use "arrays" in D2:
- the * syntax on a raw memory block, like in C.
- fixed sized arrays, that are just a pointer somewhere. they may even become fully value types........
- the slice struct, about 2 * size_t.sizeof
- the new reference arrays, a pointer on the stack, plus a 3 items struct somewhere, probably on the heap. If it doesn't escape the scope LDC may be able to put it too on the stack. The GC may allocate a space of 4 items anyway for it, so there's an item wasted. :-)

In the meantime I have understood that the slice has to be the default syntax to keep the compatibility with C :-)

Walter Bright:
> Yes. Clearly, those properties will have to be functions under the hood.
> So T[new] operations will be a bit slower than for slices. For faster
> indexing, you'd probably want to do:
>     auto slice = a[];
> and then operate on the slice.

Let's hope such functions can be inlined. Assuming they can be inlined, a smart compiler can remove some of those ifs where it knows the array surely exists (virtual machines today are usually able to remove some array bound tests using similar tricks). I don't hold my breath for D2 to become this smart...

Bye,
bearophile
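
The "build with something growable, then work on a plain slice" pattern from the quoted advice can be approximated in current D. The sketch below uses `std.array.appender` as a stand-in for the proposed `T[new]`, which is a substitution and not the design under discussion; `buildSquares` and `sumOf` are hypothetical names. The `static assert` checks the two-word slice size mentioned in the list above.

```d
import std.array : appender;

// A slice is a length plus a pointer, so it is two machine words wide.
static assert((int[]).sizeof == 2 * size_t.sizeof);

// Hypothetical helper: grow a buffer while building, then hand out a slice.
int[] buildSquares(int n)
{
    auto builder = appender!(int[])();  // stand-in for a resizable array
    foreach (i; 0 .. n)
        builder.put(i * i);
    return builder.data;                // plain slice over the built data
}

// Hypothetical helper: hot loops index the slice directly.
long sumOf(const(int)[] slice)
{
    long total = 0;
    foreach (x; slice)
        total += x;
    return total;
}
```

Code that receives the slice cannot tell how it was built, which is the separation between "resizeables" and slices that the proposal is after.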

bearophile wrote:
>> 2. make arrays implementable on .net
>
> I don't care of such thing. dotnet already has C# and C# is probably better than D, and it's similar anyway. So I don't think people will use D on dotnet. So even if creating a D for dotnet can be positive, I don't want D2 to change its design to allow a better implementation on dotnet.

I see two things a dotnet implementation of D could have over native D:
- better garbage collector (the D one barely does its job...)
- better interoperability (*hint* OMF *hint*)

bearophile wrote:
>> 2. make arrays implementable on .net
>
> I don't care of such thing. dotnet already has C# and C# is probably
> better than D, and it's similar anyway. So I don't think people will
> use D on dotnet. So even if creating a D for dotnet can be positive,
> I don't want D2 to change its design to allow a better implementation
> on dotnet.

Even if you're correct that D.net is pointless, and I don't agree with that assessment, I think the problems implementing D arrays on .net will show up elsewhere in attempts to support other targets. So I think it's a "canary" issue rather than a .net one.

grauzone:
> I see two things a dotnet implementation of D could have over native D:
> - better garbage collector (the D one barely does its job...)

The dotnet GC is probably better than the current D GC, but I think it's designed for mostly movable objects. Currently most (or all) D objects are pinned (they can't be moved around in memory), so I don't know if the dotnet GC will do much good. D needs a GC designed for its peculiar characteristics. And I believe D will also need some extra semantics to allow the creation of such efficient half-movable half-pinned GC (I have explained such ideas one time in the past).

Bye,
bearophile

Walter Bright wrote:
<snip>
> Under the hood, a T[new] will be a single pointer to a library defined type. This library defined type will likely contain three properties:
>
>     size_t length;
>     T* ptr;
>     size_t capacity;
>
> The usual array operations will work on T[new] as well as T[].
<snip>

Would new T[10] allocate this structure and the array data on a single GC block, or on two separate blocks? And when the array is reallocated, will the structure move with it?

I suppose it depends on whether you want T[new] to be
(a) something whereby all references persist as the array is reallocated
(b) merely a reference to an allocated array as opposed to an array slice

If (a), this is currently achievable with a T[]*.

If (b), what might work well is a structure like

    size_t length;
    size_t capacity;
    T[capacity] data;

meaning still only one allocation and only one level of indirection when one is used. And the T[new] variable itself would simply hold &data[0].

Moreover, would whatever happens solve such const/invariant holes as bug 2093?

Stewart.

Stewart Gordon wrote:
> Moreover, would whatever happens solve such const/invariant holes as bug 2093?

Just what happens to the ~= operator anyway? Right now, it appends data inline. My vote would be to make "a~=b" do the same as "a=a~b" (with types "T[] a" and "T[] b" or "T b"). T[new]'s ~= would still append inline.

Stewart Gordon wrote:
> Walter Bright wrote:
> <snip>
>> Under the hood, a T[new] will be a single pointer to a library defined type. This library defined type will likely contain three properties:
>>
>>     size_t length;
>>     T* ptr;
>>     size_t capacity;
>>
>> The usual array operations will work on T[new] as well as T[].
> <snip>
>
> Would new T[10] allocate this structure and the array data on a single GC block, or on two separate blocks?

That's up to the implementation.

> And when the array is reallocated, will the structure move with it?

No, that would defeat the whole purpose of making T[new] a reference type. With it being a reference type:

    T[new] a = ...;
    T[new] b = a;
    a.length = ...
    ... b.length changes too ...

> I suppose it depends on whether you want T[new] to be
> (a) something whereby all references persist as the array is reallocated
> (b) merely a reference to an allocated array as opposed to an array slice
>
> If (a), this is currently achievable with a T[]*.
>
> If (b), what might work well is a structure like
>
>     size_t length;
>     size_t capacity;
>     T[capacity] data;
>
> meaning still only one allocation and only one level of indirection when one is used. And the T[new] variable itself would simply hold &data[0].
>
> Moreover, would whatever happens solve such const/invariant holes as bug 2093?

I believe it does.

>
> Stewart.
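
The reference-type behavior described in this reply can be modeled in current D with a small class. This is only a sketch under stated assumptions: `ResizableArray` and the two helper functions are hypothetical names, the class wraps an ordinary `T[]` instead of the length/ptr/capacity triple from the proposal, and nothing here is the actual `T[new]` implementation.

```d
// Hypothetical model of the proposed T[new]: one shared, resizable object.
final class ResizableArray(T)
{
    private T[] data;  // stands in for the ptr/length/capacity triple

    this(size_t n) { data = new T[n]; }

    @property size_t length() const { return data.length; }
    @property void length(size_t n) { data.length = n; }

    // a[] yields a plain slice over the current contents.
    T[] opSlice() { return data; }
}

// Copies of the reference share one length, as in the reply above.
bool aliasesSeeResize()
{
    auto a = new ResizableArray!int(10);
    auto b = a;        // copies the reference, not the array
    a.length = 1000;
    return b.length == 1000;
}

// Plain slices are values: the copy keeps its own length.
bool sliceCopiesSeeResize()
{
    int[] a = new int[10];
    int[] b = a;       // copies the pointer and length
    a.length = 1000;
    return b.length == 1000;
}
```

Under this model `aliasesSeeResize()` returns true and `sliceCopiesSeeResize()` returns false, which is the distinction between the two kinds of array that the thread is drawing.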
