August 09, 2009
Brad Roberts wrote:
> Walter Bright wrote:
>> Brad Roberts wrote:
>>> Yay.  What will happen with slices over a T[new] when the underlying
>>> T[new] must be moved due to resizing?  The behavior needs to be at
>>> least well specified, if not well defined.
>> Great question! It's no different from what happens with any container
>> when its contents change and there are references to the old content.
>>
>> What won't happen is the slice pointing to unallocated memory,
>> thanks to the gc. Resizing the T[new] will not cause the old contents to
>> be deleted.
>>
>> The slice will either point to the old content, if the resize caused a
>> move, or the new content, if it was resized in place. So it will still
>> be implementation-defined behavior.
>>
>> But the likelihood of getting caught by such behavior is much lower
>> with T[new], as one can clearly see in the code where the
>> resizeables are, rather than the current situation where any slice could
>> be resized by anyone.
>>
>> I think that taking a slice of a T[new], then resizing the T[new], will
>> be rare (or at least should be). Normally, a T[new] will be built, and
>> when it is all done, it is converted to a T[] and there is no longer any
>> way to resize it.
> 
> As expected, but like I said.. this needs to be clear in the spec.  Because
> something like this is just confusing (and yes, I know it's not new behavior):
> 
>     auto a = new int[10];
>     a[0] = 1;
>     auto s1 = a[0..1];
>     a.length = <some value large enough to force a move>;
>     auto s2 = a[0..1];
>     s2[0] = 2;
>     assert(s1[0] == s2[0]); // fails if the resize moved the array
> 
> Later,
> Brad
> 

Well yah but in all languages this is the case. C++'s iterators not only invalidate at the drop of a hat, but you can't really tell whether they're invalid. Java iterators can also be invalidated and may throw an exception.

When resizing a D array, its underlying slices may be *orphaned*. That means they are still valid, they just don't refer to the same data as the array. I think that's a reasonable tradeoff between efficiency and safety.
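
To make the orphaning concrete, here is a minimal sketch using today's plain D slices rather than the proposed T[new]. The function name `orphanDemo` and the length 1_000_000 are assumptions picked for illustration; whether a given resize actually moves the data is up to the runtime, so the code checks both outcomes instead of assuming one.

```d
// Returns true when the observed behavior matches the rule described above:
// after a resize, an older slice sees either the old data or the new data.
bool orphanDemo()
{
    int[] a = new int[10];
    a[0] = 1;
    int[] s1 = a[0 .. 1];      // slice taken before the resize
    auto oldPtr = a.ptr;

    a.length = 1_000_000;      // assumed large enough to move; not guaranteed
    int[] s2 = a[0 .. 1];      // slice taken after the resize
    s2[0] = 2;

    const moved = a.ptr !is oldPtr;
    // Moved: s1 is orphaned and still reads the old value 1.
    // Resized in place: s1 and s2 share memory, so s1 reads 2.
    return moved ? (s1[0] == 1) : (s1[0] == 2);
}

void main()
{
    assert(orphanDemo());
}
```

Either way s1 stays valid memory thanks to the GC; it just may no longer track the array.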


Andrei
August 09, 2009
bearophile wrote:
> Walter Bright:
> 
> I like this general proposal, it sounds like something that can improve D a little. There are many other things that have to be improved in D, but baby steps are enough to go somewhere :-)
> 
>> Slices will retain the old:
>>     T[] slice;
>> syntax. Resizeable arrays will be declared as:
>>     T[new] array;
> 
> Such syntaxes have to be chosen wisely. I don't fully understand that syntax. And maybe I don't fully like it.

Why am I not surprised :o).

> I think the default (simpler) syntax has to be the most flexible and safer construct, that is resizable arrays. Slices can be seen as an optimization, so they can have a bit longer syntax.

Slices will remain ubiquitous, arrays probably not so much.


Andrei

August 09, 2009
Reply to bearophile,

> Walter Bright:
>
>> 2. make arrays implementable on .net
>> 
> I don't care about such a thing. dotnet already has C# and C# is probably
> better than D,

I beg to differ. D is much better than C# in many ways (D's abstraction tools are WAY better; C#'s end at reflection and generics, both runtime-only) and not as good in a few others (tools, libs, runtime reflection).

> and it's similar anyway. So I don't think people will
> use D on dotnet.

This I agree with, because D loses many if not all of its advantages when forced into the .NET/CLI/managed-code world.


August 09, 2009
Now there are four ways to create/use "arrays" in D2:
- the * syntax on a raw memory block, like in C.
- fixed-size arrays, which are just a pointer somewhere. They may even become full value types...
- the slice struct, about 2 * size_t.sizeof
- the new reference arrays, a pointer on the stack, plus a three-item struct somewhere, probably on the heap. If it doesn't escape the scope LDC may be able to put it on the stack too. The GC may allocate a space of 4 items anyway for it, so there's an item wasted.
:-)
In the meantime I have understood that the slice has to be the default syntax to keep the compatibility with C :-)


Walter Bright:

> Yes. Clearly, those properties will have to be functions under the hood.
> So T[new] operations will be a bit slower than for slices. For faster
> indexing, you'd probably want to do:
> auto slice = a[];
> and then operate on the slice.

Let's hope such functions can be inlined. Assuming they can be inlined, a smart compiler can remove some of those ifs where it knows the array surely exists (virtual machines today are usually able to remove some array bound tests using similar tricks). I don't hold my breath for D2 to become this smart...
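
Since T[new] isn't available to try, the build-then-slice pattern can be approximated with Phobos' `std.array.appender` as a stand-in. That substitution is an assumption of this sketch, not what Walter proposed, and `sumOfBuilt` is a made-up name. The point is only that the hot loop runs over a plain slice, not over the resizable builder.

```d
import std.array : appender;

// Build with a resizable helper, then take a plain slice and work on that.
int sumOfBuilt()
{
    auto builder = appender!(int[])();   // stand-in for a T[new]
    foreach (i; 0 .. 5)
        builder.put(i * i);              // growth happens only here

    int[] slice = builder.data;          // plain T[] view of the built data

    int total = 0;
    foreach (x; slice)                   // indexing goes through the slice only
        total += x;
    return total;
}

void main()
{
    assert(sumOfBuilt() == 30);          // 0 + 1 + 4 + 9 + 16
}
```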

Bye,
bearophile
August 09, 2009
bearophile wrote:
>> 2. make arrays implementable on .net
> 
> I don't care about such a thing. dotnet already has C# and C# is probably better than D, and it's similar anyway. So I don't think people will use D on dotnet. So even if creating a D for dotnet can be positive, I don't want D2 to change its design to allow a better implementation on dotnet.

I see two things a dotnet implementation of D could have over native D:
- better garbage collector (the D one barely does its job...)
- better interoperability (*hint* OMF *hint*)
August 09, 2009
bearophile wrote:
>> 2. make arrays implementable on .net
> 
> I don't care about such a thing. dotnet already has C# and C# is probably
> better than D, and it's similar anyway. So I don't think people will
> use D on dotnet. So even if creating a D for dotnet can be positive,
> I don't want D2 to change its design to allow a better implementation
> on dotnet.

Even if you're correct that D.net is pointless, and I don't agree with that assessment, I think the problems implementing D arrays on .net will show up elsewhere in attempts to support other targets. So I think it's a "canary" issue rather than a .net one.
August 09, 2009
grauzone:

> I see two things a dotnet implementation of D could have over native D: - better garbage collector (the D one barely does its job...)

The dotnet GC is probably better than the current D GC, but I think it's designed for mostly movable objects. Currently most (or all) D objects are pinned (they can't be moved around in memory), so I don't know if the dotnet GC will do much good.
D needs a GC designed for its peculiar characteristics. And I believe D will also need some extra semantics to allow the creation of such an efficient half-movable, half-pinned GC (I have explained such ideas once in the past).

Bye,
bearophile
August 09, 2009
Walter Bright wrote:
<snip>
> Under the hood, a T[new] will be a single pointer to a library defined type. This library defined type will likely contain three properties:
> 
>     size_t length;
>     T* ptr;
>     size_t capacity;
> 
> The usual array operations will work on T[new] as well as T[].
<snip>

Would new T[10] allocate this structure and the array data on a single GC block, or on two separate blocks?  And when the array is reallocated, will the structure move with it?

I suppose it depends on whether you want T[new] to be
(a) something whereby all references persist as the array is reallocated
(b) merely a reference to an allocated array as opposed to an array slice

If (a), this is currently achievable with a T[]*.

If (b), what might work well is a structure like

    size_t length;
    size_t capacity;
    T[capacity] data;

meaning still only one allocation and only one level of indirection when one is used.  And the T[new] variable itself would simply hold &data[0].
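
A rough sketch of layout (b) in current D, to show the single allocation and the single indirection. Everything here is hypothetical: the names `Header`, `allocate` and `headerOf` are invented for illustration, and the code assumes T needs no stricter alignment than two size_t values provide. It uses `GC.malloc` from `core.memory` for the one block.

```d
import core.memory : GC;

// Bookkeeping stored immediately in front of the element data.
struct Header
{
    size_t length;
    size_t capacity;
}

// One GC block holds header + data; the returned pointer is &data[0].
T* allocate(T)(size_t capacity)
{
    void* block = GC.malloc(Header.sizeof + capacity * T.sizeof);
    auto header = cast(Header*) block;
    header.length = 0;
    header.capacity = capacity;
    return cast(T*)(header + 1);     // data starts right after the header
}

// Step back from &data[0] to reach the header.
Header* headerOf(T)(T* data)
{
    return (cast(Header*) data) - 1;
}

bool layoutDemo()
{
    int* data = allocate!int(8);
    auto h = headerOf(data);
    data[0] = 42;                    // element access is one indirection
    h.length = 1;
    return h.capacity == 8 && data[0 .. h.length] == [42];
}

void main()
{
    assert(layoutDemo());
}
```

Note that growing past the capacity would have to move header and data together, so other copies of the pointer would be left pointing at the old block, which is the difference from (a).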

Moreover, would whatever happens solve such const/invariant holes as bug 2093?

Stewart.
August 09, 2009
Stewart Gordon wrote:
> Moreover, would whatever happens solve such const/invariant holes as bug 2093?

Just what happens to the ~= operator anyway? Right now, it appends data inline.

My vote would be to make "a~=b" do the same as "a=a~b" (with types "T[] a" and "T[] b" or "T b"). T[new]'s ~= would still append inline.
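
For reference, this is how the two forms differ on plain slices, as a small sketch (the function name `concatCopies` is made up). The language spec says concatenation always produces a copy, while `~=` is allowed to extend in place, so only the first part makes a claim about where the data ends up.

```d
bool concatCopies()
{
    int[] a = [1, 2, 3];
    int[] other = a;          // second reference to the same data

    a = a ~ 4;                // concatenation: always a fresh array
    a[0] = 99;                // so this cannot be seen through other

    int[] c = [1, 2, 3];
    c ~= 4;                   // append: may extend in place or reallocate

    return a.ptr !is other.ptr
        && other == [1, 2, 3]
        && a == [99, 2, 3, 4]
        && c == [1, 2, 3, 4];
}

void main()
{
    assert(concatCopies());
}
```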
August 09, 2009
Stewart Gordon wrote:
> Walter Bright wrote:
> <snip>
>> Under the hood, a T[new] will be a single pointer to a library defined type. This library defined type will likely contain three properties:
>>
>>     size_t length;
>>     T* ptr;
>>     size_t capacity;
>>
>> The usual array operations will work on T[new] as well as T[].
> <snip>
> 
> Would new T[10] allocate this structure and the array data on a single GC block, or on two separate blocks?

That's up to the implementation.

> And when the array is reallocated, will the structure move with it?

No, that would defeat the whole purpose of making T[new] a reference type. With it being a reference type:

T[new] a = ...;
T[new] b = a;

a.length = ...
... b.length changes too ...
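
The same behavior can be imitated in current D with a small class, since class variables are references. This is only a sketch under assumptions: `Resizable` is a hypothetical stand-in for the library type behind T[new], not its actual design, and it keeps a T[] plus a length where the proposal lists length, ptr and capacity.

```d
// Hypothetical stand-in for the object a T[new] variable would point to.
final class Resizable(T)
{
    private T[] store;     // backing storage; store.length acts as capacity
    private size_t len;    // number of elements in use

    @property size_t length() const { return len; }

    @property void length(size_t n)
    {
        if (n > store.length)
            store.length = n;             // data may move; this object does not
        if (n > len)
            store[len .. n] = T.init;     // newly exposed elements start at T.init
        len = n;
    }

    T[] opSlice() { return store[0 .. len]; }   // a[] yields a plain slice
}

bool sharedLengthDemo()
{
    auto a = new Resizable!int;
    auto b = a;            // copies the reference, not the array
    a.length = 20;
    return a is b && b.length == 20 && b[].length == 20;
}

void main()
{
    assert(sharedLengthDemo());
}
```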


> I suppose it depends on whether you want T[new] to be
> (a) something whereby all references persist as the array is reallocated
> (b) merely a reference to an allocated array as opposed to an array slice
> 
> If (a), this is currently achievable with a T[]*.
> 
> If (b), what might work well is a structure like
> 
>     size_t length;
>     size_t capacity;
>     T[capacity] data;
> 
> meaning still only one allocation and only one level of indirection when one is used.  And the T[new] variable itself would simply hold &data[0].
> 
> Moreover, would whatever happens solve such const/invariant holes as bug 2093?

I believe it does.

> 
> Stewart.