Thread overview
Problem with associative arrays
Mar 19, 2011
Piotr Szturmaj
Mar 19, 2011
bearophile
Mar 19, 2011
Mafi
Mar 19, 2011
Piotr Szturmaj
Mar 19, 2011
Jesse Phillips
Mar 20, 2011
Piotr Szturmaj
Mar 20, 2011
Jesse Phillips
Mar 19, 2011
Ali Çehreli
Mar 19, 2011
Jonathan M Davis
March 19, 2011
Shouldn't dynamic array be reference type?

uint[][uint] aa;
uint[] temp;
	
aa[5] = new uint[0];
temp = aa[5]; // copy uint[] reference
temp ~= 1;
	
assert(temp.length == 1 && temp[0] == 1); // pass
assert(aa[5].length == 1 && aa[5][0] == 1); // fail

Is this a bug?
March 19, 2011
Piotr Szturmaj:

> Shouldn't dynamic array be reference type?

They are partially a reference type and partially a value :-)
A D dynamic array is represented with a small struct 2-words long that contains the pointer to the first item of the array and the length of the array (you are able to read them with the .ptr and .length attributes). So when you copy a dynamic array using temp=aa[5] you are copying this little struct. When you perform the temp~=1 you are changing the length just of the copied struct, not of the original aa[5] one (in D1 if temp~=1 induces a realloc then the pointer too gets modified only in the temp struct. In D2 the situation is more complex, unfortunately).


> Is this a bug?

It's clearly a bug-prone characteristic of D2, but it's not a D2 bug, so it's a bug in your code :-( I think future D2 lint tools will have to test for this possible source of bugs too.

Keep in mind that D2 associative arrays have an even worse nature:

void test(int[int] arr, int x) {
    arr[x] = x;
}
void main() {
    int[int] d;
    test(d, 0);
    int[int] d0;
    assert(d == d0);
    d[1] = 1;
    test(d, 2);
    assert(d == [1: 1, 2: 2]);
}

Bye,
bearophile
March 19, 2011
Am 19.03.2011 17:29, schrieb Piotr Szturmaj:
> Shouldn't dynamic array be reference type?
>
> uint[][uint] aa;
> uint[] temp;
>
> aa[5] = new uint[0];
> temp = aa[5]; // copy uint[] reference
> temp ~= 1;
>
> assert(temp.length == 1 && temp[0] == 1); // pass
> assert(aa[5].length == 1 && aa[5][0] == 1); // fail
>
> Is this a bug?

They are not 'perfect' references. There are pointer+length to/of memory.
If you just pass around these refences they reference the same memory. If you change the reference in some way (eg appending, concating etc) the memory could be reallocated.

int[] a, b;
a = [3,4,5];
b = a;
assert(a == b);
assert(a is b); //test for refence equality
//a and b now reference exactly the same memory
//concatening is guaranteed to reallocate
a = a ~ [6, 7];
assert(a == [3, 4, 5, 6, 7]);
assert(b == [3, 4, 5]);
// BUT these 3,4,5 are in different places in memeory
b[0] = 42;
assert(a == [3, 4, 5, 6, 7]);
// when slicing you know reference a subpart of the other array
b = a[0.. 3];
assert(b == [3, 4, 5]);
// changes of b will now be visible in a
// until a or b get reallocated
b[0] = 42;
assert(a == [42, 4, 5, 6, 7]);
assert(b == [42, 4, 5]);

Just one note: appending (ar ~= 1) differs from concating (ar = ar ~ 1) that while the second is guanranteed to reallocate, about the first you can't be sure. It depends on the GC.
March 19, 2011
Thank you for your very complete answers :)

I was trying to avoid multiple AA key lookups while appending many elements to dynamic array. It's clear now, that with D2 semantics it's better to first build an array and then assign it to AA.
March 19, 2011
On 03/19/2011 10:05 AM, Mafi wrote:
> Am 19.03.2011 17:29, schrieb Piotr Szturmaj:
>> Shouldn't dynamic array be reference type?

> Just one note: appending (ar ~= 1) differs from concating (ar = ar ~ 1)
> that while the second is guanranteed to reallocate, about the first you
> can't be sure. It depends on the GC.

On the other hand, if the dynamic array is sharing a part of another array, extending the dynamic array always breaks the sharing and the dynamic array is always relocated.

void main()
{
    int[] a = [ 100, 200, 300 ];

    // Three slices (aka dynamic arrays) to the first two elements
    int[] s0 = a[0..2];
    int[] s1 = a[0..2];
    int[] s2 = a[0..2];

    // They share the same first two elements
    assert(s0.ptr == a.ptr);
    assert(s1.ptr == a.ptr);
    assert(s2.ptr == a.ptr);

    // All of these operations break the sharing
    ++s0.length;
    s1 ~= 42;
    s2 = s2 ~ 43;

    // Nothing happened to the original
    assert(a == [ 100, 200, 300 ]);

    // Because nobody is sharing anymore
    assert(s0.ptr != a.ptr);
    assert(s1.ptr != a.ptr);
    assert(s1.ptr != s0.ptr);
    assert(s2.ptr != a.ptr);
    assert(s2.ptr != s0.ptr);
    assert(s2.ptr != s1.ptr);
}

I forgot the name of the feature. Basically, no slice (aka dynamic array) can extend into other elements and starts sharing them.

Ali

March 19, 2011
Piotr Szturmaj Wrote:

> Thank you for your very complete answers :)
> 
> I was trying to avoid multiple AA key lookups while appending many elements to dynamic array. It's clear now, that with D2 semantics it's better to first build an array and then assign it to AA.

What everyone else said, but you can get a pointer with 'in'

void main() {
    uint[][uint] aa;

    aa[5] = new uint[0];
    auto temp = 5 in aa; // copy uint[] reference
    *temp ~= 1;

    assert(temp.length == 1 && (*temp)[0] == 1); // pass
    assert(aa[5].length == 1 && aa[5][0] == 1); // pass

}

March 19, 2011
On Saturday 19 March 2011 11:12:49 Ali Çehreli wrote:
> On 03/19/2011 10:05 AM, Mafi wrote:
>  > Am 19.03.2011 17:29, schrieb Piotr Szturmaj:
>  >> Shouldn't dynamic array be reference type?
>  >
>  > Just one note: appending (ar ~= 1) differs from concating (ar = ar ~ 1)
>  > that while the second is guanranteed to reallocate, about the first you
>  > can't be sure. It depends on the GC.
> 
> On the other hand, if the dynamic array is sharing a part of another array, extending the dynamic array always breaks the sharing and the dynamic array is always relocated.
> 
> void main()
> {
>      int[] a = [ 100, 200, 300 ];
> 
>      // Three slices (aka dynamic arrays) to the first two elements
>      int[] s0 = a[0..2];
>      int[] s1 = a[0..2];
>      int[] s2 = a[0..2];
> 
>      // They share the same first two elements
>      assert(s0.ptr == a.ptr);
>      assert(s1.ptr == a.ptr);
>      assert(s2.ptr == a.ptr);
> 
>      // All of these operations break the sharing
>      ++s0.length;
>      s1 ~= 42;
>      s2 = s2 ~ 43;
> 
>      // Nothing happened to the original
>      assert(a == [ 100, 200, 300 ]);
> 
>      // Because nobody is sharing anymore
>      assert(s0.ptr != a.ptr);
>      assert(s1.ptr != a.ptr);
>      assert(s1.ptr != s0.ptr);
>      assert(s2.ptr != a.ptr);
>      assert(s2.ptr != s0.ptr);
>      assert(s2.ptr != s1.ptr);
> }
> 
> I forgot the name of the feature. Basically, no slice (aka dynamic array) can extend into other elements and starts sharing them.

Yes. Dynamic arrays attempt to resize themselves efficiently, so they avoid reallocations where they can, but if resizing would cause two arrays to share elements which they did not share before that resizing operation, then a reallocation of the array being sized takes place, so they no longer share any elements even if they did before.

Overall, the way that arrays work works really well, and it's really not all that hard to wrap your head around, but it can certainly be confusing at first.

- Jonathan M Davis
March 20, 2011
Jesse Phillips wrote:
> Piotr Szturmaj Wrote:
>
>> Thank you for your very complete answers :)
>>
>> I was trying to avoid multiple AA key lookups while appending many
>> elements to dynamic array. It's clear now, that with D2 semantics it's
>> better to first build an array and then assign it to AA.
>
> What everyone else said, but you can get a pointer with 'in'
>
> void main() {
>      uint[][uint] aa;
>
>      aa[5] = new uint[0];
>      auto temp = 5 in aa; // copy uint[] reference
>      *temp ~= 1;
>
>      assert(temp.length == 1 && (*temp)[0] == 1); // pass
>      assert(aa[5].length == 1 && aa[5][0] == 1); // pass
>
> }
>

Yes, I already used pointers but in other way:

uint[]* temp = &aa[5]; // copy uint[] reference

and it worked the same as using 'in'. However, I wasn't sure it's completely safe.
March 20, 2011
Piotr Szturmaj Wrote:

> Yes, I already used pointers but in other way:
> 
> uint[]* temp = &aa[5]; // copy uint[] reference
> 
> and it worked the same as using 'in'. However, I wasn't sure it's completely safe.

Depends on what you mean by "safe." In your example if 5 is not a key then a Range violation will be thrown. In mine null will be returned.

Generally safety refers to not corrupting memory, and under that definition both of these are safe. Under the definition of "will not crash" no solution is safe.