Problem with associative arrays

Mar 19, 2011

Piotr Szturmaj

Mar 19, 2011

bearophile

Mar 19, 2011

Mar 19, 2011

Mar 19, 2011

Mar 20, 2011

Mar 20, 2011

Mar 19, 2011

Mar 19, 2011

Shouldn't dynamic array be reference type? uint[][uint] aa; uint[] temp; aa[5] = new uint[0]; temp = aa[5]; // copy uint[] reference temp ~= 1; assert(temp.length == 1 && temp[0] == 1); // pass assert(aa[5].length == 1 && aa[5][0] == 1); // fail Is this a bug?

Piotr Szturmaj: > Shouldn't dynamic array be reference type? They are partially a reference type and partially a value :-) A D dynamic array is represented with a small struct 2-words long that contains the pointer to the first item of the array and the length of the array (you are able to read them with the .ptr and .length attributes). So when you copy a dynamic array using temp=aa[5] you are copying this little struct. When you perform the temp~=1 you are changing the length just of the copied struct, not of the original aa[5] one (in D1 if temp~=1 induces a realloc then the pointer too gets modified only in the temp struct. In D2 the situation is more complex, unfortunately). > Is this a bug? It's clearly a bug-prone characteristic of D2, but it's not a D2 bug, so it's a bug in your code :-( I think future D2 lint tools will have to test for this possible source of bugs too. Keep in mind that D2 associative arrays have an even worse nature: void test(int[int] arr, int x) { arr[x] = x; } void main() { int[int] d; test(d, 0); int[int] d0; assert(d == d0); d[1] = 1; test(d, 2); assert(d == [1: 1, 2: 2]); } Bye, bearophile

Am 19.03.2011 17:29, schrieb Piotr Szturmaj: > Shouldn't dynamic array be reference type? > > uint[][uint] aa; > uint[] temp; > > aa[5] = new uint[0]; > temp = aa[5]; // copy uint[] reference > temp ~= 1; > > assert(temp.length == 1 && temp[0] == 1); // pass > assert(aa[5].length == 1 && aa[5][0] == 1); // fail > > Is this a bug? They are not 'perfect' references. There are pointer+length to/of memory. If you just pass around these refences they reference the same memory. If you change the reference in some way (eg appending, concating etc) the memory could be reallocated. int[] a, b; a = [3,4,5]; b = a; assert(a == b); assert(a is b); //test for refence equality //a and b now reference exactly the same memory //concatening is guaranteed to reallocate a = a ~ [6, 7]; assert(a == [3, 4, 5, 6, 7]); assert(b == [3, 4, 5]); // BUT these 3,4,5 are in different places in memeory b[0] = 42; assert(a == [3, 4, 5, 6, 7]); // when slicing you know reference a subpart of the other array b = a[0.. 3]; assert(b == [3, 4, 5]); // changes of b will now be visible in a // until a or b get reallocated b[0] = 42; assert(a == [42, 4, 5, 6, 7]); assert(b == [42, 4, 5]); Just one note: appending (ar ~= 1) differs from concating (ar = ar ~ 1) that while the second is guanranteed to reallocate, about the first you can't be sure. It depends on the GC.

Thank you for your very complete answers :) I was trying to avoid multiple AA key lookups while appending many elements to dynamic array. It's clear now, that with D2 semantics it's better to first build an array and then assign it to AA.

On 03/19/2011 10:05 AM, Mafi wrote: > Am 19.03.2011 17:29, schrieb Piotr Szturmaj: >> Shouldn't dynamic array be reference type? > Just one note: appending (ar ~= 1) differs from concating (ar = ar ~ 1) > that while the second is guanranteed to reallocate, about the first you > can't be sure. It depends on the GC. On the other hand, if the dynamic array is sharing a part of another array, extending the dynamic array always breaks the sharing and the dynamic array is always relocated. void main() { int[] a = [ 100, 200, 300 ]; // Three slices (aka dynamic arrays) to the first two elements int[] s0 = a[0..2]; int[] s1 = a[0..2]; int[] s2 = a[0..2]; // They share the same first two elements assert(s0.ptr == a.ptr); assert(s1.ptr == a.ptr); assert(s2.ptr == a.ptr); // All of these operations break the sharing ++s0.length; s1 ~= 42; s2 = s2 ~ 43; // Nothing happened to the original assert(a == [ 100, 200, 300 ]); // Because nobody is sharing anymore assert(s0.ptr != a.ptr); assert(s1.ptr != a.ptr); assert(s1.ptr != s0.ptr); assert(s2.ptr != a.ptr); assert(s2.ptr != s0.ptr); assert(s2.ptr != s1.ptr); } I forgot the name of the feature. Basically, no slice (aka dynamic array) can extend into other elements and starts sharing them. Ali

Piotr Szturmaj Wrote: > Thank you for your very complete answers :) > > I was trying to avoid multiple AA key lookups while appending many elements to dynamic array. It's clear now, that with D2 semantics it's better to first build an array and then assign it to AA. What everyone else said, but you can get a pointer with 'in' void main() { uint[][uint] aa; aa[5] = new uint[0]; auto temp = 5 in aa; // copy uint[] reference *temp ~= 1; assert(temp.length == 1 && (*temp)[0] == 1); // pass assert(aa[5].length == 1 && aa[5][0] == 1); // pass }

March 19, 2011

Re: Problem with associative arrays

Posted by Jonathan M Davis
in reply to Ali Çehreli

Permalink

Jonathan M Davis

Posted in reply to Ali Çehreli

Permalink

On Saturday 19 March 2011 11:12:49 Ali Çehreli wrote:
> On 03/19/2011 10:05 AM, Mafi wrote:
>  > Am 19.03.2011 17:29, schrieb Piotr Szturmaj:
>  >> Shouldn't dynamic array be reference type?
>  >
>  > Just one note: appending (ar ~= 1) differs from concating (ar = ar ~ 1)
>  > that while the second is guanranteed to reallocate, about the first you
>  > can't be sure. It depends on the GC.
> 
> On the other hand, if the dynamic array is sharing a part of another array, extending the dynamic array always breaks the sharing and the dynamic array is always relocated.
> 
> void main()
> {
>      int[] a = [ 100, 200, 300 ];
> 
>      // Three slices (aka dynamic arrays) to the first two elements
>      int[] s0 = a[0..2];
>      int[] s1 = a[0..2];
>      int[] s2 = a[0..2];
> 
>      // They share the same first two elements
>      assert(s0.ptr == a.ptr);
>      assert(s1.ptr == a.ptr);
>      assert(s2.ptr == a.ptr);
> 
>      // All of these operations break the sharing
>      ++s0.length;
>      s1 ~= 42;
>      s2 = s2 ~ 43;
> 
>      // Nothing happened to the original
>      assert(a == [ 100, 200, 300 ]);
> 
>      // Because nobody is sharing anymore
>      assert(s0.ptr != a.ptr);
>      assert(s1.ptr != a.ptr);
>      assert(s1.ptr != s0.ptr);
>      assert(s2.ptr != a.ptr);
>      assert(s2.ptr != s0.ptr);
>      assert(s2.ptr != s1.ptr);
> }
> 
> I forgot the name of the feature. Basically, no slice (aka dynamic array) can extend into other elements and starts sharing them.

Yes. Dynamic arrays attempt to resize themselves efficiently, so they avoid reallocations where they can, but if resizing would cause two arrays to share elements which they did not share before that resizing operation, then a reallocation of the array being sized takes place, so they no longer share any elements even if they did before.

Overall, the way that arrays work works really well, and it's really not all that hard to wrap your head around, but it can certainly be confusing at first.

- Jonathan M Davis

Jesse Phillips wrote: > Piotr Szturmaj Wrote: > >> Thank you for your very complete answers :) >> >> I was trying to avoid multiple AA key lookups while appending many >> elements to dynamic array. It's clear now, that with D2 semantics it's >> better to first build an array and then assign it to AA. > > What everyone else said, but you can get a pointer with 'in' > > void main() { > uint[][uint] aa; > > aa[5] = new uint[0]; > auto temp = 5 in aa; // copy uint[] reference > *temp ~= 1; > > assert(temp.length == 1 && (*temp)[0] == 1); // pass > assert(aa[5].length == 1 && aa[5][0] == 1); // pass > > } > Yes, I already used pointers but in other way: uint[]* temp = &aa[5]; // copy uint[] reference and it worked the same as using 'in'. However, I wasn't sure it's completely safe.

Piotr Szturmaj Wrote: > Yes, I already used pointers but in other way: > > uint[]* temp = &aa[5]; // copy uint[] reference > > and it worked the same as using 'in'. However, I wasn't sure it's completely safe. Depends on what you mean by "safe." In your example if 5 is not a key then a Range violation will be thrown. In mine null will be returned. Generally safety refers to not corrupting memory, and under that definition both of these are safe. Under the definition of "will not crash" no solution is safe.

Forums