Thread overview
Arrays as references, or by value, or copy on write?! (BUG)
Dec 02, 2003  davepermen
Dec 02, 2003  J Anderson
Dec 02, 2003  davepermen
Dec 03, 2003  Ilya Minkov
Dec 03, 2003  Ben Hinkle
Dec 03, 2003  davepermen
Dec 03, 2003  Brad Beveridge
December 02, 2003
import core.stdc.stdio : printf;

class A {
    this() { myArray.length = 10; myArray[0] = 100; }

    // hands out a copy of the (pointer, length) pair; the elements are not copied
    ubyte[] get() { return myArray; }

    void print() {
        printf("class A:\n");
        printf("myArray.length = %zu\n", myArray.length);
        printf("myArray[0] = %i\n", myArray[0]);
    }

    private ubyte[] myArray;
}

class B {
    // stores the caller's (pointer, length) pair, so B.myArray aliases the caller's data
    void set(ubyte[] x) { myArray = x; }

    void print() {
        printf("class B:\n");
        printf("myArray.length = %zu\n", myArray.length);
        printf("myArray[0] = %i\n", myArray[0]);
    }

    // remove the /+ +/ markers to resize before writing
    void reset() { /+myArray.length = 20;+/ myArray[0] = 200; }

    private ubyte[] myArray;
}

void test() {
    A a = new A;
    B b = new B;
    b.set(a.get());
    a.print();
    b.print();
    b.reset();
    a.print();
    b.print();
}

if i don't resize B.myArray and write to it, it manipulates A.myArray, too => the array gets returned by A.get() as a reference, passed into B.set() as a reference, and B.myArray gets set as a reference to A.myArray. i can write to it, and manipulate both.

if i change the length of B.myArray (by removing the /+ +/ comment markers), B.myArray gets resized, and the reference to A.myArray is lost. from this moment on, B.myArray is a copy of A.myArray. they are not in sync anymore.
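
A minimal sketch of the two cases described above, written for current D (an assumption; the thread predates it). The helper name sameMemory is made up for illustration. Whether a grown array is moved is decided by the runtime, so the sketch checks that at run time instead of assuming it.

```d
import std.stdio : writeln;

// Hypothetical helper: true when both slices start at the same address.
bool sameMemory(const(ubyte)[] x, const(ubyte)[] y)
{
    return x.ptr is y.ptr;
}

void main()
{
    ubyte[] a = new ubyte[10];
    a[0] = 100;

    ubyte[] b = a;          // copies pointer + length, not the elements
    b[0] = 200;
    assert(a[0] == 200);    // the write is visible through a
    assert(sameMemory(a, b));

    b.length = 20;          // only b's own length field changes
    assert(a.length == 10 && b.length == 20);
    // The runtime may or may not have moved b, so check rather than assume.
    writeln("still shared after resize: ", sameMemory(a, b));

    ubyte[] c = a.dup;      // explicit copy: never shared
    c[0] = 1;
    assert(a[0] == 200);
    assert(!sameMemory(a, c));
}
```

The only guaranteed way to get independent data here is .dup; the resize merely makes it likely.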


this behaviour is highly confusing. looks like i have to do it like this in the end to be safe that it is a reference:

class ArrayRef { ubyte[] data; }

and have an ArrayRef in A and B, and pass that around..
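
A rough sketch of that workaround, assuming current D. Only ArrayRef comes from the post; the field name arr and the reworked A and B are made up for illustration. Because both objects go through the same ArrayRef instance, a resize done by one is seen by the other.

```d
import std.stdio : writeln;

class ArrayRef { ubyte[] data; }

class A
{
    ArrayRef arr;   // hypothetical field name
    this() { arr = new ArrayRef; arr.data.length = 10; arr.data[0] = 100; }
    ArrayRef get() { return arr; }
}

class B
{
    ArrayRef arr;
    void set(ArrayRef x) { arr = x; }
    // Resizes and writes through the shared wrapper object.
    void reset() { arr.data.length = 20; arr.data[0] = 200; }
}

void main()
{
    auto a = new A;
    auto b = new B;
    b.set(a.get());
    b.reset();

    // Same wrapper object, so A sees both the new length and the new value.
    assert(a.arr is b.arr);
    writeln(a.arr.data.length, " ", a.arr.data[0]); // 20 200
}
```

The cost is one extra indirection on every access.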


as far as i know, it should always be copy on write. but it isn't. it's copy on resize. this is highly unintuitive, a.k.a. buggy.

and actually.. how to return by reference?


December 02, 2003
davepermen wrote:

>this behaviour is highly confusing. looks like i have to do it like this in the
>end to be safe that it is a reference:
>
>class ArrayRef { ubyte[] data; }
>  
>
Or you could use the ancient art of pointers.


December 02, 2003
>Or you could use the ancient art of pointers.

never. i want a shared dynamic array, with full array functionality for everyone who shares it, without any pointer mess.

why can't arrays be by reference by default? they have .dup for copying..

either make them behave as copies, or not. but this is a buggy copy-on-write that doesn't work out right.


December 03, 2003
I was surprised to learn that D arrays are pretty different from C arrays, but once I learned they were just structs with a length and a pointer to the data, it all made sense, and I use that mental model to figure stuff out. The length of the array isn't *really* part of the array in the C sense. In Java the length is read-only, so it doesn't run into problems with resizing the array, since it never needs to re-allocate the data pointer.

-Ben
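
A small sketch of that mental model, assuming current D. The struct name ByteSlice is made up and is not how the runtime declares anything; it only mirrors the two fields a ubyte[] variable carries.

```d
import std.stdio : writeln;

// Hypothetical stand-in for what a ubyte[] variable holds.
struct ByteSlice
{
    size_t length;
    ubyte* ptr;
}

void main()
{
    // A dynamic array variable is exactly as large as a length plus a pointer.
    static assert((ubyte[]).sizeof == ByteSlice.sizeof);

    ubyte[] a = new ubyte[10];
    ubyte[] b = a;  // copies both fields; the elements stay where they are

    auto viewA = ByteSlice(a.length, a.ptr);
    auto viewB = ByteSlice(b.length, b.ptr);
    assert(viewA == viewB);

    // Each variable owns its own length field, so shrinking b leaves a alone.
    b = b[0 .. 5];
    assert(a.ptr is b.ptr);
    writeln(a.length, " ", b.length); // 10 5
}
```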



December 03, 2003
this behaviour is simply not simple. it's bad. either copy all around, or reference all around. but this way is very unintuitive imho. teach THAT to a newbie and then tell them D is a great language because it works simply and logically.

In article <bqjblv$hmf$1@digitaldaemon.com>, Ben Hinkle says...
>
>I was surprised to learn that D arrays are pretty different than C arrays but once I learned they were just structs with a length and a pointer to the data it all made sense and I use that mental model to figure stuff out. The length of the array isn't *really* part of the array in the C sense. In Java the length is read-only so it doesn't run into problems about resizing the array since it never needs to re-allocate the data pointer.
>
>-Ben


December 03, 2003
I agree Dave.
I think that D should always handle arrays by reference, except where explicitly dup'ed.

That's easy to understand and consistent.  What about slicing though? Even the current method confuses me a little.
So a = b[2..10] is a reference, but what happens if you resize a? You can resize in place, but that may mess with elements within b.  I would submit that if you are slicing out part of an array, the new slice size is now fixed.  If you need to resize a, you will need to dup it first.
So a.length = x throws an exception (or a compile time check may pick it up?)  Actually, it should be easy enough to make the slice a static array.

My major gripe is with being able to slice out part of an array and then resize that new slice; that sounds like it would cause seriously subtle bugs.
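
A sketch of the slicing scenario, assuming current D, where a = b[2 .. 10] (not a[] = ...) is the form that makes a reference. The helper pointsInto is made up for illustration. What happens when the slice is grown is left to the runtime, so the sketch prints the result instead of asserting it; the dup'ed variant is the one that is certain to leave b alone.

```d
import std.stdio : writeln;

// Hypothetical helper: does slice `s` still start inside the memory of `whole`?
bool pointsInto(const(int)[] s, const(int)[] whole)
{
    return s.ptr >= whole.ptr && s.ptr < whole.ptr + whole.length;
}

void main()
{
    int[] b = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11];
    int[] a = b[2 .. 10];   // refers to b's elements 2..9, no copy

    a[0] = 42;
    assert(b[2] == 42);     // writes through the slice reach b
    assert(pointsInto(a, b));

    // Dup first, then resize: the copy cannot touch b.
    int[] c = b[2 .. 10].dup;
    c.length = 10;
    c[8] = -1;
    assert(b[10] == 10);
    assert(!pointsInto(c, b));

    // Resize the slice itself and look at what the runtime did.
    a.length = 10;
    a[8] = -1;              // would alias b[10] only if a grew in place
    writeln("b[10] = ", b[10], ", a still inside b: ", pointsInto(a, b));
}
```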

Cheers
Brad



December 03, 2003
davepermen wrote:
>>Or you could use the ancient art of pointers.
> 
> never. i want a shared dynamic array. with full array functionality in both
> shares. without any pointer mess

How about inout specifier?

> why can't array be by reference by default. they have the .dup to copy..

Let's get down to it. An array is a struct of length and pointer. If you change the contents, the change is safely propagated back. However, if you reseat the array (for example by resizing it), the change doesn't propagate back, since all you have is the pointer and length by *value*.

One possible solution would be to make an array simply a pointer to a block holding length + data. But then, if you slice into an array, a copy of the contents must be made so that the length can be stored just before it. Slow.

Another solution would be to make arrays behave *really* by value, that is copy always. BTW, this should also apply to objects then for symmetry reasons, and Java programmers won't like that...

Yet another solution would be double indirection: a pointer to the current array struct. I agree that's ugly and slow, but if you need it, it's there in the form of the inout specifier.
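
A minimal sketch of that difference, assuming current D, where the by-reference parameter class that this post calls inout is spelled ref. The function names growByValue and growByRef are made up for illustration.

```d
import std.stdio : writeln;

// By value: the callee gets its own copy of pointer + length.
void growByValue(ubyte[] arr)
{
    arr.length = 20;    // changes only the local copy
}

// By reference: the callee works directly on the caller's variable.
void growByRef(ref ubyte[] arr)
{
    arr.length = 20;
}

void main()
{
    ubyte[] a = new ubyte[10];

    growByValue(a);
    assert(a.length == 10); // the resize did not propagate back

    growByRef(a);
    assert(a.length == 20); // the resize is visible to the caller
    writeln(a.length);      // 20
}
```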

Big idea: I think every "in" parameter should give a compiler error if someone tries to make a change that doesn't propagate to the caller. That is, modifying array elements is not an error, but reseating an array is. Modifying a by-value integer, float or struct parameter would also be an error. That way people would be made to think about what they really need, and copy into locals as desired.

> either make them copy behaving, or not. but this is a buggy copy-on-write. that
> doesn't work out right

It's not copy on write by semantics, but by convention.

BTW, do you recall you cannot resize an array from a function in C or C++ either?

-eye