June 09, 2007
Bruno Medeiros wrote:
>   int[new] a = new int[7];
>   int[new] b = a;
>   b.length = 6;
> 
>   b[1] = 2;   // writes through to a[1]
>   b.length = 4;
>   b[1] = 3;   // writes through to a[1]
>   b.length = 10;
>   b[1] = 4;   // does not write through to a[1]
> 
> Why does this problem exist? Because setting an array length may or may not create a new array, and there is no way to know which will happen. Well, in that code segment we can see when it will resize, but we can't know in the general case (such as when we receive an array as a function parameter). This makes setting .length a very unsafe operation, if not plain *broken*.
> 

There could perhaps be an internal flag in every array, initially set to false.
It would be set by the b = a assignment, thus recording that b is an alias
of a. Then, when b.length is modified, a new array would be created and the flag unset.

I'm not quite sure how robust this would be, but having .length changeable without creating a new array is just too handy to give up. Otherwise, we'd have to use realloc() to achieve the same effect.

The loss here is 1 bit of .length or 1 byte in the array struct for the flag, and the overhead of having to check it whenever .length is modified.

Of course, the simplest solution is simply to say that modification of .length for aliased arrays is undefined behaviour.
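The flag idea can be sketched in C, modeling a D array as a struct. All names here are invented for illustration; this is not how D's runtime actually works, just a minimal model of the proposal:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of the alias-flag proposal. */
typedef struct {
    int    *ptr;
    size_t  length;
    int     maybe_aliased;   /* the proposed 1-bit flag */
} Array;

Array array_new(size_t n) {
    Array a = { calloc(n, sizeof(int)), n, 0 };
    return a;
}

/* b = a: the copy is marked as a potential alias. */
Array array_assign(Array a) {
    a.maybe_aliased = 1;
    return a;
}

/* Setting .length on a possibly-aliased array copies first, so the
 * resize can never stomp on the original array's contents. */
void array_set_length(Array *a, size_t n) {
    if (a->maybe_aliased) {
        int *copy = calloc(n, sizeof(int));
        size_t keep = n < a->length ? n : a->length;
        memcpy(copy, a->ptr, keep * sizeof(int));
        a->ptr = copy;
        a->maybe_aliased = 0;
    } else {
        /* unaliased: resize in place; new slots are uninitialized */
        a->ptr = realloc(a->ptr, n * sizeof(int));
    }
    a->length = n;
}
```

With this model, Bruno's example is defused: after b = a, the first b.length change copies, so b's writes no longer reach a.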

-- 
Remove ".doesnotlike.spam" from the mail address.
June 09, 2007
Bruno Medeiros skrev:
> Walter Bright wrote:
>> Sean Kelly wrote:
>>> Walter Bright wrote:
>>>> (The problem is if I pass a slice to a function, and then that function reallocates the slice by changing the length or appending to it, it can affect other slices into the same data in very hard-to-explain ways.
>>>
>>> I'm not sure I understand how a reallocation could affect other slices into the same data.  If a reallocation occurs, that reallocation will be into entirely new memory, provided the slice doesn't start at the head of a memory block.  Is it the condition I just mentioned that's a problem?  If so, I suppose it is one way to trick the compiler into thinking the reference no longer refers to const data, when in fact it does.  If not, could you point me in the right direction?
>>
>> Given:
>>
>>     int[] a = new int[7];
>>     int[] b = a[1..6];
>>     b[1] = 2;   // writes through to a[2]
>>     b.length = 4;
>>     b[1] = 3;   // writes through to a[2]
>>     b.length = 6;
>>     b[1] = 4;   // does not write through to a[2]
>>
> 
> So, is that the problem we are trying to solve? Why not just simply make it so that setting .length *always* creates a new array?

Not really an option. That would make ~= O(n^2) instead of O(n).
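Concretely (and taking Oskar's correction below into account): if every append reallocates, n appends copy roughly n²/2 elements in total, whereas growing capacity geometrically keeps the total number of copies below 2n. A C cost model of the two policies — purely a counting exercise, no real arrays involved:

```c
#include <stddef.h>

/* "Always reallocate": every append copies all existing elements. */
size_t copies_always_dup(size_t n) {
    size_t total = 0;
    for (size_t len = 0; len < n; len++)
        total += len;            /* copy old contents, then append */
    return total;                /* = n*(n-1)/2, i.e. O(n^2) */
}

/* "Grow in place with doubling capacity": copying happens only when
 * capacity is exhausted, so total copies stay below 2n, i.e. O(n). */
size_t copies_doubling(size_t n) {
    size_t total = 0, cap = 1;
    for (size_t len = 0; len < n; len++) {
        if (len == cap) {        /* out of room: reallocate and copy */
            total += len;
            cap *= 2;
        }
    }
    return total;
}
```

For n = 1000 the first policy copies 499500 elements, the second only 1023 — the gap Oskar is pointing at.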


> Hum, now that I think about it, this may not only be a better solution, it may be the *only* real solution. Even with the new resizable/unresizable array proposal you can have the same problem as above:
> 
>   int[new] a = new int[7];
>   int[new] b = a;
>   b.length = 6;
> 
>   b[1] = 2;   // writes through to a[1]
>   b.length = 4;
>   b[1] = 3;   // writes through to a[1]
>   b.length = 10;
>   b[1] = 4;   // does not write through to a[1]
> 
> Why does this problem exist? Because setting an array length may or may not create a new array, and there is no way to know which will happen. Well, in that code segment we can see when it will resize, but we can't know in the general case (such as when we receive an array as a function parameter). This makes setting .length a very unsafe operation, if not plain *broken*.

That's why my suggestion was making T[new] a reference type instead of a value type, just like T[U] has become.

The associative array used to have a similar issue. Previously (before D 0.151):

int[int] a = ...;
int[int] b = a;
b[0] = 0;

could potentially corrupt a...


And an analogue for today's dynamic array:

int[] a = ...;
int[] b = a;
b.length = 1;
b ~= ...;

will corrupt a.
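Modeled in C (invented names; the real D1 runtime tracks block capacity through the GC, which this sketch ignores), the shrink-then-append stomp looks like this:

```c
#include <stdlib.h>
#include <string.h>

/* Minimal model of a D1-style dynamic array. */
typedef struct {
    int    *ptr;
    size_t  length;
    size_t  capacity;   /* capacity of the underlying block */
} Arr;

Arr arr_new(size_t n) {
    Arr a = { calloc(n, sizeof(int)), n, n };
    return a;
}

/* b ~= x: append in place when the block still has room -- exactly
 * the behavior that lets a shrunken alias stomp on the original. */
void arr_append(Arr *a, int x) {
    if (a->length == a->capacity) {
        a->capacity = a->capacity ? a->capacity * 2 : 1;
        int *fresh = calloc(a->capacity, sizeof(int));
        memcpy(fresh, a->ptr, a->length * sizeof(int));
        a->ptr = fresh;
    }
    a->ptr[a->length++] = x;
}
```

After b.length = 1, b still points at a's block with spare room, so b ~= x writes straight over a[1].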


The solution would be identical: make the resizable array a reference type. Today's value-type behavior for dynamic arrays is perfect for slices, just not for resizable arrays.

Another option, if one wishes to resolve the above issues and have T[new] remain a value type, would be to insert an implicit .dup on assignment for T[new]. That way, there would never be two resizable arrays sharing the same ptr.
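The reference-type option can also be sketched in C: both variables share one heap-allocated header, so a resize through either is seen by both and no stale view can linger. Again, all names are invented and this is only a model of the suggestion:

```c
#include <stdlib.h>
#include <string.h>

/* Model of T[new] as a reference type: a pointer to a shared header. */
typedef struct {
    int    *ptr;
    size_t  length;
} ArrayHeader;

typedef ArrayHeader *ResizableArray;

ResizableArray ra_new(size_t n) {
    ResizableArray a = malloc(sizeof *a);
    a->ptr = calloc(n, sizeof(int));
    a->length = n;
    return a;
}

/* Resizing updates the shared header, so every reference to the
 * array observes the new pointer and length at once. */
void ra_set_length(ResizableArray a, size_t n) {
    a->ptr = realloc(a->ptr, n * sizeof(int));
    if (n > a->length)
        memset(a->ptr + a->length, 0, (n - a->length) * sizeof(int));
    a->length = n;
}
```

With this layout, Bruno's "does not write through" surprise disappears: after b = a, b is a second reference to the same header, so growing b and writing b->ptr[1] is visible through a as well.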

/Oskar
June 09, 2007
Oskar Linde skrev:

> Not really an option. That would make ~= O(n^2) instead of O(n).

Ehm, I mean O(n) instead of O(1) of course.

/Oskar
June 10, 2007
Oskar Linde wrote:
> Bruno Medeiros skrev:
>> Walter Bright wrote:
>>> Sean Kelly wrote:
>>>> Walter Bright wrote:
>>>>> (The problem is if I pass a slice to a function, and then that function reallocates the slice by changing the length or appending to it, it can affect other slices into the same data in very hard-to-explain ways.
>>>>
>>>> I'm not sure I understand how a reallocation could affect other slices into the same data.  If a reallocation occurs, that reallocation will be into entirely new memory, provided the slice doesn't start at the head of a memory block.  Is it the condition I just mentioned that's a problem?  If so, I suppose it is one way to trick the compiler into thinking the reference no longer refers to const data, when in fact it does.  If not, could you point me in the right direction?
>>>
>>> Given:
>>>
>>>     int[] a = new int[7];
>>>     int[] b = a[1..6];
>>>     b[1] = 2;   // writes through to a[2]
>>>     b.length = 4;
>>>     b[1] = 3;   // writes through to a[2]
>>>     b.length = 6;
>>>     b[1] = 4;   // does not write through to a[2]
>>>
>>
>> So, is that the problem we are trying to solve? Why not just simply make it so that setting .length *always* creates a new array?
> 
> Not really an option. That would make ~= O(n^2) instead of O(n).
> 
> 

Hum, didn't recall that. However, I still wouldn't say it's not an option. You're asking us to choose between performance (significant only in some cases, such as reallocating in loops) and correctness/safety. Safety, because setting the length is not safe in the general case: it is only safe if you know how and where the array was allocated, and also that no other references to it exist (in the case where you are increasing the length). It seems to me that it should be left to the compiler to decide, through code analysis, whether the in-place realloc optimization can be performed safely (i.e., without changing program semantics).
However, since this would take quite some work to implement in the compiler, and since some people may want to know for sure whether their optimization is applied, we could have a syntax (an alternative to setting the length) that performs an in-place realloc. It could be another property (ar.inlength = 42), a method (ar.realloc(42)), or whatever other syntax.
This would mean that the concatenation operators would always dup the array, so when you wanted to resize in place you couldn't use them. That would be, I think, the only difference versus the two-array-types solution. But given the rarity of in-place resizing, does not being able to use the concatenation operators there justify adding a new array type to the language?
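Bruno's split can be sketched in C as two distinct operations (all names invented; ar.realloc is his hypothetical syntax, not an existing D property): setting .length always allocates a fresh array and is therefore alias-proof, while a separate, explicit call does the in-place resize for the rare cases that need it.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    int    *ptr;
    size_t  length;
} Slice;

/* a.length = n under the proposal: always a fresh allocation,
 * so no existing reference can be affected by the resize. */
Slice slice_set_length(Slice a, size_t n) {
    Slice r = { calloc(n, sizeof(int)), n };
    size_t keep = n < a.length ? n : a.length;
    memcpy(r.ptr, a.ptr, keep * sizeof(int));
    return r;
}

/* ar.realloc(n): the explicit opt-in that may resize in place,
 * with the aliasing hazards that entails. */
void slice_realloc(Slice *a, size_t n) {
    a->ptr = realloc(a->ptr, n * sizeof(int));
    a->length = n;
}
```

The safe path costs a copy every time, which is exactly the performance trade Bruno concedes; the unsafe path is still available, but only where the programmer asks for it by name.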


>> Hum, now that I think about it, this may not only be a better solution, it may be the *only* real solution. Even with the new resizable/unresizable array proposal you can have the same problem as above:
>>
>>   int[new] a = new int[7];
>>   int[new] b = a;
>>   b.length = 6;
>>
>>   b[1] = 2;   // writes through to a[1]
>>   b.length = 4;
>>   b[1] = 3;   // writes through to a[1]
>>   b.length = 10;
>>   b[1] = 4;   // does not write through to a[1]
>>
>> Why does this problem exist? Because setting an array length may or may not create a new array, and there is no way to know which will happen. Well, in that code segment we can see when it will resize, but we can't know in the general case (such as when we receive an array as a function parameter). This makes setting .length a very unsafe operation, if not plain *broken*.
> 
> That's why my suggestion was making T[new] a reference type instead of a value type, just like T[U] has become.
> 
> The associative array used to have a similar issue. Previously (before D 0.151):
> 
> int[int] a = ...;
> int[int] b = a;
> b[0] = 0;
> 
> could potentially corrupt a...
> 
> 
> And an analogue for today's dynamic array:
> 
> int[] a = ...;
> int[] b = a;
> b.length = 1;
> b ~= ...;
> 
> will corrupt a.
> 
> 
> The solution would be identical: make the resizable array a reference type. Today's value-type behavior for dynamic arrays is perfect for slices, just not for resizable arrays.
> 
> Another option, if one wishes to resolve the above issues and have T[new] remain a value type, would be to insert an implicit .dup on assignment for T[new]. That way, there would never be two resizable arrays sharing the same ptr.
> 
> /Oskar

Yeah, I guess those two would also be solutions, although I can't tell right now whether those approaches would have other implications.

But in any case, even with two array kinds, and however the resizable kind works, why do the non-resizable arrays need to be non-resizable at all? They could be resizable, but with the resize operation always allocating a new array (and consequently also allowing concatenation). You gain language functionality and lose *nothing*.

-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D