Array length & allocation question (page 3)

Bruno Medeiros wrote:
> Oskar Linde wrote:
>>
>> Like this:
>>
>> void foo(char[] arr) {
>>     if (!arr)
>>         writefln("Uninitialized array passed");
>>     else if (arr.length == 0)
>>         writefln("Zero length array received");
>> }
>>
>> /Oskar
>
> This is not safe to do. Currently in D null arrays and zero-length arrays are conceptually the same. It just so happens that sometimes the arr.ptr is null and sometimes not, depending on the previous operations.
> The "A 'dup'ed empty string is now a null string." is an example of why that is not safe. I thought you knew this already? This is nothing new.

Yeah, I knew about that. I did not mean to imply that D is flawless in this regard. The cases given were:

foo(""); and char[] s; foo(s);

And for those, the above function works. My only point, if I had one, was that there are differences between zero length arrays and null arrays in some cases in D.

> BTW, I do find it (at first sight at least) unnatural that a null array is the same as a zero-length array. It doesn't seem conceptually right/consistent.

In my view, D's dynamic arrays are quite different from a conceptually ideal array.

Conceptually, I see an array as an ordered collection of elements. The elements belong to (or are part of) the array.

One could imagine such arrays as both value and reference types. For a reference type ideal array, there has to be a clear difference between null and zero length. A value type ideal array on the other hand would not need one such distinction.

Another conceptual entity apart from an array is an array view. An array view refers to a selection of indices of another array. For example, a range of indices (aka a slice). An array view may or may not remain valid when the referred array changes.

D's dynamic array is quite far from my ideal array. Both its reference and its value version. A closer match is actually a by-value array slice.

Does it make sense for a by-value array slice type to discriminate between null and zero-length? I would say that it has its uses. For example, a regexp could match a zero length portion of a string. It is still important to know where in the string the match was made.

D's arrays have both the role of a non-reference array and of an array slice. In the role of a non-reference array, it makes sense that null is equivalent to zero-length. In the role of an array slice on the other hand, it does make sense to discriminate between zero length and null. There are other differences. Appending elements only makes sense for the array role, not the slice role. dup creates an array from a slice or an array. It therefore makes sense that dup returns null on zero length arrays.

The semantics of some operations depend on the role the array has. D has no way of knowing, so it guesses. Take that with a grain of salt, but operations on arrays depend on a runtime judgment by the gc.

Take the append operation. Appending elements to a D array that is in the array role makes sense and works like a charm. Appending elements to an array slice doesn't make any sense, but D will create a new array with copies of the elements the slice refers to and append the element to that array. The slice has been transformed into an array.

But how does D know when an array is in the slice role or the array role? It doesn't. Here is where the (educated) guess comes in. Any array that starts at the beginning of a gc chunk is assumed to be an array. Otherwise, it is assumed to be a slice. The implications are:

char[] mystr = "abcd".dup;
char[] slice1 = mystr[0..1];
char[] slice2 = mystr[1..2];
slice1 ~= "x"; // alters the original mystr
slice2 ~= "y"; // doesn't alter the original

I've written too much nonsense now. Some condensed conclusions:

- D's arrays have a schizophrenic nature (slice vs array)
- The compiler is unable to tell the difference and can't protect you against mistakes
- D arrays are not self documenting:

char[] foo(); // <- returns an array or a slice of someone else's array?

/Oskar
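
For illustration, the two cases mentioned in this post (foo("") and an uninitialized char[] s) can be told apart with `is null`, and a zero-length slice taken from the middle of a string still records where it points, which is the regexp use case. This is only a sketch, assuming a current D2 compiler rather than the 2006 one discussed in this thread; the helper name classify is made up for the example. As noted in the thread, code should not rely on the null/empty distinction surviving operations such as dup.

```d
import std.stdio;

// Hypothetical helper: reports which of the three cases an array falls into.
string classify(const(char)[] arr)
{
    if (arr is null)            // null pointer and zero length
        return "null";
    if (arr.length == 0)        // non-null pointer, but no elements
        return "empty";
    return "non-empty";
}

void main()
{
    char[] s;                   // default-initialized, so a null array
    string text = "hello";
    auto match = text[2 .. 2];  // zero-length slice inside text

    writeln(classify(s));           // null
    writeln(classify(""));          // empty (the literal has a non-null pointer)
    writeln(classify(match));       // empty
    writeln(match.ptr - text.ptr);  // 2: the slice still knows its position
    writeln(s == "");               // true: == compares contents only
}
```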

June 14, 2006

Re: Array length & allocation question

Posted by Bruno Medeiros
in reply to Oskar Linde


Oskar Linde wrote:
> Bruno Medeiros wrote:
>> Oskar Linde wrote:
>>>
>>> Like this:
>>>
>>> void foo(char[] arr) {
>>>     if (!arr)
>>>         writefln("Uninitialized array passed");
>>>     else if (arr.length == 0)
>>>         writefln("Zero length array received");
>>> }
>>>
>>> /Oskar
>>
>> This is not safe to do. Currently in D null arrays and zero-length arrays are conceptually the same. It just so happens that sometimes the arr.ptr is null and sometimes not, depending on the previous operations.
>> The "A 'dup'ed empty string is now a null string." is an example of why that is not safe. I thought you knew this already? This is nothing new.
> 
> Yeah, I knew about that. I did not mean to imply that D is flawless in this regard. The cases given were:
> 
> foo(""); and char[] s; foo(s);
> 
> And for those, the above function works. My only point, if I had one, was that there are differences between zero length arrays and null arrays in some cases in D.
> 
>> BTW, I do find it (at first sight at least) unnatural that a null array is the same as a zero-length array. It doesn't seem conceptually right/consistent.
> 
> In my view, D's dynamic arrays are quite different from a conceptually ideal array.
> 
> Conceptually, I see an array as an ordered collection of elements. The elements belong to (or are part of) the array.
> 
> One could imagine such arrays as both value and reference types. For a reference type ideal array, there has to be a clear difference between null and zero length. A value type ideal array on the other hand would not need one such distinction.
> 
> Another conceptual entity apart from an array is an array view. An array view refers to a selection of indices of another array. For example, a range of indices (aka a slice). An array view may or may not remain valid when the referred array changes.
> 
> D's dynamic array is quite far from my ideal array. Both its reference and its value version. A closer match is actually a by-value array slice.
> 
> Does it make sense for a by-value array slice type to discriminate between null and zero-length? I would say that it has its uses. For example, a regexp could match a zero length portion of a string. It is still important to know where in the string the match was made.
> 
> D's arrays have both the role of a non-reference array and of an array slice. In the role of a non-reference array, it makes sense that null is equivalent to zero-length. In the role of an array slice on the other hand, it does make sense to discriminate between zero length and null. There are other differences. Appending elements only makes sense for the array role, not the slice role. dup creates an array from a slice or an array. It therefore makes sense that dup returns null on zero length arrays.
> 
> The semantics of some operations depend on the role the array has. D has no way of knowing, so it guesses. Take that with a grain of salt, but operations on arrays depend on a runtime judgment by the gc.
> 
> Take the append operation. Appending elements to a D array that is in the array role makes sense and works like a charm. Appending elements to an array slice doesn't make any sense, but D will create a new array with copies of the elements the slice refers to and append the element to that array. The slice has been transformed into an array.
> 
> But how does D know when an array is in the slice role or the array role? It doesn't. Here is where the (educated) guess comes in. Any array that starts at the beginning of a gc chunk is assumed to be an array. Otherwise, it is assumed to be a slice. The implications are:
> 
> char[] mystr = "abcd".dup;
> char[] slice1 = mystr[0..1];
> char[] slice2 = mystr[1..2];
> slice1 ~= "x"; // alters the original mystr
> slice2 ~= "y"; // doesn't alter the original
> 

Well, the new things you mentioned are actually more related to ownership management and reference/object immutability than to arrays themselves.
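
The append example quoted above is easy to try out. The sketch below assumes a current D2 compiler and runtime, where, as far as I know, an append extends a block in place only when the slice ends exactly where the block's used data ends, so neither append is expected to touch mystr. The 2006 runtime discussed in this thread behaved as Oskar describes. The function name appendDemo is hypothetical.

```d
import std.stdio;

// Hypothetical demo: appends to two slices of mystr, then returns mystr
// so the caller can see whether the original was altered.
char[] appendDemo()
{
    char[] mystr  = "abcd".dup;
    char[] slice1 = mystr[0 .. 1];
    char[] slice2 = mystr[1 .. 2];

    slice1 ~= "x";
    slice2 ~= "y";

    // If slice1 still starts at mystr.ptr, the append was done in place.
    writeln("slice1 = ", slice1, ", in place: ", slice1.ptr is mystr.ptr);
    writeln("slice2 = ", slice2);
    return mystr;
}

void main()
{
    writeln("mystr  = ", appendDemo());
}
```

Comparing the pointers is what shows whether a given slice is still a view of the original or has quietly become an array of its own, which is the ownership question raised here.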


> I've written too much nonsense now. Some condensed conclusions:
> 
> - D's arrays have a schizophrenic nature (slice vs array)
> - The compiler is unable to tell the difference and can't protect you against mistakes
> - D arrays are not self documenting:
> 
> char[] foo(); // <- returns an array or a slice of someone else's array?
> 
> /Oskar

We have often mentioned the problems with arrays (both static and dynamic) before. They should eventually be brought up for discussion with the "general" D public (although, for me, preferably not soon, as I have other things to take care of).


-- 
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
