October 18, 2013
On Friday, October 18, 2013 10:38:12 H. S. Teoh wrote:
> On Fri, Oct 18, 2013 at 01:32:58PM -0400, Jonathan M Davis wrote:
> > On Friday, October 18, 2013 09:55:46 Andrei Alexandrescu wrote:
> > > On 10/18/13 9:26 AM, Max Samukha wrote:
> > > > *That's* bad API design. readln should be symmetrical to writeln, not write. And about preserving the exact representation of new lines, readln/writeln shouldn't preserve that, pure and simple.
> > > 
> > > Fair point. I just gave one possible alternative out of many. Thing is, relying on client code to distinguish subtleties between empty and null strings is fraught with dangers.
> > 
> > Yeah, but the primary reason that it's bad design is the fact that D tries to conflate null and empty instead of keeping them distinct (which is essentially the complaint that was made). Whether that's ultimately good or bad is up for debate, but the side effect is that relying on the difference between null and empty ends up being very bug-prone, whereas in other languages which don't conflate the two, it isn't problematic in the same way, and it's much more reasonable to have the API treat them differently.
> 
> [...]
> 
> IMO, distinguishing between null and empty arrays is bad abstraction. I agree with D's "conflation" of null with empty, actually. Conceptually speaking, an array is a sequence of values of non-negative length. An array with non-zero length contains at least one element, and is therefore non-empty, whereas an array with zero length is empty. Same thing goes with a slice. A slice is a view into zero or more array elements. A slice with zero length is empty, and a slice with non-zero length contains at least one element. There's nowhere in this conceptual scheme for such a thing as a "null array" that's distinct from an empty array. This distinction only crops up in implementation, and IMO leads to code smells because code should be operating based on the conceptual behaviour of arrays rather than on the implementation details.

In most languages, an array is a reference type, so there's the question of whether it's even _there_. There's a clear distinction between having a null reference to an array and having a reference to an empty array. This is particularly clear in C++ where an array is just a pointer, but it's true in plenty of other languages that don't treat arrays as pointers (e.g. Java).

The problem is that D put the length on the stack alongside the pointer, making it so that D arrays are sort of reference types and sort of not. The pointer is a reference type, but the length is a value type, making the dynamic array half and half. If it were fully a reference type, then there would be no problem with distinguishing between null and empty arrays. A null array is simply a null reference to an array. But since D arrays aren't quite reference types, that doesn't work.

I see no problem in the abstraction of arrays with having null arrays, because a null array is simply a null reference to an array, which is exactly the same as having a null object or null pointer. It's the reference that's null, not what it points to. It's just D's implementation that's weird. It would be like taking some of the member variables of a class and putting them in the reference instead of in the object and then discussing how much sense a null object makes. It's just bizarre.

Now, D arrays end up working great overall in spite of their semantic weirdness, but it does mean that you can't really have proper null arrays in the same way that most languages with arrays can, forcing you to either be extremely careful when dealing with null and arrays or to waste space doing stuff to keep track of nullability separately from the array itself like Nullable does.
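As a rough sketch of the second option (tracking nullability separately), assuming std.typecons.Nullable wraps the slice; the describe helper and the variable names are made up purely for illustration:

```d
import std.typecons : Nullable;

// Hypothetical helper: reports which of the three states a result is in.
string describe(Nullable!(int[]) result)
{
    if (result.isNull)
        return "absent";
    return result.get.length == 0 ? "empty" : "non-empty";
}

void main()
{
    Nullable!(int[]) absent;              // default state: isNull is true
    int[] none;                           // a null (and therefore empty) slice
    auto empty = Nullable!(int[])(none);  // present, but holding zero elements
    auto full  = Nullable!(int[])([1, 2]);

    assert(describe(absent) == "absent");
    assert(describe(empty) == "empty");
    assert(describe(full) == "non-empty");

    // The flag is stored next to the slice, so the wrapper takes more space.
    static assert(Nullable!(int[]).sizeof > (int[]).sizeof);
}
```

The static assert is the "wasted space" part: the extra flag has to live somewhere outside the pointer/length pair.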

- Jonathan M Davis
October 18, 2013
On Fri, Oct 18, 2013 at 02:04:41PM -0400, Jonathan M Davis wrote:
> On Friday, October 18, 2013 10:38:12 H. S. Teoh wrote:
[...]
> > IMO, distinguishing between null and empty arrays is bad abstraction. I agree with D's "conflation" of null with empty, actually. Conceptually speaking, an array is a sequence of values of non-negative length. An array with non-zero length contains at least one element, and is therefore non-empty, whereas an array with zero length is empty. Same thing goes with a slice. A slice is a view into zero or more array elements. A slice with zero length is empty, and a slice with non-zero length contains at least one element. There's nowhere in this conceptual scheme for such a thing as a "null array" that's distinct from an empty array. This distinction only crops up in implementation, and IMO leads to code smells because code should be operating based on the conceptual behaviour of arrays rather than on the implementation details.
> 
> In most languages, an array is a reference type, so there's the question of whether it's even _there_. There's a clear distinction between having a null reference to an array and having a reference to an empty array. This is particularly clear in C++ where an array is just a pointer, but it's true in plenty of other languages that don't treat arrays as pointers (e.g. Java).

To me, these are just implementation details. Conceptually speaking, D arrays are actually slices, so that gives them reference semantics. Being slices, they refer to zero or more elements, so either their length is zero, or not. There is no concept of nullity here. That only comes because we chose to implement slices as pointer + length, so implementation-wise we can distinguish between a null .ptr and a non-null .ptr. But from the conceptual POV, if we consider slices as a whole, they are just a sequence of zero or more elements. Null has no meaning here.
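For concreteness, here is a minimal sketch of where the two views diverge, assuming current D semantics for ==, is, and .ptr; the pointsNowhere helper is hypothetical:

```d
// Hypothetical helper: looks only at the implementation-level pointer.
bool pointsNowhere(const(int)[] s)
{
    return s.ptr is null;
}

void main()
{
    int[] nothing = null;            // .ptr is null, .length is 0
    int[] backing = [1, 2, 3];
    int[] empty = backing[0 .. 0];   // .ptr points into backing, .length is 0

    // Conceptual view: both are sequences of zero elements.
    assert(nothing.length == 0 && empty.length == 0);
    assert(nothing == empty);        // == compares elements

    // Implementation view: the pointer tells them apart.
    assert(pointsNowhere(nothing));
    assert(!pointsNowhere(empty));
    assert(nothing is null);
    assert(empty !is null);          // `is` compares pointer and length
}
```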

Put another way, slices themselves are value types, but they refer to their elements by reference. It's a subtle but important difference.
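A small sketch of that difference, with a made-up mutate function: writes to elements go through the pointer and are seen by the caller, while rebinding the slice itself only changes the callee's copy.

```d
void mutate(int[] s)
{
    s[0] = 42;      // goes through the pointer: the caller sees this
    s = s[0 .. 1];  // rebinds the local copy only: the caller does not see this
}

void main()
{
    int[] a = [1, 2, 3];
    mutate(a);
    assert(a[0] == 42);     // element change is visible
    assert(a.length == 3);  // length is unchanged in the caller
}
```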


> The problem is that D put the length on the stack alongside the pointer, making it so that D arrays are sort of reference types and sort of not. The pointer is a reference type, but the length is a value type, making the dynamic array half and half. If it were fully a reference type, then there would be no problem with distinguishing between null and empty arrays. A null array is simply a null reference to an array. But since D arrays aren't quite reference types, that doesn't work.
[...]

I think the issue comes from the preconceived notion acquired from other languages that arrays are some kind of object floating somewhere out there on the heap, for which we have a handle here. Thus we have the notion of null, being the case when we have a handle here but there's actually nothing out there.

But if we consider the slice as being a thing right *here* and now, referencing some sequence of elements out there, then we arrive at D's notion of null and empty being the same thing, because while there may be no elements out there being referenced, the handle (i.e. slice) is always *here*. In that sense, there's no distinction between an empty slice and a null slice: either there are elements out there that we're referring to, or there are none. There is no third "null" case.

There's no reason why we should adopt the previous notion if this one works just as well, if not better. I argue that the second notion is conceptually cleaner, because it eliminates an unnecessary distinction between an empty sequence and a non-existent sequence (which then leads to similar issues one encounters with null pointers).


T

-- 
Answer: Because it breaks the logical sequence of discussion. / Question: Why is top posting bad?
October 18, 2013
On Friday, 18 October 2013 at 19:59:26 UTC, H. S. Teoh wrote:
> ...because it eliminates an unnecessary distinction between an empty sequence and a non-existent sequence (which then leads to similar issues one encounters with null pointers).

That just seems silly. Surely we all recognize that there's a difference between the empty set and having no set at all, and that it's valuable to be able to distinguish between the two. The empty set is still a set, while nothing is... nothing.

October 18, 2013
I agree a null value and empty array are separate concepts, but from my very anecdotal/non-rigorous point of view I really appreciate D's ability to treat them as equivalent.

My day job mostly involves C# and array code almost always follows the pattern if(arr == null || arr.Length == 0) ...

In D just doing if(arr.length) feels much nicer and less error prone. I'm all for correctness but would hate to throw the baby out with the bathwater.
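Roughly what I mean, as a sketch (hasElements is just a made-up name): a single length check covers both the null case and the empty case.

```d
// Hypothetical helper: true when there is at least one element to process.
bool hasElements(const(int)[] arr)
{
    return arr.length != 0;
}

void main()
{
    int[] backing = [1, 2, 3];

    assert(!hasElements(null));             // null slice: length is 0
    assert(!hasElements(backing[0 .. 0]));  // empty but non-null slice
    assert(hasElements(backing));           // three elements
}
```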
October 18, 2013
On 10/18/2013 09:58 PM, H. S. Teoh wrote:
> To me, these are just implementation details. Conceptually speaking, D
> arrays are actually slices, so that gives them reference semantics.
> Being slices, they refer to zero or more elements, so either their
> length is zero, or not. There is no concept of nullity here. That only
> comes because we chose to implement slices as pointer + length, so
> implementation-wise we can distinguish between a null .ptr and a
> non-null .ptr. But from the conceptual POV, if we consider slices as a
> whole, they are just a sequence of zero or more elements. Null has no
> meaning here.

int[] a = null; // <- :(
October 18, 2013
On 10/18/2013 10:09 PM, Blake Anderton wrote:
> I agree a null value and empty array are separate concepts, but from my
> very anecdotal/non rigorous point of view I really appreciate D's
> ability to treat them as equivalent.
>
> My day job mostly involves C# and array code almost always follows the
> pattern if(arr == null || arr.Length == 0) ...
>
> In D just doing if(arr.length) feels much nicer and less error prone.
> I'm all for correctness but would hate to throw the baby out with the
> bathwater.

(This will work either way.)
October 18, 2013
On Friday, 18 October 2013 at 20:15:31 UTC, Timon Gehr wrote:
> (This will work either way.)

Speaking of that, it's really annoying to have to import std.array just to use range primitives with slices. Would these be better in druntime, or is that a bad idea?

October 18, 2013
On Friday, 18 October 2013 at 20:09:37 UTC, Blake Anderton wrote:
> I agree a null value and empty array are separate concepts, but from my very anecdotal/non rigorous point of view I really appreciate D's ability to treat them as equivalent.
>
> My day job mostly involves C# and array code almost always follows the pattern if(arr == null || arr.Length == 0) ...
>
> In D just doing if(arr.length) feels much nicer and less error prone. I'm all for correctness but would hate to throw the baby out with the bathwater.

Really? I NEVER write that pattern. I may check whether an array is null, or not bother because the function shouldn't be receiving nulls (maybe that's bad, but I don't care). I just write LINQ and never bother to check whether something is empty.
October 18, 2013
On Friday, 18 October 2013 at 20:09:37 UTC, Blake Anderton wrote:
> I agree a null value and empty array are separate concepts […]

Yes, null values are a different concept, and slices being value types, there isn't really one for them. I'm torn on whether allowing conversion of arrays to pointers for the purpose of null comparison was a good idea or not.

David
October 18, 2013
On Friday, 18 October 2013 at 20:32:48 UTC, ProgrammingGhost wrote:
> Really? I NEVER write that pattern. I may check whether an array is null, or not bother because the function shouldn't be receiving nulls (maybe that's bad, but I don't care). I just write LINQ and never bother to check whether something is empty.

Yeah, LINQ makes it a lot easier, but I usually take IEnumerable<T> instead of coding directly against arrays in that case. I find most of the time I use arrays directly is when using "params" parameters. It's very easy to not null check that and cause heartache down the line.