October 19, 2013
On Saturday, 19 October 2013 at 12:04:43 UTC, Kagamin wrote:
> On Friday, 18 October 2013 at 17:59:17 UTC, Max Samukha wrote:
>> On Friday, 18 October 2013 at 16:55:19 UTC, Andrei Alexandrescu wrote:
>>>
>>> Fair point. I just gave one possible alternative out of many. Thing is, relying on client code to distinguish subtleties between empty and null strings is fraught with dangers.
>>>
>>> Andrei
>>
>> I agree. Thinking about your variant of readln - it's ok to use [] as the value indicating EOF, since it is not included in the value set of type "line" as you define it.
>
> No, if the last line is empty, it has no newline character(s) at the end, and is as empty as it can get.

Right. Then readln is broken.
October 21, 2013
On Fri, 18 Oct 2013 17:36:28 +0100, Dicebot <public@dicebot.lv> wrote:

> On Friday, 18 October 2013 at 15:42:56 UTC, Andrei Alexandrescu wrote:
>> That's bad API design, pure and simple. The function should e.g. return the string including the line terminator, and only return an empty (or null) string upon EOF.
>
> I'd say it should throw upon EOF as it is pretty high-level convenience function.

I disagree.  Exceptions should never be used for flow control, so the rule is to throw on exceptional occurrences ONLY, not on something that will ALWAYS eventually happen.

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/
October 21, 2013
On Sat, 19 Oct 2013 10:56:02 +0100, Kagamin <spam@here.lot> wrote:

> On Friday, 18 October 2013 at 10:44:11 UTC, Regan Heath wrote:
>> This comes up time and again.  The use of, and ability to distinguish empty from null is very useful.  Yes, you run the risk of things like null pointer exceptions etc, but we have that risk now without the reward of being able to distinguish these cases.
>
> In C# code null strings are a plague.

I code in C# every day for work and I never have any problems with null strings.  The conflated empty/null cases are the real nightmare for me (more below).

null strings are no different to null class references, they're not a special case.  People seem to have this odd idea that null is somehow an invalid state for a string /reference/ (C# strings are reference types); it's not.

People also seem to elevate empty strings to some sort of special status.  That's like saying 0 has some special status for int - it doesn't, it's just one of a number of possible values.

In fact, int having no null like state is a "problem" causing solutions like boxing to elevate the value type to a reference in order to allow a null state for int.

Yet, in D we've decided to inconsistently remove that functionality from string for no gain.  If string could not actually be null then we'd gain something from the limitation, instead we lose functionality and gain nothing - you still have to check your strings for null in D.

We ought to go one way or the other, this middle ground is worse than either of the other options.

In my code I don't have to check for or treat empty strings any differently to other values.  I simply have to check for null.  Remembering to check for null on reference types is automatic for me, strings are not special in this regard.

> Most of the time you don't need them

Sure, and if I don't have access to null (like when using a value type like int), I can code around that lack, but it's never as straightforward a solution.

> but still must check for them just in order to not get an exception.

Sure, you must check for the possible states of a reference type.

> Also business logic makes no difference between null and empty

This is simply not true.  Example at the end.

> both of them are just "no data", so you end up typing if(string.IsNullOrEmpty(mystr)) every time everywhere.

I only have to code like this when I use 3rd party code which has conflated empty and null.  In my code when it's null it means not specified, and empty is just one type of value - for which I do no special handling.

> And, yeah, only one small feature in this big mess ever needs to differentiate between null and empty.

Untrue, null allows many alternate and IMO more direct/obvious designs.

> I found this one case trivially implementable, but nulls still plague all remaining code.

Which one case?  The readline() one below?

>> Take this simple design:
>>
>>   string readline();
>>
>> This function would like to be able to:
>>  - return null for EOF
>>  - return [] for a blank line
>>
>> but it cannot, because as soon as you write:
>>
>>   foo(readline())
>>
>> the null/[] case merges.
>
> This is a horrible design. You better throw an exception on eof instead of null:

No, no, no.  You should only throw in exceptional circumstances or you risk using exceptions for flow control, and that is just plain horrid.

> this null will break the caller anyway possibly in a contrived way.

Never a contrived way, always a blatantly obvious one and only if you're not doing your job properly.  If you want a contrived, unpredictable and difficult to debug breakage look no further than heap or stack corruption.  Null is never a difficult bug to find and fix, and is no different to forgetting to handle one of the integer return values of a function.

I use this all the time:
http://msdn.microsoft.com/en-us/library/system.io.streamreader.readline.aspx

It has never caused me any issues.  It explicitly states that null is a possible output, and so I check for it - doing anything less is simply bad programming.

> It works if you read one line per loop cycle, but if you read several lines and assume they're not null (some multiline data format),

There is your problem, never "assume" - the documentation is very clear on the issue.

> you're screwed or your code becomes littered with null checks, but who accounts for all alternative scenarios from the start?

Me, and IMO any competent programmer.  It is misguided to think you can ignore valid states; null is a valid state in C, C++, C#, and D.  You should be thinking about and handling it.

You don't have to check for it on every access to the variable, but you do need to check for it once where the variable is assigned, or passed (in private functions you can skip this).  From that point onward you can assume non-null, valid, job done.

>> There are plenty of other such design/cases that can be imagined, and while you can work around them all they add complexity for zero gain.
>
> I believe there's no problem domain, which would like to differentiate between null and empty string instead of treating them as "no data".

null means not specified, non existent, was not there.
empty means, present but set to empty/blank.

Databases have this distinction for a reason.

If you get input from a user a field called "foo" may be:
 - not specified
 - specified

and if specified, may be:
 - empty
 - not empty

If foo is not specified you may want to assign a default value for it, if your business logic is using empty to mean "not specified" you prevent the user actually setting foo to empty and that limitation is a right pain in many cases.

You can code around this by using a boolean or a dictionary to indicate the specified/not specified distinction, but this is less direct than simply using null.
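As a rough illustration of the "code around it" route in D, here is a minimal sketch using std.typecons.Nullable (the wrapper suggested elsewhere in this thread) to carry the specified/not specified distinction separately from the string's contents. The helper name resolveFoo and the default value are hypothetical, chosen only for this example.

```d
import std.stdio : writeln;
import std.typecons : Nullable;

// Hypothetical helper: pick the user's value if one was specified,
// otherwise fall back to a default.
string resolveFoo(Nullable!string foo, string defaultValue)
{
    // "Not specified" is carried by the Nullable wrapper, not by the string.
    return foo.isNull ? defaultValue : foo.get;
}

void main()
{
    Nullable!string notSpecified;                 // user never supplied foo
    auto specifiedEmpty = Nullable!string("");    // user explicitly set foo to empty
    auto specifiedValue = Nullable!string("bar"); // user set foo to a value

    writeln("[", resolveFoo(notSpecified, "default"), "]");
    writeln("[", resolveFoo(specifiedEmpty, "default"), "]");
    writeln("[", resolveFoo(specifiedValue, "default"), "]");
}
```

Here an explicitly empty foo survives as empty instead of being replaced by the default, at the cost of an extra wrapper type around every such field.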

If we have null, let's use it; if we want to remove null then let's remove it, but can we get out of this horrid middle ground please.

Regan

October 21, 2013
On Fri, 18 Oct 2013 16:43:23 +0100, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> On 10/18/13 3:44 AM, Regan Heath wrote:
>> On Fri, 18 Oct 2013 00:32:46 +0100, H. S. Teoh <hsteoh@quickfur.ath.cx>
>> wrote:
>>
>>> On Fri, Oct 18, 2013 at 01:27:33AM +0200, Adam D. Ruppe wrote:
>>>> On Thursday, 17 October 2013 at 23:12:03 UTC, ProgrammingGhost
>>>> wrote:
>>>> >is null still treats [] as null.
>>>>
>>>> blah, you're right. It will at least distinguish it from an empty
>>>> slice though (like arr[$..$]). I don't think there's any way to tell
>>>> [] from null except typeof(null) at all. At runtime they're both the
>>>> same: no contents, so null pointer and zero length.
>>>
>>> I think it's a mistake to rely on the distinction between null and
>>> non-null but empty arrays in D. They should be regarded as
>>> implementation details that user code shouldn't depend on. If you need
>>> to distinguish between arrays that are empty and arrays that are null,
>>> consider using Nullable!(T[]) instead.
>>
>> This comes up time and again.  The use of, and ability to distinguish
>> empty from null is very useful.
>
> I disagree.

Because.. the risk of a null pointer exception is not worth the gain?  If so, why not go the whole hog and prevent string from ever being null?  Then, at least we'd gain something from the loss of the null/empty distinction/limitation.

D strings ought to decide whether they're reference types or value types, if the former then I want consistent null back, if the latter then I want to be rid of null for good.  This middle ground sucks.

>> Yes, you run the risk of things like
>> null pointer exceptions etc, but we have that risk now without the
>> reward of being able to distinguish these cases.
>>
>> Take this simple design:
>>
>>    string readline();
>>
>> This function would like to be able to:
>>   - return null for EOF
>>   - return [] for a blank line
>
> That's bad API design, pure and simple. The function should e.g. return the string including the line terminator, and only return an empty (or null) string upon EOF.

It's the C# ReadLine() design and I've never once had a bug because of it.
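For comparison, the terminator-preserving alternative quoted above can be sketched over an in-memory buffer. This is only an illustration under stated assumptions: nextLine is a hypothetical helper written for this example, not std.stdio.readln, and it treats '\n' as the only line terminator.

```d
import std.stdio : writeln;
import std.string : indexOf;

// Hypothetical helper: pops one line off the front of `input`, keeping the
// line terminator. It returns an empty string only when the input is exhausted.
string nextLine(ref string input)
{
    if (input.length == 0)
        return null;                 // EOF: nothing left to read
    auto nl = input.indexOf('\n');   // -1 when there is no terminator
    // Keep the '\n' if present; the final line may lack one.
    auto end = (nl < 0) ? input.length : cast(size_t) nl + 1;
    auto line = input[0 .. end];
    input = input[end .. $];
    return line;
}

void main()
{
    string input = "first\n\nlast";
    size_t count;
    // A blank line arrives as "\n" (length 1), so length 0 can only mean EOF.
    for (auto line = nextLine(input); line.length != 0; line = nextLine(input))
    {
        ++count;
        writeln(count, ": ", line.length, " chars");
    }
}
```

With this shape the caller never has to tell null from empty, but every caller that wants the bare text has to strip the terminator itself.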

R

October 21, 2013
On Fri, 18 Oct 2013 17:55:46 +0100, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> On 10/18/13 9:26 AM, Max Samukha wrote:
>> *That's* bad API design. readln should be symmetrical to writeln, not
>> write. And about preserving the exact representation of new lines,
>> readln/writeln shouldn't preserve that, pure and simple.
>
> Fair point. I just gave one possible alternative out of many. Thing is, relying on client code to distinguish subtleties between empty and null strings is fraught with dangers.

My code does not need to distinguish between empty and null.  null is checked for, and empty is just a normal value for a string.  The "problem" you're referring to is /caused/ by conflating null and empty, by making empty strings "special" in the same way someone might make 0 a special value for an int (meaning not specified - for example).

If you stop using empty string as a special case of null, then empty does not need special handling - it's just a normal string value handled like any other - you can read it, write it, append it, etc etc etc.

null is the /only/ case which needs special handling - just like any other reference type.

R

October 21, 2013
On Fri, 18 Oct 2013 18:38:12 +0100, H. S. Teoh <hsteoh@quickfur.ath.cx> wrote:

> On Fri, Oct 18, 2013 at 01:32:58PM -0400, Jonathan M Davis wrote:
>> On Friday, October 18, 2013 09:55:46 Andrei Alexandrescu wrote:
>> > On 10/18/13 9:26 AM, Max Samukha wrote:
>> > > *That's* bad API design. readln should be symmetrical to writeln,
>> > > not write. And about preserving the exact representation of new
>> > > lines, readln/writeln shouldn't preserve that, pure and simple.
>> >
>> > Fair point. I just gave one possible alternative out of many. Thing
>> > is, relying on client code to distinguish subtleties between empty
>> > and null strings is fraught with dangers.
>>
>> Yeah, but the primary reason that it's bad design is the fact that D
>> tries to conflate null and empty instead of keeping them distinct
>> (which is essentially the complaint that was made). Whether that's
>> ultimately good or bad is up for debate, but the side effect is that
>> relying on the difference between null and empty ends up being very
>> bug-prone, whereas in other languages which don't conflate the two, it
>> isn't problematic in the same way, and it's much more reasonable to
>> have the API treat them differently.
> [...]
>
> Conceptually
> speaking, an array is a sequence of values of non-negative length. An
> array with non-zero length contains at least one element, and is
> therefore non-empty, whereas an array with zero length is empty. Same
> thing goes with a slice. A slice is a view into zero or more array
> elements. A slice with zero length is empty, and a slice with non-zero
> length contains at least one element.

This describes the empty/not empty distinction.

> There's nowhere in this conceptual
> scheme for such a thing as a "null array" that's distinct from an empty
> array.

And this is the problem/complaint.  You cannot represent specified/not specified, you can only represent empty/not empty.

I agree you cannot logically have an existing array that is somehow a "null array" and distinct/different from an empty array, but that's not what I want/am asking for.  I want to use an array 'reference' to represent that the array is non existent, has not been set, has not been defined, etc.  This is what null is for.

> This distinction only crops up in implementation, and IMO leads
> to code smells because code should be operating based on the conceptual
> behaviour of arrays rather than on the implementation details.

It is not an implementation detail, it's a conceptual difference.  A reference type has the power to represent specified/not specified in addition to referring to an array which is empty/not empty.  A value type, like int, cannot do the same thing without either boxing (into a reference type, whose reference can be null) or by giving up one of its values (i.e. 0) and pretending it's something special.

This is what D's string has done with empty, it is pretending that it is special and means "not specified", and because it converts null into empty, that means we cannot rely on empty really being empty (as in the user wants the value set to empty), as it might also be a value the user did not specify.

It's actually a fairly simple distinction I want to be able to make.  If you get input from a user a field called "foo" may be:
 - not specified
 - specified

and if specified, may be:
 - empty
 - not empty

null allows us the specified/not specified distinction.

Regan

October 21, 2013
On Fri, 18 Oct 2013 20:58:07 +0100, H. S. Teoh <hsteoh@quickfur.ath.cx> wrote:

> On Fri, Oct 18, 2013 at 02:04:41PM -0400, Jonathan M Davis wrote:
>> On Friday, October 18, 2013 10:38:12 H. S. Teoh wrote:
> [...]
>> > IMO, distinguishing between null and empty arrays is bad
>> > abstraction. I agree with D's "conflation" of null with empty,
>> > actually. Conceptually speaking, an array is a sequence of values of
>> > non-negative length. An array with non-zero length contains at least
>> > one element, and is therefore non-empty, whereas an array with zero
>> > length is empty. Same thing goes with a slice. A slice is a view
>> > into zero or more array elements. A slice with zero length is empty,
>> > and a slice with non-zero length contains at least one element.
>> > There's nowhere in this conceptual scheme for such a thing as a
>> > "null array" that's distinct from an empty array. This distinction
>> > only crops up in implementation, and IMO leads to code smells
>> > because code should be operating based on the conceptual behaviour
>> > of arrays rather than on the implementation details.
>>
>> In most languages, an array is a reference type, so there's the
>> question of whether it's even _there_. There's a clear distinction
>> between having null reference to an array and having a reference to an
>> empty array. This is particularly clear in C++ where an array is just
>> a pointer, but it's true in plenty of other languages that don't treat
>> arrays as pointers (e.g. Java).
>
> To me, these are just implementation details. Conceptually speaking, D
> arrays are actually slices, so that gives them reference semantics.
> Being slices, they refer to zero or more elements, so either their
> length is zero, or not. There is no concept of nullity here. That only
> comes because we chose to implement slices as pointer + length, so
> implementation-wise we can distinguish between a null .ptr and a
> non-null .ptr. But from the conceptual POV, if we consider slices as a
> whole, they are just a sequence of zero or more elements. Null has no
> meaning here.
>
> Put another way, slices themselves are value types, but they refer to
> their elements by reference. It's a subtle but important difference.
>
>
>> The problem is that D put the length on the stack alongside the
>> pointer, making it so that D arrays are sort of reference types and
>> sort of not. The pointer is a reference type, but the length is a
>> value type, making the dynamic array half and half. If it were fully a
>> reference type, then there would be no problem with distinguishing
>> between null and empty arrays. A null array is simply a null reference
>> to an array. But since D arrays aren't quite reference types, that
>> doesn't work.
> [...]
>
> I think the issue comes from the preconceived notion acquired from other
> languages that arrays are some kind of object floating somewhere out
> there on the heap, for which we have a handle here. Thus we have the
> notion of null, being the case when we have a handle here but there's
> actually nothing out there.
>
> But if we consider the slice as being a thing right *here* and now,
> referencing some sequence of elements out there, then we arrive at D's
> notion of null and empty being the same thing, because while there may
> be no elements out there being referenced, the handle (i.e. slice) is
> always *here*. In that sense, there's no distinction between an empty
> slice and a null slice: either there are elements out there that we're
> referring to, or there are none. There is no third "null" case.
>
> There's no reason why we should adopt the previous notion if this one
> works just as well, if not better. I argue that the second notion is
> conceptually cleaner, because it eliminates an unnecessary distinction
> between an empty sequence and a non-existent sequence (which then leads
> to similar issues one encounters with null pointers).

If what you say is true then slices would and could never be null... If that were the case I would stop complaining and simply "box" them with Nullable when I wanted a reference type.  But, D's strings/slices are some kind of mutant half reference half value type, and that's the underlying problem here.

Regan

October 21, 2013
On Fri, 18 Oct 2013 21:09:35 +0100, Blake Anderton <rbanderton@gmail.com> wrote:

> I agree a null value and empty array are separate concepts, but from my very anecdotal/non rigorous point of view I really appreciate D's ability to treat them as equivalent.
>
> My day job mostly involves C# and array code almost always follows the pattern if(arr == null || arr.Length == 0) ...

Interesting.  My day job is C# and I almost never do that.  I check for null and treat empty as any other string value.  The /only/ time I have to check for empty is when I have interfaced with 3rd party code which has decided to conflate empty and null to mean the same thing.

Regan

October 21, 2013
On Mon, 21 Oct 2013 11:58:07 +0100, Regan Heath <regan@netmail.co.nz> wrote:

> On Fri, 18 Oct 2013 20:58:07 +0100, H. S. Teoh <hsteoh@quickfur.ath.cx> wrote:
>
>> On Fri, Oct 18, 2013 at 02:04:41PM -0400, Jonathan M Davis wrote:
>>> On Friday, October 18, 2013 10:38:12 H. S. Teoh wrote:
>> [...]
>>> > IMO, distinguishing between null and empty arrays is bad
>>> > abstraction. I agree with D's "conflation" of null with empty,
>>> > actually. Conceptually speaking, an array is a sequence of values of
>>> > non-negative length. An array with non-zero length contains at least
>>> > one element, and is therefore non-empty, whereas an array with zero
>>> > length is empty. Same thing goes with a slice. A slice is a view
>>> > into zero or more array elements. A slice with zero length is empty,
>>> > and a slice with non-zero length contains at least one element.
>>> > There's nowhere in this conceptual scheme for such a thing as a
>>> > "null array" that's distinct from an empty array. This distinction
>>> > only crops up in implementation, and IMO leads to code smells
>>> > because code should be operating based on the conceptual behaviour
>>> > of arrays rather than on the implementation details.
>>>
>>> In most languages, an array is a reference type, so there's the
>>> question of whether it's even _there_. There's a clear distinction
>>> between having null reference to an array and having a reference to an
>>> empty array. This is particularly clear in C++ where an array is just
>>> a pointer, but it's true in plenty of other languages that don't treat
>>> arrays as pointers (e.g. Java).
>>
>> To me, these are just implementation details. Conceptually speaking, D
>> arrays are actually slices, so that gives them reference semantics.
>> Being slices, they refer to zero or more elements, so either their
>> length is zero, or not. There is no concept of nullity here. That only
>> comes because we chose to implement slices as pointer + length, so
>> implementation-wise we can distinguish between a null .ptr and a
>> non-null .ptr. But from the conceptual POV, if we consider slices as a
>> whole, they are just a sequence of zero or more elements. Null has no
>> meaning here.
>>
>> Put another way, slices themselves are value types, but they refer to
>> their elements by reference. It's a subtle but important difference.
>>
>>
>>> The problem is that D put the length on the stack alongside the
>>> pointer, making it so that D arrays are sort of reference types and
>>> sort of not. The pointer is a reference type, but the length is a
>>> value type, making the dynamic array half and half. If it were fully a
>>> reference type, then there would be no problem with distinguishing
>>> between null and empty arrays. A null array is simply a null reference
>>> to an array. But since D arrays aren't quite reference types, that
>>> doesn't work.
>> [...]
>>
>> I think the issue comes from the preconceived notion acquired from other
>> languages that arrays are some kind of object floating somewhere out
>> there on the heap, for which we have a handle here. Thus we have the
>> notion of null, being the case when we have a handle here but there's
>> actually nothing out there.
>>
>> But if we consider the slice as being a thing right *here* and now,
>> referencing some sequence of elements out there, then we arrive at D's
>> notion of null and empty being the same thing, because while there may
>> be no elements out there being referenced, the handle (i.e. slice) is
>> always *here*. In that sense, there's no distinction between an empty
>> slice and a null slice: either there are elements out there that we're
>> referring to, or there are none. There is no third "null" case.
>>
>> There's no reason why we should adopt the previous notion if this one
>> works just as well, if not better. I argue that the second notion is
>> conceptually cleaner, because it eliminates an unnecessary distinction
>> between an empty sequence and a non-existent sequence (which then leads
>> to similar issues one encounters with null pointers).
>
> If what you say is true then slices would and could never be null...

Aargh, my apologies I misread your post.  Ignore my first reply.

I agree that a slice which can never be null is like a pre-null-checked array, which is a good thing.  The issue I have had in the past is with strings (not slices) mutating from null to empty and/or vice-versa.

Also, it's not at all clear when you're dealing with a pre-checked non-null slice and when you're dealing with a possibly null array, for example:

import std.stdio;

void foo(string arr)
{
	if (arr is null) writefln("null");
	else writefln("not null");
	if (arr.length == 0) writefln("empty");
	else writefln("not empty");
}

void main()
{
	string arr;
	foo(arr);
	foo(arr[0..$]);
	arr = "";
	foo(arr);
	foo(arr[0..$]);
}

Output:
null
empty
null
empty
not null
empty
not null
empty

Which of those are strings/arrays and which are slices?  Why are the ones formed by actually slicing coming up as "is null"?
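One way to poke at the question is to look at null-ness and length separately for a few differently built empty strings. The following is a small sketch, not an authoritative account of the runtime; describe is a hypothetical helper introduced only for this illustration.

```d
import std.stdio : writeln;

// Hypothetical helper: reports null-ness and emptiness as two separate facts.
string describe(string s)
{
    return (s is null ? "null" : "not null") ~ ", "
         ~ (s.length == 0 ? "empty" : "not empty");
}

void main()
{
    string unset;                 // default-initialised: null pointer, length 0
    string literal = "";          // empty literal: length 0, but a non-null pointer
    string text = "abc";
    string tail = text[$ .. $];   // empty slice of a non-empty string

    writeln(describe(unset));
    writeln(describe(unset[0 .. $])); // slicing keeps the same (null) pointer
    writeln(describe(literal));
    writeln(describe(tail));

    // == compares contents, so all of these empty strings compare equal,
    // whether or not they are null.
    writeln(unset == literal);
}
```

Slicing a null string just produces another pointer/length pair built from the same null pointer, so the result is still null; an empty slice taken from real data keeps a non-null pointer. The type is string in every case, so nothing in a function signature tells the two apart.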

(This last, not directed at you, just venting..)

I can understand arguing against null from a safety point of view.

I can understand arguing against designs that use null, for the same reasons.

I disagree, but then I have comfortably used null for a long time so the cost/benefit of using null is heavily on the benefit side for me.  I can understand for others this may not be the case.

But I cannot understand someone who says they have no use for the concept of non-existence, or that no code will ever want to make the distinction; that is just plainly incorrect.  Implementing a singleton pattern (probably a bad example :p) relies on being able to check for non-existence using null as the indicator, and we do it all the time.

Regan

October 21, 2013
On Monday, October 21, 2013 11:58:07 Regan Heath wrote:
> If what you say is true then slices would and could never be null... If that were the case I would stop complaining and simply "box" them with Nullable when I wanted a reference type.  But, D's strings/slices are some kind of mutant half reference half value type, and that's the underlying problem here.

Yeah, dynamic arrays in D are just plain weird. They're halfway between reference types and value types, and it definitely causes confusion, and it totally screws with null (which definitely sucks). But they mostly work really well the way that they are, and in general, the way that slices work works really well. So, I don't know if what we have is ultimately the right design or not. I definitely don't like how null works for arrays though.

Given how they work, we probably would have been better off if they couldn't be null. The ptr obviously could be null, but the array itself arguably shouldn't be able to be null. If we did that, then it would be clear that null wouldn't work with arrays, and no one would try. It would still kind of suck, since you wouldn't have null, but then at least it would be clear that null wouldn't work with arrays instead of having a situation where it kind of does and kind of doesn't.

- Jonathan M Davis