July 21, 2005
Hi Regan,

In article <opst8meeo123k2f5@nrage.netwin.co.nz>, Regan Heath says...
>
>On Thu, 21 Jul 2005 00:04:56 +0000 (UTC), AJG <AJG_member@pathlink.com> wrote:
>>> Making a distinction between an empty array and a nonexistent one is flaky, if not outright ambiguous, so D does not do it, as far as I remember Walter's statement. Thus if(array) is not ambiguous.
>>
>> Hm... not only does this distinction exist, it is in fact _very_ much
>> available in D. That's exactly the point Regan has made in some past replies.
>> I'm indifferent towards this distinction, but Regan seems fond of it. Please
>> look at my examples further below.
>
>It's true.

Praise the lord, agreement. ;)

>>> And after all, arrays have somewhat pointer-like semantics in D.
>>
>> No, they do not, IMHO. This is one of the points I've tried to make.
>> Arrays have completely different semantics in D compared to C. In D arrays
>> are first-class objects. They are handled via references, which can't be
>> nulled, they keep their own length, etc. I think this is a good thing. Very
>> different from C.
>
>The point I'm trying to make is that in D an array can be nulled, and it has meaning, e.g.
>
>char[] p = null;
>
>you're confusing the _implementation_ of arrays with the _behaviour_ of arrays; the above array _reference_ behaves just like any other reference that has been nulled(*), e.g.

I'm well aware of the implementation vs. the behaviour. It just so happens the two are married when it comes to the compiler. In fact, in the resulting executable, they are indistinguishable. Confusion arises as a result.

>>> One of the reasons is that it seems familiar to C programmers.
>>
>> Indeed. It seems familiar, and people will misuse it because of that.
>
>How? When you write "if(x)" you're asking whether 'x' is null or 0. D's answer is perfectly correct in all cases(*).

And except for static arrays. Oh, and strings, which must be compatible with C. Since strings are a fairly important piece of the puzzle, I'd say this is problematic.

>(*) except for the _BUG_ where you can write:
>
>char[] p = "";
>p.length = 0;
>if (p) {} // false: length = 0 resets the data pointer to null

Has Walter actually acknowledged this to be a bug? This seems more like what you mentioned, a desire to make the distinction (empty/exist) disappear. If that's the case, then why would you say it's a bug? If anything, it could only get worse.

>> # int[0] emptyArray;
>> # if (emptyArray) writef("See, I'm empty, yet I exist!");
>> // The statement will print.
>
>This is a static array. Its data pointer can never be null, thus it
>always exists.
>(Nothing incongruous here)

My friend, that's the very definition of an incongruence. It means static arrays
do not follow the same principles as other kinds (just like strings).
# int[0] empty;              // Not null.
# int[ ] empty = new int[0]; // Yes null.

I even went ahead and _assigned_ an empty array (int[0]) to the reference, and yet it remains _non_-existent. How do you explain that? You can't have a dynamic array that is empty yet exists, but you _can_ have a static one? (Or at least, not via the initializer?)

Let's analyze this carefully, and you will definitely see an incongruence:

# int[] A = null;
# int[] B = new int[0];

if (A) // this is false.
if (B) // this is false.

Since false == false, then A == B, and therefore null == int[0]. The very distinction you are so fond of is gone! So in this case empty == non-existent, but all over the place it isn't? _That's_ an incongruence.

>> // Let's try it again:
>> # int[] emptyArray = new int[0];
>> # if (emptyArray) writef("I'm still empty, but non-existant.");
>> // The statement will *not* print.
>
>Here you have not allocated any memory, thus nothing exists.
>(Nothing incongruous here)

Oh, so then it's purely about memory? How very semantic. Never mind the fact that int[0] means an empty array. The distinction is lost, as shown above. IMHO there's no way around this one.

>> // Think about strings:
>> # string emptyString = "";
>> # if (emptyString) writef("Empty, yet I exist");
>> // The statement will *not* print.
>
>Wrong, this statement will print (try it).
>
>The reason it prints is that memory _is_ allocated because string constants are C-compatible, i.e. contain a null terminator. If this was not the case then this would act as the previous example.

"If this was not the case". That's fine, but it happens to _be_ the case. Therefore the docs should state: "There is an incongruence when it comes to string literals. Because we want them to be compatible with C, it means an empty string is not really empty. In other words, what should have been an empty array is really not. Careful, folks!"

>> But what about this:
>> # string emptyString = null;
>> # if (emptyString) writef("Empty, but now I don't exist");
>> // The statement will print.
>
>Wrong, it will not print. The array is null, nothing exists.
>(Nothing incongruous here)
>
>> Would you say the behaviour I showed above is consistent?

If you agree with the previous statements, you'll concur that the behaviour is not consistent. It calls for exceptions to be made and explained. Once more gratuitously: static vs. dynamic, and string literals, and the .length "bug," and the dynamic initializer problem.

>> You don't find it a tad, say, ambiguous?

If you at least agree it's inconsistent, then we are getting somewhere. The ambiguity results in not knowing when which is going to happen. Since there is no documentation on this, the problem is only aggravated.

>> You don't think people will be confused? I certainly was.
>
>That's because you're asking the wrong questions, and you didn't check your answers.

I did check my answers, and now I know. I made the mistake, and by _chance_ one case didn't work early on, so I started looking under the hood. But how many people will go to their graves with bugs like that still coded? How many bugs like that exist as we speak? Remember, for _most_ cases, it will not show up.

Tell me this, do you agree with this statement:
People (mistakenly) may use if (array) to test for the emptiness of an array.
What about this:
Moreover, this test will work most of the time.
And finally:
The remaining times, they are bugs.

My proposal aims to prevent those bugs.

>>> makes the foreach..else syntax suggestion from AJG very unnecessary.
>>
>> Huh? I don't see how the two things are related. You may have a valid point, but I fail to see the connection.
>
>I'm not sure either. I suspect he's referring to foreach being usable on a null array equally well, i.e. you don't have to check whether it's a null array; it will iterate 0 times for both a null array and an empty array.

If this is true, Ilya, that was never the intention of my suggestion. I know that foreach is "safe" even with "null" arrays. The suggestion is a way to deal with the no-items case elegantly without using a separate if statement every single time. As a matter of fact, no-items happens quite a bit IMHO.
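A minimal sketch of both points, assuming a present-day D2 compiler (this thread predates D2, so 2005-era behaviour may differ): foreach runs zero times over a null array and over a zero-length slice with a non-null data pointer, and without the proposed foreach..else the "no items" case still needs its own length test. `iterations` and `describe` are hypothetical helpers written for illustration.

```d
import std.conv : to;
import std.stdio : writeln;

// Counts how many times a foreach body runs over `a`.
size_t iterations(const(int)[] a)
{
    size_t n;
    foreach (x; a)
        ++n;
    return n;
}

// Hypothetical helper: without a foreach..else construct, the "no items"
// case needs its own explicit length test before the loop.
string describe(const(int)[] a)
{
    if (a.length == 0)          // stands in for the proposed "else" branch
        return "no items";

    string s;
    foreach (i, x; a)
    {
        if (i > 0)
            s ~= " ";
        s ~= to!string(x);
    }
    return s;
}

void main()
{
    int[] nothing = null;           // null data pointer, length 0
    int[] backing = [1, 2, 3];
    int[] empty = backing[0 .. 0];  // non-null data pointer, length 0

    assert(nothing.ptr is null && empty.ptr !is null);
    assert(iterations(nothing) == 0);
    assert(iterations(empty) == 0);
    assert(iterations(backing) == 3);

    writeln(describe(nothing)); // no items
    writeln(describe(empty));   // no items
    writeln(describe(backing)); // 1 2 3
}
```

The explicit `.length` test treats null and empty alike, which is usually what a "no items" branch wants.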

Thanks for reading,
--AJG.


July 21, 2005
On Thu, 21 Jul 2005 02:18:27 +0000 (UTC), AJG <AJG_member@pathlink.com> wrote:
>>> Hm... not only does this distinction exist, it is in fact _very_ much
>>> available in D. That's exactly the point Regan has made in some past
>>> replies. I'm indifferent towards this distinction, but Regan seems fond of
>>> it. Please look at my examples further below.
>>
>> It's true.
>
> Praise the lord, agreement. ;)

We're both men of "distinction" ;)

>> you're confusing the _implementation_ of arrays with the _behaviour_ of
>> arrays; the above array _reference_ behaves just like any other reference
>> that has been nulled(*), e.g.
>
> I'm well aware of the implementation vs. the behaviour. It just so happens the two are married when it comes to the compiler. In fact, in the resulting
> executable, they are indistinguishable. Confusion arises as a result.

Sorry, I don't see your point. The compiler isn't confused, neither am I. Arrays are references, treat them as such and there is no confusion.

>>>> One of the reasons is that it seems familiar to C programmers.
>>>
>>> Indeed. It seems familiar, and people will misuse it because of that.
>>
>> How? When you write "if(x)" you're asking whether 'x' is null or 0. D's answer is perfectly correct in all cases(*).
>
> And except for static arrays.

No, this is no exception to the rule.

Yes, static arrays are different to dynamic ones, no surprises there. Yes, static arrays cannot have a null data pointer; no, it makes no difference to the behaviour of "if(x)", nor should it.

Static arrays are the same as dynamic ones that _exist_; this makes perfect sense, as static arrays always exist.

> Oh, and strings, which must be compatible with C.

Again, there is no exception to the rule here.
"bob" is a static string, it cannot be null.
"" is a static string, it cannot be null.

Yes, the last example has no items, i.e. has a 0 length, but it still _exists_.

If Walter decided to remove the trailing null and make it incompatible with C, then it could be optimised away, i.e. the compiler could decide "" was meaningless and remove it, making it non-existent. In that case it wouldn't exist. Otherwise it does. As long as it exists it has a non-null data pointer. The length is meaningless when talking about existence.

>> (*) except for the _BUG_ where you can write:
>>
>> char[] p = "";
>> p.length = 0;
>> if (p) {} // false: length = 0 resets the data pointer to null
>
> Has Walter actually acknowledged this to be a bug?

In short, no. But then he isn't known for his verbosity on many matters. He just percolates, and out pops a new compiler, possibly with the changes we talk about.

> This seems more like what you mentioned, a desire to make the distinction (empty/exist) disappear.

I believe that was the original intent.

> If that's the case, then why would you say it's a bug?

In this case my impression is that the real intent was to remove the seg-v problems associated with null strings, remove the need to check for null all the time, etc. That has been achieved. What is great is that at the same time we can preserve the distinction if we so choose (it takes very little to get there from the current state).

> If anything, it could only get worse.

Oh ye of little faith!

>>> # int[0] emptyArray;
>>> # if (emptyArray) writef("See, I'm empty, yet I exist!");
>>> // The statement will print.
>>
>> This is a static array. Its data pointer can never be null, thus it
>> always exists.
>> (Nothing incongruous here)
>
> My friend, that's the very definition of an incongruence.

Whose definition?
  http://dictionary.reference.com/search?q=incongruous

The closest/best definition for this situation appears to be:
  "Not in keeping with what is correct, proper, or logical; inappropriate: incongruous behavior"

> It means static arrays do not follow the same principles as other kinds (just like strings).

What "principles" are you referring to?

> # int[0] empty;              // Not null.
> # int[ ] empty = new int[0]; // Yes null.

> I even went ahead and _assigned_ an empty array (int[0]) to the reference, and yet it remains _non_-existent. How do you explain that? You can't have a dynamic array that is empty yet exists, but you _can_ have a static one? (Or at least, not via the initializer?)

Aha! This is a new (good) example. I agree this example shows "incongruous behaviour".

I would suggest that "int[0] s;" be an error, as it's pretty meaningless, except that template programmers would likely be a little annoyed with that.

Alternatively, I would suggest that "int[0] s;" have a null data pointer (as the dynamic one does), but I believe they're implemented in such a way that there is no such data pointer.

There seems to be no simple solution to this problem, perhaps Walter has an idea. I'll post to the bugs NG.

> Let's analyze this carefully, and you will definitely see an incongruence:
>
> # int[] A = null;
> # int[] B = new int[0];
>
> if (A) // this is false.
> if (B) // this is false.
>
> Since false == false, then A == B, and therefore null == int[0]. The very
> distinction you are so fond of is gone!

Not true.

I suspect "new int[0]" allocates no memory, therefore it _is_ null.
This is different to C/C++ which can and do allocate a zero-length item in the heap.

This could be a solution to the problem above, if "new int[0]" allocated a zero length item on the heap it would be consistent with the static array case.

>>> // Let's try it again:
>>> # int[] emptyArray = new int[0];
>>> # if (emptyArray) writef("I'm still empty, but non-existant.");
>>> // The statement will *not* print.
>>
>> Here you have not allocated any memory, thus nothing exists.
>> (Nothing incongruous here)
>
> Oh, so then it's purely about memory?

In essence, yes. If no memory is allocated it doesn't exist. Exactly like your own C example earlier.

> How very semantic. Never mind the fact that int[0] means an empty array.

"new int[0]" means allocate an array of 0 ints. 0 * int.sizeof == 0; in other words, allocate 0 bytes. I suspect a shortcut is being taken where no allocation happens when you ask for 0 bytes. I think perhaps it should allocate a zero-length item on the heap instead.

> The distinction is lost, as shown above. IMHO there's no way around this one.

Sure, there is one problem in the static array vs dynamic array example.
Let's hope Walter agrees and has/likes the solution.

>>> // Think about strings:
>>> # string emptyString = "";
>>> # if (emptyString) writef("Empty, yet I exist");
>>> // The statement will *not* print.
>>
>> Wrong, this statement will print (try it).
>>
>> The reason it prints is that memory _is_ allocated because string
>> constants are C-compatible, i.e. contain a null terminator. If this was not the case then this would act as the previous example.
>
> "If this was not the case". That's fine, but it happens to _be_ the case.
> Therefore the docs should state: "There is an incongruence when it comes to string literals. Because we want them to be compatible with C, it means an empty string is not really empty.

It depends how you want to look at it. When I type "" I'm saying here exists a string containing nothing. In other words, it _exists_ but contains _nothing_; it's the very definition of a non-null data pointer with a 0 length.

> In other words, what should have been an empty array is really not. Careful, folks!"

It _is_ empty; its length is 0. The trailing \0 is effectively outside the length of the array; it exists past the end.
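A minimal sketch that checks this description of "" directly, assuming a present-day D2 compiler (where string literals are documented as zero-terminated; 2005-era details were not stated in the thread): the literal and `null` both have length 0, but only the literal has a non-null data pointer, and its terminator sits one element past the end of the slice.

```d
void main()
{
    string literal = "";    // an existing string that holds no characters
    string nothing = null;  // no data at all

    // Both have length 0 ...
    assert(literal.length == 0 && nothing.length == 0);
    // ... but only the literal has a data pointer.
    assert(literal.ptr !is null);
    assert(nothing.ptr is null);

    // The C-compatible terminator lies just past the end of the slice,
    // so it is not counted by .length.
    assert(literal.ptr[literal.length] == '\0');

    string abc = "abc";
    assert(abc.length == 3 && abc.ptr[abc.length] == '\0');
}
```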

>>> But what about this:
>>> # string emptyString = null;
>>> # if (emptyString) writef("Empty, but now I don't exist");
>>> // The statement will print.
>>
>> Wrong, it will not print. The array is null, nothing exists.
>> (Nothing incongruous here)
>>
>>> Would you say the behaviour I showed above is consistent?
>
> If you agree with the previous statements, you'll concur that the behaviour is not consistent. It calls for exceptions to be made and explained.

As I said above, there are no exceptions in the rule for "if(x)". It simply and always checks the variable 'x' against null or 0. Nothing more, nothing less. You do however need to understand what other statements like the "new int[0]" do, in order to understand how they relate to "if(x)". That doesn't mean there is anything wrong with "if(x)".

> Once more gratuitously: static vs. dynamic, and string literals, and the .length "bug," and the dynamic initializer problem.

Summary:
I agree there is a problem with static vs dynamic above.
I don't agree that there is anything wrong with the behaviour of "if(x)".

>>> You don't find it a tad, say, ambiguous?
>
> If you at least agree it's inconsistent, then we are getting somewhere.

The static vs dynamic example above shows inconsistency.

> The ambiguity results in not knowing when which is going to happen.

Specifically with statements like "new int[0]" and "int[0] a" and what exactly _they_ do.

>>> You don't think people will be confused? I certainly was.
>>
>> That's because you're asking the wrong questions, and you didn't check
>> your answers.
>
> I did check my answers, and now I know.

Yeah, I didn't see your post correcting it till after I wrote this.

> I made the mistake, and by _chance_ one case didn't work early on, so I started looking under the hood. But how many people will go to their graves with bugs like that still coded? How many bugs like that exist as we speak? Remember, for _most_ cases, it will not show up.
>
> Tell me this, do you agree with this statement:
> People (mistakenly) may use if (array) to test for the emptiness of an array.

No. My reasoning:

1. Most container classes use a length or size member for this. I haven't seen a single container class/object/thing in any language that lets you check the length or size of an object using "if(x)".

2. The statement "if(x)" is well known to mean check x vs null or 0. If you assume an array is a struct, you're writing something meaningless. If you assume an array is a reference, you're comparing the reference to null or 0. I cannot see how you would ever think it would silently call the length member of x.

> What about this:
> Moreover, this test will work most of the time.

Sure. Most of the time you'll have an array with items, thus the data pointer will be non-null.

> And finally:
> The remaining times, they are bugs.

Yes, assuming you wrote "if(x)" meaning to check for length > 0: in the case of a non-null data pointer and a 0 length, it would execute the code you had written for arrays with a length greater than 0.

> My proposal aims to prevent those bugs.

Sure, only you want to do it in such a way as to break existing code relying on "if(x)". You want to introduce inconsistent behaviour (making arrays behave differently to all other types in D). And lastly the bugs you're referring to are, IMO, unlikely to occur.

Essentially you have to generate a zero length non-null array. The 3 ways I know of doing this are:

char[0] p;            //1
char[] p = "";        //2

char[] tmp = "abc";
char[] p = tmp[0..0]; //3

You'd have to (incorrectly) attempt to check the length of an array with "if(p)", and the outcome would have to be wrong in a subtle way for this to be a serious problem; a blatant bug is easy to find, and you quickly learn not to use "if(p)" to check for length.
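To make the scenario concrete, a minimal sketch assuming a present-day D2 compiler. `exists` stands for the pointer test that `if (array)` performs as described in this thread, and `hasItems` for an explicit length test; both names are hypothetical. It uses ways 2 and 3 from the list above; way 1 (`char[0]`) is left out because its behaviour is exactly what is in dispute here.

```d
// Hypothetical helper: the test `if (array)` performs, as described in
// this thread (compare the data pointer against null).
bool exists(const(char)[] a)
{
    return a.ptr !is null;
}

// Hypothetical helper: the explicit emptiness test, `array.length != 0`.
bool hasItems(const(char)[] a)
{
    return a.length != 0;
}

void main()
{
    char[] tmp = "abc".dup;      // mutable copy, as in the 2005 example
    char[] nothing = null;       // does not exist
    string literal = "";         // way 2 (a D2 literal is immutable, hence string)
    char[] slice = tmp[0 .. 0];  // way 3: zero-length slice of existing data

    // The two tests agree on the everyday cases ...
    assert(!exists(nothing) && !hasItems(nothing));
    assert(exists(tmp) && hasItems(tmp));

    // ... and disagree only on zero-length arrays with a non-null pointer,
    // which is where using the pointer test to mean "not empty" goes wrong.
    assert(exists(literal) && !hasItems(literal));
    assert(exists(slice) && !hasItems(slice));
}
```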

In most cases I can imagine, the non-null zero-length array causes no problems, because, as Ilya mentioned, things like "foreach" treat them the same. This is part of the "treat them the same" that was Walter's initial goal, and it is achieved mostly by array references never being null.

In short, I like it how it is, I can't see a significant problem, and I totally dislike your suggested solution. But, like you say, thanks for listening to my point of view; it's been fun. (I think we've exhausted our ideas and I don't think we're agreeing.)

Regan.
July 21, 2005
>>It might be easier to just live with the current behavior.
>
> That's just laziness speaking ;).

Maybe "easier" isn't the right word :-)
The last time this topic came up one suggestion was to encourage explicit
.length or .ptr conditions but to keep the current implicit conversions. For
example the C++ string vs D string page
http://www.digitalmars.com/d/cppstrings.html was changed to test for empty
as:
 if (!array.length) ...
It's in the section "Checking For Empty Strings". It used to just be "if
(!array)", I think.


>>Then again we already have 'if (x = y)' illegal so there is precedent for filtering conditions - the good old 'value does not give boolean result' error.
>
> Yes! That's exactly what I was thinking. D even has its cake and eats it,
> because (x = y) is still legal with an additional explicit == true/false;
> this is great. It allows you to do it yet prevents the common missing = mistake.
>
> This is analogous to if (array). The pointer check can still be done via
> array.ptr, but D would error out when using the ambiguous form. So there is
> definitely precedent, and it's a good precedent.

In fact now that I think about the 'if (!array)' code if we made 'if (array)' illegal we'd also need a special check for 'if (!array)'. That's at least two more special cases for conditions.


July 21, 2005
Hi,

>>>> Please look at my examples further below.
>>>
>>> It's true.
>>
>> Praise the lord, agreement. ;)
>
>We're both men of "distinction" ;)

Hehehe. I'll be requiring your testimony in court one day.

>In short, I like it how it is, I can't see a significant problem, and I totally dislike your suggested solution. But, like you say thanks for listening to my point of view, it's been fun. (I think we've exhausted our ideas and I don't think we're agreeing)

Yes, I suppose we can agree to disagree.

One last couple of things I'd like to clarify, though: My idea is not necessarily to make if (array) check length automatically. This is just one of the three I mentioned. My general suggestion is to improve/clarify and document the behaviour of the construct because I find it dangerous and leading to the subtle bugs I mentioned.

You agreed that the bugs can at least happen. It'd be great to know how common they are; alas, this wouldn't be easy to find out. However, in all honesty, bugs arising from using assignment as a boolean (if (x = y)) haven't happened to me very much. Maybe once or twice (in years). Yet the construct was made partially illegal, requiring a more explicit version. That's fine with me. It helps prevent those subtle (if seldom) bugs.

In addition, IIRC, nowhere on the D site proper is there a mention of what the correct behaviour is supposed to be. I have a feeling Walter left this construct a little unfinished with regards to arrays. Maybe he's working on the empty/null distinction thing and then he will revise it. Anyway, as I've said the lack of documentation doesn't help.

And finally: Could you give me a concrete example of a useful application of if (array) to test for the array pointer's nullness? Say, in a complete function? I simply don't think dealing with ptrs (or checking them) should be necessary in D except for C-compat. But perhaps you have a really good use for this construct that I haven't considered.

Thanks,
--AJG.


July 21, 2005
On Thu, 21 Jul 2005 13:35:36 +0000 (UTC), AJG <AJG_member@pathlink.com> wrote:
> And finally: Could you give me a concrete example of a useful application of if (array) to test for the array pointer's nullness? Say, in a complete function? I simply don't think dealing with ptrs (or checking them) should be necessary in D except for C-compat. But perhaps you have a really good use for this construct that I haven't considered.

Template programming is an example of where we rely on the logical consistency of types to achieve generic things, see:

import std.stdio;

class A
{
	char[] toString()
	{
		return "A";
	}
}

template doWrite(Type)
{
	void doWrite(Type p)
	{
		if (p) writef("%s", p); // "%s" keeps a '%' inside p from being read as a format
	}
}

alias doWrite!(A) doWriteA;
alias doWrite!(char[]) doWriteC;

void main()
{
	char[] a = "this is an ";
	
	doWriteC(null);
	doWriteC(a);
	doWriteA(null);
	doWriteA(new A());
}

Essentially anywhere you expect consistent behaviour of references (string or otherwise) and want to test whether the reference is null, i.e. non-existent.
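For comparison, a sketch of the same generic idea for a present-day D2 compiler (an assumption; the code above is 2005-era D, where `toString` returned `char[]` and `override` was not required). It uses an explicit `is null` test, which works for class references and arrays alike, so the intent does not hinge on how a particular compiler converts an array to bool. `textOf` is a hypothetical name.

```d
import std.conv : to;
import std.stdio : writeln;

class A
{
    override string toString()
    {
        return "A";
    }
}

// Works for any type that can be compared with null: `is null` checks a
// class reference, or compares an array against the null array.
string textOf(T)(T p)
{
    if (p is null)
        return "";           // nothing exists, so there is nothing to write
    return to!string(p);     // for a class this calls toString()
}

void main()
{
    string a = "this is an ";

    // Mirrors the calls in the example above.
    string output = textOf!string(null) ~ textOf!string(a)
                  ~ textOf!A(null) ~ textOf!A(new A);
    writeln(output); // this is an A
}
```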

Regan
July 22, 2005
"Regan Heath" <regan@netwin.co.nz> wrote in message news:opst6x8cje23k2f5@nrage.netwin.co.nz...
> On Wed, 20 Jul 2005 02:15:58 +0000 (UTC), AJG <AJG_member@pathlink.com> wrote:
>> This is a suggestion based on a thread from a couple of weeks ago. What
>> about making if (array) illegal in D? I think it brings ambiguity and a
>> high potential for errors to the language. The main two uses for this
>> construct can already be done with a slightly more explicit syntax:
>>
>> if (array.ptr == null) // Check for a kind of "non-existence."
>> if (array.length == 0) // Check for explicit emptiness.
>>
>> On the other hand, one is not sure what if (array) by itself is supposed
>> to mean, since it's _not_ like C. In C, if (array), where array is
>> typically a pointer, means simply != NULL. The problem in D is that the
>> array ptr is tricky and IMHO it's best not to interface with it directly.
>>
>> I think it would be wise to remove this ambiguity. I propose two options:
>> 1) Make if (array) equal _always_ to if (array.length).
>> 2) Simply make it illegal.
>>
>> What do you guys think? Walter?
>
> I prefer the current behaviour (for all the reasons I mentioned in the
> previous thread):
>   http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/25804
>
> "if (array)" is the same as "if (array.ptr)" which acts just like it does in C, comparing it to 0/null.
>
> Essentially the "if" statement is checking the non-zero state of the variable itself. In the case of value types it compares the value to 0. In the case of pointers and references it compares them to null.
>
> In the case of an array, which (as explained in link above) is a mix/pseudo value/reference type, it compares the data pointer to null.
>
> The reason this is the correct behaviour is that a null array has a null data pointer, but, an empty array i.e. an existing set containing no elements may have a non-null data pointer. In both cases they have a 0 length property.
>
> Of course we could change this, we could remove the case where an array contains no items but has a non-null data pointer. This IMO would remove a useful distinction, the "existing set containing no items" would be un-representable with a single array variable. IMO that would be a bad move, the current situation(*) is good.
>
> (*) there remains the problem where setting the length of an array to 0 sets the data pointer to null. This can change an "existing set with no elements" into a "non-existent set".
>
> Regan

I was poking around the Qt documentation and interestingly enough QString has a concept of null and empty. Here's what they say, though: "For historical reasons, QString distinguishes between a null string and an empty string. [snip] We recommend that you always use isEmpty() and avoid isNull()."

The exact doc is http://doc.trolltech.com/4.0/qstring.html#distinction-between-null-and-empty-strings


July 22, 2005
On Thu, 21 Jul 2005 22:31:37 -0400, Ben Hinkle <ben.hinkle@gmail.com> wrote:
> "Regan Heath" <regan@netwin.co.nz> wrote in message
> news:opst6x8cje23k2f5@nrage.netwin.co.nz...
>> On Wed, 20 Jul 2005 02:15:58 +0000 (UTC), AJG <AJG_member@pathlink.com>
>> wrote:
>>> This is a suggestion based on a thread from a couple of weeks ago. What
>>> about making if (array) illegal in D? I think it brings ambiguity and a
>>> high potential for errors to the language. The main two uses for this
>>> construct can already be done with a slightly more explicit syntax:
>>>
>>> if (array.ptr == null) // Check for a kind of "non-existence."
>>> if (array.length == 0) // Check for explicit emptiness.
>>>
>>> On the other hand, one is not sure what if (array) by itself is supposed
>>> to mean, since it's _not_ like C. In C, if (array), where array is
>>> typically a pointer, means simply != NULL. The problem in D is that the
>>> array ptr is tricky and IMHO it's best not to interface with it directly.
>>>
>>> I think it would be wise to remove this ambiguity. I propose two options:
>>> 1) Make if (array) equal _always_ to if (array.length).
>>> 2) Simply make it illegal.
>>>
>>> What do you guys think? Walter?
>>
>> I prefer the current behaviour (for all the reasons I mentioned in the
>> previous thread):
>>   http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/25804
>>
>> "if (array)" is the same as "if (array.ptr)" which acts just like it does
>> in C, comparing it to 0/null.
>>
>> Essentially the "if" statement is checking the non-zero state of the
>> variable itself. In the case of value types it compares the value to 0. In
>> the case of pointers and references it compares them to null.
>>
>> In the case of an array, which (as explained in link above) is a
>> mix/pseudo value/reference type, it compares the data pointer to null.
>>
>> The reason this is the correct behaviour is that a null array has a null
>> data pointer, but, an empty array i.e. an existing set containing no
>> elements may have a non-null data pointer. In both cases they have a 0
>> length property.
>>
>> Of course we could change this, we could remove the case where an array
>> contains no items but has a non-null data pointer. This IMO would remove a
>> useful distinction, the "existing set containing no items" would be
>> un-representable with a single array variable. IMO that would be a bad
>> move, the current situation(*) is good.
>>
>> (*) there remains the problem where setting the length of an array to 0
>> sets the data pointer to null. This can change an "existing set with no
>> elements" into a "non-existent set".
>>
>> Regan
>
> I was poking around the Qt documentation and interestingly enough QString
> has a concept of null and empty. Here's what they say, though: "For
> historical reasons, QString distinguishes between a null string and an empty
> string. [snip] We recommend that you always use isEmpty() and avoid
> isNull()."
>
> The exact doc is
> http://doc.trolltech.com/4.0/qstring.html#distinction-between-null-and-empty-strings

That's not too surprising. A lot of people have never seen the need for the distinction, and it certainly can make life "simpler". However, I don't believe you can argue that it doesn't exist, at least logically. That is why you get situations like this (stolen from a post to the DMDScript group):

<quote>
For example, might it not be useful to return 'null' on EOF, thus allowing
this sort of construct:

    var line = readln();

    while (line != null)
    {
         ...
         line = readln();
    }
</quote>

which is an example where there is a desire to distinguish between existence and emptiness.
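A minimal D2 sketch of that readln pattern. `LineReader` is a hypothetical reader over an in-memory string, not `std.stdio.readln`: `next()` returns `null` at the end of input and a zero-length but non-null slice for a blank line, so the loop can stop on EOF without stopping on blank lines.

```d
// Hypothetical reader over an in-memory string, used only to illustrate the
// pattern; it is not std.stdio.readln.
struct LineReader
{
    string text;
    size_t pos;

    // Returns the next line without its '\n': a zero-length (but non-null)
    // slice for a blank line, or null once the input is exhausted.
    string next()
    {
        if (pos >= text.length)
            return null;                   // EOF: no line exists
        immutable start = pos;
        while (pos < text.length && text[pos] != '\n')
            ++pos;
        string line = text[start .. pos];  // points into text, so never null
        if (pos < text.length)
            ++pos;                         // skip the '\n'
        return line;
    }
}

void main()
{
    auto reader = LineReader("first\n\nthird\n");
    string[] lines;

    // Stop on non-existence (EOF), not on emptiness (a blank line).
    for (string line = reader.next(); line !is null; line = reader.next())
        lines ~= line;

    assert(lines == ["first", "", "third"]);
}
```

The loop condition has to be `!is null`. In D, `==` on arrays compares lengths and contents, so `"" == null` is true, and a `line != null` test would end the loop at the first blank line.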

Sure, you can remove the distinction, lessen the expressiveness of arrays, and force everyone to "work around" the deficiency in other ways. It's possible, and it can make life simpler for the general case, but more complicated for the rest.

I think arrays in D are nearly perfect(*). They allow you to ignore the distinction in the general case (thus life is pretty easy already) yet you can tell the difference if you require it.

(*) there are only 2 problems with them IMO:

1. length = 0; resets the data pointer to null, changing empty into non-existent.
2. "int[0] a;" and "int[] a = new int[0];" produce different results when you'd expect the same thing.

Regan
July 22, 2005
Derek Parnell wrote:
>>Making a distinction between an empty array and a nonexistent one is flaky, if not outright ambiguous, so D does not do it, as far as I remember Walter's statement. Thus if(array) is not ambiguous.
> 
> Maybe in your world, but not in mine.

[...]

> To repeat: Existence and Emptiness are not the same concept.

The matter of discussion is not your or my view of the real world, nor the realm of some other programming language. The matter is how arrays are implemented, or should be implemented, in D. Considering that D relies heavily on garbage collection for arrays anyway, the construct of an empty but existent array is unnecessary.

I believe that making this distinction, between empty and non-existent arrays, just provides the possibility for another misconception and bug.

Anyone who sees a real technical necessity to distinguish between the empty and the non-existent array is invited to show it here.

-eye
July 22, 2005
"Regan Heath" <regan@netwin.co.nz> wrote in message news:opsuaqfmcv23k2f5@nrage.netwin.co.nz...
> On Thu, 21 Jul 2005 22:31:37 -0400, Ben Hinkle <ben.hinkle@gmail.com> wrote:
>> "Regan Heath" <regan@netwin.co.nz> wrote in message news:opst6x8cje23k2f5@nrage.netwin.co.nz...
>>> On Wed, 20 Jul 2005 02:15:58 +0000 (UTC), AJG <AJG_member@pathlink.com>
>>> wrote:
>>>> This is a suggestion based on a thread from a couple of weeks ago. What
>>>> about making if (array) illegal in D? I think it brings ambiguity and a
>>>> high potential for errors to the language. The two main uses for this
>>>> construct can already be done with a slightly more explicit syntax:
>>>>
>>>> if (array.ptr == null) // Check for a kind of "non-existence."
>>>> if (array.length == 0) // Check for explicit emptiness.
>>>>
>>>> On the other hand, one is not sure what if (array) by itself is
>>>> supposed to mean, since it's _not_ like C. In C, if (array), where
>>>> array is typically a pointer, simply means != NULL. The problem in D is
>>>> that the array ptr is tricky and IMHO it's best not to interface with
>>>> it directly.
>>>>
>>>> I think it would be wise to remove this ambiguity. I propose two
>>>> options:
>>>> 1) Make if (array) _always_ equal to if (array.length).
>>>> 2) Simply make it illegal.
>>>>
>>>> What do you guys think? Walter?
>>>
>>> I prefer the current behaviour (for all the reasons I mentioned in the
>>> previous thread):
>>>   http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/25804
>>>
>>> "if (array)" is the same as "if (array.ptr)", which acts just like it
>>> does in C, comparing it to 0/null.
>>>
>>> Essentially the "if" statement is checking the non-zero state of the
>>> variable itself. In the case of value types it compares the value to 0.
>>> In the case of pointers and references it compares them to null.
>>>
>>> In the case of an array, which (as explained in the link above) is a mixed (pseudo) value/reference type, it compares the data pointer to null.
>>>
>>> The reason this is the correct behaviour is that a null array has a null data pointer, but an empty array, i.e. an existing set containing no elements, may have a non-null data pointer. In both cases they have a 0 length property.
>>>
>>> Of course we could change this: we could remove the case where an array
>>> contains no items but has a non-null data pointer. IMO this would remove
>>> a useful distinction; the "existing set containing no items" would be
>>> unrepresentable with a single array variable. IMO that would be a bad
>>> move; the current situation(*) is good.
>>>
>>> (*) there remains the problem where setting the length of an array to 0 sets the data pointer to null. This can change an "existing set with no elements" into a "non-existent set".
>>>
>>> Regan
>>
>> I was poking around the Qt documentation and, interestingly enough, QString
>> has a concept of null and empty. Here's what they say, though: "For
>> historical reasons, QString distinguishes between a null string and an
>> empty string. [snip] We recommend that you always use isEmpty() and avoid
>> isNull()."
>>
>> The exact doc is http://doc.trolltech.com/4.0/qstring.html#distinction-between-null-and-empty-strings
>
> That's not too surprising. A lot of people have never seen the need for the distinction, and it certainly can make life "simpler". However, I don't believe you can argue that it doesn't exist, at least logically. That is why you get situations like this (stolen from a post to the DMDScript group):
>
> <quote>
> For example, might it not be useful to return 'null' on EOF, thus allowing
> this sort of construct:
>
>     var line = readln();
>
>     while (line != null)
>     {
>          ...
>          line = readln();
>     }
> </quote>
>
> which is an example where there is a desire to distinguish between existence and emptiness.
>
> Sure, you can remove the distinction, lessen the expressiveness of arrays and force everyone to "work around" the deficiency in other ways. It's possible, and it can make life simpler for the general case, but more complicated for the rest.
>
> I think arrays in D are nearly perfect(*). They allow you to ignore the distinction in the general case (thus life is pretty easy already) yet you can tell the difference if you require it.
>
> (*) there are only 2 problems with them IMO:
>
> 1. length = 0; resets the data pointer to null, changing empty into
> non-existent.
> 2. "int[0] a;" and "int[] a = new int[0];" produce different results when
> you'd expect the same thing.
>
> Regan

Sure, I agree special values can be useful, and null is an easy special value
to use. Note that the same behavior can be obtained by returning a singleton
empty string just for EOF, if desired. The singleton approach could arguably
make the code more readable, too, since the reader wouldn't have to know that
a null line means EOF. For example:
 char[] line = din.readLine();
 while (line !is din.eofLine())
 {
     ...
     line = din.readLine();
 }
where eofLine can return null or, if the stream author wishes, some other
unique empty string.
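A minimal sketch of that singleton approach in present-day D2. `LineReader`, `readLine` and `eofLine` are hypothetical names modelled on the post (this is not the `std.stream` API), and the reader works over an in-memory list of lines so the example is self-contained. The sentinel is a zero-length slice of private storage, so an ordinary empty line compares `==` to it but is never `is` it.

```d
import std.stdio : writeln;

/// Hypothetical in-memory line source; the names mirror the post but are
/// NOT a real Phobos API.
final class LineReader
{
    private string[] lines;
    private size_t next;

    // One byte of private storage; a zero-length slice of it is a unique,
    // non-null EOF marker that no caller-supplied line can alias.
    private static immutable char[1] eofStorage = ['\0'];

    this(string[] lines) { this.lines = lines; }

    /// The sentinel returned once input is exhausted.
    static string eofLine() { return eofStorage[0 .. 0]; }

    string readLine()
    {
        if (next >= lines.length)
            return eofLine();
        return lines[next++];
    }
}

void main()
{
    auto r = new LineReader(["first", "", "last"]);
    size_t count = 0;
    // `!is` compares pointer and length, so the empty line does not end the loop.
    for (string line = r.readLine(); line !is LineReader.eofLine(); line = r.readLine())
    {
        writeln("[", line, "]");   // prints [first], [] and [last]
        ++count;
    }
    writeln(count, " lines");      // prints "3 lines"
}
```

Comparing with `is` rather than `==` is essential here: `==` only compares contents, so the empty line and the sentinel would be indistinguishable.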


July 23, 2005
On Fri, 22 Jul 2005 15:00:51 +0200, Ilya Minkov <minkov@cs.tum.edu> wrote:
> Derek Parnell wrote:
>>> Making a difference between an empty array and a nonexistent one is flaky, if not outright ambiguous; thus D does not do it, as far as I can remember Walter's statement. Thus if(array) is not ambiguous.
>>  Maybe in your world, but not in mine.
>
> [...]
>
>> To repeat: Existence and Emptiness are not the same concept.
>
> What we are discussing is not your view or my view of the real world, nor the realm of some other programming language. It is how arrays are, or should be, implemented in D.

Sure, however D exists in the real world. Programmers solve real world problems. IMO arrays should be implemented in D in a manner that best allows us to do that.

> Considering that D relies heavily on garbage collection for arrays anyway, the construct of an empty but existent array is unnecessary.

I don't see your point. The concepts of existence, non-existence, empty and not empty apply with garbage collection just as much as with any other memory management scheme. Garbage collection does not obviate the need to express non-existent, exists but empty, and exists and not empty.

> I believe that making this distinction, between empty and non-existent arrays, just opens the door to yet another misconception and bug.

You're correct in one respect: the ability to express more states (non-existent, exists but empty, exists and not empty) adds complexity, increasing the chance that someone will mistakenly use one when they mean another.

However, as a concrete example, a very common bug in C/C++ is dereferencing a null pointer (a pointer is a good example of a type which can represent non-existent, exists but empty, and exists and not empty).

Arrays in D do not share this problem: the array reference itself cannot be null, so reading the length of, or iterating over, a "null" array is always safe. At the same time, the current array implementation retains the expressiveness that allows you to represent non-existent, exists but empty, and exists and not empty.

My point is that D's arrays have the expressiveness without the complexity: you can ignore the non-existent case unless you want or need to consider it.
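A small sketch of what ignoring the non-existent case looks like in practice, assuming present-day D2: a null array can be asked for its length, iterated and appended to without any special handling. `sum` is a hypothetical helper.

```d
import std.stdio : writeln;

/// Hypothetical helper: adds up the elements; it needs no null check.
int sum(const(int)[] a)
{
    int total = 0;
    foreach (x; a)      // iterating a null array simply does nothing
        total += x;
    return total;
}

void main()
{
    int[] none = null;
    writeln(none.length);       // prints "0": reading .length of a null array is safe
    writeln(sum(none));         // prints "0"

    none ~= 4;                  // appending to a null array allocates storage
    none ~= 5;
    writeln(sum(none));         // prints "9"
    writeln(none.ptr !is null); // prints "true"
}
```

Only code that actually cares about the difference has to look at `.ptr`; everything else treats null and empty alike.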

> If anyone sees a real technical necessity to distinguish between an empty array and a non-existent one, they are invited to show it here.

I'm not sure there is a "necessity", as in most cases you could probably "work around" the restriction (if it were added to D). Here is an example where the ability to represent non-existent, exists but empty, and exists and not empty is useful.

This comment was posted to the DMDScript NG recently:

<quote>
For example, might it not be useful to return 'null' on EOF, thus allowing
this sort of construct:

    var line = readln();

    while (line != null)
    {
         ...
         line = readln();
    }
</quote>
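For comparison, a rough D counterpart of the quoted readln() loop, assuming present-day D2: a hypothetical `nextLine` returns `null` once its in-memory input runs out, while an empty line from the input (here the non-null literal `""`) is passed through unchanged, so the loop tests `!is null` rather than `.length`. `nextLine` and its input list are illustrative assumptions, not a Phobos API.

```d
import std.stdio : writeln;

/// Hypothetical line source over an in-memory list: returns the next
/// line, or null once the list is exhausted.
string nextLine(ref string[] input)
{
    if (input.length == 0)
        return null;          // non-existent: no more input
    string line = input[0];
    input = input[1 .. $];
    return line;              // an empty line stays empty but non-null
}

void main()
{
    string[] input = ["first", "", "last"];  // "" is empty but non-null
    size_t count = 0;

    string line = nextLine(input);
    while (line !is null)     // stops at EOF, not at the empty line
    {
        writeln("[", line, "]");
        ++count;
        line = nextLine(input);
    }
    writeln(count, " lines read"); // prints "3 lines read"
}
```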

Of course you could implement this in another way, removing the need to represent non-existence. You would have to if your type couldn't represent non-existence; that is the price you pay for simplicity. The price currently paid for the expressiveness of D's arrays is very little, IMO.

Regan