July 23, 2005
On Fri, 22 Jul 2005 15:00:51 +0200, Ilya Minkov wrote:


[snip]

> I believe that making this distinction, between empty and non-existent arrays, just provides the possibility for another misconception and bug.

When I started seriously coding with D, I was making mistakes in my code because I assumed that D would make this distinction.

> If someone sees real technical necessity to be able to distinguish between the empty and the non-existing one, is invited to show it here.

One reasonable use for a non-existent string is to represent the fact that a default value has not been supplied. As every possible string value, including an empty string, could be the default value, I needed a way to state that a string has no default yet.
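A minimal sketch of that use, assuming a present-day D compiler (so `string` rather than the `char[]` used elsewhere in this thread). The helper name `effectiveDefault` is hypothetical. A null string stands for "no default supplied", while an empty string is itself a valid default.

```d
import std.stdio;

// Hypothetical helper: picks the supplied default, or a fallback when
// no default was supplied at all.
string effectiveDefault(string supplied, string fallback)
{
    // `is null` separates "never supplied" from an explicitly empty default.
    return supplied is null ? fallback : supplied;
}

void main()
{
    string unset = null; // no default yet
    string empty = "";   // the default is the empty string

    writeln(effectiveDefault(unset, "fallback"));
    writeln(effectiveDefault(empty, "fallback").length);
}
```

The check uses `is null` rather than `== null`, because `==` compares contents and would treat the empty string and the null string as equal.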

-- 
Derek Parnell
Melbourne, Australia
24/07/2005 12:07:59 AM
July 23, 2005
Regan Heath schrieb:
>> Considering that D relies on garbage collection heavily with arrays anyway, the construct of an empty, but existent array is unnecessary.
>
> I don't see your point. The concept of existence, non-existence, empty, not-empty still exists with garbage collection as much as any other memory management scheme. Garbage collection does not obviate the need to express non-existence, exists but empty, exists and not empty.

In C it was extremely important, and one had to keep one's eye on uniqueness. At every allocation, one had to think about how to "anchor" the value and where to free it, and not forget to implement the freeing. Naturally, C++ automated this process somewhat. In C, non-existence versus emptiness was sometimes very important.

>> I believe that making this distinction, between empty and non-existent arrays, just provides the possibility for another misconception and bug.
>
> You're correct in one respect: having the ability to express more, i.e. non-existence, exists but empty, exists and not empty, adds complexity, increasing the chance that someone will mistakenly use one when they mean the other.
>
> However, as a concrete example, a very common bug in C/C++ is dereferencing a null pointer (a pointer is a good example of a type which can represent non-existence, exists but empty, exists and not empty).

There is a problem with "exists but empty". What does malloc do when you request 0 bytes? As far as I can remember, the standard allows two options: the implementation can return NULL, or it can return a tiny region of memory, which is still not "nothing". What will it contain? My bet would be "uninitialized space": garbage which was in the memory before it was allocated, and which might be zero or might be anything else.

So, in C there is no other way than to embed the information on the non-existence into your data structure. In the case of strings, this is a string having a '\0' character at the very beginning.

One could suggest preallocating one data structure, stored globally as "the empty singleton", and doing a pointer comparison when one wants to distinguish, similarly to the null handling. However, in C this might complicate memory management (as we in fact deal not with a special inaccessible address in memory, but with a living object), while in D the solution is, apart from special cases, simply to copy and forget, and let the GC do the dirty work.

> Arrays in D do not share this problem, the array reference cannot be null. At the same time, the current array implementation retains the expressiveness that allows you to represent non-existence, exists but empty, exists and not empty.

What do you mean by can't be null?

>> If someone sees real technical necessity to be able to distinguish between the empty and the non-existing one, is invited to show it here.
>
> I'm not sure there is a "necessity" as in most cases you could probably "work around" the restriction (if it was added to D). Here is an example where the expressiveness of representing non-existence, exists but empty, exists and not empty is useful.

Necessity is a fuzzy value which is probably best distinguished by the heaviness of the workaround.

> This comment was posted to the DMDScript NG recently:
> 
> <quote>
> For example, might it not be useful to return 'null' on EOF, thus allowing
> this sort of construct:
> 
>     var line = readln();
> 
>     while (line != null)
>     {
>          ...
>          line = readln();
>     }
> </quote>

As above, I think a preallocated EOL line would do, as long as array comparison (done on pointer and length) is a simple operation.

> Of course you could implement this in another way, removing the need for the ability to represent non-existence. You would have to if your type couldn't represent non-existence, that is the price you pay for simplicity. The current price paid for the current array's expressiveness is very little IMO.

Ok, given we still have the ability to manipulate the pointer and the length separately, how should array conversion to boolean condition be defined then? Should it query the pointer, the length, or some combination of both? If length is zero, one obviously cannot iterate over it. If pointer is null, the length should be invariably zero?

-eye
July 23, 2005
On Sat, 23 Jul 2005 22:13:24 +0200, Ilya Minkov <minkov@cs.tum.edu> wrote:
> Regan Heath schrieb:
>>> Considering that D relies on garbage collection heavily with arrays anyway, the construct of an empty, but existent array is unnecessary.
>> I don't see your point. The concept of existence, non-existence, empty, not-empty still exists with garbage collection as much as any other memory management scheme. Garbage collection does not obviate the need to express non-existence, exists but empty, exists and not empty.
>
> In C it was extremely important, and one had to keep one's eye on uniqueness. At every allocation, one had to think about how to "anchor" the value and where to free it, and not forget to implement the freeing. Naturally, C++ automated this process somewhat. In C, non-existence versus emptiness was sometimes very important.

Sure, memory management makes things complicated. But uniqueness has nothing to do with non-existence. The fact that non-existence is typically represented by null is the same regardless of memory management model.

>>> I believe that making this distinction, between empty and non-existent  arrays, just provides the possibility for another misconception and bug.
>> You're correct in one respect: having the ability to express more, i.e. non-existence, exists but empty, exists and not empty, adds complexity, increasing the chance that someone will mistakenly use one when they mean the other.
>> However, as a concrete example, a very common bug in C/C++ is dereferencing a null pointer (a pointer is a good example of a type which can represent non-existence, exists but empty, exists and not empty).
>
> There is a problem with "exists but empty". What does malloc do when you request 0 bytes?

Allocates a zero length item on the heap. (I checked this recently).

> As far as I can remember, the standard allows two options: the implementation can return NULL, or it can return a tiny region of memory, which is still not "nothing". What will it contain? My bet would be "uninitialized space": garbage which was in the memory before it was allocated, and which might be zero or might be anything else.
>
> So, in C there is no other way than to embed the information on the non-existence into your data structure.

No, you simply use null. A non-existent string in C is a null pointer. An empty string in C is a non-null pointer whose first character is \0. The same applies to any other object. A null pointer indicates non-existence, and emptiness is represented in whatever fashion makes sense for the object, e.g. a length property set to 0.

> In the case of strings, this is a string having '\0' character at the very beginning.

No, that is an "empty" string, not a "non-existent" one.
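The three states described here can be sketched with C-style pointers. The block is written in D to match the rest of the thread and assumes a current compiler. The name `describe` is hypothetical, and the sketch relies on D string literals being zero-terminated and converting implicitly to `const(char)*`.

```d
import std.stdio;

// Hypothetical helper: classifies a C-style string pointer.
string describe(const(char)* s)
{
    if (s is null)
        return "non-existent"; // no string at all
    if (*s == '\0')
        return "empty";        // a string exists, its first character is the terminator
    return "non-empty";
}

void main()
{
    const(char)* missing = null;
    const(char)* empty = "";
    const(char)* text = "abc";

    writeln(describe(missing));
    writeln(describe(empty));
    writeln(describe(text));
}
```

The null test has to come first, since reading `*s` through a null pointer is exactly the bug mentioned earlier in the thread.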

> One could suggest preallocating one data structure, stored globally as "the empty singleton", and doing a pointer comparison when one wants to distinguish, similarly to the null handling. However, in C this might complicate memory management (as we in fact deal not with a special inaccessible address in memory, but with a living object), while in D the solution is, apart from special cases, simply to copy and forget, and let the GC do the dirty work.

None of this is necessary.

>> Arrays in D do not share this problem, the array reference cannot be null. At the same time, the current array implementation retains the expressiveness that allows you to represent non-existence, exists but empty, exists and not empty.
>
> What do you mean by can't be null?

char[] p = null;
if (p.length == 0) { /* does not crash, p itself is never 'null' */ }
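A slightly fuller sketch of the same point, assuming a current D compiler. `countChars` is a hypothetical helper. Reading the length of a null array, iterating over it, and appending to it are all safe operations.

```d
import std.stdio;

// Hypothetical helper: counts characters by iterating, to show that a
// loop over a null array simply runs zero times.
size_t countChars(const(char)[] s)
{
    size_t n;
    foreach (c; s)
        ++n;
    return n;
}

void main()
{
    char[] p = null;

    writeln(p.length);      // reading the length of a null array is safe
    writeln(countChars(p)); // so is iterating over it

    p ~= 'x';               // appending allocates storage on demand
    writeln(p);
}
```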

>>> If someone sees real technical necessity to be able to distinguish  between the empty and the non-existing one, is invited to show it here.
>> I'm not sure there is a "necessity" as in most cases you could probably "work around" the restriction (if it was added to D). Here is an example where the expressiveness of representing non-existence, exists but empty, exists and not empty is useful.
>
> Necessity is a fuzzy value which is probably best distinguished by the heaviness of the workaround.

Exactly. However, the other thing to consider is the price paid for it. If that price is smaller than the cost of the workaround (as I believe it is in this case), then it is a point in its favour. You then factor in all the other issues, complexity of implementation, etc.

>> This comment was posted to the DMDScript NG recently:
>>  <quote>
>> For example, might it not be useful to return 'null' on EOF, thus allowing
>> this sort of construct:
>>      var line = readln();
>>      while (line != null)
>>     {
>>          ...
>>          line = readln();
>>     }
>> </quote>
>
> As above, I think a preallocated EOL line would do, as long as array comparison (done on pointer and length) is a simple operation.
>
>> Of course you could implement this in another way, removing the need for the ability to represent non-existence. You would have to if your type couldn't represent non-existence, that is the price you pay for simplicity. The current price paid for the current array's expressiveness is very little IMO.
>
> Ok, given we still have the ability to manipulate the pointer and the length separately, how should array conversion to boolean condition be defined then?

The same way it works for every other type in D, the statement "if(x)" means "compare x to null or 0". In the case of a reference it compares the reference to null.

The confusion arises in this case because arrays in D cannot be null, and because arrays are in fact implemented as stack based structs in the background. This makes arrays appear to be a struct and not a reference, however currently in all (I believe) situations they behave as references. I believe this was done on purpose.

As I've noted in all cases where an array reference would be null, i.e.

char[] p = null;

it isn't, but instead the data pointer p.ptr is null.

So, in order for them to behave as references it's logically consistent for "if(p)" to check the data ptr vs null. Change that and you need to code special cases for arrays vs other reference types, eg.

template doWrite(Type) { void doWrite(Type p) {
  if (p) writefln(p);
} }

class C {
  char[] toString() { return "C"; }
}

char[] p = "test";
C c = new C();

doWrite!(char[])(p);
doWrite!(char[])(c);

> Should it query the pointer, the length, or some combination of both?

The ptr, for reasons given above. Checking both is a waste of time as when the pointer is null the length must be 0 (as you say below).

> If length is zero, one obviously cannot iterate over it.

Correct. One cannot iterate over an empty or a non-existent array.

> If pointer is null, the length should be invariably zero?

Indeed. It is currently.
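The pointer/length relationship discussed above can be sketched as a small classifier. This assumes a current D compiler, where the literal "" has a non-null data pointer. The name `state` is hypothetical.

```d
import std.stdio;

// Hypothetical helper: reports which of the three states a slice is in,
// looking at the data pointer first and the length second.
string state(const(char)[] a)
{
    if (a.ptr is null)
        return "non-existent"; // no data pointer at all
    return a.length == 0 ? "empty" : "non-empty";
}

void main()
{
    const(char)[] none = null;
    const(char)[] empty = "";
    const(char)[] text = "abc";

    writeln(state(none), " ", none.length);
    writeln(state(empty), " ", empty.length);
    writeln(state(text), " ", text.length);
}
```

A zero-length slice of existing data, such as `text[0 .. 0]`, keeps its pointer and so also counts as "empty" rather than "non-existent" under this check.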

Regan
July 24, 2005
On Fri, 22 Jul 2005 09:06:48 -0400, Ben Hinkle <ben.hinkle@gmail.com> wrote:
>>> I was poking around the Qt documentation and interestingly enough QString
>>> has a concept of null and empty. Here's what they say, though: "For
>>> historical reasons, QString distinguishes between a null string and an
>>> empty
>>> string. [snip] We recommend that you always use isEmpty() and avoid
>>> isNull()."
>>>
>>> The exact doc is
>>> http://doc.trolltech.com/4.0/qstring.html#distinction-between-null-and-empty-strings
>>
>> That's not too surprising. A lot of people have never seen the need for
>> the distinction, and it certainly can make life "simpler". However, I
>> don't believe you can argue that it doesn't exist, at least logically.
>> That is why you get situations like this (stolen from a post to the
>> DMDScript group):
>>
>> <quote>
>> For example, might it not be useful to return 'null' on EOF, thus allowing
>> this sort of construct:
>>
>>     var line = readln();
>>
>>     while (line != null)
>>     {
>>          ...
>>          line = readln();
>>     }
>> </quote>
>>
>> which is an example where there is a desire to distinguish between
>> non-existent and empty.
>>
>> Sure, you can remove the distinction, lessen the expressiveness of arrays
>> and force everyone to "work around" the deficiency in other ways, it's
>> possible, it can make life simpler for the general case and more
>> complicated for the rest.
>>
>> I think arrays in D are nearly perfect(*). They allow you to ignore the
>> distinction in the general case (thus life is pretty easy already) yet you
>> can tell the difference if you require it.
>>
>> (*) there are only 2 problems with them IMO:
>>
>> 1. length = 0; resets the data pointer to null, changing empty into
>> non-existent.
>> 2. "int[0] a;" and "int[] a = new int[0];" produce different results when
>> you'd expect the same thing.
>>
>> Regan
>
> Sure, I agree special values can be useful and null is an easy special value to use.

Indeed, null and NaN have a lot in common. They indicate a non-existent or un-initialised value. Think how much trouble we have coding with 'int' and other 'value' types that cannot indicate non-existence, especially with container classes and the like. std.boxer wouldn't exist if int could indicate non-existence.

> Note the same behavior can be obtained with returning a singleton
> empty just for eof, if desired. The singleton approach could arguably make the code more readable, too, since the reader wouldn't have to know that
> null line meant eof. For example
>  char[] line = din.readLine();
>  while (line !is din.eofLine()) { ... line = din.readLine(); }
> where eofLine can return null or if the stream author wishes it can return some other unique empty string.

That code is more descriptive, sure. However, null is more generic in application. You can use it 'everywhere' and everywhere it is used it can have the same meaning. This means no 'special case' code is required (like that shown above).
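For comparison, the singleton approach quoted above could be sketched as follows. This is only an illustration under stated assumptions: `LineSource`, `eofStorage` and `countLines` are hypothetical names, the "stream" is an in-memory array, and the code targets a current D compiler rather than the stream classes of the time.

```d
import std.stdio;

// Hypothetical in-memory stand-in for a stream.
struct LineSource
{
    // One byte of private storage; its address marks the unique EOF slice.
    private static immutable char[1] eofStorage = ['E'];

    string[] lines;
    size_t next;

    // A zero-length slice with a pointer no ordinary line can share.
    static string eofLine()
    {
        return eofStorage[0 .. 0];
    }

    string readLine()
    {
        if (next >= lines.length)
            return eofLine();
        return lines[next++];
    }
}

// Counts lines until the sentinel is seen; `!is` compares pointer and length.
size_t countLines(ref LineSource src)
{
    size_t n;
    for (string line = src.readLine(); line !is LineSource.eofLine(); line = src.readLine())
        ++n;
    return n;
}

void main()
{
    auto src = LineSource(["first", "", "third"]);
    writeln(countLines(src)); // the empty middle line does not end the loop
}
```

The sentinel costs one extra function per stream type, which is the "special case" code being weighed against a plain null here.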

Regan
July 24, 2005
On Sun, 24 Jul 2005 11:55:35 +1200, Regan Heath <regan@netwin.co.nz> wrote:
> So, in order for them to behave as references it's logically consistent for "if(p)" to check the data ptr vs null. Change that and you need to code special cases for arrays vs other reference types, eg.
>
> template doWrite(Type) { void doWrite(Type p) {
>    if (p) writefln(p);
> } }
>
> class C {
>    char[] toString() { return "C"; }
> }
>
> char[] p = "test";
> C c = new C();
>
> doWrite!(char[])(p);

TYPO:

> doWrite!(char[])(c);

 Should be:

doWrite!(C)(c);
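For readers on a current D compiler, the idea behind the template can be sketched like this. It is an adaptation, not the code above: it uses `string` instead of `char[]`, spells the check out as `!is null` instead of relying on `if (p)`, and the name `exists` is hypothetical.

```d
import std.stdio;

class C
{
    override string toString() { return "C"; }
}

// Hypothetical generic check: one body serves arrays and class
// references alike, with no special case for either.
bool exists(T)(T p)
{
    return p !is null;
}

void main()
{
    string s = "test";
    string none = null;
    C c = new C();
    C missing = null;

    writeln(exists(s), " ", exists(none));
    writeln(exists(c), " ", exists(missing));
}
```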

Regan
July 24, 2005
In article <opsud4qxii23k2f5@nrage.netwin.co.nz>, Regan Heath says...
>
>On Sat, 23 Jul 2005 22:13:24 +0200, Ilya Minkov <minkov@cs.tum.edu> wrote:
>> So, in C there is no other way than to embed the information on the non-existence into your data structure.
>
>No, you simply use null. A non-existent string in C is a null pointer. An empty string in C is a non-null pointer whose first character is \0. The same applies to any other object. A null pointer indicates non-existence, and emptiness is represented in whatever fashion makes sense for the object, e.g. a length property set to 0.
>
>> In the case of strings, this is a string having a '\0' character at the very beginning.
>
>No, that is an "empty" string, not a "non-existent" one.

Hi Regan, you're of course spot-on.
It's not the first time that someone expressed misguided perceptions of "non-
existent" vs "empty" in the C language here. I really wonder how often we'll
need to discuss such basic C-isms on the D NG? People should better learn their
stuff before making such bold statements.

Cheers,
Holger


July 24, 2005
Hi,

>> Sure, I agree special values can be useful and null is an easy special value to use.
>
>Indeed, null and NaN have a lot in common. They indicate a non-existent or un-initialised value. Think how much trouble we have coding with 'int' and other 'value' types that cannot indicate non-existence, especially with container classes and the like. std.boxer wouldn't exist if int could indicate non-existence.

Yes! That is exactly right. The problem with using array.ptr as null for existence checks is that it's not orthogonal at all. It only works with arrays. It might also work with classes (not sure). What about primitives? No, it's back to an additional boolean or somesuch. That's why I think it's a crappy solution, and that's exactly the source of the if (array) dilemma in the first place.

C# 2.0 will "solve" this problem with the concept of nullable types. Even ints will be nullable. I'm not sure how this is going to work (haven't tried it), but at least it's orthogonal. It works everywhere. Whereas array.ptr is shaky, buggy, likely to change and IMHO unsemantic. If we at least see that this is a problem, and that there is a need for a more complete feature, maybe we can work towards a better solution.

>> Note the same behavior can be obtained with returning a singleton
>> empty just for eof, if desired. The singleton approach could arguably
>> make the code more readable, too, since the reader wouldn't have to know
>> that
>> null line meant eof. For example
>>  char[] line = din.readLine();
>>  while (line !is din.eofLine()) { ... line = din.readLine(); }
>> where eofLine can return null or if the stream author wishes it can
>> return some other unique empty string.
>
>That code is more descriptive, sure. However, null is more generic in application. You can use it 'everywhere' and everywhere it is used it can have the same meaning. This means no 'special case' code is required (like that shown above).

That's not true. 'Everywhere' would mean complete orthogonality, and as we know, this trick only works with certain types. But I agree with the premise, that nullness is a great (and easy) special value that makes life simpler. Thus a good solution should be built into the language.

Cheers,
--AJG.


July 24, 2005
In article <dbv45d$1ju8$1@digitaldaemon.com>, AJG says...
>
>C# 2.0 will "solve" this problem with the concept of nullable types. Even ints will be nullable. I'm not sure how this is going to work (haven't tried it),

Interesting stuff.. I looked into this a bit and apparently the underlying implementation is done through System.Nullable<T>.

System.Nullable<int> j;
int? k;

The 'T?' form is shorthand.

As you can imagine, there appears to be quite a bit of overhead involved as the nullable types aren't native. But there's nothing stopping you from retrieving, say, a DB value into a nullable type, checking if it's null and then assigning it to a native variable if it's not. But assigning it requires accessing a property (int k = j.Value;) or a cast (int k = (int)j;).

I like the idea, but given that you will still always have to check if a nullable variable is not null before using it or even assigning it to another (non-nullable) variable, I'm having trouble imagining how much more productive / readable it's going to make coding for most chores where "nullable native types" would be useful. For example, for database applications, I can still see a need to write a library of wrapper functions to assign a column to native data types, or if the table was represented by a class, to check for null each time a column was retrieved in order to assign the value to a native type.

Either way it seems like it will require about the same amount of code to write most applications, but with added complexity to the language.


July 24, 2005
Hi,

In article <dbv8fr$1nnm$1@digitaldaemon.com>, Dave says...
>
>In article <dbv45d$1ju8$1@digitaldaemon.com>, AJG says...
>>
>>C# 2.0 will "solve" this problem with the concept of nullable types. Even ints will be nullable. I'm not sure how this is going to work (haven't tried it),
>
>Interesting stuff.. I looked into this a bit and apparently the underlying implementation is done through System.Nullable<T>.
>
>System.Nullable<int> j;
>int? k;
>
>The 'T?' form is shorthand.

Interesting. I didn't know that. I was actually kinda hoping they found a magic "native" way, but I guess not.

>As you can imagine, there appears to be quite a bit of overhead involved as the nullable types aren't native.

Yes, I agree. Though I shouldn't speculate without having even tested for performance.

>But there's nothing stopping you from retrieving,
>say, a DB value into a nullable type, checking if it's null and then assigning
>it to a native variable if it's not. But assigning it requires accessing a
>property (int k = j.Value;) or a cast (int k = (int)j;).

This looks a little cumbersome. It will remain cumbersome without language support, IMHO.

>I like the idea, but given that you will still always have to check if a
>nullable variable is not null before using it or even assigning it to another
>(non-nullable) variable, I'm having trouble imagining how much more productive /
>readable it's going to make coding for most chores where "nullable native types" would be useful.

I disagree here. I think the non-existent concept is a good one. It's useful in arrays (and possibly classes), and I think the usefulness extends across primitives as well.

>For example, for database applications, I can still see a need
>to write a library of wrapper functions to assign a column to native data types,
>or if the table was represented by a class, to check for null each time a column was retrieved in order to assign the value to a native type.

For instance:

# void someDataFunc(nullableInt dbValue) {
#     // Handle the special case:
#     if (!dbValue) throw new Exception("Value must be specified.");
#
#     // These can now all be valid values:
#          if (dbValue  < 0) { /* Case 1 */ }
#     else if (dbValue == 0) { /* Case 2 */ }
#     else if (dbValue  > 0) { /* Case 3 */ }
#     else assert(false);
# }
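Something close to this can be sketched in present-day D with `Nullable` from `std.typecons`, a library type that postdates this thread and is not the built-in language feature asked for here. The function name `classify` and its return strings are hypothetical.

```d
import std.stdio;
import std.typecons : Nullable;

// Hypothetical handler: rejects a missing value, then treats every int,
// including 0, as a legitimate value.
string classify(Nullable!int dbValue)
{
    // Handle the special case first.
    if (dbValue.isNull)
        throw new Exception("Value must be specified.");

    immutable v = dbValue.get;
    if (v < 0)
        return "negative";
    if (v == 0)
        return "zero";
    return "positive";
}

void main()
{
    writeln(classify(Nullable!int(-5)));
    writeln(classify(Nullable!int(0)));

    Nullable!int missing; // default-initialised, holds no value
    writeln(missing.isNull);
}
```

The explicit `isNull` and `get` calls are the kind of ceremony complained about earlier in the thread. The library type removes the extra boolean but not the need to check.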

>Either way it seems like it will require about the same amount of code to write most applications, but with added complexity to the language.

Some (most?) of the complexity is already there. Arrays and Classes both are already capable of existing vs. being empty. This would merely extend the feature for orthogonality. I think it would be fairly useful.

Just my 2 cents.
--AJG.


July 24, 2005
On Sun, 24 Jul 2005 04:07:09 +0000 (UTC), AJG <AJG_member@pathlink.com> wrote:
>>> Sure, I agree special values can be useful and null is an easy special
>>> value to use.
>>
>> Indeed, null and NaN have a lot in common. They indicate a non-existent or
>> un-initialised value. Think how much trouble we have coding with 'int' and other
>> 'value' types that cannot indicate non-existence, especially with container
>> classes and the like. std.boxer wouldn't exist if int could indicate
>> non-existence.
>
> Yes! That is exactly right. The problem with using array.ptr as null for
> existence checks is that it's not orthogonal at all. It only works with arrays.

No, the key point you seem to be missing is: "if(x)" compares 'x' to null or 0. It is _not_ intended to test for existence; that is _not_ its purpose.

The "if(x)" rule is true *for all types* even primitives (with the exception of a struct - because it is user defined and cannot be compared to null or 0).

If the variable 'x' is a reference type it compares the reference to null. Arrays are references, so it compares the array reference to null. Array references cannot be null. When an array reference would be null, the array.ptr is null. Therefore to compare an array to null, you compare the array.ptr to null.

This behaviour is _required_ to make arrays orthogonal with other references.

This behaviour is completely orthogonal *for all types* and this can be proven by example.

class C {}

char[] p = null;
C c = null;
int i = 0;

if (c) { /* not true */ }
if (p) { /* not true */ }
if (i) { /* not true */ }

> It might also work with classes (not sure).

Yes, see above.

> What about primitives? No, it's back to an additional boolean or somesuch. That's why I think it's a crappy solution, and that's exactly the source of the if (array) dilemma in the first place.

The ability to express non-existence has nothing to do with the "if(x)" statement. The "if(x)" statement's purpose is not specifically to test for non-existence. I repeat:
  "if(x)" compares 'x' to null or 0

That's it.

Yes, you can use it to test for non-existence with reference and pointer types. That, however, is not its purpose.

> C# 2.0 will "solve" this problem with the concept of nullable types.

And we have std.boxer.

> Even ints will be nullable. I'm not sure how this is going to work (haven't tried it), but at least it's orthogonal. It works everywhere.

Likely it's going to work like std.boxer except automagically.

> Whereas array.ptr is shaky, buggy, likely to change and IMHO unsemantic. If we at least see that this is a problem, and that there is a need for a more complete feature, maybe we can work towards a better solution.

IMO there is no problem with "if(x)". Not being able to represent non-existence is a tradeoff when using value types. std.boxer is the solution to those tradeoffs, that or using pointers.

>>> Note the same behavior can be obtained with returning a singleton
>>> empty just for eof, if desired. The singleton approach could arguably
>>> make the code more readable, too, since the reader wouldn't have to know
>>> that
>>> null line meant eof. For example
>>>  char[] line = din.readLine();
>>>  while (line !is din.eofLine()) { ... line = din.readLine(); }
>>> where eofLine can return null or if the stream author wishes it can
>>> return some other unique empty string.
>>
>> That code is more descriptive, sure. However, null is more generic in
>> application. You can use it 'everywhere' and everywhere it is used it can
>> have the same meaning. This means no 'special case' code is required (like
>> that shown above).
>
> That's not true. 'Everywhere' would mean complete orthogonality, and as we know, this trick only works with certain types.

You are correct. I meant only to refer to reference and pointer types above, which can express non-existence.

However, I repeat (because this is an important fact): the purpose of "if(x)" is not to test for non-existence, its purpose is to compare 'x' to null or 0. Nothing more, nothing less.

> But I agree with the premise, that nullness is a great (and easy) special value that makes life simpler.

In this we agree. :)

> Thus a good solution should be built into the language.

IMO a good solution _is_ built into the language. Arrays are a good solution to the problem posed by types which can represent non-existence, that problem being that the added expressiveness comes with greater risk of accidental misuse. Arrays cannot be null, yet can represent non-existence; they're a great solution.

Unfortunately they're not the solution to the "non-existence of value types" problem, which currently has 2 solutions:
  - std.boxer.
  - pointers.

Both these solutions involve references/pointers that can be null, so they suffer from the risk involved in using null, unlike arrays.
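The pointer option can be sketched as below, assuming a current D compiler. `valueOr` is a hypothetical helper. It also shows the risk just mentioned: every use has to check for null before dereferencing.

```d
import std.stdio;

// Hypothetical helper: reads an optional int through a pointer, where a
// null pointer means "no value".
int valueOr(const(int)* maybe, int fallback)
{
    // Dereferencing without this check would crash on a null pointer.
    return maybe is null ? fallback : *maybe;
}

void main()
{
    int stored = 7;
    int* present = &stored;
    int* absent = null;

    writeln(valueOr(present, -1));
    writeln(valueOr(absent, -1));
}
```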

Regan