July 25, 2007
Frits van Bommel wrote:
> Regan Heath wrote:
>> This all boils down to the empty vs null string debate where some people want to be able to distinguish between them and some see no point.
>>
>> I'm in the 'distinguishable' camp.  I can see the merit.  At the very least it should be consistent!
> 
> They *are* distinguishable. That's why above code returns different results for the 'is' comparison...
> 

The .ptr of empty arrays may be different than the .ptr of null arrays, but they are conceptually the same, and thus not safely distinguishable.
Example:
	writefln("" is null); // false
	writefln("".dup is null); // true

"".ptr is not null, but "".dup.ptr is null. Such duplication is correct, because empty arrays are conceptually the same as null arrays, and trying to use .ptr to distinguish them is unsafe, implementation-dependent behavior (aka a program error).

-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
July 25, 2007
Bruno Medeiros wrote:
> Frits van Bommel wrote:
>> Regan Heath wrote:
>>> This all boils down to the empty vs null string debate where some people want to be able to distinguish between them and some see no point.
>>>
>>> I'm in the 'distinguishable' camp.  I can see the merit.  At the very least it should be consistent!
>>
>> They *are* distinguishable. That's why above code returns different results for the 'is' comparison...
>>
> 
> The .ptr of empty arrays may be different than the .ptr of null arrays, but they are conceptually the same, and thus not safely distinguishable.
> Example:
>     writefln("" is null); // false
>     writefln("".dup is null); // true
> 
> "".ptr is not null, but "".dup.ptr is null. Such duplication is correct, because empty arrays are conceptually the same as null arrays, and trying to use .ptr to distinguish them is unsafe, implementation-dependent behavior (aka a program error).

Ick.  IMO "".dup should allocate 1 byte of memory, set it to '\0' and create a reference to it with length of 0.

What do you mean by "empty arrays are conceptually the same as null arrays"?

To me null arrays (non-existent) and "" arrays (empty) are conceptually different.  null indicates the array does not exist (no set at all), "" indicates it does but contains no items (an empty set).

Regan
July 25, 2007
Regan Heath wrote:
> Bruno Medeiros wrote:
>> Frits van Bommel wrote:
>>> Regan Heath wrote:
>>>> This all boils down to the empty vs null string debate where some people want to be able to distinguish between them and some see no point.
>>>>
>>>> I'm in the 'distinguishable' camp.  I can see the merit.  At the very least it should be consistent!
>>>
>>> They *are* distinguishable. That's why above code returns different results for the 'is' comparison...
>>>
>>
>> The .ptr of empty arrays may be different than the .ptr of null arrays, but they are conceptually the same, and thus not safely distinguishable.
>> Example:
>>     writefln("" is null); // false
>>     writefln("".dup is null); // true
>>
>> "".ptr is not null, but "".dup.ptr is null. Such duplication is correct, because empty arrays are conceptually the same as null arrays, and trying to use .ptr to distinguish them is unsafe, implementation-dependent behavior (aka a program error).
> 
> Ick.  IMO "".dup should allocate 1 byte of memory, set it to '\0' and create a reference to it with length of 0.
> 
> What do you mean by "empty arrays are conceptually the same as null arrays"?
> 

I meant that in current D they are semantically the same. (I should have used those words)

> To me null arrays (non-existent) and "" arrays (empty) are conceptually different.  null indicates the array does not exist (no set at all), "" indicates it does but contains no items (an empty set).
> 
> Regan

I know, and I agree, don't you recall the V2 string discussion:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=55388

-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
July 25, 2007
Regan Heath wrote:
>>> So, "" is < and == null!?
>>> and <=,== but not >=!?
>>>
>>
>> You didn't update all writefln's :)
> 
> <hangs head in shame> What can I say, I'm having a bad morning.
> 
>> Anyway, it feels like an undefined area in the language. Do the specs
>> say anything about how exactly arrays/strings/delegates should compare
>> to null? It seems to be more than comparing the pointer part of the
>> structs.
> 
> Not that I can find.  The array page does say:
> 
> "Strings can be copied, compared, concatenated, and appended:"
> ...
> "with the obvious semantics."
> 
> but not much more on the topic.  Under "Array Initialization" we see:
> 
>     * Pointers are initialized to null.
>     ..
>     * Dynamic arrays are initialized to having 0 elements.
>     ..
> 
> Which does not state that an array will be initialised to "null" but rather to something with 0 elements.
> 

It's in
http://www.digitalmars.com/d/expression.html#IdentityExpression
"For static and dynamic arrays, identity is defined as referring to the same array elements"

But in current D empty arrays can have a null identity (even if they don't always have one), so you can't use 'is' to try to distinguish null arrays from empty arrays. Thus effectively they are semantically the same in current D.
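
To illustrate, here is a minimal sketch of a check that looks only at .length, so it gives the same answer whether or not an array happens to have a null .ptr. It assumes a current D2 compiler (newer than the one discussed in this thread), and the helper name isEmpty is made up for the example.

```d
import std.stdio;

// Hypothetical helper: treats null and empty arrays alike by looking only at length.
bool isEmpty(T)(const(T)[] arr)
{
    return arr.length == 0;
}

void main()
{
    char[] nothing = null;      // null array: ptr is null, length is 0
    char[1] buf;
    char[] empty = buf[0 .. 0]; // empty slice of real storage: ptr is non-null, length is 0

    // Both count as empty; the result does not depend on where .ptr points.
    writeln(isEmpty(nothing));
    writeln(isEmpty(empty));

    // By contrast, what these print depends on how each array was produced,
    // which is why relying on them is fragile. No expected output is claimed.
    writeln(nothing is null);
    writeln(empty.dup is null);
}
```

Code written against a length-only check like this keeps working if the implementation changes where the .ptr of an empty array points.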


-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
July 25, 2007
Bruno Medeiros wrote:
> Regan Heath wrote:
>> What do you mean by "empty arrays are conceptually the same as null arrays"?
>>
> 
> I meant that in current D they are semantically the same. (I should have used those words)

:)

>> To me null arrays (non-existent) and "" arrays (empty) are conceptually different.  null indicates the array does not exist (no set at all), "" indicates it does but contains no items (an empty set).
>>
>> Regan
> 
> I know, and I agree, don't you recall the V2 string discussion:
> http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=55388 

Yes, I remember it.  I just forgot who was involved and what their opinions were.  I have a hard enough time keeping track of my own opinion, let alone others'.

Regan
July 25, 2007
Bruno Medeiros wrote:
> Regan Heath wrote:
>>>> So, "" is < and == null!?
>>>> and <=,== but not >=!?
>>>>
>>>
>>> You didn't update all writefln's :)
>>
>> <hangs head in shame> What can I say, I'm having a bad morning.
>>
>>> Anyway, it feels like an undefined area in the language. Do the specs
>>> say anything about how exactly arrays/strings/delegates should compare
>>> to null? It seems to be more than comparing the pointer part of the
>>> structs.
>>
>> Not that I can find.  The array page does say:
>>
>> "Strings can be copied, compared, concatenated, and appended:"
>> ...
>> "with the obvious semantics."
>>
>> but not much more on the topic.  Under "Array Initialization" we see:
>>
>>     * Pointers are initialized to null.
>>     ..
>>     * Dynamic arrays are initialized to having 0 elements.
>>     ..
>>
>> Which does not state that an array will be initialised to "null" but rather to something with 0 elements.
>>
> 
> It's in
> http://www.digitalmars.com/d/expression.html#IdentityExpression
> "For static and dynamic arrays, identity is defined as referring to the same array elements"

And under "Equality Expressions" we have:
"For static and dynamic arrays, equality is defined as the lengths of the arrays matching, and all the elements are equal."

Which is exactly what we see in the compare function, so it's following the spec.

But, this means "" compares equal to null and vice-versa which is something I think we want to change.
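
A small sketch of what that spec wording implies, assuming a current D2 compiler (so string rather than char[] for the literals). Equality looks only at lengths and elements, so the null and empty strings compare equal in both directions.

```d
import std.stdio;

void main()
{
    string empty = "";
    string nothing = null;

    // Equality compares lengths and elements only, so these are equal
    // regardless of what .ptr holds.
    writeln(empty == nothing);   // true
    writeln(nothing == empty);   // true

    // Identity asks whether both refer to the same elements; the answer
    // hinges on the literal's .ptr, so no expected output is claimed here.
    writeln(empty is nothing);
}
```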

I am a little puzzled by the fact that:
"Identity Expressions" include ("is", "!is")
"Equality Expressions" include ("==", "!=", "is", "!is")
Why do "is" and "!is" exist in both equality and identity?

> But in current D empty arrays can have a null identity (even if they don't always have one), so you can't use 'is' to try to distinguish null arrays from empty arrays. Thus effectively they are semantically the same in current D.

I understand what you're saying now.

Given that null and "" have different ptr values, they therefore "refer to different array elements" and "is" should distinguish them.

But in the current D implementation the .dup function doesn't distinguish the two cases: duplicating "" results in null, which prevents any further distinction between them.

This is similar to the behaviour where setting length to 0 used to free the data pointer, turning an empty array into a null one.

So, this definitely needs fixing I reckon.

Regan
July 25, 2007
Regan Heath wrote:
>>> I'm in the 'distinguishable' camp.  I can see the merit.  At the very least it should be consistent!
>>
>> They *are* distinguishable. That's why above code returns different results for the 'is' comparison...
> 
> True.  I guess what I meant to say was I'm in the '3 distinct states' camp (which may be a camp of 1 for all I know).  See my reply to digitalmars.D for a definition of the 3 states.
> 
>> I for one am perfectly fine with "cast(char[]) null" meaning ".length == 0 && .ptr == null" 
> 
> Same here.
> 
>  > and with comparisons of arrays using == and friends
>> only inspecting the contents (not location) of the data.
> 
> I don't think an empty string (non-null, length == 0) should compare equal to a non-existent string (null, length == 0).  And vice-versa.
> 
> The only thing that should compare equal to null is null.  Likewise an empty array should only compare equal to another empty array.
>
> My reasoning for this is consistency, see at end.

Since null arrays have length 0, they *are* empty arrays :P.

> Aside: If the location and length are identical you can short-circuit the compare, returning true and ignoring the content, this could save a bit of time on comparisons of large arrays.

At least with that last paragraph I can agree ;)
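
A rough sketch of that short-circuit, assuming a current D2 compiler. The name sameOrEqual is made up, and this is not the runtime's actual comparison routine.

```d
// Hypothetical comparison helper; not the runtime's actual implementation.
bool sameOrEqual(T)(const(T)[] u, const(T)[] v)
{
    if (u.length != v.length) return false;
    // Same start and same length: both slices cover the same memory.
    if (u.ptr is v.ptr) return true;
    foreach (i; 0 .. u.length)
    {
        if (u[i] != v[i]) return false;
    }
    return true;
}

void main()
{
    auto big = new int[](1_000_000);
    assert(sameOrEqual(big, big));          // answered without scanning elements
    assert(sameOrEqual(big, big.dup));      // falls back to the element loop
    assert(!sameOrEqual(big, big[1 .. $])); // lengths differ
}
```

One caveat: for element types where a value is not equal to itself (a floating-point NaN, for instance), the shortcut gives a different answer than the element-by-element loop would.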


Now, about this:

> All that I would like changed is for the compare, in the case of length == 0, to check the data pointers, eg.
> 
>  > int opEquals(T)(T[] u, T[] v) {
>  >     if (u.length != v.length) return false;
>       if (u.length == 0) return (u.ptr == v.ptr);
>  >     for (size_t i = 0; i < u.length; i++) {
>  >         if (u[i] != v[i]) return false;
>  >     }
>  >     return true;
>  > }
> 
> This should mean "" == "" but not "" == null, likewise null == null but not null == "".

Let's look at this code:
---
import std.stdio;

void main()
{
    char[][] strings = ["hello world!", "", null];

    foreach (str; strings) {
        auto str2 = str.dup;
        if (str == str2)
            writefln(`"%s" == "%s" (%s, %s)`, str, str2, str.ptr, str2.ptr);
        else
            writefln(`"%s" != "%s" (%s, %s)`, str, str2, str.ptr, str2.ptr);
    }
}
---
The output is currently (on my machine):
=====
"hello world!" == "hello world!" (805BE60, F7CFBFE0)
"" == "" (805BE78, 0000)
"" == "" (0000, 0000)
=====
Your change would change the second line (even if it actually allocated a new empty string like you probably want instead of returning null). How would that be consistent in any way?
(Same goes for other ways to create different-ptr empty strings)

What you might have meant on that extra line might be more like:
---
       if (u.length == 0) return ((u.ptr is null) == (v.ptr is null));
---
which will return true if both .ptr values are null or both are non-null.
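
Put together as a whole function, that variant might look like the sketch below. It assumes a current D2 compiler; proposedEquals is a made-up name, and this is the rule being proposed in this thread, not how arrays are actually compared.

```d
// Sketch of the proposed rule; `proposedEquals` is a made-up name and this is
// not how the language compares arrays today.
bool proposedEquals(T)(const(T)[] u, const(T)[] v)
{
    if (u.length != v.length) return false;
    // Zero-length case: equal only if both are null or both are non-null.
    if (u.length == 0) return (u.ptr is null) == (v.ptr is null);
    foreach (i; 0 .. u.length)
    {
        if (u[i] != v[i]) return false;
    }
    return true;
}

void main()
{
    char[1] a, b;
    char[] empty1 = a[0 .. 0];   // non-null, length 0
    char[] empty2 = b[0 .. 0];   // non-null, length 0, different address
    char[] nothing = null;

    assert(proposedEquals(empty1, empty2));   // empty == empty, even at different addresses
    assert(proposedEquals(nothing, nothing)); // null == null
    assert(!proposedEquals(empty1, nothing)); // empty != null
    assert(!proposedEquals(nothing, empty1)); // null != empty
}
```

Note that as long as .dup turns an empty array into a null one (as in the output above), an empty string would still compare unequal to its own duplicate under this rule.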
July 25, 2007
On Wed, 25 Jul 2007 15:30:10 +0100, Regan Heath <regan@netmail.co.nz> wrote:

>I am a little puzzled by the fact that:
>"Identity Expressions" include ("is", "!is")
>"Equality Expressions" include ("==", "!=", "is", "!is")
>Why do "is" and "!is" exist in both equality and identity?

From "Identity Expressions": "For operand types other than class objects, static or dynamic arrays, identity is defined as being the same as equality." That's the reason, I guess.

>
>> But in current D empty arrays can have a null identity (even if they don't always have one), so you can't use 'is' to try to distinguish null arrays from empty arrays. Thus effectively they are semantically the same in current D.
>
>I understand what you're saying now.
>
>Given that null and "" have different ptr values, they therefore "refer to different array elements" and "is" should distinguish them.
>
>But in the current D implementation the .dup function doesn't distinguish the two cases: duplicating "" results in null, which prevents any further distinction between them.
>
>This is similar to the behaviour where setting length to 0 used to free the data pointer, turning an empty array into a null one.
>
>So, this definitely needs fixing I reckon.
>
>Regan

IMO, your proposal makes sense. I think "" == null will be a rich source of confusion for newcomers. OTOH, I can live with the current implementation now that it has been explained. Thanks to all of you.
July 25, 2007
>> The only thing that should compare equal to null is null.  Likewise an empty array should only compare equal to another empty array.
>  >
>  > My reasoning for this is consistency, see at end.
> 
> Since null arrays have length 0, they *are* empty arrays :P.

I can't tell in which way you're joking so I'm just going to come out with...

The length of something, be it an array, a car, a <insert thing>, is totally independent of whether it exists (though a non-existent item cannot have a length).

It either exists or it does not.  If it exists, it has a length which may or may not be zero.

Something which exists cannot be equal to something which doesn't.

Period.

>> Aside: If the location and length are identical you can short-circuit the compare, returning true and ignoring the content, this could save a bit of time on comparisons of large arrays.
> 
> At least with that last paragraph I can agree ;)

:)

> Your change would change the second line (even if it actually allocated a new empty string like you probably want instead of returning null). How would that be consistent in any way?

Oops, my bad.  My suggested code change is totally incorrect.  That'll teach me for posting while working on something else at the same time.

> (Same goes for other ways to create different-ptr empty strings)
> 
> What you might have meant on that extra line might be more like:
> ---
>        if (u.length == 0) return ((u.ptr is null) == (v.ptr is null));
> ---
> which will return true if both .ptr values are null or both are non-null.

Yes, and yes, I want "".dup to allocate a new 1-byte block, point at it, and set length to 0.

Regan
July 25, 2007
Max Samukha wrote:
> On Wed, 25 Jul 2007 15:30:10 +0100, Regan Heath <regan@netmail.co.nz>
> wrote:
> 
>> I am a little puzzled by the fact that:
>> "Identity Expressions" include ("is", "!is")
>> "Equality Expressions" include ("==", "!=", "is", "!is")
>> Why do "is" and "!is" exist in both equality and identity?
> 
> From "Identity Expressions": "For operand types other than class
> objects, static or dynamic arrays, identity is defined as being the
> same as equality." That's the reason, I guess.

Ahh thanks, I was thinking of it solely in terms of dynamic arrays and ignoring all other types you might compare!

Regan