July 08, 2017 The Nullity Of strings and Its Meaning

Yesterday I noticed that std.uri.decodeComponent does not 'preserve' the nullity of its argument:

    1 void main ()
    2 {
    3    import std.uri;
    4    string s = null;
    5    assert (s is null);
    6    assert (s.decodeComponent);
    7 }

The assertion in line 6 fails. This failure gave rise to a more general investigation of strings. After some research I found that one "cannot implicitly convert expression (s) of type string to bool", as in

    void main ()
    {
       string s;
       bool b = s;
    }

Nonetheless, in certain boolean contexts strings do convert to bool, as here:

    void main ()
    {
       import std.stdio;
       string s; // equivalent to s = null
       writeln (s ? true : false);
       s = "";
       writeln (s ? true : false);
    }

The code prints

    false
    true

to the console. This led me to the insight that in D there are two distinct kinds of empty strings: those whose ptr is null and all the others. It seems that the nullity of ptr determines not only whether the string compares equal to null in an IdentityExpression [1] but also the result of the conversion in the boolean context mentioned above.

I wonder whether this distinction is meaningful and, if not, why it is exposed to the application programmer so prominently.

Then today I found this piece of code

    void main ()
    {
       string s = null;
       string t = "";
       assert (s is t);
    }

which, according to the wording in [1],

    "For static and dynamic arrays, identity is defined as referring to the same array elements and the same number of elements."

should succeed, but its assertion fails [2]. I suspect the implementation compares the ptrs even in the case of zero elements.

A last example of 'deviant behavior' I found is this:

    import std.stdio;
    import std.file;
    void main ()
    {
       string s = null;
       try
          mkdir (s);
       catch (Exception e)
          e.msg.writeln;

       s = "";
       try
          mkdir (s);
       catch (Exception e)
          e.msg.writeln;
    }

Using DMD v2.073.2, the first mkdir call terminates the program with a segmentation fault. With 2.074.1 the program prints

    : Bad address
    : No such file or directory

I find that a bit confusing.

[1] https://dlang.org/spec/expression.html#identity_expressions
[2] https://issues.dlang.org/show_bug.cgi?id=17623

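To see the observations above side by side, here is a minimal sketch. It only restates behaviour described in this thread; the exact rules for `is` and for the boolean conversion are those of the compiler you use, so treat it as an illustration rather than as a statement of the spec.

```d
import std.stdio : writeln;

void main()
{
    string s = null; // the null array: ptr is null, length is 0
    string t = "";   // an empty literal: non-null ptr, length 0

    // Both are empty, and == compares elements, so they are equal.
    assert(s.length == 0 && t.length == 0);
    assert(s == t);

    // `is` compares ptr and length, so they are not identical.
    assert(s is null);
    assert(t !is null);
    assert(s !is t);

    // In a condition the array is tested against null, not for emptiness.
    writeln(s ? "s: true" : "s: false"); // prints "s: false"
    writeln(t ? "t: true" : "t: false"); // prints "t: true"
}
```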
July 08, 2017 Re: The Nullity Of strings and Its Meaning

Posted in reply to kdevel

On 08.07.2017 19:16, kdevel wrote:
>
> I wonder if this distinction is meaningful

Not nearly as much as it would need to be to justify the current behavior. It's mostly a historical accident.

> and---if not---why it is
> exposed to the application programmer so prominently.

I don't think there is a good reason except backwards-compatibility. Also see:
https://github.com/dlang/dmd/pull/4623
(This is the pull request that restored the bad behaviour after it had been fixed.)

July 08, 2017 Re: The Nullity Of strings and Its Meaning

Posted in reply to kdevel

On 07/08/2017 07:16 PM, kdevel wrote:
> The assertion in line 6 fails. This failure gave rise to a more general
> investigation on strings. After some research I found that one
> "cannot implicitly convert expression (s) of type string to bool" as in
[...]
> Nonetheless in certain boolean contexts strings convert to bool as here:
>
> 1 void main ()
> 2 {
> 3 import std.stdio;
> 4 string s; // equivalent to s = null
> 5 writeln (s ? true : false);
> 6 s = "";
> 7 writeln (s ? true : false);
> 8 }

Yeah, that's considered "explicit". It also happens with `if (s)`.

> The code prints
>
> false
> true
>
> to the console. This lead me to the insight, that in D there are two
> distinct kinds of empty strings: Those having a ptr which is null and
> the other. It seems that this ptr nullity not only determines whether
> the string compares equal to null in an IdentityExpression [1] but also
> the result of the above mentioned conversion in the boolean context.

Yup. Though I'd say the distinction is null vs. every other array, not null vs. other empty arrays. null is one specific array. It happens to be empty, but that doesn't really matter. `foo is null` compares with the null array. It doesn't check for emptiness. Conversion to bool also compares with null. The concept of emptiness is unrelated.

Maybe detecting empty arrays would be more useful. As far as I know, there's no killer argument either way. Changing it now would break code, of course.

Personally, I wouldn't mind if those conversions to bool just went away. It's not obvious what exactly is being checked, and it's not hard to be explicit about it with .ptr and/or .length. But as Timon notes, that has been attempted, and it broke code. So it was reverted, and that's that.

> I wonder if this distinction is meaningful and---if not---why it is
> exposed to the application programmer so prominently.

"Prominently"? It only shows up when you convert to bool. You only get surprised if you expect that to check for emptiness (or something else entirely). And you don't really have a reason to expect that. You can easily avoid the issue by being more explicit in your code (`arr.ptr is null`, `arr.length == 0`/`arr.empty`).

> Then today I found this piece of code
>
> 1 void main ()
> 2 {
> 3 string s = null;
> 4 string t = "";
> 5 assert (s is t);
> 6 }
>
> which, according to the wording in [1]
>
> "For static and dynamic arrays, identity is defined as referring to
> the same array elements and the same number of elements."
>
> shall succeed but its assertion fails [2]. I anticipate the
> implementation compares the ptrs even in the case of zero elements.

The spec isn't very clear there. What does "the same array elements" mean for empty arrays? Can two arrays refer to "the same array elements" but have different lengths? It seems like "referring to the same array elements" is supposed to mean "having the same value in .ptr" without mentioning .ptr. The implementation obviously compares .ptr and .length.

> A last example of 'deviant behavior' I found is this:
>
> 1 import std.stdio;
> 2 import std.file;
> 3 void main ()
> 4 {
> 5 string s = null;
> 6 try
> 7 mkdir (s);
> 8 catch (Exception e)
> 9 e.msg.writeln;
> 10
> 11 s = "";
> 12 try
> 13 mkdir (s);
> 14 catch (Exception e)
> 15 e.msg.writeln;
> 16 }
>
> Using DMD v2.073.2 the first expression terminates the programm with a
> segmentation fault. With 2.074.1 the program prints
>
> : Bad address
> : No such file or directory
>
> I find that a bit confusing.

That looks like a bug/oddity in mkdir. null is as valid a string as "". It shouldn't give a worse exception message.

But the message for `""` isn't exactly good, either. Of course the directory doesn't exist yet; I'm trying to create it!

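As a rough illustration of the advice to be explicit, the sketch below (assuming `empty` from `std.range.primitives`) shows that `is` compares both `.ptr` and `.length`, so two empty slices at different addresses are equal but not identical, and spells out the checks with `.ptr`, `.length` and `.empty` instead of `if (arr)`:

```d
import std.range.primitives : empty;

void main()
{
    string a = "abc";

    // Two empty slices of the same string: equal by value, but their
    // .ptr values differ, so `is` reports them as not identical.
    auto x = a[0 .. 0];
    auto y = a[1 .. 1];
    assert(x == y);
    assert(x !is y);

    // Same .ptr, different .length: also not identical.
    assert(a[0 .. 1] !is a[0 .. 2]);

    // Saying what is meant instead of relying on `if (arr)`:
    string n = null;
    assert(n.ptr is null);      // "is it the null array?"
    assert(n.length == 0);      // "is it empty?"
    assert(n.empty && x.empty);
    assert(x.ptr !is null);     // an empty slice need not be null
}
```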
July 08, 2017 Re: The Nullity Of strings and Its Meaning

Posted in reply to kdevel

On Saturday, July 8, 2017 5:16:51 PM MDT kdevel via Digitalmars-d-learn wrote:
> Yesterday I noticed that std.uri.decodeComponent does not 'preserve' the
> nullity of its argument:
[...]

A dynamic array in D is essentially

    struct DynamicArray(T)
    {
        size_t length;
        T* ptr;
    }

That's not _exactly_ what it is at the moment (it actually does stuff with void* rather than templates, unfortunately), but essentially, that's what it is and what it behaves like.

In the case of dynamic arrays, null is a dynamic array whose ptr is null and whose length is 0.

The empty property for arrays checks whether the length of the array is 0. So, any array with a length of 0 (regardless of its ptr) is considered empty.
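The `DynamicArray` struct above is a mental model, not a real type. A small sketch that checks the model against the built-in `.ptr` and `.length` properties (the `static assert` assumes the usual two-word slice layout):

```d
void main()
{
    // A dynamic array is a (length, ptr) pair: two machine words.
    static assert(string.sizeof == 2 * size_t.sizeof);

    string a = "hello";
    auto b = a[1 .. 3];          // slicing copies nothing: b points into a
    assert(b.ptr == a.ptr + 1);
    assert(b.length == 2);
    assert(b == "el");

    // The null array is the pair (length 0, ptr null) ...
    string n = null;
    assert(n.length == 0 && n.ptr is null);

    // ... while an empty slice of a is also empty, but its ptr is not null.
    auto e = a[5 .. 5];
    assert(e.length == 0 && e.ptr !is null);
}
```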
The is expression checks for bitwise equality. So,

    arr is null

checks whether the array has a null ptr and a 0 length. In _most_ circumstances, that's equivalent to checking that the array's ptr is null, but if you do something screwy with uninitialized memory, then you could end up with a ptr value of null and a non-zero length, and

    arr is null

would be false. The == expression, on the other hand, checks that the elements are equal. So, it does something similar to

    if (lhs.length != rhs.length)
        return false;
    for (size_t i = 0; i < lhs.length; ++i)
    {
        if (lhs.ptr[i] != rhs.ptr[i])
            return false;
    }
    return true;

So, if the lengths are 0, no iterating happens, and the two arrays are considered equal. This means that a null array is equal to any other empty array, regardless of the value of ptr. It's also why I would consider

    arr == null

to be a code smell. IMHO, if you want to check for empty, then you should use the empty property or check length directly, since those are clear about your intent, whereas with

    arr == null

you always have the question of whether they should have used an is expression or whether they were simply checking for an empty array.
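A compilable version of the comparison loop above, written as a hypothetical helper named `elementsEqual` (not a Phobos function), alongside the built-in `==` it is meant to mimic. It indexes with `lhs[i]` rather than `lhs.ptr[i]`, so bounds checks stay on:

```d
// Hypothetical helper: element-wise comparison in the spirit of the loop
// above. Real code should simply use the built-in ==.
bool elementsEqual(T)(T[] lhs, T[] rhs)
{
    if (lhs.length != rhs.length)
        return false;
    foreach (i; 0 .. lhs.length)
    {
        if (lhs[i] != rhs[i])
            return false;
    }
    return true; // zero-length arrays fall through to here
}

void main()
{
    string n = null;
    string e = "";
    assert(elementsEqual(n, e)); // no elements to compare
    assert(n == e);              // the built-in == agrees
    assert(n !is e);             // but they are not identical

    int[] x = [1, 2, 3];
    int[] y = x.dup;             // same values, different memory
    assert(elementsEqual(x, y) && x == y && x !is y);
}
```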
If you understand all of this, it is perfectly possible to write code which treats null arrays as distinct from empty arrays. However, it's _very_ easy to get into a situation where you have an empty array rather than a null one. Pretty much as soon as you do anything to a null array other than pass it around or compare it, trusting that it's still null can get error-prone. And that's why a number of folks think that it's just plain error-prone to try and treat null arrays as special - but some folks who understand the issues continue to do so anyway, because they know enough to make it work and consider the distinction valuable.
Personally, I think that it can make sense to have a function explicitly return null to indicate something, but beyond that, I'd actually consider using std.typecons.Nullable to make the whole thing clear, even if it is a bit dumb to have to wrap a nullable type in a Nullable to treat it as null.
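A minimal sketch of the Nullable approach, assuming a hypothetical `lookup` function over an associative array. Absence is carried by the `Nullable` wrapper, so the wrapped string is free to be empty (or even null) without changing the meaning:

```d
import std.typecons : Nullable;

// Hypothetical lookup: an empty Nullable means "key absent";
// a non-null Nullable may still hold an empty string.
Nullable!string lookup(string[string] table, string key)
{
    Nullable!string result;      // starts out null (absent)
    if (auto p = key in table)
        result = *p;             // present, possibly ""
    return result;
}

void main()
{
    auto table = ["name": "", "city": "Berlin"];

    assert(lookup(table, "missing").isNull);        // absent
    assert(!lookup(table, "name").isNull);          // present ...
    assert(lookup(table, "name").get == "");        // ... and empty
    assert(lookup(table, "city").get == "Berlin");
}
```

Callers test `isNull` for presence and compare the value with `==` for content, so nothing depends on the `.ptr` of the string.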
As for conversions to bool, not much implicitly converts to bool, and dynamic arrays are no exception. However, conditional expressions in if statements, loops, ternary expressions, and assertions actually insert an invisible, explicit cast. So, even though the conversion _looks_ implicit, it's actually explicit. So,

    if (cond)
    {
    }

is actually

    if (cast(bool)cond)
    {
    }

For user-defined types, that means that the way to affect how they're treated in conditional expressions is to overload opCast to bool. For built-in types, the result depends on how it was decided that casting that type to bool would work. For pointers,

    cast(bool)ptr

becomes

    ptr !is null

which makes a lot of sense. Unfortunately, because dynamic arrays were just pointers in C, D has historically treated dynamic arrays as pointers under certain circumstances and implicitly converted them to the value of their ptr property. Fortunately, in many cases, that has been fixed, and the compiler has gotten stricter. Unfortunately, however, it is still the case that casting a dynamic array to bool checks its ptr value for null. This works fine if you know what you're doing but is frequently surprising to folks and is arguably error-prone. It _was_ temporarily fixed at one point by deprecating the use of arrays in conditional expressions, but some major D contributors (Andrei included) who understood how to correctly treat null dynamic arrays as special did not like the change, and it was reverted.
So, basically, you should be _very_ wary of ever using a dynamic array in a conditional expression directly. If you know what you're doing, it can be done correctly, but it's error prone, and it's arguably a code smell, because folks reading your code don't necessarily know that you know what you're doing well enough to get it right.
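To illustrate the hidden `cast(bool)` in conditions, the sketch below uses a hypothetical `truthy` helper (it just evaluates `value ? true : false`) on a user-defined type with `opCast!bool`, on pointers, and on strings. The struct `Counter` is made up for the example:

```d
// Hypothetical type: opCast!bool decides how it behaves in conditions.
struct Counter
{
    int count;

    bool opCast(T : bool)() const
    {
        return count != 0;
    }
}

// Hypothetical helper: evaluating the value as a ?: condition applies the
// same explicit cast(bool) that if statements and loops insert.
bool truthy(T)(T value)
{
    return value ? true : false;
}

void main()
{
    // User-defined type: decided by opCast!bool.
    assert(!truthy(Counter(0)));
    assert(truthy(Counter(3)));

    // Pointers: cast(bool) p means p !is null.
    int x;
    assert(!truthy(cast(int*) null));
    assert(truthy(&x));

    // Dynamic arrays: tested against null, not for emptiness.
    assert(!truthy(cast(string) null));
    assert(truthy(""));          // empty, yet true
    assert("".length == 0);
}
```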
- Jonathan M Davis
July 08, 2017 Re: The Nullity Of strings and Its Meaning

Posted in reply to ag0aep6g

Just saw that my first example was wrong; it should read

    void main ()
    {
       import std.uri;
       string a = "";
       assert (a);
       auto s = a.decodeComponent;
       assert (s);
    }

The non-nullity was not preserved. Only the second assert fails.

On Saturday, 8 July 2017 at 18:39:47 UTC, ag0aep6g wrote:
> On 07/08/2017 07:16 PM, kdevel wrote:
> null is one specific array. It happens to be empty, but that doesn't really matter. `foo is null` compares with the null array. It doesn't check for emptiness. Conversion to bool also compares with null. The concept of emptiness is unrelated.

But why? What is the intended use of converting a string (or any other dynamic array) to bool?

In Issue 17623 Vladimir pointed out that in the case of strings there may be a need to store an empty D string which is also a NUL-terminated C string. For that it would be sufficient to convert the ptr value when checking whether there is a valid part of memory containing the NUL byte.

Moreover, everything I've written about strings is also valid for, e.g., dynamic arrays of doubles. Here, too, there are two different kinds of empty arrays which compare equal but are not identical. I see no purpose in that.

>> I wonder if this distinction is meaningful and---if not---why it is
>> exposed to the application programmer so prominently.
>
> "Prominently"? It only shows up when you convert to bool.

The conversion to bool (in a bool context) is part of the interface of the type. The interface of a type *is* prominently exposed.

> You only get surprised if you expect that to check for emptiness (or something else entirely).

As mentioned, I was surprised that the non-nullity did not pass through decodeComponent.

> The spec isn't very clear there. What does "the same array elements" mean for empty arrays?

Mathematically that's easily answered: https://en.wikipedia.org/wiki/Universal_quantification#The_empty_set

(mkdir)
>> Using DMD v2.073.2 the first expression terminates the programm with a
>> segmentation fault. With 2.074.1 the program prints
>>
>> : Bad address
>> : No such file or directory
>>
>> I find that a bit confusing.
>
> That looks like a bug/oddity in mkdir. null is as valid a string as "". It shouldn't give a worse exception message.
>
> But the message for `""` isn't exactly good, either. Of course the directory doesn't exist, yet; I'm trying to create it!

I would expect the same error message (ENOENT) in both cases. The EFAULT in the first case occurs if you invoke POSIX mkdir with NULL as the first argument.

July 09, 2017 Re: The Nullity Of strings and Its Meaning

Posted in reply to kdevel

On 07/09/2017 01:12 AM, kdevel wrote:
> On Saturday, 8 July 2017 at 18:39:47 UTC, ag0aep6g wrote:
>> On 07/08/2017 07:16 PM, kdevel wrote:
>
>> null is one specific array. It happens to be empty, but that doesn't really matter. `foo is null` compares with the null array. It doesn't check for emptiness. Conversion to bool also compares with null. The concept of emptiness is unrelated.
>
> But why? What is the intended use of converting a string (or any other dynamic array) to bool?

As I said: I wouldn't mind if it went away. I don't see a strong use case that justifies the non-obvious behavior of `if (arr)`. But apparently it is being used, and breaking code is a no-no. As for how it's used, I'd start digging at the link Timon has posted.

> In Issue 17623 Vladimir pointed out, that in the case of strings there may be a need to store an empty D-string which also is a NUL-terminated C-String. It would be sufficient if the ptr-Value would convert for checking if there is a valid part of memory containing the NUL byte.

But just looking at .ptr doesn't tell you whether there's a '\0'. You'd have to dereference the pointer too. And that's not what Vladimir is getting at. Issue 17623 is about `arr1 is arr2`, not about conversions to bool like `if (arr)`.

It makes sense that `null !is ""`. They're not "the same". One place where the difference matters is when working with C strings. Issue 17623 is absolutely valid. But it's much more likely that the spec will be changed rather than the implementation.

> Moreover everything I've written about strings is also valid for e.g. dynamic arrays of doubles. Here there are also two different kinds of empty arrays which compare equal but are not identical. I see no purpose for that.

So you'd make `arr1 is arr2` true when they're empty, ignoring a difference in pointers. Otherwise, it would still compare pointers. Right? I don't think that's a good idea, simply because it's a special case.

I noticed that you haven't mentioned `==`. You're probably aware of it, but if not, we might be talking past each other. So, just to be clear: you can also compare arrays with `==`, which compares elements. `null == ""` is true.

>> You only get surprised if you expect that to check for emptiness (or something else entirely).
>
> As mentioned I was surprised, that the non-nullity did not pass thru decodeComponent.

decodeComponent doesn't seem to return the same (identical) string you pass it, most of the time. Try "foo":

----
void main()
{
    import std.uri;
    string a = "foo";
    auto s = a.decodeComponent;
    assert(s == a); /* passes */
    assert(s is a); /* fails */
}
----

decodeComponent simply gives no promise of preserving pointers. You also shouldn't rely on it returning null for a null input, even when it currently does that.

>> The spec isn't very clear there. What does "the same array elements" mean for empty arrays?
>
> Mathematically that's easily answered: https://en.wikipedia.org/wiki/Universal_quantification#The_empty_set

So "two empty arrays refer to the same elements" is true because everything said about the elements of empty arrays is true? Is "two empty arrays do *not* refer to the same elements" also true?

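Following the advice not to rely on `decodeComponent` preserving pointers, a caller can test the decoded value itself. A minimal sketch; it relies only on `==` and `.empty`, and deliberately leaves untested whether `decodeComponent("")` returns `null` or a non-null empty slice:

```d
import std.range.primitives : empty;
import std.uri : decodeComponent;

void main()
{
    // Check what was decoded, not where it is stored.
    assert(decodeComponent("a%20b") == "a b");
    assert(decodeComponent("") == "");
    assert(decodeComponent("").empty);
}
```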
July 09, 2017 Re: The Nullity Of strings and Its Meaning

Posted in reply to Jonathan M Davis

On Saturday, 8 July 2017 at 23:12:15 UTC, Jonathan M Davis wrote:
> On Saturday, July 8, 2017 5:16:51 PM MDT kdevel via Digitalmars-d-learn wrote:
[...]
> IMHO, if you want to check for empty, then you should use the empty property or check length directly, since those are clear about your intent, whereas with

My starting point wasn't to check for emptiness but the question of whether I can use the two additional states of a string variable (is null or !is null) to indicate that a value is absent.

> If you understand all of this, it is perfectly possible to write code which treats null arrays as distinct from empty arrays. However, it's _very_ easy to get into a situation where you have an empty array rather than a null one.

My case was: I got a null one from "".decodeComponent where I did not expect it (cf. my corrected example in my post "13 hours ago", i.e. Saturday, 08 July 2017, 23:12:20 +00:00).

> Pretty much as soon as you do anything to a null array other than pass it around or compare it, trusting that it's still null can get error-prone.

It's the other way round. I was assuming that it is still not null (my example in my first post was wrong).

[...]
> Personally, I think that it can make sense to have a function explicitly return null to indicate something, but beyond that, I'd actually consider using std.typecons.Nullable to make the whole thing clear, even if it is a bit dumb to have to wrap a nullable type in a Nullable to treat it as null.

You hit the nail on the head.

Stefan

July 09, 2017 Re: The Nullity Of strings and Its Meaning

Posted in reply to ag0aep6g

On Sunday, 9 July 2017 at 10:32:23 UTC, ag0aep6g wrote:
> On 07/09/2017 01:12 AM, kdevel wrote:
>> On Saturday, 8 July 2017 at 18:39:47 UTC, ag0aep6g wrote:
>>> On 07/08/2017 07:16 PM, kdevel wrote:
[...]
>> Moreover everything I've written about strings is also valid for e.g. dynamic arrays of doubles. Here there are also two different kinds of empty arrays which compare equal but are not identical. I see no purpose for that.
>
> So you'd make `arr1 is arr2` true when they're empty, ignoring a difference in pointers. Otherwise, it would still compare pointers. Right?

As a D novice I am not in a position to suggest changes to the language (yet). I would appreciate documentation that accurately represents what is implemented.

> I don't think that's a good idea, simply because it's a special case.
>
> I noticed that you haven't mentioned `==`. You're probably aware of it, but if not we might be talking past each other. So, just to be clear: You can also compare arrays with `==` which compares elements. `null == ""` is true.

As mentioned in the subject, my posting is about the state of affairs wrt. the (non-)nullity of strings. In C/C++, once a char * variable has become non-NULL, 'it' never loses this property. In D this is not the case: the non-null value "" 'becomes' null in

    "".decodeComponent

>>> You only get surprised if you expect that to check for emptiness (or something else entirely).
>>
>> As mentioned I was surprised, that the non-nullity did not pass thru decodeComponent.
>
> decodeComponent doesn't seem to return the same (identical) string you pass it, most of the time.

Sure. But I am writing about the string value, which comprises the (non-)nullity of the string. This is not preserved.

[...]
> decodeComponent simply gives no promise of preserving pointers.

string is not a pointer but a type. To the user of string it is completely irrelevant whether the nullity of the string is implemented by referring to a pointer inside the implementation of string.

> You also shouldn't rely on it returning null for a null input, even when it currently does that.

I assumed that a non-null string is returned for a non-null input.

>>> The spec isn't very clear there. What does "the same array elements" mean for empty arrays?
>>
>> Mathematically that's easily answered: https://en.wikipedia.org/wiki/Universal_quantification#The_empty_set
>
> So "two empty arrays refer to the same elements" is true because everything said about the elements of empty arrays is true? Is "two empty arrays do *not* refer to the same elements" also true?

Yes. But that second proposition what not the one chosen in the documentation. It was not chosen because it does not extend to the nontrivial case where one has more than zero elements. ;-)

Stefan

July 09, 2017 Re: The Nullity Of strings and Its Meaning

Posted in reply to kdevel

On Sunday, 9 July 2017 at 13:51:44 UTC, kdevel wrote:
> But that second proposition what not the one chosen in the documentation.
Shall read: "But that second predicate was not the one chosen in the documentation."
July 09, 2017 Re: The Nullity Of strings and Its Meaning

Posted in reply to kdevel

On 07/09/2017 03:51 PM, kdevel wrote:
> On Sunday, 9 July 2017 at 10:32:23 UTC, ag0aep6g wrote:
[...]
> As mentioned in the subject my posting is about the state of affairs wrt. the (non-)nullity of strings. In C/C++ once a char * variable became non-NULL 'it' never loses this property. In D this is not the case: The non-null value ""
> 'becomes' null in
>
> "".decodeComponent

Nullity of D strings is quite different from nullity of C strings. A null D string is a valid string with length 0. A null char* is not a proper C string. It doesn't have length 0. It has no length.

A C function can't return a null char* when it's supposed to return an empty string. But a D function can return a null char[] in that case.

[...]
> Sure. But I am writing about the string value which comprises the (non-)nullity of the string. This is not preserved.

Just like other pointers are not preserved. In the .ptr field of a D array, a null pointer isn't special. Null arrays aren't special beyond having a unique name.

[...]
> string is not a pointer but a type. To the user of string it is completely irrelevant, if the nullity of the string is implemented by referring to a pointer inside the implementation of string.

string is a type that involves a pointer. The type is not opaque. The user can access the pointer. A null array is not some magic (invalid) value. It's just the one that has a null .ptr and a zero .length.

I think that's widely known, but it might not actually be in the spec. At least, I can't find it. The page on arrays [1] just says that "`.init` returns `null`" and that "pointers are initialized to `null`", without saying what null means for arrays. On the `null` expression [2], the spec mentions a "null value" of arrays, but again doesn't say what that means.

>> You also shouldn't rely on it returning null for a null input, even when it currently does that.
>
> I assumed that a non-null string is returned for a non-null input.

As far as I can see, you had no reason to assume that. If the spec or some other document misled you, it needs fixing.

[...]
> Yes. But that second proposition what not the one chosen in the documentation. It was not chosen because it does not extend to the nontrivial case where one has more than zero elements. ;-)

Or the spec's just poorly written there, and wasn't meant the way you've interpreted it.

[1] https://dlang.org/spec/arrays.html
[2] https://dlang.org/spec/expression.html#null

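A short sketch of the point that a null D array is an ordinary empty array rather than an invalid value, as a C `NULL` pointer would be. It relies only on `.init`, `foreach` and `~=`:

```d
void main()
{
    // The default (.init) value of every dynamic array type is the null array.
    string s;
    assert(s is null);
    assert(string.init is null);
    assert((int[]).init.length == 0);

    // Unlike a NULL char* in C, it is a usable array with no elements.
    size_t count;
    foreach (c; s)
        ++count;
    assert(count == 0);

    s ~= "abc";                  // appending to the null array allocates
    assert(s == "abc");
    assert(s !is null);
}
```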
Copyright © 1999-2021 by the D Language Foundation