Just a friendly reminder about using arrays in boolean conditions (page 2)


November 18

Re: Just a friendly reminder about using arrays in boolean conditions

Posted by user1234
in reply to Dom DiSc

On Monday, 18 November 2024 at 15:39:38 UTC, Dom DiSc wrote:
> On Monday, 18 November 2024 at 12:24:01 UTC, user1234 wrote:
> > ```d
> > void v(string s)
> > {
> >     if (s.length)           writeln("case length :`", s, "`");
> >     else if (s is null)     writeln("case null :`", s,  "`");
> >     else                    writeln("case not null but no length:`", s,  "`");
> > }
> > ```
>
> One should always first check for null and then for length. This should be immediately clear, as asking for a length doesn't make sense if something is null.

No! check the length and only the length ;)

November 18

Re: Just a friendly reminder about using arrays in boolean conditions

Posted by Jonathan M Davis
in reply to Dom DiSc

On Monday, November 18, 2024 8:39:38 AM MST Dom DiSc via Digitalmars-d wrote:
> On Monday, 18 November 2024 at 12:24:01 UTC, user1234 wrote:
> >> ```d
> >> void v(string s)
> >> {
> >>     if (s.length)           writeln("case length :`", s, "`");
> >>     else if (s is null)     writeln("case null :`", s,  "`");
> >>     else                    writeln("case not null but no length:`", s,  "`");
> >> }
> >> ```
>
> One should always first check for null and then for length. This
> should be immediately clear, as asking for a length doesn't make
> sense if something is null.
> Ok, an array has both a pointer and a length, but I would never
> expect the length to contain something useful, if the pointer is
> not assigned a legal address.

It's really the opposite. If the length is 0, there's no need to look at the ptr member, and it could be anything. In the case of [] or null arrays, it's going to be null, and that works perfectly fine in D, because all accesses to the array do bounds checking, so if the length is 0, you will never dereference the pointer no matter what it is. This means that you really don't need to worry at all about the pointer being null.
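
To make this concrete, here is a minimal sketch (the helper `countChars` is a hypothetical name, not something from the thread): a null string and an empty string literal both have length 0, compare equal, and iterate zero times, so only an explicit identity check can tell them apart.

```d
import std.stdio : writeln;

// Hypothetical helper: counts code units by iterating over the array.
// When s.length == 0 the loop body never runs, so ptr is never dereferenced.
size_t countChars(string s)
{
    size_t n;
    foreach (c; s)
        ++n;
    return n;
}

void main()
{
    string a = null; // ptr is null, length is 0
    string b = "";   // ptr points at the literal's storage, length is 0

    assert(a.length == 0 && b.length == 0);
    assert(a == b);                  // == compares lengths and elements, not ptr
    assert(a is null && b !is null); // only an identity check sees a difference
    writeln(countChars(a), " ", countChars(b), " ", countChars("abc")); // prints "0 0 3"
}
```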

Outside of cases where you're doing something like passing the ptr field to an extern(C) function, you really shouldn't care at all whether ptr is null, and it's a definite code smell if code does check for null. D's arrays have essentially eliminated the need to worry about null at all, whereas languages that use a pointer for arrays (e.g. C/C++) or use what's essentially a class reference (e.g. Java) have to worry about null, because if the array is null, they can't actually do anything with it. D does not have that problem, because the length is stored in the array itself, right next to the ptr field, so reading it never requires dereferencing the pointer. That approach also makes it possible to slice an array to get another array, which can be really nice.

> If I were to implement arrays, I would use a simple pointer, and length would be the first element of the allocated block. For any object, I have always in mind that it could be implemented in this way, so would never access anything as long as the pointer is not checked first.

But why would you need to care what the pointer was if the length was 0? There's no reason to ever dereference it in that case. It _really_ matters whether the pointer is null when that pointer is your entire access to the array, but that goes away when you have access to the length without needing to dereference the pointer. At that point, the value of the ptr really only matters when the length is greater than zero, because then you have elements to access via the pointer. But if it's 0, there are no elements, and you can entirely ignore the value of the pointer.

Also, making it so that the length is allocated with the elements would make it so that you couldn't slice arrays. If you wanted a subset of its elements, you'd be forced to either copy the elements or use a wrapper type which had its own length or pair of indices.
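
As a minimal sketch of why keeping the length in the slice matters (the `middle` helper is hypothetical): slicing only builds a new ptr/length pair over the same memory, with no copy, which a length-prefixed block could not support without a wrapper type. It also shows that an empty slice can have a non-null ptr.

```d
// Hypothetical helper: drops the first and last elements by slicing.
// The result is a new (ptr, length) pair over arr's memory; nothing is copied.
int[] middle(int[] arr)
{
    return arr.length >= 2 ? arr[1 .. $ - 1] : arr[0 .. 0];
}

void main()
{
    int[] arr = [1, 2, 3, 4, 5];
    int[] mid = middle(arr);
    assert(mid == [2, 3, 4]);
    assert(mid.ptr == arr.ptr + 1); // points into arr's block
    mid[0] = 20;                    // writes through to the original
    assert(arr[1] == 20);

    int[] none = arr[2 .. 2];       // empty slice, yet its ptr is not null
    assert(none.length == 0 && none !is null);
}
```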

- Jonathan M Davis

November 18

Re: Just a friendly reminder about using arrays in boolean conditions

Posted by Jonathan M Davis
in reply to user1234

On Monday, November 18, 2024 5:20:17 AM MST user1234 via Digitalmars-d wrote:
> Well I agree, even if that will take years to have .length tested instead of .ptr.
>
> Little story however: I've encountered a case where the explicit check was also wrong.
>
> ```d
> module runnable;
>
> import std.stdio;
>
> void v(string s)
> {
>     if (s.length)           writeln("case length :`", s, "`");
>     else if (s is null)     writeln("case null :`", s,  "`");
>     else                    writeln("case not null but no length:`", s,  "`");
> }
>
> void main(string[] args)
> {
>     v("hello");
>     v(null);
>     v("");
> }
> ```
>
> The different semantics between `null` and `""` for strings is well illustrated here I'd say.

Yeah, I would consider it a code smell if code checks whether an array is null. It _can_ make sense if you do something like have a function return a null string on failure and a string with data (which could be empty) on success, since then that function is directly returning null. However, then you have to worry about whether the code that produces the non-null string could accidentally produce null instead. So, ultimately, it's just better to use something like Nullable!string if you want to distinguish between a string with a value and a string without one (and the same goes for any array type).
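
A minimal sketch of the `Nullable` alternative, assuming a hypothetical `lookup` function over an associative array: a missing key yields no value at all, while a present key can still yield an empty string, and neither case depends on whether the string's ptr is null.

```d
import std.typecons : Nullable, nullable;

// Hypothetical lookup: "no result" is encoded by Nullable itself,
// not by returning a null string.
Nullable!string lookup(string[string] table, string key)
{
    if (auto p = key in table)
        return nullable(*p);        // present, possibly empty
    return Nullable!string.init;    // absent
}

void main()
{
    auto table = ["name": "Alice", "note": ""];

    assert(lookup(table, "age").isNull);   // missing key: no value
    auto note = lookup(table, "note");
    assert(!note.isNull);                  // present...
    assert(note.get.length == 0);          // ...but empty
    assert(lookup(table, "name").get == "Alice");
}
```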

However, if for whatever reason someone _does_ choose to have their code differentiate between null and empty (much as I think that that's a terrible idea), it's better if it's explicit about it rather than doing it implicitly with something like if(arr), because then it's not clear what the programmer was trying to do. It's possible that the programmer was intentionally checking for null, and it's possible that they meant to check for empty (I've seen the latter case more frequently than the former, even though that's not what the code actually does). However, if the check is explicit - e.g. if(arr is null) - then it's clear that the programmer intended to check for null. Whether that then causes other problems is another matter, but at least the intention of the code is clear, whereas it isn't if the check is implicit. So, IMHO, if someone is going to do that check, it should always be explicit - but of course, it would be better to simply not care whether an array is null, because it takes very little to end up with a null array when you didn't intend to (or a non-null empty one when you wanted null). The rest of the language just isn't designed to care about null with arrays.
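
The explicit style can be sketched as a testable variant of user1234's `v` (the name `classify` and its return strings are hypothetical): each branch spells out whether it asks about elements or about a null ptr, so a reader never has to guess what `if(arr)` was meant to test.

```d
import std.range.primitives : empty;

// Each branch states its intent explicitly instead of relying on if (s).
string classify(const(char)[] s)
{
    if (!s.empty)
        return "has elements";        // the check most code actually wants
    if (s is null)
        return "null";                // explicit null test, rarely needed
    return "empty but not null";
}

void main()
{
    assert(classify("hello") == "has elements");
    assert(classify(null) == "null");
    assert(classify("") == "empty but not null");
}
```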

- Jonathan M Davis

November 18

Re: Just a friendly reminder about using arrays in boolean conditions

Posted by Jonathan M Davis
in reply to Steven Schveighoffer

On Sunday, November 17, 2024 3:17:18 PM MST Steven Schveighoffer via Digitalmars-d wrote:
> On Sunday, 17 November 2024 at 21:50:18 UTC, kdevel wrote:
> > My question is: Is it possible that a valid D program gets into
> > a state where an array has ptr == null and length > 0? If so, how?
>
> Yes, the compiler uses it:
>
> ```d
> struct S
> {
>      int x;
> }
>
> void main()
> {
>      auto i = typeid(S).initializer;
>      assert(i.ptr is null);
>      assert(i.length > 0);
> }
> ```
>
> For a type that is all 0, the compiler builds `initializer` to be a null array with a length. This signals to the runtime that the type's initializer is all zeros but has a specific length. This saves binary space by not storing a bunch of 0s.

Does this actually appear in the wild at all, or is this really something that's just restricted to what the compiler is doing? Obviously, based on your example, if you use initializer, you have this problem, but is this something that arrays in general need to worry about? There's definitely code out there (including in Phobos) which uses ptr to avoid bounds checks when it already knows that the indices it's using are in bounds, and if code like that runs into an array with a null ptr but non-zero length, it's going to segfault. Array code in general just assumes that that's not a thing, and this is the first I've ever heard of it being a thing.
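
Code that does consume `typeid(T).initializer` has to handle that encoding. A sketch, assuming the behavior Steven describes (a null ptr with a non-zero length means an all-zero initializer); `copyInit`, `Zeroed`, and `Preset` are hypothetical names:

```d
import std.algorithm.searching : all;

// Hypothetical helper: copies T's default-initialized bytes into buf.
// Assumes a null ptr with a non-zero length in typeid(T).initializer
// means "all zeros, nothing stored", so ptr is checked before copying.
void copyInit(T)(ubyte[] buf)
{
    const(void)[] initBytes = typeid(T).initializer;
    assert(buf.length == initBytes.length, "buf must be exactly T.sizeof bytes");
    if (initBytes.ptr is null)
        buf[] = 0;                                // all-zero initializer
    else
        buf[] = cast(const(ubyte)[]) initBytes;   // copy the stored bytes
}

struct Zeroed { int x; }     // all-zero initializer
struct Preset { int x = 5; } // non-zero initializer

void main()
{
    auto a = new ubyte[Zeroed.sizeof];
    a[] = 0xFF;                  // make sure copyInit actually clears it
    copyInit!Zeroed(a);
    assert(a.all!(b => b == 0));

    auto p = new ubyte[Preset.sizeof];
    copyInit!Preset(p);
    assert(*cast(int*) p.ptr == 5);
}
```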

- Jonathan M Davis

November 23

Re: Just a friendly reminder about using arrays in boolean conditions

Posted by kdevel
in reply to Steven Schveighoffer

On Sunday, 17 November 2024 at 22:17:18 UTC, Steven Schveighoffer wrote:
> On Sunday, 17 November 2024 at 21:50:18 UTC, kdevel wrote:
> > My question is: Is it possible that a valid D program gets into a
> > state where an array has ptr == null and length > 0? If so, how?
>
> Yes, the compiler uses it:
>
> [...]
>     auto i = typeid(S).initializer;
> [...]

Issue 20722 - typeid(X).initializer() breaks safety
https://issues.dlang.org/show_bug.cgi?id=20722

November 23

Re: Just a friendly reminder about using arrays in boolean conditions

Posted by IchorDev
in reply to Dom DiSc

On Monday, 18 November 2024 at 15:39:38 UTC, Dom DiSc wrote:
> One should always first check for null and then for length. This should be immediately clear, as asking for a length doesn't make sense if something is null.
> Ok, an array has both a pointer and a length, but I would never expect the length to contain something useful, if the pointer is not assigned a legal address.
> If I were to implement arrays, I would use a simple pointer, and length would be the first element of the allocated block. For any object, I have always in mind that it could be implemented in this way, so would never access anything as long as the pointer is not checked first.

Having length be 0 if the pointer is null is useful. `if(arr)` should be the same as `if(arr.length)`, IMO.

November 25

Re: Just a friendly reminder about using arrays in boolean conditions

Posted by Walter Bright
in reply to Jonathan M Davis

C has an equivalent behavior distinguishing between a null pointer and a 0 length string:

```
char *s;  // string
if (s)    // pointer
if (*s)   // length
```

```
char[] a;     // array
if (a)        // pointer
if (a.length) // length
```

November 25

Re: Just a friendly reminder about using arrays in boolean conditions

Posted by Walter Bright
in reply to Walter Bright

I neglected to mention that:

```
if (a)
```

is equivalent to:

```
if (a.ptr && a.length)
```

November 25

Re: Just a friendly reminder about using arrays in boolean conditions

Posted by Jonathan M Davis
in reply to Walter Bright

On Monday, November 25, 2024 1:50:07 AM MST Walter Bright via Digitalmars-d wrote:
> C has an equivalent behavior distinguishing between a null pointer and a 0 length string:
>
> ```
> char *s;  // string
> if (s)    // pointer
> if (*s)   // length
> ```
>
> ```
> char[] a;     // array
> if (a)        // pointer
> if (a.length) // length
> ```

Given that C arrays are pointers, there's definitely reason to care about whether they're null or not, but D arrays are not pointers. D arrays may have originally come from C arrays, but ultimately, they're fundamentally different from one another.

D arrays are designed in such a way that there is really no reason to care one whit whether they're null or not. They're not pointers, and you don't normally access their ptr member (and when you do, it's @system). When you access the individual elements, you get a RangeError if you attempt to access an element outside of the array, and if you want to know whether an index is within the array, you check its length. If its length isn't 0, then its ptr isn't null, and you don't have any reason to care about null. If its length is 0, then whether its ptr is null is also irrelevant, because you're not going to access non-existent elements.

The result of this is that there's really no reason to care about whether a D array is null, and code that cares is almost certainly buggy. And what compounds that is that precisely because D code in general does not care about null, it's not hard to end up in a situation where you get a null array when you might have expected an empty non-null array - or in some cases, you might end up with a non-null empty array when you might have expected a null one (though the former is more common from what I've seen). For instance, "".idup will give you null, not a non-null empty string, which makes perfect sense from an efficiency perspective given that almost no D code cares about the difference between null and empty. But it's precisely because almost nothing cares about the difference that it becomes very error-prone to treat null as special even if you want to.

For instance, a function could try to return null to indicate that it doesn't have a result and a non-null empty array to indicate that it has a result but that that result is empty (and of course a non-empty array when it has a result that isn't empty). However, while the null return might be clear and explicit and typically be checked immediately on return, it's really easy to get into a situation where you accidentally have a null array when you meant to have a non-null empty array, meaning that such code has a real risk of returning null when it wasn't intended - which is why such functions really should be returning something like a std.typecons.Nullable wrapping an array instead of trying to treat null arrays as special. Treating null arrays as special in D code is just begging for bugs.

As such, I would generally consider it a code smell to see an array in D checked for null instead of empty. It might make sense in some situations when dealing with extern(C) code, but even then, usually you're either passing a length along with it (in which case, a 0-length array shouldn't be dereferenced by C code either), or you're dealing with a string and need to pass a null-terminated string, which typically means allocating a string anyway rather than passing the ptr of a D string that might be null. But in the vast majority of D code, checking an array for null almost certainly means that the code is doing something wrong. Checking pointers for null makes sense, because you don't want to dereference a null pointer, but D arrays are not pointers. They contain pointers and will potentially dereference them if their length isn't 0, but they themselves are not pointers, and their ptr field isn't going to be dereferenced if it's null, because then their length is 0, and any attempt to access an element would result in a RangeError.
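
For the extern(C) string case, a minimal sketch using Phobos's `std.string.toStringz` and C's `strlen` from `core.stdc.string`. It assumes that `toStringz` returns a pointer to an empty C string (not null) for empty input; the helper `cLength` is hypothetical.

```d
import core.stdc.string : strlen;
import std.string : toStringz;

// Hypothetical helper: the length C code would see for a D string.
// Passing s.ptr directly would be wrong: it may be null for an empty
// string and is not guaranteed to be zero-terminated. toStringz gives
// C a zero-terminated string instead.
size_t cLength(string s)
{
    return strlen(toStringz(s));
}

void main()
{
    assert(cLength(null) == 0); // assumes toStringz returns "" (not null) here
    assert(cLength("") == 0);
    assert(cLength("abc") == 3);
}
```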

And to make matters worse, precisely because there's no real reason to care about null with arrays, when someone checks an array implicitly in an if condition, they often think that they're testing for non-empty when they're actually testing for non-null. So, while it's already a code smell to see `if(arr !is null)`, from what I've seen, the odds are extremely high that `if(arr)` is just wrong, because it's not doing what the programmer intended.

There's just no good reason to do it, because it's routinely misunderstood - and that's on top of the fact that `if(arr !is null)` is almost certainly the wrong behavior anyway. Outside of very rare cases, D code should not care whether an array is null or empty, because there is normally no need to maintain that distinction, and even trying to maintain it in one section of code is likely to cause problems at some point, if nothing else because none of the code interacting with it will make that distinction.

- Jonathan M Davis

November 25

Re: Just a friendly reminder about using arrays in boolean conditions

Posted by Jonathan M Davis
in reply to Walter Bright

On Monday, November 25, 2024 1:57:36 AM MST Walter Bright via Digitalmars-d wrote:
> I neglected to mention that:
>
> ```
> if (a)
> ```
>
> is equivalent to:
>
> ```
> if (a.ptr && a.length)
> ```

The core problem is that ptr is checked at all. Whether it's null or not is absolutely irrelevant to almost all D code. All that matters is length, because if length is 0, then you cannot dereference any elements, because there are none, and if length is non-zero, then there are elements there to be dereferenced. Either way, the ptr will only be dereferenced if there are elements to dereference, and thus whether it's null or not is irrelevant.

So, fundamentally, the check for null makes no sense even if it would have made sense with a C array, because a C array is just a naked pointer and has no language protections to ensure that you don't dereference it when it doesn't have elements. D arrays have those protections.

And routinely, when programmers use `if(arr)`, they seem to expect it to behave like `if(arr.length)` - to the point that even if checking for null arrays did make sense, I'd still consider `if(arr)` to be a code smell. It's misunderstood so frequently that, unless the code was written by a D expert, I cannot trust that the programmer understood what they were doing, and even with a D expert, I would wonder if they'd made a mistake, since we all do that from time to time.

I started this thread precisely because I ran into a bug at work that was caused by this misunderstanding - and it was written by someone who has worked on Phobos before. So, I figured that I should remind folks of the issue in the hopes that the mistake would be made less often - though ideally, we'd just deprecate using arrays directly in boolean conditions, because the current behavior is too error-prone.

- Jonathan M Davis
