Just a friendly reminder about using arrays in boolean conditions
November 17
Just a friendly reminder with regards to arrays and using them in if conditions, assertions, etc. - the language unfortunately checks that they're non-null - like pointers - rather than non-empty like many people seem to expect.

I just had to fix a bug at work that was caused by someone using strings in
boolean conditions where the code was clearly supposed to be checking for
empty strings - e.g. `if(!str)` and `str ? result1 : result2` - and it broke
when some code started giving it empty strings instead of null strings.

From what I've seen in discussions on this in the past, many people
misunderstand how this works and think that using arrays as boolean conditions checks for non-empty when in fact it checks whether the ptr field is non-null. This matches the behavior for pointers but is almost never what you actually want to do with arrays, because it's almost never a good idea to distinguish between null arrays and empty arrays, since it's quite error-prone (e.g. `"".idup is null` is true). And to make matters worse, because it's so frequently misunderstood, even if you do have a use case where it makes sense to check for non-null, it's almost never a good idea to do it implicitly instead of using !is null simply because without being explicit about it, there's pretty much no way that anyone else reading the code is going to know that the author actually meant to check for non-null instead of misunderstanding how the language feature works, since the odds are much higher that they misunderstood.
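
A minimal sketch of the difference described above. The helper names `implicitCheck` and `hasContent` are made up for illustration; the only language behavior relied on is that an array in a boolean condition tests its ptr field and that a string literal, even an empty one, has a non-null ptr.

```d
// What `if (s)` decides: true when s.ptr is non-null, whatever the length is.
bool implicitCheck(string s)
{
    return s ? true : false;
}

// The explicit test that the code usually means: is there any content?
bool hasContent(string s)
{
    return s.length != 0;
}

void main()
{
    string nullStr = null;             // ptr is null, length is 0
    string emptyLiteral = "";          // length is 0, but ptr is non-null
    string text = "hello";
    string emptySlice = text[0 .. 0];  // length is 0, ptr points into text

    // The implicit check only separates null from everything else.
    assert(!implicitCheck(nullStr));
    assert(implicitCheck(emptyLiteral));
    assert(implicitCheck(emptySlice));
    assert(implicitCheck(text));

    // The length check treats all three length-0 strings the same way.
    assert(!hasContent(nullStr));
    assert(!hasContent(emptyLiteral));
    assert(!hasContent(emptySlice));
    assert(hasContent(text));
}
```

Code written as `if (!str)` behaves like `!implicitCheck(str)`, so it stops reporting "empty" as soon as a caller passes `""` or a zero-length slice instead of `null`.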

Personally, I'd love to see using arrays in boolean conditions simply be deprecated and then made an error (and maybe changed to check for empty at some point in the future if we want to), since it's often used incorrectly, and even when it is used correctly, it's pretty much always a good idea to do an explicit check instead of an implicit one so that it's clear to anyone else reading the code what you meant to check for.

In any case, given that I just ran into this issue in production code, and that it's something that seems to be frequently misunderstood, I thought that I'd bring it up as a reminder in the hopes that more people would understand the actual behavior and not make this mistake.

- Jonathan M Davis



November 17
On Sunday, 17 November 2024 at 08:57:35 UTC, Jonathan M Davis wrote:
> Personally, I'd love to see using arrays in boolean conditions simply be deprecated and then made an error (and maybe changed to check for empty at some point in the future if we want to), since it's often used incorrectly, and even when it is used correctly, it's pretty much always a good idea to do an explicit check instead of an implicit one so that it's clear to anyone else reading the code what you meant to check for.

It was deprecated and then un-deprecated - see https://issues.dlang.org/show_bug.cgi?id=4733#c38 for why it was problematic, possible solutions and also a link to a 2015 discussion.
November 17
On Sunday, 17 November 2024 at 16:47:39 UTC, Nick Treleaven wrote:
> [...]
> It was deprecated and then un-deprecated - see https://issues.dlang.org/show_bug.cgi?id=4733#c38 for why it was problematic, possible solutions and also a link to a 2015 discussion.

Problematic is valid code like

   if (auto arr = make_me_an_array ()) {
      ...
   }
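
To make the pattern concrete, here is a small sketch. `firstN` is a hypothetical stand-in for `make_me_an_array`, and both `takesBranch` functions are made up for illustration. The declaration form tests the same thing as a plain `if (arr)`, i.e. whether the ptr field is non-null.

```d
// Hypothetical stand-in for make_me_an_array: returns the first n elements.
int[] firstN(int[] src, size_t n)
{
    return src[0 .. n];
}

// Declaration inside the condition: the branch is taken when arr.ptr is non-null.
bool takesBranchImplicit(int[] src, size_t n)
{
    if (auto arr = firstN(src, n))
        return true;
    return false;
}

// Explicit variant: the branch is taken only when there are elements.
bool takesBranchExplicit(int[] src, size_t n)
{
    auto arr = firstN(src, n);
    if (arr.length != 0)
        return true;
    return false;
}

void main()
{
    int[] data = [1, 2, 3];

    // A zero-length slice of existing data has a non-null ptr.
    assert(takesBranchImplicit(data, 0));
    assert(!takesBranchExplicit(data, 0));

    // A null array has a null ptr and length 0, so both variants agree.
    assert(!takesBranchImplicit(null, 0));
    assert(!takesBranchExplicit(null, 0));

    // With actual elements both variants take the branch.
    assert(takesBranchImplicit(data, 2));
    assert(takesBranchExplicit(data, 2));
}
```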

There is another interesting aspect as Steven pointed out in that
2015 discussion [1]:

   || The "truthiness" of an array says it's true ONLY if both the
   || pointer and length are 0.
   | Ugh, *false* only if they are both 0. If either are not zero,
   | then it's true.

And later he stated [2]

   | arr.ptr == null -> arr contains a null pointer (length could
   | technically be non-zero).

My question is: Is it possible that a valid D program gets into a
state where an array has ptr == null and length > 0? If so, how?

Such a "null array" can at least not be printed:

   import std.stdio;

   void main ()
   {
      int [] arr;

      struct V {
         size_t length;
         void *ptr;
      }
      auto q = cast (V *) &(arr);
      (*q).length = 2;

      writeln (arr); // segfault in formatValueImpl
   }


[1] Re: string <-> null/bool implicit conversion
    https://forum.dlang.org/post/mr54qg$15qj$1@digitalmars.com

[2] https://forum.dlang.org/post/mr72sf$2sc7$1@digitalmars.com


November 17

On Sunday, 17 November 2024 at 21:50:18 UTC, kdevel wrote:

> My question is: Is it possible that a valid D program gets into a
> state where an array has ptr == null and length > 0? If so, how?

Yes, the compiler uses it:

struct S
{
    int x;
}

void main()
{
    auto i = typeid(S).initializer;
    assert(i.ptr is null);
    assert(i.length > 0);
}

For a type that is all 0, the compiler builds initializer to be a null array with a length. This signifies to the runtime that the type is all 0 initializer, but has a specific length. This allows saving binary space by not storing a bunch of 0s.
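
A rough sketch of how code consuming such an initializer could interpret it. This is not the actual druntime implementation; `applyInitializer` and `bytesOf` are hypothetical helpers, and the only assumption is the convention described above: a null ptr with a length means "that many zero bytes".

```d
import core.stdc.string : memcpy, memset;

struct Zeroed  { int x; }      // .init is all zero bits
struct NonZero { int x = 5; }  // .init contains non-zero data

// Hypothetical helper: view a variable as raw bytes.
void[] bytesOf(T)(ref T value)
{
    return (cast(void*) &value)[0 .. T.sizeof];
}

// Hypothetical helper: write the initial state described by `init` into `dest`.
void applyInitializer(void[] dest, const(void)[] init)
{
    assert(dest.length == init.length);
    if (init.ptr is null)
        memset(dest.ptr, 0, dest.length);         // no stored image: fill with zeros
    else
        memcpy(dest.ptr, init.ptr, init.length);  // copy the stored image
}

void main()
{
    Zeroed z;
    z.x = 99;
    applyInitializer(bytesOf(z), typeid(Zeroed).initializer);
    assert(z.x == 0);

    NonZero n;
    n.x = 99;
    applyInitializer(bytesOf(n), typeid(NonZero).initializer);
    assert(n.x == 5);
}
```

The helper gives the right result whichever form the initializer takes, so it does not depend on exactly when the compiler picks the null-pointer form.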

-Steve

November 17
On Sunday, November 17, 2024 9:47:39 AM MST Nick Treleaven via Digitalmars-d wrote:
> On Sunday, 17 November 2024 at 08:57:35 UTC, Jonathan M Davis wrote:
> > Personally, I'd love to see using arrays in boolean conditions simply be deprecated and then made an error (and maybe changed to check for empty at some point in the future if we want to), since it's often used incorrectly, and even when it is used correctly, it's pretty much always a good idea to do an explicit check instead of an implicit one so that it's clear to anyone else reading the code what you meant to check for.
>
> It was deprecated and then un-deprecated - see https://issues.dlang.org/show_bug.cgi?id=4733#c38 for why it was problematic, possible solutions and also a link to a 2015 discussion.

That doesn't really surprise me, and I'm not sure that the situation is really going to be fixed at some point, but IMHO, as things stand, no one should ever use an array in a boolean condition, because it's almost always doing the wrong thing, it's not clear whether the programmer actually knew that they were testing for non-null instead of empty, and really, no one should be testing arrays for null outside of some cases where you're taking the ptr value and passing it to C/C++ (and that code can just check the ptr itself). Trying to distinguish between null arrays and empty arrays is just too bug-prone. But as much as D improves on C/C++, we've made our own share of mistakes in the design of D.

Honestly, if we could change things without caring about backwards compatibility, I'd get rid of null arrays entirely - not get rid of having the ptr field for arrays be null, since that's obviously desirable, but get rid of the symbol null interacting with arrays themselves and the concept that an array itself could be null. The way that D's arrays work allows us to not worry about whether arrays are null (whereas languages where arrays are pointers or classes definitely have to worry about it), but we didn't take it quite far enough, likely at least in part because of how much Walter was used to thinking about arrays as being a pointer thanks to C/C++ when he came up with D dynamic arrays.

So, ideally, you wouldn't be able to initialize arrays with null (you'd need to use [] instead), null wouldn't convert to arrays at all, and it wouldn't be comparable with arrays at all. And then boolean conditions would check for non-empty rather than non-null. The whole mess would be cleaner that way, because then it would be entirely clear that it's empty that matters, not null, and we wouldn't have people trying to treat null arrays as special or get any of the confusion that we currently get with null vs empty. The edge cases where null is treated as special for arrays by either the language or by programmers is just error-prone, and the lower level code that needs to care whether the ptr itself is null would still be able to do that.

However, given how much null currently gets used with arrays, we could never make that change at this point. We might be able to change how arrays interact with boolean conditions to at least improve the situation, but that's probably as far as we'd be able to go even with editions.

- Jonathan M Davis



November 18

On Sunday, 17 November 2024 at 08:57:35 UTC, Jonathan M Davis wrote:

> Just a friendly reminder with regards to arrays and using them in if conditions, assertions, etc. - the language unfortunately checks that they're non-null - like pointers - rather than non-empty like many people seem to expect.
>
> I just had to fix a bug at work that was caused by someone using strings in
> boolean conditions where the code was clearly supposed to be checking for
> empty strings - e.g. `if(!str)` and `str ? result1 : result2`
> - and it broke when some code started giving it empty strings instead of
> null strings.
>
> From what I've seen in discussions on this in the past, many people
> misunderstand how this works and think that using arrays as boolean conditions checks for non-empty when in fact it checks whether the ptr field is non-null. This matches the behavior for pointers but is almost never what you actually want to do with arrays, because it's almost never a good idea to distinguish between null arrays and empty arrays, since it's quite error-prone (e.g. `"".idup is null` is true). And to make matters worse, because it's so frequently misunderstood, even if you do have a use case where it makes sense to check for non-null, it's almost never a good idea to do it implicitly instead of using !is null simply because without being explicit about it, there's pretty much no way that anyone else reading the code is going to know that the author actually meant to check for non-null instead of misunderstanding how the language feature works, since the odds are much higher that they misunderstood.
>
> Personally, I'd love to see using arrays in boolean conditions simply be deprecated and then made an error (and maybe changed to check for empty at some point in the future if we want to), since it's often used incorrectly, and even when it is used correctly, it's pretty much always a good idea to do an explicit check instead of an implicit one so that it's clear to anyone else reading the code what you meant to check for.

Well I agree, even if that will take years to have .length tested instead of .ptr.

Little story however: I've encountered a case where the explicit check was also wrong.

module runnable;

import std.stdio;

void v(string s)
{
    if (s.length)           writeln("case length :`", s, "`");
    else if (s is null)     writeln("case null :`", s,  "`");
    else                    writeln("case not null but no length:`", s,  "`");
}

void main(string[] args)
{
    v("hello");
    v(null);
    v("");
}

The different semantics between null and "" for strings is well illustrated here I'd say.

November 18

On Monday, 18 November 2024 at 12:20:17 UTC, user1234 wrote:

> On Sunday, 17 November 2024 at 08:57:35 UTC, Jonathan M Davis wrote:
> > [...]
>
> Well I agree, even if that will take years to have .length tested instead of .ptr.
>
> Little story however: I've encountered a case where the explicit check was also wrong.
>
> module runnable;
>
> import std.stdio;
>
> void v(string s)
> {
>     if (s.length)           writeln("case length :`", s, "`");
>     else if (s is null)     writeln("case null :`", s,  "`");
>     else                    writeln("case not null but no length:`", s,  "`");
> }
>
> void main(string[] args)
> {
>     v("hello");
>     v(null);
>     v("");
> }
>
> The different semantics between null and "" for strings is well illustrated here I'd say.

I promised a little story, here it is:

https://gitlab.com/basile.b/harbored-mod/-/commit/5ec14dfe98f91384c1b50e7c50fb4767a297d06f

Quite the exact same problem, just here the author tested against null explicitly.
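
For illustration only (this is not the code from the linked commit): a sketch of why an explicit `is null` test can miss the empty case, with two made-up helpers, `explicitNullCheck` and `explicitEmptyCheck`.

```d
import std.range.primitives : empty;

// Explicit, but still only about the pointer: misses "".
bool explicitNullCheck(string s)
{
    return s is null;
}

// Explicit and about the content: true for both null and "".
bool explicitEmptyCheck(string s)
{
    return s.empty;  // same as s.length == 0 for arrays
}

void main()
{
    string nullStr = null;
    string emptyStr = "";

    assert(explicitNullCheck(nullStr));
    assert(!explicitNullCheck(emptyStr));  // "" is not null

    assert(explicitEmptyCheck(nullStr));
    assert(explicitEmptyCheck(emptyStr));

    // `is` compares ptr and length, `==` compares length and contents.
    assert(emptyStr !is null);
    assert(emptyStr == null);
    assert(nullStr == emptyStr);
}
```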

November 18

On Monday, 18 November 2024 at 12:24:01 UTC, user1234 wrote:

> > void v(string s)
> > {
> >     if (s.length)           writeln("case length :`", s, "`");
> >     else if (s is null)     writeln("case null :`", s,  "`");
> >     else                    writeln("case not null but no length:`", s,  "`");
> > }

One should always first check for null and then for length. This should be immediately clear, as asking for a length doesn't make sense if something is null.
Ok, an array has both a pointer and a length, but I would never expect the length to contain something useful, if the pointer is not assigned a legal address.
If I were to implement arrays, I would use a simple pointer, and length would be the first element of the allocated block. For any object, I have always in mind that it could be implemented in this way, so would never access anything as long as the pointer is not checked first.

November 18

On Monday, 18 November 2024 at 15:39:38 UTC, Dom DiSc wrote:

> If I were to implement arrays, I would use a simple pointer, and length would be the first element of the allocated block.

That wouldn't allow efficient slicing.
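
A small sketch of the slicing point. D slices are a pointer plus a length, so taking a sub-range just produces a new pointer/length pair over the same memory. `sliceLengthPrefixed` is a hypothetical model of the length-in-the-block layout (element 0 holds the length), where a sub-range needs its own block because the length has to sit in front of the data.

```d
// Hypothetical length-prefixed layout: block[0] is the length, elements follow.
size_t[] sliceLengthPrefixed(const(size_t)[] block, size_t lo, size_t hi)
{
    assert(lo <= hi && hi <= block[0]);
    auto result = new size_t[](1 + hi - lo);   // new allocation
    result[0] = hi - lo;                       // length stored up front
    result[1 .. $] = block[1 + lo .. 1 + hi];  // elements must be copied
    return result;
}

void main()
{
    // D slice: no allocation, no copy, shares memory with the original.
    int[] a = [10, 20, 30, 40, 50];
    int[] mid = a[1 .. 4];
    assert(mid.ptr == a.ptr + 1);
    assert(mid.length == 3);
    mid[0] = 21;
    assert(a[1] == 21);

    // Length-prefixed model: the sub-range is an independent copy.
    size_t[] block = [5, 10, 20, 30, 40, 50];
    size_t[] expected = [3, 20, 30, 40];
    auto copy = sliceLengthPrefixed(block, 1, 4);
    assert(copy == expected);
    copy[1] = 99;
    assert(block[2] == 20);
}
```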

November 18

On Monday, 18 November 2024 at 15:39:38 UTC, Dom DiSc wrote:

> length would be the first element of the allocated block. For any object

I'd call that a maxlengtharray and make it not a pointer
