Just a friendly reminder about using arrays in boolean conditions (page 3)

On Saturday, 23 November 2024 at 00:57:01 UTC, kdevel wrote:

On Sunday, 17 November 2024 at 22:17:18 UTC, Steven Schveighoffer wrote:

On Sunday, 17 November 2024 at 21:50:18 UTC, kdevel wrote:

My question is: Is it possible that a valid D program gets into a
state where an array has ptr == null and length > 0? If so, how?

Yes, the compiler uses it:

[...]
    auto i = typeid(S).initializer;
[...]

Issue 20722 - typeid(X).initializer() breaks safety
https://issues.dlang.org/show_bug.cgi?id=20722

Indeed, I don't think this should be considered safe. I don't know how much code would break if it were changed to @system.

But this is beside the point that I was trying to make -- that this scenario does actually happen.

You can think of it as a storage of a pointer, which if null means "all are zero", and a length, which is the number of bytes in the initializer.
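A minimal sketch of that encoding, assuming two hypothetical structs (`AllZero`, `NonZero`) and a hypothetical helper `isZeroInit`, none of which come from the thread. It only inspects `ptr` and `length` and never reads through the slice, since reading from the null-ptr form is what crashes:

```d
import std.stdio : writeln;

// Hypothetical types, for illustration only.
struct AllZero { int a; int b; }   // default value is all zero bits
struct NonZero { int a = 7; }      // default value contains a non-zero byte

// True when the slice uses the "null ptr, non-zero length" encoding,
// i.e. "length bytes, all of them zero".
bool isZeroInit(const(void)[] data)
{
    return data.ptr is null && data.length > 0;
}

void main()
{
    auto z = typeid(AllZero).initializer;
    auto n = typeid(NonZero).initializer;

    // Only ptr and length are examined; the bytes are never dereferenced.
    writeln("AllZero: zero-init=", isZeroInit(z), " length=", z.length);
    writeln("NonZero: zero-init=", isZeroInit(n), " length=", n.length);
}
```

Code that receives such a slice would need to branch on `ptr` before copying bytes out of it, for example by zero-filling the destination when `isZeroInit` is true.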

It is used in places where only TypeInfo is available. I believe this is mostly for AA and array runtime, and nothing else. Eventually we will be able to remove these dependencies.

-Steve

On 11/25/2024 1:53 AM, Jonathan M Davis wrote:
> The core problem is that ptr is checked at all. Whether it's null or not is absolutely irrelevant to almost all D code.

It is relevant in the way I use it, as I will often recycle buffers to avoid the free/malloc dance. A non-null pointer tells me it is allocated.

> So, fundamentally, the check for null makes no sense even if it would have made sense with a C array, because a C array is just a naked pointer and has no language protections to ensure that you don't dereference it when it doesn't have elements. D arrays have those protections.

I recycle buffers in C code as well!

BTW, I understand that there can be confusion about what it means. In my own code I'm careful to use `buf.length`.

November 26

Re: Just a friendly reminder about using arrays in boolean conditions

Posted by Jonathan M Davis
in reply to Walter Bright

On Tuesday, November 26, 2024 10:01:15 PM MST Walter Bright via Digitalmars-d wrote:
> On 11/25/2024 1:53 AM, Jonathan M Davis wrote:
> > The core problem is that ptr is checked at all. Whether it's null or not is absolutely irrelevant to almost all D code.
>
> It is relevant in the way I use it, as I will often recycle buffers to avoid the free/malloc dance. A non-null pointer tells me it is allocated.

Well, that's kind of a special case, and it relates specifically to memory management.

D's array operations in general are designed in such a way that they don't care about the difference between null and empty at all, and if you're just using the array operations, there's really no reason to care about null, and it actually becomes error-prone to care.

Where problems tend to crop up is when someone tries to treat null as indicating that an array has no value, whereas non-null empty is a value. This is something that works just fine with pointers, because a null pointer truly has no value, and nothing can be done with it as long as it's null, but it doesn't work very well with arrays. A null array can do all of the same operations that a non-null empty array can. The only real difference as far as the array operations go is that appending to an empty array _might_ cause more memory to be allocated and the ptr field to point to a new address, whereas appending to a null array _will_ cause memory to be allocated and the ptr field to point to a new address.
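A small sketch of that equivalence, assuming a hypothetical helper `appendChangesPtr` that is not part of the thread. It shows that a null array and a non-null empty slice compare equal by value and differ only in identity and in what appending is guaranteed to do:

```d
// Hypothetical helper: reports whether appending one element changed
// the ptr field of the (locally copied) slice.
bool appendChangesPtr(int[] arr)
{
    auto before = arr.ptr;
    arr ~= 42;
    return arr.ptr !is before;
}

void main()
{
    int[] backing = [1, 2, 3];
    int[] nullArr = null;               // ptr is null, length is 0
    int[] emptyArr = backing[0 .. 0];   // ptr is non-null, length is 0

    // The ordinary array operations treat both the same way.
    assert(nullArr.length == 0 && emptyArr.length == 0);
    assert(nullArr == emptyArr);        // equal by value
    foreach (x; nullArr) assert(false); // iterating a null array is fine
    assert((nullArr ~ emptyArr).length == 0);

    // Only identity tells them apart.
    assert(nullArr is null);
    assert(emptyArr !is null);

    // Appending to a null array always has to allocate.
    assert(appendChangesPtr(nullArr));
}
```

No assertion is made about appending to `emptyArr`, since whether that reallocates depends on the runtime's view of the underlying memory block.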

So, code that tries to treat a null array as an array without a value quickly runs into problems. Most D code (including most of the language) simply doesn't make that distinction. All it cares about is whether the array is empty, and a null array has a length of 0, so it's empty. So, you easily run into situations where code will end up with a null array or a non-null empty array when you might have expected the other (or might have had the other prior to some refactoring). And if a piece of code cares about the difference, it's going to be buggy.

In such cases, it's generally better to use a wrapper such as std.typecons.Nullable to indicate the lack of a value rather than using null to indicate that, just like you'd have to do with any non-nullable type.
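As a sketch of that suggestion, assuming a hypothetical lookup function `tagsFor` and a made-up table `db`, `Nullable` can carry the "no value" state separately from the array, so an empty result and a missing one stay distinguishable regardless of `ptr`:

```d
import std.typecons : Nullable, nullable;

// Hypothetical lookup: distinguishes "no entry" from "entry with no tags".
Nullable!(string[]) tagsFor(string[][string] db, string key)
{
    if (auto p = key in db)
        return nullable(*p);            // a value, possibly an empty array
    return Nullable!(string[]).init;    // no value at all
}

void main()
{
    string[][string] db;
    db["a"] = null;     // present, but with no tags (a null array)
    db["b"] = ["x"];    // present, with one tag

    assert(!tagsFor(db, "a").isNull);
    assert(tagsFor(db, "a").get.length == 0);
    assert(tagsFor(db, "b").get == ["x"]);
    assert(tagsFor(db, "missing").isNull);
}
```

Here the entry for `"a"` holds a null array and is still reported as present, which is the distinction a bare null check on the array could not make.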

Now, if you're specifically using null to check whether an array has been allocated, because you're trying to manage memory in some fashion, then null tells you exactly what you need to know. That information is inherent to what null is. So, that's not buggy in the same way. That then of course gets into all of the typical memory management issues (especially with any code that uses malloc and free rather than the GC), but as far as the array operations go, it's a non-issue. They don't care about the difference between null and empty, and they will quite happily allocate new GC memory when an operation requires it no matter what kind of memory the array pointed to prior to that.
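A rough sketch of the buffer-recycling case, assuming a hypothetical `ScratchBuffer` type built on the C allocator. It is not taken from anyone's actual code in the thread; it only illustrates using a null `ptr` as "not allocated yet":

```d
import core.stdc.stdlib : free, realloc;

// Hypothetical helper type, for illustration only.
struct ScratchBuffer
{
    private ubyte[] buf;   // buf.ptr stays null until the first allocation

    // A non-null ptr means the buffer currently owns C-heap memory.
    bool isAllocated() const { return buf.ptr !is null; }

    // Returns n usable bytes, growing the allocation only when needed.
    ubyte[] get(size_t n)
    {
        assert(n > 0, "zero-sized requests are not handled in this sketch");
        if (buf.length < n)
        {
            // realloc(null, n) behaves like malloc(n).
            auto p = cast(ubyte*) realloc(buf.ptr, n);
            assert(p !is null, "out of memory");
            buf = p[0 .. n];
        }
        return buf[0 .. n];
    }

    // Gives the memory back and returns to the "not allocated" state.
    void release()
    {
        free(buf.ptr);   // free(null) is a no-op
        buf = null;
    }
}

void main()
{
    ScratchBuffer sb;
    assert(!sb.isAllocated);

    auto first = sb.get(16);
    assert(sb.isAllocated && first.length == 16);

    // A smaller request reuses the same memory.
    auto second = sb.get(8);
    assert(second.ptr is first.ptr);

    sb.release();
    assert(!sb.isAllocated);
}
```

Slices returned by `get` must not be appended to with `~=`, because that would silently move them into GC memory, as described above.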

Regardless, my point is that because D arrays are designed in such a way that their semantics don't care about null and generally treat null and empty as the same, having code which tries to treat null as special is usually going to result in bugs (in particular when treating it as special has nothing to do with memory management). Either way, I would consider it good practice to be explicit about testing for null vs empty instead of if(arr), because if(arr) is misunderstood so frequently that the odds are very high that the programmer who wrote the code misunderstood what they were actually testing. And even if they didn't, you have no way of knowing that when reading their code. On the other hand, code like if(arr !is null) or if(!arr.empty) is explicit, so it's clear what was intended.
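To make the difference between the explicit tests concrete, here is a short sketch using a hypothetical helper `describe`. It assumes nothing beyond `std.range.primitives.empty` and the `is` operator:

```d
import std.range.primitives : empty;

// Hypothetical helper: spells out which state an array is in.
string describe(const int[] arr)
{
    if (arr is null)
        return "null";
    if (arr.empty)
        return "empty, non-null";
    return "has elements";
}

void main()
{
    int[] backing = [1, 2, 3];
    int[] nullArr = null;
    int[] emptyArr = backing[0 .. 0];   // length 0, ptr non-null

    assert(describe(nullArr) == "null");
    assert(describe(emptyArr) == "empty, non-null");
    assert(describe(backing) == "has elements");

    // The implicit condition behaves like a null check, not an emptiness
    // check: the empty slice passes it even though it has no elements.
    assert(!(nullArr ? true : false));
    assert(emptyArr ? true : false);
    assert(emptyArr.empty);
}
```

The last three assertions are the trap: a reader who takes `if (arr)` to mean "has elements" gets the wrong answer for `emptyArr`.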

> > So, fundamentally, the check for null makes no sense even if it would have made sense with a C array, because a C array is just a naked pointer and has no language protections to ensure that you don't dereference it when it doesn't have elements. D arrays have those protections.
>
> I recycle buffers in C code as well!
>
> BTW, I understand that there can be confusion about what it means. In my own code I'm careful to use `buf.length`.

Honestly, I think that the confusion is great enough that using arrays directly in conditions should just be deprecated, but even if it isn't, I started the thread as a reminder about the behavior of if(arr) in the hopes that more people would be aware of the issue and therefore hopefully write fewer bugs. My experience has been that in almost all cases, if(arr) is a bug.

Personally, about the only time that I use non-boolean values in a condition is with pointers when declaring a variable, e.g.

    if(auto value = key in aa)

but there are other people who do it semi-frequently, and in the case of arrays, it's definitely frequently misunderstood.

- Jonathan M Davis

On Wednesday, 27 November 2024 at 06:46:18 UTC, Jonathan M Davis wrote:
> [...] Either way, I would consider it good practice to be explicit about testing for null vs empty instead of if(arr), because if(arr) is misunderstood so frequently that the odds are very high that the programmer who wrote the code misunderstood what they were actually testing. And even if they didn't, you have no way of knowing that when reading their code. On the other hand, code like
> [line breaks inserted]
>
>     if(arr !is null)

That translates into broken English "if array not is null". More idiomatic and more readable is

    if (arr.ptr)

But wait, that's not the same!

    import std;

    void main ()
    {
       auto arr = typeid (int).initializer;
       writeln (arr ! is null); // true
       writeln (arr.ptr); // null
       writeln (arr.length); // 4
       writeln (arr.empty); // false
       writeln (arr ? true : false); // true
       writeln (arr); // segfault
    }

It appears to me that

    if (arr !is null)

and

    if (arr)

are equivalent.

On Monday, 25 November 2024 at 09:53:13 UTC, Jonathan M Davis wrote:
> The core problem is that ptr is checked at all. Whether it's null or not is absolutely irrelevant to almost all D code. All that matters is length, because if length is 0, then you cannot dereference any elements, because there are none, and if length is non-zero, then there are elements there to be dereferenced. Either way, the ptr will only be dereferenced if there are elements to dereference, and thus whether it's null or not is irrelevant.

IMO there is a stronger case. When converting to bool, we typically use value rather than identity, everywhere in the language, except for arrays.

Checking the pointer of the array is an identity operation. Checking the length and/or the content is a value operation. This is the correct one.

This null check for arrays is a major footgun. And it is easy to check for when wanted using the `is` operator.
