Posted by Jonathan M Davis in reply to Walter Bright
On Tuesday, November 26, 2024 10:01:15 PM MST Walter Bright via Digitalmars-d wrote:
> On 11/25/2024 1:53 AM, Jonathan M Davis wrote:
> > The core problem is that ptr is checked at all. Whether it's null or not is absolutely irrelevant to almost all D code.
>
> It is relevant in the way I use it, as I will often recycle buffers to avoid the free/malloc dance. A non-null pointer tells me it is allocated.
Well, that's kind of a special case, and it relates specifically to memory management.
D's array operations in general are designed in such a way that they don't care about the difference between null and empty at all, and if you're just using the array operations, there's really no reason to care about null, and it actually becomes error-prone to care.
Where problems tend to crop up is when someone tries to treat null as indicating that an array has no value, whereas a non-null empty array is a value. This works just fine with pointers, because a null pointer truly has no value, and nothing can be done with it as long as it's null, but it doesn't work very well with arrays. A null array supports all of the same operations that a non-null empty array does. The only real difference as far as the array operations go is that appending to a non-null empty array _might_ cause more memory to be allocated and the ptr field to point to a new address, whereas appending to a null array _will_ cause memory to be allocated and the ptr field to point to a new address.
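As a rough illustration of the point about null and non-null empty arrays behaving the same, here is a minimal sketch. The helper makeNonNullEmpty is a made-up name for this example and is not something from Phobos.

```d
import std.range.primitives : empty;

// Hypothetical helper: builds a non-null array with zero elements
// by slicing the front of a real allocation.
int[] makeNonNullEmpty()
{
    int[] backing = new int[](3);
    return backing[0 .. 0]; // ptr still refers to the allocation, length is 0
}

void main()
{
    int[] nullArr = null;
    int[] emptyArr = makeNonNullEmpty();

    // The two differ only when you look at identity with null.
    assert(nullArr is null);
    assert(emptyArr !is null);

    // The normal array operations cannot tell them apart.
    assert(nullArr.length == 0 && emptyArr.length == 0);
    assert(nullArr.empty && emptyArr.empty);
    assert(nullArr == emptyArr);

    size_t iterations;
    foreach (x; nullArr)
        ++iterations;
    assert(iterations == 0);

    // Appending to a null array works and allocates.
    nullArr ~= 1;
    assert(nullArr !is null);
    assert(nullArr == [1]);

    // Appending to the non-null empty array works too; whether it
    // reallocates is up to the runtime.
    emptyArr ~= 1;
    assert(emptyArr == [1]);
}
```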
So, code that tries to treat a null array as an array without a value quickly runs into problems. Most D code (including most of the language) simply doesn't make that distinction. All it cares about is whether the array is empty, and a null array has a length of 0, so it's empty. So, you easily run into situations where code will end up with a null array or a non-null empty array when you might have expected the other (or might have had the other prior to some refactoring). And if a piece of code cares about the difference, it's going to be buggy.
In such cases, it's generally better to use a wrapper such as std.typecons.Nullable to indicate the lack of a value rather than using null to indicate that, just like you'd have to do with any non-nullable type.
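A minimal sketch of what that could look like, assuming a made-up lookup function and table for illustration; only Nullable and nullable come from std.typecons.

```d
import std.typecons : Nullable, nullable;

// Hypothetical lookup: "no value" and "a value that happens to be
// empty (or null)" are reported as two distinct results.
Nullable!(int[]) lookup(int[][string] table, string key)
{
    if (auto p = key in table)
        return nullable(*p);       // a value is present, even if it is empty
    return Nullable!(int[]).init;  // no value at all
}

void main()
{
    int[][string] table;
    table["empty"] = null;   // present, but the array itself is null
    table["some"] = [1, 2];

    auto missing = lookup(table, "absent");
    assert(missing.isNull);

    auto present = lookup(table, "empty");
    assert(!present.isNull);            // has a value...
    assert(present.get.length == 0);    // ...which is an empty array

    assert(lookup(table, "some").get == [1, 2]);
}
```

With this approach the caller never has to inspect the array's ptr to find out whether a value exists.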
Now, if you're specifically using null to check whether an array has been allocated, because you're trying to manage memory in some fashion, then null tells you exactly what you need to know. That information is inherent to what null is. So, that's not buggy in the same way. That then of course gets into all of the typical memory management issues (especially with any code that uses malloc and free rather than the GC), but as far as the array operations go, it's a non-issue. They don't care about the difference between null and empty, and they will quite happily allocate new GC memory when an operation requires it no matter what kind of memory the array pointed to prior to that.
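To illustrate the last point, here is a small sketch of appending to a slice of malloc-ed memory. The function name appendMovesOffMalloc is invented for this example, and the sketch assumes the usual druntime behavior that a non-GC block cannot be extended in place.

```d
import core.stdc.stdlib : free, malloc;

// Hypothetical helper: returns true if appending to a slice of
// malloc-ed memory moved the slice to a different block.
bool appendMovesOffMalloc()
{
    enum n = 4;
    int* raw = cast(int*) malloc(n * int.sizeof);
    assert(raw !is null, "malloc failed");
    scope (exit) free(raw);   // the malloc-ed block is still ours to free

    int[] arr = raw[0 .. n];  // slice over non-GC memory
    arr[] = 0;
    assert(arr.ptr is raw);

    // The append copies the elements into newly allocated GC memory;
    // the original malloc-ed block is left untouched.
    arr ~= 42;
    assert(arr.length == n + 1);
    assert(arr[n] == 42);

    return arr.ptr !is raw;
}

void main()
{
    assert(appendMovesOffMalloc());
}
```

After the append, the slice no longer refers to the manually managed buffer, so any bookkeeping based on the old pointer has to be done separately.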
Regardless, my point is that because D arrays are designed in such a way that their semantics don't care about null and generally treat null and empty as the same, having code which tries to treat null as special is usually going to result in bugs (in particular when treating it as special has nothing to do with memory management). Either way, I would consider it good practice to be explicit about testing for null vs empty instead of if(arr), because if(arr) is misunderstood so frequently that the odds are very high that the programmer who wrote the code misunderstood what they were actually testing. And even if they didn't, you have no way of knowing that when reading their code. On the other hand, code like if(arr !is null) or if(!arr.empty) is explicit, so it's clear what was intended.
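For reference, a minimal sketch of how the three spellings differ, based on the behavior described in this thread (if(arr) looking at ptr rather than length). The helper name truthy is made up for the example.

```d
import std.range.primitives : empty;

// Hypothetical helper: reports what if(arr) evaluates to.
bool truthy(int[] arr)
{
    if (arr)
        return true;
    return false;
}

void main()
{
    int[] nullArr = null;
    int[] backing = new int[](3);
    int[] emptyArr = backing[0 .. 0]; // non-null, but has no elements
    int[] full = [1, 2];

    // if(arr) follows the pointer, not the length.
    assert(!truthy(nullArr));
    assert(truthy(emptyArr));   // true even though there is nothing in it
    assert(truthy(full));

    // The explicit spellings say which question is being asked.
    assert(nullArr is null);
    assert(emptyArr !is null);
    assert(nullArr.empty && emptyArr.empty);
    assert(!full.empty);
}
```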
> > So, fundamentally, the check for null makes no sense even if it would have made sense with a C array, because a C array is just a naked pointer and has no language protections to ensure that you don't dereference it when it doesn't have elements. D arrays have those protections.
>
> I recycle buffers in C code as well!
>
> BTW, I understand that there can be confusion about what it means. In my own code I'm careful to use `buf.length`.
Honestly, I think that the confusion is great enough that using arrays directly in conditions should just be deprecated, but even if it isn't, I started the thread as a reminder about the behavior of if(arr) in the hopes that more people would be aware of the issue and therefore hopefully write fewer bugs. My experience has been that in almost all cases, if(arr) is a bug.
Personally, about the only time that I use non-boolean values in a condition is with pointers when declaring a variable, e.g.
    if(auto value = key in aa)
but there are other people who do it semi-frequently, and in the case of arrays, it's definitely frequently misunderstood.
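A short sketch of that pointer idiom with an associative array; the function lookupOr and the sample data are invented for illustration.

```d
// Hypothetical helper: returns the stored value, or a fallback if
// the key is absent. `key in aa` yields a pointer to the value, or
// null when there is no such key, so the condition tests a pointer.
int lookupOr(int[string] aa, string key, int fallback)
{
    if (auto value = key in aa)
        return *value;
    return fallback;
}

void main()
{
    int[string] aa = ["one": 1, "two": 2];

    assert(lookupOr(aa, "two", -1) == 2);
    assert(lookupOr(aa, "three", -1) == -1);

    // The pointer can also be used to update the entry in place.
    if (auto value = "one" in aa)
        *value = 10;
    assert(aa["one"] == 10);
}
```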
- Jonathan M Davis