Just a friendly reminder about using arrays in boolean conditions (page 2)

2 days ago

Re: Just a friendly reminder about using arrays in boolean conditions

Posted by Jonathan M Davis
in reply to Dom DiSc

On Monday, November 18, 2024 8:39:38 AM MST Dom DiSc via Digitalmars-d wrote:
> On Monday, 18 November 2024 at 12:24:01 UTC, user1234 wrote:
> >> ```d
> >> void v(string s)
> >> {
> >>     if (s.length)           writeln("case length :`", s, "`");
> >>     else if (s is null)     writeln("case null :`", s, "`");
> >>     else                    writeln("case not null but no length:`", s, "`");
> >> }
> >> ```
>
> One should always first check for null and then for length. This
> should be immediately clear, as asking for a length doesn't make
> sense if something is null.
> Ok, an array has both a pointer and a length, but I would never
> expect the length to contain something useful, if the pointer is
> not assigned a legal address.

It's really the opposite. If the length is 0, there's no need to look at the ptr member, and it could be anything. In the case of [] or null arrays, it's going to be null, and that works perfectly fine in D, because all accesses to the array do bounds checking, so if the length is 0, you will never dereference the pointer no matter what it is. This means that you really don't need to worry at all about the pointer being null.
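
As a rough sketch of that point (the helper `sumAll` is hypothetical, not something from this thread): a null array, `[]`, and an empty slice all have length 0, so ordinary array code behaves the same for each of them and never looks at the pointer.

```d
import std.stdio : writeln;

// Hypothetical helper: sums the elements without ever inspecting ptr.
int sumAll(const(int)[] arr)
{
    int total = 0;
    foreach (x; arr)   // runs arr.length times, so zero times when empty
        total += x;
    return total;
}

void main()
{
    int[] a = null;        // ptr is null, length is 0
    int[] b = [];          // also a null array
    int[] c = [1, 2, 3];
    int[] d = c[0 .. 0];   // empty slice with a non-null ptr

    assert(a.length == 0 && b.length == 0 && d.length == 0);
    assert(a.ptr is null && d.ptr !is null);

    // All three empty arrays behave identically in normal code.
    assert(sumAll(a) == 0 && sumAll(b) == 0 && sumAll(d) == 0);
    assert(a == d);        // == compares length and elements, not ptr

    a ~= 42;               // appending to a null array just allocates
    writeln(a);
}
```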

Outside of cases where you're doing something like passing the ptr field to an extern(C) function, you really shouldn't care at all whether ptr is null, and it's a definite code smell if code does check for null. D's arrays have essentially eliminated the need to worry about null at all, whereas languages that use a pointer for arrays (e.g. C/C++) or use what's essentially a class reference (e.g. Java) have to worry about null, because if the array is null, they can't actually do anything with it. D does not have that problem, because we've put the length on the stack next to the ptr field. That approach also makes it possible to slice an array to get another array, which can be really nice.

> If I were to implement arrays, I would use a simple pointer, and length would be the first element of the allocated block. For any object, I have always in mind that it could be implemented in this way, so would never access anything as long as the pointer is not checked first.

But why would you need to care what the pointer was if the length was 0? There's no reason to ever dereference it in that case. It _really_ matters whether the pointer is null when that pointer is your entire access to the array, but that goes away when you have access to the length without needing to dereference the pointer. At that point, the value of the ptr really only matters when the length is greater than zero, because then you have elements to access via the pointer. But if it's 0, there are no elements, and you can entirely ignore the value of the pointer.

Also, making it so that the length is allocated with the elements would make it so that you couldn't slice arrays. If you wanted a subset of its elements, you'd be forced to either copy the elements or use a wrapper type which had its own length or pair of indices.
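
A small sketch of the slicing point, assuming a hypothetical helper `firstHalf` used only for illustration: because each slice carries its own pointer and length, taking a subset copies no elements, and both views refer to the same memory.

```d
import std.stdio : writeln;

// Hypothetical helper: returns a view of the first half, copying nothing.
int[] firstHalf(int[] arr)
{
    return arr[0 .. $ / 2];
}

void main()
{
    int[] arr = [10, 20, 30, 40];

    // A slice is a new (ptr, length) pair pointing into the same memory.
    int[] half = firstHalf(arr);
    int[] tail = arr[2 .. $];

    assert(half.length == 2 && half.ptr is arr.ptr);
    assert(tail.length == 2 && tail.ptr is arr.ptr + 2);

    half[0] = 99;            // writes through to the original array
    assert(arr[0] == 99);

    writeln(arr);
    writeln(half);
}
```

With the length stored in the allocated block instead, `half` and `tail` could not both describe the same memory with different lengths without some extra wrapper type.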

- Jonathan M Davis

On Monday, November 18, 2024 5:20:17 AM MST user1234 via Digitalmars-d wrote:
> Well I agree, even if that will take years to have .length tested instead of .ptr.
>
> Little story however: I've encountered a case where the explicit check was also wrong.
>
> ```d
> module runnable;
>
> import std.stdio;
>
> void v(string s)
> {
>     if (s.length)           writeln("case length :`", s, "`");
>     else if (s is null)     writeln("case null :`", s, "`");
>     else                    writeln("case not null but no length:`", s, "`");
> }
>
> void main(string[] args)
> {
>     v("hello");
>     v(null);
>     v("");
> }
> ```
>
> The different semantics between `null` and `""` for strings is well illustrated here I'd say.

Yeah, I would consider it a code smell if code checks whether an array is null. It _can_ make sense if you do something like have a function return a null string on failure and a string with data (which could be empty) on success, since then that function is directly returning null. However, then you have to be concerned about the possibility of whether the code that generates the non-null string to return could ever return null instead by accident. So, ultimately, it's just better to use something like Nullable!string instead if you want to distinguish between a string with a value and a string without (and the same with any array type).

However, if for whatever reason, someone _does_ choose to have their code differentiate between null and empty (much as I think that that's a terrible idea), it's better if it's explicit about it rather than doing it implicitly with something like if(arr), because then it's not clear what the programmer was trying to do. It's possible that the programmer was intentionally checking for null, and it's possible that they meant to check for empty (I've seen the latter case more frequently than the former even though that's not what the code actually does). However, if the check is explicit - e.g. if(arr is null) - then it's clear that the programmer intended to check for null. Whether that then causes other problems is another matter, but at least the intention of the code is clear, whereas it isn't if the check is implicit.

So, IMHO, if someone is going to do that check, it should always be explicit - but of course, it would be better to simply not care whether an array is null, because it takes very little to end up with a null array when you didn't intend to (or a non-null empty one when you wanted null). The rest of the language just isn't designed to care about null with arrays.

- Jonathan M Davis
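
A minimal sketch of the Nullable!string suggestion, assuming a hypothetical function `lookup` and made-up keys: the "no value" state is carried by the Nullable wrapper rather than by whether the array happens to be null, and the explicit checks at the end show the unambiguous way to spell out null versus empty if code must distinguish them.

```d
import std.typecons : Nullable, nullable;

// Hypothetical lookup: "no result" is a distinct state from "empty result".
Nullable!string lookup(string key)
{
    if (key == "empty") return nullable("");       // success, empty value
    if (key == "name")  return nullable("Walter"); // success, non-empty value
    return Nullable!string.init;                   // failure: no value at all
}

void main()
{
    assert(lookup("missing").isNull);
    assert(!lookup("empty").isNull && lookup("empty").get.length == 0);
    assert(lookup("name").get == "Walter");

    // If code must distinguish null from empty on a raw array anyway,
    // spelling the check out makes the intent unambiguous.
    string s = null;
    assert(s is null);       // explicitly "is this the null array?"
    assert(s.length == 0);   // explicitly "is this empty?"
    assert("" !is null);     // an empty literal is empty but not null
}
```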

On Sunday, November 17, 2024 3:17:18 PM MST Steven Schveighoffer via Digitalmars-d wrote:
> On Sunday, 17 November 2024 at 21:50:18 UTC, kdevel wrote:
> > My question is: Is it possible that a valid D program gets into a
> > state where an array has ptr == null and length > 0? If so, how?
>
> Yes, the compiler uses it:
>
> ```d
> struct S
> {
>     int x;
> }
>
> void main()
> {
>     auto i = typeid(S).initializer;
>     assert(i.ptr is null);
>     assert(i.length > 0);
> }
> ```
>
> For a type that is all 0, the compiler builds `initializer` to be a null array with a length. This signifies to the runtime that the type is all 0 initializer, but has a specific length. This allows saving binary space by not storing a bunch of 0s.

Does this actually appear in the wild at all, or is this really something that's just restricted to what the compiler is doing? Obviously, based on your example, if you use initializer, you have this problem, but is this something that arrays in general need to worry about?

There's definitely code out there (including in Phobos) which uses ptr to avoid bounds checks when it already knows that the indices it's using are in bounds, and if code like that runs into an array with a null ptr but non-zero length, it's going to segfault. Array code in general just assumes that that's not a thing, and this is the first I've ever heard of it being a thing.

- Jonathan M Davis
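
For illustration only, here is a sketch of the kind of code that concern applies to. The helper `sumUnchecked` is hypothetical and not taken from Phobos; it indexes through ptr to skip bounds checks, which is only safe under the usual assumption that ptr is valid whenever length is non-zero.

```d
// Hypothetical helper: skips bounds checks by indexing through ptr.
// It is only correct if ptr is valid whenever length > 0.
int sumUnchecked(const(int)[] arr) @system
{
    int total = 0;
    foreach (i; 0 .. arr.length)
        total += arr.ptr[i];   // raw pointer indexing, no bounds check
    return total;
}

void main()
{
    assert(sumUnchecked([1, 2, 3]) == 6);
    assert(sumUnchecked(null) == 0);   // length 0, so ptr is never dereferenced

    // An array with ptr == null and length > 0, like the
    // typeid(S).initializer value in the quoted example, would break that
    // assumption: the loop would dereference a null pointer.
}
```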
