Checking if a string is null (page 4) - D Programming Language Discussion Forum

July 26, 2007

Re: Checking if a string is null

Posted by Oskar Linde
in reply to Derek Parnell

Derek Parnell wrote:

> On Wed, 25 Jul 2007 19:01:57 +0200, Frits van Bommel wrote:
> 
>> Since null arrays have length 0, they *are* empty arrays :P.
> 
> Not in my world. I see that null arrays have no length. That is to say, they
> do not have any length, which is different from saying they have a length
> and that length is zero.

But that is not how T[] behaves in D. T[]s have a dual slice/array nature, with semantics closer to a slice than an array. That is something Walter's T[new] suggestion has the potential to remedy.

There is no difference between a "null" array and a slice starting at memory location null, 0 elements long. In my opinion, it would be quite strange for zero length slices to behave any differently if the starting position happens to be null.
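To make the slice view concrete, here is a minimal sketch in present-day D (D2, which may differ in details from the 2007 D1 compiler discussed here). `isEmptySlice` is an illustrative helper, not a library function. The point is that a null array and a zero-length slice into real memory are both just (ptr, length) pairs with length 0: `==` treats them as equal, and only `is` tells them apart.

```d
// Sketch in present-day D (D2); the 2007 thread concerns D1, so details
// may differ. A T[] is a (ptr, length) pair.

/// Hypothetical helper: true when `s` has no elements, wherever it points.
bool isEmptySlice(T)(const(T)[] s)
{
    return s.length == 0;
}

unittest
{
    int[] nullArr = null;               // (null, 0)
    int[] backing = [1, 2, 3];
    int[] emptySlice = backing[1 .. 1]; // (backing.ptr + 1, 0)

    assert(isEmptySlice(nullArr) && isEmptySlice(emptySlice));
    assert(nullArr == emptySlice);      // == compares elements; neither has any
    assert(nullArr is null);            // `is` compares ptr and length
    assert(emptySlice !is null);        // non-null ptr, so not identical to null
}
```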

There is a very easy way to get the behavior you want BTW:

class Array(T) { ... } :)

>>> All that I would like changed is for the comparison, in the case of length == 0, to check the data pointers, e.g.:
>>>
>>>  > int opEquals(T)(T[] u, T[] v) {
>>>  >     if (u.length != v.length) return false;
>>>       if (u.length == 0) return (u.ptr == v.ptr);
>>>  >     for (size_t i = 0; i < u.length; i++) {
>>>  >         if (u[i] != v[i]) return false;
>>>  >     }
>>>  >     return true;
>>>  > }
>>>
>>> This should mean "" == "" but not "" == null, likewise null == null but not null == "".

This would mean that "two arrays are equal if all elements are equal" would no longer hold. (Consider two zero-length slices at arbitrary memory locations, neither of them null.)
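Oskar's objection can be checked with a sketch of the quoted proposal in present-day D. `ptrAwareEquals` is a hypothetical name for the proposed rule, not something in D or Phobos:

```d
// Hypothetical sketch of the proposed rule: zero-length arrays compare
// equal only when their data pointers match.
bool ptrAwareEquals(T)(const(T)[] u, const(T)[] v)
{
    if (u.length != v.length) return false;
    if (u.length == 0) return u.ptr is v.ptr;   // the proposed special case
    foreach (i; 0 .. u.length)
        if (u[i] != v[i]) return false;
    return true;
}

unittest
{
    int[] a = [1, 2, 3];
    int[] b = [4, 5, 6];
    // Two zero-length slices, neither null, at different addresses:
    assert(a[0 .. 0] == b[0 .. 0]);                 // built-in ==: equal
    assert(!ptrAwareEquals(a[0 .. 0], b[0 .. 0]));  // proposed rule: unequal
}
```

The two comparisons disagree exactly on Oskar's case: element-wise, the slices are equal (there are no elements to differ), while the pointer check says they are not.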

-- 
Oskar

July 26, 2007

Re: Checking if a string is null

Posted by Derek Parnell
in reply to Frits van Bommel

On Thu, 26 Jul 2007 07:50:07 +0200, Frits van Bommel wrote:

> Derek Parnell wrote:
>> On Wed, 25 Jul 2007 14:29:47 +0100, Regan Heath wrote:
>> 
>>> Aside: If the location and length are identical you can short-circuit the compare, returning true and ignoring the content, this could save a bit of time on comparisons of large arrays.
>> 
>> I don't think this is such a good idea. How does one address the array of four bytes at RAM location 4?
> 
> I'm pretty sure the only way to obtain such an array would be to have already invoked Undefined Behavior (assuming 4 is an invalid memory location on the platform the program's running on), and as such it doesn't really matter whether two array references to it compare equal or not...

There is no basis for assuming that any RAM location is not addressable. I know that some operating systems prevent unprivileged programs from accessing certain locations, and that some RAM is hardware-mapped to I/O ports, but in theory, D as a system language should be able to address any RAM location.

For example, if D had been implemented for the Amiga system, access to RAM address 4 would have been vital: that location contained the 32-bit address of the list of all loaded shared libraries, and every program needed to access it.

-- 
Derek Parnell
Melbourne, Australia
"Down with mediocrity!"

July 26, 2007

Re: Checking if a string is null

Posted by Oskar Linde

Manfred Nowak wrote:
> Frits van Bommel wrote
> 
>> But the fact of the matter is, 'T[] x = null;' reserves space for
>> the .length and sets it to 0. If you have a suggestion for a
>> different value to put there, by all means make it.
> 
> Suggestion:
> After `T[] x= null;' `x.length == size_t.max' and `x.ptr == null', i.e. `size_t.max' would no longer be a valid length for an array.

Uhu... Why would a slice of the full addressable memory space be a good initialization value?

> This is a hack to avoid some overhead in some places, but may introduce more overhead in other places.

This entire discussion is trying to make today's T[] -- a slice type with value semantics and some provisions for making it behave as an array in some cases -- into a pure array type with a well defined null. You can't do that without breaking its slice semantics. A much better suggestion is Walter's T[new]. Make T[] remain the slice type it is today and make a distinct array type (preferably a by-reference type).

> Note: after `T[] x= null;' `x' holds an untyped array and so `y= x;' should be a legal assignment for every `y' declared as `U[] y;' for some type `U'---duck and run.

So you are proposing adding runtime type errors? :P

-- 
Oskar

July 26, 2007

Re: Checking if a string is null

Posted by Derek Parnell
in reply to Oskar Linde

On Thu, 26 Jul 2007 08:37:13 +0200, Oskar Linde wrote:

> Manfred Nowak wrote:
>> Frits van Bommel wrote
>> 
>>> But the fact of the matter is, 'T[] x = null;' reserves space for the .length and sets it to 0. If you have a suggestion for a different value to put there, by all means make it.
>> 
>> Suggestion:
>> After `T[] x= null;' `x.length == size_t.max' and `x.ptr == null', i.e.
>> `size_t.max' would no longer be a valid length for an array.
> 
> Uhu... Why would a slice of the full addressable memory space be a good initialization value?

Maybe x.ptr = size_t.max and x.length = size_t.max might be a useful representation of a null array, as it is an illegal RAM reference otherwise. But I know, it's too late now, and probably too expensive at run-time to implement.

>> This is a hack to avoid some overhead in some places, but may introduce more overhead in other places.
> 
> This entire discussion is trying to make today's T[] -- a slice type with value semantics and some provisions for making it behave as an array in some cases -- into a pure array type with a well defined null. You can't do that without breaking its slice semantics. A much better suggestion is Walter's T[new]. Make T[] remain the slice type it is today and make a distinct array type (preferably a by-reference type).

You may very well be correct.

-- 
Derek Parnell
Melbourne, Australia
"Down with mediocrity!"

July 26, 2007

Re: Checking if a string is null

Posted by Frits van Bommel
in reply to Oskar Linde

Oskar Linde wrote:
> Manfred Nowak wrote:
>> Frits van Bommel wrote
>>
>>> But the fact of the matter is, 'T[] x = null;' reserves space for
>>> the .length and sets it to 0. If you have a suggestion for a
>>> different value to put there, by all means make it.
>>
>> Suggestion:
>> After `T[] x= null;' `x.length == size_t.max' and `x.ptr == null', i.e. `size_t.max' would no longer be a valid length for an array.
> 
> Uhu... Why would a slice of the full addressable memory space be a good initialization value?

It's not the *full* addressable memory space for 1-byte types (the last byte of the address space has an address equal to .ptr(0) + .length(size_t.max), which isn't a member of the array) and it's more than the address space for bigger types (though I guess it does indeed cover the entire address space, possibly several times over, due to wraparound on overflow...).
</pedantic>
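Frits's arithmetic can be checked with D's unsigned `size_t`, which wraps modulo 2^N on an N-bit platform. A sketch, with `byteSpan` as a hypothetical helper computing the number of bytes a slice of `count` elements spans:

```d
// Hypothetical helper: bytes spanned by `count` elements of type T,
// computed in size_t arithmetic, which wraps modulo 2^(8 * size_t.sizeof).
size_t byteSpan(T)(size_t count)
{
    return count * T.sizeof;
}

unittest
{
    // 1-byte elements: a slice (null, size_t.max) covers addresses
    // 0 .. size_t.max - 1, so the last byte (address size_t.max) is excluded.
    assert(byteSpan!ubyte(size_t.max) == size_t.max);

    // 4-byte elements: (2^N - 1) * 4 == 2^N - 4 (mod 2^N), i.e. it wraps.
    assert(byteSpan!uint(size_t.max) == size_t.max - 3);
}
```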

July 26, 2007

Re: Checking if a string is null

Posted by Frits van Bommel
in reply to Derek Parnell

Derek Parnell wrote:
> On Thu, 26 Jul 2007 07:50:07 +0200, Frits van Bommel wrote:
> 
>> Derek Parnell wrote:
>>> On Wed, 25 Jul 2007 14:29:47 +0100, Regan Heath wrote:
>>>
>>>> Aside: If the location and length are identical you can short-circuit the compare, returning true and ignoring the content, this could save a bit of time on comparisons of large arrays.
>>> I don't think this is such a good idea. How does one address the array of
>>> four bytes at RAM location 4?
>> I'm pretty sure the only way to obtain such an array would be to have already invoked Undefined Behavior (assuming 4 is an invalid memory location on the platform the program's running on), and as such it doesn't really matter whether two array references to it compare equal or not...
> 
> There is no basis for assuming that any RAM location is not addressable. I
> know that some operating systems prevent unprivileged programs from
> accessing certain locations, and that some RAM is hardware-mapped to I/O
> ports, but in theory, D as a system language should be able to address any
> RAM location.
> 
> For example, if D had been implemented for the Amiga system, access to RAM
> address 4 would have been vital: that location contained the 32-bit address
> of the list of all loaded shared libraries, and every program needed to
> access it.

I'm sorry, but what would then be the problem with accessing (cast(byte*)4)[0..4] if it's a valid memory location?
I thought your question implied it was an invalid memory location, though I'm very aware that's not always the case (which was why I had the parenthesized sentence in there).
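Forming such a slice is harmless in itself: slicing a pointer only records a (ptr, length) pair and reads nothing. A sketch in present-day D, assuming @system code; `sliceAt` is a hypothetical helper, and actually reading `s[0]` would only be valid on a platform where address 4 is mapped (as on the Amiga Derek describes):

```d
// Hypothetical helper: build a slice at a raw address without reading it.
// Slicing a pointer is @system because the compiler cannot check bounds.
ubyte[] sliceAt(size_t address, size_t count) @system
{
    auto p = cast(ubyte*) address;
    return p[0 .. count];   // records (ptr, length); no memory is accessed
}

unittest
{
    auto s = sliceAt(4, 4);
    assert(cast(size_t) s.ptr == 4);
    assert(s.length == 4);
    // No element has been read, so this is fine even where address 4 is unmapped.
}
```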

By the way, null is a valid address on x86 too, but most operating systems don't map the first page to any memory, so that null pointer dereferences generate page faults (and IIRC Linux treats the last page similarly, for null pointers with negative indices). IIRC DOS didn't (and probably couldn't, on machines of the time) do this; the interrupt table was located there (which would seem to be a pretty bad idea for a system without memory protection -- a null pointer write could potentially crash the entire system...).

Also, there's no particular reason null has to be cast(whatever)0, that just happens to be a convenient easily-checked-for value...

July 26, 2007

Re: Checking if a string is null

Posted by Derek Parnell
in reply to Frits van Bommel

On Thu, 26 Jul 2007 09:28:16 +0200, Frits van Bommel wrote:

> Derek Parnell wrote:
>> On Thu, 26 Jul 2007 07:50:07 +0200, Frits van Bommel wrote:
>> 
>>> Derek Parnell wrote:
>>>> On Wed, 25 Jul 2007 14:29:47 +0100, Regan Heath wrote:
>>>>
>>>>> Aside: If the location and length are identical you can short-circuit the compare, returning true and ignoring the content, this could save a bit of time on comparisons of large arrays.

> I'm sorry, but what would then be the problem with accessing (cast(byte*)4)[0..4] if it's a valid memory location?

Duh! I am so stupid! I misread Regan's original post. When he said "If the location and length are identical", I incorrectly read that as "if an array's location and length are identical" and not "if the locations and lengths of two arrays are identical".

Sorry (as he sulks off hoping no one notices) ...

-- 
Derek Parnell
Melbourne, Australia
"Down with mediocrity!"

July 26, 2007

Re: Checking if a string is null

Posted by Regan Heath
in reply to Frits van Bommel

Frits van Bommel wrote:
> Or would you prefer a segfault or diagnostic when accessing (cast(T[])null).length? 

No, definitely not. This is one of the things I love about arrays: they're both value and reference type. It takes a while to get your head round (if the many discussions on these forums are any indication), but once you have it worked out it's quite powerful. In fact, it's the reason slicing can work the way it does.

Further, for those cases where we do not care to differentiate between null and "", checking length == 0 is the perfect solution.

I'm not interested in an array implementation which is 'pure' in any academic sense but rather one which is consistent in that null arrays do not become empty and vice-versa under any conditions (other than explicitly assigning those values).

For example:

In the past, setting length to 0 would free the data pointer. As a result, a zero-length (empty) array became a non-existent (null) array.

And the problem we have now is that calling .dup on an empty array results in a null array.

It is cases like these which I want to remove.

The other thing I want is for == to tell me that null and "" are not the same.

I suspect very little existing code relies on the existing behaviour, as it will likely be checking length as opposed to comparing to "" or null (note: comparing with ==, not checking identity with "is").
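For reference, a sketch of how present-day D (D2) behaves for the distinction Regan is after; the 2007 D1 compiler may have differed, and `isNullString`/`isEmptyString` are illustrative helpers only. `==` compares contents, so null and "" compare equal; `is` compares the (ptr, length) pair and tells them apart; and `length == 0` accepts both:

```d
// Present-day D (D2) behaviour; 2007-era D1 may have differed.
// Illustrative helpers, not library functions.
bool isNullString(const(char)[] s) { return s is null; }       // identity
bool isEmptyString(const(char)[] s) { return s.length == 0; }  // null or ""

unittest
{
    string n = null;
    string e = "";            // literal: non-null ptr, length 0

    assert(n == e);           // contents equal: == cannot tell them apart
    assert(isNullString(n) && !isNullString(e));   // `is` can
    assert(isEmptyString(n) && isEmptyString(e));  // length == 0 covers both
}
```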

Regan

July 26, 2007

Re: Checking if a string is null

Posted by Regan Heath
in reply to Oskar Linde

Oskar Linde wrote:
>>>> This should mean "" == "" but not "" == null, likewise null == null but not null == "".
> 
> This would mean that "two arrays are equal if all elements are equal" would no longer hold. 

Not true; the two arrays you mention below would still compare 'true', as their contents are still equal.

Ignore the suggested code changes; mine was patently incorrect, and the first step is to make it clear what behaviour is desired, something I have obviously not done.

> (Consider two zero length slices at arbitrary
> memory location, neither of them null).

The content of these arrays is equal and would compare so.

The case(s) I want to stop comparing as equal are:

null == ""
"" == null

The cases which should continue to compare equal are:

null == null
"" == ""  (your example above)

No more, no less.

Regan

p.s. I know I said ignore the suggested code changes but it would have to go something like:

if (lhs.length == 0) {
  if (lhs.ptr && rhs.ptr) return true;  //"" == ""
  if (lhs.ptr || rhs.ptr) return false; //"" == null && null == ""
  return true;                          //null == null
}
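The rules listed above can be written out as a complete function. A minimal sketch in present-day D, restricted to `char` arrays for simplicity; `nullAwareEquals` is a hypothetical name, and this is one reading of Regan's rules rather than anything D actually implements:

```d
// Hypothetical sketch: contents decide equality, except that a null array
// never equals a non-null empty one.
bool nullAwareEquals(const(char)[] lhs, const(char)[] rhs)
{
    if (lhs.length != rhs.length) return false;
    if (lhs.length == 0)
        return (lhs.ptr is null) == (rhs.ptr is null); // both null or both non-null
    return lhs == rhs;                                 // ordinary element-wise compare
}

unittest
{
    assert(!nullAwareEquals("", null));   // "" == null  -> false
    assert(!nullAwareEquals(null, ""));   // null == ""  -> false
    assert(nullAwareEquals(null, null));  // null == null
    assert(nullAwareEquals("", ""));      // "" == ""

    string s = "abcdef";
    assert(nullAwareEquals(s[1 .. 1], s[4 .. 4])); // two non-null empty slices
}
```

Oskar's case (two non-null empty slices at different addresses) still compares equal here, since only nullness is checked, not the exact address.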

July 26, 2007

Re: Checking if a string is null

Posted by Regan Heath
in reply to Derek Parnell

Derek Parnell wrote:
> On Wed, 25 Jul 2007 14:29:47 +0100, Regan Heath wrote:
> 
>> Aside: If the location and length are identical you can short-circuit the compare, returning true and ignoring the content, this could save a bit of time on comparisons of large arrays.
> 
> I don't think this is such a good idea. How does one address the array of
> four bytes at RAM location 4?


What I meant was:

if (lhs.length == rhs.length && lhs.ptr == rhs.ptr) return true;

Not:

if (lhs.length == lhs.ptr) return true;

;)
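Regan's short-circuit folded into an element-wise comparison, as a sketch in present-day D; `fastEquals` is a hypothetical helper, not the runtime's actual array comparison:

```d
// Hypothetical helper: if both slices have the same ptr and length they
// view the same memory, so the element loop can be skipped.
bool fastEquals(T)(const(T)[] lhs, const(T)[] rhs)
{
    if (lhs.length != rhs.length) return false;
    if (lhs.ptr is rhs.ptr) return true;   // same memory, same length
    foreach (i; 0 .. lhs.length)
        if (lhs[i] != rhs[i]) return false;
    return true;
}

unittest
{
    auto big = new int[](1_000_000);
    assert(fastEquals(big, big));                     // short-circuits
    assert(fastEquals(big[0 .. 10], big[0 .. 10].dup)); // falls back to the loop
}
```

One caveat: for floating-point elements the shortcut changes results. An element-wise comparison reports an array containing NaN as unequal to itself (since NaN != NaN), while the identity check returns true before any element is compared.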

Regan

Copyright © 1999-2021 by the D Language Foundation