Empty String == Empty Array == Empty Static Array? - D Programming Language Discussion Forum

June 25, 2005

Empty String == Empty Array == Empty Static Array?

Posted by AJG

Hi there,

I was under the impression that strings (char[]) were strictly compatible with and equivalent to other D arrays. However, the empty string seems to throw a monkey wrench into this idea. In addition, static arrays also mess things up.

// Here we go:
char[]  nul = null;
char[]  str = "";
char[]  dyn;
char[0] sta;
char[]  ini = new char[0];

printf("%d", nul.length);
printf("%d", str.length);
printf("%d", dyn.length);
printf("%d", sta.length);
printf("%d", ini.length);
// The above all print 0, so far so good.

printf(nul ? "true" : "false");
printf(str ? "true" : "false");
printf(dyn ? "true" : "false");
printf(sta ? "true" : "false");
printf(ini ? "true" : "false");
// Here we get false, true, false, true, false.

What's up with this? Why are empty strings (which are empty even by .length accounts) and static arrays different than other empty arrays?
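
For readers trying this on a current compiler, here is a minimal sketch of the same experiment, assuming D2 and `std.stdio` (the thread itself predates D2). The helper name `report` is hypothetical. It prints length, pointer nullness and `is null` separately, because the ternary test above looks only at the pointer while `.length` looks only at the length. No expected output is listed, since the results quoted in the thread come from a 2005 compiler.

```d
import std.stdio : writefln;

// Hypothetical helper (not from the thread): prints the three properties
// that the original test mixes together.
void report(string name, const(char)[] a)
{
    writefln("%-4s length=%s  ptr is null=%s  is null=%s",
             name, a.length, a.ptr is null, a is null);
}

void main()
{
    char[] nul = null;
    const(char)[] str = "";   // string literals are immutable in D2
    char[] dyn;               // default-initialized dynamic array

    report("nul", nul);
    report("str", str);
    report("dyn", dyn);
}
```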

I have a feeling this has got to do with the internal array pointer, but I'm not sure. At any rate, it's not very semantically intuitive. Is this done on purpose, or can we look forward to some convergence?

Cheers,
--AJG.

PS: What exactly does passing NULL to an array parameter do?

================================
2B || !2B, that is the question.

June 25, 2005

Re: Empty String == Empty Array == Empty Static Array?

Posted by Stefan
in reply to AJG

Have a look at the digitalmars.D.bugs NG.
There was recently an extensive thread that discussed this whole
(non-) issue:

http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/4301

Kind regards,
Stefan




June 26, 2005

Re: Empty String == Empty Array == Empty Static Array?

Posted by Regan Heath
in reply to AJG

On Sat, 25 Jun 2005 04:57:38 +0000 (UTC), AJG <AJG_member@pathlink.com> wrote:
> I was under the impression that strings (char[]) were strictly compatible with
> and equivalent to other D arrays. However, the empty string seems to throw a
> monkey wrench to this idea. In addition, static arrays also mess things up.
>
> // Here we go:
> char[]  nul = null;
> char[]  str = "";
> char[]  dyn;
> char[0] sta;
> char[]  ini = new char[0];
>
> printf("%d", nul.length);
> printf("%d", str.length);
> printf("%d", dyn.length);
> printf("%d", sta.length);
> printf("%d", ini.length);
> // The above all print 0, so far so good.
>
> printf(nul ? "true" : "false");
> printf(str ? "true" : "false");
> printf(dyn ? "true" : "false");
> printf(sta ? "true" : "false");
> printf(ini ? "true" : "false");
> // Here we get false, true, false, true, false.
>
> What's up with this? Why are empty strings (which are empty even by .length accounts) and static arrays different than other empty arrays?
>
> I have a feeling this has got to do with the internal array pointer, but I'm not sure.

I believe so. When you compare an array "reference" to null you end up comparing the array data pointer to null. An array is essentially a struct in the form:

struct array {
  int length;
  void* data;
}

(perhaps not using void, but instead the actual data type?)
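
As a rough check of the "length plus pointer" picture, here is a small sketch assuming a current D2 compiler, where `.ptr` is typed (`int*` for an `int[]`) and `.length` is a `size_t`. It only illustrates the layout idea; it is not a statement about how the 2005 compiler represented arrays.

```d
void main()
{
    // A dynamic array value occupies two machine words: a length and a pointer.
    static assert((int[]).sizeof == size_t.sizeof + (int*).sizeof);

    int[] a = [10, 20, 30];
    int[] tail = a[1 .. $];          // slicing copies the pair, not the elements

    assert(tail.length == 2);
    assert(tail.ptr == a.ptr + 1);   // same memory, offset by one element

    tail[0] = 21;
    assert(a[1] == 21);              // the element storage is shared
}
```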

Note, a static array is special. Its data member cannot be null and its length parameter is actually macro replaced upon compilation. eg.

char[5] sta;
char[] dyn;

void main()
{
	int* p = &sta.length;
	int* q = &dyn.length;
}

sta.d(1): constant 5 is not an lvalue
sta.d(6): cannot implicitly convert expression (#5) of type uint* to int*
sta.d(7): dyn.length is not an lvalue
sta.d(7): cannot implicitly convert expression (#dyn.length) of type uint* to int*

Errors 1 & 2 show 'sta.length' replaced by '5'.
Errors 3 & 4 are due to .length being a property/getter method call, not an int.
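
The same point can be sketched without provoking compiler errors, assuming a current D2 compiler: a static array's `.length` can be used where a compile-time constant is required, while a dynamic array's `.length` is a runtime property. The name `staLen` is introduced here only for illustration.

```d
char[5] sta;
char[] dyn;

// For a static array, .length is part of the type and known at compile time.
static assert(sta.length == 5);
enum staLen = sta.length;
static assert(is(typeof(sta) == char[staLen]));

void main()
{
    // For a dynamic array, .length is a runtime value of type size_t.
    static assert(is(typeof(dyn.length) == size_t));

    dyn.length = 3;            // assigning through the property resizes the array
    assert(dyn.length == 3);
}
```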

I suspect also that the reference refers directly to the data and not to an array struct (shown above).

> At any rate, it's not very semantically intuitive. Is this done on
> purpose, or can we look forward to some convergence?

It's a matter of asking for what you really want to know. For example...

  if(arr is null) ;//array never assigned, i.e. non-existent.
  else if (arr.length == 0) ;//array exists, no data present.
  else ;//array exists, has some data.

In many cases "arr.length == 0" is all you care about, in some "arr is null" might be important.
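
The three-way test above can be wrapped in a function. This is a minimal sketch assuming D2; the names `ArrayState` and `classify` are hypothetical and not from the thread.

```d
// Hypothetical names, used only to label the three cases from the post.
enum ArrayState { isNull, empty, hasData }

ArrayState classify(const(char)[] arr)
{
    if (arr is null)
        return ArrayState.isNull;     // never assigned
    else if (arr.length == 0)
        return ArrayState.empty;      // exists, but holds no data
    else
        return ArrayState.hasData;    // exists and holds data
}

void main()
{
    assert(classify(null) == ArrayState.isNull);
    assert(classify("") == ArrayState.empty);
    assert(classify("abc") == ArrayState.hasData);
}
```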

> PS: What exactly does passing NULL to an array parameter do?

Creates an array (struct shown above) with data set to null and length set to 0. This is why .length always 'works' and never gives a segmentation fault for a 'null' array.
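
A short sketch of what that means for a callee, assuming D2; `sum` and `wasNull` are hypothetical helpers. The function can read `.length` or loop over a null argument without any guard, and it can still detect the null if it cares to.

```d
// Hypothetical helper: works unchanged for null, empty and populated arrays.
int sum(const(int)[] items)
{
    int total = 0;
    foreach (x; items)      // zero iterations when items is null
        total += x;
    return total;
}

// Hypothetical helper: the callee can still see that null was passed.
bool wasNull(const(int)[] items)
{
    return items is null;
}

void main()
{
    assert(sum(null) == 0);
    assert(sum([1, 2, 3]) == 6);
    assert(wasNull(null));
    assert(!wasNull([1]));
}
```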

Regan

June 27, 2005

Re: Empty String == Empty Array == Empty Static Array?

Posted by AJG
in reply to Regan Heath

Hi Regan,

Thanks a ton for the post! This actually cleared everything up for me. At least now I know what to expect from the language regarding arrays.

>> PS: What exactly does passing NULL to an array parameter do?
>
>Creates an array (struct shown above) with data set to null and length set to 0. This is why .length always 'works' and never gives a segmentation fault for a 'null' array.

I guess this is a good thing since it prevents functions expecting only an array (even if empty) from segfaulting on a null. However, it also reduces the possibilities for the programmer. Since passing null means passing "an empty array," you can't differentiate between them (safely). Right?

In addition, this is somewhat inconsistent with the way classes and objects are treated. For a class, you can pass a null, and the function will receive that null (not an "empty" class, whatever that would be) and it will segfault unless you guard against it. This is why I was originally confused with the behaviour of arrays.

Finally, a small suggestion. What if when doing something like:

if (someArray) // Do stuff.

Meant implicitly checking the .length property automagically (instead of checking the internal pointer)? The former seems more useful to me, since the concept of a null array no longer exists; it's also universal across all array types. The latter, on the other hand, seems kind of implementation-ish and hackish, IMHO.

Anyway, thanks again for the help.
Cheers,
--AJG.




================================
2B || !2B, that is the question.

June 27, 2005

Re: Empty String == Empty Array == Empty Static Array?

Posted by Regan Heath
in reply to AJG

On Mon, 27 Jun 2005 05:09:02 +0000 (UTC), AJG <AJG_member@pathlink.com> wrote:
> Hi Regan,
>
> Thanks a ton for the post! This actually cleared everything up for me. At least now I know what to expect from the language regarding arrays.
>
>>> PS: What exactly does passing NULL to an array parameter do?
>>
>> Creates an array (struct shown above) with data set to null and length set
>> to 0. This is why .length always 'works' and never gives a segmentation
>> fault for a 'null' array.
>
> I guess this is a good thing since it prevents functions expecting only an array (even if empty) from segfaulting on a null.

Yep.

> However, it also reduces the
> possibilities for the programmer. Since passing null means passing "an empty array," you can't differentiate between them (safely). Right?

Actually you can still differentiate between a null array and an empty array, eg.

if (arr is null) ;//non-existent
else if (arr.length == 0) ;//empty
else ;//has items

However, don't set length to 0 or you'll turn an empty array into a null array. I believe this behaviour is broken.

> In addition, this is somewhat inconsistent with the way classes and objects are
> treated. For a class, you can pass a null, and the function will receive that
> null (not an "empty" class, whatever that would be) and it will segfault unless
> you guard against it. This is why I was originally confused with the behaviour
> of arrays.

True. Were it up to me...

> Finally, a small suggestion. What if when doing something like:
>
> if (someArray) // Do stuff.
>
> Meant implicitly checking the .length property automagically (instead of
> checking the internal pointer)?

That has been suggested. IMO it breaks consistency with all other types where if (x) compares x with null.

> The former seems more useful to me, since the concept of a null array no longer exists;

But it does; it's just not obvious (maybe that's a good thing?). In general cases you do not care about null, only whether arr.length == 0.
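
For the common case where only emptiness matters, one option in current D2 is the `empty` property from Phobos (`std.range.primitives`), which for arrays depends only on the length. This is a sketch of that option, not something proposed in the thread.

```d
import std.range.primitives : empty;

void main()
{
    int[] nothing = null;
    string blank = "";
    int[] some = [1];

    // empty ignores the pointer, so a null array and "" are treated alike.
    assert(nothing.empty);
    assert(blank.empty);
    assert(!some.empty);

    // The explicit length test gives the same answers.
    assert(nothing.length == 0 && blank.length == 0 && some.length != 0);
}
```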

Regan

June 27, 2005

Re: Empty String == Empty Array == Empty Static Array?

Posted by Tom S
in reply to Regan Heath

Regan Heath wrote:
> On Mon, 27 Jun 2005 05:09:02 +0000 (UTC), AJG <AJG_member@pathlink.com> wrote:
>> Finally, a small suggestion. What if when doing something like:
>>
>> if (someArray) // Do stuff.
>>
>> Meant implicitly checking the .length property automagically (instead of
>> checking the internal pointer)?
> 
> 
> That has been suggested. IMO it breaks consistency with all other types where if (x) compares x with null.

So if we can't have it, I'd strongly suggest making 'if (arr)' illegal.  Otherwise massive confusion will follow. Currently arrays are treated both as classes and structs at the same time. We can do if(arr), checking null-ness like with classes but passing arrays to functions works as if they were structs - thus we have to use inout to modify ptr or length. Where's the consistency ??? :(

-- 
Tomasz Stachowiak  /+ a.k.a. h3r3tic +/

June 27, 2005

Re: Empty String == Empty Array == Empty Static Array?

Posted by Regan Heath
in reply to Tom S

On Mon, 27 Jun 2005 12:31:32 +0200, Tom S <h3r3tic@remove.mat.uni.torun.pl> wrote:
> Regan Heath wrote:
>> On Mon, 27 Jun 2005 05:09:02 +0000 (UTC), AJG <AJG_member@pathlink.com> wrote:
>>> Finally, a small suggestion. What if when doing something like:
>>>
>>> if (someArray) // Do stuff.
>>>
>>> Meant implicitly checking the .length property automagically (instead of
>>> checking the internal pointer)?
>> That has been suggested. IMO it breaks consistency with all other types where if (x) compares x with null.
>
> So if we can't have it, I'd strongly suggest making 'if (arr)' illegal.

No thanks.

>   Otherwise massive confusion will follow.

Somewhat prophetic?

> Currently arrays are treated both as classes and structs at the same time.

That is because they share properties of both but are in fact neither. They are 'arrays'.

> We can do if(arr), checking null-ness like with classes but passing arrays to functions works as if they were structs

The specific nature of arrays makes slicing possible. Slicing is incredibly powerful (I think you'll agree). When you slice you create a new reference to a new 'struct' where the data pointer refers to the start of the data and the length is set accordingly.

When you pass an array as an 'in' parameter you effectively slice the entire array, you basically duplicate the 'struct'. Sure, the default could have been to pass the reference, but then you lose the choice to pass as it currently does. You still have the choice of passing as a reference, simply use 'inout'. It's all about choice.
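
A minimal sketch of the two passing modes, assuming a current D2 compiler, where by-reference parameters are spelled `ref` (the thread's `inout` is the older spelling). The function names are hypothetical. Element writes go through the copied pointer and are visible to the caller; changes to the length are visible only when the array itself is passed by reference.

```d
// Hypothetical helpers illustrating by-value versus by-reference passing.
void setFirst(int[] a)          { a[0] = 99; }  // writes into shared elements
void appendByValue(int[] a)     { a ~= 4; }     // changes only the local copy
void appendByRef(ref int[] a)   { a ~= 4; }     // changes the caller's array

void main()
{
    int[] arr = [1, 2, 3];

    setFirst(arr);
    assert(arr[0] == 99);       // element change is visible

    appendByValue(arr);
    assert(arr.length == 3);    // caller's length is unchanged

    appendByRef(arr);
    assert(arr.length == 4);    // caller's length has grown
    assert(arr[3] == 4);
}
```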

"if (arr)" compares the data pointer, but, it is actually comparing the reference at the same time as a null reference has a null data pointer. Making it silently compare the length would be confusing IMO.

The only problem I have with arrays is with "arr.length = 0;" setting the data pointer to null; this turns an existing/empty array into a null/non-existent array.

> - thus we have to use inout to modify ptr or length. Where's the consistency ??? :(

An array is not a class.
An array is not a struct.
An array is a unique type that has properties of both classes and structs.

Regan

June 27, 2005

Re: Empty String == Empty Array == Empty Static Array?

Posted by Tom S
in reply to Regan Heath

Regan Heath wrote:
> On Mon, 27 Jun 2005 12:31:32 +0200, Tom S  <h3r3tic@remove.mat.uni.torun.pl> wrote:
> 
>> Regan Heath wrote:
>>   Otherwise massive confusion will follow.
> 
> 
> Somewhat prophetic?

There is some confusion at the moment. Granted that more people will find interest in D, I think I could be a prophet by profession.

>> Currently arrays are treated both as classes and structs at the same  time.
> 
> 
> That is because they share properties of both but are in fact neither.  They are 'arrays'.

Yup. I agree with that.

>> We can do if(arr), checking null-ness like with classes but passing  arrays to functions works as if they were structs
> 
> 
> The specific nature of arrays makes slicing possible. Slicing is  incredibly powerful (I think you'll agree). When you slice you create a  new reference to a new 'struct' where the data pointer refers to the start  of the data and the length is set accordingly.
> 
> When you pass an array as an 'in' parameter you effectively slice the entire array, you basically duplicate the 'struct'. Sure, the default could have been to pass the reference, but then you lose the choice to pass as it currently does. You still have the choice of passing as a reference, simply use 'inout'. It's all about choice.

That is the correct behaviour IMO and I got used to it a long time ago.

> The only problem I have with arrays is with "arr.length = 0;" setting the data pointer to null; this turns an existing/empty array into a null/non-existent array.

I haven't had a need to differentiate between an empty and a null array, so for me 'if (arr)' should be true when either the pointer is null or the length is == 0. Thus if (arr) checking ptr makes little sense IMO, but for you it's the desired behaviour.
Thus I suggested disallowing implicit conversions from arrays to bools in order to make sure the programmer specifies exactly what he/she wants.
Some people will quickly grasp the difference between nullness and emptiness of arrays. But others will find their code buggy due to the implicit conversion.
How often do you want to check the nullness of an array in your code compared to checking its emptiness ? I haven't had a single reason to check for nullness yet. Thus disallowing 'if (arr)' seems like the best option IMO. Any other solutions ?

>> - thus we have to use inout to modify ptr or length. Where's the  consistency ??? :(
> 
> 
> An array is not a class.
> An array is not a struct.
> An array is a unique type that has properties of both classes and structs.

Agreed

-- 
Tomasz Stachowiak  /+ a.k.a. h3r3tic +/

June 27, 2005

Re: Empty String == Empty Array == Empty Static Array?

Posted by Regan Heath
in reply to Tom S

On Mon, 27 Jun 2005 17:44:24 +0200, Tom S <h3r3tic@remove.mat.uni.torun.pl> wrote:
> Regan Heath wrote:
>> The only problem I have with arrays is with "arr.length = 0;" setting the data pointer to null; this turns an existing/empty array into a null/non-existent array.
>
> I haven't had a need to differentiate between an empty and a null array, so for me 'if (arr)' should be true when either the pointer is null or the length is == 0.

You mean false?

> Thus if (arr) checking ptr makes little sense IMO, but for you it's the desired behaviour.

"if (arr)" does not check the ptr, it checks the reference, comparing it to null/0 just like it does for _every other type_ in D. It just happens that for arrays a null reference cannot exist, instead you get a reference to a struct with a null data pointer, thus the confusion.

> Thus I suggested disallowing implicit conversions from arrays to bools in order to make sure the programmer specifies exactly what he/she wants.

I don't think arrays have a stronger argument for this than any other type in D.

> Some people will quickly grasp the difference between nullness and emptiness of arrays. But others will find their code buggy due to the implicit conversion.

People simply need to learn that an array is not a reference, not a struct, but a unique type with properties of both. It's a reference to a struct and cannot be null. I don't think the implicit conversion is the cause of confusion; I think the nature of arrays is. Once that is grasped, confusion vanishes.

> How often do you want to check the nullness of an array in your code compared to checking its emptiness ? I haven't had a single reason to check for nullness yet.

I'll admit it's much less frequent than simply wanting to know if there are any items or not.

> Thus disallowing 'if (arr)' seems like the best option IMO. Any other solutions ?

Fix "arr.length = 0;" so it does not set the data ptr to null. Nothing else needs to be done IMO.

Regan

June 27, 2005

Re: Empty String == Empty Array == Empty Static Array?

Posted by Tom S
in reply to Regan Heath

Regan Heath wrote:
> On Mon, 27 Jun 2005 17:44:24 +0200, Tom S  <h3r3tic@remove.mat.uni.torun.pl> wrote:
> 
>> Regan Heath wrote:
>> I haven't had a need to differentiate between an empty and a null array,  so for me 'if (arr)' should be true when either the pointer is null or  the length is == 0.
> 
> 
> You mean false?

Sorry, my bug.

>> Thus if (arr) checking ptr makes little sense IMO, but for you it's the  desired behaviour.
> 
> 
> "if (arr)" does not check the ptr, it checks the reference, comparing it  to null/0 just like it does for _every other type_ in D. It just happens  that for arrays a null reference cannot exist, instead you get a reference  to a struct with a null data pointer, thus the confusion.

And it happens to make little sense in most cases. Simple operations should be simple. 99% of time I want to check if an array is empty. if (foo) then means 'does foo have valid data ? / does foo make any sense ?'.
if (obj)  <-- obj reference is not null ?
if (arr)  <-- array is not empty ?
That's the test I mean by default. Instead by default I'm getting something that can only be a source of bugs :/
But it seems that I'm alone with my view on the subject, so I'll just shut up and go play with my toys.

>> Thus I suggested disallowing implicit conversions from arrays to bools in order to make sure the programmer specifies exactly what he/she wants.
> 
> 
> I don't think arrays have a stronger argument for this than any other type in D.

No ? Arrays are the only type in D that can be asked for their members (ptr, length) when they are null (when in your reasoning, the nullness of the ptr member means nullness of the array reference). This doesn't happen to be true with classes nor structs /+ excluding static members +/
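
The contrast drawn here can be sketched as follows, assuming D2; the class name `Widget` is hypothetical. The properties of a null array can be read safely, whereas a null class reference has to be checked before any field is accessed.

```d
// Hypothetical class, used only for the comparison.
class Widget { int size; }

void main()
{
    int[] a = null;
    Widget w = null;

    // Reading the properties of a null array is well defined.
    assert(a.length == 0);
    assert(a.ptr is null);

    // A null class reference must be guarded; w.size would dereference null.
    int size = (w is null) ? 0 : w.size;
    assert(size == 0);
}
```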

>> Some people will quickly grasp the difference between nullness and  emptiness of arrays. But others will find their code buggy due to the  implicit conversion.
> 
> 
> People simply need to learn that an array is not a reference, not a struct, but a unique type with properties of both. It's a reference to a struct and cannot be null. I don't think the implicit conversion is the cause of confusion; I think the nature of arrays is. Once that is grasped, confusion vanishes.

Confusion vanishes, bugs don't. Once one understands the nature of the for loop, they won't write code like:

for (....);
{
}

or will they ?  /+ yes, I know that dmd reports an error here +/

-- 
Tomasz Stachowiak  /+ a.k.a. h3r3tic +/


Copyright © 1999-2021 by the D Language Foundation