June 25, 2005
Empty String == Empty Array == Empty Static Array?
Hi there,

I was under the impression that strings (char[]) were strictly compatible with
and equivalent to other D arrays. However, the empty string seems to throw a
monkey wrench to this idea. In addition, static arrays also mess things up.

// Here we go:
char[]  nul = null;
char[]  str = "";
char[]  dyn;
char[0] sta;
char[]  ini = new char[0];

printf("%d", nul.length);
printf("%d", str.length);
printf("%d", dyn.length);
printf("%d", sta.length);
printf("%d", ini.length);
// The above all print 0, so far so good.

printf(nul ? "true" : "false");
printf(str ? "true" : "false");
printf(dyn ? "true" : "false");
printf(sta ? "true" : "false");
printf(ini ? "true" : "false");
// Here we get false, true, false, true, false.

What's up with this? Why are empty strings (which are empty even by .length
accounts) and static arrays different than other empty arrays?

I have a feeling this has got to do with the internal array pointer, but I'm not
sure. At any rate, it's not very semantically intuitive. Is this done on
purpose, or can we look forward to some convergence?

Cheers,
--AJG.

PS: What exactly does passing NULL to an array parameter do?

================================
2B || !2B, that is the question.
June 25, 2005
Re: Empty String == Empty Array == Empty Static Array?
Have a look at the digitalmars.D.bugs NG.
There was recently an extensive thread that discussed this whole
(non-) issue:

http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/4301

Kind regards,
Stefan



June 26, 2005
Re: Empty String == Empty Array == Empty Static Array?
On Sat, 25 Jun 2005 04:57:38 +0000 (UTC), AJG <AJG_member@pathlink.com>  
wrote:
> I was under the impression that strings (char[]) were strictly  
> compatible with
> and equivalent to other D arrays. However, the empty string seems to  
> throw a
> monkey wrench to this idea. In addition, static arrays also mess things  
> up.
>
> // Here we go:
> char[]  nul = null;
> char[]  str = "";
> char[]  dyn;
> char[0] sta;
> char[]  ini = new char[0];
>
> printf("%d", nul.length);
> printf("%d", str.length);
> printf("%d", dyn.length);
> printf("%d", sta.length);
> printf("%d", ini.length);
> // The above all print 0, so far so good.
>
> printf(nul ? "true" : "false");
> printf(str ? "true" : "false");
> printf(dyn ? "true" : "false");
> printf(sta ? "true" : "false");
> printf(ini ? "true" : "false");
> // Here we get false, true, false, true, false.
>
> What's up with this? Why are empty strings (which are empty even by  
> .length accounts) and static arrays different than other empty arrays?
>
> I have a feeling this has got to do with the internal array pointer, but  
> I'm not sure.

I believe so. When you compare an array "reference" to null you end up  
comparing the array data pointer to null. An array is essentially a struct  
in the form:

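struct array {
  size_t length;  // unsigned; the errors below report it as uint on a 32-bit target
  void* data;     // exposed to user code as the .ptr property
}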

(perhaps not using void, but instead the actual data type?)
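
[Editor's sketch] The layout described above can be checked directly through the `.length` and `.ptr` properties. This is a minimal sketch that assumes a current D compiler rather than the 2005 one used in the thread, so the empty literal is declared as `string` (literals are immutable in today's D). The helper name `describe` is hypothetical.

```d
import std.stdio : writefln;

/// Hypothetical helper: prints the two fields every dynamic array carries.
void describe(string name, const(char)[] arr)
{
    writefln("%-4s length=%s ptr is null: %s", name, arr.length, arr.ptr is null);
}

unittest
{
    char[] nul = null;   // explicit null: length 0, null data pointer
    char[] dyn;          // default-initialised: same state as null
    string str = "";     // empty literal: length 0, but the pointer refers to static data

    assert(nul.length == 0 && nul.ptr is null);
    assert(dyn.length == 0 && dyn.ptr is null);
    assert(str.length == 0 && str.ptr !is null);

    // A dynamic array is two machine words: a length and a pointer.
    static assert((char[]).sizeof == size_t.sizeof + (void*).sizeof);

    describe("nul", nul);
    describe("str", str);
}
```

Compiling with `dmd -unittest -main` runs the checks. The non-null pointer of the empty literal is what makes it behave differently from `nul` and `dyn` in a boolean test.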

Note, a static array is special. Its data member cannot be null and its  
length parameter is actually macro replaced upon compilation. eg.

char[5] sta;
char[] dyn;

void main()
{
	int* p = &sta.length;
	int* q = &dyn.length;
}

sta.d(1): constant 5 is not an lvalue
sta.d(6): cannot implicitly convert expression (#5) of type uint* to int*
sta.d(7): dyn.length is not an lvalue
sta.d(7): cannot implicitly convert expression (#dyn.length) of type uint*  
to int*

Errors 1 & 2 show 'sta.length' replaced by '5'.
Errors 3 & 4 are due to .length being a property/getter method call, not  
an int.

I suspect also that the reference refers directly to the data and not to  
an array struct (shown above).

> At any rate, it's not very semantically intuitive. Is this done on
> purpose, or can we look forward to some convergence?

It's a matter of asking for what you really want to know. For example...

  if(arr is null) ;//array never assigned, i.e. non-existent.
  else if (arr.length == 0) ;//array exists, no data present.
  else ;//array exists, has some data.

In many cases "arr.length == 0" is all you care about, in some "arr is  
null" might be important.
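
[Editor's sketch] The three-way test above can be wrapped in a small function. This assumes a current D compiler; the names `ArrayState` and `classify` are hypothetical and only serve the illustration.

```d
/// Hypothetical names for the three states discussed above.
enum ArrayState { unassigned, empty, populated }

/// Asks the three questions in order: null first, then length.
ArrayState classify(const(char)[] arr)
{
    if (arr is null)     return ArrayState.unassigned; // never assigned
    if (arr.length == 0) return ArrayState.empty;      // exists, no data
    return ArrayState.populated;                       // exists, has data
}

unittest
{
    char[] dyn;                                        // default-initialised
    assert(classify(dyn)   == ArrayState.unassigned);
    assert(classify("")    == ArrayState.empty);       // literal has a non-null pointer
    assert(classify("abc") == ArrayState.populated);
}
```

Code that only cares whether there is anything to process can skip the first branch and test `arr.length == 0` alone.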

> PS: What exactly does passing NULL to an array parameter do?

Creates an array (struct shown above) with data set to null and length set  
to 0. This is why .length always 'works' and never gives a segmentation  
fault for a 'null' array.
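
[Editor's sketch] A short illustration of that point, assuming a current D compiler. The function name `countItems` is hypothetical.

```d
/// Hypothetical function that only ever looks at the length.
size_t countItems(const(int)[] items)
{
    // Safe even when the caller passed null: the callee receives
    // a (length 0, null pointer) pair, so .length is simply 0.
    return items.length;
}

unittest
{
    assert(countItems(null) == 0);
    assert(countItems([1, 2, 3]) == 3);
}
```

Reading `.length` never dereferences the data pointer, which is why no null guard is needed here.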

Regan
June 27, 2005
Re: Empty String == Empty Array == Empty Static Array?
Hi Regan,

Thanks a ton for the post! This actually cleared everything up for me. At least
now I know what to expect from the language regarding arrays.

>> PS: What exactly does passing NULL to an array parameter do?
>
>Creates an array (struct shown above) with data set to null and length set  
>to 0. This is why .length always 'works' and never gives a segmentation  
>fault for a 'null' array.

I guess this is a good thing since it prevents functions expecting only an array
(even if empty) from segfaulting on a null. However, it also reduces the
possibilities for the programmer. Since passing null means passing "an empty
array," you can't differentiate between them (safely). Right?

In addition, this is somewhat inconsistent with the way classes and objects are
treated. For a class, you can pass a null, and the function will receive that
null (not an "empty" class, whatever that would be) and it will segfault unless
you guard against it. This is why I was originally confused with the behaviour
of arrays.

Finally, a small suggestion. What if when doing something like:

if (someArray) // Do stuff.

Meant implicitly checking the .length property automagically (instead of
checking the internal pointer)? The former seems more useful to me, since the
concept of a null array no longer exists; it's also universal across all array
types. The latter, on the other hand, seems kind of implementation-ish and
hackish, IMHO.

Anyway, thanks again for the help.
Cheers,
--AJG.




================================
2B || !2B, that is the question.
June 27, 2005
Re: Empty String == Empty Array == Empty Static Array?
On Mon, 27 Jun 2005 05:09:02 +0000 (UTC), AJG <AJG_member@pathlink.com>  
wrote:
> Hi Regan,
>
> Thanks a ton for the post! This actually cleared everything up for me.  
> At least now I know what to expect from the language regarding arrays.
>
>>> PS: What exactly does passing NULL to an array parameter do?
>>
>> Creates an array (struct shown above) with data set to null and length  
>> set
>> to 0. This is why .length always 'works' and never gives a segmentation
>> fault for a 'null' array.
>
> I guess this is a good thing since it prevents functions expecting only  
> an array (even if empty) from segfaulting on a null.

Yep.

> However, it also reduces the
> possibilities for the programmer. Since passing null means passing "an  
> empty array," you can't differentiate between them (safely). Right?

Actually you can still differentiate between a null array and an empty  
array, eg.

if (arr is null) ;//non-existent
else if (arr.length == 0) ;//empty
else ;//has items

However, don't set length to 0 or you'll turn an empty array into a null  
array. I believe this behaviour is broken.

> In addition, this is somewhat inconsistent with the way classes and  
> objects are
> treated. For a class, you can pass a null, and the function will receive  
> that
> null (not an "empty" class, whatever that would be) and it will segfault  
> unless
> you guard against it. This is why I was originally confused with the  
> behaviour
> of arrays.

True. Were it up to me...

> Finally, a small suggestion. What if when doing something like:
>
> if (someArray) // Do stuff.
>
> Meant implicitly checking the .length property automagically (instead of
> checking the internal pointer)?

That has been suggested. IMO it breaks consistency with all other types  
where if (x) compares x with null.

> The former seems more useful to me, since the concept of a null array no  
> longer exists;

But it does; it's just not obvious (maybe that's a good thing?). In general  
cases you do not care about null, only whether arr.length == 0.

Regan
June 27, 2005
Re: Empty String == Empty Array == Empty Static Array?
Regan Heath wrote:
> On Mon, 27 Jun 2005 05:09:02 +0000 (UTC), AJG <AJG_member@pathlink.com>  
>> Finally, a small suggestion. What if when doing something like:
>>
>> if (someArray) // Do stuff.
>>
>> Meant implicitly checking the .length property automagically (instead of
>> checking the internal pointer)?
> 
> 
> That has been suggested. IMO it breaks consistency with all other types  
> where if (x) compares x with null.

So if we can't have it, I'd strongly suggest making 'if (arr)' illegal. 
 Otherwise massive confusion will follow. Currently arrays are treated 
both as classes and structs at the same time. We can do if(arr), 
checking null-ness like with classes but passing arrays to functions 
works as if they were structs - thus we have to use inout to modify ptr 
or length. Where's the consistency ??? :(



-- 
Tomasz Stachowiak  /+ a.k.a. h3r3tic +/
June 27, 2005
Re: Empty String == Empty Array == Empty Static Array?
On Mon, 27 Jun 2005 12:31:32 +0200, Tom S  
<h3r3tic@remove.mat.uni.torun.pl> wrote:
> Regan Heath wrote:
>> On Mon, 27 Jun 2005 05:09:02 +0000 (UTC), AJG <AJG_member@pathlink.com>
>>> Finally, a small suggestion. What if when doing something like:
>>>
>>> if (someArray) // Do stuff.
>>>
>>> Meant implicitly checking the .length property automagically (instead  
>>> of
>>> checking the internal pointer)?
>>   That has been suggested. IMO it breaks consistency with all other  
>> types  where if (x) compares x with null.
>
> So if we can't have it, I'd strongly suggest making 'if (arr)' illegal.

No thanks.

>   Otherwise massive confusion will follow.

Somewhat prophetic?

> Currently arrays are treated both as classes and structs at the same  
> time.

That is because they share properties of both but are in fact neither.  
They are 'arrays'.

> We can do if(arr), checking null-ness like with classes but passing  
> arrays to functions works as if they were structs

The specific nature of arrays makes slicing possible. Slicing is  
incredibly powerful (I think you'll agree). When you slice you create a  
new reference to a new 'struct' where the data pointer refers to the start  
of the data and the length is set accordingly.
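
[Editor's sketch] A small example of what a slice is, assuming a current D compiler. The function name `middle` is hypothetical.

```d
/// Hypothetical helper: everything except the first and last element.
int[] middle(int[] data)
{
    return data[1 .. $ - 1]; // new (length, pointer) pair, same memory
}

unittest
{
    int[] data = [10, 20, 30, 40, 50];
    int[] mid = middle(data);

    assert(mid.length == 3);
    assert(mid.ptr == data.ptr + 1); // points into the original block
    mid[0] = 99;                     // writes through to the original
    assert(data[1] == 99);
}
```

No elements are copied: only the two-word descriptor is new.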

When you pass an array as an 'in' parameter you effectively slice the  
entire array; you basically duplicate the 'struct'. Sure, the default  
could have been to pass the reference, but then you lose the choice to  
pass as it currently does. You still have the choice of passing as a  
reference: simply use 'inout'. It's all about choice.
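
[Editor's sketch] The difference between the two passing modes, assuming a current D compiler. Note that today's D spells the by-reference parameter as `ref`; the `inout` used in this 2005 thread now means something else. The function names are hypothetical.

```d
/// Appends to the callee's own copy of the (length, pointer) pair.
void growCopy(int[] arr)    { arr ~= 42; }

/// Appends through a reference to the caller's array variable.
void growRef(ref int[] arr) { arr ~= 42; }

/// Element writes go through the shared pointer in either mode.
void setFirst(int[] arr)    { arr[0] = 99; }

unittest
{
    int[] a = [1, 2, 3];

    growCopy(a);
    assert(a.length == 3);               // caller's length is untouched

    growRef(a);
    assert(a.length == 4 && a[3] == 42); // caller's array grew

    setFirst(a);
    assert(a[0] == 99);                  // elements are shared, not copied
}
```

So a by-value array parameter copies the descriptor, not the elements: length changes stay local, element changes do not.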

"if (arr)" compares the data pointer, but, it is actually comparing the  
reference at the same time as a null reference has a null data pointer.  
Making it silently compare the length would be confusing IMO.

The only problem I have with arrays is with "arr.length = 0;" setting the  
data pointer to null, this turns an existing/empty array into a  
null/non-existent array.

> - thus we have to use inout to modify ptr or length. Where's the  
> consistency ??? :(

An array is not a class.
An array is not a struct.
An array is a unique type that has properties of both classes and structs.

Regan
June 27, 2005
Re: Empty String == Empty Array == Empty Static Array?
Regan Heath wrote:
> On Mon, 27 Jun 2005 12:31:32 +0200, Tom S  
> <h3r3tic@remove.mat.uni.torun.pl> wrote:
> 
>> Regan Heath wrote:
>>   Otherwise massive confusion will follow.
> 
> 
> Somewhat prophetic?

There is some confusion at the moment. Granted that more people will 
find interest in D, I think I could be a prophet by profession.


>> Currently arrays are treated both as classes and structs at the same  
>> time.
> 
> 
> That is because they share properties of both but are in fact neither.  
> They are 'arrays'.

Yup. I agree with that.


>> We can do if(arr), checking null-ness like with classes but passing  
>> arrays to functions works as if they were structs
> 
> 
> The specific nature of arrays makes slicing possible. Slicing is  
> incredibly powerful (I think you'll agree). When you slice you create a  
> new reference to a new 'struct' where the data pointer refers to the 
> start  of the data and the length is set accordingly.
> 
> When you pass an array as an 'in' parameter you effectively slice the  
> entire array, you basically duplicate the 'struct'. Sure, the default  
> could have been to pass the reference, but then you lose the choice to  
> pass as it currently does. You still have the choice of passing as a  
> reference, simply use 'inout'. It's all about choice.

That is the correct behaviour IMO and I got used to it a long time ago.


> The only problem I have with arrays is with "arr.length = 0;" setting 
> the  data pointer to null, this turns an existing/empty array into a  
> null/non-existent array.

I haven't had a need to differentiate between an empty and a null array, 
so for me 'if (arr)' should be true when either the pointer is null or 
the length is == 0. Thus if (arr) checking ptr makes little sense IMO, 
but for you it's the desired behaviour.
Thus I suggested disallowing implicit conversions from arrays to bools 
in order to make sure the programmer specifies exactly what he/she wants.
Some people will quickly grasp the difference between nullness and 
emptiness of arrays. But others will find their code buggy due to the 
implicit conversion.
How often do you want to check the nullness of an array in your code 
compared to checking its emptiness ? I haven't had a single reason to 
check for nullness yet. Thus disallowing 'if (arr)' seems like the best 
option IMO. Any other solutions ?
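
[Editor's sketch] One way to "specify exactly" without touching the language is to spell the test out. This assumes a current D compiler and uses the `empty` property from Phobos (`std.range.primitives`), which did not exist in this form in 2005. The function name `hasItems` is hypothetical.

```d
import std.range.primitives : empty;

/// Hypothetical helper: true only when there is at least one element,
/// regardless of whether the data pointer is null.
bool hasItems(const(char)[] arr)
{
    return !arr.empty; // equivalent to arr.length != 0
}

unittest
{
    char[] nul = null;
    assert(!hasItems(nul)); // null array: no items
    assert(!hasItems(""));  // empty literal: no items either
    assert(hasItems("x"));
}
```

Here the null array and the empty literal give the same answer, which is the behaviour being asked for in this post.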


>> - thus we have to use inout to modify ptr or length. Where's the  
>> consistency ??? :(
> 
> 
> An array is not a class.
> An array is not a struct.
> An array is a unique type that has properties of both classes and structs.

Agreed


-- 
Tomasz Stachowiak  /+ a.k.a. h3r3tic +/
June 27, 2005
Re: Empty String == Empty Array == Empty Static Array?
On Mon, 27 Jun 2005 17:44:24 +0200, Tom S  
<h3r3tic@remove.mat.uni.torun.pl> wrote:
> Regan Heath wrote:
>> The only problem I have with arrays is with "arr.length = 0;" setting  
>> the  data pointer to null, this turns an existing/empty array into a   
>> null/non-existent array.
>
> I haven't had a need to differentiate between an empty and a null array,  
> so for me 'if (arr)' should be true when either the pointer is null or  
> the length is == 0.

You mean false?

> Thus if (arr) checking ptr makes little sense IMO, but for you it's the  
> desired behaviour.

"if (arr)" does not check the ptr, it checks the reference, comparing it  
to null/0 just like it does for _every other type_ in D. It just happens  
that for arrays a null reference cannot exist, instead you get a reference  
to a struct with a null data pointer, thus the confusion.

> Thus I suggested disallowing implicit conversions from arrays to bools  
> in order to make sure the programmer specifies exactly what he/she wants.

I don't think arrays have a stronger argument for this than any other type  
in D.

> Some people will quickly grasp the difference between nullness and  
> emptiness of arrays. But others will find their code buggy due to the  
> implicit conversion.

People simply need to learn that an array is not a reference, not a  
struct, but a unique type with properties of both. It's a reference to a  
struct and cannot be null. I don't think the implicit conversion is the  
cause of confusion; I think the nature of arrays is. Once that is grasped,  
confusion vanishes.

> How often do you want to check the nullness of an array in your code  
> compared to checking its emptiness ? I haven't had a single reason to  
> check for nullness yet.

I'll admit it's much less frequent than simply wanting to know if there  
are any items or not.

> Thus disallowing 'if (arr)' seems like the best option IMO. Any other  
> solutions ?

Fix "arr.length = 0;" so it does not set the data ptr to null. Nothing  
else needs to be done IMO.

Regan
June 27, 2005
Re: Empty String == Empty Array == Empty Static Array?
Regan Heath wrote:
> On Mon, 27 Jun 2005 17:44:24 +0200, Tom S  
> <h3r3tic@remove.mat.uni.torun.pl> wrote:
> 
>> Regan Heath wrote:
>> I haven't had a need to differentiate between an empty and a null 
>> array,  so for me 'if (arr)' should be true when either the pointer is 
>> null or  the length is == 0.
> 
> 
> You mean false?

Sorry, my bug.


>> Thus if (arr) checking ptr makes little sense IMO, but for you it's 
>> the  desired behaviour.
> 
> 
> "if (arr)" does not check the ptr, it checks the reference, comparing 
> it  to null/0 just like it does for _every other type_ in D. It just 
> happens  that for arrays a null reference cannot exist, instead you get 
> a reference  to a struct with a null data pointer, thus the confusion.

And it happens to make little sense in most cases. Simple operations 
should be simple. 99% of the time I want to check if an array is empty. if 
(foo) then means 'does foo have valid data ? / does foo make any sense ?'.
if (obj)  <-- obj reference is not null ?
if (arr)  <-- array is not empty ?
That's the test I mean by default. Instead by default I'm getting 
something that can only be a source of bugs :/
But it seems that I'm alone with my view on the subject, so I'll just 
shut up and go play with my toys.



>> Thus I suggested disallowing implicit conversions from arrays to 
>> bools  in order to make sure the programmer specifies exactly what he/she 
>> wants.
> 
> 
> I don't think arrays have a stronger argument for this than any other 
> type  in D.

No ? Arrays are the only type in D that can be asked for their members 
(ptr, length) when they are null (when in your reasoning, the nullness 
of the ptr member means nullness of the array reference). This doesn't 
happen to be true with classes nor structs /+ excluding static members +/


>> Some people will quickly grasp the difference between nullness and  
>> emptiness of arrays. But others will find their code buggy due to the  
>> implicit conversion.
> 
> 
> People simply need to learn that an array is not a reference, not a  
> struct, but a unique type with properties of both. It's a reference to 
> a  struct and cannot be null. I don't think the implicit conversion is 
> the  cause of confusion; I think the nature of arrays is. Once that is 
> grasped,  confusion vanishes.

Confusion vanishes, bugs don't. Once one understands the nature of the 
for loop, they won't write code like:

for (....);
{
}

or will they ?  /+ yes, I know that dmd reports an error here +/



-- 
Tomasz Stachowiak  /+ a.k.a. h3r3tic +/