empty arrays - no complaints? - D Programming Language Discussion Forum

June 27, 2004

empty arrays - no complaints?

Posted by Farmer

Why are there (almost) no complaints about D's support for empty arrays?


Just to get ex-BASIC programmers in touch with this aspect of D arrays,
here's a (not so) small D sample that shows how to create
   a)null arrays (named: null1, null2, null3)
   b)empty arrays (named: empty1, empty2, empty3)
and also shows how they differ.

[D arrays have sooooo obvious semantics that D programmers should feel free to skip to the end of this post and read the conclusion.]


--------------------- array sample code ---------------------


void printTraits(char[] array, char[] name)
{

   printf("\n%10.*s%-13.*s", name, ".length == 0");
   if (array.length == 0)
      printf("%10.*s","is true");
   else
      printf("%10.*s","is false");

   printf("%10.*s%-13.*s", name, " is null");
   if (array is null)
      printf("%10.*s","is true");
   else
      printf("%10.*s","is false");

   printf("\n%10.*s%-13.*s", name, " == null");
   if (array == null)
      printf("%10.*s","is true");
   else
      printf("%10.*s","is false");

   printf("%10.*s%-13.*s", name, " == \"\"");
   if (array == "")
      printf("%10.*s","is true");
   else
      printf("%10.*s","is false");
}


int main(char args[][])
{
   char[] empty1=(new char[1])[0..0];
   char[] empty2="1"[1..1];   // empty2="1"[2..2]  causes ArrayBoundsError
   char[] empty3="";

   char[] null1;
   char[] null2=new char[0];
   char[] null3=empty1;
   null3.length=0;

   printTraits(null1, "null1");
   printTraits(null2, "null2");
   printTraits(null3, "null3");
   printf("\n");
   printTraits(empty1, "empty1");
   printTraits(empty2, "empty2");
   printTraits(empty3, "empty3");
   printf("\n\n");
   if (null1 == null)
      printf("%20.*s","null1 == null   ");
   if (empty1 == null1)
      printf("%20.*s","empty1 == null1  ");
   if (empty1 != null)
      printf("%20.*s","but  empty1 != null");
   printf("\n");

   return 0;
}


Built with DMD 0.93 (Windows), the output is:

     null1.length == 0    is true     null1 is null        is true
     null1 == null        is true     null1 == ""          is true
     null2.length == 0    is true     null2 is null        is true
     null2 == null        is true     null2 == ""          is true
     null3.length == 0    is true     null3 is null        is true
     null3 == null        is true     null3 == ""          is true

    empty1.length == 0    is true    empty1 is null       is false
    empty1 == null       is false    empty1 == ""          is true
    empty2.length == 0    is true    empty2 is null       is false
    empty2 == null       is false    empty2 == ""          is true
    empty3.length == 0    is true    empty3 is null       is false
    empty3 == null       is false    empty3 == ""          is true

    null1 == null      empty1 == null1   but  empty1 != null


--------------------- end of array sample ---------------------



Conclusion: D does have empty-arrays and null-arrays but the language tries to blur them.

This is unfortunate as

1) a clear separation of empty-arrays vs. null-arrays is useful for functionally rich but simple API interfaces:

Imagine a function that returns the value of an attribute of an XML element: char[] getAttrValue(char[] name)

The attribute value could be nonexistent (the attribute doesn't exist), be empty, or have a non-empty value.

If empty-arrays vs. null-arrays are blurred, the interface gets more bloated:
// additional parameter
char[] getAttrValue(char[] name, out bit isNull)
// additional function, potentially wasting a slot in the VTable
bit hasAttrValue(char[] name)
// additional indirection
Attribute getAttribute(char[] name)
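
For comparison, here is a minimal sketch of the single-function interface that a clear null/empty distinction allows. It is written for a current D compiler rather than DMD 0.93, so it uses `string`. The `Element` struct and its `attrs` table are hypothetical; only the idea of `getAttrValue` returning null for a missing attribute comes from the post. It assumes `is null` tests the array's pointer and length, so the literal `""` (which has a non-null pointer) stays distinct from `null`.

```d
import std.stdio;

// Hypothetical element type; only the getAttrValue idea comes from the post.
struct Element
{
    string[string] attrs;

    // Returns null if the attribute is missing,
    // otherwise its value (which may be empty).
    string getAttrValue(string name)
    {
        if (auto p = name in attrs)
            return *p;
        return null;
    }
}

void main()
{
    Element e;
    e.attrs["id"] = "main";
    e.attrs["class"] = "";   // present, but empty

    assert(e.getAttrValue("id") == "main");
    assert(e.getAttrValue("class") !is null);      // exists
    assert(e.getAttrValue("class").length == 0);   // ...and is empty
    assert(e.getAttrValue("title") is null);       // does not exist
    writeln("ok");
}
```

Callers have to use `is null` here; comparing with `==` looks only at the contents and cannot tell the two cases apart.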


2) Initialization bugs are not detected at runtime.

D has
-null-references for objects
-null for pointers
-NaNs for FP types
-invalid characters for Unicode characters
-guaranteed initialization of structs (constructors are coming soon!)
-and strong typedefs that empower the programmer to define application
specific 'not-initialized' values for integer types

to make a ubiquitous source of bugs easy to spot and fix. But if empty/null arrays are commonly treated as being the same thing, uninitialized arrays will cause subtle bugs here and there.


3) This aspect of array behaviour is not obvious!

Ok, what's obvious is always a moot point. (If I knew what's obvious, I
would write posts about bit vs. bool vs. strong bool types.)
But I know that the array behaviour is definitely not obvious to all D/C/C++
programmers.


So, why doesn't anyone complain?


Farmer.

June 27, 2004

Re: empty arrays - no complaints?

Posted by Sean Kelly
in reply to Farmer

In article <Xns9515C8A3CA1ACitsFarmer@63.105.9.61>, Farmer says...
>
>Conclusion: D does have empty-arrays and null-arrays but the language tries to blur them.

Not really.  I'd rather argue that D tries to make both usable and reduce odd errors resulting from uninitialized arrays.

>This is unfortunate as
>
>1) a clear separation of empty-arrays vs. null-arrays is useful for functionally rich but simple API interfaces:
>
>Imagine a function that returns the value of an attribute of an XML element: char[] getAttrValue(char[] name)
>
>The attribute value could be nonexistent (the attribute doesn't exist), be empty, or have a non-empty value.

I'd say this is an interface or documentation problem, not a language problem.

>2) Initialization bugs are not detected at runtime.

This makes sense in this case.  I don't like the idea of having to distinguish between an initialized array with no elements and an uninitialized array, as both are equivalent IMO.  Further, setting the length property will cause a reallocation for both types of arrays.

>to make a ubiquitous source of bugs easy to spot and fix. But if empty/null arrays are commonly treated as being the same thing, uninitialized arrays will cause subtle bugs here and there.

I believe the opposite would be true.


Sean

June 27, 2004

Re: empty arrays - no complaints?

Posted by Andy Friesen
in reply to Farmer

Farmer wrote:
> Why are there (almost) no complaints about D's support for empty arrays?
> 
> Conclusion: D does have empty-arrays and null-arrays but the language tries  to blur them. 
> 
> This is unfortunate ...
> 
> So, why doesn't anyone complain?

I think the problem is that D arrays almost always behave like reference types, and therefore are almost always treated like reference types.

They aren't.  null arrays *are* empty arrays.

Arrays are value types which consist of a length and a pointer to memory.  Copying and slicing an array creates a brand new array whose data happens to (generally) be memory that is also pointed to by another array.

So!  Rules of thumb:

    1) think of arrays as though they are value types which can be cheaply copied.
    2) use .dup if you need to mutate copies made in this way. (the Copy-on-Write principle)
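
A small sketch of these two rules of thumb, assuming a current D compiler (the variable names are made up for illustration). Plain assignment copies only the length and pointer, while `.dup` copies the data as well:

```d
import std.stdio;

void main()
{
    char[] a = "hello".dup;   // mutable copy of the literal
    char[] b = a;             // copies length + pointer only, not the data
    char[] c = a.dup;         // copies the data too

    b[0] = 'J';               // visible through a: a and b share memory
    c[0] = 'C';               // private to c

    assert(a == "Jello");
    assert(b == "Jello");
    assert(c == "Cello");
    assert(a.ptr is b.ptr);   // same memory
    assert(a.ptr !is c.ptr);  // separate memory
    writeln(a, " ", b, " ", c);
}
```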

 -- andy

June 27, 2004

Re: empty arrays - no complaints?

Posted by Arcane Jill
in reply to Farmer

In article <Xns9515C8A3CA1ACitsFarmer@63.105.9.61>, Farmer says...
>
>Why are there (almost) no complaints about D's support for empty arrays?

Actually, I think that D has got it right here. At least mostly. I'm happy with the fact that null counts as an empty array. But I do have SOME gripes. These are:

(1) given that a is an array of length n, the expression a[n..n] gives an array bounds exception, and I don't believe it should. I would prefer that it simply evaluated to an empty string. I've lost count of the number of times I've had to put a special test for this case in various bits of code. It's a fairly normal thing to do, to have a pointer (or index in this case) to the first element BEYOND the last one in which you're interested, and to slice against it. Currently you get the assert if n == a.length. I don't believe it should assert unless n > a.length.

(2) I think it is wrong that the test (a == null) will return true if and only if BOTH the length AND the address are zero. I think, if we're going to have a model in which the statement a = null; will create an empty array, then (a == null) should return true if a /is/ an empty array. That is, only the length should be tested, not the address. (If you want to test both parts, well there's always a === null).

Arcane Jill

June 27, 2004

Re: empty arrays - no complaints?

Posted by Regan Heath
in reply to Arcane Jill

On Sun, 27 Jun 2004 18:58:50 +0000 (UTC), Arcane Jill <Arcane_member@pathlink.com> wrote:
> In article <Xns9515C8A3CA1ACitsFarmer@63.105.9.61>, Farmer says...
>>
>> Why are there (almost) no complaints about D's support for empty arrays?
>
> Actually, I think that D has got it right here. At least mostly. I'm happy with
> the fact that null counts as an empty array. But I do have SOME gripes. These
> are:
>
> (1) given that a is an array of length n, the expression a[n..n] gives an array
> bounds exception, and I don't believe it should. I would prefer that it simply
> evaluated to an empty string. I've lost count of the number of times I've had to
> put a special test for this case in various bits of code. It's a fairly normal
> thing to do, to have a pointer (or index in this case) to the first element
> BEYOND the last one in which you're interested, and to slice against it.
> Currently you get the assert if n == a.length. I don't believe it should assert
> unless n > a.length.

This (now?) works.

void main()
{
	char[] a;
	
	a ~= "1";
	a ~= "2";
	a ~= "3";
	printf("%.*s\n",a[3..3]);
	printf("%.*s\n",a[2..3]);
	printf("%.*s\n",a[1..3]);
	printf("%.*s\n",a[0..3]);
}

> (2) I think it is wrong that the test (a == null) will return true if and only
> if BOTH the length AND the address are zero.

I think this is correct.

> I think, if we're going to have a
> model in which the statement a = null; will create an empty array,

I think this is wrong. a = null should set the data to null and length to 0.
It should *not* create an empty array.

> then (a ==
> null) should return true if a /is/ an empty array. That is, only the length
> should be tested, not the address. (If you want to test both parts, well there's
> always a === null).

We *need* to have *both* null and empty arrays. The reason is pretty simple:
  - null means does not exist
  - empty means exists, but has no value (or empty value)

This is important in situations like the one the original poster mentioned. In my experience, for example, when reading POST input from a web page, you get a string like so:

  Setting1=Regan+Heath&Setting2=&&

when requesting items you might have a function like:

  char[] getFormValue(char[] label);

the code to get the values for the above form might go:

  char[] s;

  s = getFormValue("Setting1"); // s is "Regan Heath"
  s = getFormValue("Setting2"); // s is ""
  s = getFormValue("Setting3"); // s is null

It is important the above code can tell that Setting3 was not passed in the form, so it can decide not to overwrite whatever current value that setting has, whereas it can tell Setting2 was passed and will overwrite the current value with a new blank one.
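
A rough sketch of how such a function could behave, assuming a current D compiler. The signature here is hypothetical: it takes the raw form string as an extra first parameter, which the declaration above does not have, and it only decodes `+` as a space. The point is the three distinct results: a value, an empty string, and null.

```d
import std.array : replace, split;
import std.string : indexOf;

// Hypothetical helper: looks `label` up in a URL-encoded form body.
// Returns null when the label is absent, "" when it is present but blank.
string getFormValue(string form, string label)
{
    foreach (pair; form.split("&"))
    {
        auto eq = pair.indexOf('=');
        if (eq < 0 || pair[0 .. eq] != label)
            continue;               // not a label=value pair, or wrong label
        auto value = pair[eq + 1 .. $];
        // Return the literal "" explicitly so the result has a non-null pointer.
        return value.length ? value.replace("+", " ") : "";
    }
    return null;                    // label was not passed at all
}

void main()
{
    auto form = "Setting1=Regan+Heath&Setting2=&&";

    assert(getFormValue(form, "Setting1") == "Regan Heath");

    auto s2 = getFormValue(form, "Setting2");
    assert(s2 !is null && s2.length == 0);            // passed, but blank

    assert(getFormValue(form, "Setting3") is null);   // not passed
}
```

The caller has to test with `is null`; with `==`, the blank Setting2 and the missing Setting3 would look the same.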


I think the problem with arrays is that a null array should not compare equal to an empty array. In other words, the original post's tests
  null1 == ""
  null1 == empty1

should be false.
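
For reference, a minimal sketch of how these comparisons come out on a current D compiler (not DMD 0.93, whose results are the ones listed in the original post). The variable names are made up. `==` compares contents, so a null array and an empty array still compare equal; only `is` looks at the pointer.

```d
void main()
{
    char[] nullArr;                      // default-initialized: null pointer, length 0
    char[] emptyArr = "1".dup[1 .. 1];   // length 0, pointer into allocated memory

    assert(nullArr.length == 0 && emptyArr.length == 0);

    assert(nullArr is null);             // identity: pointer and length
    assert(emptyArr !is null);

    assert(nullArr == emptyArr);         // contents: both have none
    assert(nullArr == "");
    assert(emptyArr == "");
}
```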

Regan.

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/

June 27, 2004

Re: empty arrays - no complaints?

Posted by Derek
in reply to Regan Heath

On Mon, 28 Jun 2004 10:06:18 +1200, Regan Heath wrote:

[snip]

> 
> We *need* to have *both* null and empty arrays. The reason is pretty
> simple:
>    - null means does not exist
>    - empty means exists, but has no value (or empty value)
> 

Agreed. A nonexistent array is not the same as an array with no elements.

-- 
Derek
Melbourne, Australia

June 27, 2004

Re: empty arrays - no complaints?

Posted by Farmer
in reply to Sean Kelly

Sean Kelly <sean@f4.ca> wrote in news:cbn29h$rpo$1@digitaldaemon.com:


> Not really.  I'd rather argue that D tries to make both usable and reduce odd errors resulting from uninitialized arrays.

I think D tries to *hide* errors resulting from uninitialized arrays.

> 
>>This is unfortunate as
>>
>>1) a clear separation of empty-arrays vs. null-arrays is useful for functionally rich but simple API interfaces:
>>
>>Imagine a function that returns the value of an attribute of an XML element: char[] getAttrValue(char[] name)
>>
>>The attribute value could be nonexistent (the attribute doesn't exist), be empty, or have a non-empty value.
> 
> I'd say this is an interface or documentation problem, not a language problem.

You misunderstood me, I meant that the function interface is a good one.
I could document the function like this:
/*
   Returns the value of the attribute with the given name.
   @param  name  name of the attribute
   @return  returns null if the attribute doesn't exist
            returns value of the attribute otherwise
*/
char[] getAttrValue(char[] name)

But the other functions I mentioned would be a necessary workaround if you couldn't distinguish between null and empty arrays. And these functions are a waste of both CPU cycles and developer brain.


>>2) Initialization bugs are not detected at runtime.
> 
> This makes sense in this case.  I don't like the idea of having to distinguish between an initialized array with no elements and an uninitialized array, as both are equivalent IMO.  Further, setting the length property will cause a reallocation for both types of arrays.

Well, it's quite easy to distinguish between an empty and a null array: An uninitialized array (null array) is a bug in either the programmer's code or in the code of a library. An initialized array (empty array) is a perfectly legal thing.

Why is the idea of distinguishing between a bug and correct program behaviour such an unpleasant thing?


Reallocation occurs if the length is greater than the allocated size. I'm fine with that; the length 'property' is such an oddity that whatever it does, I would call it consistent.

Reallocation is guaranteed not to happen if the new length is less than or equal to the allocated size (Walter said so). Well, except when the new length happens to be 0. Talk about consistency.

June 27, 2004

Re: empty arrays - no complaints?

Posted by Farmer
in reply to Arcane Jill

Arcane Jill <Arcane_member@pathlink.com> wrote in news:cbn5da$vu1$1@digitaldaemon.com:

> In article <Xns9515C8A3CA1ACitsFarmer@63.105.9.61>, Farmer says...
>>
>>Why are there (almost) no complaints about D's support for empty arrays?
> 
> Actually, I think that D has got it right here. At least mostly. I'm happy with the fact that null counts as an empty array. But I do have SOME gripes. These are:
> 
> (1) given that a is an array of length n, the expression a[n..n] gives an array bounds exception, and I don't believe it should. I would prefer that it simply evaluated to an empty string. I've lost count of the number of times I've had to put a special test for this case in various bits of code. It's a fairly normal thing to do, to have a pointer (or index in this case) to the first element BEYOND the last one in which you're interested, and to slice against it. Currently you get the assert if n == a.length. I don't believe it should assert unless n > a.length.

I'm a bit confused, since in my sample the array 'empty2' is created from a slice that points just past the end of the array, and it didn't cause an array bounds exception. Or did you need empty slices that point at arbitrary memory locations?




> (2) I think it is wrong that the test (a == null) will return true if and only if BOTH the length AND the address are zero. I think, if we're going to have a model in which the statement a = null; will create an empty array, then (a == null) should return true if a /is/ an empty array. That is, only the length should be tested, not the address. (If you want to test both parts, well there's always a === null).

I guess the rule here is simple: For value types (as the array handle is one) ==/equals() is exactly the same as ===/is.

But why should we model arrays in a way that makes them less powerful and requires *additional* code to make the model work correctly?



Regards,
   Farmer.

June 27, 2004

Re: empty arrays - no complaints?

Posted by Farmer
in reply to Andy Friesen

Andy Friesen <andy@ikagames.com> wrote in news:cbn3js$tgq$1@digitaldaemon.com:


> 
> I think the problem is that D arrays almost always behave like reference types, and therefore are almost always treated like reference types.

Yes, this is a problem. It is a necessary evil to achieve that outstanding performance. But it is not really related to the topic of null arrays vs. empty arrays, since empty arrays are possible with the D array layout.


> They aren't.  null arrays *are* empty arrays.

No, null arrays are not empty arrays, as my sample proves.


> Arrays are value types which consist of a length and a pointer to memory.  Copying and slicing an array creates a brand new array whose data happens to (generally) be memory that is also pointed to by another array.

I think there's a lapsus: slices *always* point to the same memory as the array from which they were created.


Regards,
   Farmer.

June 27, 2004

Re: empty arrays - no complaints?

Posted by Andy Friesen
in reply to Farmer

Farmer wrote:

> Andy Friesen <andy@ikagames.com> wrote in news:cbn3js$tgq$1@digitaldaemon.com:
> 
>>I think the problem is that D arrays almost always behave like reference types, and therefore are almost always treated like reference types.
> 
> Yes, this is a problem. It is a necessary evil to achieve that outstanding performance. But it is not really related to the topic of null arrays vs. empty arrays, since empty arrays are possible with the D array layout.

Sure, in the same sense that D allows 'empty' integers. :)

>>They aren't.  null arrays *are* empty arrays.
> 
> No, null arrays are not empty arrays, as my sample proves.

Conceptually they are.  If the length is zero, then the data pointer is meaningless.  Testing the data pointer in such a case can be likened to using the result of a division by zero.  Doing things like mathematically 'proving' that 3==5 or that empty!==null is easy when you go into the twilight zone. :)

As an example:

    import std.string;

    char[] permute(char[] c) {
        // mutate that to which the array refers
        c[0] = 'H';
        // mutate the array
        c.length = 4;
        return c;
    }

    int main() {
        char[] c = "hello world!".dup; // copy first: the literal itself may sit in read-only memory
        printf("%s\n", toStringz(c));

        char[] d = permute(c);

        printf("Post-permute\n");
        printf("%s\n", toStringz(c));
        printf("%s\n", toStringz(d));
        return 0;
    }

This program produces the output:

	hello world!
	Hello world!
	Hell

The array is a value type.  The data it points to is not.

>>Arrays are value types which consist of a length and a pointer to memory.  Copying and slicing an array creates a brand new array whose data happens to (generally) be memory that is also pointed to by another array.
> 
> I think there's a lapsus: slices *always* point to the same memory as the array from which they were created.

In my experience, this is true, but I don't know if it *must* be, so I felt obligated to qualify my statement.

 -- andy
