July 21, 2005
Hi,

>Making difference between an empty array and a nonexistent one is flaky, if not directly ambiguous, thus D does not do it, as far as i can remember the statement of Walter. Thus if(array) is not ambiguous.

Hm... not only does this distinction exist, it is in fact _very_ much available in D. That's exactly the point Regan has made in some past replies. I'm indifferent towards this distinction, but Regan seems fond of it. Please look at my examples further below.

>And at all, arrays have somewhat pointer-like semantics in D.

No, they do not, IMHO. This is one of the points I've tried to make. Arrays have completely different semantics in D compared to C. In D arrays are first-class objects. They are handled via references, which can't be nulled, they keep their own length, etc. I think this is a good thing. Very different from C.

>One of the reasons is that it seems familiar to C programmers.

Indeed. It seems familiar, and people will misuse it because of that. But then the bogeyman comes and gets them in the form of a weird bug.

Examples of the incongruence (empty _but_ existent array):

# int[0] emptyArray;
# if (emptyArray) writef("See, I'm empty, yet I exist!");
// The statement will print.

// Let's try it again:
# int[] emptyArray = new int[0];
# if (emptyArray) writef("I'm still empty, but non-existant.");
// The statement will *not* print.

// Think about strings:
# string emptyString = "";
# if (emptyString) writef("Empty, yet I exist");
// The statement will *not* print.

Is that last test not a reasonable thing to do? It seems pretty harmless. You want to test for an empty string, an empty array. But you still get true.

But what about this:
# string emptyString = null;
# if (emptyString) writef("Empty, but now I don't exist");
// The statement will print.

Would you say the behaviour I showed above is consistent?
You don't find it a tad, say, ambiguous?
You don't think people will be confused? I certainly was.

>makes the foreach..else syntax suggestion from AJG very unnecessary.

Huh? I don't see how the two things are related. You may have a valid point, but I fail to see the connection.

Cheers,
--AJG.


July 21, 2005
On Thu, 21 Jul 2005 10:02:49 +1000, Derek Parnell <derek@psych.ward> wrote:
>>> I thought that
>>>
>>>    char[] array;
>>>
>>> defined an eight-byte structure in RAM in which the first 4-bytes is the
>>> current length of the array (if it is allocated) and the second 4-bytes
>>> is the address of the array data. Initially all eight bytes are zero.
>>
>> I'd say: it defines a variable 'array' which is a reference to a
>> struct/class like you've described.
>
> Actually that turns out not to be the case. If it was, then 'array' would
> be represented by a 4-byte value which contained the address of the 8-byte
> struct, {uint len, void* ptr}. However, if you look at the generated
> machine code you can see that the 8-byte struct _is_ the 'array'. In other
> words, 'array' is not a reference to a struct/class.
>
> Here is what I found. I compiled this D code ...
>
>   void main()
>   {
>     char[] array;
>     if (array.ptr == null)
>     {
>         array.length = 2;
>     }
>     if (array.length == 3)
>     {
>         array.length = 4;
>     }
>     if (array)
>     {
>         array.length = 5;
>     }
>   }
>
> And this is the generated machine code ...
>
>         assume  CS:__Dmain
>   L0:           enter   8,0
>                 push    EBX
>                 mov     dword ptr -8[EBP],0
>                 mov     dword ptr -4[EBP],0
>                 cmp     dword ptr -4[EBP],0
>                 jne     L29
>                 lea     EAX,-8[EBP]
>                 push    EAX
>                 push    1
>                 push    2
>                 call    near ptr __d_arraysetlength
>                 add     ESP,0Ch
>   L29:          cmp     dword ptr -8[EBP],3
>                 jne     L3F
>                 lea     ECX,-8[EBP]
>                 push    ECX
>                 push    1
>                 push    4
>                 call    near ptr __d_arraysetlength
>                 add     ESP,0Ch
>   L3F:          mov     EDX,-4[EBP]
>                 or      EDX,-8[EBP]
>                 je      L57
>                 lea     EBX,-8[EBP]
>                 push    EBX
>                 push    1
>                 push    5
>                 call    near ptr __d_arraysetlength
>                 add     ESP,0Ch
>   L57:          pop     EBX
>                 leave
>                 ret
>   __Dmain ends
>
> As you can see, the 8-byte struct is reserved in the local stack and
> references to array.ptr and array.length are direct accesses of the stack
> space and not dereferenced via a pointer.

I'll have to take your word for it; my assembler knowledge is non-existent.

I'd call this an "optimisation", and a good one at that.
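
For anyone else who doesn't read assembler, the same layout can be checked from D itself. This is only a minimal sketch, assuming a current D2 compiler, where the two fields are each one machine word (`size_t`-sized) rather than fixed at 4 bytes as in the listing above; `descriptorSize` is a name made up for the example.

```d
import std.stdio;

// Size of a dynamic array variable: one length word plus one data pointer.
enum descriptorSize = (char[]).sizeof;
static assert(descriptorSize == size_t.sizeof + (char*).sizeof);

void main()
{
    char[] array;                 // default-initialised: no data, no length
    assert(array.length == 0);
    assert(array.ptr is null);

    array.length = 2;             // allocates storage for two chars
    assert(array.length == 2);
    assert(array.ptr !is null);

    writeln("descriptor size in bytes: ", descriptorSize);
}
```

On a 32-bit target this should report 8, matching the "enter 8,0" stack reservation in the listing.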

This does not refute the fact that the 'array' variable _behaves_ as a reference type, i.e. it is passed by reference, can have null assigned to it, can be used in "if (array is null)", can be assigned to another reference, and so on. Further, it's described in the docs as an "array reference". So despite the _implementation_ of it, it _behaves_ as a reference type(*).

(*)The only exception, the only thing in which it behaves like a struct is the fact that it cannot be null.

> Furthermore, 'if (array)' is
> equivalent to ...
>
>   if (array.len == 0 || array.ptr == null)

Don't you mean:

if (array.len != 0 || array.ptr != null)

?
Does the assembler above show this?

This:
  if (array.len != 0 || array.ptr != null)

is in fact identical in effect/meaning to:
  if (array.ptr != null)

because length cannot be anything other than 0 when the data pointer is null; in other words, this is impossible:
  if (array.ptr == null && array.len != 0) { /* impossible */ }

note that:
  if (array.ptr != null && array.len == 0) { /* not impossible */ }
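
The equivalence can be checked over the three combinations that are actually reachable. This is a rough sketch, assuming a current D2 compiler and the real property names `.length` and `.ptr`; `testBoth` and `testPtr` are hypothetical helpers, and an empty slice is used to get a non-null pointer with zero length.

```d
import std.stdio;

// The test as written above: either word being non-zero counts.
bool testBoth(const(int)[] a) { return a.length != 0 || a.ptr !is null; }

// The shorter test: data pointer only.
bool testPtr(const(int)[] a)  { return a.ptr !is null; }

void main()
{
    int[] data = [1, 2, 3];

    // null array, empty slice into existing data, non-empty array
    int[][] reachable = [null, data[0 .. 0], data];

    foreach (a; reachable)
    {
        // The two tests agree for every state an array can normally get into.
        assert(testBoth(a) == testPtr(a));
        writeln(a.length, " ", a.ptr !is null, " ", testBoth(a));
    }
}
```

The only state where the two tests would disagree is null pointer with non-zero length, which is the impossible one.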

>> Of course. Which is why changing "if(array)" to test the length breaks
>> logical consistency and is just plain wrong IMO.
>
> I'm not asking for its behavior to be changed, just documented.

Sure. I can appreciate the desire to have things set down explicitly for reference.

Regan

July 21, 2005
On Wed, 20 Jul 2005 23:42:49 +0200, Ilya Minkov wrote:

> My vote is against.
> 
> Derek Parnell schrieb:
>> Does ...
>> 
>>   if (array) ...
>> 
>> test for an empty array or a non-existent array? I can't tell from the syntax. It is thus ambiguous.
>> 
>>   if (array.ptr == null) -- test for a non-existence.
>> 
>>   if (array.length == 0) -- test for emptiness
>> 
>>   if (array) -- test for which?
> 
> Making difference between an empty array and a nonexistent one is flaky, if not directly ambiguous, thus D does not do it, as far as i can remember the statement of Walter. Thus if(array) is not ambiguous.

Maybe in your world, but not in mine.

I have a glass of water. The glass exists and it is not empty. I drink the water. The glass exists and it is empty. I smash the glass. The glass does not exist and it is neither full nor empty because it doesn't exist.

To repeat: Existence and Emptiness are not the same concept.

And as I've just discovered, 'if (array)' tests both the .ptr and the .length properties of the array variable.
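
The glass can be written down using the two explicit tests rather than 'if (array)'. This is a small sketch, assuming a current D2 compiler; the `Glass` enum and `classify` are invented purely for the analogy.

```d
import std.stdio;

enum Glass { smashed, empty, holdsWater }

// Hypothetical helper mapping the two properties onto the three states.
Glass classify(const(char)[] a)
{
    if (a.ptr is null)
        return Glass.smashed;                      // does not exist
    return a.length == 0 ? Glass.empty             // exists, nothing in it
                         : Glass.holdsWater;       // exists, not empty
}

void main()
{
    char[] water = "water".dup;
    writeln(classify(water));          // the full glass
    writeln(classify(water[0 .. 0]));  // drunk: still a glass, nothing in it
    writeln(classify(null));           // smashed: no glass at all
}
```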

-- 
Derek
Melbourne, Australia
21/07/2005 10:10:34 AM

July 21, 2005
Sorry, I got the last two examples backwards. The comments should read
"the statement will print"
and then
"the statement will *not* print"

Not the other way around. The point remains the same, though.
Thanks,
--AJG.


July 21, 2005
On Thu, 21 Jul 2005 00:04:56 +0000 (UTC), AJG <AJG_member@pathlink.com> wrote:
>> Making difference between an empty array and a nonexistent one is flaky,
>> if not directly ambiguous, thus D does not do it, as far as i can
>> remember the statement of Walter. Thus if(array) is not ambiguous.
>
> Hm... not only does this distinction exist, it is in fact _very_ much available
> in D. That's exactly the point Regan has made in some past replies. I'm
> indifferent towards this distinction, but Regan seems fond of it. Please look at
> my examples further below.

It's true.

>> And at all, arrays have somewhat pointer-like semantics in D.
>
> No, they do not, IMHO. This is one of the points I've tried to make. Arrays have
> completely different semantics in D compared to C. In D arrays are first-class
> objects. They are handled via references, which can't be nulled, they keep their
> own length, etc. I think this is a good thing. Very different from C.

The point I'm trying to make is that in D an array can be nulled, and it has meaning, e.g.

char[] p = null;

You're confusing the _implementation_ of arrays with the _behaviour_ of arrays. The above array _reference_ behaves just like any other reference that has been nulled(*), e.g.

if (p is null) { /* true */ }

(*) the exception being that the _implementation_ protects you by ensuring the reference always refers to a valid object. The object's data pointer then mirrors the actual state of the array. In addition, several optimisations go on in the background, removing the actual reference (as Derek has shown in another post), which makes sense as it's never null and thus not required for the _implementation_.

>> One of the reasons is that it seems
>> familiar to C programmers.
>
> Indeed. It seems familiar, and people will misuse it because of that.

How? When you write "if(x)" you're asking is 'x' null or 0. D's answer is perfectly correct in all cases(*).

(*) except for the _BUG_ where you can write:

char[] p = "";
p.length = 0;
if (p) { /* false: length = 0 resets the data pointer to null */ }

> But then
> the bogeyman comes and gets them in the form of a weird bug.
>
> Examples of the incongruence (empty _but_ existent array):
>
> # int[0] emptyArray;
> # if (emptyArray) writef("See, I'm empty, yet I exist!");
> // The statement will print.

This is a static array. Its data pointer can never be null, thus it always exists.
(Nothing incongruous here)

> // Let's try it again:
> # int[] emptyArray = new int[0];
> # if (emptyArray) writef("I'm still empty, but non-existant.");
> // The statement will *not* print.

Here you have not allocated any memory, thus nothing exists.
(Nothing incongruous here)

> // Think about strings:
> # string emptyString = "";
> # if (emptyString) writef("Empty, yet I exist");
> // The statement will *not* print.

Wrong, this statement will print (try it).

The reason it prints is that memory _is_ allocated because string constants are C compatible i.e. contain a null terminator. If this was not the case then this would act as the previous example.
(Nothing incongruous here)
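
A minimal sketch of the difference, assuming a current D2 compiler (where `string` is built in and a literal's data is followed by a '\0'); `exists` is a hypothetical helper, not a library function.

```d
import std.stdio;

// Hypothetical helper: "exists" here means the data pointer is non-null.
bool exists(string s) { return s.ptr !is null; }

void main()
{
    string fromLiteral = "";     // empty literal
    string neverSet = null;      // no data at all

    // Both are empty...
    assert(fromLiteral.length == 0);
    assert(neverSet.length == 0);

    // ...but only the literal has somewhere to point.
    assert(exists(fromLiteral));
    assert(!exists(neverSet));

    // With zero characters, the pointer lands directly on the terminator.
    assert(*fromLiteral.ptr == '\0');

    writeln("literal is null? ", fromLiteral is null);
    writeln("unset is null?   ", neverSet is null);
}
```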

> Is that last test not a reasonable thing to do? It seems pretty harmless. You want to test for an empty string, an empty array. But you still get true.

You're asking the wrong questions. The statement "if(x)" asks is x null or 0, it does not ask "is this string longer than 0 characters" or "does this array contain more than 0 elements". The correct question is:

if (x.length > 0) {}

Just like most any other container class you care to name/try.

> But what about this:
> # string emptyString = null;
> # if (emptyString) writef("Empty, but now I don't exist");
> // The statement will print.

Wrong, it will not print. The array is null, nothing exists.
(Nothing incongruous here)

> Would you say the behaviour I showed above is consistent?

Yes.

> You don't find it a tad, say, ambiguous?

No.

> You don't think people will be confused? I certainly was.

That's because you're asking the wrong questions, and you didn't check your answers.

>> makes the foreach..else syntax suggestion from AJG very unnecessary.
>
> Huh? I don't see how the two things are related. You may have a valid point, but I fail to see the connection.

I'm not sure either. I suspect he's referring to foreach being usable on a null array equally well, i.e. you don't have to check whether it's a null array; it will iterate 0 times for both a null array and an empty array.
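
If that is what he means, it is easy to confirm. This is a short sketch, assuming a current D2 compiler; `iterations` is a hypothetical helper that only counts loop passes.

```d
import std.stdio;

// Hypothetical helper: counts how many times the loop body runs.
size_t iterations(const(char)[] text)
{
    size_t count = 0;
    foreach (c; text)
        ++count;
    return count;
}

void main()
{
    char[] nullArray = null;
    string emptyString = "";

    assert(iterations(nullArray) == 0);    // null array: body never runs
    assert(iterations(emptyString) == 0);  // empty array: body never runs
    assert(iterations("abc") == 3);

    writeln("no null check needed before foreach");
}
```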

Regan

July 21, 2005
On Thu, 21 Jul 2005 10:17:02 +1000, Derek Parnell <derek@psych.ward> wrote:
> On Wed, 20 Jul 2005 23:42:49 +0200, Ilya Minkov wrote:
>
>> My vote is against.
>>
>> Derek Parnell schrieb:
>>> Does ...
>>>
>>>   if (array) ...
>>>
>>> test for an empty array or a non-existent array? I can't tell from the
>>> syntax. It is thus ambiguous.
>>>
>>>   if (array.ptr == null) -- test for a non-existence.
>>>
>>>   if (array.length == 0) -- test for emptiness
>>>
>>>   if (array) -- test for which?
>>
>> Making difference between an empty array and a nonexistent one is flaky,
>> if not directly ambiguous, thus D does not do it, as far as i can
>> remember the statement of Walter. Thus if(array) is not ambiguous.
>
> Maybe in your world, but not in mine.
>
> I have a glass of water. The glass exists and it is not empty. I drink the
> water. The glass exists and it is empty. I smash the glass. The glass does
> not exist and it is neither full nor empty because it doesn't exist.
>
> To repeat: Existence and Emptiness are not the same concept.

You know I agree. ;)

> And as I've just discovered, 'if (array)' tests both the .ptr and the
> .length properties of the array variable.

Which is pointless because when the array pointer is null the length cannot be anything but 0.

Regan

July 21, 2005
On Thu, 21 Jul 2005 00:17:51 +0000 (UTC), AJG <AJG_member@pathlink.com> wrote:
> Sorry, I got the last two examples backwards. The comments should read
> "the statement will print"
> and then
> "the statement will *not* print"
>
> Not the other way around. The point remains the same, though.

Sorry, I replied before seeing this post. My reply remains the same minus correcting your mistakes.

Regan

July 21, 2005
On Thu, 21 Jul 2005 12:16:08 +1200, Regan Heath wrote:


[snip]

> This does not refute the fact that the 'array' variable _behaves_ as a reference type, ...

> i.e. is passed by reference,

Well ... not always. If the function parameter is an 'in' type, then the 8-byte struct is passed to the function and not a reference to it. If the parameter is either 'out' or 'inout' then the address of the 8-byte struct is passed to the function.

> can have null assigned to it,

This just sets the 8-bytes to zero.

> can be used in "if (array is null)",

This is identical to 'if (array)' according to the generated machine code.

> can be assigned to another reference,

This just copies the source struct 8 bytes to the target struct's 8 bytes.

> and so on. Further, it's described in the docs as an "array reference". So despite the _implementation_ of it, it _behaves_ as a reference type(*).
> 
> (*)The only exception, the only thing in which it behaves like a struct is the fact that it cannot be null.

Often there seems to be a confusion between the 'array' reference and the
reference to the data that 'array' owns.
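
The two can be told apart with a short experiment. This is a sketch only, assuming a current D2 compiler, where the 'inout' parameter class discussed above is spelt 'ref'; the function names are made up for the illustration.

```d
import std.stdio;

// By value: the callee gets its own copy of the (length, pointer) pair.
void shrinkCopy(int[] a) { a = a[0 .. 1]; }

// By reference: the callee works on the caller's pair.
void shrinkInPlace(ref int[] a) { a = a[0 .. 1]; }

// Either way the element data is shared, not copied.
void setFirst(int[] a, int value) { a[0] = value; }

void main()
{
    int[] source = [1, 2, 3];
    int[] target = source;        // copies the pair, not the elements
    assert(target.ptr is source.ptr);

    setFirst(target, 42);
    assert(source[0] == 42);      // same underlying data

    shrinkCopy(source);
    assert(source.length == 3);   // caller's pair untouched

    shrinkInPlace(source);
    assert(source.length == 1);   // caller's pair changed

    writeln(source, " ", target);
}
```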

-- 
Derek
Melbourne, Australia
21/07/2005 10:23:03 AM

July 21, 2005
On Thu, 21 Jul 2005 12:16:08 +1200, Regan Heath wrote:

> On Thu, 21 Jul 2005 10:02:49 +1000, Derek Parnell <derek@psych.ward> wrote:

[snip]
> 
>> Furthermore, 'if (array)' is
>> equivalent to ...
>>
>>   if (array.len == 0 || array.ptr == null)
> 
> Don't you mean:
> 
> if (array.len != 0 || array.ptr != null)
> 
> ?
Oops. Yes I got that wrong. Your code is right.

> Does the assembler above show this?

Yes.
     mov     EDX,-4[EBP] ; Put the ptr into DX register
     or      EDX,-8[EBP] ; OR the DX register with the length
     je      L57         ; jump if the result is zero
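
Written back in D, those three instructions amount to the following. This is a hypothetical model only: `ArrayWords` and `bodyRuns` are names invented here to mirror the two stack slots, not anything the compiler or runtime defines.

```d
import std.stdio;

// Hypothetical model of the two words the assembler is testing.
struct ArrayWords
{
    size_t length;   // -8[EBP] in the listing
    void*  ptr;      // -4[EBP] in the listing
}

// mov EDX,ptr / or EDX,length / je skip:
// the 'if' body runs when the OR of the two words is non-zero.
bool bodyRuns(ArrayWords a)
{
    return (cast(size_t) a.ptr | a.length) != 0;
}

void main()
{
    int x;
    assert(!bodyRuns(ArrayWords(0, null)));   // both words zero: skipped
    assert(bodyRuns(ArrayWords(0, &x)));      // pointer only
    assert(bodyRuns(ArrayWords(3, &x)));      // pointer and length
    writeln("matches (len != 0 || ptr != null)");
}
```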


-- 
Derek
Melbourne, Australia
21/07/2005 10:50:30 AM

July 21, 2005
On Thu, 21 Jul 2005 10:45:40 +1000, Derek Parnell <derek@psych.ward> wrote:
>> This does not refute the fact that the 'array' variable _behaves_ as a
>> reference type, ...
>
>> i.e. is passed by reference,
>
> Well ... not always. If the function parameter is an 'in' type, then the
> 8-byte struct is passed to the function and not a reference to it. If the
> parameter is either 'out' or 'inout' then the address of the 8-byte struct is passed to the function.

Cool. Optimisations.

>> can have null assigned to it,
>
> This just sets the 8-bytes to zero.

Like opAssign for a normal struct could do.

>> can be used in "if (array is null)",
>
> This is identical to 'if (array)' according to the generated machine code.

Cool.

>> can be assigned to another reference,
>
> This just copies the source struct 8 bytes to the target struct's 8 bytes.

And/or creates a new one (i.e. if slicing)

>> and so on. Further, it's described in the docs as an "array reference". So
>> despite the _implementation_ of it, it _behaves_ as a reference type(*).
>>
>> (*)The only exception, the only thing in which it behaves like a struct is the fact that it cannot be null.
>
> Often there seems to be a confusion between the 'array' reference and the
> reference to the data that 'array' owns.

Right. Thanks, this thread has been enlightening. I believe this statement accurately describes arrays.

"Array references _behave_ like references but are _implemented_ as stack based structs."

In other words treat it like a reference as that is what it's pretending to be. At the same time you get the performance of a stack based struct. This is yet more evidence as to why arrays are great.

In short, I still believe "if(array)" is doing its job correctly (in effect, if not exactly - see changes below). I don't believe people will commonly expect this statement to check the length of an array, nor do I think it should be illegal.

I believe Walter has tried to remove the distinction between a non-existent array and an empty one (going on the results you've shown here) but has failed in some areas, thankfully, because I still believe it is a useful distinction.

In fact I'd say he's got the implementation of arrays pretty much perfect; I would make only the following changes:

 - change "if(array)" and "if(array is null)" to check the data pointer only (it's pointless checking length).
 - fix array.length = 0; so that it doesn't set the data pointer to null.

Regan