July 20, 2005
"Derek Parnell" <derek@psych.ward> wrote in message news:1k0mwc3gtmj73.inn5n1oiajb5$.dlg@40tude.net...
> On Wed, 20 Jul 2005 11:21:55 +0200, Dejan Lekic wrote:
>
>> Mr Heath, I agree with You on this.
>
> I don't.
>
> Does ...
>
>  if (array) ...
>
> test for an empty array or a non-existent array? I can't tell from the syntax. It is thus ambiguous.
>
>  if (array.ptr == null) -- test for non-existence.
>
>  if (array.length == 0) -- test for emptiness
>
>  if (array) -- test for which?

I can sympathize with the argument that it should be illegal to implicitly
test 'array', but presumably we'd want to keep implicit conversion to the ptr
in calls like
  void foo(char* p);
  foo(array);
That would mean 'array' is implicitly converted to ptr in some places but
not everywhere, and that seems like a slippery slope. It might be easier to
just live with the current behavior. For example, dlint can flag implicit
array conditions.
Then again, we already make 'if (x = y)' illegal, so there is precedent for
filtering conditions - the good old 'value does not give boolean result'
error.


July 20, 2005
On Wed, 20 Jul 2005 22:42:22 +1200, Regan Heath wrote:

> On Wed, 20 Jul 2005 19:49:19 +1000, Derek Parnell <derek@psych.ward> wrote:
>> On Wed, 20 Jul 2005 11:21:55 +0200, Dejan Lekic wrote:
>>
>>> Mr Heath, I agree with You on this.
>>
>> I don't.
>>
>> Does ...
>>
>>   if (array) ...
>>
>> test for an empty array or a non-existent array?
> 
> It does what it always does, for every type in D, it tests whether 'array'
> is null or 0.
> A null array is a non-existent array, thus it tests for a non-existent
> array.

I think I'm not understanding this.

I thought that

   char[] array;

defined an eight-byte structure in RAM in which the first 4 bytes are the current length of the array (if it is allocated) and the second 4 bytes are the address of the array data. Initially all eight bytes are zero.

Thus when I see "if (array)" I think it is converted into machine language instructions that test the second 4 bytes against zero. In other words ...

  if (array)

is essentially the same as

  if (array.ptr == 0)

and

  if (*cast(int*)(cast(byte*)&array + 4) == 0)

I'm only guessing at this, because I haven't seen it written down this *explicitly* ;-)
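The layout guessed at above can be modelled in C to check the offset arithmetic. This is a minimal sketch that assumes the {length, ptr} field order described in this post. `DArray` and `ptr_via_offset` are hypothetical names and are not part of D or its runtime. On a 32-bit target both fields are 4 bytes, which gives the 8-byte header discussed here. On a 64-bit target each field is 8 bytes, so the sketch uses `offsetof` instead of a hard-coded 4.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical C model of a D dynamic array header:
   the length field first, then the data pointer. */
typedef struct {
    size_t length;  /* number of elements */
    char  *ptr;     /* address of the first element, or NULL */
} DArray;

/* Read the data pointer "the hard way", through a byte offset into the
   header. The cast to unsigned char* comes first, so the offset is
   counted in bytes rather than in whole headers. */
char *ptr_via_offset(const DArray *a)
{
    char *p;
    memcpy(&p, (const unsigned char *)a + offsetof(DArray, ptr), sizeof p);
    return p;
}
```

The byte-pointer cast is what makes this work. Adding 4 directly to a pointer to the whole header would step over four headers, not four bytes.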

>> I can't tell from the
>> syntax. It is thus ambiguous.
> 
> Granted, it's not 'explicit'. However, the behaviour is well defined.

Where is that behavior defined? I can't see it in the documentation.

> The only 'catch' in this case is that an array cannot be null.

Of course not. It's an 8-byte structure. All 8 bytes can be zero though.

> However, when an array would be null its data pointer is null, therefore testing the data pointer _is_ testing the array.

Huh? You just said that 'array cannot be null' so how does that reconcile with 'when an array would be null'?

But back to what I was saying ...

  if (array)

is ambiguous because *JUST BY LOOKING AT THE CODE* one cannot tell if it is testing the first 4-byte field or the second 4-byte field in 'array'. Its behaviour may be precisely defined, but I haven't seen that yet.

Oh, and testing array.ptr and testing array.length have different semantics.
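The sketch below makes the difference concrete. It uses a hypothetical C model of the {length, ptr} header, and `is_null`, `is_empty` and `empty_slice_of` are illustrative names only. Both tests agree on an array that was never assigned. They disagree on a zero-length slice of real storage, and that is exactly the case where "empty" and "non-existent" come apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical C model of a D dynamic array header. */
typedef struct {
    size_t length;
    char  *ptr;
} DArray;

/* "Non-existent": there is no data block at all. */
bool is_null(DArray a)  { return a.ptr == NULL; }

/* "Empty": zero elements, whether or not a data block exists. */
bool is_empty(DArray a) { return a.length == 0; }

/* A zero-length slice of real storage: empty, but not null. */
DArray empty_slice_of(char *storage)
{
    DArray a = {0, storage};
    return a;
}
```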


-- 
Derek Parnell
Melbourne, Australia
20/07/2005 10:35:31 PM
July 20, 2005
Hi,

>It does what it always does, for every type in D, it tests whether 'array'
>is null or 0.
>A null array is a non-existent array, thus it tests for a non-existent
>array.

That's not exactly true. As you mentioned yourself, setting .length = 0 makes the pointer null, yet isn't the array "existent"? This kind of implementation defect should not be exposed in the language.

>> I can't tell from the
>> syntax. It is thus ambiguous.
>
>Granted, it's not 'explicit'. However, the behaviour is well defined.
>
>The only 'catch' in this case is that an array cannot be null. However, when an array would be null its data pointer is null,

Isn't this a contradiction?

>therefore testing the data pointer _is_ testing the array.

That's where I beg to differ. That's the source of ambiguity. To _you_ it may seem like "testing the data pointer _is_ testing the array," but that's most certainly not the only interpretation, and in fact I think it's a misleading one.

Testing the array ptr is _just_ that, testing a pointer, some random block of memory that just happens to be used by your array. It is unsemantic and unclear. I am certain it will be misused by both the C camp and new programmers. This behaviour is not even documented anywhere.

The problem once again is that in D, "testing the array" doesn't mean anything outright because the array is always there. Technically if (array) should _always_ return true. Therefore, I think it would be much more consistent to use the .length property rather than .ptr for this implicit test, or ban the implicit test.

Why is .length better?
1) It is much more semantic. It means in D what it would have meant in C.
2) It is a simple test for numerical emptiness. Nothing more, nothing less. No
memory involved. No philosophical questions about null/empty needed.
3) It is not prone to weird memory incongruences (e.g. an empty existent array)
or changes in the technical details of the implementation.
4) It is consistent: It works exactly the same with normal arrays, dynamic
arrays, static arrays, associative arrays, and even raw pointers (which map
directly to C's behaviour).

I think there is another non-ambiguous option now (C):
A) Make if (array) equal to if (array.length)
B) Make if (array) illegal.
C) Make if (array) always return true, since the array is always there.

I prefer A first, then B, then C as a last resort.
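The options can be compared as predicates over a hypothetical C model of the array header. This is a sketch under assumptions. `truth_current` models the data-pointer reading of today's behaviour. For a well-formed header, where a null pointer always comes with a zero length, that gives the same answer as testing both fields. Option B is left out because it is a compile-time rule with no run-time behaviour to model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical C model of a D dynamic array header. */
typedef struct {
    size_t length;
    char  *ptr;
} DArray;

/* Current reading: test the data pointer. */
bool truth_current(DArray a)  { return a.ptr != NULL; }

/* Option A: if (array) means if (array.length). */
bool truth_option_a(DArray a) { return a.length != 0; }

/* Option C: the header always exists, so the test is always true. */
bool truth_option_c(DArray a) { (void)a; return true; }

/* Option B (reject the implicit test) is a compile-time rule and has
   no run-time predicate, so it is not modelled here. */
```

The predicates disagree on only two inputs. Option A differs from the current test on an empty array that still has storage. Option C differs from it on a null array.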
Thanks for listening.
--AJG.


July 20, 2005
Hi,

>I can sympathize with the argument that it should be illegal to implicitly test 'array' but presumably we'd want to keep implicit conversion to the ptr in calls like
>  void foo(char* p);
>  foo(array);
>That would mean 'array' is implicitly converted to ptr in some places but not everywhere and that seems like a slippery slope.

I agree that this is something to think about. Of course, there is a fundamental
difference here. foo (char *) expects a pointer. if (array) expects a bool
(well, int, technically; another D annoyance). This is a clear distinction to
me, one that prevents the slippery slope.

>It might be easier to just live with the current behavior.

That's just laziness speaking ;).

>Then again we already have 'if (x = y)' illegal so there is precedent for filtering conditions - the good-old 'value does not give boolean result' error.

Yes! That's exactly what I was thinking. D gets to have its cake and eat it too, because (x = y) is still legal with an additional explicit == true/false; this is great. It allows you to do it yet prevents the common missing = mistake.

This is analogous to if (array). The pointer check can still be done via array.ptr, but D would error out when using the ambiguous form. So there is definitely precedent, and it's a good precedent.

Cheers,
--AJG.


July 20, 2005
My vote is against.

Derek Parnell wrote:
> Does ...
> 
>   if (array) ...
> 
> test for an empty array or a non-existent array? I can't tell from the
> syntax. It is thus ambiguous.
> 
>   if (array.ptr == null) -- test for non-existence.
> 
>   if (array.length == 0) -- test for emptiness
> 
>   if (array) -- test for which?

Distinguishing between an empty array and a nonexistent one is flaky, if not outright ambiguous, so D does not do it, as far as I can remember Walter's statement. Thus if(array) is not ambiguous.

Besides, arrays have somewhat pointer-like semantics in D, so the implicit test should stay, among other reasons. One of those reasons is that it is familiar to C programmers, and it makes AJG's foreach..else syntax suggestion unnecessary.

-eye
July 20, 2005
On Wed, 20 Jul 2005 14:29:13 +0000 (UTC), AJG <AJG_member@pathlink.com> wrote:
>> It does what it always does, for every type in D, it tests whether 'array'
>> is null or 0.
>> A null array is a non-existent array, thus it tests for a non-existent
>> array.
>
> That's not exactly true. As you mentioned yourself, .length = 0 makes the
> pointer null, yet isn't the array "existant?"

Not anymore, that is why this is a BUG.

> This kind of implementation defect
> should not be exposed in the language.

It is a BUG.

>>> I can't tell from the
>>> syntax. It is thus ambiguous.
>>
>> Granted, it's not 'explicit'. However, the behaviour is well defined.
>>
>> The only 'catch' in this case is that an array cannot be null. However,
>> when an array would be null its data pointer is null,
>
> Isn't this a contradiction?

No. We have 2 facts:

1. array _references_ are never null.
2. null arrays have null data pointers.

To be clear, a "null array" is an array to which you have assigned null, or to which nothing has ever been assigned. It represents "non-existent".

>> therefore testing the data pointer _is_ testing the array.
>
> That's where I beg to differ. That's the source of ambiguity. To _you_ it may seem like "testing the data pointer _is_ testing the array," but that's most certainly not the only interpretation, and in fact I think it's a misleading one.
> Testing the array ptr is _just_ that, testing a pointer, some random block of memory that just happens to be used by your array. It is unsemantic and unclear.

It's identical to the C code you posted which you said was semantic and clear.

The data pointer is the part of the array struct that mirrors the value the array reference would have, were it not for the additional safety feature that an array reference can never be null.

Therefore in my opinion the data pointer _is_ the array.

> The problem once again is that in D, "testing the array" doesn't mean anything outright because the array is always there.

You're confusing implementation with concept. Walter has chosen for the implementation to ensure the array _reference_ is never null, yet it's still possible to assign 'null' to one, in order to represent a 'null array'; when you do so, it sets the data pointer to null.

If you ignore what you know about how an array works internally and just look at it from the point of view that it is another reference like any other, then its current behaviour is perfectly consistent with all other types. You can treat an array like any other class with a "length" member/property.

The added bonus with arrays is that:
 - they can be created on the fly implicitly.
 - you can never have a null reference to one.

Would you expect "if (x)" to call a member function of a class x?

> Technically if (array) should _always_ return true.

No, technically they should not, for if they did:

A. The expression "if (x)" compares the variable x with null or 0.
B. Given "char[] p = null;" then "if (p)" should be FALSE.

Then statement B would be incorrect, as "if (p)" would return TRUE and this would be inconsistent with other types in D.

> Therefore, I think it would be much more consistent

Less consistent, because then you would break this logic:

A. The expression "if (x)" compares the variable x with null or 0.
B. Given "char[] p = null;" then "if (p)" should be FALSE.
C. Given "char[] p = "";" then "if (p)" should be TRUE.

All 3 statements are correct and true for all pointer/reference types, and are also all correct and true for value types, except structs, if you replace the null and "" with appropriate values, e.g. 0 and 1.

In short, if you set an array to null, "if (array)" will be FALSE.
If you set an array to anything else, "if (array)" will be TRUE.
If you change "if (array)" to test length, you break that logic.
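Statements B and C have a direct analogue with plain C string pointers, and that pointer-like behaviour is what the argument here appeals to. The helper names below are hypothetical. The sketch relies only on the standard `strlen`.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Mirrors "if (p)" on a C pointer: true for any non-NULL pointer,
   including one that points at an empty string. */
bool truthy(const char *p)
{
    return p != NULL;
}

/* Mirrors a length-based test: true only when there is at least one
   character before the terminator. */
bool has_chars(const char *p)
{
    return p != NULL && strlen(p) != 0;
}
```

Here `truthy("")` is true while `has_chars("")` is false. Switching the implicit test to a length check would therefore flip statement C for the empty case.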

You'll also note that the statement "if (array is null)" is true for arrays to which you have assigned null. In short, although the array reference is not itself null, it pretends to be null in situations where it would be, were it not for the implementation ensuring it cannot be (for crash safety reasons).

> Why is .length better?
> 1) It is much more semantic. It means in D what it would have meant in C.
> 2) It is a simple test for numerical emptiness. Nothing more, nothing less. No memory involved. No philosophical questions about null/empty needed.
> 3) It is not prone to weird memory incongruences (e.g. an empty existent array) or changes in the technical details of the implementation.
> 4) It is consistent: It works exactly the same with normal arrays, dynamic arrays, static arrays, associative arrays, and even raw pointers (which map directly to C's behaviour).

Why is .length wrong?

1. It makes the behaviour of "if (x)" inconsistent with other types.
2. It makes arrays inconsistent, "if (x)" no longer returns FALSE for an array to which you have assigned null.

In short it breaks the logical consistency of types.

> I think there is another non-ambiguous option now (C):
> A) Make if (array) equal to if (array.length)
> B) Make if (array) illegal.
> C) Make if (array) always return true, since the array is always there.
>
> I prefer A first, then B, then C as a last resort.

I prefer the current situation. The options above all break consistency.

Regan
July 20, 2005
Derek Parnell wrote:
> On Wed, 20 Jul 2005 11:21:55 +0200, Dejan Lekic wrote:
> 
>> Mr Heath, I agree with You on this.
> 
> I don't.
> 
> Does ...
> 
>   if (array) ...
> 
> test for an empty array or a non-existent array? I can't tell from the
> syntax. It is thus ambiguous.
> 
>   if (array.ptr == null) -- test for non-existence.
> 
>   if (array.length == 0) -- test for emptiness
> 
>   if (array) -- test for which?

If array might be null, can you be certain that it's safe to dereference it? For example, array.length would seem to presume that array wasn't null. (Actually, so would array.ptr...but perhaps that's just me.)

July 20, 2005
On Wed, 20 Jul 2005 22:49:04 +1000, Derek Parnell <derek@psych.ward> wrote:
>>> Does ...
>>>
>>>   if (array) ...
>>>
>>> test for an empty array or a non-existent array?
>>
>> It does what it always does, for every type in D, it tests whether 'array' is null or 0.
>> A null array is a non-existent array, thus it tests for a non-existent
>> array.
>
> I think I'm not understanding this.
>
> I thought that
>
>    char[] array;
>
> defined an eight-byte structure in RAM in which the first 4 bytes are the
> current length of the array (if it is allocated) and the second 4 bytes are the address of the array data. Initially all eight bytes are zero.

I'd say: it defines a variable 'array' which is a reference to a struct/class like you've described.

> Thus when I see "if (array)" I think it is converted into machine language instructions that tests the second 4-bytes against zero.

Because you're thinking of 'array' as a struct. It's not, it's a reference. Thus, "if (array)" compares that reference to null.

I'd guess the reason you think of it as a struct is because, like a struct, it cannot be null. That is the only similarity it has to a struct; all the rest of its behaviour is that of a reference.

Because it's a reference you can set it to null, because it's a reference you can say "if(array)", because it's a reference you can say "if(array is null)", because it's a reference it behaves like any other reference, except for the fact that it cannot be null.

It is logically consistent with all other types in D (barring structs), eg.

A. The expression "if (x)" compares the variable x with null or 0.
B. Given "char[] p = null;" then "if (p)" should be FALSE.
C. Given "char[] p = "";" then "if (p)" should be TRUE.

These are all correct and true for all types in D barring structs. (replace null and "" with 0 and 1 for value types).

>>> I can't tell from the
>>> syntax. It is thus ambiguous.
>>
>> Granted, it's not 'explicit'. However, the behaviour is well defined.
>
> Where is that behavior defined? I can't see it in the documentation.

I was referring to the behaviour of "if (x)". Most people know, or quickly learn this behaviour.

>> The only 'catch' in this case is that an array cannot be null.
>
> Of course not. It's an 8-byte structure.

No, it's not. Or rather, we have to decide what exactly we're talking about here.

Above, you defined a variable 'array'. It is a reference. It refers to an object. The object contains some data and has a length property.

The array reference, like any other, can be set to 'null'. However, the implementation is such that it is defined never to be null. Yet statements in the form "if (array is null)" and "if (array)" still behave like the reference is null (thus they are consistent, see A, B, C above).

They behave in that way because they check the data pointer; the data pointer is the part of the object that mirrors the state the reference would have, were it not prohibited from being null. In essence the data pointer _is_ the array, the rest is implementation around it.

> All 8 bytes can be zero though.

Just like a normal struct. However, an array reference is not itself a struct; it's a reference to an object (struct/class) with a length property.

>> However,
>> when an array would be null its data pointer is null, therefore testing
>> the data pointer _is_ testing the array.
>
> Huh? You just said that 'array cannot be null' so how does that reconcile
> with 'when an array would be null'?

The data pointer mirrors the state the reference would have, were it not for the implementation ensuring the reference is never null. Essentially the data pointer _is_ the array, the rest is implementation.

> But back to what I was saying ...
>
>   if (array)
>
> is ambiguous because *JUST BY LOOKING AT THE CODE* one cannot tell if it is testing the first 4-byte field or the second 4-byte field in 'array'.

So? This is no different to any other variable type, try an int for example.

> Its behaviour may be precisely defined, but I haven't seen that yet.

Its behaviour is to test the variable 'array' against null or 0.

> Oh, and there is a difference in semantics to testing array.ptr and
> array.length.

Of course. Which is why changing "if(array)" to test the length breaks logical consistency and is just plain wrong IMO.

Regan
July 20, 2005
On Wed, 20 Jul 2005 15:38:04 -0700, Charles Hixson <charleshixsn@earthlink.net> wrote:
> Derek Parnell wrote:
>> On Wed, 20 Jul 2005 11:21:55 +0200, Dejan Lekic wrote:
>>
>>> Mr Heath, I agree with You on this.
>>  I don't.
>>  Does ...
>>    if (array) ...
>>  test for an empty array or a non-existent array? I can't tell from the
>> syntax. It is thus ambiguous.
>>    if (array.ptr == null) -- test for non-existence.
>>    if (array.length == 0) -- test for emptiness
>>    if (array) -- test for which?
>
> If array might be null, can you be certain that it's proper to dereference it, e.g. array.length would seem to presume that array wasn't null.  (Actually, so would array.ptr...but perhaps that's just me.)

D guarantees an array reference is never null.

Is an array reference _the_ array? No, just like an object reference is not _the_ object (which is why you can have any number of references to the same object).

A null array, "char[] p = null;" has a null data pointer. So, to check for a null array you check the data pointer.

An empty array, "char[] p = "";" has a non-null data pointer but a 0 length. So, to check for an empty array you check the length.

See my other posts for reasoning as to why "if(array)" checks the null array case.

Regan



July 21, 2005
On Thu, 21 Jul 2005 10:39:54 +1200, Regan Heath wrote:

> On Wed, 20 Jul 2005 22:49:04 +1000, Derek Parnell <derek@psych.ward> wrote:
>>>> Does ...
>>>>
>>>>   if (array) ...
>>>>
>>>> test for an empty array or a non-existent array?
>>>
>>> It does what it always does, for every type in D, it tests whether
>>> 'array' is null or 0.
>>> A null array is a non-existent array, thus it tests for a non-existent
>>> array.
>>
>> I think I'm not understanding this.
>>
>> I thought that
>>
>>    char[] array;
>>
>> defined an eight-byte structure in RAM in which the first 4 bytes are the current length of the array (if it is allocated) and the second 4 bytes are the address of the array data. Initially all eight bytes are zero.
> 
> I'd say: it defines a variable 'array' which is a reference to a struct/class like you've described.

Actually that turns out not to be the case. If it was, then 'array' would be represented by a 4-byte value which contained the address of the 8-byte struct, {uint len, void* ptr}. However, if you look at the generated machine code you can see that the 8-byte struct _is_ the 'array'. In other words, 'array' is not a reference to a struct/class.
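The value semantics described here can be illustrated with a C model in which the header is a plain struct passed by value. This is a sketch that assumes the {length, ptr} layout; `DArray` and `shrink_copy` are hypothetical names. In this model, reading `.length` of a null array is also safe, because the field lives in the header itself and no pointer is followed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C model of a D dynamic array header. */
typedef struct {
    size_t length;
    char  *ptr;
} DArray;

/* Passing the header by value copies both fields. The callee can shrink
   its copy without affecting the caller's header, but both copies still
   point at the same data. */
DArray shrink_copy(DArray a, size_t new_length)
{
    if (new_length < a.length)
        a.length = new_length;
    return a;
}
```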

Here is what I found. I compiled this D code ...

  void main()
  {
    char[] array;
    if (array.ptr == null)
    {
        array.length = 2;
    }
    if (array.length == 3)
    {
        array.length = 4;
    }
    if (array)
    {
        array.length = 5;
    }
  }

And this is the generated machine code ...

        assume  CS:__Dmain
  L0:           enter   8,0
                push    EBX
                mov     dword ptr -8[EBP],0
                mov     dword ptr -4[EBP],0
                cmp     dword ptr -4[EBP],0
                jne     L29
                lea     EAX,-8[EBP]
                push    EAX
                push    1
                push    2
                call    near ptr __d_arraysetlength
                add     ESP,0Ch
  L29:          cmp     dword ptr -8[EBP],3
                jne     L3F
                lea     ECX,-8[EBP]
                push    ECX
                push    1
                push    4
                call    near ptr __d_arraysetlength
                add     ESP,0Ch
  L3F:          mov     EDX,-4[EBP]
                or      EDX,-8[EBP]
                je      L57
                lea     EBX,-8[EBP]
                push    EBX
                push    1
                push    5
                call    near ptr __d_arraysetlength
                add     ESP,0Ch
  L57:          pop     EBX
                leave
                ret
  __Dmain ends

As you can see, the 8-byte struct is reserved in the local stack, and references to array.ptr and array.length are direct accesses of the stack space, not dereferenced via a pointer. Furthermore, 'if (array)' is equivalent to ...

  if (array.length != 0 || array.ptr != null)

which I think is slightly slower than testing either .length or .ptr alone.
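The `mov`/`or`/`je` sequence at L3F can be restated in C. The body runs when the bitwise OR of the two words is non-zero, and that is the same condition as `length != 0 || ptr != NULL`. This is a sketch over a hypothetical `DArray` header. `truth_or_words` mimics the OR of the two words, and `truth_explicit` spells out the same condition with comparisons.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of a D dynamic array header. */
typedef struct {
    size_t length;
    char  *ptr;
} DArray;

/* What "mov EDX,ptr / or EDX,length / je skip" computes: the body
   runs when the OR of the two words is non-zero. */
bool truth_or_words(DArray a)
{
    return (a.length | (uintptr_t)a.ptr) != 0;
}

/* The same condition written with explicit comparisons. */
bool truth_explicit(DArray a)
{
    return a.length != 0 || a.ptr != NULL;
}
```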


[snip]

> Of course. Which is why changing "if(array)" to test the length breaks logical consistency and is just plain wrong IMO.

I'm not asking for its behavior to be changed, just documented.

-- 
Derek
Melbourne, Australia
21/07/2005 9:35:42 AM