July 20, 2005
Re: [Suggestion] Make if(array) illegal.
"Derek Parnell" <derek@psych.ward> wrote in message 
news:1k0mwc3gtmj73.inn5n1oiajb5$.dlg@40tude.net...
> On Wed, 20 Jul 2005 11:21:55 +0200, Dejan Lekic wrote:
>
>> Mr Heath, I agree with You on this.
>
> I don't.
>
> Does ...
>
>  if (array) ...
>
> test for an empty array or a non-existent array? I can't tell from the
> syntax. It is thus ambiguous.
>
>  if (array.ptr == null) -- test for a non-existence.
>
>  if (array.length == 0) -- test for emptiness
>
>  if (array) -- test for which?

I can sympathize with the argument that it should be illegal to implicitly 
test 'array' but presumably we'd want to keep implicit conversion to the ptr 
in calls like
 void foo(char* p);
 foo(array);
That would mean 'array' is implicitly converted to ptr in some places but 
not everywhere and that seems like a slippery slope. It might be easier to 
just live with the current behavior. For example dlint can flag implicit 
array conditions.
Then again we already have 'if (x = y)' illegal so there is precedent for 
filtering conditions - the good-old 'value does not give boolean result' 
error.
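
For comparison, a minimal sketch of the explicit spelling, which does not depend on any implicit array-to-pointer conversion. countUntilNul is a hypothetical stand-in for the foo(char* p) above, not a library function:

```d
// Hypothetical stand-in for foo(char* p): counts characters up to a NUL.
size_t countUntilNul(const(char)* p)
{
    if (p is null) return 0;
    size_t n = 0;
    while (p[n] != '\0') ++n;
    return n;
}

void main()
{
    char[] array = ['h', 'i', '\0'];
    assert(countUntilNul(array.ptr) == 2);   // pointer passed explicitly

    char[] none;                             // no storage behind it
    assert(countUntilNul(none.ptr) == 0);
}
```

Writing array.ptr at the call site keeps the pointer use visible, whatever is decided about the implicit test in conditions.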
July 20, 2005
Re: [Suggestion] Make if(array) illegal.
On Wed, 20 Jul 2005 22:42:22 +1200, Regan Heath wrote:

> On Wed, 20 Jul 2005 19:49:19 +1000, Derek Parnell <derek@psych.ward> wrote:
>> On Wed, 20 Jul 2005 11:21:55 +0200, Dejan Lekic wrote:
>>
>>> Mr Heath, I agree with You on this.
>>
>> I don't.
>>
>> Does ...
>>
>>   if (array) ...
>>
>> test for an empty array or a non-existent array?
> 
> It does what it always does, for every type in D, it tests whether 'array'  
> is null or 0.
> A null array is a non-existent array, thus it tests for a non-existent  
> array.

I think I'm not understanding this.

I thought that 

  char[] array;

defined an eight-byte structure in RAM in which the first 4-bytes is the
current length of the array (if it is allocated) and the second 4-bytes is
the address of the array data. Initially all eight bytes are zero.

Thus when I see "if (array)" I think it is converted into machine language
instructions that tests the second 4-bytes against zero. In other words ...

 if (array)

is essentially the same as 

 if (array.ptr != null)

and 

 if (*(cast(int*)(cast(byte*)&array + 4)) != 0)

I'm only guessing at this, because I haven't seen it written down this
*explicitly* ;-)
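
A minimal sketch of that layout, assuming a current D compiler, where each field is one machine word (size_t) rather than the 4 bytes of a 32-bit build. rawWords is a hypothetical helper written for this illustration:

```d
// Hypothetical helper: copies out the two machine words of a slice.
// The D ABI puts the length first and the data pointer second.
size_t[2] rawWords(char[] a)
{
    auto words = cast(size_t*) &a;
    size_t[2] result;
    result[0] = words[0];   // length
    result[1] = words[1];   // address of the data
    return result;
}

void main()
{
    char[] array;                        // never assigned
    auto w = rawWords(array);
    assert(w[0] == 0 && w[1] == 0);      // both words start out zero

    array = new char[3];
    w = rawWords(array);
    assert(w[0] == array.length);
    assert(w[1] == cast(size_t) array.ptr);
    assert(array.sizeof == 2 * size_t.sizeof);
}
```

The cast goes through size_t* so that indexing steps one word at a time; adding 4 to a pointer of type char[]* would instead skip four whole array structures.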

>> I can't tell from the
>> syntax. It is thus ambiguous.
> 
> Granted, it's not 'explicit'. However, the behaviour is well defined.

Where is that behavior defined? I can't see it in the documentation.

> The only 'catch' in this case is that an array cannot be null. 

Of course not. It's an 8-byte structure. All 8 bytes can be zero though.

> However,  
> when an array would be null its data pointer is null, therefore testing  
> the data pointer _is_ testing the array.

Huh? You just said that 'array cannot be null' so how does that reconcile
with 'when an array would be null'? 

But back to what I was saying ...

 if (array) 

is ambiguous because *JUST BY LOOKING AT THE CODE* one cannot tell if it is
testing the first 4-byte field or the second 4-byte field in 'array'. Its
behaviour may be precisely defined, but I haven't seen that yet. 

Oh, and there is a difference in semantics to testing array.ptr and
array.length.


-- 
Derek Parnell
Melbourne, Australia
20/07/2005 10:35:31 PM
July 20, 2005
Re: [Suggestion] Make if(array) illegal.
Hi,

>It does what it always does, for every type in D, it tests whether 'array'  
>is null or 0.
>A null array is a non-existent array, thus it tests for a non-existent  
>array.

That's not exactly true. As you mentioned yourself, .length = 0 makes the
pointer null, yet isn't the array "existent"? This kind of implementation defect
should not be exposed in the language.

>> I can't tell from the
>> syntax. It is thus ambiguous.
>
>Granted, it's not 'explicit'. However, the behaviour is well defined.
>
>The only 'catch' in this case is that an array cannot be null. However,  
>when an array would be null its data pointer is null,

Isn't this a contradiction?

>therefore testing the data pointer _is_ testing the array.

That's where I beg to differ. That's the source of ambiguity. To _you_ it may
seem like "testing the data pointer _is_ testing the array," but that's most
certainly not the only interpretation, and in fact I think it's a misleading
one.

Testing the array ptr is _just_ that, testing a pointer, some random block of
memory that just happens to be used by your array. It is unsemantic and unclear.
I am certain it will be misused by both the C camp and new programmers. This
behaviour is not even documented anywhere.

The problem once again is that in D, "testing the array" doesn't mean anything
outright because the array is always there. Technically if (array) should
_always_ return true. Therefore, I think it would be much more consistent to use
the .length property rather than .ptr for this implicit test, or ban the
implicit test.

Why is .length better?
1) It is much more semantic. It means in D what it would have meant in C.
2) It is a simple test for numerical emptiness. Nothing more, nothing less. No
memory involved. No philosophical questions about null/empty needed.
3) It is not prone to weird memory incongruences (e.g. an empty existent array)
or changes in the technical details of the implementation.
4) It is consistent: It works exactly the same with normal arrays, dynamic
arrays, static arrays, associative arrays, and even raw pointers (which map
directly to C's behaviour).

I think there is another non-ambiguous option now (C):
A) Make if (array) equal to if (array.length)
B) Make if (array) illegal.
C) Make if (array) always return true, since the array is always there.

I prefer A first, then B, then C as a last resort.
Thanks for listening.
--AJG.
July 20, 2005
Re: [Suggestion] Make if(array) illegal.
Hi,

>I can sympathize with the argument that it should be illegal to implicitly 
>test 'array' but presumably we'd want to keep implicit conversion to the ptr 
>in calls like
>  void foo(char* p);
>  foo(array);
>That would mean 'array' is implicitly converted to ptr in some places but 
>not everywhere and that seems like a slippery slope.

I agree that this is something to think about. Of course, there is a fundamental
difference here. foo (char *) expects a pointer. if (array) expects a bool
(well, int, technically; another D annoyance). This is a clear distinction to
me, one that prevents the slippery slope.

>It might be easier to just live with the current behavior.

That's just laziness speaking ;).

>Then again we already have 'if (x = y)' illegal so there is precedent for 
>filtering conditions - the good-old 'value does not give boolean result' 
>error.

Yes! That's exactly what I was thinking. D even has its cake and eats it,
because (x = y) is still legal with an additional explicit == true/false; this is
great. It allows you to do it yet prevents the common missing = mistake.

This is analogous to if (array). The pointer check can still be done via
array.ptr, but D would error out when using the ambiguous form. So there is
definitely precedent, and it's a good precedent.
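
A minimal sketch of that precedent, assuming a current D compiler. The rejected line is left commented out so the rest compiles, and the explicit comparison is written as != 0 because x and y are ints here:

```d
void main()
{
    int x = 0;
    int y = 5;

    // if (x = y) { }          // rejected: a bare assignment is not accepted as a condition

    if ((x = y) != 0)          // assignment plus an explicit comparison compiles
    {
        assert(x == 5);
    }

    if (x == y)                // the comparison that was probably meant
    {
        assert(x == 5);
    }
}
```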

Cheers,
--AJG.
July 20, 2005
Re: [Suggestion] Make if(array) illegal.
My vote is against.

Derek Parnell wrote:
> Does ...
> 
>   if (array) ...
> 
> test for an empty array or a non-existent array? I can't tell from the
> syntax. It is thus ambiguous.
> 
>   if (array.ptr == null) -- test for a non-existence.
> 
>   if (array.length == 0) -- test for emptiness
> 
>   if (array) -- test for which?

Making a distinction between an empty array and a nonexistent one is flaky, 
if not outright ambiguous, so D does not do it, as far as I can 
remember Walter's statement. Thus if(array) is not ambiguous.

Besides, arrays have somewhat pointer-like semantics in D, so it 
should stay, among other reasons. One of the reasons is that it seems 
familiar to C programmers and makes the foreach..else syntax suggestion 
from AJG unnecessary.

-eye
July 20, 2005
Re: [Suggestion] Make if(array) illegal.
On Wed, 20 Jul 2005 14:29:13 +0000 (UTC), AJG <AJG_member@pathlink.com>  
wrote:
>> It does what it always does, for every type in D, it tests whether  
>> 'array'
>> is null or 0.
>> A null array is a non-existent array, thus it tests for a non-existent
>> array.
>
> That's not exactly true. As you mentioned yourself, .length = 0 makes the
> pointer null, yet isn't the array "existent"?

Not anymore, that is why this is a BUG.

> This kind of implementation defect
> should not be exposed in the language.

It is a BUG.

>>> I can't tell from the
>>> syntax. It is thus ambiguous.
>>
>> Granted, it's not 'explicit'. However, the behaviour is well defined.
>>
>> The only 'catch' in this case is that an array cannot be null. However,
>> when an array would be null its data pointer is null,
>
> Isn't this a contradiction?

No. We have 2 facts:

1. array _references_ are never null.
2. null arrays have null data pointers.

To be clear, a "null array" is an array to which you have assigned null, or  
to which nothing has ever been assigned. It represents "non-existent".

>> therefore testing the data pointer _is_ testing the array.
>
> That's where I beg to differ. That's the source of ambiguity. To _you_  
> it may seem like "testing the data pointer _is_ testing the array," but  
> that's most certainly not the only interpretation, and in fact I think  
> it's a misleading one.
> Testing the array ptr is _just_ that, testing a pointer, some random  
> block of memory that just happens to be used by your array. It is  
> unsemantic and unclear.

It's identical to the C code you posted which you said was semantic and  
clear.

The data pointer is the part of the array struct that mirrors the value  
the array reference would have were it not for the additional safety  
features they have i.e. can never be null.

Therefore in my opinion the data pointer _is_ the array.

> The problem once again is that in D, "testing the array" doesn't mean  
> anything outright because the array is always there.

You're confusing implementation with concept. Walter has chosen for the  
implementation to ensure the array _reference_ is never null, yet, it's  
still possible to assign 'null' to one, in order to represent a 'null  
array', when you do so it sets the data pointer to null.

If you ignore what you know about how an array works internally and just  
look at it from the point of view that it is another reference like any  
other, then its current behaviour is perfectly consistent with all other  
types. You can treat an array like any other class with a "length"  
member/property.

The added bonus with arrays is that:
 - they can be created on the fly implicitly.
 - you can never have a null reference to one.

Would you expect "if (x)" to call a member function of a class x?

> Technically if (array) should _always_ return true.

No, technically they should not, for if they did:

A. The expression "if (x)" compares the variable x with null or 0.
B. Given "char[] p = null;" then "if (p)" should be FALSE.

Then statement B would be incorrect, as "if (p)" would return TRUE and  
this would be inconsistent with other types in D.

> Therefore, I think it would be much more consistent

Less consistent, because then you would break this logic:

A. The expression "if (x)" compares the variable x with null or 0.
B. Given "char[] p = null;" then "if (p)" should be FALSE.
C. Given "char[] p = "";" then "if (p)" should be TRUE.

All 3 statements are correct and true for all pointer/reference types, and  
are also all correct and true for value types, except structs, if you  
replace the null and "" with appropriate values eg. 0 and 1

In short, if you set an array to null "if (array)" will be FALSE.
if you set an array to anything else "if (array)" will be TRUE.
if you change "if (array)" to test length you break that logic.

You'll also note that the statement "if (array is null)" is true for  
arrays to which you have assigned null, in short: although the array  
reference is not itself null it pretends to be in situations where it  
would be, were it not for the implementation ensuring it cannot be (for  
crash safety reasons).
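
Statement A, as it applies to non-array types, can be sketched as follows. This assumes a current D compiler; Widget and passes are hypothetical names used only for the illustration:

```d
class Widget {}

// Hypothetical helper: reports what a bare "if (x)" decides for x.
bool passes(T)(T x)
{
    if (x) return true;
    return false;
}

void main()
{
    int n = 0;
    int* p = null;
    Widget w = null;
    assert(!passes(n) && !passes(p) && !passes(w));   // 0 and null fail the test

    n = 1;
    p = &n;
    w = new Widget;
    assert(passes(n) && passes(p) && passes(w));      // anything else passes
}
```

Arrays are deliberately left out of the sketch, since how they should behave in this position is the point under discussion.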

> Why is .length better?
> 1) It is much more semantic. It means in D what it would have meant in C.
> 2) It is a simple test for numerical emptiness. Nothing more, nothing  
> less. No memory involved. No philosophical questions about null/empty  
> needed.
> 3) It is not prone to weird memory incongruences (e.g. an empty existent  
> array) or changes in the technical details of the implementation.
> 4) It is consistent: It works exactly the same with normal arrays,  
> dynamic arrays, static arrays, associative arrays, and even raw pointers  
> (which map directly to C's behaviour).

Why is .length wrong?

1. It makes the behaviour of "if (x)" inconsistent with other types.
2. It makes arrays inconsistent, "if (x)" no longer returns FALSE for an  
array to which you have assigned null.

In short it breaks the logical consistency of types.

> I think there is another non-ambiguous option now (C):
> A) Make if (array) equal to if (array.length)
> B) Make if (array) illegal.
> C) Make if (array) always return true, since the array is always there.
>
> I prefer A first, then B, then C as a last resort.

I prefer the current situation. The options above all break consistency.

Regan
July 20, 2005
Re: [Suggestion] Make if(array) illegal.
Derek Parnell wrote:
> On Wed, 20 Jul 2005 11:21:55 +0200, Dejan Lekic wrote:
> 
>> Mr Heath, I agree with You on this.
> 
> I don't.
> 
> Does ...
> 
>   if (array) ...
> 
> test for an empty array or a non-existent array? I can't tell from the
> syntax. It is thus ambiguous.
> 
>   if (array.ptr == null) -- test for a non-existence.
> 
>   if (array.length == 0) -- test for emptiness
> 
>   if (array) -- test for which?

If array might be null, can you be certain that it's proper to 
dereference it, e.g. array.length would seem to presume that 
array wasn't null.  (Actually, so would array.ptr...but perhaps 
that's just me.)
July 20, 2005
Re: [Suggestion] Make if(array) illegal.
On Wed, 20 Jul 2005 22:49:04 +1000, Derek Parnell <derek@psych.ward> wrote:
>>> Does ...
>>>
>>>   if (array) ...
>>>
>>> test for an empty array or a non-existent array?
>>
>> It does what it always does, for every type in D, it tests whether  
>> 'array' is null or 0.
>> A null array is a non-existent array, thus it tests for a non-existent
>> array.
>
> I think I'm not understanding this.
>
> I thought that
>
>    char[] array;
>
> defined an eight-byte structure in RAM in which the first 4-bytes is the
> current length of the array (if it is allocated) and the second 4-bytes  
> is the address of the array data. Initially all eight bytes are zero.

I'd say: it defines a variable 'array' which is a reference to a  
struct/class like you've described.

> Thus when I see "if (array)" I think it is converted into machine  
> language instructions that tests the second 4-bytes against zero.

Because you're thinking of 'array' as a struct. It's not, it's a  
reference. Thus, "if (array)" compares that reference to null.

I'd guess the reason you think of it as a struct is because like a struct  
it cannot be null. That is the only similarity it has to a struct, all the  
rest of its behaviour is that of a reference.

Because it's a reference you can set it to null, because it's a reference  
you can say "if(array)", because it's a reference you can say "if(array is  
null)", because it's a reference it behaves like any other reference,  
except for the fact that it cannot be null.

It is logically consistent with all other types in D (barring structs), eg.

A. The expression "if (x)" compares the variable x with null or 0.
B. Given "char[] p = null;" then "if (p)" should be FALSE.
C. Given "char[] p = "";" then "if (p)" should be TRUE.

These are all correct and true for all types in D barring structs.  
(replace null and "" with 0 and 1 for value types).

>>> I can't tell from the
>>> syntax. It is thus ambiguous.
>>
>> Granted, it's not 'explicit'. However, the behaviour is well defined.
>
> Where is that behavior defined? I can't see it in the documentation.

I was referring to the behaviour of "if (x)". Most people know, or quickly  
learn this behaviour.

>> The only 'catch' in this case is that an array cannot be null.
>
> Of course not. It's an 8-byte structure.

No, it's not. Or rather, we have to decide what exactly we're talking  
about here.

Above, you defined a variable 'array'. It is a reference. It refers to an  
object. The object contains some data and has a length property.

The array reference, like any other can be set to 'null'. However the  
implementation is such that it is defined never to be null. Yet,  
statements in the form "if (array is null)" and "if (array)" still behave  
like the reference is null. (thus they are consistent, see A,B,C above)

They behave in that way because they check the data pointer, the data  
pointer is the part of the object that mirrors the state the reference  
would have, were it not prohibited from being null. In essence the data  
pointer _is_ the array, the rest is implementation around it.

> All 8 bytes can be zero though.

Just like a normal struct. However an array reference is not itself a  
struct, it's a reference to an object (struct/class) with a length property.

>> However,
>> when an array would be null its data pointer is null, therefore testing
>> the data pointer _is_ testing the array.
>
> Huh? You just said that 'array cannot be null' so how does that reconcile
> with 'when an array would be null'?

The data pointer mirrors the state the reference would have, were it not  
for the implementation ensuring the reference is never null. Essentially  
the data pointer _is_ the array, the rest is implementation.

> But back to what I was saying ...
>
>   if (array)
>
> is ambiguous because *JUST BY LOOKING AT THE CODE* one cannot tell if it  
> is testing the first 4-byte field or the second 4-byte field in 'array'.

So? This is no different to any other variable type, try an int for  
example.

> Its behaviour may be precisely defined, but I haven't seen that yet.

Its behaviour is to test the variable 'array' against null or 0.

> Oh, and there is a difference in semantics to testing array.ptr and
> array.length.

Of course. Which is why changing "if(array)" to test the length breaks  
logical consistency and is just plain wrong IMO.

Regan
July 20, 2005
Re: [Suggestion] Make if(array) illegal.
On Wed, 20 Jul 2005 15:38:04 -0700, Charles Hixson  
<charleshixsn@earthlink.net> wrote:
> Derek Parnell wrote:
>> On Wed, 20 Jul 2005 11:21:55 +0200, Dejan Lekic wrote:
>>
>>> Mr Heath, I agree with You on this.
>>  I don't.
>>  Does ...
>>    if (array) ...
>>  test for an empty array or a non-existent array? I can't tell from the
>> syntax. It is thus ambiguous.
>>    if (array.ptr == null) -- test for a non-existence.
>>    if (array.length == 0) -- test for emptiness
>>    if (array) -- test for which?
>
> If array might be null, can you be certain that it's proper to  
> dereference it, e.g. array.length would seem to presume that array  
> wasn't null.  (Actually, so would array.ptr...but perhaps that's just  
> me.)

D guarantees an array reference is never null.

Is an array reference _the_ array? No, just like an object reference is  
not _the_ object (thus why you can have x references to the same object)

A null array, "char[] p = null;" has a null data pointer. So, to check for  
a null array you check the data pointer.

An empty array, "char[] p = "";" has a non-null data pointer but a 0  
length. So, to check for an empty array you check the length.

See my other posts for reasoning as to why "if(array)" checks the null  
array case.
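
A minimal sketch of the two cases using only the explicit tests, assuming a current D compiler. It uses string rather than char[] because current compilers treat string literals as immutable, so the char[] p = ""; spelling above is assumed not to compile there:

```d
void main()
{
    string none = null;   // the "null array" case
    string empty = "";    // the "empty array" case

    assert(none.ptr is null);
    assert(none.length == 0);
    assert(none is null);

    assert(empty.ptr !is null);    // the literal has storage behind it
    assert(empty.length == 0);
    assert(empty !is null);

    assert(none == empty);         // == compares contents, so the two look equal
}
```

The last assertion shows why the distinction is easy to miss: the two values differ only in the data pointer.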

Regan
July 21, 2005
Re: [Suggestion] Make if(array) illegal.
On Thu, 21 Jul 2005 10:39:54 +1200, Regan Heath wrote:

> On Wed, 20 Jul 2005 22:49:04 +1000, Derek Parnell <derek@psych.ward> wrote:
>>>> Does ...
>>>>
>>>>   if (array) ...
>>>>
>>>> test for an empty array or a non-existent array?
>>>
>>> It does what it always does, for every type in D, it tests whether  
>>> 'array' is null or 0.
>>> A null array is a non-existent array, thus it tests for a non-existent
>>> array.
>>
>> I think I'm not understanding this.
>>
>> I thought that
>>
>>    char[] array;
>>
>> defined an eight-byte structure in RAM in which the first 4-bytes is the
>> current length of the array (if it is allocated) and the second 4-bytes  
>> is the address of the array data. Initially all eight bytes are zero.
> 
> I'd say: it defines a variable 'array' which is a reference to a  
> struct/class like you've described.

Actually that turns out not to be the case. If it was, then 'array' would
be represented by a 4-byte value which contained the address of the 8-byte
struct, {uint len, void* ptr}. However, if you look at the generated
machine code you can see that the 8-byte struct _is_ the 'array'. In other
words, 'array' is not a reference to a struct/class. 

Here is what I found. I compiled this D code ...

 void main()
 {
   char[] array;
   if (array.ptr == null)
   {
       array.length = 2;
   }
   if (array.length == 3)
   {
       array.length = 4;
   }
   if (array)
   {
       array.length = 5;
   }
 }

And this is the generated machine code ...

       assume  CS:__Dmain
 L0:           enter   8,0
               push    EBX
               mov     dword ptr -8[EBP],0
               mov     dword ptr -4[EBP],0
               cmp     dword ptr -4[EBP],0
               jne     L29
               lea     EAX,-8[EBP]
               push    EAX
               push    1
               push    2
               call    near ptr __d_arraysetlength
               add     ESP,0Ch
 L29:          cmp     dword ptr -8[EBP],3
               jne     L3F
               lea     ECX,-8[EBP]
               push    ECX
               push    1
               push    4
               call    near ptr __d_arraysetlength
               add     ESP,0Ch
 L3F:          mov     EDX,-4[EBP]
               or      EDX,-8[EBP]
               je      L57
               lea     EBX,-8[EBP]
               push    EBX
               push    1
               push    5
               call    near ptr __d_arraysetlength
               add     ESP,0Ch
 L57:          pop     EBX
               leave
               ret
 __Dmain ends

As you can see, the 8-byte struct is reserved in the local stack and
references to array.ptr and array.length are direct accesses of the stack
space and not dereferenced via a pointer. Furthermore, 'if (array)' is
equivalent to ...

 if (array.length != 0 || array.ptr != null)

which I think is slightly slower than testing either .length or .ptr
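
Reading the listing at L3F: the pointer word is loaded, OR-ed with the length word, and the body is skipped when the result is zero, so the body runs when either word is non-zero. A minimal sketch of that test, with impliedTest as a hypothetical helper that describes only the code generated above, not a documented language rule:

```d
// Hypothetical helper mirroring the three instructions at L3F:
//   mov EDX, ptr ; or EDX, length ; je skip
bool impliedTest(size_t length, const(void)* ptr)
{
    return (cast(size_t) ptr | length) != 0;
}

void main()
{
    int dummy;                            // just a source of a non-null address

    assert(!impliedTest(0, null));        // never-assigned array: body skipped
    assert(impliedTest(0, &dummy));       // empty but allocated: body runs
    assert(impliedTest(3, &dummy));       // ordinary array: body runs
    assert(impliedTest(3, null));         // non-zero length alone also passes
}
```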


[snip]

> Of course. Which is why changing "if(array)" to test the length breaks  
> logical consistency and is just plain wrong IMO.

I'm not asking for its behavior to be changed, just documented.

-- 
Derek
Melbourne, Australia
21/07/2005 9:35:42 AM