July 30, 2005
so wait, you basically want an array to be a pointer to data containing a length and a pointer? i have been following this thread somewhat but I can hardly find the benefit here. it seems to me you want to take something very straightforward and close to the metal and turn it into a referenced object, for some bizarre reason regarding reference semantics. why don't you just put your arrays in objects if you are having problems?
July 30, 2005
Hi,

>so wait, you basically want an array to be a pointer to data containing a length and a pointer? i have been following this thread somewhat but I can hardly find the benefit here.

No. I would like it to be that way, but I know there wouldn't be support for this. What I'd like is for all array properties to follow reference semantics.

>it seems to me you want to take something very straightforward and close to the metal and turn it into a referenced object, for some bizarre reason regarding reference semantics.

What is bizarre is the current array semantics, be it due to "close to the metal" requirements, or whatever. If you don't think arrays at the moment follow at least _partial_ reference semantics, then why does:

# char[] A = "123"; // Yes, it's static, bear with me.
# char[] B = A;
# B.reverse;

Reverse _also_ the contents of A? Those are reference semantics. According to Derek, the array reference itself is implemented on the stack in 8-byte chunks. That's fine. I'm not talking about making the array itself a pointer.

Now, my point is that .length breaks reference semantics in special cases, because:

# char[] A = "123";
# char[] B = A;
# B.length = 4;

A.length did not change. If it were consistent with .reverse and .sort, then A's length too would have changed.
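
For readers following along, the asymmetry described above can be modelled in plain C, assuming a D dynamic array is a two-field value (a length plus a pointer), as other posts in this thread describe it. This is only a sketch of that model, not D's runtime: the names `Slice`, `slice_reverse` and `slice_shrink` are made up here, and shrinking is used instead of growing to keep the example short.

```c
#include <stddef.h>

/* Hypothetical model of a D dynamic array: a length and a pointer,
   copied by value on assignment. */
typedef struct {
    size_t length;
    char  *ptr;
} Slice;

/* Reverses the characters the slice points at, in place.
   Every Slice value sharing the same ptr sees the change. */
void slice_reverse(Slice s) {
    if (s.length < 2) return;
    for (size_t i = 0, j = s.length - 1; i < j; ++i, --j) {
        char tmp = s.ptr[i];
        s.ptr[i] = s.ptr[j];
        s.ptr[j] = tmp;
    }
}

/* Shrinks one Slice value. Only the length field inside *s changes;
   copies of the Slice made earlier keep their own length. */
void slice_shrink(Slice *s, size_t new_length) {
    if (new_length < s->length) s->length = new_length;
}
```

With `Slice b = a;`, calling `slice_reverse(b)` changes the characters `a` sees, while `slice_shrink(&b, 2)` leaves `a.length` untouched. That is the difference between operating on the shared data and operating on the copied length field.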

Cheers,
--AJG.











July 30, 2005
On Sat, 30 Jul 2005 17:07:06 +0000 (UTC), AJG wrote:


[snip]

> 
> SomeObject A = new SomeObject;
> SomeObject B = A;
> B.SomeProperty; // Operates on A.
> 
> SomeStruct A;
> SomeStruct B = A;
> B.SomeProperty; // Operates on B.
> 
> int[] A = new int[5];
> int[] B = A;
> B.SomeProperty; // Operates on A;
> // _Except_ if it's .length.
> 
> This behaviour seems much more in line with Objects than with Structs, to me. That's why I don't see how .length should break the current semantics.

You are wrong here because 'B.someProperty' operates on B not A. A simple proof is this ...

 int[] A = new int[5];
 int[] B = A;
 A.length = 4;
 writefln("%d", B.length);  // displays 5.

In your example, it *appears* to operate on A (the 8-byte array structure)
because B and A have the same values. That is A.ptr == B.ptr and A.length
== B.length.

We just have to admit that arrays in D are not the classical array definition and are really a different type of thing altogether. Then get to learn the rules of D 'arrays'. If you want arrays to behave like objects, then maybe you can write an array class.

-- 
Derek Parnell
Melbourne, Australia
31/07/2005 8:26:46 AM
July 30, 2005
On Sat, 30 Jul 2005 22:12:31 +0000 (UTC), AJG wrote:


[snip]
> What is bizarre is the current array semantics, be it due to "close to the metal" requirements, or whatever. If you don't think arrays at the moment follow at least _partial_ reference semantics, then why does:
> 
> # char[] A = "123"; // Yes, it's static, bear with me.
> # char[] B = A;
> # B.reverse;
> 
> Reverse _also_ the contents of A?

There might have been an argument that .reverse and .sort should follow Walter's Copy-on-Write rules of engagement, but the current behavior is documented and relied upon in current code.

-- 
Derek Parnell
Melbourne, Australia
31/07/2005 8:53:41 AM
July 30, 2005
Hi Derek,

>> int[] A = new int[5];
>> int[] B = A;
>> B.SomeProperty; // Operates on A;
>> // _Except_ if it's .length.
>> 
>> This behaviour seems much more in line with Objects than with Structs, to me. That's why I don't see how .length should break the current semantics.
>
>You are wrong here because 'B.someProperty' operates on B not A. A simple proof is this ...
>
> int[] A = new int[5];
> int[] B = A;
> A.length = 4;
> writefln("%d", B.length);  // displays 5.
> 
>In your example, it *appears* to operate on A (the 8-byte array structure) because B and A have the same values. That is A.ptr == B.ptr and A.length == B.length.

Um... I said "except .length" for a reason. That's my very point. That .length is the exception. All others operate on A.

>We just have to admit that arrays in D are not the classical array definition and are really a different type of thing altogether. Then get to learn the rules of D 'arrays'. If you want arrays to behave like objects, then maybe you can write an array class.

First of all, this would throw efficiency out the window. Second, let me quote you a little of the D manifesto:

[Taken from "The D Programming Language" written by Walter Bright] [Arrays Section]

"Arrays are enhanced from being little more than an alternative syntax for a pointer into first class objects."

That's, ahem, "First Class Objects," for those that missed it.

Cheers,
--AJG.


July 31, 2005
In article <dch28c$1nrj$1@digitaldaemon.com>, AJG says...
>
>Hi Derek,
>
>>> int[] A = new int[5];
>>> int[] B = A;
>>> B.SomeProperty; // Operates on A;
>>> // _Except_ if it's .length.
>>> 
>>> This behaviour seems much more in line with Objects than with Structs, to me. That's why I don't see how .length should break the current semantics.
>>
>>You are wrong here because 'B.someProperty' operates on B not A. A simple proof is this ...
>>
>> int[] A = new int[5];
>> int[] B = A;
>> A.length = 4;
>> writefln("%d", B.length);  // displays 5.
>> 
>>In your example, it *appears* to operate on A (the 8-byte array structure) because B and A have the same values. That is A.ptr == B.ptr and A.length == B.length.
>
>Um... I said "except .length" for a reason. That's my very point. That .length is the exception. All others operate on A.

No, all others do _NOT_ operate on A. They happen to operate on the same data that A points to. A is a struct which holds an int and a ptr; obviously, changing B's ptr or B's length does not affect A. You're thinking about D arrays all wrong. That's what Derek was getting at. A and B are two separate objects which happen to be able to have references to the same data. For efficiency's sake, both the length and the ptr are assigned by value. Think of it this way in C, if you have this structure:

struct Array {  int length; void* ptr; } a, b;

a.ptr = new char[100];
b = a;

What does this do? This is the semantics of D arrays. A and B are distinct structures, and if you allocate more memory for b then it's not going to change A. As you can see, this is not the same as reference semantics at all; otherwise A's ptr would change as well. If you want reference semantics you are free to use an array handle. But the way D arrays are handled is not mystical or inconsistent. They're perfectly consistent with themselves, and if you understand how they operate (which is not hard) then you won't make mistakes.
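
A slightly fuller version of the struct in this post, written as compilable C (the snippet itself borrows `new` from C++), may make the point concrete. `Array` follows the post; `array_grow` is a hypothetical helper that always moves to a fresh block. That is a simplifying assumption: a real runtime may sometimes be able to extend a block in place.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t length;
    char  *ptr;
} Array;

/* Grows *arr to new_length by copying into a fresh zero-filled block.
   Returns 0 on success, -1 if allocation fails (*arr is left untouched).
   The old block is deliberately not freed here, because other Array
   values may still point at it. */
int array_grow(Array *arr, size_t new_length) {
    if (new_length <= arr->length) return 0;
    char *fresh = calloc(new_length, 1);
    if (fresh == NULL) return -1;
    if (arr->length > 0) memcpy(fresh, arr->ptr, arr->length);
    arr->ptr = fresh;
    arr->length = new_length;
    return 0;
}
```

After `b = a;` and `array_grow(&b, 200)`, `a` still has its original length and pointer, and writes through `b.ptr` no longer show up through `a.ptr`.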

As for your other issue, where array nullness and length == 0 are being converged, I do not think this is an issue. length == 0 is the definition of a null set (arrays in CS seem to be more in line with sets, dunno why they're named as they are). But if you want to be consistent with terminology, technically a null array is an array with all elements set to null. Can you show me an example where it matters if length == 0 and arr.ptr == null do not denote the same thing?

-Sha


July 31, 2005
AJG escribió:
> 
> I'm not suggesting making .length read-only. I'm suggesting making it operate on the same data it has a pointer to. Just like .sort or .reverse would. The way I see it, if you explicitly want to make a copy of the data, that's why there is dup. Why should .length secretly call .dup sometimes, and sometimes not?
> 
> Cheers,
> --AJG.
> 
> 

First of all, I don't agree with AJG: I think D arrays are fine the way they are now.

There's something, though, and correct me if I'm wrong, but I think array.length doesn't go hand in hand with COW.

char [] a;
a.length = 3;
foo(a);
void foo(char [] b)
{
	b[0] = 'f';    // 1
	b.length = 5;  // 2
}

COW says that to do 1 you have to dup first, because you don't own the array; but when you do 2, b is automatically dupped. So, my point is that to be consistent, maybe resizing should also require dupping.

Am I right? Does it make sense?

-- 
Carlos Santander Bernal
July 31, 2005
Hi,

>>Um... I said "except .length" for a reason. That's my very point. That .length is the exception. All others operate on A.
>
>No, All others do _NOT_ operate on A.  They happen to operate on the same data that A points to.

You are simply splitting hairs here. You are arguing language semantics. The fact of the matter is that for all practical purposes, EXCEPT for .length, arrays in D are by reference. This means that for all practical purposes, EXCEPT for .length, B operates on A. It doesn't matter if it's because of the pointer (an implementation, system-dependent, gory detail) or because of any other reason.

If assigning an array _immediately_ copied the data, then what you said would be true. But it doesn't, because (a) that would be inefficient, and (b) that would remove _all_ reference semantics.

Therefore, as it is, reference semantics are broken when it comes to .length.

<snip>
> RE: Arrays as structs.

This is where _you_ are wrong. Arrays are not structs. Arrays do not share the semantics of structs. Arrays share _implementation details_ with structs, and that's _it_.

Didn't you see the quote from the D language doc? It clearly says "First-Class Objects." Not structs. Not primitives. Not pointers.

If you, however, equate that with structs, that's fine. But I certainly do not.

>They're perfectly consistent with themselves,

This means absolutely nothing. A bug can be perfectly consistent with itself and it is still a bug. To be meaningful, they would have to be consistent with the rest of the language. Or perhaps, consistent with another part of the language, like, say, Objects.

>and if you understand how they operate (which is not hard) then you won't make mistakes.

It's not about making mistakes. Sure, I can just as well avoid a function in a library that is buggy, and I'll avoid a mistake. That's not the point. If something is broken, then it needs to be fixed. If Walter could perhaps clarify the semantics of arrays, then we would get somewhere.

>As for your other issue, where array nullness and length == 0 being converged, I do not think this is an issue. length == 0 is the definition of a null set

So? What I would like to express is _No Set_.

>(arrays in CS seem to be more in line with sets, dunno why they're named as they are). But if you want to be consistent with terminology, technically a null array is an array with all elements set to null. Can you show me an example where it matters if length == 0 and arr.ptr == null does not denote the same thing?

When you are returning fields from a database, for instance. If you've ever dealt with a DB, you would know fields can be NULL, meaning no value. This is different than "", which means explicitly the empty string. It is very difficult to do this because of certain bugs which meld .length == 0 and .ptr == null.

They are not the same thing. Not semantically. Not technically, at the moment, except for the "bugs." That's why I'm asking Walter whether he _plans_ on merging the two into one. If that's his vision, which would be unfortunate, then those things aren't "bugs" at all, but rather the intended design.


Cheers,
--AJG.


July 31, 2005
In article <dchgkl$23v5$1@digitaldaemon.com>, AJG says...
>
>Hi,
>
>>>Um... I said "except .length" for a reason. That's my very point. That .length is the exception. All others operate on A.
>>
>>No, All others do _NOT_ operate on A.  They happen to operate on the same data that A points to.
>
>You are simply splitting hairs here. You are arguing language semantics. The fact of the matter is that for all practical purposes, EXCEPT for .length, arrays in D are by reference. This means that for all practical purposes, EXCEPT for .length, B operates on A. It doesn't matter if it's because of the pointer (an implementation, system-dependent, gory detail) or because of any other reason.

I am not splitting hairs. I gave you a very valid reason why a and b are not references, not even theoretically. They happen to have a reference member that, in some cases, will point to the same data. YOU are in full control over when that happens. If that's not what you intended, then you should be using references to the ARRAY, rather than using multiple arrays which have references to the same data.

I might ask you this: What MAGIC would you like to happen with arrays? What you want is not possible without some kind of magic. Try this example on for size, from classic C:

int* a = malloc(100 * sizeof(int));
int* b = a;

b = realloc( b, 1000 * sizeof(int) );

Guess what, a is most likely now a bad reference. Is this what you would like D to do? Probably not; you probably want 'a' to point to the new array of length 1000. Do you want the compiler to magically handle this for you?

Would you like length to be read only? Forcing us to call b = new int[], and then manually code up the data copy to resize the array? Starting to sound like C.... What a pain arrays were. And a still didn't change automatically to where b is pointing now.
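
One way to get the behaviour where `a` follows `b` after a resize, without compiler magic, is the "references to the ARRAY" idea from earlier in this post: both names point at a single (length, ptr) record. A minimal C sketch under that assumption follows; `IntArray` and `int_array_resize` are hypothetical names, and elements added by a resize are left uninitialized.

```c
#include <stdlib.h>

typedef struct {
    size_t length;
    int   *ptr;
} IntArray;

/* Resizes the one shared record in place.
   Returns 0 on success, -1 on failure; on failure the record is
   unchanged and still valid. Resizing to zero is rejected to keep
   the sketch simple. */
int int_array_resize(IntArray *arr, size_t new_length) {
    if (new_length == 0) return -1;
    int *moved = realloc(arr->ptr, new_length * sizeof *moved);
    if (moved == NULL) return -1;
    arr->ptr = moved;
    arr->length = new_length;
    return 0;
}
```

If `a` and `b` are both `IntArray *` pointing at the same record, a resize through `b` is visible through `a`, and neither is left dangling. The price is an extra indirection on every access, which is the trade-off being argued about in this thread.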

>If assigning an array _immediately_ copied the data, then what you said would be true. But it doesn't, because (a) that would be inefficient, and (b) that would remove _all_ reference semantics.
>
>Therefore, as it is, reference semantics are broken when it comes to .length.

There are no reference semantics when it comes to arrays. Maybe what you want is for D to automagically do a Copy-on-Write. Any time an array is set to a reference of another array, a flag could get turned on, and when you use it as an lvalue while that flag is on, it could dup the array. But that's silly since

b = new int[100]; is perfectly legal in D, and would result in a double memory access if you ever tried to assign to the array. Wonder what kind of magic would have to be done to fix this case.

IMHO, better to let the programmer specify when he wants a and b to point at the same data.

>
><snip>
>> RE: Arrays as structs.
>
>This is where _you_ are wrong. Arrays are not structs. Arrays do not share the semantics of structs. Arrays share _implementation details_ with structs, and that's _it_.
>
>Didn't you see the quote from the D language doc? It clearly says "First-Class Objects." Not structs. Not primitives. Not pointers.
>
>If you, however, equate that with structs, that's fine. But I certainly do not.

You can't use a language to its full potential if you don't know implementation details. There will always be ambiguities about whether something is by value, by ref, or whatever else. As the saying goes: the language is in the details.

Here's a good example for you, from a VB.NET project I just inherited:

If arr.Length - arr.Replace(",", "").Length <> 17 Then
'error out

What's the big deal? It's only one line of code, must be just as good as counting the number of commas in the array....
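
For comparison, counting the commas directly needs no temporary copy of the string. A small C sketch; `count_char` is a hypothetical helper written for this illustration, not something taken from the project mentioned above.

```c
#include <stddef.h>

/* Counts occurrences of `wanted` in the NUL-terminated string `text`
   in a single pass, without allocating anything. */
size_t count_char(const char *text, char wanted) {
    size_t count = 0;
    for (; *text != '\0'; ++text) {
        if (*text == wanted) ++count;
    }
    return count;
}
```

The one-liner gets the same number, but it builds a second string with the commas removed just to compare two lengths. That cost is easy to miss unless you know what `Replace` does underneath.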

>
>>They're perfectly consistent with themselves,
>
>This means absolutely nothing. A bug can be perfectly consistent with itself and it is still a bug. To be meaningful, they would have to be consistent with the rest of the language. Or perhaps, consistent with another part of the language, like, say, Objects.
>
>>and if you understand how they operate (which is not hard) then you won't make mistakes.
>
>It's not about making mistakes. Sure, I can just as well avoid a function in a library that is buggy, and I'll avoid a mistake. That's not the point. If something is broken, then it needs to be fixed. If Walter could perhaps clarify the semantics of arrays, then we would get somewhere.
>
>>As for your other issue, where array nullness and length == 0 being converged, I do not think this is an issue. length == 0 is the definition of a null set
>
>So? What I would like to express is _No Set_.

Not Set?

>
>>(arrays in CS seem to be more in line with sets, dunno why they're named as they are). But if you want to be consistent with terminology, technically a null array is an array with all elements set to null. Can you show me an example where it matters if length == 0 and arr.ptr == null does not denote the same thing?
>
>When you are returning fields from a database, for instance. If you've ever dealt with a DB, you would know fields can be NULL, meaning no value. This is different than "", which means explicitly the empty string. It is very difficult to do this because of certain bugs which meld .length == 0 and .ptr == null.

I see your point, but any kind of attempt to do that would be abusing the array. There are laws against array abuse in most countries these days. </sarcasm>

Almost every database API in existence deals with that by having special objects.

so you have this:

static char[0] DBNull; in your database module;

then

char[] foo;
foo = dbCommand.executeScalar( );

if( foo is DBNull )
// I'm not sure if the .ptr prop is needed here.  Last I heard, if you just use
// the array name it defaults to the ptr
. oh noes, the field was null!
else
. oh good ..



>
>They are not the same thing. Not semantically. Not technically, at the moment, except for the "bugs." That's why I'm asking Walter whether he _plans_ on merging the two into one.

They should never be the same thing. But there's a gotcha: if .ptr is null, then length should always be 0. The other way around is not necessarily true. Just because length == 0, the ptr isn't necessarily null. This should be the case when the array was at one point allocated, and then length was reduced. It should be that way for efficiency.
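
The rule stated here (null ptr implies zero length, but not the reverse) can be written down in a small C model. This sketches the rule as described in this post, not what any D compiler actually does; `Str`, `str_is_null` and `str_is_empty` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    size_t      length;
    const char *ptr;
} Str;

/* "No value at all", e.g. a NULL database field. */
bool str_is_null(Str s)  { return s.ptr == NULL; }

/* Zero characters. True for a null Str and for an allocated but
   empty one alike, so it cannot tell the two apart. */
bool str_is_empty(Str s) { return s.length == 0; }
```

A value `{ 0, NULL }` is both null and empty, while `{ 0, "" }` is empty but not null. Code that only ever checks the length loses that difference.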

That however is not useful for your example of DBNulls. It would be silly to allocate some space and then just not use it and say that's when somebody entered something, and it was nothing.

> If that's his vision, which would be unfortunate, then those things aren't "bugs" at all, but rather the intended design.

What 'things'?  Are you talking about the .ptr value being the same for two arrays?






July 31, 2005
In article <dcgkt5$1b4i$1@digitaldaemon.com>, AJG says...
>
>Hi Ben,
>
>>>So then .length is related to slicing? How does the semantics of .length affect slicing? Or perhaps you meant other benefits?
>>
>>I recommend you pursue some of your ideas where length is manipulated by reference and follow the dependencies to see how different dynamic arrays (and, yes, slicing) would be. In particular I recommend you learn more about slicing. I'm sorry if that sounds harsh but I've gotten the opinion now that you haven't really gotten experience with D arrays as they exist now.
>
>Would an example do? I may not be an expert regarding slicing, but I could see a discrete problem if you point it out.

Let me step through some choices that I was hoping you would do. Let's start by thinking about what an array with reference-based length would look like. It would either be a pointer to today's dynamic array (a ptr and a length) or it would be a pointer to one memory block with the length stored either at the front or end of the array data. How would slicing work for those two implementations? For the first slicing would have to allocate memory to store the new ptr and new length. For the second slicing would have to be a different type since it is impossible to store the length for the slice in the middle of the original source array. So that's why I suggested you think through your initial suggestion and work out the impact on slicing and arrays in general.
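
To make the slicing argument concrete, here is a C sketch of the current layout, assuming a dynamic array is a (length, ptr) pair passed by value. `IntSlice` and `slice_of` are hypothetical names used only for this illustration.

```c
#include <stddef.h>

typedef struct {
    size_t length;
    int   *ptr;
} IntSlice;

/* Returns the view [begin, end) of `whole`, or an empty view if the
   bounds are invalid. No memory is allocated: the result is just a
   new (length, ptr) pair pointing into the same block. */
IntSlice slice_of(IntSlice whole, size_t begin, size_t end) {
    IntSlice part = { 0, NULL };
    if (whole.ptr != NULL && begin <= end && end <= whole.length) {
        part.length = end - begin;
        part.ptr = whole.ptr + begin;
    }
    return part;
}
```

Here a slice costs two assignments. If the length instead lived in a heap record behind a pointer, `slice_of` would have to allocate that record. If it lived at the front of the data block, a slice starting in the middle would have nowhere to keep its own length without overwriting an element of the original.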

But to be honest I would still prefer the current behavior where the length information is always available without having to check for null first - even if you could somehow make the rest of D remain the same as today.