July 29, 2005
Hi,

>> What about .dup, .sort, .reverse, .sizeof?
>> Do those have reference semantics or not?
>
>Yes - they "have reference semantics" in the sense that they act on the data (though in the case of .dup and .sizeof the reference/value semantics is irrelevant).

Just to make sure I understand:

char[] A = "123";
char[] B = A;
B.reverse;
// B will be 321
// A will be 321 also.
// correct?

BUT:

char[] A = "123";
char[] B = A;
B.length = 2;
// B will be 12
// A will remain 123.
// correct?


If this is true, then it seems rather arbitrary to me that .length should break reference semantics. Why not keep it in line with how the rest work? (Especially since it's not related to the benefits you talked about before.)

>The first sentence of http://www.digitalmars.com/d/arrays.html section Dynamic Arrays says "Dynamic arrays consist of a length and a pointer to the array data." I agree, though, that the doc needs to emphasize this more. I added some feedback to the Wiki about arrays asking for examples illustrating how array assignment works.

Ok. This would be an improvement.

>> Why is it a good design choice?

>It would be very annoying to have to check for null before asking if an array length is zero. Plus the whole design of slicing would need to be redone and probably would lose much of the efficiency it has today.

Ok. This is a valid point. However, that's not to say the problem is insurmountable. Solutions do exist. In fact, I have thought of a couple of possible solutions, but I'm afraid it'll scare everybody so for now this would be "something to think about." I just want to say that change, even if it breaks things, can be very good. It shouldn't be automatically ruled out in fear.

>I view an array as much closer to a struct than an object: an array is just like a struct with a pointer field and a length field. That's the simplest description of what an array is. Comparing them to objects is the wrong analogy.

Except that _all_ properties other than .length operate via reference semantics. Structs wouldn't do that. Objects would.

>>>I agree it is
>>>different than object behavior but that's well worth the benefits of the
>>>current system.
>>
>> Like what? Which benefits?
>
>see above - checking all the time for null would be very annoying. Almost all the time with arrays one cares if the length is zero and making people check for null before asking that question is error-prone.

It wouldn't be error prone. Perhaps you mean exceptions would be thrown, and that's fine, but there wouldn't be unnoticed errors. But in general I agree with you, slicing would lose its "magic" having to check for nulls.

>See Java for examples of making people check for null before asking for the length.

You can also learn from their mistakes and avoid them.

>>>If there are statements in the D doc that say "arrays have
>>>reference semantics" I think they should be changed to be more accurate
>>>and
>>>say something like "the array data has reference semantics". It's common
>>>to
>>>ignore the length field when you are casually talking about arrays.
>>
>> Or perhaps the arrays themselves could be changed to reference types? ;)
>
>Sure - one can change anything in D if the tradeoffs are worth it. I happen to believe D's dynamic array semantics are an excellent balance of tradeoffs.

I think the semantics could use a little rethinking and especially a bit of clarification.

Cheers,
--AJG.


July 30, 2005
On Fri, 29 Jul 2005 18:50:45 +0000 (UTC), AJG wrote:

> Hi Ben,
> 
> Ok, I don't think I said exactly what I meant before. Let's look at this piece by piece:
> 
> 1) Arrays are ("in theory") reference types.

This is where I think we separate. I don't think that D arrays are reference types in the same manner as objects. I think they are value types in that they always have two fields; a pointer and a length. D arrays are more like a predefined struct. Your phrase "in theory", depends on whose theory you are talking about.

> 2) Objects are reference types.

Okay.

> 3) Arrays are not objects.

True.

> 4) So, even though Arrays and Objects are different, they share (or should)
> reference semantics.

I assume at this point that you are talking about arrays as defined in some
computer science book rather than how they are implemented in D.

> I believe most of us can agree up to here.

Apparently not ;-)

> My overall point is that D is not keeping its promise regarding Arrays obeying reference semantics.

"Promise"? Where is that written down?

>Whether this is good or not is debatable, but at least it
> should be noted. Do you agree that D's arrays break reference semantics?

I suppose so. But it doesn't worry me because it is a pragmatic implementation that makes coding clearer (IMO) and improves performance. I'm not sorry that D doesn't have text-book arrays, in that case.

In your previous example ...

> # char[] a = "hello"; // whatever.
> # char[] b = a; // This is a reference to a.
> # b.length = 2; // Now b became its own instance.
> 
> Semantically speaking, I think this is wrong.

I've adjusted my thinking when using D. To me, after the assignment 'b = a', I see that 'a' and 'b' are distinct arrays that happen to share the same data. This may be seen as twisting words or playing with semantics, but it works for me.

And by the way, the 'b.length = 2' statement does not cause 'b' to become another instance. It still shares the same data as 'a'. You only get a new instance when the length increases.
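
A sketch of that last point, assuming a current D compiler and runtime (the 2005 implementation may have differed in detail). Shrinking leaves 'b' on the shared data. Growing a slice that no longer ends where the used part of the block ends forces a copy; a slice that does end there may instead be extended in place, so "increase" alone does not guarantee a new instance. The function name is made up for illustration.

```d
void shrinkThenGrow()
{
    int[] a = [1, 2, 3, 4, 5];
    int[] b = a;

    b.length = 2;               // shrink: no copy, same memory as a
    assert(b.ptr is a.ptr);

    b.length = 4;               // grow: extending in place would overwrite a[2 .. 4]
    assert(b.ptr !is a.ptr);    // so the runtime moved b to a fresh block
    assert(b == [1, 2, 0, 0]);  // old elements copied, new ones are int.init
    assert(a == [1, 2, 3, 4, 5]);

    b[0] = 99;                  // no longer visible through a
    assert(a[0] == 1);
}

void main()
{
    shrinkThenGrow();
}
```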

If D has not implemented text-book arrays, what are we losing? I can't see that we have lost anything, in fact we have gained.

-- 
Derek Parnell
Melbourne, Australia
30/07/2005 11:19:20 AM


July 30, 2005
Hi Derek,

>> Ok, I don't think I said exactly what I meant before. Let's look at this piece by piece:
>> 
>> 1) Arrays are ("in theory") reference types.
>
>This is where I think we separate. I don't think that D arrays are reference types in the same manner as objects. I think they are value types in that they always have two fields; a pointer and a length. D arrays are more like a predefined struct. Your phrase "in theory", depends on whose theory you are talking about.

Well, my "in theory" is actually pretty down-to-earth. I mean reference semantics in the way C++, C#, PHP, Java, Javascript and many other languages do references. This is not an ivory tower concept. It means essentially a nicer, fancier version of a pointer. When using the languages I mentioned, if you assign a reference, it will not become its own instance spontaneously in certain cases.

>> 4) So, even though Arrays and Objects are different, they share (or should)
>> reference semantics.
>
>I assume at this point that you are talking about arrays as defined in some computer science book rather than how they are implemented in D.

Guilty as charged re: being a computer scientist ;). However, once again, this is not a high-brow idea. Reference semantics are very basic and are implemented fairly similarly across various mainstream languages (C++, C#, PHP, Java, Javascript). D breaks reference semantics when it comes to arrays. This leads me to believe arrays are _not_ reference types, which is not the impression I got from their description. Walter has remained conspicuously silent about the matter, and has not answered the question.

Are arrays reference types or not? If yes, then they are broken.

>> I believe most of us can agree up to here.
>
>Apparently not ;-)

Indeed. The final word can only come from the Big W., I'm afraid.

>> My overall point is that D is not keeping its promise regarding Arrays obeying reference semantics.
>
>"Promise"? Where is that written down?

It was a figure of speech :p. The promise "would" be written down if D agrees to implement array reference semantics and then doesn't. This is what I'm not sure about.

>>Whether this is good or not is debatable, but at least it
>> should be noted. Do you agree that D's arrays break reference semantics?
>
>I suppose so. But it doesn't worry me because it is a pragmatic implementation that makes coding clearer (IMO) and improves performance. I'm not sorry that D doesn't have text-book arrays, in that case.

Once more, these "text-book" arrays are fairly common across modern languages, and D's semantics are certainly a twisted variation. Also, I don't follow how that improves performance. If anything, it _decreases_ performance by spawning deep copies of array instances in certain special cases.

>In your previous example ...
>
>> # char[] a = "hello"; // whatever.
>> # char[] b = a; // This is a reference to a.
>> # b.length = 2; // Now b became its own instance.
>> 
>> Semantically speaking, I think this is wrong.
>
>I've adjusted my thinking when using D. To me, after the assignment 'b = a', I see that 'a' and 'b' are distinct arrays that happen to share the same data. This may be seen as twisting words or playing with semantics, but it works for me.

Well, then that's not a reference. Sharing just the same data is some weird variation of array that I hadn't encountered. This is not a reference.

>And by the way, the 'b.length = 2' statement does not cause 'b' to become another instance. It still shares the same data as 'a'. You only get a new instance when the length increases.

Great, yet another exception. Thanks for pointing it out.

>If D has not implemented text-book arrays, what are we losing? I can't see that we have lost anything, in fact we have gained.

Well, so what if we lost object reference semantics? Would that also be another "gain?" Less is more! Rations will be increased -33%. It's doubleplusgood!

;)

Cheers,
--AJG.


July 30, 2005
In article <dce4up$2cbc$1@digitaldaemon.com>, AJG says...
>
>Hi,
>
>>> What about .dup, .sort, .reverse, .sizeof?
>>> Do those have reference semantics or not?
>>
>>Yes - they "have reference semantics" in the sense that they act on the data (though in the case of .dup and .sizeof the reference/value semantics is irrelevant).
>
>Just to make sure I understand:
>
>char[] A = "123";
>char[] B = A;
>B.reverse;
>// B will be 321
>// A will be 321 also.
>// correct?

yes - aside from the fact that you should dup the "123" before trying to modify it, since "123" is put in read-only memory.
Reverse acts in place because it is a method of the array type, just as sorting is in place.

>BUT:
>
>char[] A = "123";
>char[] B = A;
>B.length = 2;
>// B will be 12
>// A will remain 123.
>// correct?

yes
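
A small check of the two cases above, written for a current D compiler rather than the 2005 one. Two assumptions: `.reverse` is no longer an array property, so the sketch calls `std.algorithm.mutation.reverse` instead, and it uses `int[]` rather than `char[]` to sidestep the read-only string literal. The function name is made up for illustration.

```d
import std.algorithm.mutation : reverse;

void reverseAndShrink()
{
    int[] a = [1, 2, 3];
    int[] b = a;            // copies the (pointer, length) pair, not the elements

    reverse(b);             // in place: rewrites the shared elements
    assert(a == [3, 2, 1]); // visible through a as well
    assert(b == [3, 2, 1]);

    b.length = 2;           // shrinks b's own length field only
    assert(b == [3, 2]);
    assert(a == [3, 2, 1]); // a keeps its length of 3
    assert(a.ptr is b.ptr); // both still point at the same memory
}

void main()
{
    reverseAndShrink();
}
```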

>If this is true, then it seems rather arbitrary to me that .length should break reference semantics. Why not keep it in line with how the rest work? (Especially since it's not related to the benefits you talked about before.)

It is not arbitrary. There are advantages to the current design. I don't see why you say it is not related since it would be silly to have length do something different if there weren't benefits to making length special.

>>> Why is it a good design choice?
>
>>It would be very annoying to have to check for null before asking if an array length is zero. Plus the whole design of slicing would need to be redone and probably would lose much of the efficiency it has today.
>
>Ok. This is a valid point. However, that's not to say the problem is insurmountable. Solutions do exist. In fact, I have thought of a couple of possible solutions, but I'm afraid it'll scare everybody so for now this would be "something to think about." I just want to say that change, even if it breaks things, can be very good. It shouldn't be automatically ruled out in fear.

Where is the reaction in fear? I only see people trying to explain the current design and its advantages. I said I doubt a solution exists that would have the benefits of the current design while having reference semantics (if even reference semantics for length would be desirable). If you want to present some ideas that would be great - do whatever you want and enjoy (remember we're all doing this for fun).

>>I view an array as much closer to a struct than an object: an array is just like a struct with a pointer field and a length field. That's the simplest description of what an array is. Comparing them to objects is the wrong analogy.
>
>Except that _all_ properties other than .length operate via reference semantics. Structs wouldn't do that. Objects would.

uhh - the struct has a pointer to the data. The pointer part has reference semantics and the length part doesn't. A struct can easily have methods that dereference the pointer and modify shared state. I do it all the time with the MinTL containers and pretty much any struct that stores a pointer.
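
Roughly what that looks like, as a sketch only: `IntView`, `fill` and `structAnalogy` are hypothetical names invented for this illustration, not how the compiler actually defines arrays.

```d
// Hypothetical stand-in for a dynamic array: a pointer field plus a length field.
struct IntView
{
    int* ptr;
    size_t length;

    // Writes through the pointer, so every copy of the struct sees the change.
    void fill(int value)
    {
        foreach (i; 0 .. length)
            ptr[i] = value;
    }
}

void structAnalogy()
{
    int[] storage = [1, 2, 3];
    auto a = IntView(storage.ptr, storage.length);
    auto b = a;             // plain struct copy: both fields copied by value

    b.fill(7);              // dereferences the shared pointer
    assert(storage == [7, 7, 7]);

    b.length = 2;           // only b's own field changes
    assert(a.length == 3);
    assert(b.length == 2);
    assert(a.ptr is b.ptr); // still the same data
}

void main()
{
    structAnalogy();
}
```

The method behaves "by reference" and the length field "by value" without any special rule; both follow from copying a struct that holds a pointer.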

>>>>I agree it is
>>>>different than object behavior but that's well worth the benefits of the
>>>>current system.
>>>
>>> Like what? Which benefits?
>>
>>see above - checking all the time for null would be very annoying. Almost all the time with arrays one cares if the length is zero and making people check for null before asking that question is error-prone.
>
>It wouldn't be error prone. Perhaps you mean exceptions would be thrown, and that's fine, but there wouldn't be unnoticed errors. But in general I agree with you, slicing would lose its "magic" having to check for nulls.

By error-prone I mean the programmer will introduce bugs into the code by forgetting to check for null every time they want to know if an array has any content (meaning non-zero length).

>>See Java for examples of making people check for null before asking for the length.
>
>You can also learn from their mistakes and avoid them.

That's what D has now - it is avoiding the mistakes of Java by not requiring all those annoying null checks. Plus slicing is fast by not requiring memory allocations. Note in Java the length of an array is read-only so the whole question about length having value/reference semantics doesn't apply.
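
Both points can be checked with a short sketch, assuming a current D compiler (the function name is made up for illustration): a null array answers `.length` with 0 instead of failing, and a slice is only a new pointer/length pair into the same memory.

```d
void nullAndSlices()
{
    int[] empty;                    // default-initialised: null pointer, length 0
    assert(empty is null);
    assert(empty.length == 0);      // no null check needed before asking

    int[] a = [10, 20, 30, 40];
    int[] mid = a[1 .. 3];          // a slice is just a new (pointer, length) pair
    assert(mid.ptr is a.ptr + 1);   // it points into a's memory, nothing was copied
    mid[0] = 21;
    assert(a[1] == 21);             // the write is visible through a
}

void main()
{
    nullAndSlices();
}
```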


July 30, 2005
AJG wrote:

> Well, then that's not a reference. Sharing just the same data is some weird
> variation of array that I hadn't encountered. This is not a reference.

> Great, yet another exception. Thanks for pointing it out.

> Well, so what if we lost object reference semantics? Would that also be another
> "gain?" Less is more! Rations will be increased -33%. It's doubleplusgood!

Wasn't it you who posted elsewhere in this thread that change is good? ;)

D has changed the way we think about arrays. From my perspective, it's a good change and your desire to revert to the 'array as a reference' paradigm is not. Maybe it would help if you think of the D array as a wrapper/facade to the actual reference?


July 30, 2005
On Sat, 30 Jul 2005 02:30:17 +0000 (UTC), AJG wrote:

> Hi Derek,
> 
>>> Ok, I don't think I said exactly what I meant before. Let's look at this piece by piece:
>>> 
>>> 1) Arrays are ("in theory") reference types.
>>
>>This is where I think we separate. I don't think that D arrays are reference types in the same manner as objects. I think they are value types in that they always have two fields; a pointer and a length. D arrays are more like a predefined struct. Your phrase "in theory", depends on whose theory you are talking about.
> 
> Well, my "in theory" is actually pretty down-to-earth. I mean reference semantics in the way C++, C#, PHP, Java, Javascript and many other languages do references. This is not an ivory tower concept. It means essentially a nicer, fancier version of a pointer. When using the languages I mentioned, if you assign a reference, it will not become its own instance spontaneously in certain cases.

I think I have the solution. Rename them. Don't call them arrays. Call them something else. Then your problem goes away ;-)

-- 
Derek Parnell
Melbourne, Australia
30/07/2005 10:49:59 PM


July 30, 2005
Hi Ben,

>>If this is true, then it seems rather arbitrary to me that .length should break reference semantics. Why not keep it in line with how the rest work? (Especially since it's not related to the benefits you talked about before.)
>
>It is not arbitrary. There are advantages to the current design. I don't see why you say it is not related since it would be silly to have length do something different if there weren't benefits to making length special.

So then .length is related to slicing? How does the semantics of .length affect slicing? Or perhaps you meant other benefits?

>>>> Why is it a good design choice?
>>
>>>It would be very annoying to have to check for null before asking if an array length is zero. Plus the whole design of slicing would need to be redone and probably would lose much of the efficiency it has today.
>>
>>Ok. This is a valid point. However, that's not to say the problem is insurmountable. Solutions do exist. In fact, I have thought of a couple of possible solutions, but I'm afraid it'll scare everybody so for now this would be "something to think about." I just want to say that change, even if it breaks things, can be very good. It shouldn't be automatically ruled out in fear.
>
>Where is the reaction in fear? I only see people trying to explain the current design and its advantages. I said I doubt a solution exists that would have the benefits of the current design while having reference semantics (if even reference semantics for length would be desirable). If you want to present some ideas that would be great - do whatever you want and enjoy (remember we're all doing this for fun).

The general impression I get is that as soon as something creates the possibility of breaking existing code, then there is backlash. This would be fine for the embedded C language that runs medical heart devices. But for a language that isn't even out the door, it's disheartening (haha, no pun intended ;). Just my 2 cents.

>>>I view an array as much closer to a struct than an object: an array is just like a struct with a pointer field and a length field. That's the simplest description of what an array is. Comparing them to objects is the wrong analogy.
>>
>>Except that _all_ properties other than .length operate via reference semantics. Structs wouldn't do that. Objects would.
>
>uhh - the struct has a pointer to the data. The pointer part has reference semantics and the length part doesn't. A struct can easily have methods that dereference the pointer and modify shared state. I do it all the time with the MinTL containers and pretty much any struct that stores a pointer.

SomeObject A = new SomeObject;
SomeObject B = A;
B.SomeProperty; // Operates on A.

SomeStruct A;
SomeStruct B = A;
B.SomeProperty; // Operates on B.

int[] A = new int[5];
int[] B = A;
B.SomeProperty; // Operates on A;
// _Except_ if it's .length.

This behaviour seems much more in line with Objects than with Structs, to me. That's why I don't see how .length should break the current semantics.

>>>>>I agree it is
>>>>>different than object behavior but that's well worth the benefits of the
>>>>>current system.
>>>>
>>>> Like what? Which benefits?
>>>
>>>see above - checking all the time for null would be very annoying. Almost all the time with arrays one cares if the length is zero and making people check for null before asking that question is error-prone.
>>
>>It wouldn't be error prone. Perhaps you mean exceptions would be thrown, and that's fine, but there wouldn't be unnoticed errors. But in general I agree with you, slicing would lose its "magic" having to check for nulls.
>
>By error-prone I mean the programmer will introduce bugs into the code by forgetting to check for null every time they want to know if an array has any content (meaning non-zero length).

Ok.

>>>See Java for examples of making people check for null before asking for the length.
>>
>>You can also learn from their mistakes and avoid them.
>
>That's what D has now - it is avoiding the mistakes of Java by not requiring all those annoying null checks. Plus slicing is fast by not requiring memory allocations. Note in Java the length of an array is read-only so the whole question about length having value/reference semantics doesn't apply.

I'm not suggesting making .length read-only. I'm suggesting making it operate on the same data it has a pointer to. Just like .sort or .reverse would. The way I see it, if you explicitly want to make a copy of the data, that's why there is dup. Why should .length secretly call .dup sometimes, and sometimes not?

Cheers,
--AJG.


July 30, 2005
In article <dcgc3q$13i9$1@digitaldaemon.com>, AJG says...
>
>Hi Ben,
>
>>>If this is true, then it seems rather arbitrary to me that .length should break reference semantics. Why not keep it in line with how the rest work? (Especially since it's not related to the benefits you talked about before.)
>>
>>It is not arbitrary. There are advantages to the current design. I don't see why you say it is not related since it would be silly to have length do something different if there weren't benefits to making length special.
>
>So then .length is related to slicing? How does the semantics of .length affect slicing? Or perhaps you meant other benefits?

I recommend you pursue some of your ideas where length is manipulated by reference and follow the dependencies to see how different dynamic arrays (and, yes, slicing) would be. In particular I recommend you learn more about slicing. I'm sorry if that sounds harsh, but I've formed the opinion that you haven't really had experience with D arrays as they exist now.

>>>>> Why is it a good design choice?
>>>
>>>>It would be very annoying to have to check for null before asking if an array length is zero. Plus the whole design of slicing would need to be redone and probably would lose much of the efficiency it has today.
>>>
>>>Ok. This is a valid point. However, that's not to say the problem is insurmountable. Solutions do exist. In fact, I have thought of a couple of possible solutions, but I'm afraid it'll scare everybody so for now this would be "something to think about." I just want to say that change, even if it breaks things, can be very good. It shouldn't be automatically ruled out in fear.
>>
>>Where is the reaction in fear? I only see people trying to explain the current design and its advantages. I said I doubt a solution exists that would have the benefits of the current design while having reference semantics (if even reference semantics for length would be desirable). If you want to present some ideas that would be great - do whatever you want and enjoy (remember we're all doing this for fun).
>
>The general impression I get is that as soon as something creates the possibility of breaking existing code, then there is backlash. This would be fine for the embedded C language that runs medical heart devices. But for a language that isn't even out the door, it's disheartening (haha, no pun intended ;). Just my 2 cents.

For my case when I said essentially "much code will break" it wasn't meant as a backlash - just as a fact you would have to address. A proposed change that breaks lots of code is harder to push through than one that doesn't as a simple practical matter more than any emotional attachment to old code.

>>>>I view an array as much closer to a struct than an object: an array is just like a struct with a pointer field and a length field. That's the simplest description of what an array is. Comparing them to objects is the wrong analogy.
>>>
>>>Except that _all_ properties other than .length operate via reference semantics. Structs wouldn't do that. Objects would.
>>
>>uhh - the struct has a pointer to the data. The pointer part has reference semantics and the length part doesn't. A struct can easily have methods that dereference the pointer and modify shared state. I do it all the time with the MinTL containers and pretty much any struct that stores a pointer.
>
>SomeObject A = new SomeObject;
>SomeObject B = A;
>B.SomeProperty; // Operates on A.
>
>SomeStruct A;
>SomeStruct B = A;
>B.SomeProperty; // Operates on B.
>
>int[] A = new int[5];
>int[] B = A;
>B.SomeProperty; // Operates on A;
>// _Except_ if it's .length.
>
>This behaviour seems much more in line with Objects than with Structs, to me. That's why I don't see how .length should break the current semantics.

Please think about structs that contain pointers.

[snip]
>Why should .length secretly call .dup sometimes, and sometimes not?

Here I agree that the documentation should be more explicit in describing when setting the length reallocates and when it doesn't. If it is compiler-dependent the doc should say so.


July 30, 2005
Hi Ben,

>>So then .length is related to slicing? How does the semantics of .length affect slicing? Or perhaps you meant other benefits?
>
>I recommend you pursue some of your ideas where length is manipulated by reference and follow the dependencies to see how different dynamic arrays (and, yes, slicing) would be. In particular I recommend you learn more about slicing. I'm sorry if that sounds harsh, but I've formed the opinion that you haven't really had experience with D arrays as they exist now.

Would an example do? I may not be an expert regarding slicing, but I could see a discrete problem if you point it out.

>>The general impression I get is that as soon as something creates the possibility of breaking existing code, then there is backlash. This would be fine for the embedded C language that runs medical heart devices. But for a language that isn't even out the door, it's disheartening (haha, no pun intended ;). Just my 2 cents.
>
>For my case when I said essentially "much code will break" it wasn't meant as a backlash - just as a fact you would have to address. A proposed change that breaks lots of code is harder to push through than one that doesn't as a simple practical matter more than any emotional attachment to old code.

This kind of thinking only works ceteris paribus. But if a solution that breaks less code is not as good, then the language loses. I think at this point the language can afford such changes before it becomes like C, where a header file was needed to introduce mere booleans.

>>>>>I view an array as much closer to a struct than an object: an array is just like a struct with a pointer field and a length field. That's the simplest description of what an array is. Comparing them to objects is the wrong analogy.
>>>>
>>>>Except that _all_ properties other than .length operate via reference semantics. Structs wouldn't do that. Objects would.
>>>
>>>uhh - the struct has a pointer to the data. The pointer part has reference semantics and the length part doesn't. A struct can easily have methods that dereference the pointer and modify shared state. I do it all the time with the MinTL containers and pretty much any struct that stores a pointer.
>>
>>SomeObject A = new SomeObject;
>>SomeObject B = A;
>>B.SomeProperty; // Operates on A.
>>
>>SomeStruct A;
>>SomeStruct B = A;
>>B.SomeProperty; // Operates on B.
>>
>>int[] A = new int[5];
>>int[] B = A;
>>B.SomeProperty; // Operates on A;
>>// _Except_ if it's .length.
>>
>>This behaviour seems much more in line with Objects than with Structs, to me. That's why I don't see how .length should break the current semantics.
>
>Please think about structs that contain pointers.

Even if we see arrays as structs (which I don't, but for the sake of the argument), it doesn't explain why .length should break the other properties' semantics. If there's an obvious reason I'm blind to, could you point it out? I'm a little dense sometimes.

>[snip]
>>Why should .length secretly call .dup sometimes, and sometimes not?
>
>Here I agree that the documentation should be more explicit in describing when setting the length reallocates and when it doesn't. If it is compiler-dependent the doc should say so.

Ok.

Cheers,
--AJG.


July 30, 2005
In article <dcgkt5$1b4i$1@digitaldaemon.com>, AJG says...

>
>Even if we see arrays as structs (which I don't, but for the sake of the argument), it doesn't explain why .length should break the other properties' semantics. If there's an obvious reason I'm blind to, could you point it out? I'm a little dense sometimes.


Because sometimes it needs to reallocate memory.  Why don't you look at `man realloc`:

     The realloc() function tries to change the size of the allocation pointed
     to by ptr to size, and return ptr.  If there is not enough room to
     enlarge the memory allocation pointed to by ptr, realloc() creates a new
     allocation, copies as much of the old data pointed to by ptr as will fit
     to the new allocation, frees the old allocation, and returns a pointer to
     the allocated memory.  realloc() returns a NULL pointer if there is an
     error, and the allocation pointed to by ptr is still valid.

The difference is that D cannot let it free the original, because if it did then other references to the data would break. So it dups the data if a realloc is going to allocate memory in a different area. I'm not sure of the exact implementation details in D, but that's my basic understanding.

So to recap:
If the length increases and there is not enough space available to grow the array in place, it allocates another block of memory and copies the data. It leaves the original pointer intact and lets the garbage collector decide whether anybody else still has references to it.

This may seem confusing, but it's about array slicing being fast. If you don't want these mixed semantics, always dup your data.
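
A minimal sketch of the "always dup" approach, assuming a current D compiler (the function name is made up for illustration): once 'b' is an explicit copy, nothing done through 'b' can show up in 'a', whether or not a later resize reallocates.

```d
void dupFirst()
{
    int[] a = [1, 2, 3];
    int[] b = a.dup;        // explicit copy: b owns separate memory from the start
    assert(b.ptr !is a.ptr);

    b[0] = 99;
    b.length = 5;           // may or may not reallocate; a is untouched either way
    assert(a == [1, 2, 3]);
    assert(b == [99, 2, 3, 0, 0]);
}

void main()
{
    dupFirst();
}
```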

(P.S. You mention C++ reference semantics when you're talking about these arrays, but this isn't even legal in C++:
int foo[10];
foo = null;
You really can't compare the two languages in this respect. I think D arrays are a big step forward compared to C arrays, which literally couldn't find their ass with both hands.)

-Sha