August 01, 2005
Hi Ben,

>Let me step through some choices that I was hoping you would do. Let's start by thinking about what an array with reference-based length would look like. It would either be a pointer to today's dynamic array (a ptr and a length) or it would be a pointer to one memory block with the length stored either at the front or end of the array data. How would slicing work for those two implementations? For the first slicing would have to allocate memory to store the new ptr and new length. For the second slicing would have to be a different type since it is impossible to store the length for the slice in the middle of the original source array. So that's why I suggested you think through your initial suggestion and work out the impact on slicing and arrays in general.

I don't think this change in the way arrays operate internally would be necessary. What about simply using the current data pointer as it is to implement reference semantics? A null pointer means the reference is null; and vice-versa.

The problem I keep hearing about arises when trying to resize (specifically, enlarge) an array by reference. So what it all comes down to, re: .length, is the inability of realloc() to guarantee that the pointer it returns is the same one it receives. Is this correct?
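
A minimal sketch of the resizing question, assuming a present-day D compiler (the 2005 compiler may differ in detail). Two array variables start out sharing one block; enlarging one of them through .length either extends the block in place or moves the data, and the runtime does not promise which. The helper name `grown` is made up for illustration.

```d
import std.stdio;

/// Hypothetical helper: copies the array variable (not the data),
/// enlarges the copy, and returns it.
int[] grown(int[] a, size_t newLength)
{
    int[] b = a;           // b shares a's elements at this point
    b.length = newLength;  // may extend in place or move the data elsewhere
    return b;
}

void main()
{
    int[] a = [1, 2, 3];
    int[] b = grown(a, 1000);

    // The caller's variable is untouched: same length as before.
    assert(a.length == 3);
    assert(b.length == 1000);

    // Whether a and b still share memory depends on what the runtime did,
    // so it is only printed here, not asserted.
    writeln("still sharing memory: ", a.ptr is b.ptr);
}
```

Because the caller's variable keeps its own pointer and length, writes made through the enlarged copy may or may not be visible through the original.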

>But to be honest I would still prefer the current behavior where the length
>information is always available without having to check for null first - even
>if you could somehow make the rest of D remain the same as today.

I understand this concern, and it is a valid one. However, at this point D is trying to have its cake and eat it too: it wants to have null arrays, but not have to go through null checks. The result is a bit confusing, IMHO. Moreover, it is buggy. Worst of all, it is not well documented.

This combination of factors leads me to think something should be done.

Frankly, from the docs I can't make out what the semantics of arrays are supposed to be. That was why I asked the original question: should we or shouldn't we treat arrays as null? I guess maybe not even Walter knows ;) ?

Cheers,
--AJG.


August 01, 2005
Ben Hinkle wrote:
> I think you'll have a hard time getting lots of support for that. I much prefer the current behavior and I bet there is lots of existing D code that assumes one can test the length of an array at any time. Since an array is not an object I see no problem with the "inconsistency" - an array is an array.

Indeed. I think the array semantics where you can't access a property of the array without the Fear of the NullPointerException is the most annoying thing in the world, or at least in the field of programming.

I will happily agree to this difference in semantics because the benefits far outweigh the slight inconsistency.

Besides, in a way there is no inconsistency. An array reference is a value type consisting of two 4-byte integers (in 32-bit environments). This is different from an object reference. The first integer is the length of the array and the second is a pointer to the first item of the array. Whenever an array reference is created a pointer to the data exists. The .length property is just a shortcut to access the length field of the array. The .sort property is a function called on the array reference. These always work even if the array reference points to an empty array. Trying to access the elements of an empty array will segfault in the usual way.
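
As a rough illustration of the description above, assuming a present-day D compiler: a default-initialized dynamic array has a null data pointer, yet its .length can be read without any check, and a slice is just another (length, pointer) pair over the same elements. The helper name `lastTwo` is made up for this sketch.

```d
/// Hypothetical helper: returns a slice over the last two elements.
/// No elements are copied; only a new (length, pointer) pair is built.
int[] lastTwo(int[] a)
{
    return a[$ - 2 .. $];
}

void main()
{
    int[] empty;                     // default-initialized dynamic array
    assert(empty is null);           // its data pointer is null...
    assert(empty.length == 0);       // ...yet .length is safe to read

    int[] data = [10, 20, 30, 40];
    int[] tail = lastTwo(data);
    assert(tail.length == 2);
    assert(tail.ptr == data.ptr + 2); // points into the original block

    tail[0] = 99;                    // writes through to the shared elements
    assert(data[2] == 99);
}
```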

Object references stored in an array have the usual semantics. IMO nothing forces a language to treat arrays as templated instances of a class Array with regular object semantics. D's way is just better.

-- 
Niko Korhonen
SW Developer
August 01, 2005
On Mon, 01 Aug 2005 09:56:57 +0300, Niko Korhonen wrote:

> Ben Hinkle wrote:
>> I think you'll have a hard time getting lots of support for that. I much prefer the current behavior and I bet there is lots of existing D code that assumes one can test the length of an array at any time. Since an array is not an object I see no problem with the "inconsistency" - an array is an array.
> 
> Indeed. I think the array semantics where you can't access a property of the array without the Fear of the NullPointerException is the most annoying thing in the world, or at least in the field of programming.
> 
> I will happily agree to this difference in semantics because the benefits far outweigh the slight inconsistency.
> 
> Besides, in a way there is no inconsistency. An array reference is a value type consisting of two 4-byte integers (in 32-bit environments). This is different from an object reference.

Agreed. The way I look at it is that a D array variable *contains* a reference to the array elements but is, in itself, not the reference.

When it comes to implementation, dynamic-length arrays always have an 8-byte structure allocated to themselves, and may have more RAM allocated if there are any elements in the array. The address of the array variable is not the address of the first element; the length property is fetched at runtime from the array variable.

However, fixed-length arrays always have a minimum of 8 bytes allocated regardless of the number of elements declared, and the address of the array variable is also the address of its first element; the length property is 'hard-coded' by the compiler in any expressions that use it.
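
The two layouts can be probed with a short sketch, assuming a present-day D compiler (sizes are expressed in terms of size_t so the checks hold on both 32-bit and 64-bit targets). The variable names are illustrative only.

```d
void main()
{
    int[] dyn = [1, 2, 3];      // dynamic array: a (length, pointer) pair
    int[3] fixed = [1, 2, 3];   // fixed-length array: the elements themselves

    // Two machine words: 8 bytes on a 32-bit target, 16 on a 64-bit one.
    static assert((int[]).sizeof == 2 * size_t.sizeof);
    // Here the fixed-length array's size is just its three elements.
    static assert((int[3]).sizeof == 3 * int.sizeof);

    // The dynamic array variable lives apart from its elements.
    assert(cast(void*) &dyn != cast(void*) dyn.ptr);
    // The fixed-length array variable starts at its first element.
    assert(cast(void*) &fixed == cast(void*) fixed.ptr);

    // The fixed length is a compile-time constant.
    static assert(fixed.length == 3);
}
```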

-- 
Derek
Melbourne, Australia
1/08/2005 5:01:43 PM
August 01, 2005
"Derek Parnell" <derek@psych.ward> wrote in message news:a118xxgyuee7.t1828b9vk5du$.dlg@40tude.net...
> On Sat, 30 Jul 2005 22:12:31 +0000 (UTC), AJG wrote:
>
>
> [snip]
>> What is bizarre is the current array semantics, be it due to "close to
>> the
>> metal" requirements, or whatever. If you don't think arrays at the moment
>> follow
>> at least _partial_ reference semantics, then why does:
>>
>> # char[] A = "123"; // Yes, it's static, bear with me.
>> # char[] B = A;
>> # B.reverse;
>>
>> Reverse _also_ the contents of A?
>
> There might have been an argument that .reverse and .sort should follow Walter's Copy-on-Write rules of engagement, but the current behavior is documented and relied upon in current code.

Besides those reasons, writing "B.reverse" to me indicates you want to affect B, hence no COW, while "reverse(B)" says you want a reversed B, hence COW. That's one reason why I don't really like the current syntax hack of being able to write B.tolower() to mean tolower(B).


August 01, 2005
In article <dclba9$2pif$1@digitaldaemon.com>, Ben Hinkle says...
>
>
>"Derek Parnell" <derek@psych.ward> wrote in message news:a118xxgyuee7.t1828b9vk5du$.dlg@40tude.net...
>> On Sat, 30 Jul 2005 22:12:31 +0000 (UTC), AJG wrote:
>>
>>
>> [snip]
>>> What is bizarre is the current array semantics, be it due to "close to
>>> the
>>> metal" requirements, or whatever. If you don't think arrays at the moment
>>> follow
>>> at least _partial_ reference semantics, then why does:
>>>
>>> # char[] A = "123"; // Yes, it's static, bear with me.
>>> # char[] B = A;
>>> # B.reverse;
>>>
>>> Reverse _also_ the contents of A?
>>
>> There might have been an argument that .reverse and .sort should follow Walter's Copy-on-Write rules of engagement, but the current behavior is documented and relied upon in current code.
>
>Besides those reasons writing "B.reverse" to me indicates you want to affect B hence no COW while "reverse(B)" says you want a reversed B hence COW. That's one reason why I don't really like the current syntax hack of being able to write B.tolower() to mean tolower(B).

Utterly confusing! reverse(b) and B.reverse have nothing in their names to imply that either one copies the data. By default COW should not happen. Believe me, look at .NET where everything is COW. New memory allocations all over the place. IMHO .dup is there for a reason, and nothing is preventing you from doing:

foo.dup.reverse

If somebody else comes along, they will know you are copying the array. It's only 4 more characters of typing. Plus no confusion as to what does COW and what doesn't. I can copy the thing first with .dup if I want. This isn't C where it's 5 lines of code every time you need to copy an array!
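
A small sketch of the aliasing and the explicit-copy idiom discussed here, assuming a present-day D compiler. Two things differ from the 2005 snippets and are assumptions of this sketch: string literals are immutable now, so the literal is copied with .dup first, and the in-place reversal is done with reverse from std.algorithm.mutation rather than the built-in .reverse property used in the thread. The helper name `reversedCopy` is made up.

```d
import std.algorithm.mutation : reverse;

/// Hypothetical helper: the "foo.dup.reverse" idea spelled out.
/// The copy is explicit, so the caller's data is never touched.
char[] reversedCopy(char[] s)
{
    char[] c = s.dup;   // explicit copy first...
    reverse(c);         // ...then reverse the copy in place
    return c;
}

void main()
{
    char[] a = "123".dup;   // mutable copy of the literal
    char[] b = a;           // b aliases a's characters
    reverse(b);             // in place, no copy
    assert(a == "321");     // so a sees the change too

    char[] c = reversedCopy(a);
    assert(c == "123");
    assert(a == "321");     // a is unaffected this time
}
```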

-Sha


August 01, 2005
"Shammah Chancellor" <Shammah_member@pathlink.com> wrote in message news:dcleqr$2ti5$1@digitaldaemon.com...
> In article <dclba9$2pif$1@digitaldaemon.com>, Ben Hinkle says...
>>
>>
>>"Derek Parnell" <derek@psych.ward> wrote in message news:a118xxgyuee7.t1828b9vk5du$.dlg@40tude.net...
>>> On Sat, 30 Jul 2005 22:12:31 +0000 (UTC), AJG wrote:
>>>
>>>
>>> [snip]
>>>> What is bizarre is the current array semantics, be it due to "close to
>>>> the
>>>> metal" requirements, or whatever. If you don't think arrays at the
>>>> moment
>>>> follow
>>>> at least _partial_ reference semantics, then why does:
>>>>
>>>> # char[] A = "123"; // Yes, it's static, bear with me.
>>>> # char[] B = A;
>>>> # B.reverse;
>>>>
>>>> Reverse _also_ the contents of A?
>>>
>>> There might have been an argument that .reverse and .sort should
>>> follow
>>> Walter's Copy-on-Write rules of engagement, but the current behavior is
>>> documented and relied upon in current code.
>>
>>Besides those reasons writing "B.reverse" to me indicates you want to
>>affect
>>B hence no COW while "reverse(B)" says you want a reversed B hence COW.
>>That's one reason why I don't really like the current syntax hack of being
>>able to write B.tolower() to mean tolower(B).
>
> Utterly confusing!  reverse(b) and B.reverse have nothing in their name to
> imply
> that either one copies the data.  By default COW should not happen.
> Believe me,
> look at .NET where everything is COW.  New memory allocations all over the
> place.  IMHO .dup is there for a reason, and nothing is preventing you
> from
> doing:
>
> foo.dup.reverse
>
> If somebody else comes along, they will know you are copying the array.
> It's
> only 4 more characters of typing.  Plus no confusion as to what does cow
> and
> what doesn't.  I can copy the thing first with .dup if I want.  This isn't
> C
> where it's 5 lines of code every time you need to copy an array!
>
> -Sha

You've lost me. Are you proposing a change to any existing behavior or coding practice (ie COW)?


August 01, 2005
In article <dclfvs$2usj$1@digitaldaemon.com>, Ben Hinkle says...
>
>
>"Shammah Chancellor" <Shammah_member@pathlink.com> wrote in message news:dcleqr$2ti5$1@digitaldaemon.com...
>> In article <dclba9$2pif$1@digitaldaemon.com>, Ben Hinkle says...
>>>
>>>
>>>"Derek Parnell" <derek@psych.ward> wrote in message news:a118xxgyuee7.t1828b9vk5du$.dlg@40tude.net...
>>> [snip]
>>>Besides those reasons writing "B.reverse" to me indicates you want to
>>>affect
>>>B hence no COW while "reverse(B)" says you want a reversed B hence COW.
>>>That's one reason why I don't really like the current syntax hack of being
>>>able to write B.tolower() to mean tolower(B).
>>
>> Utterly confusing!  reverse(b) and B.reverse have nothing in their name to
>> imply
>> that either one copies the data.  By default COW should not happen.
>> Believe me,
>> look at .NET where everything is COW.  New memory allocations all over the
>> place.  IMHO .dup is there for a reason, and nothing is preventing you
>> from
>> doing:
>>
>> foo.dup.reverse
>>
>> If somebody else comes along, they will know you are copying the array.
>> It's
>> only 4 more characters of typing.  Plus no confusion as to what does cow
>> and
>> what doesn't.  I can copy the thing first with .dup if I want.  This isn't
>> C
>> where it's 5 lines of code every time you need to copy an array!
>>
>> -Sha
>
>You've lost me. Are you proposing a change to any existing behavior or coding practice (ie COW)?

I wasn't proposing a change at all. I was disagreeing with Derek. I think COW is a bad thing for API functions to be doing mysteriously. It leads to crap like this:

foo = foo.Replace("Hello","");
dateFoo = dateFoo.AddDays(1);

If I want a duplicate something, in D, it's as easy as saying:
# foo2 = foo.dup.replace("Hello","");
(Not that replace is a valid property for char[]s, but you get my gist)

This leads to effective memory use, and no confusion about:

reverse(b), or b.reverse

Which one does c-o-w? The name certainly doesn't say. Maybe by somebody's reasoning it might make sense that one does COW and one doesn't, but certainly not by mine, from the information given.

Also, you might say for consistency, always use COW. But COW is not always what you want. Since there's no way to manually un-cowify it, it would make logical sense to NEVER do COW, and let the programmer call dup first.

-Sha




August 01, 2005
Hi,

>If I want a duplicate something, in D, it's as easy as saying:
># foo2 = foo.dup.replace("Hello","");
>(Not that replace is a valid property for char[]s, but you get my gist)

Exactly.

>This leads to effective memory use, and no confusion about:
>reverse(b), or b.reverse
>
>Which one does c-o-w?  The name certainly doesn't say, maybe by somebody's reasoning it might make sense that one does cow and one doesn't.  But certainly not mine, from the information given.

IMHO, and for consistency, it should never do COW. If a user wants to do COW, let the user do it. That's exactly what I mean by reference semantics, so it seems we are in agreement here.

>Also, you might say for consistency, always use cow.  But cow is not always what you want. Since there's no way to manually un-cowify it, it would make logical sense to NEVER do cow, and let the programmer call dup first.

Interestingly enough (and one of my points), .length does COW about half of the time, and there's no way to un-cowify it.

That's a great word, btw, un-cowify. It had me chuckling.

Cheers,
--AJG.


August 01, 2005
"Shammah Chancellor" <Shammah_member@pathlink.com> wrote in message news:dclk4p$1o0$1@digitaldaemon.com...
> In article <dclfvs$2usj$1@digitaldaemon.com>, Ben Hinkle says...
>>
>>
>>"Shammah Chancellor" <Shammah_member@pathlink.com> wrote in message news:dcleqr$2ti5$1@digitaldaemon.com...
>>> In article <dclba9$2pif$1@digitaldaemon.com>, Ben Hinkle says...
>>>>
>>>>
>>>>"Derek Parnell" <derek@psych.ward> wrote in message news:a118xxgyuee7.t1828b9vk5du$.dlg@40tude.net...
>>>> [snip]
>>>>Besides those reasons writing "B.reverse" to me indicates you want to
>>>>affect
>>>>B hence no COW while "reverse(B)" says you want a reversed B hence COW.
>>>>That's one reason why I don't really like the current syntax hack of
>>>>being
>>>>able to write B.tolower() to mean tolower(B).
>>>
>>> Utterly confusing!  reverse(b) and B.reverse have nothing in their name
>>> to
>>> imply
>>> that either one copies the data.  By default COW should not happen.
>>> Believe me,
>>> look at .NET where everything is COW.  New memory allocations all over
>>> the
>>> place.  IMHO .dup is there for a reason, and nothing is preventing you
>>> from
>>> doing:
>>>
>>> foo.dup.reverse
>>>
>>> If somebody else comes along, they will know you are copying the array.
>>> It's
>>> only 4 more characters of typing.  Plus no confusion as to what does cow
>>> and
>>> what doesn't.  I can copy the thing first with .dup if I want.  This
>>> isn't
>>> C
>>> where it's 5 lines of code every time you need to copy an array!
>>>
>>> -Sha
>>
>>You've lost me. Are you proposing a change to any existing behavior or
>>coding practice (ie COW)?
>
> I wasn't proposing a change at all.  I was disagreeing with Derek.  I think
> COW
> is a bad thing for API functions to be doing mysteriously.  It leads to
> crap
> like this:
>
> foo = foo.Replace("Hello","");
> dateFoo = dateFoo.AddDays(1);

I didn't read Derek's post as proposing reverse use COW. He was pointing out that it doesn't. It's too bad you see COW as mysterious.

> If I want a duplicate something, in D, it's as easy as saying:
> # foo2 = foo.dup.replace("Hello","");
> (Not that replace is a valid property for char[]s, but you get my gist)
>
> This leads to effective memory use, and no confusion about:
>
> reverse(b), or b.reverse
>
> Which one does c-o-w?  The name certainly doesn't say, maybe by somebody's
> reasoning it might make sense that one does cow and one doesn't.  But
> certainly
> not mine, from the information given.

The statement about effective memory use is only true when the operation is guaranteed to change the string. If foo in the example didn't contain any Hellos then the dup would be wasteful. Plus, I'm surprised you don't see any difference between reverse(b) and b.reverse, since it's common in OOP to interpret b.foo as acting on b while foo(b) is just some function of b.

> Also, you might say for consistency, always use cow.  But cow is not
> always what
> you want. Since there's no way to manually un-cowify it,  It would make
> logical
> sense to NEVER do cow, and let the programmer call dup first.

That would be a big change in D style since many times you do not know if a dup will be needed or not (eg most of the functions in std.string might just return the original string).
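
To make the copy-on-write style concrete, here is a minimal sketch assuming a present-day D compiler. The function `toLowerCow` is a made-up, ASCII-only helper written for this illustration, not a std.string function: it returns the original array untouched when nothing needs changing and duplicates only once it finds a character it has to modify.

```d
/// Hypothetical ASCII-only helper in the copy-on-write style.
/// Returns s itself if it is already lowercase; otherwise returns
/// a modified copy and leaves s alone.
char[] toLowerCow(char[] s)
{
    foreach (i, c; s)
    {
        if (c >= 'A' && c <= 'Z')
        {
            // First character that needs changing: copy now, not earlier.
            char[] r = s.dup;
            foreach (ref ch; r[i .. $])
            {
                if (ch >= 'A' && ch <= 'Z')
                    ch = cast(char)(ch + ('a' - 'A'));
            }
            return r;
        }
    }
    return s;   // nothing to change, so no allocation at all
}

void main()
{
    char[] plain = "abc".dup;
    assert(toLowerCow(plain) is plain);   // same array handed back

    char[] mixed = "aBC".dup;
    char[] lowered = toLowerCow(mixed);
    assert(lowered == "abc");
    assert(mixed == "aBC");               // caller's data is unchanged
    assert(lowered.ptr !is mixed.ptr);    // a copy was made
}
```

The caller cannot tell in advance whether a copy will be made, which is why a dup up front would sometimes be wasted work.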


August 01, 2005
On Mon, 1 Aug 2005 16:54:49 +0000 (UTC), Shammah Chancellor wrote:


>>>>"Derek Parnell" <derek@psych.ward> wrote in message news:a118xxgyuee7.t1828b9vk5du$.dlg@40tude.net...
>>>> [snip]
>>>>Besides those reasons writing "B.reverse" to me indicates you want to
>>>>affect
>>>>B hence no COW while "reverse(B)" says you want a reversed B hence COW.
>>>>That's one reason why I don't really like the current syntax hack of being
>>>>able to write B.tolower() to mean tolower(B).

> I was disagreeing with Derek.  I think COW
> is a bad thing for API functions to be doing mysteriously.  It leads to crap
> like this:
> 
> foo = foo.Replace("Hello","");
> dateFoo = dateFoo.AddDays(1);

Hi Shammah,
I wasn't actually saying that .reverse must use CoW. I was saying that it
didn't and that fact seems to go counter to Walter's general principle (as I
understand it) about when to use CoW or not. I thought that one should use
CoW if the code is actually changing the data *and* the data might be
accessible to the calling routine. Thus as the .reverse will change the
data for lengths > 1, and the data is probably accessible to the code using
.reverse, one could have expected it to CoW.

Of course, I might be misunderstanding that 'general principle' ;-)

As the current behaviour is documented, we can cope with this seeming exception.

-- 
Derek Parnell
Melbourne, Australia
2/08/2005 7:21:43 AM