December 04, 2007
"Oskar Linde" wrote
> mandel wrote:
>> Steven Schveighoffer wrote:
>> [..]
>>> Now I create the valid array slices:
>>>
>>> int[] array3 = array1[$..$];
>>> int[] array4 = array2[0..0];
>>>
>>> Note that both of these arrays are bit-for-bit identical (both have 0 length and the same ptr value).  Which one points to which piece of memory?  How is the GC to decide which memory gets collected?
>> I see the problem.
>> The first possible solution that comes to my mind seeing this is to
>> make array1[0..0] and array1[$..$] equal.
>> array1[$..$] could point to the begin of the array.
>> Since the slice length is null, it shouldn't matter - would it?
>
> Appending to a (empty or not) array slice starting at the start of an allocated block appends in-place rather than allocate a new array. This is the reason
>
> while(x)
>   a ~= b;
>
> can be reasonably efficient.

Hm... I think you are slightly incorrect.  I think the array is appended to in place ONLY if the data after the slice is unallocated.  In this case, it would be allocated, so the array would be re-allocated elsewhere.

> So appending to the [$..$] array would (without padding) mean that you corrupt the following array.

I think this is incorrect for the reasons I stated above.  An allocated block should never be re-assigned to another array.

Maybe I am wrong, but I think mandel might have a possible solution to this problem.  If you slice an empty array (or even allocate an empty array), set the pointer to null.  No reason to allocate an empty array, and no reason you need to keep memory around for it.  If you append to it, it's going to be like appending to an init array anyways.  That would also make null comparisons more consistent like:

int[] array1 = array2[0..0];

if(array1 is null) // evaluates to true!
...

-Steve


December 04, 2007
Steven Schveighoffer wrote:
> "Oskar Linde" wrote
>> mandel wrote:
>>> Steven Schveighoffer wrote:
>>> [..]
>>>> Now I create the valid array slices:
>>>>
>>>> int[] array3 = array1[$..$];
>>>> int[] array4 = array2[0..0];
>>>>
>>>> Note that both of these arrays are bit-for-bit identical (both have 0
>>>> length and the same ptr value).  Which one points to which piece of
>>>> memory?  How is the GC to decide which memory gets collected?
>>> I see the problem.
>>> The first possible solution that comes to my mind seeing this is to
>>> make array1[0..0] and array1[$..$] equal.
>>> array1[$..$] could point to the begin of the array.
>>> Since the slice length is null, it shouldn't matter - would it?
>> Appending to a (empty or not) array slice starting at the start of an allocated block appends in-place rather than allocate a new array. This is the reason
>>
>> while(x)
>>   a ~= b;
>>
>> can be reasonably efficient.
> 
> Hm... I think you are slightly incorrect.  I think the array is appended to in place ONLY if the data after the slice is unallocated.  In this case, it would be allocated, so the array would be re-allocated elsewhere.

Try this:

        char[] ab = "ab".dup;
        char[] a = ab[0..1];
        a ~= "c";
        writefln("ab = ",ab);

>> So appending to the [$..$] array would (without padding) mean that you corrupt the following array.
> 
> I think this is incorrect for the reasons I stated above.  An allocated block should never be re-assigned to another array.

See my example above. The allocated block is deduced from the slice .ptr. If the pointer points at the start of another array, DMD would have no way of knowing it isn't a slice of that other array.

> Maybe I am wrong, but I think mandel might have a possible solution to this problem.  If you slice an empty array (or even allocate an empty array), set the pointer to null.  No reason to allocate an empty array, and no reason you need to keep memory around for it.  If you append to it, it's going to be like appending to an init array anyways.  That would also make null comparisons more consistent like:
> 
> int[] array1 = array2[0..0];
> 
> if(array1 is null) // evaluates to true!
> ...

There are several cases where it is useful to retain the pointer when the array is of zero length, for example a zero length regexp sub-expression match.

It used to be the case that setting a slice length to 0 (via the length property) made the pointer null as well. That changed a while ago.

-- 
Oskar
December 04, 2007
"Oskar Linde" wrote
> Try this:
>
>         char[] ab = "ab".dup;
>         char[] a = ab[0..1];
>         a ~= "c";
>         writefln("ab = ",ab);

Outputs "ac"

So this appears to be a bug then.  Because from the spec:

"Concatenation always creates a copy of its operands, even if one of the operands is a 0 length array"

and then for the append operator example:

"a ~= b;		// a becomes the concatenation of a and b"

Changing the line to a = a ~ "c" changes the output of the program to "ab".

Another reason why this is seems to be a bug and NOT a feature:

        string ab = "ab".idup;
        string a = ab[0..1];
        a ~= "c";
        writefln("ab = ",ab); // also outputs "ac"

This changes an invariant string without compiler complaint!

Note that Tango has this problem too.

>
>>> So appending to the [$..$] array would (without padding) mean that you corrupt the following array.
>>
>> I think this is incorrect for the reasons I stated above.  An allocated block should never be re-assigned to another array.
>
> See my example above. The allocated block is deduced from the slice .ptr. If the pointer points at the start of another array, DMD would have no way of knowing it isn't a slice of that other array.

I think your example exposes a bug, and does not agree with what the spec says.

There seems to be a silent agreement among everyone that D should behave that way, but I can't find anything in the spec that states it should.  Is this something that is planned to be fixed or at least described correctly in the spec?

If someone desires this behavior, I would say that it's possible to keep a reference to the entire array and use the copy operator.  i.e.:

ab[1..$] = "c";

perhaps there could be another way to extend the slice if more buffer space exists?  With the caveat that you know that if you have other references to that data, they could be changed too?  I can see the usefulness of using an array as a buffer which keeps its allocated space as it shrinks, but this is not worth having x ~= y not mean the same thing as x = x ~ y.  The current meaning is too error prone in my opinion.

-Steve


December 04, 2007
Steven Schveighoffer wrote:
> "Oskar Linde" wrote
>> Try this:
>>
>>         char[] ab = "ab".dup;
>>         char[] a = ab[0..1];
>>         a ~= "c";
>>         writefln("ab = ",ab);
> 
> Outputs "ac"
> 
> So this appears to be a bug then.  Because from the spec:
> 
> "Concatenation always creates a copy of its operands, even if one of the operands is a 0 length array"

Then the spec is wrong.  The current behavior is very deliberate, insofar as the code is concerned.  Look at internal/gc/gc.d.  It is also deliberate for interior slices to always reallocate on an append.  But the runtime has no way to know whether something pointing to the head of a block is a slice or is the original array.  I've never actually found this to be a problem in practice, and I'll admit to having used the slice terminology from time to time, because it's more succinct than resizing using the length property.

>>>> So appending to the [$..$] array would (without padding) mean that you corrupt the following array.
>>> I think this is incorrect for the reasons I stated above.  An allocated block should never be re-assigned to another array.
>> See my example above. The allocated block is deduced from the slice .ptr. If the pointer points at the start of another array, DMD would have no way of knowing it isn't a slice of that other array.
> 
> I think your example exposes a bug, and does not agree with what the spec says.

The spec also says that D 1.0 has inheritable contracts, and maybe we will one day, but it's not even on the radar at the moment.  For better or worse, I've learned not to put much stock in what the spec says about some things.

> There seems to be a silent agreement among everyone that D should behave that way, but I can't find anything in the spec that states it should.  Is this something that is planned to be fixed or at least described correctly in the spec?
> 
> If someone desires this behavior, I would say that it's possible to keep a reference to the entire array and use the copy operator.  i.e.:
> 
> ab[1..$] = "c";
> 
> perhaps there could be another way to extend the slice if more buffer space exists?

It would be easy to allow all slices to be extended in place, even the interior ones.  But going the other direction would be difficult.  The proposed T[new] syntax might help in that direction, but I hate it.


Sean
December 04, 2007
"Steven Schveighoffer" wrote
> Another reason why this is seems to be a bug and NOT a feature:
>
>        string ab = "ab".idup;
>        string a = ab[0..1];
>        a ~= "c";
>        writefln("ab = ",ab); // also outputs "ac"
>
> This changes an invariant string without compiler complaint!

more bugs :)

import std.stdio;

struct X
{
        char[5] myArray;
        int x;
}

void main()
{
        X[] x = new X[2];
        x[0].myArray[] = "hello";
        char[] myslice = x[0].myArray[0..3];
        writefln("%x %x %x", &x[0].x, &x[0].myArray[0], &myslice[0]);
        myslice ~= "hithere";
        writefln("%x %x %x", &x[0].x, &x[0].myArray[0], &myslice[0]);
        writefln("%s %d", x[0].myArray, x[0].x);
}

output:

868FE8 868FE0 868FE0
868FE8 868FE0 868FE0
helhi 25970

-Steve


December 04, 2007
Steven Schveighoffer wrote:
> "Steven Schveighoffer" wrote
>> Another reason why this is seems to be a bug and NOT a feature:
>>
>>        string ab = "ab".idup;
>>        string a = ab[0..1];
>>        a ~= "c";
>>        writefln("ab = ",ab); // also outputs "ac"
>>
>> This changes an invariant string without compiler complaint!
> 
> more bugs :)

This is expected behavior.


Sean
December 05, 2007
"Sean Kelly" wrote
> Steven Schveighoffer wrote:
>> "Steven Schveighoffer" wrote
>>> Another reason why this is seems to be a bug and NOT a feature:
>>>
>>>        string ab = "ab".idup;
>>>        string a = ab[0..1];
>>>        a ~= "c";
>>>        writefln("ab = ",ab); // also outputs "ac"
>>>
>>> This changes an invariant string without compiler complaint!
>>
>> more bugs :)
>
> This is expected behavior.

Behavior by design, perhaps.  Expected, I should hope not.  I would never expect to be able to have one variable overwrite another without obvious casting.  And why should it be 'expected behavior' for the GC to assume that because an array is at the beginning of a memory block, it is free to use any memory in that block?  I think I've proven that there are cases where it should not assume that.

I'm not saying there is a bug in the compiler implementation, or that the docs need to be changed to reflect the compiler behavior.  I'm saying the design here is flat out wrong, and needs to be reflected in the compiler. My recommendation would be to make the ~= behave exactly as the spec says, that it always makes a copy of it's arguments.  If you need buffer-like behavior for performance, write a new type.

Isn't one of Walter's goal to prevent silent runtime errors?  IMO, memory corruption errors are the worst kind of silent errors.

-Steve


December 05, 2007
Steven Schveighoffer wrote:
> "Sean Kelly" wrote
>> Steven Schveighoffer wrote:
>>> "Steven Schveighoffer" wrote
>>>> Another reason why this is seems to be a bug and NOT a feature:
>>>>
>>>>        string ab = "ab".idup;
>>>>        string a = ab[0..1];
>>>>        a ~= "c";
>>>>        writefln("ab = ",ab); // also outputs "ac"
>>>>
>>>> This changes an invariant string without compiler complaint!
>>> more bugs :)
>> This is expected behavior.

In this post I'm commenting on the example shown above, not the 2nd one (which to be honest is much more worrying).  I am a bit confused as to which example Sean was saying was "expected behaviour".

> Behavior by design, perhaps.  Expected, I should hope not.  I would never expect to be able to have one variable overwrite another without obvious casting.  

Both variables above are references to the same data.  You're using one variable to change that data, therefore the other variable which still refers to the same data, sees the changes.

If the concatenation operation had to reallocate the memory it would produce a copy, and you wouldn't see the changes.

So, this behaviour is non deterministic, however...

> And why should it be 'expected behavior' for the GC to assume that
> because an array is at the beginning of a memory block, it is free to use any memory in that block?  I think I've proven that there are cases where it should not assume that.

The assumption fits with D's (semi-)official "copy on write" policy.  If you want to write to memory and you cannot be sure you are the only reference then you should copy the data before writing.

Following this guideline makes the behaviour deterministic, and...

> I'm not saying there is a bug in the compiler implementation, or that the docs need to be changed to reflect the compiler behavior.  I'm saying the design here is flat out wrong, and needs to be reflected in the compiler. My recommendation would be to make the ~= behave exactly as the spec says, that it always makes a copy of it's arguments.  If you need buffer-like behavior for performance, write a new type.

The current behaviour allows you to skip the copy step if you _know_ you hold the only reference to the data, it's putting the choice/power in the programmers hands.  As always, power can be a dangerous thing if missused :)

> Isn't one of Walter's goal to prevent silent runtime errors?  IMO, memory corruption errors are the worst kind of silent errors.

The example shown above is not corrupting any memory.  The 2nd one (not shown above) seems to be and it worries me much more.

Regan
December 05, 2007
Steven Schveighoffer wrote:
> import std.stdio;
> 
> struct X
> {
>         char[5] myArray;
>         int x;
> }
> 
> void main()
> {
>         X[] x = new X[2];
>         x[0].myArray[] = "hello";
>         char[] myslice = x[0].myArray[0..3];
>         writefln("%x %x %x", &x[0].x, &x[0].myArray[0], &myslice[0]);
>         myslice ~= "hithere";
>         writefln("%x %x %x", &x[0].x, &x[0].myArray[0], &myslice[0]);
>         writefln("%s %d", x[0].myArray, x[0].x);
> }
> 
> output:
> 
> 868FE8 868FE0 868FE0
> 868FE8 868FE0 868FE0
> helhi 25970

This one worries me.

I believe the problem is caused by the memory address of myArray[0] being the same as the memory address of the struct.  Is this what you realised Sean... I may be a bit slow on the uptake here :)

When the slice needs to reallocate the GC checks this address and finds enough space following the struct (or perhaps it has allocated on a power of two boundary and already has enough) and it allows the concatenation to write to that memory.

The problem is that it doesn't realise the memory was allocated to a struct, and is being reallocated by an array slice.  So, the array concatenation overwrites the memory occupied by the int 'x'.

Ick.

I would have expected a static array to be un-reallocatable, so any concatenation performed on a slice of one to cause a copy to be made. But of course all that information is lost at the place where the reallocation is done, it's simply a memory address with a certain amount of memory associated with it.

R
December 05, 2007
"Regan Heath" wrote
> Steven Schveighoffer wrote:
>> "Sean Kelly" wrote
>>> Steven Schveighoffer wrote:
>>>> "Steven Schveighoffer" wrote
>>>>> Another reason why this is seems to be a bug and NOT a feature:
>>>>>
>>>>>        string ab = "ab".idup;
>>>>>        string a = ab[0..1];
>>>>>        a ~= "c";
>>>>>        writefln("ab = ",ab); // also outputs "ac"
>>>>>
>>>>> This changes an invariant string without compiler complaint!
>>>> more bugs :)
>>> This is expected behavior.
>
> In this post I'm commenting on the example shown above, not the 2nd one (which to be honest is much more worrying).  I am a bit confused as to which example Sean was saying was "expected behaviour".
>
>> Behavior by design, perhaps.  Expected, I should hope not.  I would never expect to be able to have one variable overwrite another without obvious casting.
>
> Both variables above are references to the same data.  You're using one variable to change that data, therefore the other variable which still refers to the same data, sees the changes.
>
> If the concatenation operation had to reallocate the memory it would produce a copy, and you wouldn't see the changes.
>
> So, this behaviour is non deterministic, however...

The problem is that invariant data is changing.  This is a no-no for pure functions which Walter has planned.  If invariant data can change without violating the rules of the spec, then the compiler implementation or design is flawed.  I think the design is what is flawed.

I have several problems with this concat operator issue.

First, that x ~= y does not effect the same behavior as x = x ~ y.  This is a fundamental flaw in the language in my opinion.  any operator of the op= form is supposed to mean the same as x = x op y.  This is consistent throughout all of D, except in this case.

Second, there is the issue of the spec.  The spec clearly states that concatenation should result in a copy of both sides.  Obviously, this isn't true in all cases.  The spec should be changed for both D 1.x and 2.x IMMEDIATELY to prevent unsuspecting coders from using ~= when what they really want is just ~.

Third, I have not seen this T[new] operator described anywhere, but I am concerned that D 1.0 will not be updated.  This leaves all coders who are not ready to switch to D 2 at risk.  But from the inferred behavior of T[new], I'm expecting that this will probably fix the problem.

-Steve