Possible change to array runtime?

Index » General » Possible change to array runtime?

Possible change to array runtime?
Mar 13, 2014 Steven Schveighoffer
Mar 13, 2014 monarch_dodra
Mar 13, 2014 Steven Schveighoffer
Mar 13, 2014 monarch_dodra
Mar 13, 2014 Steven Schveighoffer
Mar 13, 2014 monarch_dodra
Mar 13, 2014 Steven Schveighoffer
Mar 13, 2014 monarch_dodra
Mar 13, 2014 Steven Schveighoffer
Mar 16, 2014 Joseph Rushton Wakeling
Mar 20, 2014 Jonathan M Davis
Mar 13, 2014 Andrei Alexandrescu
Mar 13, 2014 Steven Schveighoffer
Mar 13, 2014 Andrei Alexandrescu
Mar 14, 2014 Dicebot
Mar 14, 2014 Steven Schveighoffer
Mar 14, 2014 Andrej Mitrovic
Mar 20, 2014 Jonathan M Davis
Mar 13, 2014 deadalnix
Mar 13, 2014 sclytrack
Mar 14, 2014 deadalnix
Mar 17, 2014 sclytrack
Mar 15, 2014 sclytrack
Mar 14, 2014 Steven Schveighoffer
Mar 14, 2014 Don
Mar 14, 2014 Andrei Alexandrescu
Mar 14, 2014 deadalnix
Mar 14, 2014 Steven Schveighoffer
Mar 16, 2014 Timon Gehr

March 13, 2014

Posted by Steven Schveighoffer

Permalink

Steven Schveighoffer

Permalink

Buried inside another large thread, Don and Dicebot complained about D2's prevention of array stomping as being the biggest breaking change for D2, not that it breaks code, but silently destroys performance.

A little background for people who aren't aware:

1. D1 took the position that if an array slice started at the beginning of a block, and you appended to that slice, it would extend in-block no matter what else was referencing that data.

In other words:

char[] arr = "abc"; // this was ok in D1, which did not have the immutable keyword
char[] arr2 = arr[0..1];
arr2 ~= "de";

assert(arr == "ade"); // stomped!

This was a "feature", not a bug, because the idea is, once you have built a large or appropriately sized buffer, you may want to forget what it had, and re-use it by doing:

arr.length = 0;

2. D2 added a mechanism to prevent stomping, especially in the case of immutable, by storing the "used" length of a block inside the block itself.

This was done by yours truly, and at the same time, I also improved performance of array appending by exploiting the fact that D2 has thread-local storage by default to avoid a global GC lock.

However, any code that is compiled from D1, the following line still works fine:

arr.length = 0;

But has a vastly different meaning. Instead of re-using the block, it *abandons* the block, and allocates a new one. This leaves the original block as garbage, especially if nothing is left pointing at the original block.

3. Don's company uses D1 as its language, I highly recommend watching Don's Dconf13 presentation (and look forward to his Dconf14 one!) to see how effective D code can create unbelievable speed, especially where array slices are concerned. But to the above line, in D2, they must add the following code to get the same behavior:

arr.assumeSafeAppend();

This is an "unsafe" operation which says, "I don't care that there is still data in there, that other slices may refer to, just throw it away."

And in fact, I believe the performance would improve over D1 if they did that. But at the moment, just "switching to D2" will cause unacceptable performance loss.

However, there may be a way to keep the original behavior in the right circumstances. It's a tad bit hacky, but makes a lot of sense. So here goes:

Proposal: iff you are setting the length of an array via the length property AND you are setting the length to 0 AND the array starts at the beginning of the block AND the array is mutable (not const or immutable), THEN assumeSafeAppend is called on that array AUTOMATICALLY.

rationale for this not affecting existing code:

If you mean to abandon an array, left to other references, are you more likely to use arr = null, or arr.length = 0? The former disavows all claims to the array, allows it to be garbage collected if no other references exist, and is shorter to type. arr.length = 0 retains the pointer to the array, which keeps it in memory, but it has no access to the existing data. The only valid operation on that array at that point is to append, which means you are going to reallocate anyway, there was no reason to keep a pointer at the original array.

I think in vast majority of cases, where one intends the behavior to reallocate, they would use arr = null. In the other cases, I would argue, they don't know what they are doing, they incorrectly think it's going to take over the array again. In that case, they likely don't care if the existing data gets stomped.

It does not disallow the original operation, you can do the same thing with arr = arr[0..0];

The reason we disallow it on const or immutable is simply to prevent implicit overwriting of immutable data. In cases where you have a buffer you are building, and especially in cases of D1 code porting to D2, const and immutable would not exist.

I think this would "fix" the problem Sociomantic has, and not break any existing code that wasn't already broken.

The single largest drawback that I can think of is that the behavior would be different in a very small number of cases. Are those cases valid? Is this enough to reject the proposal?

Destroy.

-Steve

March 13, 2014

Re: Possible change to array runtime?

Posted by monarch_dodra
in reply to Steven Schveighoffer

Permalink

monarch_dodra

Posted in reply to Steven Schveighoffer

Permalink

On Thursday, 13 March 2014 at 15:24:01 UTC, Steven Schveighoffer wrote:
>
> Destroy.
>
> -Steve

Please keep in mind that if the objects stored are RAII, then if/when we will have a finalizing GC, the stomped elements will have been leaked.

Clobbering elements is more than just "I won't use these elements anymore", it's "I won't use them, and they are safe to be discarded of right now".

In know that's a big "if", but it could happen. If we go the way of your proposal, we are definitively closing that door.

--------

Semi-on topic, it would greatly help if assumeSafeAppend was nothrow. As of right now, there are a lot of places in phobos where it could be used, but isn't, because of this. In particular, assumeSafeAppend can be used to both shrink an array, *or* grow an array up-to capacity.

Its greater use in phobos would help give it more visibility and exposure.

March 13, 2014

Re: Possible change to array runtime?

Posted by Steven Schveighoffer
in reply to monarch_dodra

Permalink

Steven Schveighoffer

Posted in reply to monarch_dodra

Permalink

On Thu, 13 Mar 2014 11:53:15 -0400, monarch_dodra <monarchdodra@gmail.com> wrote:

> On Thursday, 13 March 2014 at 15:24:01 UTC, Steven Schveighoffer wrote:
>>
>> Destroy.
>>
>> -Steve
>
> Please keep in mind that if the objects stored are RAII, then if/when we will have a finalizing GC, the stomped elements will have been leaked.
>
> Clobbering elements is more than just "I won't use these elements anymore", it's "I won't use them, and they are safe to be discarded of right now".
>
> In know that's a big "if", but it could happen. If we go the way of your proposal, we are definitively closing that door.

I'm not understanding this. Can you explain further/give example?

> Semi-on topic, it would greatly help if assumeSafeAppend was nothrow. As of right now, there are a lot of places in phobos where it could be used, but isn't, because of this. In particular, assumeSafeAppend can be used to both shrink an array, *or* grow an array up-to capacity.
>
> Its greater use in phobos would help give it more visibility and exposure.

https://github.com/D-Programming-Language/druntime/pull/147

reserve and capacity were made nothrow, not sure why assumeSafeAppend shouldn't also be.

-Steve

March 13, 2014

Re: Possible change to array runtime?

Posted by Andrei Alexandrescu
in reply to Steven Schveighoffer

Permalink

Andrei Alexandrescu

Posted in reply to Steven Schveighoffer

Permalink

On 3/13/14, 8:24 AM, Steven Schveighoffer wrote:
> Proposal: iff you are setting the length of an array via the length
> property AND you are setting the length to 0 AND the array starts at the
> beginning of the block AND the array is mutable (not const or
> immutable), THEN assumeSafeAppend is called on that array AUTOMATICALLY.

I think this is a sensible proposal, and well argued. However it may break code (even code that arguably wasn't really nicely thought through), and in subtle ways. "We upgraded the compiler, and our 50KLOC app now produces invalid results." Very hard to get working on that, in particular since it's unclear whether this particular matter was the actual culprit.

I think there's a much simpler approach that could help. Define and document a function clearForAppend that resets the length to 0 and puts the array in gear for appending. Maybe an optional parameter indicates a minimum desired capacity.

Then people who want to migrate from D1 to D2 or generally consider using this idiom can grep their source code for "length *= *0" and make the change on their own pace and with appropriate testing.

Andrei

March 13, 2014

Re: Possible change to array runtime?

Posted by Steven Schveighoffer
in reply to Andrei Alexandrescu

Permalink

Steven Schveighoffer

Posted in reply to Andrei Alexandrescu

Permalink

On Thu, 13 Mar 2014 12:29:59 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> On 3/13/14, 8:24 AM, Steven Schveighoffer wrote:
>> Proposal: iff you are setting the length of an array via the length
>> property AND you are setting the length to 0 AND the array starts at the
>> beginning of the block AND the array is mutable (not const or
>> immutable), THEN assumeSafeAppend is called on that array AUTOMATICALLY.
>
> I think this is a sensible proposal, and well argued. However it may break code (even code that arguably wasn't really nicely thought through), and in subtle ways. "We upgraded the compiler, and our 50KLOC app now produces invalid results." Very hard to get working on that, in particular since it's unclear whether this particular matter was the actual culprit.

A good point that the error that would occur, if you depended on setting length = 0 to not overwrite your mutable array, would be corruption, and that is not a good result. Even if your code was foolish to begin with.

As a mechanism to check this, we could throw an exception when appending to an array that has been set to length = 0, but not had assumeSafeAppend called on it. This would obviously just be a bug-finding mechanism, and not for production.

>
> I think there's a much simpler approach that could help. Define and document a function clearForAppend that resets the length to 0 and puts the array in gear for appending. Maybe an optional parameter indicates a minimum desired capacity.
>
> Then people who want to migrate from D1 to D2 or generally consider using this idiom can grep their source code for "length *= *0" and make the change on their own pace and with appropriate testing.

Arguably this is already possible within a project, such as Sociomantic's. They haven't done it yet. I don't know why.

-Steve

March 13, 2014

Re: Possible change to array runtime?

Posted by Andrei Alexandrescu
in reply to Steven Schveighoffer

Permalink

Andrei Alexandrescu

Posted in reply to Steven Schveighoffer

Permalink

On 3/13/14, 10:18 AM, Steven Schveighoffer wrote:
> On Thu, 13 Mar 2014 12:29:59 -0400, Andrei Alexandrescu
> <SeeWebsiteForEmail@erdani.org> wrote:
>> I think there's a much simpler approach that could help. Define and
>> document a function clearForAppend that resets the length to 0 and
>> puts the array in gear for appending. Maybe an optional parameter
>> indicates a minimum desired capacity.
>>
>> Then people who want to migrate from D1 to D2 or generally consider
>> using this idiom can grep their source code for "length *= *0" and
>> make the change on their own pace and with appropriate testing.
>
> Arguably this is already possible within a project, such as
> Sociomantic's. They haven't done it yet. I don't know why.

Don?

Andrei

March 13, 2014

Re: Possible change to array runtime?

Posted by monarch_dodra
in reply to Steven Schveighoffer

Permalink

monarch_dodra

Posted in reply to Steven Schveighoffer

Permalink

On Thursday, 13 March 2014 at 16:17:17 UTC, Steven Schveighoffer wrote:
> On Thu, 13 Mar 2014 11:53:15 -0400, monarch_dodra <monarchdodra@gmail.com> wrote:
>> Please keep in mind that if the objects stored are RAII, then if/when we will have a finalizing GC, the stomped elements will have been leaked.
>>
>> Clobbering elements is more than just "I won't use these elements anymore", it's "I won't use them, and they are safe to be discarded of right now".
>>
>> In know that's a big "if", but it could happen. If we go the way of your proposal, we are definitively closing that door.
>
> I'm not understanding this. Can you explain further/give example?

Well, image "File": It's a struct that owns a "file handle", and when the struct is destroyed, it releases/closes the file. Basic RAII.

Now, imagine we want to open several files. We'd use:
File[] files;

"As of today", this does not end well, as the GC does not finalize the array elements, and the file handles are leaked. We could hope, that one day, the GC *will* finalize the array elements.

However, with your proposal, imagine this code:

File[] files;
files ~= File("foo"); //Opens file "foo"
files.length = 0;
files ~= File("bar"); //Clobbers "foo"

With this setup, the File handling "foo" gets clobbered, ruining any chance of releasing it ever.

The "only way" to make it work (AFAIK), would be for "length = 0" to first finalize the elements in the array. However, you do that, you may accidentally destroy elements that are still "live" and referenced by another array.

The same example would work with RefCounted or whatever.

I'm not too hot about this proposal. My main gripe is that while "length = 0" *may* mean "*I* want to discard this data", there is no guarantee you don't have someone else that has a handle on said data, and sure as hell doesn't want it clobbered.

For what it's worth, I think the problem would go away all by itself if "assumeSafeAppend" had more exposition, and was actually used. I think a simpler solution would be to simply educate users of this function, and promote its use. Its simpler than adding a special case language change.

> https://github.com/D-Programming-Language/druntime/pull/147
>
> reserve and capacity were made nothrow, not sure why assumeSafeAppend shouldn't also be.
>
> -Steve

The irony is that reserve can't actually be tagged pure nor nothrow: Since it can cause relocation, it can call postblit, which in turn may actually be impure or throw.

assumeSafeAppend, on the other hand, is *certifiably* pure and nothrow, since it never actually touches any data.

I had opened a pull to fix this, but it was never merged, due to the back and forth "fiasco" of tagging said reserve. Since I'm not fluent in D-runtime, I just let the issue drop.

March 13, 2014

Re: Possible change to array runtime?

Posted by Steven Schveighoffer
in reply to monarch_dodra

Permalink

Steven Schveighoffer

Posted in reply to monarch_dodra

Permalink

On Thu, 13 Mar 2014 13:44:01 -0400, monarch_dodra <monarchdodra@gmail.com> wrote:

> On Thursday, 13 March 2014 at 16:17:17 UTC, Steven Schveighoffer wrote:
>> On Thu, 13 Mar 2014 11:53:15 -0400, monarch_dodra <monarchdodra@gmail.com> wrote:
>>> Please keep in mind that if the objects stored are RAII, then if/when we will have a finalizing GC, the stomped elements will have been leaked.
>>>
>>> Clobbering elements is more than just "I won't use these elements anymore", it's "I won't use them, and they are safe to be discarded of right now".
>>>
>>> In know that's a big "if", but it could happen. If we go the way of your proposal, we are definitively closing that door.
>>
>> I'm not understanding this. Can you explain further/give example?
>
> Well, image "File": It's a struct that owns a "file handle", and when the struct is destroyed, it releases/closes the file. Basic RAII.
>
> Now, imagine we want to open several files. We'd use:
> File[] files;
>
> "As of today", this does not end well, as the GC does not finalize the array elements, and the file handles are leaked. We could hope, that one day, the GC *will* finalize the array elements.
>
> However, with your proposal, imagine this code:
>
> File[] files;
> files ~= File("foo"); //Opens file "foo"
> files.length = 0;
> files ~= File("bar"); //Clobbers "foo"
>
> With this setup, the File handling "foo" gets clobbered, ruining any chance of releasing it ever.

Line 3 should be files = null. There is no point to setting length to 0, and mostly this is a relic from D1 code. That was my basis of why it shouldn't cause problems.

> The "only way" to make it work (AFAIK), would be for "length = 0" to first finalize the elements in the array. However, you do that, you may accidentally destroy elements that are still "live" and referenced by another array.

In fact, assumeSafeAppend *should* finalize the elements in the array, if it had that capability. When you call assumeSafeAppend, you are telling the runtime that you are done with the extra elements.

> I'm not too hot about this proposal. My main gripe is that while "length = 0" *may* mean "*I* want to discard this data", there is no guarantee you don't have someone else that has a handle on said data, and sure as hell doesn't want it clobbered.

Again, the =null is a better solution. There is no point to keeping the same block in reference, but without any access to elements, unless you want to overwrite it.

The problem is that we have this new mechanism that keeps those intact. If I could have imagined this outcome and was aware of this logic, I would have kept the length = 0 mechanics from D1 to begin with.

> For what it's worth, I think the problem would go away all by itself if "assumeSafeAppend" had more exposition, and was actually used. I think a simpler solution would be to simply educate users of this function, and promote its use. Its simpler than adding a special case language change.

The issue I'm examining is that people are reluctant to move off of D1, because their D1 code behaves well when they do length = 0, and it behaves badly in D2, even though it compiles and works correctly. They do not have assumeSafeAppend in D1. I'm unsure why they have not used it, either out of ignorance of the function or they have decided it's not worth the effort, I have no idea.

>> https://github.com/D-Programming-Language/druntime/pull/147
>>
>> reserve and capacity were made nothrow, not sure why assumeSafeAppend shouldn't also be.
>>
>
> The irony is that reserve can't actually be tagged pure nor nothrow: Since it can cause relocation, it can call postblit, which in turn may actually be impure or throw.
>
> assumeSafeAppend, on the other hand, is *certifiably* pure and nothrow, since it never actually touches any data.

It does touch data, but it should be pure and nothrow.

> I had opened a pull to fix this, but it was never merged, due to the back and forth "fiasco" of tagging said reserve. Since I'm not fluent in D-runtime, I just let the issue drop.

Which PR?

-Steve

March 13, 2014

Re: Possible change to array runtime?

Posted by monarch_dodra
in reply to Steven Schveighoffer

Permalink

monarch_dodra

Posted in reply to Steven Schveighoffer

Permalink

On Thursday, 13 March 2014 at 18:09:55 UTC, Steven Schveighoffer wrote:
>> I had opened a pull to fix this, but it was never merged, due to the back and forth "fiasco" of tagging said reserve. Since I'm not fluent in D-runtime, I just let the issue drop.
>
> Which PR?
>
> -Steve

This first one was:
https://github.com/D-Programming-Language/druntime/pull/553
It "simply" marked it as nothrow, pure, but was turned down, as it called "throwing" code.

The second on is:
https://github.com/D-Programming-Language/druntime/pull/632
It shuffles code around to "prove" the call is actually nothrow, but IMO, is not worth it.

I believe in the first pull more than the second, which I believe is more "proof". If you are willing to review, I can re-open the first.

March 13, 2014

Re: Possible change to array runtime?

Posted by monarch_dodra
in reply to Steven Schveighoffer

Permalink

monarch_dodra

Posted in reply to Steven Schveighoffer

Permalink

On Thursday, 13 March 2014 at 18:09:55 UTC, Steven Schveighoffer wrote:
> On Thu, 13 Mar 2014 13:44:01 -0400, monarch_dodra <monarchdodra@gmail.com> wrote:
>
>> On Thursday, 13 March 2014 at 16:17:17 UTC, Steven Schveighoffer wrote:
>>> On Thu, 13 Mar 2014 11:53:15 -0400, monarch_dodra <monarchdodra@gmail.com> wrote:
>>>> Please keep in mind that if the objects stored are RAII, then if/when we will have a finalizing GC, the stomped elements will have been leaked.
>>>>
>>>> Clobbering elements is more than just "I won't use these elements anymore", it's "I won't use them, and they are safe to be discarded of right now".
>>>>
>>>> In know that's a big "if", but it could happen. If we go the way of your proposal, we are definitively closing that door.
>>>
>>> I'm not understanding this. Can you explain further/give example?
>>
>> Well, image "File": It's a struct that owns a "file handle", and when the struct is destroyed, it releases/closes the file. Basic RAII.
>>
>> Now, imagine we want to open several files. We'd use:
>> File[] files;
>>
>> "As of today", this does not end well, as the GC does not finalize the array elements, and the file handles are leaked. We could hope, that one day, the GC *will* finalize the array elements.
>>
>> However, with your proposal, imagine this code:
>>
>> File[] files;
>> files ~= File("foo"); //Opens file "foo"
>> files.length = 0;
>> files ~= File("bar"); //Clobbers "foo"
>>
>> With this setup, the File handling "foo" gets clobbered, ruining any chance of releasing it ever.
>
> Line 3 should be files = null. There is no point to setting length to 0, and mostly this is a relic from D1 code. That was my basis of why it shouldn't cause problems.

well... "should"/"could". The problem (IMO) is taking perfectly valid code, and making a subtle and silent change to it, changing its behavior and potentially breaking it. It's really the most pernicious kind of change.

>> The "only way" to make it work (AFAIK), would be for "length = 0" to first finalize the elements in the array. However, you do that, you may accidentally destroy elements that are still "live" and referenced by another array.
>
> In fact, assumeSafeAppend *should* finalize the elements in the array, if it had that capability. When you call assumeSafeAppend, you are telling the runtime that you are done with the extra elements.

Good point. Very very good point. As a matter of fact, this could be implemented right now, couldn't it?

>> I'm not too hot about this proposal. My main gripe is that while "length = 0" *may* mean "*I* want to discard this data", there is no guarantee you don't have someone else that has a handle on said data, and sure as hell doesn't want it clobbered.
>
> Again, the =null is a better solution. There is no point to keeping the same block in reference, but without any access to elements, unless you want to overwrite it.
>
> The problem is that we have this new mechanism that keeps those intact. If I could have imagined this outcome and was aware of this logic, I would have kept the length = 0 mechanics from D1 to begin with.

I don't like the fact that this can only be implemented in the compiler. Because we *are* talking about "literal" 0, right? Run-time 0 wouldn't have this behavior?

By implementing this, then no user defined array wrapper would be able to emulate this " = 0" special behavior.

By that same token, I don't see why "0" would get such special treatment, when I'm certain you'd find just as many instance of "length = 1;", which means to say "keep the first element, and start clobering from there".

>> For what it's worth, I think the problem would go away all by itself if "assumeSafeAppend" had more exposition, and was actually used. I think a simpler solution would be to simply educate users of this function, and promote its use. Its simpler than adding a special case language change.
>
> The issue I'm examining is that people are reluctant to move off of D1, because their D1 code behaves well when they do length = 0, and it behaves badly in D2, even though it compiles and works correctly. They do not have assumeSafeAppend in D1. I'm unsure why they have not used it, either out of ignorance of the function or they have decided it's not worth the effort, I have no idea.

Well... "s/some people/sociomantic/". How hard would it be to add it to D1, to help the migration process? I know it's "closed", but there are always exceptions.

Honestly, I'm 100% fine with braking "compile-time" changes. Behavioral changes on the other hand...

>>> https://github.com/D-Programming-Language/druntime/pull/147
>>>
>>> reserve and capacity were made nothrow, not sure why assumeSafeAppend shouldn't also be.
>>>
>>
>> The irony is that reserve can't actually be tagged pure nor nothrow: Since it can cause relocation, it can call postblit, which in turn may actually be impure or throw.
>>
>> assumeSafeAppend, on the other hand, is *certifiably* pure and nothrow, since it never actually touches any data.
>
> It does touch data, but it should be pure and nothrow.

Right. *BUT*, it is not safe. This means that any code that uses "length = 0;" becomes unsafe.

At least I *think* it's unsafe, there is always a think line between "get my hands dirty under the hood" and "memory safety".

I'm pretty sure that with creative use of assumeSafeAppend, you could do illegal memory access.

Top | Forum index | About this forum

Forums