[phobos] slice vs. append

Andrei Alexandrescu
March 22, 2010
Right now array append is reasonably fast due to Steve's great work. Basically the append operation caches the capacities of the last arrays that were appended to, which makes the capacity query fast most of the time.
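For context, a minimal sketch of the user-visible side of this, assuming the capacity property that ships with this druntime work:

import std.stdio;

void main()
{
    int[] arr = [1, 2, 3];
    // capacity: how many elements fit before a reallocation (0 if none)
    writeln(arr.capacity);

    auto oldPtr = arr.ptr;
    arr ~= 4;  // appends in place while capacity allows
    writeln(arr.ptr is oldPtr ? "in place" : "reallocated");
}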

I was thinking of a possible change. How about having the slice operator arr[a .. b] remove the array from the cache? That way we can handle arr.length = n differently:

(a) if n > arr.length, resize appropriately
(b) if n < arr.length AND the array is in the cache, keep the array in the cache.

The change is at (b). If the array is in the cache and its length is made smaller, then we can be sure the capacity will stay correct after the resizing. This is because we know for sure there will be no stomping - if stomping were possible, the array would not be in the cache.
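To make the stomping concern concrete, a sketch of the guarantee the runtime preserves today (illustrative values):

void main()
{
    int[] arr  = [1, 2, 3, 4];
    int[] head = arr[0 .. 2];  // shares memory with arr

    head ~= 99;  // must not overwrite arr[2]; the runtime reallocates
    assert(arr == [1, 2, 3, 4]);  // unchanged
    assert(head == [1, 2, 99]);
}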

Could this work? And if it does, would it be a good change to make?


Andrei
David Simcha
March 22, 2010
I personally have written a lot of code that relies on the slice operator being extremely cheap, which is the whole point of the way D arrays are designed.  For example, using slicing and tail recursion instead of indices and looping is a very elegant, readable way of implementing binary search. I'm not sure we want to add any overhead here, even if it's only a few instructions.
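A sketch of the kind of slice-based code meant here (names are illustrative):

// Binary search by slicing: each step recurses on one half of the array.
// arr[a .. b] just narrows the view -- no copying, no allocation.
bool contains(const(int)[] arr, int needle)
{
    if (arr.length == 0)
        return false;
    immutable mid = arr.length / 2;
    if (arr[mid] == needle)
        return true;
    return needle < arr[mid]
        ? contains(arr[0 .. mid], needle)
        : contains(arr[mid + 1 .. $], needle);
}

unittest
{
    assert(contains([1, 3, 5, 7, 9], 7));
    assert(!contains([1, 3, 5, 7, 9], 4));
}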

On Mon, Mar 22, 2010 at 11:04 AM, Andrei Alexandrescu <andrei at erdani.com> wrote:

> Right now array append is reasonably fast due to Steve's great work. Basically the append operation caches the capacities of the last arrays that were appended to, which makes the capacity query fast most of the time.
>
> I was thinking of a possible change. How about having the slice operator arr[a .. b] remove the array from the cache? That way we can handle arr.length = n differently:
>
> (a) if n > arr.length, resize appropriately
> (b) if n < arr.length AND the array is in the cache, keep the array in the cache.
>
> The change is at (b). If the array is in the cache and its length is made smaller, then we can be sure the capacity will stay correct after the resizing. This is because we know for sure there will be no stomping - if stomping were possible, the array would not be in the cache.
>
> Could this work? And if it does, would it be a good change to make?
>
>
> Andrei
Steve Schveighoffer
March 22, 2010
I don't think you understand the purpose of the cache.  It does not store arrays, it stores BlkInfo's.  This is information from the GC about where a memory block starts, its length, and what flags it was created with.  Appending in-place to an array does not change the cached blockinfo.
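For readers unfamiliar with BlkInfo, a sketch of the lookup the cache short-circuits, assuming the core.memory.GC.query API:

import core.memory;
import std.stdio;

void main()
{
    int[] arr = new int[10];
    // BlkInfo: the block's base address, size, and attribute flags
    auto info = GC.query(arr.ptr);
    writefln("base %s, size %s bytes, attr %s", info.base, info.size, info.attr);
}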

First, setting length to be smaller already does what you propose, except it does not look anything up in the cache.

Second, slicing is a way, way more common operation than setting the length, and you would be changing this common operation, which today is just 2 reads and 2 writes, into a function call with a cache search.  Any gain you think you get from removing the block from the cache would be overshadowed by the bloating of the slice operation itself.
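To illustrate the "2 reads and 2 writes": a D slice is just a (pointer, length) pair, so arr[a .. b] is plain arithmetic. A sketch, not the runtime code:

void main()
{
    int[] arr = new int[100];
    auto s = arr[10 .. 20];
    assert(s.ptr == arr.ptr + 10);  // new pointer: one addition
    assert(s.length == 10);         // new length: one subtraction
}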

Maybe I'm not understanding your request properly.

-Steve



----- Original Message ----
> From: Andrei Alexandrescu <andrei at erdani.com>
>
> Right now array append is reasonably fast due to Steve's great work. Basically the append operation caches the capacities of the last arrays that were appended to, which makes the capacity query fast most of the time.
>
> I was thinking of a possible change. How about having the slice operator arr[a .. b] remove the array from the cache? That way we can handle arr.length = n differently:
>
> (a) if n > arr.length, resize appropriately
> (b) if n < arr.length AND the array is in the cache, keep the array in the cache.
>
> The change is at (b). If the array is in the cache and its length is made smaller, then we can be sure the capacity will stay correct after the resizing. This is because we know for sure there will be no stomping - if stomping were possible, the array would not be in the cache.
>
> Could this work? And if it does, would it be a good change to make?
>
> Andrei



Andrei Alexandrescu
March 22, 2010
On 03/22/2010 10:16 AM, Steve Schveighoffer wrote:
> I don't think you understand the purpose of the cache.  It does not store arrays, it stores BlkInfo's.  This is information from the GC about where a memory block starts, its length, and what flags it was created with.  Appending in-place to an array does not change the cached blockinfo.

Yes, but at the end of the day, you do have a primitive that tells you whether a given array extends to the end of a given block and consequently is ok to append to in-place. Correct?

> First, setting length to be smaller already does what you propose, except it does not look anything up in the cache.

You told me that for best results I need to do:

arr.length = 0;
arr.assumeSafeAppend();

The last call makes the system assume that I saved no subarray of arr that I'm interested in keeping. No?
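As a runnable sketch, the idiom in question:

void main()
{
    int[] buf;
    foreach (iteration; 0 .. 3)
    {
        buf.length = 0;
        buf.assumeSafeAppend();  // promise: no saved slices of buf remain
        foreach (i; 0 .. 100)
            buf ~= i;            // appends now reuse the same block
    }
}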

> Second, slicing is a way way more common operation than setting the length, and you would be changing this common operation that does 2 reads and 2 writes to a function call with a cache search.  Any gain you think you get from removing the block from the cache would be overshadowed by the bloating of the slice operation itself.

I'd guess you and David are right.


Andrei
Andrei Alexandrescu
March 22, 2010
On 03/22/2010 10:38 AM, Steve Schveighoffer wrote:
> ----- Original Message ----
>> From: Andrei Alexandrescu <andrei at erdani.com>
>>
>> Yes, but at the end of the day, you do have a primitive that tells you
>> whether a given array extends to the end of a given block and
>> consequently is ok to append to in-place. Correct?
>
> The cache is not that.  That primitive is determined by the GC block info.  Essentially, with the blockinfo, I can tell where an array ends, and therefore know whether it can be appended in place.  This is true regardless of whether the blockinfo is in the cache or not.
>
> The cache simply allows me to skip the expensive blockinfo lookup. Essentially, by removing a blockinfo from the cache, it doesn't prevent any behavior, it just makes it slower.

OK.

>> You told me that for best results I need to do:
>>
>> arr.length = 0; arr.assumeSafeAppend();
>>
>> The last call makes the system assume that I saved no subarray of arr that I'm interested in keeping. No?
>
> Yes.  I think I see where you were going with the request.  You want to remove that second line?  It simply does not work: an array isn't marked differently once it has been sliced, so the runtime cannot tell.
>
> What about code like this:
>
> auto arr2 = arr; arr2.length = 0;

Yeah, it looks like array assignment also needs to be hooked.

> Now, if you assume it's ok to append because the user used length instead of slicing, what happens to arr?  That is why we need the assume call.  To get what you want, any time an array is copied, even via function calls, you'd have to kill appending to it.  I don't think it's worth the performance loss on the common operations just to save a line of code in an uncommon situation.
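A sketch of the aliasing hazard Steve describes (illustrative values):

void main()
{
    int[] arr = [1, 2, 3];
    auto arr2 = arr;   // copies the (pointer, length) pair; same block
    arr2.length = 0;

    arr2 ~= 42;        // without assumeSafeAppend, this must reallocate
    assert(arr == [1, 2, 3]);  // arr survives

    // arr2.assumeSafeAppend() here would instead let the next append
    // overwrite arr[0] -- which is why it must be an explicit opt-in.
}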

I agree.


Andrei