Thread overview: relative benefit of .reserve and .length
- Jay Norwood (Apr 28, 2016)
- Basile B. (Apr 28, 2016)
- sigod (Apr 29, 2016)
- Jay Norwood (Apr 30, 2016)
- Eto Demerzel (May 02, 2016)
April 28, 2016
I timed some code recently and found that .reserve made almost no improvement when appending. The actual change to the length by the append appears to have a very high overhead, something over 200 instructions executed, regardless of whether the .reserve was done. This was a simple append to an integer array.

The only way I found to avoid this was to set the length outside the loop and update the array values by index. That was on the order of 10x faster.
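A minimal sketch of the two approaches being compared (this is not the original benchmark; the element count and timing method are assumptions):

```d
import std.datetime.stopwatch : StopWatch, AutoStart;
import std.stdio : writefln;

void main()
{
    enum n = 1_000_000;

    // Appending with ~= : each append still calls into druntime,
    // even when the capacity was reserved up front.
    auto sw = StopWatch(AutoStart.yes);
    int[] a;
    a.reserve(n);
    foreach (i; 0 .. n)
        a ~= i;
    writefln("append : %s", sw.peek);

    // Setting .length once outside the loop and assigning by index.
    sw.reset();
    auto b = new int[](n);
    foreach (i; 0 .. n)
        b[i] = i;
    writefln("indexed: %s", sw.peek);

    assert(a == b);   // both approaches build the same array
}
```

On most runs the indexed version is dramatically faster, since the assignment compiles down to a bounds check and a store rather than a runtime call.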
April 28, 2016
On Thursday, 28 April 2016 at 12:56:24 UTC, Jay Norwood wrote:
> I timed some code recently and found that .reserve made almost no improvement when appending.  It appears that the actual change to the length by the append had a very high overhead of something over 200 instructions executed, regardless if the .reserve was done.  This was a simple append to an integer array.
>
> The only way I found to avoid this was to set the length outside the loop and update the array values by index.  That was on the order of 10x faster.

Have you looked at the way .reserve is used in Appender? In that struct, reserving has a real impact, exactly the opposite of what you've observed: if nothing is reserved, the Appender is not worth using (unfortunately my benchmark for this is on another machine :/).

Outside of an Appender, I believe .reserve can be used to force page creation if you know that several pages will be allocated. For example, for a ubyte[], when .length goes from 16 to 17 the memory block *really* allocated by realloc goes from 16 bytes to 4096.
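A sketch of the Appender pattern being described (the element count is an assumption):

```d
import std.array : appender;
import std.stdio : writeln;

void main()
{
    // Appender caches the block's capacity locally, so ~= skips the
    // per-append druntime metadata lookup that plain slices pay.
    auto app = appender!(int[]);
    app.reserve(100_000);         // one up-front allocation
    foreach (i; 0 .. 100_000)
        app ~= i;
    writeln(app.data.length);     // prints 100000
}
```

Without the reserve call, the Appender has to grow its block incrementally as it fills, which is where the cost comes back.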
April 28, 2016
On 4/28/16 8:56 AM, Jay Norwood wrote:
> I timed some code recently and found that .reserve made almost no
> improvement when appending.  It appears that the actual change to the
> length by the append had a very high overhead of something over 200
> instructions executed, regardless if the .reserve was done.  This was a
> simple append to an integer array.

.reserve should make an improvement for large amounts of appending, since you pre-allocate the data.

However, the append operation itself is still quite slow: it involves calling a druntime function that cannot be inlined, and it must do a bunch of operations to look up the current defined length of the array. The way I look at it, appending is a compromise between efficiency and convenience (the fact that you can simply append to any slice anywhere is liberating). In my experience, the appending operation was slower before my changes to the runtime (.reserve was added at that time). What .reserve does is prevent the incremental allocation growth and the copying of data from one block to another (not to mention less strain on the GC). It does not reduce the function calls or the metadata lookups.

Let's say you are appending 100,000 integers to an array. At 50,000, it cannot extend any more, so it must allocate a new block. This means the GC must find a larger block (in addition to the ones it has already incrementally allocated to get to 50,000) to accommodate the data, and then copy all the data over. This is the operation that is saved with .reserve.
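The effect can be observed through the built-in `capacity` property; a small sketch (the element count is an assumption):

```d
import std.stdio : writefln;

void main()
{
    int[] a;
    a.reserve(100_000);              // one allocation up front
    auto cap = a.capacity;           // max length reachable without reallocating
    assert(cap >= 100_000);

    foreach (i; 0 .. 100_000)
        a ~= i;                      // stays within the reserved block

    assert(a.capacity == cap);       // no reallocation, no copying occurred
    writefln("capacity %s, length %s", cap, a.length);
}
```

Without the reserve call, `capacity` changes several times during the loop as the GC moves the data into progressively larger blocks.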

> The only way I found to avoid this was to set the length outside the
> loop and update the array values by index.  That was on the order of 10x
> faster.

This is ALWAYS going to be much faster, as setting an element is 2 instructions at the most. That vs. a runtime call is always going to win.

If you can do it this way, I'd recommend doing so. The array appending operation is there for convenience, at reasonable performance.

-Steve
April 28, 2016
On 4/28/16 9:09 AM, Basile B. wrote:

> Out of an appender I believe .reserve can be used to force page creation
> if you know that several pages will be allocated.
> For example for an ubyte[] when .length goes from 16 to 17 the memory
> block *really* allocated by realloc goes from 16 to 4096.

Hm... I don't think that's the behavior that the GC did, but maybe it's changed.

It should go in powers of 2 up to 4096. And there is extra data needed for determining the length.

So really, it's from 15 to 16 (may be less now that the GC calls dtors), and it should go from 16 bytes to 32 with normal append.

If you reserve, you can specify a higher starting point (e.g. 4096 if you wish).
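The growth pattern can be inspected with `GC.sizeOf`; a sketch (the exact sizes printed depend on the GC's size classes and runtime version, so no particular output is promised):

```d
import core.memory : GC;
import std.stdio : writefln;

void main()
{
    ubyte[] a;
    size_t last;
    foreach (i; 0 .. 5000)
    {
        a ~= cast(ubyte) i;
        auto sz = GC.sizeOf(a.ptr);  // bytes in the underlying GC block
        if (sz != last)              // report each time the block grows
        {
            writefln("length %s: block is %s bytes", a.length, sz);
            last = sz;
        }
    }
}
```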

-Steve
April 29, 2016
On Thursday, 28 April 2016 at 14:08:26 UTC, Steven Schveighoffer wrote:
> On 4/28/16 8:56 AM, Jay Norwood wrote:
>> [...]
>
> .reserve should make an improvement for large amount of appending, since you pre-allocate the data.
>
> [...]

How about `assumeSafeAppend`? Does it have any positive impact on performance?
April 30, 2016
On Friday, 29 April 2016 at 10:10:26 UTC, sigod wrote:
> How about `assumeSafeAppend`? Does it have any positive impact on performance?

assumeSafeAppend made it even slower ... about 20x worse than the indexed assign instead of 10x. Release build, win32.


May 02, 2016
On 4/29/16 6:10 AM, sigod wrote:
> On Thursday, 28 April 2016 at 14:08:26 UTC, Steven Schveighoffer wrote:
>> On 4/28/16 8:56 AM, Jay Norwood wrote:
>>> [...]
>>
>> .reserve should make an improvement for large amount of appending,
>> since you pre-allocate the data.
>>
>> [...]
>
> How about `assumeSafeAppend`? Does it have any positive impact on
> performance?

I don't know why you would call it. assumeSafeAppend is an expensive no-op if you have an appendable array.

-Steve
May 02, 2016
On Monday, 2 May 2016 at 14:47:01 UTC, Steven Schveighoffer wrote:
> On 4/29/16 6:10 AM, sigod wrote:
>> On Thursday, 28 April 2016 at 14:08:26 UTC, Steven Schveighoffer wrote:
>>> On 4/28/16 8:56 AM, Jay Norwood wrote:
>>>> [...]
>>>
>>> .reserve should make an improvement for large amount of appending,
>>> since you pre-allocate the data.
>>>
>>> [...]
>>
>> How about `assumeSafeAppend`? Does it have any positive impact on
>> performance?
>
> I don't know why you would call it. assumeSafeAppend is an expensive no-op if you have appendable array.
>
> -Steve

For example:

	auto invalid_tokens = uninitializedArray!(string[])(result.failure);
	invalid_tokens.length = 0;

	foreach (index, ref token_result; result.results) {
		if (token_result.error == "NotRegistered") {
			invalid_tokens.assumeSafeAppend() ~= tokens[index];
		}
		else ...
	}

	// use invalid_tokens

It would've been almost perfect code if `assumeSafeAppend` weren't so costly. Now I need to maintain a separate length for `invalid_tokens` so I can assign tokens by index, which I don't like because arrays already have `length` built into them.
May 02, 2016
On 5/2/16 5:57 PM, Eto Demerzel wrote:

> For example:
>
>      auto invalid_tokens = uninitializedArray!(string[])(result.failure);
>      invalid_tokens.length = 0;
>
>      foreach (index, ref token_result; result.results) {
>          if (token_result.error == "NotRegistered") {
>              invalid_tokens.assumeSafeAppend() ~= tokens[index];
>          }
>          else ...
>      }
>
>      // use invalid_tokens
>
> It would've been almost perfect code if `assumeSafeAppend` wasn't very
> costly. Now I need to declare separate length for `invalid_tokens`, so I
> can assign tokens by index. Which I don't like because arrays have
> `length` built into them.

Only call assumeSafeAppend when you *remove* data from the array end. When you call it on an array that is already appendable, it does nothing. Appendable means the array slice ends at the end of the actual data.

Better code:

invalid_tokens.length = 0;
invalid_tokens.assumeSafeAppend;

Without seeing your "else", I can't say where else it should be, if at all.
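Putting the pieces together, a self-contained sketch of the corrected pattern (the token values and the `bad` flags are made up for illustration; the original used `result.results` and an error check):

```d
import std.stdio : writeln;

void main()
{
    auto tokens = ["a", "b", "c", "d"];
    auto bad    = [true, false, true, false];  // stand-in for the error test

    // Allocate once, shrink, then mark the tail reusable, once.
    auto invalid = new string[](tokens.length);
    invalid.length = 0;
    invalid.assumeSafeAppend();   // call after the shrink, not per append

    foreach (i, t; tokens)
        if (bad[i])
            invalid ~= t;         // appends grow back into the same block

    writeln(invalid);             // prints ["a", "c"]
}
```

Because the slice ends at the end of the block's known data after the single `assumeSafeAppend`, every append in the loop is on an already-appendable array and pays no extra cost.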

-Steve