Thread overview
log(n) amortized growth for containers
Dec 12, 2015
Timon Gehr
Dec 12, 2015
Marco Nembrini
Dec 12, 2015
deadalnix
Dec 13, 2015
Jimmy Cao
Dec 13, 2015
Jimmy Cao
December 10, 2015
Many thanks to Jason Causey and Jakob Ovrum for their work on heaps and Randomized Slide to Front.

Here's a quick thought on growth schedule for arrays. The classic approach is to grow by a geometric schedule. For example: 1, 2, 4, 8, 16, etc. That way, n memory moves are necessary for achieving a capacity equal to n, so on average we spend a constant amount of time per appended element.

The main disadvantage with this growth schedule is there are on average n/2 slack memory slots. Also, a growth factor of 2 is unfriendly to the memory allocator (long discussion) so a smaller one (such as the golden cut or less) would be better.

Anyhow, I was thinking why O(1) amortized growth and not O(log n)? The latter is considered scalable all right and the hope is to get a corresponding reduction in the slack memory space.

To figure a growth schedule that leads to log(n) work per element, a differential equation must be solved:

s(n) / s'(n) = log(x)

where s(n) will be the (continuous extension of the) growth schedule's sum, i.e. s(n) = f(1) + f(2) + ... + f(n). In other word, s(n) is how many elements we moved around, and s'(n) is the capacity after n steps. The division gives us how much work was done per element.

To verify, if s(n) = n, you get O(n) work per element, i.e. quadratic behavior for linear capacity growth. If s(n) = 2^n, you get O(1) work per element.

So I entered the equation above in Wolfram:

https://www.wolframalpha.com/input/?i=y%28x%29+%2F+y%27%28x%29+%3D+log%28x%29

So the growth schedule is exponential of the LogIntegral(n). Let's write it as:

f(n) = 2^^li(n)

Not too nice, but easy to tabulate for the relevant ranges. Let's take a look at the slack space at capacity f(n):

slack(f(n)) = f(n) - f(n - 1)

Sadly the expression doesn't seem to simplify and I couldn't find a good online plotter that can plot 2^^li(x). Here's what Mathematica free shows:

https://www.wolframalpha.com/input/?i=2%5Eli%28x%29+-+2%5Eli%28x+-+1%29

Compare with:

https://www.wolframalpha.com/input/?i=2%5Ex+-+2%5E%28x+-+1%29

The slack elements seem to grow considerably slower, which is great.

Does anyone know of a plotter that supports logarithmic integral? Of course better ideas about all of the above are welcome!


Thanks,

Andrei
December 10, 2015
On 12/10/15 10:55 PM, Andrei Alexandrescu wrote:
>
> So the growth schedule is exponential of the LogIntegral(n). Let's write
> it as:
>
> f(n) = 2^^li(n)

I was wrong here, that's the number of total moves. The growth schedule is proportional to the derivative of that:

f(n) = 2^^li(n)/log(n)


Andrei

December 12, 2015
On 12/11/2015 04:55 AM, Andrei Alexandrescu wrote:
>
> Here's a quick thought on growth schedule for arrays. The classic
> approach is to grow by a geometric schedule. For example: 1, 2, 4, 8,
> 16, etc. That way, n memory moves are necessary for achieving a capacity
> equal to n, so on average we spend a constant amount of time per
> appended element.
>
> The main disadvantage with this growth schedule is there are on average
> n/2 slack memory slots. Also, a growth factor of 2 is unfriendly to the
> memory allocator (long discussion) so a smaller one (such as the golden
> cut or less) would be better.
>
> Anyhow, I was thinking why O(1) amortized growth and not O(log n)? The
> latter is considered scalable all right and the hope is to get a
> corresponding reduction in the slack memory space. ...


E.g. the following sequence of capacities leads to amortized Θ(log n).

c(0) = 0, c(k+1) = c(k)+c(k)/⌊log₂(c(k)+2)⌋+1.

December 11, 2015
On 12/11/15 9:02 PM, Timon Gehr wrote:
> On 12/11/2015 04:55 AM, Andrei Alexandrescu wrote:
>>
>> Here's a quick thought on growth schedule for arrays. The classic
>> approach is to grow by a geometric schedule. For example: 1, 2, 4, 8,
>> 16, etc. That way, n memory moves are necessary for achieving a capacity
>> equal to n, so on average we spend a constant amount of time per
>> appended element.
>>
>> The main disadvantage with this growth schedule is there are on average
>> n/2 slack memory slots. Also, a growth factor of 2 is unfriendly to the
>> memory allocator (long discussion) so a smaller one (such as the golden
>> cut or less) would be better.
>>
>> Anyhow, I was thinking why O(1) amortized growth and not O(log n)? The
>> latter is considered scalable all right and the hope is to get a
>> corresponding reduction in the slack memory space. ...
>
>
> E.g. the following sequence of capacities leads to amortized Θ(log n).
>
> c(0) = 0, c(k+1) = c(k)+c(k)/⌊log₂(c(k)+2)⌋+1.

How do you prove that? -- Andrei

December 12, 2015
I've been using the same mechanism as jemalloc in SDC's runtime and it bucket basically by keeping 2 bits of precision + shift. It goes as :

4, 5, 6, 7, 8, 10, 12, 14, 16, 20, 24, 28, 32, 40, 48, 56, 64, ...

It work quite well in practice.

December 12, 2015
On Saturday, 12 December 2015 at 02:12:13 UTC, Andrei Alexandrescu wrote:
> On 12/11/15 9:02 PM, Timon Gehr wrote:
>> On 12/11/2015 04:55 AM, Andrei Alexandrescu wrote:
>>>
>>> Here's a quick thought on growth schedule for arrays. The classic
>>> approach is to grow by a geometric schedule. For example: 1, 2, 4, 8,
>>> 16, etc. That way, n memory moves are necessary for achieving a capacity
>>> equal to n, so on average we spend a constant amount of time per
>>> appended element.
>>>
>>> The main disadvantage with this growth schedule is there are on average
>>> n/2 slack memory slots. Also, a growth factor of 2 is unfriendly to the
>>> memory allocator (long discussion) so a smaller one (such as the golden
>>> cut or less) would be better.
>>>
>>> Anyhow, I was thinking why O(1) amortized growth and not O(log n)? The
>>> latter is considered scalable all right and the hope is to get a
>>> corresponding reduction in the slack memory space. ...
>>
>>
>> E.g. the following sequence of capacities leads to amortized Θ(log n).
>>
>> c(0) = 0, c(k+1) = c(k)+c(k)/⌊log₂(c(k)+2)⌋+1.
>
> How do you prove that? -- Andrei

Forget about going to continuous extensions, and substitute the derivative of s(n) with (s(n+1) - s(n)) / 1.
Plugging into s(n) / s'(n) = log(n) and solving for s(n+1) gives you a recursion for s:
s(n+1) = s(n) + s(n) / log(n).
Then you decide what you want for s(0) and fix small things like log(0) not being defined or s(n) / log(n) not always being a whole number.

You'll notice that Timon has log(c(k)) and not log(k) in his expression, which I think comes from some confusion in your original post between the number of elements n ( which is c(k)?) and the number of times you have to do move elements (k used by Timon)
December 13, 2015
On Friday, 11 December 2015 at 03:55:27 UTC, Andrei Alexandrescu wrote:
> Does anyone know of a plotter that supports logarithmic integral? Of course better ideas about all of the above are welcome!
>

Yes, my go-to for my university plotting needs is SageMath.

Here is the plot of the two functions, the red one is without the li(x).  The vertical axis scale is logarithmic.

https://sagecell.sagemath.org/?z=eJyVjk0KgzAQhfeCdxjcmNAI1mXBK_QIlWAmaWhMZEzB9vRNBDdFCt3NPL73o1CDZiu_QFkAEMYneVgFdLcrc8EM1kc0JF1CODQHcnPmvCzKQqUgk4O-c9bNtoGZm6GHZZQxIg2zC5FNcmZaAElvkHXi3LacC9ByxDG4QH3lrLlHQ4i-EoDK7PouTZIeSIt9Y9-1-9vXSy1ykcN04mTT6lfNc__pYID5NYBQ_V_zAe2Da2c=&lang=sage
December 13, 2015
On Friday, 11 December 2015 at 03:55:27 UTC, Andrei Alexandrescu wrote:
> To figure a growth schedule that leads to log(n) work per element, a differential equation must be solved:
>
> s(n) / s'(n) = log(x)
>
> where s(n) will be the (continuous extension of the) growth schedule's sum, i.e. s(n) = f(1) + f(2) + ... + f(n). In other word, s(n) is how many elements we moved around, and s'(n) is the capacity after n steps. The division gives us how much work was done per element.
>
> To verify, if s(n) = n, you get O(n) work per element, i.e. quadratic behavior for linear capacity growth. If s(n) = 2^n, you get O(1) work per element.
>
> So I entered the equation above in Wolfram:
>
> https://www.wolframalpha.com/input/?i=y%28x%29+%2F+y%27%28x%29+%3D+log%28x%29

I like this approach to amortized analysis.  I have a question, though,

the diff eq you have is

s(n) / s'(n) = log(x)

But the WolframAlpha query solves for effectively:

s(n) / s'(n) = log(n)

Why does x = n?

Shouldn't x be s'(n), so that s(n)/s'(n), how much work was done per element, is bounded by the number of elements appended (that is about the capacity s'(n))?  I thought number of elements != n, but rather something between s'(n) and s'(n-1)?

December 13, 2015
On 12/13/15 2:56 PM, Jimmy Cao wrote:
> On Friday, 11 December 2015 at 03:55:27 UTC, Andrei Alexandrescu wrote:
>> Does anyone know of a plotter that supports logarithmic integral? Of
>> course better ideas about all of the above are welcome!
>>
>
> Yes, my go-to for my university plotting needs is SageMath.
>
> Here is the plot of the two functions, the red one is without the
> li(x).  The vertical axis scale is logarithmic.
>
> https://sagecell.sagemath.org/?z=eJyVjk0KgzAQhfeCdxjcmNAI1mXBK_QIlWAmaWhMZEzB9vRNBDdFCt3NPL73o1CDZiu_QFkAEMYneVgFdLcrc8EM1kc0JF1CODQHcnPmvCzKQqUgk4O-c9bNtoGZm6GHZZQxIg2zC5FNcmZaAElvkHXi3LacC9ByxDG4QH3lrLlHQ4i-EoDK7PouTZIeSIt9Y9-1-9vXSy1ykcN04mTT6lfNc__pYID5NYBQ_V_zAe2Da2c=&lang=sage

Thanks, great!

Now, this is going to be difficult to evaluate in the real world. Microbenchmarks are unlikely to be useful - most just grow a vector up to wazoo, which works faster with a more aggressive growth schedule. A better test is in a real application where several arrays compete for data cache.


Andrei