| Thread overview | |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
|
December 10, 2015 log(n) amortized growth for containers | ||||
|---|---|---|---|---|
| ||||
Many thanks to Jason Causey and Jakob Ovrum for their work on heaps and Randomized Slide to Front. Here's a quick thought on growth schedule for arrays. The classic approach is to grow by a geometric schedule. For example: 1, 2, 4, 8, 16, etc. That way, n memory moves are necessary for achieving a capacity equal to n, so on average we spend a constant amount of time per appended element. The main disadvantage with this growth schedule is there are on average n/2 slack memory slots. Also, a growth factor of 2 is unfriendly to the memory allocator (long discussion) so a smaller one (such as the golden cut or less) would be better. Anyhow, I was thinking why O(1) amortized growth and not O(log n)? The latter is considered scalable all right and the hope is to get a corresponding reduction in the slack memory space. To figure a growth schedule that leads to log(n) work per element, a differential equation must be solved: s(n) / s'(n) = log(x) where s(n) will be the (continuous extension of the) growth schedule's sum, i.e. s(n) = f(1) + f(2) + ... + f(n). In other word, s(n) is how many elements we moved around, and s'(n) is the capacity after n steps. The division gives us how much work was done per element. To verify, if s(n) = n, you get O(n) work per element, i.e. quadratic behavior for linear capacity growth. If s(n) = 2^n, you get O(1) work per element. So I entered the equation above in Wolfram: https://www.wolframalpha.com/input/?i=y%28x%29+%2F+y%27%28x%29+%3D+log%28x%29 So the growth schedule is exponential of the LogIntegral(n). Let's write it as: f(n) = 2^^li(n) Not too nice, but easy to tabulate for the relevant ranges. Let's take a look at the slack space at capacity f(n): slack(f(n)) = f(n) - f(n - 1) Sadly the expression doesn't seem to simplify and I couldn't find a good online plotter that can plot 2^^li(x). Here's what Mathematica free shows: https://www.wolframalpha.com/input/?i=2%5Eli%28x%29+-+2%5Eli%28x+-+1%29 Compare with: https://www.wolframalpha.com/input/?i=2%5Ex+-+2%5E%28x+-+1%29 The slack elements seem to grow considerably slower, which is great. Does anyone know of a plotter that supports logarithmic integral? Of course better ideas about all of the above are welcome! Thanks, Andrei | ||||
December 10, 2015 Re: log(n) amortized growth for containers | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On 12/10/15 10:55 PM, Andrei Alexandrescu wrote:
>
> So the growth schedule is exponential of the LogIntegral(n). Let's write
> it as:
>
> f(n) = 2^^li(n)
I was wrong here, that's the number of total moves. The growth schedule is proportional to the derivative of that:
f(n) = 2^^li(n)/log(n)
Andrei
| |||
December 12, 2015 Re: log(n) amortized growth for containers | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On 12/11/2015 04:55 AM, Andrei Alexandrescu wrote: > > Here's a quick thought on growth schedule for arrays. The classic > approach is to grow by a geometric schedule. For example: 1, 2, 4, 8, > 16, etc. That way, n memory moves are necessary for achieving a capacity > equal to n, so on average we spend a constant amount of time per > appended element. > > The main disadvantage with this growth schedule is there are on average > n/2 slack memory slots. Also, a growth factor of 2 is unfriendly to the > memory allocator (long discussion) so a smaller one (such as the golden > cut or less) would be better. > > Anyhow, I was thinking why O(1) amortized growth and not O(log n)? The > latter is considered scalable all right and the hope is to get a > corresponding reduction in the slack memory space. ... E.g. the following sequence of capacities leads to amortized Θ(log n). c(0) = 0, c(k+1) = c(k)+c(k)/⌊log₂(c(k)+2)⌋+1. | |||
December 11, 2015 Re: log(n) amortized growth for containers | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Timon Gehr | On 12/11/15 9:02 PM, Timon Gehr wrote:
> On 12/11/2015 04:55 AM, Andrei Alexandrescu wrote:
>>
>> Here's a quick thought on growth schedule for arrays. The classic
>> approach is to grow by a geometric schedule. For example: 1, 2, 4, 8,
>> 16, etc. That way, n memory moves are necessary for achieving a capacity
>> equal to n, so on average we spend a constant amount of time per
>> appended element.
>>
>> The main disadvantage with this growth schedule is there are on average
>> n/2 slack memory slots. Also, a growth factor of 2 is unfriendly to the
>> memory allocator (long discussion) so a smaller one (such as the golden
>> cut or less) would be better.
>>
>> Anyhow, I was thinking why O(1) amortized growth and not O(log n)? The
>> latter is considered scalable all right and the hope is to get a
>> corresponding reduction in the slack memory space. ...
>
>
> E.g. the following sequence of capacities leads to amortized Θ(log n).
>
> c(0) = 0, c(k+1) = c(k)+c(k)/⌊log₂(c(k)+2)⌋+1.
How do you prove that? -- Andrei
| |||
December 12, 2015 Re: log(n) amortized growth for containers | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | I've been using the same mechanism as jemalloc in SDC's runtime and it bucket basically by keeping 2 bits of precision + shift. It goes as : 4, 5, 6, 7, 8, 10, 12, 14, 16, 20, 24, 28, 32, 40, 48, 56, 64, ... It work quite well in practice. | |||
December 12, 2015 Re: log(n) amortized growth for containers | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Saturday, 12 December 2015 at 02:12:13 UTC, Andrei Alexandrescu wrote:
> On 12/11/15 9:02 PM, Timon Gehr wrote:
>> On 12/11/2015 04:55 AM, Andrei Alexandrescu wrote:
>>>
>>> Here's a quick thought on growth schedule for arrays. The classic
>>> approach is to grow by a geometric schedule. For example: 1, 2, 4, 8,
>>> 16, etc. That way, n memory moves are necessary for achieving a capacity
>>> equal to n, so on average we spend a constant amount of time per
>>> appended element.
>>>
>>> The main disadvantage with this growth schedule is there are on average
>>> n/2 slack memory slots. Also, a growth factor of 2 is unfriendly to the
>>> memory allocator (long discussion) so a smaller one (such as the golden
>>> cut or less) would be better.
>>>
>>> Anyhow, I was thinking why O(1) amortized growth and not O(log n)? The
>>> latter is considered scalable all right and the hope is to get a
>>> corresponding reduction in the slack memory space. ...
>>
>>
>> E.g. the following sequence of capacities leads to amortized Θ(log n).
>>
>> c(0) = 0, c(k+1) = c(k)+c(k)/⌊log₂(c(k)+2)⌋+1.
>
> How do you prove that? -- Andrei
Forget about going to continuous extensions, and substitute the derivative of s(n) with (s(n+1) - s(n)) / 1.
Plugging into s(n) / s'(n) = log(n) and solving for s(n+1) gives you a recursion for s:
s(n+1) = s(n) + s(n) / log(n).
Then you decide what you want for s(0) and fix small things like log(0) not being defined or s(n) / log(n) not always being a whole number.
You'll notice that Timon has log(c(k)) and not log(k) in his expression, which I think comes from some confusion in your original post between the number of elements n ( which is c(k)?) and the number of times you have to do move elements (k used by Timon)
| |||
December 13, 2015 Re: log(n) amortized growth for containers | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Friday, 11 December 2015 at 03:55:27 UTC, Andrei Alexandrescu wrote: > Does anyone know of a plotter that supports logarithmic integral? Of course better ideas about all of the above are welcome! > Yes, my go-to for my university plotting needs is SageMath. Here is the plot of the two functions, the red one is without the li(x). The vertical axis scale is logarithmic. https://sagecell.sagemath.org/?z=eJyVjk0KgzAQhfeCdxjcmNAI1mXBK_QIlWAmaWhMZEzB9vRNBDdFCt3NPL73o1CDZiu_QFkAEMYneVgFdLcrc8EM1kc0JF1CODQHcnPmvCzKQqUgk4O-c9bNtoGZm6GHZZQxIg2zC5FNcmZaAElvkHXi3LacC9ByxDG4QH3lrLlHQ4i-EoDK7PouTZIeSIt9Y9-1-9vXSy1ykcN04mTT6lfNc__pYID5NYBQ_V_zAe2Da2c=&lang=sage | |||
December 13, 2015 Re: log(n) amortized growth for containers | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Friday, 11 December 2015 at 03:55:27 UTC, Andrei Alexandrescu wrote:
> To figure a growth schedule that leads to log(n) work per element, a differential equation must be solved:
>
> s(n) / s'(n) = log(x)
>
> where s(n) will be the (continuous extension of the) growth schedule's sum, i.e. s(n) = f(1) + f(2) + ... + f(n). In other word, s(n) is how many elements we moved around, and s'(n) is the capacity after n steps. The division gives us how much work was done per element.
>
> To verify, if s(n) = n, you get O(n) work per element, i.e. quadratic behavior for linear capacity growth. If s(n) = 2^n, you get O(1) work per element.
>
> So I entered the equation above in Wolfram:
>
> https://www.wolframalpha.com/input/?i=y%28x%29+%2F+y%27%28x%29+%3D+log%28x%29
I like this approach to amortized analysis. I have a question, though,
the diff eq you have is
s(n) / s'(n) = log(x)
But the WolframAlpha query solves for effectively:
s(n) / s'(n) = log(n)
Why does x = n?
Shouldn't x be s'(n), so that s(n)/s'(n), how much work was done per element, is bounded by the number of elements appended (that is about the capacity s'(n))? I thought number of elements != n, but rather something between s'(n) and s'(n-1)?
| |||
December 13, 2015 Re: log(n) amortized growth for containers | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Jimmy Cao | On 12/13/15 2:56 PM, Jimmy Cao wrote:
> On Friday, 11 December 2015 at 03:55:27 UTC, Andrei Alexandrescu wrote:
>> Does anyone know of a plotter that supports logarithmic integral? Of
>> course better ideas about all of the above are welcome!
>>
>
> Yes, my go-to for my university plotting needs is SageMath.
>
> Here is the plot of the two functions, the red one is without the
> li(x). The vertical axis scale is logarithmic.
>
> https://sagecell.sagemath.org/?z=eJyVjk0KgzAQhfeCdxjcmNAI1mXBK_QIlWAmaWhMZEzB9vRNBDdFCt3NPL73o1CDZiu_QFkAEMYneVgFdLcrc8EM1kc0JF1CODQHcnPmvCzKQqUgk4O-c9bNtoGZm6GHZZQxIg2zC5FNcmZaAElvkHXi3LacC9ByxDG4QH3lrLlHQ4i-EoDK7PouTZIeSIt9Y9-1-9vXSy1ykcN04mTT6lfNc__pYID5NYBQ_V_zAe2Da2c=&lang=sage
Thanks, great!
Now, this is going to be difficult to evaluate in the real world. Microbenchmarks are unlikely to be useful - most just grow a vector up to wazoo, which works faster with a more aggressive growth schedule. A better test is in a real application where several arrays compete for data cache.
Andrei
| |||
Copyright © 1999-2021 by the D Language Foundation
Permalink
Reply