Thread overview  


July 01, 2010 [Issue 4412] New: Array capacity growth spikey and the ratio approaches 1.0  

 
http://d.puremagic.com/issues/show_bug.cgi?id=4412 Summary: Array capacity growth spikey and the ratio approaches 1.0 Product: D Version: 2.032 Platform: Other OS/Version: Linux Status: NEW Severity: normal Priority: P3 Component: druntime AssignedTo: sean@invisibleduck.org ReportedBy: acehreli@yahoo.com  Comment #0 from Ali Cehreli <acehreli@yahoo.com> 20100701 10:33:06 PDT  There are spikes in the array capacity growth. Additionally, the capacity/length ratio approaches 1.0; as a result, the complexity of the append operation would not be 'amortized constant' for large length. import std.stdio; import std.array; void main() { int[] a; size_t oldCapacity; foreach (i; 0 .. 100_000_000) { a ~= i; if (a.capacity != oldCapacity) { writeln("length: ", a.length, " capacity: ", capacity(a), " ratio: ", cast(double)capacity(a) / oldCapacity); oldCapacity = capacity(a); } } } Produces: length: 1 capacity: 3 ratio: inf please ignore this one ;) length: 4 capacity: 7 ratio: 2.33333 length: 8 capacity: 15 ratio: 2.14286 length: 16 capacity: 31 ratio: 2.06667 length: 32 capacity: 63 ratio: 2.03226 length: 64 capacity: 127 ratio: 2.01587 length: 128 capacity: 255 ratio: 2.00787 length: 256 capacity: 509 ratio: 1.99608 length: 512 capacity: 1021 ratio: 2.00589 length: 1022 capacity: 2045 ratio: 2.00294 length: 2046 capacity: 3069 ratio: 1.50073 length: 3070 capacity: 4093 ratio: 1.33366 length: 4094 capacity: 5117 ratio: 1.25018 ... length: 1153022 capacity: 1154045 ratio: 1.00089 length: 1154046 capacity: 1155069 ratio: 1.00089 length: 1155070 capacity: 3153917 ratio: 2.7305 < spike length: 3153918 capacity: 3154941 ratio: 1.00032 length: 3154942 capacity: 3155965 ratio: 1.00032 ... length: 4741118 capacity: 4742141 ratio: 1.00022 length: 4742142 capacity: 4743165 ratio: 1.00022 length: 4743166 capacity: 12333053 ratio: 2.60017 < spike length: 12333054 capacity: 12334077 ratio: 1.00008 length: 12334078 capacity: 12335101 ratio: 1.00008 ... (there are more spikes later on)  Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email  You are receiving this mail because:  
July 01, 2010 [Issue 4412] Array capacity growth spikey and the ratio approaches 1.0  

 
Posted in reply to Ali Cehreli  http://d.puremagic.com/issues/show_bug.cgi?id=4412 Ali Cehreli <acehreli@yahoo.com> changed: What Removed Added  CC acehreli@yahoo.com  Comment #1 from Ali Cehreli <acehreli@yahoo.com> 20100701 10:34:39 PDT  Forgot to mention that the compiler is dmd 2.047  Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email  You are receiving this mail because:  
July 01, 2010 [Issue 4412] Array capacity growth spikey and the ratio approaches 1.0  

 
Posted in reply to Ali Cehreli  http://d.puremagic.com/issues/show_bug.cgi?id=4412 Steven Schveighoffer <schveiguy@yahoo.com> changed: What Removed Added  StatusNEW ASSIGNED CC sean@invisibleduck.org Version2.032 D2 AssignedTosean@invisibleduck.org schveiguy@yahoo.com  Comment #2 from Steven Schveighoffer <schveiguy@yahoo.com> 20100701 11:25:43 PDT  I'll look into it.  Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email  You are receiving this mail because:  
July 02, 2010 [Issue 4412] Array capacity growth spikey and the ratio approaches 1.0  

 
Posted in reply to Ali Cehreli  http://d.puremagic.com/issues/show_bug.cgi?id=4412 Ali Cehreli <acehreli@yahoo.com> changed: What Removed Added  URL http://www.mailarchive.com  /digitalmarsd@puremagic.co  m/msg33113.html  Comment #3 from Ali Cehreli <acehreli@yahoo.com> 20100702 09:16:11 PDT  The following is an excerpt from the newsgroup thread that's in the URL field. > Steven Schveighoffer wrote: > > newCapacity isn't called when setting the length explicitly, because > > it's not considered to be an append. > > You are right. > > Looks like I was successful in locating the culprit. :) I took liberty to write ++a.length and also outputted the number of elements that are in reserve at each allocation: > > import std.stdio; > import std.array; > > void main() > { > int[] a; > size_t oldCapacity; > > foreach (i; 0 .. 100_000_000) { > ++a.length; > a[$1] = i; > > if (capacity(a) != oldCapacity) { > writeln("length: ", a.length, > " capacity: ", capacity(a), > " ratio: ", cast(double)capacity(a) / oldCapacity, > " reserve: ", capacity(a)  a.length); > oldCapacity = capacity(a); > } > } > } > > Now the spikes are gone, and the allocations asymptote at onceper1024 elements for an int array: > > length: 1 capacity: 3 ratio: inf reserve: 2 > length: 4 capacity: 7 ratio: 2.33333 reserve: 3 > length: 8 capacity: 15 ratio: 2.14286 reserve: 7 > length: 16 capacity: 31 ratio: 2.06667 reserve: 15 > length: 32 capacity: 63 ratio: 2.03226 reserve: 31 > length: 64 capacity: 127 ratio: 2.01587 reserve: 63 > length: 128 capacity: 255 ratio: 2.00787 reserve: 127 > length: 256 capacity: 509 ratio: 1.99608 reserve: 253 > length: 512 capacity: 1021 ratio: 2.00589 reserve: 509 > length: 1022 capacity: 2045 ratio: 2.00294 reserve: 1023 > length: 2046 capacity: 3069 ratio: 1.50073 reserve: 1023 > length: 3070 capacity: 4093 ratio: 1.33366 reserve: 1023 > ... > > In the long run, there is one allocation per 1024 elements. That still feels like amortized constant, but considering that relocation times would be O(N), I think reserve should still grow with N, but maybe not too much. (I am not sure about this last point at all.) > 4x1024 = 4096 == page size. The newCapacity function is supposed to reduce the effect of this phenomenon. i.e., when appending, it's assumed you are going to continue appending, so to reduce the overhead, the amount added to the memory block is supposed to be related to the total number of bytes already allocated. It doesn't seem like the function is working very well... It could have been something I did, or it could have always been broken :) I honestly did not examine the code at all, I just only ever read the comments, assuming it works. Please add these comments to the bug report, it will help when I look at it. I don't have a lot of time to look at it right now, so I'm afraid I'll forget what we were doing later :) Steve  Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email  You are receiving this mail because:  
July 02, 2010 [Issue 4412] Array capacity growth spikey and the ratio approaches 1.0  

 
Posted in reply to Ali Cehreli  http://d.puremagic.com/issues/show_bug.cgi?id=4412 bearophile_hugs@eml.cc changed: What Removed Added  CC bearophile_hugs@eml.cc  Comment #4 from bearophile_hugs@eml.cc 20100702 09:31:02 PDT  This is a simplified version of that code: import std.stdio: writeln; void main() { int[] array; size_t oldCapacity; foreach (i; 0 .. 20_000) { array ~= i; size_t newCapacity = array.capacity(); if (newCapacity != oldCapacity) { writeln(cast(double)newCapacity / oldCapacity); oldCapacity = newCapacity; } } } It prints: inf 2.33333 2.14286 2.06667 2.03226 2.01587 2.00787 1.99608 2.00589 2.00294 1.50073 1.33366 1.25018 1.20012 1.16675 1.14292 1.12505 1.11115 1.10003 1.09093 1.08335 1.07694 1.07144 1.06668 1.06251 1.05883 1.05556 1.05264 This output shows that there is a bug: to allow an O(1) amortized append the size of the array has to grow geometrically. So I suggest to use a geometric growth factor of 2 until the memory block is "large enough", something like 200MB or 300MB. And after such absolute memory limit then use a smaller geometric growth factor, like 1.3, to avoid wasting too much memory.  Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email  You are receiving this mail because:  
July 02, 2010 Re: [Issue 4412] Array capacity growth spikey and the ratio approaches 1.0  

 
Posted in reply to bearophile_hugs@eml.cc  dbugmail@puremagic.com wrote:
> http://d.puremagic.com/issues/show_bug.cgi?id=4412
>
>
> bearophile_hugs@eml.cc changed:
>
> What Removed Added
> 
> CC bearophile_hugs@eml.cc
>
>
>  Comment #4 from bearophile_hugs@eml.cc 20100702 09:31:02 PDT 
> This is a simplified version of that code:
>
>
> import std.stdio: writeln;
> void main() {
> int[] array;
> size_t oldCapacity;
>
> foreach (i; 0 .. 20_000) {
> array ~= i;
> size_t newCapacity = array.capacity();
> if (newCapacity != oldCapacity) {
> writeln(cast(double)newCapacity / oldCapacity);
> oldCapacity = newCapacity;
> }
> }
> }
>
>
> It prints:
>
> inf
> 2.33333
> 2.14286
> 2.06667
> 2.03226
> 2.01587
> 2.00787
> 1.99608
> 2.00589
> 2.00294
> 1.50073
> 1.33366
> 1.25018
> 1.20012
> 1.16675
> 1.14292
> 1.12505
> 1.11115
> 1.10003
> 1.09093
> 1.08335
> 1.07694
> 1.07144
> 1.06668
> 1.06251
> 1.05883
> 1.05556
> 1.05264
>
> This output shows that there is a bug: to allow an O(1) amortized append the
> size of the array has to grow geometrically.
...when it is actually being move. Per my previous message  you may want to evaluate and print the ratio only when the array is effectively relocated.
Andrei

July 02, 2010 [Issue 4412] Array capacity growth spikey and the ratio approaches 1.0  

 
Posted in reply to Ali Cehreli  http://d.puremagic.com/issues/show_bug.cgi?id=4412  Comment #5 from bearophile_hugs@eml.cc 20100702 09:48:24 PDT  This is a modified program, after a suggestion by Andrei: import std.stdio: writeln; void main() { int[] array; int* old_ptr; foreach (i; 0 .. 100_000_000) { array ~= i; int* new_ptr = array.ptr; if (array.ptr != old_ptr) { old_ptr = array.ptr; writeln(cast(double)array.capacity() / array.length); } } } It outputs: 3 1.75 1.875 1.9375 1.96875 1.98438 1.99219 1.98828 1.99414 3.00001 2.7305 2.60017 2.48002 2.37 It shows that when relocation happens the capacity() grows significantly. But that grow factor looks quite large.  Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email  You are receiving this mail because:  
July 02, 2010 [Issue 4412] Array capacity growth spikey and the ratio approaches 1.0  

 
Posted in reply to Ali Cehreli  http://d.puremagic.com/issues/show_bug.cgi?id=4412  Comment #6 from Steven Schveighoffer <schveiguy@yahoo.com> 20100702 10:30:19 PDT  OK, here's the deal: When the current block can be extended into the next page, it is. This does not take into account any scaling factors. That is, if you ask for one more byte, and a page can be lumped onto the end, then only 1 page is added. This explains the behavior most of the time. If a page cannot be added, then it uses the newCapacity function. However, I think this function may return too high a value. Essentially, the function looks like this: 100 + (1000L * size) / log2plus1(newcap); Where size is the size of a single element in bytes, newcap is the size of the *total* memory being requested (old size + appended size) in bytes, and log2plus1 essentially gets 1 + highest bit set. The result is a percentage of the total size requested to use. Where I think the problem is using size in this equation. I don't really understand why size has to be taken into account since newcap already takes size into account. What happens is, let's say you are doing an array of bytes, and you are requesting 100 bytes. size is 1, newcap is 100. log2plus1(100) is 7. So the resulting formula looks like: 100 + (1000L * 1) / 7 => 242, meaning 242% of the original size requested. Now, if we use the integer, which is 4 bytes, as our array type, the formula now has 400 as newcap and size is 4. Resulting in: 100 + (1000L * 4) / 9 => 544, meaning 544% of the original size requested. I'm not sure why larger array element types should need more relative space, it doesn't make a lot of sense, but not much is explained in the comment on why the formula is the way it is. I'm unsure how to change this, any ideas? For one, I think we can use the newCapacity function always, even when appending pages (I think it will just append as many pages as it can up to the requested size).  Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email  You are receiving this mail because:  
July 14, 2010 [Issue 4412] Array capacity growth spikey and the ratio approaches 1.0  

 
Posted in reply to Ali Cehreli  http://d.puremagic.com/issues/show_bug.cgi?id=4412 Fawzi Mohamed <fawzi@gmx.ch> changed: What Removed Added  CC fawzi@gmx.ch  Comment #7 from Fawzi Mohamed <fawzi@gmx.ch> 20100714 05:29:31 PDT  When this issue was discovered in tango I invested some time thinking about it. What I came out with is http://github.com/fawzi/blip/blob/master/blip/util/Grow.d a version of which I had also used in tango. Looking again at it I think that I decided to grow based on memory usage, and give "extra" space to arrays with large elements just using rounding up. Particularly with 32bit one should be aware that some calculation in extreme cases could overflow. I release the growing code with whatever license needed.  Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email  You are receiving this mail because:  
July 14, 2010 [Issue 4412] Array capacity growth spikey and the ratio approaches 1.0  

 
Posted in reply to Ali Cehreli  http://d.puremagic.com/issues/show_bug.cgi?id=4412 Steven Schveighoffer <schveiguy@yahoo.com> changed: What Removed Added  StatusASSIGNED RESOLVED Resolution FIXED  Comment #8 from Steven Schveighoffer <schveiguy@yahoo.com> 20100714 13:36:01 PDT  Fixed in changeset http://www.dsource.org/projects/druntime/changeset/332 This should grow at a more normal rate (logarigthmic in relation to the size of the array) There will be short growths, that's when just appending free pages can satisfy the append vs. having to commit more pages from the OS. More tweaking may get this to be a better curve, but I'm pretty happy with it now. Your test program now looks like this: length: 1 capacity: 3 ratio: inf length: 4 capacity: 7 ratio: 2.33333 length: 8 capacity: 15 ratio: 2.14286 length: 16 capacity: 31 ratio: 2.06667 length: 32 capacity: 63 ratio: 2.03226 length: 64 capacity: 127 ratio: 2.01587 length: 128 capacity: 255 ratio: 2.00787 length: 256 capacity: 511 ratio: 2.00392 length: 512 capacity: 1019 ratio: 1.99413 length: 1020 capacity: 2043 ratio: 2.00491 length: 2044 capacity: 4091 ratio: 2.00245 length: 4092 capacity: 7163 ratio: 1.75092 length: 7164 capacity: 8187 ratio: 1.14296 length: 8188 capacity: 9211 ratio: 1.12508 length: 9212 capacity: 15355 ratio: 1.66703 length: 15356 capacity: 24571 ratio: 1.6002 length: 24572 capacity: 25595 ratio: 1.04168 length: 25596 capacity: 40955 ratio: 1.60012 length: 40956 capacity: 41979 ratio: 1.025 length: 41980 capacity: 65531 ratio: 1.56104 length: 65532 capacity: 73723 ratio: 1.12501 length: 73724 capacity: 74747 ratio: 1.01389 length: 74748 capacity: 113659 ratio: 1.52058 length: 113660 capacity: 122875 ratio: 1.08108 length: 122876 capacity: 123899 ratio: 1.00833 length: 123900 capacity: 188411 ratio: 1.52068 length: 188412 capacity: 282619 ratio: 1.50001 length: 282620 capacity: 294907 ratio: 1.04348 length: 294908 capacity: 295931 ratio: 1.00347 length: 295932 capacity: 435195 ratio: 1.4706 length: 435196 capacity: 442363 ratio: 1.01647 length: 442364 capacity: 651259 ratio: 1.47223 length: 651260 capacity: 655355 ratio: 1.00629 length: 655356 capacity: 950267 ratio: 1.45 length: 950268 capacity: 951291 ratio: 1.00108 length: 951292 capacity: 1380347 ratio: 1.45102 length: 1380348 capacity: 1392635 ratio: 1.0089 ...  Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email  You are receiving this mail because:  
« First ‹ Prev 1 2 

Copyright © 19992017 by the D Language Foundation