Thread overview  


July 01, 2010 Spikes in array capacity  

 
There is something strange in the array capacity algorithm. import std.stdio; import std.array; void main() { int[] a; size_t oldCapacity; foreach (i; 0 .. 100_000_000) { a ~= i; if (a.capacity != oldCapacity) { writeln("length: ", a.length, " capacity: ", capacity(a), " ratio: ", cast(double)capacity(a) / oldCapacity); oldCapacity = capacity(a); } } } Produces: length: 1 capacity: 3 ratio: inf please ignore this one ;) length: 4 capacity: 7 ratio: 2.33333 length: 8 capacity: 15 ratio: 2.14286 length: 16 capacity: 31 ratio: 2.06667 length: 32 capacity: 63 ratio: 2.03226 length: 64 capacity: 127 ratio: 2.01587 length: 128 capacity: 255 ratio: 2.00787 length: 256 capacity: 509 ratio: 1.99608 length: 512 capacity: 1021 ratio: 2.00589 length: 1022 capacity: 2045 ratio: 2.00294 length: 2046 capacity: 3069 ratio: 1.50073 length: 3070 capacity: 4093 ratio: 1.33366 length: 4094 capacity: 5117 ratio: 1.25018 ... length: 1153022 capacity: 1154045 ratio: 1.00089 length: 1154046 capacity: 1155069 ratio: 1.00089 length: 1155070 capacity: 3153917 ratio: 2.7305 < spike length: 3153918 capacity: 3154941 ratio: 1.00032 length: 3154942 capacity: 3155965 ratio: 1.00032 ... length: 4741118 capacity: 4742141 ratio: 1.00022 length: 4742142 capacity: 4743165 ratio: 1.00022 length: 4743166 capacity: 12333053 ratio: 2.60017 < spike length: 12333054 capacity: 12334077 ratio: 1.00008 length: 12334078 capacity: 12335101 ratio: 1.00008 ... (there are more spikes later on) Is that intended? Ali 
July 01, 2010 Re: Spikes in array capacity  

 
Posted in reply to Ali Çehreli  On Thu, 01 Jul 2010 12:46:27 0400, Ali Çehreli <acehreli@yahoo.com> wrote:
> There is something strange in the array capacity algorithm.
>
> import std.stdio;
> import std.array;
>
> void main()
> {
> int[] a;
> size_t oldCapacity;
>
> foreach (i; 0 .. 100_000_000) {
> a ~= i;
>
> if (a.capacity != oldCapacity) {
> writeln("length: ", a.length,
> " capacity: ", capacity(a),
> " ratio: ", cast(double)capacity(a) / oldCapacity);
> oldCapacity = capacity(a);
> }
> }
> }
>
> Produces:
>
> length: 1 capacity: 3 ratio: inf please ignore this one ;)
> length: 4 capacity: 7 ratio: 2.33333
> length: 8 capacity: 15 ratio: 2.14286
> length: 16 capacity: 31 ratio: 2.06667
> length: 32 capacity: 63 ratio: 2.03226
> length: 64 capacity: 127 ratio: 2.01587
> length: 128 capacity: 255 ratio: 2.00787
> length: 256 capacity: 509 ratio: 1.99608
> length: 512 capacity: 1021 ratio: 2.00589
> length: 1022 capacity: 2045 ratio: 2.00294
> length: 2046 capacity: 3069 ratio: 1.50073
> length: 3070 capacity: 4093 ratio: 1.33366
> length: 4094 capacity: 5117 ratio: 1.25018
> ...
> length: 1153022 capacity: 1154045 ratio: 1.00089
> length: 1154046 capacity: 1155069 ratio: 1.00089
> length: 1155070 capacity: 3153917 ratio: 2.7305 < spike
> length: 3153918 capacity: 3154941 ratio: 1.00032
> length: 3154942 capacity: 3155965 ratio: 1.00032
> ...
> length: 4741118 capacity: 4742141 ratio: 1.00022
> length: 4742142 capacity: 4743165 ratio: 1.00022
> length: 4743166 capacity: 12333053 ratio: 2.60017 < spike
> length: 12333054 capacity: 12334077 ratio: 1.00008
> length: 12334078 capacity: 12335101 ratio: 1.00008
> ... (there are more spikes later on)
>
> Is that intended?
>
> Ali
No, that is not intended. Please file a bugzilla against druntime.
Steve

July 01, 2010 Re: Spikes in array capacity  

 
Posted in reply to Steven Schveighoffer  Steven Schveighoffer wrote: > On Thu, 01 Jul 2010 12:46:27 0400, Ali Çehreli <acehreli@yahoo.com> wrote: > >> There is something strange in the array capacity algorithm. ... >> length: 1153022 capacity: 1154045 ratio: 1.00089 >> length: 1154046 capacity: 1155069 ratio: 1.00089 >> length: 1155070 capacity: 3153917 ratio: 2.7305 < spike >> length: 3153918 capacity: 3154941 ratio: 1.00032 >> length: 3154942 capacity: 3155965 ratio: 1.00032 > No, that is not intended. Please file a bugzilla against druntime. http://d.puremagic.com/issues/show_bug.cgi?id=4412 Thanks, Ali 
July 02, 2010 Re: Spikes in array capacity  

 
Posted in reply to Ali Çehreli  Hello Ali, > There is something strange in the array capacity algorithm. > [...] > Is that intended? > > Ali > I'm going to take a guess that the gc is (after some point) allocating into the smallest hole with "enough" capacity. If you re run the test but with something else going on to add some noise, do the spikes move? void makeNoise() { static byte[][256] data; data[rand() % $] = new byte[rand() % 512]; }  ... <IXOYE>< 
July 02, 2010 Re: Spikes in array capacity  

 
Posted in reply to BCS  BCS wrote: >> There is something strange in the array capacity algorithm. >> [...] >> Is that intended? > I'm going to take a guess that the gc is (after some point) allocating > into the smallest hole with "enough" capacity. You may be right. :) > If you re run the test > but with something else going on to add some noise, do the spikes move? Yes, the spikes move. > void makeNoise() > { > static byte[][256] data; > data[rand() % $] = new byte[rand() % 512]; > } I am not familiar with the dmd internals; but following the code took me to the function newCapacity() in src/druntime/src/rt/lifetime.d. The capacity growth is more complicated than what I've been assuming so far. And yes, GC is in the picture. Quoting the comments from that file: /* * Better version by Dave Fladebo: * This uses an inverse logorithmic algorithm to preallocate a bit more * space for larger arrays. *  Arrays smaller than PAGESIZE bytes are left asis, so for the most * common cases, memory allocation is 1 to 1. The small overhead added * doesn't affect small array perf. (it's virtually the same as * current). *  Larger arrays have some space preallocated. *  As the arrays grow, the relative preallocated space shrinks. *  The logorithmic algorithm allocates relatively more space for * midsize arrays, making it very fast for medium arrays (for * midtolarge arrays, this turns out to be quite a bit faster than the * equivalent realloc() code in C, on Linux at least. Small arrays are * just as fast as GCC). *  Perhaps most importantly, overall memory usage and stress on the GC * is decreased significantly for demanding environments. */ So there may not be a bug after all... Ali 
July 02, 2010 Re: Spikes in array capacity  

 
Posted in reply to Ali Çehreli  On Fri, 02 Jul 2010 01:34:26 0400, Ali Çehreli <acehreli@yahoo.com> wrote:
> BCS wrote:
>
> >> There is something strange in the array capacity algorithm.
> >> [...]
> >> Is that intended?
>
> > I'm going to take a guess that the gc is (after some point) allocating
> > into the smallest hole with "enough" capacity.
>
> You may be right. :)
>
> > If you re run the test
> > but with something else going on to add some noise, do the spikes move?
>
> Yes, the spikes move.
>
> > void makeNoise()
> > {
> > static byte[][256] data;
> > data[rand() % $] = new byte[rand() % 512];
> > }
>
> I am not familiar with the dmd internals; but following the code took me to the function newCapacity() in src/druntime/src/rt/lifetime.d. The capacity growth is more complicated than what I've been assuming so far.
>
> And yes, GC is in the picture. Quoting the comments from that file:
>
> /*
> * Better version by Dave Fladebo:
> * This uses an inverse logorithmic algorithm to preallocate a bit more
> * space for larger arrays.
> *  Arrays smaller than PAGESIZE bytes are left asis, so for the most
> * common cases, memory allocation is 1 to 1. The small overhead added
> * doesn't affect small array perf. (it's virtually the same as
> * current).
> *  Larger arrays have some space preallocated.
> *  As the arrays grow, the relative preallocated space shrinks.
> *  The logorithmic algorithm allocates relatively more space for
> * midsize arrays, making it very fast for medium arrays (for
> * midtolarge arrays, this turns out to be quite a bit faster than the
> * equivalent realloc() code in C, on Linux at least. Small arrays are
> * just as fast as GCC).
> *  Perhaps most importantly, overall memory usage and stress on the GC
> * is decreased significantly for demanding environments.
> */
>
> So there may not be a bug after all...
>
> Ali
If your original code is all that was being run, I think there is a bug.
I'll tell you why  the GC provides a way to append pages to an allocated block. Essentially, you can glue consecutive unused pages together to make a larger block, thereby growing your block.
But anything under a page is simply reallocated because anything less than a page is inside a pool of blocks of the same size, and those cannot be glued together.
So when your array is growing, and it gets larger than a page, it should grow a single page at a time. But I think (and I stress think, I'm not sure) that when a block of pages is deallocated, each page becomes independently free. They do not stay glued together. So for an allocating requesting 1 or 2 more pages to grow by 2000 pages is suspect. I still have to examine the code, but I think there is a problem.
Back to my original assertion  to disprove the theory that some other "larger hole" is why it gets so big, in one instance your number of elements grows from 1,155,070 (4MB) to 3,153,917 (12MB), that's almost a 200% increase. In order for that 12MB hole to exist, something must have allocated it before, and then deallocated it. It can't be your loop because your loop has not allocated that much space yet. I would find that very strange to happen somewhere in the runtime without you specifically allocating it. If however, your original post is not the entire code, then there may be some possibility to that.
BTW, you can take the newCapacity function out of the loop by simply setting the length and then writing the last element, i.e.:
a.length += 1;
a[$1] = i;
newCapacity isn't called when setting the length explicitly, because it's not considered to be an append.
Steve

July 02, 2010 Re: Spikes in array capacity  

 
Posted in reply to Steven Schveighoffer  Steven Schveighoffer wrote: > On Fri, 02 Jul 2010 01:34:26 0400, Ali Çehreli <acehreli@yahoo.com> wrote: > >> BCS wrote: >> >> >> There is something strange in the array capacity algorithm. >> >> [...] >> >> Is that intended? >> >> > I'm going to take a guess that the gc is (after some point) allocating >> > into the smallest hole with "enough" capacity. >> >> You may be right. :) >> >> > If you re run the test >> > but with something else going on to add some noise, do the spikes >> move? >> >> Yes, the spikes move. >> >> > void makeNoise() >> > { >> > static byte[][256] data; >> > data[rand() % $] = new byte[rand() % 512]; >> > } >> >> I am not familiar with the dmd internals; but following the code took >> me to the function newCapacity() in src/druntime/src/rt/lifetime.d. >> The capacity growth is more complicated than what I've been assuming >> so far. >> >> And yes, GC is in the picture. Quoting the comments from that file: >> >> /* >> * Better version by Dave Fladebo: >> * This uses an inverse logorithmic algorithm to preallocate a bit more >> * space for larger arrays. >> *  Arrays smaller than PAGESIZE bytes are left asis, so for the most >> * common cases, memory allocation is 1 to 1. The small overhead added >> * doesn't affect small array perf. (it's virtually the same as >> * current). >> *  Larger arrays have some space preallocated. >> *  As the arrays grow, the relative preallocated space shrinks. >> *  The logorithmic algorithm allocates relatively more space for >> * midsize arrays, making it very fast for medium arrays (for >> * midtolarge arrays, this turns out to be quite a bit faster than the >> * equivalent realloc() code in C, on Linux at least. Small arrays are >> * just as fast as GCC). >> *  Perhaps most importantly, overall memory usage and stress on the GC >> * is decreased significantly for demanding environments. >> */ >> >> So there may not be a bug after all... >> >> Ali > > If your original code is all that was being run, I think there is a bug. The original code was it. > I'll tell you why  the GC provides a way to append pages to an > allocated block. Essentially, you can glue consecutive unused pages > together to make a larger block, thereby growing your block. > > But anything under a page is simply reallocated because anything less > than a page is inside a pool of blocks of the same size, and those > cannot be glued together. > > So when your array is growing, and it gets larger than a page, it should > grow a single page at a time. But I think (and I stress think, I'm not > sure) that when a block of pages is deallocated, each page becomes > independently free. They do not stay glued together. So for an > allocating requesting 1 or 2 more pages to grow by 2000 pages is > suspect. I still have to examine the code, but I think there is a problem. > > Back to my original assertion  to disprove the theory that some other > "larger hole" is why it gets so big, in one instance your number of > elements grows from 1,155,070 (4MB) to 3,153,917 (12MB), that's almost a > 200% increase. In order for that 12MB hole to exist, something must > have allocated it before, and then deallocated it. It can't be your > loop because your loop has not allocated that much space yet. I would > find that very strange to happen somewhere in the runtime without you > specifically allocating it. If however, your original post is not the > entire code, then there may be some possibility to that. > > BTW, you can take the newCapacity function out of the loop by simply > setting the length and then writing the last element, i.e.: > > a.length += 1; > a[$1] = i; > > newCapacity isn't called when setting the length explicitly, because > it's not considered to be an append. You are right. Looks like I was successful in locating the culprit. :) I took liberty to write ++a.length and also outputted the number of elements that are in reserve at each allocation: import std.stdio; import std.array; void main() { int[] a; size_t oldCapacity; foreach (i; 0 .. 100_000_000) { ++a.length; a[$1] = i; if (capacity(a) != oldCapacity) { writeln("length: ", a.length, " capacity: ", capacity(a), " ratio: ", cast(double)capacity(a) / oldCapacity, " reserve: ", capacity(a)  a.length); oldCapacity = capacity(a); } } } Now the spikes are gone, and the allocations asymptote at onceper1024 elements for an int array: length: 1 capacity: 3 ratio: inf reserve: 2 length: 4 capacity: 7 ratio: 2.33333 reserve: 3 length: 8 capacity: 15 ratio: 2.14286 reserve: 7 length: 16 capacity: 31 ratio: 2.06667 reserve: 15 length: 32 capacity: 63 ratio: 2.03226 reserve: 31 length: 64 capacity: 127 ratio: 2.01587 reserve: 63 length: 128 capacity: 255 ratio: 2.00787 reserve: 127 length: 256 capacity: 509 ratio: 1.99608 reserve: 253 length: 512 capacity: 1021 ratio: 2.00589 reserve: 509 length: 1022 capacity: 2045 ratio: 2.00294 reserve: 1023 length: 2046 capacity: 3069 ratio: 1.50073 reserve: 1023 length: 3070 capacity: 4093 ratio: 1.33366 reserve: 1023 ... In the long run, there is one allocation per 1024 elements. That still feels like amortized constant, but considering that relocation times would be O(N), I think reserve should still grow with N, but maybe not too much. (I am not sure about this last point at all.) > Steve Thank you very much, Ali 
July 02, 2010 Re: Spikes in array capacity  

 
Posted in reply to Ali Çehreli  On Fri, 02 Jul 2010 10:01:11 0400, Ali Çehreli <acehreli@yahoo.com> wrote:
> Steven Schveighoffer wrote:
> > newCapacity isn't called when setting the length explicitly, because
> > it's not considered to be an append.
>
> You are right.
>
> Looks like I was successful in locating the culprit. :) I took liberty to write ++a.length and also outputted the number of elements that are in reserve at each allocation:
>
> import std.stdio;
> import std.array;
>
> void main()
> {
> int[] a;
> size_t oldCapacity;
>
> foreach (i; 0 .. 100_000_000) {
> ++a.length;
> a[$1] = i;
>
> if (capacity(a) != oldCapacity) {
> writeln("length: ", a.length,
> " capacity: ", capacity(a),
> " ratio: ", cast(double)capacity(a) / oldCapacity,
> " reserve: ", capacity(a)  a.length);
> oldCapacity = capacity(a);
> }
> }
> }
>
> Now the spikes are gone, and the allocations asymptote at onceper1024 elements for an int array:
>
> length: 1 capacity: 3 ratio: inf reserve: 2
> length: 4 capacity: 7 ratio: 2.33333 reserve: 3
> length: 8 capacity: 15 ratio: 2.14286 reserve: 7
> length: 16 capacity: 31 ratio: 2.06667 reserve: 15
> length: 32 capacity: 63 ratio: 2.03226 reserve: 31
> length: 64 capacity: 127 ratio: 2.01587 reserve: 63
> length: 128 capacity: 255 ratio: 2.00787 reserve: 127
> length: 256 capacity: 509 ratio: 1.99608 reserve: 253
> length: 512 capacity: 1021 ratio: 2.00589 reserve: 509
> length: 1022 capacity: 2045 ratio: 2.00294 reserve: 1023
> length: 2046 capacity: 3069 ratio: 1.50073 reserve: 1023
> length: 3070 capacity: 4093 ratio: 1.33366 reserve: 1023
> ...
>
> In the long run, there is one allocation per 1024 elements. That still feels like amortized constant, but considering that relocation times would be O(N), I think reserve should still grow with N, but maybe not too much. (I am not sure about this last point at all.)
>
4x1024 = 4096 == page size.
The newCapacity function is supposed to reduce the effect of this phenomenon. i.e., when appending, it's assumed you are going to continue appending, so to reduce the overhead, the amount added to the memory block is supposed to be related to the total number of bytes already allocated. It doesn't seem like the function is working very well... It could have been something I did, or it could have always been broken :) I honestly did not examine the code at all, I just only ever read the comments, assuming it works.
Please add these comments to the bug report, it will help when I look at it. I don't have a lot of time to look at it right now, so I'm afraid I'll forget what we were doing later :)
Steve

July 02, 2010 Re: Spikes in array capacity  

 
Posted in reply to Ali Çehreli  Ali Çehreli wrote:
[snip]
> Now the spikes are gone, and the allocations asymptote at onceper1024 elements for an int array:
>
> length: 1 capacity: 3 ratio: inf reserve: 2
> length: 4 capacity: 7 ratio: 2.33333 reserve: 3
> length: 8 capacity: 15 ratio: 2.14286 reserve: 7
> length: 16 capacity: 31 ratio: 2.06667 reserve: 15
> length: 32 capacity: 63 ratio: 2.03226 reserve: 31
> length: 64 capacity: 127 ratio: 2.01587 reserve: 63
> length: 128 capacity: 255 ratio: 2.00787 reserve: 127
> length: 256 capacity: 509 ratio: 1.99608 reserve: 253
> length: 512 capacity: 1021 ratio: 2.00589 reserve: 509
> length: 1022 capacity: 2045 ratio: 2.00294 reserve: 1023
> length: 2046 capacity: 3069 ratio: 1.50073 reserve: 1023
> length: 3070 capacity: 4093 ratio: 1.33366 reserve: 1023
> ....
>
> In the long run, there is one allocation per 1024 elements. That still feels like amortized constant, but considering that relocation times would be O(N), I think reserve should still grow with N, but maybe not too much. (I am not sure about this last point at all.)
You may want to also print the pointer to the first element of the array. That provides you information about how often the array is actually moved.
Andrei

July 02, 2010 Re: Spikes in array capacity  

 
Posted in reply to Andrei Alexandrescu  Andrei Alexandrescu wrote: > Ali Çehreli wrote: > [snip] >> Now the spikes are gone, and the allocations asymptote at >> onceper1024 elements for an int array: >> >> length: 1 capacity: 3 ratio: inf reserve: 2 >> length: 4 capacity: 7 ratio: 2.33333 reserve: 3 >> length: 8 capacity: 15 ratio: 2.14286 reserve: 7 >> length: 16 capacity: 31 ratio: 2.06667 reserve: 15 >> length: 32 capacity: 63 ratio: 2.03226 reserve: 31 >> length: 64 capacity: 127 ratio: 2.01587 reserve: 63 >> length: 128 capacity: 255 ratio: 2.00787 reserve: 127 >> length: 256 capacity: 509 ratio: 1.99608 reserve: 253 >> length: 512 capacity: 1021 ratio: 2.00589 reserve: 509 >> length: 1022 capacity: 2045 ratio: 2.00294 reserve: 1023 >> length: 2046 capacity: 3069 ratio: 1.50073 reserve: 1023 >> length: 3070 capacity: 4093 ratio: 1.33366 reserve: 1023 >> .... >> >> In the long run, there is one allocation per 1024 elements. That still >> feels like amortized constant, but considering that relocation times >> would be O(N), I think reserve should still grow with N, but maybe not >> too much. (I am not sure about this last point at all.) > > You may want to also print the pointer to the first element of the > array. That provides you information about how often the array is > actually moved. Good point. :) Yes, spikes are only when the array is moved. I've also discovered that a basic array invariant is violated at size 509 as well: import std.string; import std.array; void main() { int[] a; a.length = 509; a ~= 0; assert(a.capacity >= a.length, format("capacity: %s length: %s", a.capacity, a.length)); } Outputs core.exception.AssertError@deneme.d(17444): capacity: 509 length: 510 I will update the bug with these. Ali 
« First ‹ Prev 1 2 

Copyright © 19992016 by the D Language Foundation