Thread: dynamic array capacity
December 29, 2010
Hello,

Is there a common idiom to pre-allocate a dynamic array? I mean allocating to avoid numerous re-allocations in a loop, not setting the length and filling the content.
The differences are:
(0) no content --> no init
(1) one can simply append, instead of setting elements at defined indices
(2) one can append sub-sequences
(3) when the initially requested capacity is full, the array automagically resizes as needed (using D's builtin dynamic-array reallocation scheme).

Denis
-- -- -- -- -- -- --
vit esse estrany ☣

spir.wikidot.com

December 29, 2010
On Wed, 29 Dec 2010 07:29:29 -0500, spir <denis.spir@gmail.com> wrote:

> Hello,
>
> Is there a common idiom to pre-allocate a dynamic array? I mean allocating to avoid numerous re-allocations in a loop, not setting the length and filling the content.

int[] x;
x.reserve(10000); // pre-allocate at least 10000 elements for appending

Another way to do it:

Appender!(int[]) app;
app.reserve(10000);

this will perform much better than builtin array appending.
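
For reference, a minimal, self-contained sketch of both idioms (the element count is arbitrary; .data is used to extract the Appender's array, as in the code further down this thread):

import std.array : Appender;

void main()
{
    // Built-in array: reserve grows the capacity, length stays 0.
    int[] x;
    x.reserve(10_000);
    foreach (int i; 0 .. 10_000)
        x ~= i;             // appends reuse the reserved block

    // Appender: reserve, append with put(), then extract the array.
    Appender!(int[]) app;
    app.reserve(10_000);
    foreach (int i; 0 .. 10_000)
        app.put(i);
    int[] y = app.data;

    assert(x == y);
}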

-Steve
December 29, 2010
== Quote from Steven Schveighoffer (schveiguy@yahoo.com)'s article
> On Wed, 29 Dec 2010 07:29:29 -0500, spir <denis.spir@gmail.com> wrote:
> > Hello,
> >
> > Is there a common idiom to pre-allocate a dynamic array? I mean allocating to avoid numerous re-allocations in a loop, not setting the length and filling the content.
> int[] x;
> x.reserve(10000); // pre-allocate at least 10000 elements for appending
> Another way to do it:
> Appender!(int[]) app;
> app.reserve(10000);
> this will perform much better than builtin array appending.
> -Steve

Oh, wow! D2 really kicks D1's ass here..
December 29, 2010
On Wed, 29 Dec 2010 11:24:01 -0500
"Steven Schveighoffer" <schveiguy@yahoo.com> wrote:

> On Wed, 29 Dec 2010 07:29:29 -0500, spir <denis.spir@gmail.com> wrote:
> 
> > Hello,
> >
> > Is there a common idiom to pre-allocate a dynamic array? I mean allocating to avoid numerous re-allocations in a loop, not setting the length and filling the content.
> 
> int[] x;
> x.reserve(10000); // pre-allocate at least 10000 elements for appending
> 
> Another way to do it:
> 
> Appender!(int[]) app;
> app.reserve(10000);
> 
> this will perform much better than builtin array appending.
> 
> -Steve

Thank you, Steven. I could not find anything like 'reserve' in the property table because it's not a property ;-)

Denis
-- -- -- -- -- -- --
vit esse estrany ☣

spir.wikidot.com

December 29, 2010
On Wed, 29 Dec 2010 11:24:01 -0500
"Steven Schveighoffer" <schveiguy@yahoo.com> wrote:

> On Wed, 29 Dec 2010 07:29:29 -0500, spir <denis.spir@gmail.com> wrote:
> 
> > Hello,
> >
> > Is there a common idiom to pre-allocate a dynamic array? I mean allocating to avoid numerous re-allocations in a loop, not setting the length and filling the content.
> 
> int[] x;
> x.reserve(10000); // pre-allocate at least 10000 elements for appending
> 
> Another way to do it:
> 
> Appender!(int[]) app;
> app.reserve(10000);
> 
> this will perform much better than builtin array appending.
> 
> -Steve

I've done some timings using reserve and Appender. They don't seem to help in my use case (decomposition of a string [actually a sequence of code points] according to NFD). (See sample code below.)
* use reserve (to the source string's length) with builtin append (operator '~=') --> 20% slower
* use Appender w/o reserve --> 3 times slower
* use Appender + its own reserve --> 1.5 times slower (i.e. half the time above)

I'm surprised that reserve does not speed up builtin appending, since all it can do is avoid numerous reallocations. How should I interpret that?
I'm even more surprised by Appender's results on this use case, after having read about its performance several times on the list. Strange. Could it be because I only append sub-sequences? (The decomposition '*p' below is also a mini-array.)

Denis

// NFD decomposition
Code[] decomposition (Code[] codes) {
    /** Decompose code sequence according to form "NFD". */
//~     Code[] decompCodes;     // new, decomposed; pile
    Appender!(Code[]) decompCodes;
    Decomp* p;              // pointer to single-code decomposition
    Ordinal i0 = 0;         // reference index past last decomposed code

    // Decompose each code.
//~     decompCodes.reserve(codes.length);  // worse time! ???
//~     foreach (i,code ; codes) {
//~         // case composed code
//~         p = (code in DECOMPOSITIONS);
//~         // Do not even check whether code is composite
//~         // for simple ASCII & latin characters.
//~         if (code > 0xBF)
//~             if (p) {
//~                 // Append past simple codes, if any.
//~                 if (i > i0) decompCodes ~= codes[i0..i];
//~                 // Append current code's decomposition.
//~                 decompCodes ~= *p;
//~                 // Update reference index.
//~                 i0 = i + 1;
//~             }
//~     }
//~     // Append last simple codes (if any).
//~     decompCodes ~= codes[i0..$];

    decompCodes.reserve(codes.length);  // worse time! ???
    foreach (i,code ; codes) {
        // case composed code
        p = (code in DECOMPOSITIONS);
        // Do not even check whether code is composite
        // for simple ASCII & latin characters.
        if (code > 0xBF)
            if (p) {
                // Append past simple codes, if any.
                if (i > i0) decompCodes.put(codes[i0..i]);
                // Append current code's decomposition.
                decompCodes.put(*p);
                // Update reference index.
                i0 = i + 1;
            }
    }
    // Append last simple codes (if any).
    decompCodes.put(codes[i0..$]);

//~     return decompCodes;
    return decompCodes.data;
}
-- -- -- -- -- -- --
vit esse estrany ☣

spir.wikidot.com

December 29, 2010
On Wed, 29 Dec 2010 13:14:29 -0500, spir <denis.spir@gmail.com> wrote:

> I've done some timings using reserve and Appender. They don't seem to help in my use case (decomposition of a string [actually a sequence of code points] according to NFD). (See sample code below.)
> * use reserve (to the source string's length) with builtin append (operator '~=') --> 20% slower
> * use Appender w/o reserve --> 3 times slower
> * use Appender + its own reserve --> 1.5 times slower (i.e. half the time above)

What is the baseline for this?  I.e. what is it 20% slower than? FWIW, Appender should be much faster than builtin append, even without reserve.

However, Appender has a recently fixed bug (not fixed in 2.051) where appending *arrays* of elements goes very slow.  I see you are doing that in a couple spots.

> I'm surprised that reserve does not speed up builtin appending, since all it can do is avoid numerous reallocations. How should I interpret that?
> I'm even more surprised by Appender's results on this use case, after having read about its performance several times on the list. Strange. Could it be because I only append sub-sequences? (The decomposition '*p' below is also a mini-array.)

It should speed up appending.  If it doesn't, then it's either a bug or pilot error.  As I said before, Appender in 2.051 and earlier has a bug where appending an array is very slow.

But builtin appending should be faster if you reserve.

Simple tests I've run show that this is true.  Recent developments in how the appending array grows mitigate this quite a bit, but reserving certainly results in less memory being consumed, and it always runs faster.
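
A minimal version of such a test might look like the following sketch (the WithReserve version identifier is made up here): compile it once with -version=WithReserve and once without, and time each run externally, e.g. with the 'time' command.

// Sketch: reserve vs. no reserve with built-in appending.
void main()
{
    enum N = 10_000_000;        // arbitrary element count
    int[] a;
    version (WithReserve)
        a.reserve(N);           // pre-allocate capacity up front
    foreach (int i; 0 .. N)
        a ~= i;                 // built-in append
}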

-Steve
December 29, 2010
On Wed, 29 Dec 2010 13:48:48 -0500
"Steven Schveighoffer" <schveiguy@yahoo.com> wrote:

> On Wed, 29 Dec 2010 13:14:29 -0500, spir <denis.spir@gmail.com> wrote:
> 
> > I've done some timings using reserve and Appender. They don't seem to
> > help in my use case (decomposition of a string [actually a sequence of
> > code points] according to NFD). (See sample code below.)
> > * use reserve (to the source string's length) with builtin append
> > (operator '~=') --> 20% slower
> > * use Appender w/o reserve --> 3 times slower
> > * use Appender + its own reserve --> 1.5 times slower (i.e. half the
> > time above)
> 
> What is the baseline for this?  I.e. what is it 20% slower than? FWIW, Appender should be much faster than builtin append, even without reserve.

The baseline is builtin append (~=) without reserving. (Sorry, thought it was clear.)

> However, Appender has a recently fixed bug (not fixed in 2.051) where appending *arrays* of elements goes very slow.  I see you are doing that in a couple spots.

As explained below, I'm doing _only_ that. So, as you guessed, that may be the reason why Appender performs so badly in my case.

> > I'm surprised that reserve does not speed up builtin appending, since all it can do is avoid numerous reallocations. How should I interpret that?

* This question remains. ??? *

> > I'm even more surprised by Appender's results on this use case, after having read about its performance several times on the list. Strange. Could it be because I only append sub-sequences? (The decomposition '*p' below is also a mini-array.)
> 
> It should speed up appending.  If it doesn't, then it's either a bug or pilot error.  As I said before, Appender in 2.051 and earlier has a bug where appending an array is very slow.
> 
> But builtin appending should be faster if you reserve.

Yop!

> Simple tests I've run show that this is true.  Recent developments in how the appending array grows mitigate this quite a bit, but reserving certainly results in less memory being consumed, and it always runs faster.

I can upload the modified version using reserve & Appender (with the timing module) if you like. (The original one is at https://bitbucket.org/denispir/denispir-d/src, files pile.d and chrono.d.)
Anyway, why reserve does not help builtin append is a mystery. In the worst case, it should do no harm.
By the way, how come I do not need to import anything to be able to use reserve (as if it were a builtin property)?

About Appender, I have had a look at the code in std.array and could not find any sign of why, except that appending a slice loops over its elements, calling put() for each one; but this should not be _that_ slow, I guess. (Or should it?) Maybe a specialisation of put to take an array (or slice) would help? (Instead of using an array interface, as is done now.)
The details of realloc are beyond my competence (e.g. the computation of the resizing ratio), so I can hardly search further.
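
To illustrate the kind of specialisation meant here, a toy sketch (the ToyAppender type below is invented purely for illustration and is not Phobos' Appender): the slice overload of put copies the whole sub-sequence in one step instead of looping per element.

// Toy appender with a slice-specialised put (illustration only,
// mutable element types assumed).
struct ToyAppender(T)
{
    private T[] buf;
    private size_t len;

    void reserve(size_t n)
    {
        if (buf.length < n) buf.length = n;
    }

    // Single-element append.
    void put(T item)
    {
        ensure(1);
        buf[len++] = item;
    }

    // Slice append: one bulk copy instead of a per-element loop.
    void put(T[] items)
    {
        ensure(items.length);
        buf[len .. len + items.length] = items[];
        len += items.length;
    }

    T[] data() { return buf[0 .. len]; }

    private void ensure(size_t more)
    {
        if (len + more > buf.length)
            buf.length = (len + more) * 2;  // crude proportional growth
    }
}

With something like this, the decomposition loop above could call put(codes[i0..i]) and put(*p) without a per-element loop inside put.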

> -Steve

Denis
-- -- -- -- -- -- --
vit esse estrany ☣

spir.wikidot.com

December 29, 2010
On Wed, 29 Dec 2010 14:49:35 -0500, spir <denis.spir@gmail.com> wrote:

> On Wed, 29 Dec 2010 13:48:48 -0500
> "Steven Schveighoffer" <schveiguy@yahoo.com> wrote:
>
>> On Wed, 29 Dec 2010 13:14:29 -0500, spir <denis.spir@gmail.com> wrote:
>>
>> > I've done some timings using reserve and Appender. They don't seem to
>> > help in my use case (decomposition of a string [actually a sequence of
>> > code points] according to NFD). (See sample code below.)
>> > * use reserve (to the source string's length) with builtin append
>> > (operator '~=') --> 20% slower
>> > * use Appender w/o reserve --> 3 times slower
>> > * use Appender + its own reserve --> 1.5 times slower (i.e. half the
>> > time above)
>>
>> What is the baseline for this?  I.e. what is it 20% slower than? FWIW,
>> Appender should be much faster than builtin append, even without reserve.
>
> The baseline is builtin append (~=) without reserving. (Sorry, thought it was clear.)

Then I would be suspicious of your test.  In simple tests I just ran, reserving outperforms normal appending (though not by a huge amount).  If it's *slower*, then something else is wrong, as the worst case is that it degenerates to non-reserved appending, since the same code paths are followed.

>
>> However, Appender has a recently fixed bug (not fixed in 2.051) where
>> appending *arrays* of elements goes very slow.  I see you are doing that
>> in a couple spots.
>
> As explained below, I'm doing _only_ that. So, as you guessed, that may be the reason why Appender performs so badly in my case.

Yes, most likely.  I mean it was like 10x slower in tests I ran.

Try this version of std.array (shouldn't need to recompile phobos, it's a template): http://www.dsource.org/projects/phobos/browser/trunk/phobos/std/array.d?rev=2238&format=raw

>
>> > I'm surprised that reserve does not speed up builtin appending, since
>> > all it can do is avoid numerous reallocations. How should I interpret
>> > that?
>
> * This question remains. ??? *

I would lean towards pilot error (sorry), since simple tests show reserve does speed things up.  I don't have a lot of time to look into your code in depth.  If you can show a simple test that proves otherwise, I'd be happy to look at it.

>> Simple tests I've run show that this is true.  Recent developments in how
>> the appending array grows mitigate this quite a bit, but reserving still
>> results in less memory being consumed, and it always runs faster.
>
> I can upload the modified version using reserve & Appender (with the timing module) if you like. (The original one is at https://bitbucket.org/denispir/denispir-d/src, files pile.d and chrono.d.)

I usually use the 'time' command to run benchmarks, because I've been burned in the past by custom-built timing mechanisms that had flaws in them.

> Anyway, why reserve does not help builtin append is a mystery. In the worst case, it should do no harm.

I'll amend that and say "in your application", since it seems to perform as expected in my tests.

> By the way, how come I do not need to import anything to be able to use reserve (as if it were a builtin property)?

It's in object.di, always imported.

I'm wondering if those should be added to the 'properties' page of the spec.

> About Appender, I have had a look at the code in std.array and could not find any sign of why, except that appending a slice loops over its elements, calling put() for each one; but this should not be _that_ slow, I guess. (Or should it?) Maybe a specialisation of put to take an array (or slice) would help? (Instead of using an array interface, as is done now.)
> The details of realloc are beyond my competence (e.g. the computation of the resizing ratio), so I can hardly search further.

The issue is that appending was not amortized when appending arrays vs. appending individual elements (where appending *is* amortized).  In order to achieve the appearance of O(1) appending, you must grow the array proportionally instead of linearly.

In simple terms, you want to essentially double the capacity whenever you realloc.  The actual algorithm uses a logarithmic growth curve that settles around 1.4x.

The flawed appender code was just adding enough space to accommodate the data being appended, but only if an array was appended (appending single elements was using the growth curve).  Not only that, but it wasn't taking advantage of the ability to extend into consecutive pages without having to move data around.
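
As a rough illustration of the proportional-growth idea (the grownCapacity function below is invented for illustration; it is not the actual druntime algorithm, whose curve is gentler, settling around 1.4x as described above):

// Illustration of amortized growth: capacity grows by a constant
// factor, so N appends trigger only O(log N) reallocations.
size_t grownCapacity(size_t current, size_t needed)
{
    size_t cap = current ? current : 16;    // arbitrary starting capacity
    while (cap < needed)
        cap *= 2;                           // factor 2 kept simple here
    return cap;
}

unittest
{
    assert(grownCapacity(0, 10) == 16);
    assert(grownCapacity(16, 17) == 32);
    assert(grownCapacity(32, 100) == 128);
}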

-Steve