August 15, 2014 Re: Appender is ... slow
Posted in reply to Messenger

On Friday, 15 August 2014 at 11:57:30 UTC, Messenger wrote:
> T[size] beats all of those on dmd head, though it is inarguably a
> bit limiting.

Hey guys, just a bit of background and my own understanding of Appender, having worked on it a fair bit.

First of all, Appender was not designed as a neck-breaking, mind-bending speed object. It is merely a "tool" to offset "slow" GC-based appending. Indeed, when doing a raw GC append, you first have to give the GC a pointer to the start of your array. The GC will then look up which block that pointer belongs to, then look up the info related to that block, check if appending is possible, and then do the append proper...

...And then it will do all of that all over again on the next append. Appender is simply a tool that "caches" those results once, and then does quicker appends.

There are two other things to take into consideration with Appender. For starters, it can append to an *existing* array it is given. Second, you may destroy the Appender object at any time, and the referenced array is still valid: Appender does not *own* its buffer, and as such, is not allowed certain optimizations. Really, it's just designed for convenience and "pretty good speed".

Another thing to take into account when benchmarking is that Appender is a reference-semantic object: it has a payload which itself references an array. This creates a double indirection. That usually doesn't have much impact, but with the right optimizations it can probably explain the 10x performance differences we are seeing in our *synthetic* benchmarks. I have some doubts about the validity of the results in a real application.

So TL;DR: yeah, you can probably do faster. But Appender is convenient, fast enough, and works with the GC.

If you *do* need super speeds, look into something a bit more manual: Walter's ScopeBuffer would be a good choice. I also did some work on something called ScopeAppender, but haven't had time to finish it yet:

https://github.com/monarchdodra/phobos/compare/ScopeAppender

It provides better speeds and deterministic management, at the cost of partial private buffer ownership.
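A minimal sketch of the usage pattern described above, using std.array.appender; the element type and iteration count are only illustrative:

    import std.array : appender;
    import std.stdio : writeln;

    void main()
    {
        // Raw GC append: every ~= has to re-query the GC's block info for the array.
        int[] raw;
        foreach (i; 0 .. 1000)
            raw ~= i;

        // Appender caches that block info once, so repeated puts are cheaper.
        auto app = appender!(int[])();
        app.reserve(1000);        // optional: pre-allocate capacity up front
        foreach (i; 0 .. 1000)
            app.put(i);           // or: app ~= i;

        int[] result = app.data;  // the array stays valid even after app goes away
        writeln(result.length);
    }

Note that app.data does not copy: it returns the very array Appender has been filling, which is exactly why Appender cannot claim exclusive ownership of its buffer.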

August 15, 2014 Re: Appender is ... slow
Posted in reply to Philippe Sigaud

On Friday, 15 August 2014 at 12:08:58 UTC, Philippe Sigaud via Digitalmars-d-learn wrote:
> Hmm, what about a sort of linked list of static arrays, that allocates
> a new one when necessary?
Appender is not a container, and has no freedom over the data it manipulates. It has to be able to accept an array as input, and when it is finished, it needs to be able to return an actual array, so that's arguably out of the question.
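A small illustration of that constraint, using std.array.appender (the values are placeholders):

    import std.array : appender;

    void main()
    {
        int[] existing = [1, 2, 3];

        // Appender must be able to start from an existing array...
        auto app = appender(existing);
        app.put(4);
        app.put(5);

        // ...and must hand back a plain array at the end, which rules out
        // storing the data internally as, say, a list of fixed-size chunks.
        int[] result = app.data;
        assert(result == [1, 2, 3, 4, 5]);
    }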

August 15, 2014 Re: Appender is ... slow
Posted in reply to Philippe Sigaud

Well, I created a wrapper around a std.array.uninitializedArray call, to manage the interface I need (queue behavior: pushing at the end, popping at the beginning). When hitting the end of the current array, it either reuses the current buffer or creates a new one, depending on the remaining capacity.

On the 'synthetic' benchmarks, it performs quite reasonably: half the time of Array or Appender (twice as fast), 5x faster than a standard array, and 3-4x slower than uninitializedArray.

And... it does not change the timings in my code; it even makes things slower when pre-allocating too much. Only by pre-allocating just a few elements do I get back the original timings.

So, I guess I'm suffering from a bad case of premature optimization :) I thought that, having lots of concatenation in my code, that'd be a bottleneck. But it appears that pre-allocation does not give me any speed-up.

Well, at least it got me thinking, testing LDC a bit more and learning things about Array and Appender ;) Thanks for your help guys, it's now time for the -profile switch again...
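For reference, a rough sketch of the kind of wrapper described above, using std.array.uninitializedArray; the type name, growth policy, and reuse threshold are guesses for illustration, not the actual code:

    import std.array : uninitializedArray;

    // Hypothetical queue-like buffer: push at the end, pop at the front.
    // When the end of the buffer is reached, it either reuses the current
    // buffer (if enough of it is dead space) or allocates a bigger one.
    struct SimpleQueue(T)
    {
        private T[] buf;
        private size_t head, tail;

        this(size_t capacity)
        {
            buf = uninitializedArray!(T[])(capacity);
        }

        void push(T value)
        {
            if (tail == buf.length)
            {
                immutable live = tail - head;
                if (head > 0 && live <= buf.length / 2)
                {
                    // Plenty of slack at the front: compact into the same buffer.
                    buf[0 .. live] = buf[head .. tail];
                }
                else
                {
                    // Not enough room left: allocate a bigger buffer (doubling is a guess).
                    auto bigger = uninitializedArray!(T[])(buf.length ? buf.length * 2 : 4);
                    bigger[0 .. live] = buf[head .. tail];
                    buf = bigger;
                }
                head = 0;
                tail = live;
            }
            buf[tail++] = value;
        }

        T pop()
        {
            assert(head < tail, "queue is empty");
            return buf[head++];
        }

        @property bool empty() const { return head == tail; }
    }

    void main()
    {
        auto q = SimpleQueue!int(4);
        foreach (i; 0 .. 10)
            q.push(i);
        assert(q.pop() == 0 && q.pop() == 1);
    }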

August 15, 2014 Re: Appender is ... slow
Posted in reply to Philippe Sigaud

On Friday, 15 August 2014 at 14:40:36 UTC, Philippe Sigaud wrote:
> Well, I created a wrapper around a std.array.uninitializedArray call, to manage the interface I need

Make sure you don't use that if your type has elaborate construction, or assumes a certain initial state (unless you are actually emplacing your objects of course).

> I thought that, having lots of concatenation in my code, that'd be a bottleneck. But it appears that pre-allocation does not give me any speed-up.

If you are using "raw GC arrays", then the "raw" append operation will outweigh the relocation cost on extension. So pre-allocation wouldn't really help in this situation (though the use of Appender *should*).

August 15, 2014 Re: Appender is ... slow
Posted in reply to monarch_dodra

On Friday, 15 August 2014 at 16:48:10 UTC, monarch_dodra wrote:
> On Friday, 15 August 2014 at 14:40:36 UTC, Philippe Sigaud wrote:
>> Well, I created a wrapper around a std.array.uninitializedArray call, to manage the interface I need
>
> Make sure you don't use that if your type has elaborate construction, or assumes a certain initial state (unless you are actually emplacing your objects of course).

Hmm, what's elaborate construction? They are classes and have constructors, of course. I assumed that this produced only nulls in the array.

>> I thought that, having lots of concatenation in my code, that'd be a bottleneck. But it appears that pre-allocation does not give me any speed-up.
>
> If you are using "raw GC arrays", then the "raw" append operation will outweigh the relocation cost on extension. So pre-allocation wouldn't really help in this situation (though the use of Appender *should*).

OK.

August 15, 2014 Re: Appender is ... slow
Posted in reply to Philippe Sigaud

On Friday, 15 August 2014 at 16:51:20 UTC, Philippe Sigaud wrote:
> On Friday, 15 August 2014 at 16:48:10 UTC, monarch_dodra wrote:
>> Make sure you don't use that if your type has elaborate construction, or assumes a certain initial state (unless you are actually emplacing your objects of course).
>
> Hmm, what's elaborate construction? They are classes and have constructors, of course. I assumed that this produced only nulls in the array.
Actually, my statement was inaccurate. More precisely: never use an element that wasn't first properly initialized. Note that in some cases "operator=" is itself elaborate, meaning it will also read the existing data, so assignment alone is not a valid method of initialization.
uninitializedArray simply creates an array with unspecified data in it.
You *could* just use "new" instead. It's not really any slower, and has the advantage of being certifiably safe.
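A small illustration of the difference, using std.array.uninitializedArray; the class here is made up for the example:

    import std.array : uninitializedArray;

    class Node
    {
        int value;
        this(int v) { value = v; }
    }

    void main()
    {
        // "new" default-initializes: every slot is a well-defined null reference.
        auto initialized = new Node[](4);
        assert(initialized[0] is null);

        // uninitializedArray leaves the slots holding unspecified bits; they are
        // *not* guaranteed to be null, so write every slot before reading it.
        auto raw = uninitializedArray!(Node[])(4);
        foreach (i; 0 .. raw.length)
            raw[i] = new Node(cast(int) i);

        assert(raw[3].value == 3);
    }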

August 15, 2014 Re: Appender is ... slow
Posted in reply to Philippe Sigaud

On Thursday, 14 August 2014 at 17:16:42 UTC, Philippe Sigaud wrote:
> From time to time, I try to speed up some array-heavy code by using std.array.Appender, reserving some capacity and so on.
>
> It never works. Never. It gives me executables that are maybe 30-50% slower than bog-standard array code.
>
> I don't do anything fancy: I just replace the types, use clear() instead of = null...
>
> Do people here get good results from Appender? And if yes, how are you using it?
compiler, version, OS, architecture, flags?
Have you looked at the assembly to check that all the Appender method calls are being inlined?

August 15, 2014 Re: Appender is ... slow
Posted in reply to John Colvin

On Fri, Aug 15, 2014 at 10:04 PM, John Colvin via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com> wrote:
> compiler, version, OS, architecture, flags?

Compiler: DMD 2.065 and LDC 0.14
OS: Linux 64 bits (8 cores, but there is no parallelism here)
Flags: -O -release -inline (and -noboundscheck for DMD)

> Have you looked at the assembly to check that all the Appender method calls are being inlined?

I do not know how to look at the assembly, nor how to check whether the Appender method calls are being inlined. I did spend some time with -profile and gained a nice 10% increase in speed by fighting bottlenecks in my code.

August 15, 2014 Re: Appender is ... slow
Posted in reply to monarch_dodra

On Friday, 15 August 2014 at 16:48:10 UTC, monarch_dodra wrote:
> If you are using "raw GC arrays", then the "raw" append operation will outweigh the relocation cost on extension. So pre-allocation wouldn't really help in this situation (though the use of Appender *should*).
Is that because it's able to grab memory from the GC without actually having to allocate anything? Normally, I would have expected allocations to far outweigh the cost on extension and that preallocating would help a lot. But that would be with malloc or C++'s new rather than the GC, which has already allocated memory to reuse after it collects garbage.
- Jonathan M Davis

August 15, 2014 Re: Appender is ... slow
Posted in reply to Jonathan M Davis

On Friday, 15 August 2014 at 21:24:25 UTC, Jonathan M Davis wrote:
> On Friday, 15 August 2014 at 16:48:10 UTC, monarch_dodra wrote:
>> If you are using "raw GC arrays", then the "raw" append operation will outweigh the relocation cost on extension. So pre-allocation wouldn't really help in this situation (though the use of Appender *should*).
>
> Is that because it's able to grab memory from the GC without actually having to allocate anything? Normally, I would have expected allocations to far outweigh the cost on extension and that preallocating would help a lot. But that would be with malloc or C++'s new rather than the GC, which has already allocated memory to reuse after it collects garbage.
>
> - Jonathan M Davis
It's mostly just because GC-array appending is slow. A single operation is itself not that slow, but if you plan to append 10_000 elements, then the total cost will add up. A lot.
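A minimal benchmark sketch of that effect; it uses std.datetime.stopwatch from current Phobos (this thread predates that module), and the element count and types are only illustrative:

    import std.array : appender;
    import std.datetime.stopwatch : AutoStart, StopWatch;
    import std.stdio : writefln;

    enum N = 10_000;

    void main()
    {
        auto sw = StopWatch(AutoStart.yes);

        // Raw GC append: each ~= pays the GC block lookup again.
        int[] raw;
        foreach (i; 0 .. N)
            raw ~= i;
        auto rawTime = sw.peek();

        sw.reset();

        // Appender: the block metadata is cached, so the per-append cost is lower.
        auto app = appender!(int[])();
        foreach (i; 0 .. N)
            app.put(i);
        auto appTime = sw.peek();

        writefln("raw ~=  : %s", rawTime);
        writefln("Appender: %s", appTime);
    }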