August 15, 2014 Re: Appender is ... slow
Posted in reply to Messenger

On Friday, 15 August 2014 at 11:57:30 UTC, Messenger wrote:
> T[size] beats all of those on dmd head, though it is inarguably a
> bit limiting.

Hey guys, just a bit of background and my own understanding of Appender, having worked on it a fair bit.

First of all, Appender was not designed as a neck-breaking, mind-bending speed object. It is merely a "tool" to offset "slow" GC-based appending. Indeed, when doing a raw GC append, you first have to give the GC a pointer to the start of your array. The GC will then look up which block that pointer belongs to, then look up the info related to that block, check if appending is possible, and then do the append proper...

...And then it will do all of that all over again on the next append. Appender is simply a tool that "caches" those results once, and then does quicker appends.

There are two other things to take into consideration with Appender. For starters, it can append to an *existing* array it is given. Second, you may destroy the Appender object at any time, and the referenced array is still valid: Appender does not *own* its buffer, and as such, is not allowed certain optimizations. Really, it's just designed for convenience and "pretty good speed".

Another thing to take into account when benchmarking is that Appender is a reference-semantic object: it has a payload which itself references an array. This creates a double indirection. That usually doesn't have much impact, but with the right optimizations it can probably explain the 10x performance differences we are seeing in our *synthetic* benchmarks. I have some doubts about the validity of the results in a real application.

So TL;DR: yeah, you can probably do faster. But Appender is convenient, fast enough, and works with the GC.

If you *do* need super speeds, look into something a bit more manual: Walter's ScopeBuffer would be a good choice. I also did some work on something called ScopeAppender, but haven't had time to finish it yet:

https://github.com/monarchdodra/phobos/compare/ScopeAppender

It provides better speeds and deterministic management, at the cost of partial private buffer ownership.
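A minimal sketch of the usage pattern described above, using std.array.appender; the element type and iteration count are only illustrative:

    import std.array : appender;
    import std.stdio : writeln;

    void main()
    {
        // Raw GC append: every ~= has to re-query the GC's block info for the array.
        int[] raw;
        foreach (i; 0 .. 1000)
            raw ~= i;

        // Appender caches that block info once, so repeated puts are cheaper.
        auto app = appender!(int[])();
        app.reserve(1000);        // optional: pre-allocate capacity up front
        foreach (i; 0 .. 1000)
            app.put(i);           // or: app ~= i;

        int[] result = app.data;  // the array stays valid even after app goes away
        writeln(result.length);
    }

Note that app.data does not copy: it returns the very array Appender has been filling, which is exactly why Appender cannot claim exclusive ownership of its buffer.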

August 15, 2014 Re: Appender is ... slow
Posted in reply to Philippe Sigaud

On Friday, 15 August 2014 at 12:08:58 UTC, Philippe Sigaud via Digitalmars-d-learn wrote:
> Hmm, what about a sort of linked list of static arrays, that allocates
> a new one when necessary?
Appender is not a container, and has no freedom over the data it manipulates. It has to be able to accept an array as input, and when it is finished, it needs to be able to return an actual array, so that's arguably out of the question.
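A small illustration of that constraint, using std.array.appender (the values are placeholders):

    import std.array : appender;

    void main()
    {
        int[] existing = [1, 2, 3];

        // Appender must be able to start from an existing array...
        auto app = appender(existing);
        app.put(4);
        app.put(5);

        // ...and must hand back a plain array at the end, which rules out
        // storing the data internally as, say, a list of fixed-size chunks.
        int[] result = app.data;
        assert(result == [1, 2, 3, 4, 5]);
    }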

August 15, 2014 Re: Appender is ... slow
Posted in reply to Philippe Sigaud

Well, I created a wrapper around a std.array.uninitializedArray call, to manage the interface I need (queue behavior: pushing at the end, popping at the beginning). When hitting the end of the current array, it either reuses the current buffer or creates a new one, depending on the remaining capacity.

On the 'synthetic' benchmarks, it performs quite reasonably: half the time of Array or Appender (twice as fast), 5x faster than a standard array, and 3-4x slower than uninitializedArray.

And... it does not change the timings in my code; it even makes things slower when pre-allocating too much. Only by pre-allocating just a few elements do I get back the original timings.

So, I guess I'm suffering from a bad case of premature optimization :) I thought that, having lots of concatenation in my code, that'd be a bottleneck. But it appears that pre-allocation does not give me any speed-up.

Well, at least it got me thinking, testing LDC a bit more and learning things about Array and Appender ;) Thanks for your help guys, it's now time for the -profile switch again...
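For reference, a rough sketch of the kind of wrapper described above, using std.array.uninitializedArray; the type name, growth policy, and reuse threshold are guesses for illustration, not the actual code:

    import std.array : uninitializedArray;

    // Hypothetical queue-like buffer: push at the end, pop at the front.
    // When the end of the buffer is reached, it either reuses the current
    // buffer (if enough of it is dead space) or allocates a bigger one.
    struct SimpleQueue(T)
    {
        private T[] buf;
        private size_t head, tail;

        this(size_t capacity)
        {
            buf = uninitializedArray!(T[])(capacity);
        }

        void push(T value)
        {
            if (tail == buf.length)
            {
                immutable live = tail - head;
                if (head > 0 && live <= buf.length / 2)
                {
                    // Plenty of slack at the front: compact into the same buffer.
                    buf[0 .. live] = buf[head .. tail];
                }
                else
                {
                    // Not enough room left: allocate a bigger buffer (doubling is a guess).
                    auto bigger = uninitializedArray!(T[])(buf.length ? buf.length * 2 : 4);
                    bigger[0 .. live] = buf[head .. tail];
                    buf = bigger;
                }
                head = 0;
                tail = live;
            }
            buf[tail++] = value;
        }

        T pop()
        {
            assert(head < tail, "queue is empty");
            return buf[head++];
        }

        @property bool empty() const { return head == tail; }
    }

    void main()
    {
        auto q = SimpleQueue!int(4);
        foreach (i; 0 .. 10)
            q.push(i);
        assert(q.pop() == 0 && q.pop() == 1);
    }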

August 15, 2014 Re: Appender is ... slow
Posted in reply to Philippe Sigaud

On Friday, 15 August 2014 at 14:40:36 UTC, Philippe Sigaud wrote:
> Well, I created a wrapper around a std.array.uninitializedArray call, to manage the interface I need

Make sure you don't use that if your type has elaborate construction, or assumes a certain initial state (unless you are actually emplacing your objects of course).

> I thought that, having lots of concatenation in my code, that'd be a bottleneck. But it appears that pre-allocation does not give me any speed-up.

If you are using "raw GC arrays", then the "raw" append operation will outweigh the relocation cost on extension. So pre-allocation wouldn't really help in this situation (though the use of Appender *should*).

August 15, 2014 Re: Appender is ... slow
Posted in reply to monarch_dodra

On Friday, 15 August 2014 at 16:48:10 UTC, monarch_dodra wrote:
> On Friday, 15 August 2014 at 14:40:36 UTC, Philippe Sigaud wrote:
>> Well, I created a wrapper around a std.array.uninitializedArray call, to manage the interface I need
>
> Make sure you don't use that if your type has elaborate construction, or assumes a certain initial state (unless you are actually emplacing your objects of course).

Hmm, what's elaborate construction? They are classes and have constructors, of course. I assumed that this produced only nulls in the array.

>> I thought that, having lots of concatenation in my code, that'd be a bottleneck. But it appears that pre-allocation does not give me any speed-up.
>
> If you are using "raw GC arrays", then the "raw" append operation will outweigh the relocation cost on extension. So pre-allocation wouldn't really help in this situation (though the use of Appender *should*).

OK.

August 15, 2014 Re: Appender is ... slow
Posted in reply to Philippe Sigaud

On Friday, 15 August 2014 at 16:51:20 UTC, Philippe Sigaud wrote:
> On Friday, 15 August 2014 at 16:48:10 UTC, monarch_dodra wrote:
>> Make sure you don't use that if your type has elaborate construction, or assumes a certain initial state (unless you are actually emplacing your objects of course).
>
> Hmm, what's elaborate construction? They are classes and have constructors, of course. I assumed that this produced only nulls in the array.
Actually, my statement was inaccurate. More precisely: never use an element that wasn't first properly initialized. Note that in some cases "operator=" is itself elaborate, meaning it will also read the existing data, so assignment alone is not a valid method of initialization.
uninitializedArray simply creates an array with unspecified data in it.
You *could* just use "new" instead. It's not really any slower, and has the advantage of being certifiably safe.
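A small illustration of the difference, using std.array.uninitializedArray; the class here is made up for the example:

    import std.array : uninitializedArray;

    class Node
    {
        int value;
        this(int v) { value = v; }
    }

    void main()
    {
        // "new" default-initializes: every slot is a well-defined null reference.
        auto initialized = new Node[](4);
        assert(initialized[0] is null);

        // uninitializedArray leaves the slots holding unspecified bits; they are
        // *not* guaranteed to be null, so write every slot before reading it.
        auto raw = uninitializedArray!(Node[])(4);
        foreach (i; 0 .. raw.length)
            raw[i] = new Node(cast(int) i);

        assert(raw[3].value == 3);
    }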

August 15, 2014 Re: Appender is ... slow
Posted in reply to Philippe Sigaud

On Thursday, 14 August 2014 at 17:16:42 UTC, Philippe Sigaud wrote:
> From time to time, I try to speed up some array-heavy code by using std.array.Appender, reserving some capacity and so on.
>
> It never works. Never. It gives me executables that are maybe 30-50% slower than bog-standard array code.
>
> I don't do anything fancy: I just replace the types, use clear() instead of = null...
>
> Do people here get good results from Appender? And if yes, how are you using it?
compiler, version, OS, architecture, flags?
Have you looked at the assembly to check that all the Appender method calls are being inlined?

August 15, 2014 Re: Appender is ... slow
Posted in reply to John Colvin

On Fri, Aug 15, 2014 at 10:04 PM, John Colvin via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com> wrote:
> compiler, version, OS, architecture, flags?

Compiler: DMD 2.065 and LDC 0.14
OS: Linux 64 bits (8 cores, but there is no parallelism here)
Flags: -O -release -inline (and -noboundscheck for DMD)

> Have you looked at the assembly to check that all the Appender method calls are being inlined?

I do not know how to look at the assembly, nor how to check whether the Appender method calls are being inlined. I did spend some time with -profile and gained a nice 10% increase in speed by fighting bottlenecks in my code.

August 15, 2014 Re: Appender is ... slow
Posted in reply to monarch_dodra

On Friday, 15 August 2014 at 16:48:10 UTC, monarch_dodra wrote:
> If you are using "raw GC arrays", then the "raw" append operation will outweigh the relocation cost on extension. So pre-allocation wouldn't really help in this situation (though the use of Appender *should*).
Is that because it's able to grab memory from the GC without actually having to allocate anything? Normally, I would have expected allocations to far outweigh the cost on extension and that preallocating would help a lot. But that would be with malloc or C++'s new rather than the GC, which has already allocated memory to reuse after it collects garbage.
- Jonathan M Davis

August 15, 2014 Re: Appender is ... slow
Posted in reply to Jonathan M Davis

On Friday, 15 August 2014 at 21:24:25 UTC, Jonathan M Davis wrote:
> On Friday, 15 August 2014 at 16:48:10 UTC, monarch_dodra wrote:
>> If you are using "raw GC arrays", then the "raw" append operation will outweigh the relocation cost on extension. So pre-allocation wouldn't really help in this situation (though the use of Appender *should*).
>
> Is that because it's able to grab memory from the GC without actually having to allocate anything? Normally, I would have expected allocations to far outweigh the cost on extension and that preallocating would help a lot. But that would be with malloc or C++'s new rather than the GC, which has already allocated memory to reuse after it collects garbage.
>
> - Jonathan M Davis
It's mostly just because GC-array appending is slow. A single operation is itself not that slow, but if you plan to append 10_000 elements, then the total cost will add up. A lot.
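A minimal benchmark sketch of that effect; it uses std.datetime.stopwatch from current Phobos (this thread predates that module), and the element count and types are only illustrative:

    import std.array : appender;
    import std.datetime.stopwatch : AutoStart, StopWatch;
    import std.stdio : writefln;

    enum N = 10_000;

    void main()
    {
        auto sw = StopWatch(AutoStart.yes);

        // Raw GC append: each ~= pays the GC block lookup again.
        int[] raw;
        foreach (i; 0 .. N)
            raw ~= i;
        auto rawTime = sw.peek();

        sw.reset();

        // Appender: the block metadata is cached, so the per-append cost is lower.
        auto app = appender!(int[])();
        foreach (i; 0 .. N)
            app.put(i);
        auto appTime = sw.peek();

        writefln("raw ~=  : %s", rawTime);
        writefln("Appender: %s", appTime);
    }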