March 09, 2016
On Wednesday, 9 March 2016 at 13:26:45 UTC, Andrei Alexandrescu wrote:
> On 03/08/2016 09:14 AM, ixid wrote:
>> Since I posted this thread I've learned std.algorithm.sum is 4 times
>> slower than a naive loop sum. Even if this is for reasons of accuracy,
>> this is exactly what I am talking about: this is a hidden iceberg of
>> terrible performance that will reflect poorly on D. That's so slow the
>> function needs a health warning.
>
> Whoa. What's happening there? Do we have anyone on it? -- Andrei

They just don't do the same thing: sum() uses pairwise summation, which is safer as I understand it. Corresponding issue: https://issues.dlang.org/show_bug.cgi?id=15722
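
To illustrate, here is a rough sketch of the two approaches (my own code, not Phobos's actual implementation):

import std.algorithm : sum;
import std.stdio;

double naiveSum(double[] a)
{
    double s = 0;
    foreach (x; a)      // single running total; rounding error accumulates ~O(n)
        s += x;
    return s;
}

double pairwiseSum(double[] a)
{
    // Sum the halves separately, then combine: rounding error grows only
    // about O(log n). The recursion is presumably also where the extra
    // cost relative to a plain loop comes from.
    if (a.length <= 2)
        return naiveSum(a);
    return pairwiseSum(a[0 .. $ / 2]) + pairwiseSum(a[$ / 2 .. $]);
}

void main()
{
    auto data = [0.1, 0.2, 0.3, 0.4];
    writeln(naiveSum(data), " ", pairwiseSum(data), " ", data.sum);
}
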
March 09, 2016
On Wednesday, 9 March 2016 at 13:26:45 UTC, Andrei Alexandrescu wrote:
> On 03/08/2016 09:14 AM, ixid wrote:
>> Since I posted this thread I've learned std.algorithm.sum is 4 times
>> slower than a naive loop sum. Even if this is for reasons of accuracy,
>> this is exactly what I am talking about: this is a hidden iceberg of
>> terrible performance that will reflect poorly on D. That's so slow the
>> function needs a health warning.
>
> Whoa. What's happening there? Do we have anyone on it? -- Andrei

Ilya has long-term plans for this, but I have a short-term fix that will buy us a very large performance improvement here (if my old benchmarks were correct). Give me a few mins to prep the pull request :)
March 09, 2016
On 3/9/16 9:03 AM, John Colvin wrote:
> On Wednesday, 9 March 2016 at 13:26:45 UTC, Andrei Alexandrescu wrote:
>> On 03/08/2016 09:14 AM, ixid wrote:
>>> Since I posted this thread I've learned std.algorithm.sum is 4 times
>>> slower than a naive loop sum. Even if this is for reasons of accuracy,
>>> this is exactly what I am talking about: this is a hidden iceberg of
>>> terrible performance that will reflect poorly on D. That's so slow the
>>> function needs a health warning.
>>
>> Whoa. What's happening there? Do we have anyone on it? -- Andrei
>
> Ilya has long-term plans for this, but I have a short-term fix that will
> buy us a very large performance improvement here (if my old benchmarks
> were correct). Give me a few mins to prep the pull request :)

Thanks much! -- Andrei
March 09, 2016
On Wednesday, 9 March 2016 at 14:04:40 UTC, Andrei Alexandrescu wrote:
> On 3/9/16 9:03 AM, John Colvin wrote:
>> On Wednesday, 9 March 2016 at 13:26:45 UTC, Andrei Alexandrescu wrote:
>>> On 03/08/2016 09:14 AM, ixid wrote:
>>>> [...]
>>>
>>> Whoa. What's happening there? Do we have anyone on it? -- Andrei
>>
>> Ilya has long-term plans for this, but I have a short-term fix that will
>> buy us a very large performance improvement here (if my old benchmarks
>> were correct). Give me a few mins to prep the pull request :)
>
> Thanks much! -- Andrei

https://github.com/D-Programming-Language/phobos/pull/4069
March 09, 2016
On Wednesday, 9 March 2016 at 13:42:40 UTC, cym13 wrote:
>
> They just don't do the same thing: sum() uses pairwise summation, which is safer as I understand it. Corresponding issue: https://issues.dlang.org/show_bug.cgi?id=15722

The third comment there, about how it's not obvious which algorithm sum is using, is a good one.
March 09, 2016
On Tuesday, 8 March 2016 at 14:14:25 UTC, ixid wrote:
> Since I posted this thread I've learned std.algorithm.sum is 4 times slower than a naive loop sum. Even if this is for reasons of accuracy, this is exactly what I am talking about: this is a hidden iceberg of terrible performance that will reflect poorly on D. That's so slow the function needs a health warning.

I've seen a few cases while exploring D. Not all are fully researched (apologies for that), but since there appears to be interest in identifying them, I'll list them.

* Lower-casing strings (likely upper-casing), and some character type checks.

Routines like toLower and asLowerCase call functions that work for all Unicode characters. I was able to create much faster versions by checking whether the character was ASCII, then calling either the ASCII version or the more general version. The same is true for a few routines such as isNumber. Some have the ASCII-check optimization built in, but not all.
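
Roughly the shape of what I did (a sketch; toLowerFast is my own name, not a Phobos one):

static import std.ascii;
static import std.uni;

dchar toLowerFast(dchar c)
{
    // ASCII fast path; fall back to the general Unicode version otherwise
    return c <= 0x7F ? std.ascii.toLower(c) : std.uni.toLower(c);
}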

If this optimization is added, it might also be useful to add a few common combinations (or a generic solution, if that's feasible). For example, to check whether a character is alphanumeric, one currently ORs two tests from the standard library, isAlpha and isNumber. Adding an ASCII optimization requires putting the check before the OR, rather than inside the tests being ORed.
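
The combined check would look something like this (again a sketch; isAlphaNumFast is my own name):

static import std.ascii;
static import std.uni;

bool isAlphaNumFast(dchar c)
{
    // ASCII test once, up front, rather than inside each of the ORed tests
    if (c <= 0x7F)
        return std.ascii.isAlphaNum(c);
    return std.uni.isAlpha(c) || std.uni.isNumber(c);
}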

* Large associative arrays

When associative arrays get beyond about 10 million entries, performance starts to decline. I believe this is due to resizing of the underlying arrays. It's worse with string keys than with integer keys. Having a way to reserve capacity may help in some circumstances.
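
A small way to reproduce what I'm describing (sizes are illustrative):

import std.conv : to;
import std.datetime : StopWatch;
import std.stdio;

void main()
{
    int[string] aa;
    StopWatch sw;
    sw.start();
    foreach (i; 0 .. 20_000_000)
        aa[i.to!string] = i;   // string keys; cost climbs as the table resizes
    sw.stop();
    writeln(sw.peek().msecs, " ms");
}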

* Associative arrays - Converting keys to immutable versions for lookup

Associative arrays want immutable values as keys. As far as I can tell, immutable values are also required when performing a lookup, even if a new entry won't be stored. A couple of apps I've written walk through large lists of text values, naturally available as char[] because they are read from input streams. To test presence in an associative array, it's necessary to copy them to immutable strings first. I haven't fully researched this one, but my experience is that copying from char[] to string becomes a meaningful cost.

On the surface, this looks like an optimization opportunity: create the immutable strings only when actually storing a new value.
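
The workaround I'm aware of is to cast for the lookup only, which should be safe because a pure lookup never stores the key (a hedged sketch):

import std.stdio;

void main()
{
    int[string] counts = ["hello" : 1];
    char[] buf = "hello".dup;          // mutable, as if read from a stream
    // Lookup only: the cast avoids the .idup copy. Don't do this on insert,
    // since the AA would then hold a reference to mutable data.
    if (auto p = cast(string) buf in counts)
        writeln(*p);
}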

--Jon
March 09, 2016
On Wed, Mar 09, 2016 at 08:30:10PM +0000, Jon D via Digitalmars-d wrote:
> On Tuesday, 8 March 2016 at 14:14:25 UTC, ixid wrote:
> >Since I posted this thread I've learned std.algorithm.sum is 4 times slower than a naive loop sum. Even if this is for reasons of accuracy, this is exactly what I am talking about: this is a hidden iceberg of terrible performance that will reflect poorly on D. That's so slow the function needs a health warning.

In the case of std.algorithm.sum, the focus is on accuracy rather than performance.  It does some extra work to ensure maximum accuracy in the result, so it shouldn't be expected to have top performance.  Granted, though, the docs could be clearer about this accuracy vs. performance tradeoff.  Please file a bug on this (or better yet, submit a PR for it).

In any case, 4 times slower sounds a bit excessive... it would be good to investigate why this is happening and fix it.


> I've seen a few cases while exploring D. Not all are fully researched (apologies for that), but since there appears to be interest in identifying them, I'll list them.

One of the big performance killers that people are likely to run into is autodecoding for strings in all range-based algorithms.  Walter has repeatedly complained about this, but so far Andrei (and some others) have resisted getting rid of autodecoding.  Fortunately, you can (mostly) suppress this by wrapping your strings in std.encoding.codeUnits before operating on them. Nevertheless, it's one of those hidden gotchas that could result in poorer performance than what you could get otherwise.
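
For instance (a minimal sketch using std.utf.byCodeUnit, a similar wrapper):

import std.algorithm : count;
import std.stdio;
import std.utf : byCodeUnit;

void main()
{
    auto s = "hello world";
    // Iterates over raw chars; no per-element UTF-8 decoding
    writeln(s.byCodeUnit.count('l'));
}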

Another big performance killer that I repeatedly run into is the overly-eager GC collection schedule.  Some may argue against the use of the GC, period, but I have found that even with the GC, it's possible to get 30-40% speedups just by calling GC.disable() and manually triggering GC.collect() at a lower frequency than the default. This is especially true when your program performs many allocations of small objects, like (small) strings. I have observed this in at least 2-3 different memory-intensive programs of mine, including a recent experiment in writing a faster CSV parser than std.csv.  Arguably, we should fix this in druntime itself so that collection cycles are run less often, though I'm not sure what implications this may have on low-memory systems. Sometimes, you don't even need to do this for the entire program; you can get big performance boosts just by wrapping GC.disable() / GC.enable() around strategic sections of allocation-intensive code.
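
The pattern is simple (a sketch; the collection interval is whatever suits your workload):

import core.memory : GC;

void processAll(string[] lines)
{
    GC.disable();              // suspend automatic collections for this section
    scope(exit) GC.enable();   // restore on the way out
    size_t n;
    foreach (line; lines)
    {
        // ... allocation-heavy work per line ...
        if (++n % 1_000_000 == 0)
            GC.collect();      // collect manually, at a lower frequency
    }
}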


> * Lower-casing strings (likely upper-casing), and some character type
> checks.
> 
> Routines like toLower and asLowerCase call functions that work for all Unicode characters. I was able to create much faster versions by checking whether the character was ASCII, then calling either the ASCII version or the more general version. The same is true for a few routines such as isNumber. Some have the ASCII-check optimization built in, but not all.
> 
> If this optimization is added, it might also be useful to add a few common combinations (or a generic solution, if that's feasible). For example, to check whether a character is alphanumeric, one currently ORs two tests from the standard library, isAlpha and isNumber. Adding an ASCII optimization requires putting the check before the OR, rather than inside the tests being ORed.

These sound like valuable additions to Phobos. Submit a PR, please?


[...]
> * Associative arrays - Converting keys to immutable versions for lookup
> 
> Associative arrays want immutable values as keys. As far as I can tell, immutable values are also required when performing a lookup, even if a new entry won't be stored. A couple of apps I've written walk through large lists of text values, naturally available as char[] because they are read from input streams. To test presence in an associative array, it's necessary to copy them to immutable strings first. I haven't fully researched this one, but my experience is that copying from char[] to string becomes a meaningful cost.
[...]

This is arguably a bug.  If you're only looking up a key, you should be able to pass in const(char)[] rather than immutable(char)[] (aka string).  The GC performance problem I mentioned above is especially noticeable when there are large numbers of small allocations, as would be the case if you constantly have to call .idup just because AA lookups demand immutable keys.  Please file an issue for this, if there isn't one already filed.


T

-- 
Windows 95 was a joke, and Windows 98 was the punchline.
March 09, 2016
On Wednesday, 9 March 2016 at 21:01:13 UTC, H. S. Teoh wrote:
> On Wed, Mar 09, 2016 at 08:30:10PM +0000, Jon D via Digitalmars-d wrote:
>> On Tuesday, 8 March 2016 at 14:14:25 UTC, ixid wrote:
>> >[...]
>
> In the case of std.algorithm.sum, the focus is on accuracy rather than performance.  It does some extra work to ensure maximum accuracy in the result, so it shouldn't be expected to have top performance.  Granted, though, the docs could be clearer about this accuracy vs. performance tradeoff.  Please file a bug on this (or better yet, submit a PR for it).
>
> In any case, 4 times slower sounds a bit excessive... it would be good to investigate why this is happening and fix it.

I think you'd be good at reviewing this: https://github.com/D-Programming-Language/phobos/pull/4069

March 10, 2016
On Wednesday, 9 March 2016 at 20:30:10 UTC, Jon D wrote:
>
> I've seen a few cases while exploring D.
>
It turns out there are issues already filed for each of the performance problems I mentioned:

* Lower casing strings:  https://issues.dlang.org/show_bug.cgi?id=11229
* Large associative arrays:  https://issues.dlang.org/show_bug.cgi?id=2504
* Associative arrays - Checking membership with mutable values (char arrays) rather than strings (immutable):  https://issues.dlang.org/show_bug.cgi?id=15038


