avoid toLower in std.algorithm.sort compare alias (page 3)

On Mon, 23 Apr 2012 16:43:20 +0100, Steven Schveighoffer <schveiguy@yahoo.com> wrote:
> While dealing with unicode in my std.stream rewrite, I've found that hand-decoding dchars is way faster than using library calls.

After watching Andrei's talk on generic and generative programming I have to ask, which routines are you avoiding .. it seems we need to make them as good as the hand coded code you've written...

R

--
Using Opera's revolutionary email client: http://www.opera.com/mail/

On Tuesday, 24 April 2012 at 11:24:44 UTC, Regan Heath wrote:
> On Mon, 23 Apr 2012 16:43:20 +0100, Steven Schveighoffer <schveiguy@yahoo.com> wrote:
>
>> While dealing with unicode in my std.stream rewrite, I've found that hand-decoding dchars is way faster than using library calls.
>
> After watching Andrei's talk on generic and generative programming I have to ask, which routines are you avoiding .. it seems we need to make them as good as the hand coded code you've written...

from memory (don't have the code in front of me right now), it was std.uni.decode, and using foreach(dchar d; str) (which cannot be inlined currently).

IIRC, std.uni.decode was not being inlined. So I tried hand-inlining it (I also discovered some optimizations it was not using), and it made a huge difference.

In regards to this discussion, I think icmp can also be improved when run on a char array, by doing a byte comparison (no dchar decoding) until it finds a difference. That might be a huge speedup. Right now, all dchars are being decoded, and translated to the toLower counterpart.

It may have an opposite effect, however, if there are a lot of strings that are equivalent when ignoring case, but not exactly the same.

-Steve
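
A minimal sketch of the byte-comparison idea above, not the actual Phobos implementation. The helper name icmpSkipPrefix is hypothetical, and the code assumes both inputs are valid UTF-8. It scans identical code units without decoding, backs up to the start of the code point that contains the first difference, and only then hands the remainders to std.uni.icmp.

```d
import std.uni : icmp;

/// Hypothetical helper (not part of Phobos): case-insensitive comparison
/// that skips the identical code-unit prefix before calling std.uni.icmp.
/// Assumes both arguments are valid UTF-8.
int icmpSkipPrefix(const(char)[] a, const(char)[] b)
{
    immutable n = a.length < b.length ? a.length : b.length;
    size_t i = 0;

    // Plain byte comparison: nothing is decoded while the code units match.
    while (i < n && a[i] == b[i])
        ++i;

    // If the first difference falls on a UTF-8 continuation byte
    // (bit pattern 10xxxxxx), back up to the start of that code point
    // so both remainders are still well-formed.
    if (i < n)
    {
        while (i > 0 && (a[i] & 0xC0) == 0x80)
            --i;
    }

    // Only the differing tails are decoded and case-folded.
    return icmp(a[i .. $], b[i .. $]);
}
```

As the post notes, this only pays off when the strings share a long exact prefix. Inputs that differ in case near the start fall through to the decoding path almost immediately.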

On Tuesday, 24 April 2012 at 14:54:48 UTC, Steven Schveighoffer wrote:
> On Tuesday, 24 April 2012 at 11:24:44 UTC, Regan Heath wrote:
>> After watching Andrei's talk on generic and generative programming I have to ask, which routines are you avoiding .. it seems we need to make them as good as the hand coded code you've written...
>
> from memory (don't have the code in front of me right now), it was std.uni.decode, and using foreach(dchar d; str) (which cannot be inlined currently).
>
> IIRC, std.uni.decode was not being inlined. So I tried hand-inlining it (I also discovered some optimizations it was not using), and it made a huge difference.

BTW, you can check out my github branch of phobos named new-io2, look at the textstream struct to see what I've inlined.

-Steve

April 24, 2012

Re: avoid toLower in std.algorithm.sort compare alias

Posted by Jonathan M Davis
in reply to Regan Heath

On Tuesday, April 24, 2012 12:24:44 Regan Heath wrote:
> On Mon, 23 Apr 2012 16:43:20 +0100, Steven Schveighoffer <schveiguy@yahoo.com> wrote:
> > While dealing with unicode in my std.stream rewrite, I've found that hand-decoding dchars is way faster than using library calls.
> 
> After watching Andrei's talk on generic and generative programming I have to ask, which routines are you avoiding .. it seems we need to make them as good as the hand coded code you've written...

In general, when operating on strings generically, you end up having to treat them as ranges of dchar and decode everything, but there are a lot of cases where you can special-case algorithms for narrow strings and avoid decoding them. Phobos does this a lot (though it can probably do a better job of it in a number of places), so by using functions from there rather than rolling your own, the problem is reduced, but any time that you're doing a lot of generic string processing, there's a decent chance that you're going to have to special-case some stuff for arrays of char, wchar, and dchar in order to fully optimize it. And I don't think that there's really a way out of that beyond having a lot of functions already available (and already optimized) to do a lot of the string processing for you. There's a definite tension between genericity and efficiency in the case of string processing - due primarily to variable-length encodings.

- Jonathan M Davis
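
As a rough illustration of the special-casing described above, here is a sketch of a generic function with a fast path for narrow strings. The name countAscii is hypothetical and is not a Phobos function. It relies on the fact that an ASCII value never appears inside a multi-unit sequence in UTF-8 or UTF-16, so the narrow-string branch can compare raw code units and skip decoding.

```d
import std.range.primitives : isInputRange, empty, front, popFront;
import std.traits : isNarrowString;

/// Hypothetical helper (not part of Phobos): counts occurrences of an
/// ASCII character in any range of characters.
size_t countAscii(R)(R r, char needle)
if (isInputRange!R)
{
    assert(needle < 0x80, "needle must be ASCII");
    size_t n;

    static if (isNarrowString!R)
    {
        // Narrow string (array of char or wchar): foreach without an
        // explicit dchar loop variable iterates code units, so nothing
        // is decoded here.
        foreach (immutable cu; r)
            if (cu == needle)
                ++n;
    }
    else
    {
        // Generic path: walk the range through front/popFront.
        for (; !r.empty; r.popFront())
            if (r.front == needle)
                ++n;
    }
    return n;
}
```

The same template serves string, wstring, dstring and other character ranges; only the narrow-string instantiations take the code-unit path.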

On Sun, 22 Apr 2012 09:23:45 +0200, "Jay Norwood" <jayn@prismnet.com> wrote:
> On Sunday, 22 April 2012 at 06:26:42 UTC, Jonathan M Davis wrote:
> >
> > You can look at the code. It checks each of the characters in place. Unlike toLower, it doesn't need to generate a new string. But as far as the comparison goes, they're the same - hence that line in the docs.
> >
> > - Jonathan M Davis
>
> ok, I did look at the code just now, and I'll sleep better knowing that it doesn't do the whole string conversion. I misunderstood your pseudo-code to mean that two lower case strings were being created prior to the compare.
>
> However, icmp code does appear to call the toLower conversion on both characters without first comparing the characters for equality, which misses the chance to do a simple compare that would avoid the two calls.

     /----- check for equality :)
     v
cmp!"a != b && std.uni.toLower(a) < std.uni.toLower(b)"(r1, r2)

--
Marco
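
The equality-first check suggested above can be sketched as an explicit loop, written by hand here so the control flow is visible. The name icmpLazyLower is hypothetical; this is not how std.uni.icmp is implemented, and it uses the simple per-code-point std.uni.toLower mapping only.

```d
import std.range.primitives : empty, front, popFront;
import std.uni : toLower;

/// Hypothetical helper (not part of Phobos): case-insensitive comparison
/// that calls toLower only when two code points actually differ.
int icmpLazyLower(const(char)[] r1, const(char)[] r2)
{
    while (!r1.empty && !r2.empty)
    {
        // front decodes one code point from each string.
        immutable dchar a = r1.front;
        immutable dchar b = r2.front;

        // Cheap check first: identical code points need no case mapping.
        if (a != b)
        {
            immutable la = toLower(a);
            immutable lb = toLower(b);
            if (la != lb)
                return la < lb ? -1 : 1;
        }
        r1.popFront();
        r2.popFront();
    }

    // The shorter string sorts first when one is a prefix of the other.
    if (r1.empty)
        return r2.empty ? 0 : -1;
    return 1;
}
```

This still decodes every code point; it only avoids the two toLower calls on matching pairs.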
