faster splitter (page 19)

On Wednesday, 1 June 2016 at 12:41:19 UTC, Seb wrote: > On Wednesday, 1 June 2016 at 12:14:07 UTC, Patrick Schluter wrote: >> On Tuesday, 31 May 2016 at 17:54:34 UTC, qznc wrote: >>> There is a special version of find for searching a single char in a string. Using a one-letter needle string is more like a user mistake than something to optimize for. >> >> At compile time you may not know the length of the needle, like in the grep command. > > 1) how about a CTFE find? > > s.find!(needle, pred) > > If we can initialize boyer-moore or KMP at compile time - it should be the fastest! > > At ndslice such functions are called bifacial: > > http://dlang.org/phobos/std_experimental_ndslice_iteration.html > > Imho a lot more in std.algorithm should be able to profit from facts known at compile-time. > > 2) Even for a runtime runtime one-letter needle I am pretty sure it's worth to specialize That makes sense. I think that std.algorithm needs to be revised and optimized for speed. We cannot afford to have suboptimal algorithms in there.

On 06/01/2016 08:41 AM, Seb wrote: > On Wednesday, 1 June 2016 at 12:14:07 UTC, Patrick Schluter wrote: >> On Tuesday, 31 May 2016 at 17:54:34 UTC, qznc wrote: >>> There is a special version of find for searching a single char in a >>> string. Using a one-letter needle string is more like a user mistake >>> than something to optimize for. >> >> At compile time you may not know the length of the needle, like in the >> grep command. > > 1) how about a CTFE find? > > s.find!(needle, pred) > > If we can initialize boyer-moore or KMP at compile time - it should be > the fastest! That would require partial evaluation, which sadly we don't have in D. -- Andrei

On Wednesday, 1 June 2016 at 12:41:19 UTC, Seb wrote: > On Wednesday, 1 June 2016 at 12:14:07 UTC, Patrick Schluter wrote: >> On Tuesday, 31 May 2016 at 17:54:34 UTC, qznc wrote: >>> There is a special version of find for searching a single char in a string. Using a one-letter needle string is more like a user mistake than something to optimize for. >> >> At compile time you may not know the length of the needle, like in the grep command. > > 1) how about a CTFE find? > What I wanted to say, is that in real life, the input of the search routine is very often run-time user provided data. Think of search box in browsers and apps, command line parameter à la grep, etc. The "string" search function should not catastrophically break down on special input, like 1 character strings, unusual Unicode or when needle==haystack. I only said this to not lose the focus on what is being tried to be achieved here. It's often a danger of micro-optimization and unit test focused development, that a lot of time and effort are spent on improvements that are completely irrelevant when checked against what is really needed in real world (i.e. we're full in bike shed territory here).

On Wednesday, 1 June 2016 at 13:47:10 UTC, Patrick Schluter wrote: >> > What I wanted to say, is that in real life, the input of the search routine is very often run-time user provided data. Think of search box in browsers and apps, command line parameter à la grep, etc. The "string" search function should not catastrophically break down on special input, like 1 character strings, unusual Unicode or when needle==haystack. I only said this to not lose the focus on what is being tried to be achieved here. > It's often a danger of micro-optimization and unit test focused development, that a lot of time and effort are spent on improvements that are completely irrelevant when checked against what is really needed in real world (i.e. we're full in bike shed territory here). That's true. We should try to optimize for the worst case, i.e. random input. However, it is often the case that the needle is known at compile time, e.g. things like if (input.canFind("foo")) There might be ways to optimize those cases at compile time.

June 02, 2016

Re: faster splitter

Posted by Chris
in reply to Chris

Permalink

Chris

Posted in reply to Chris

Permalink

On Wednesday, 1 June 2016 at 14:32:38 UTC, Chris wrote:

For fun, I've written a search function that alternates between the beginning and the end of a string. I'm basically using Andrei's search algorithm for this. It is, of course only useful, if `needle` is expected to be either towards the beginning or the end of a string. If `needle` occurs multiple times, it matches where ever it encounters it first.

T[] findUnique(T)(T[] haystack, T[] needle)
{
	static size_t computeSkip(T[] needle)
	{
		size_t result = 1;
		while (result < needle.length && needle[$ - 1 - result] != needle[$ - 1])
		{
			++result;
		}
		return result;
	}
	
	if (needle.length == 0) return haystack;
	immutable size_t len = haystack.length;
	immutable size_t lastIndex = needle.length - 1;
	auto last = needle[lastIndex];
	auto first = needle[0];
	size_t skip = 0;
	for (size_t i = lastIndex, j = len-lastIndex-1; i < len; ++i, --j)
	{
		if (last != haystack[i])
			goto back;
		for (size_t k = 0; ; ++k)
		{
			if (k == lastIndex) return haystack[i-lastIndex..$];
			if (haystack[i - lastIndex + k] != needle[k]) break;
		}
		back:
		if (i > j) break;
		if (first != haystack[j])
			continue;
		for (size_t k = lastIndex; ; --k)
		{
			if (k == 0) return haystack[j..$];
			if (haystack[j + k] != needle[k]) break;
		}
		if (skip == 0) skip = computeSkip(needle);
		i += skip;
    j -= skip-1;
	}
	return haystack[$..$];
}

unittest
{
	string s1 = "Hello abc world!";
	assert (findUnique(s1, "abc") == "abc world!");
	assert (findUnique(s1, "def") == "");
	string s2 = "Ola, mundo!";
	assert (findUnique(s2, "do!") == "do!");
	string s3 = "abc";
	assert (findUnique(s3, "c") == "c");
	assert (findUnique(s3, "z") == "");
	string s4 = "aaabaaa";
	assert (findUnique(s4, "b") == "baaa");
}

I added it to the benchmarking suite (I had to turn off the error warnings, see description above). Naturally, it shines in the first test, as the needle is at the end of the text. The whole thing is to be taken with a grain of salt ;)

Search in Alice in Wonderland
       std: 535 ±8
    manual: 454 ±10
      qznc: 421 ±5
     Chris: 460 ±10
    Andrei: 643 ±19
   Andrei2: 314 ±6
   Andrei3: 352 ±5
    Biloop: 100 ±0
Search in random short strings
       std: 219 ±33
    manual: 194 ±47
      qznc: 123 ±11
     Chris: 219 ±69
    Andrei: 113 ±10
   Andrei2: 104 ±4
   Andrei3: 103 ±3
    Biloop: 195 ±46
Mismatch in random long strings
       std: 193 ±30
    manual: 249 ±103
      qznc: 156 ±30
     Chris: 260 ±108
    Andrei: 238 ±97
   Andrei2: 107 ±11
   Andrei3: 115 ±12
    Biloop: 141 ±44
Search random haystack with random needle
       std: 189 ±25
    manual: 193 ±57
      qznc: 152 ±26
     Chris: 203 ±59
    Andrei: 176 ±35
   Andrei2: 103 ±6
   Andrei3: 110 ±8
    Biloop: 141 ±28
 (avg slowdown vs fastest; absolute deviation)
CPU ID: GenuineIntel Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz

On Tuesday, 31 May 2016 at 21:29:34 UTC, Andrei Alexandrescu wrote: > On 05/31/2016 04:18 PM, Chris wrote: >> I actually thought that dmd didn't place >> `computeSkip` inside of the loop. This begs the question if it should be >> moved to the loop, in case we use it in Phobos, to make sure that it is >> as fast as possible even with dmd. However, I like it the way it is now. > > You may want to then try https://dpaste.dzfl.pl/392710b765a9, which generates inline code on all compilers. -- Andrei I ported this to Phobos style (A2Phobos in the benchmark) and changed the pull request [0]. In practically all tests, this algorithms wipes the floor with the competition. [0] https://github.com/dlang/phobos/pull/4362

On Tuesday, 31 May 2016 at 22:50:37 UTC, David Nadlinger wrote: > Another thing that might be interesting to do (now that you have a "clever" baseline) is to start counting cycles and make some comparisons against manual asm/intrinsics implementations. For short(-ish) needles, PCMPESTRI is probably the most promising candidate, although I suspect that for \r\n scanning in long strings in particular, an optimised AVX2 solution might have higher throughput. > > Of course these observations are not very valuable without backing them up with measurements, but it seems like before optimising a generic search algorithm for short-needle test cases, having one's eyes on a solid SIMD baseline would be a prudent thing to do. The current algorithm is generic with respect to the predicate. Once we use SSE/AVX tricks, it is a special case for equality. As a next step in Phobos, this is probably worth it for strings. We could probably steal some well-optimized strcmp from somewhere.

On Monday, 30 May 2016 at 18:20:39 UTC, Andrei Alexandrescu wrote: > On 05/30/2016 05:31 AM, qznc wrote: >> On Sunday, 29 May 2016 at 21:07:21 UTC, qznc wrote: >>> worthwhile to use word loads [0] instead. Really fancy would be SSE. I wrote a splitter in SSE4.2 some time ago as acontribution to a github project. Perhaps it is related. https://github.com/HFTrader/string-splitting/blob/master/splithb2.cpp Cheers,

Forums