faster splitter (page 14)

May 27, 2016

Re: faster splitter

Posted by Chris
in reply to Andrei Alexandrescu

Permalink

Chris

Posted in reply to Andrei Alexandrescu

Permalink

On Friday, 27 May 2016 at 13:29:00 UTC, Andrei Alexandrescu wrote:
> On 5/27/16 8:28 AM, Chris wrote:
>
> This is surprising. It should be easy to construct examples where brute force search performs arbitrarily poor, e.g. searching for "aaaaaaa...aab" in a long string with only "a"s.

I will look into this. Atm, I'm using qznc's benchmark.

>> NB: I've found that `foreach` is usually faster than a manual `for`-loop.
>
> That's surprising too. I'd be interested to look at a disassembly if you have one.

I have to correct myself. A manual loop is faster, I couldn't believe it either, so I benchmarked the same function with `foreach` and `for`. Turns out that `for` is _way_ faster.

>> I used qznc's benchmarking, and it's clear that the current
>> implementation of std.algorithm.find is quite a poor performer. We do
>> have to work on that. Library functions should be fast, no matter what.
>> If someone uses `find`, they wanna be sure it's optimized for speed.
>
> That's definitely the way to go. Thanks for looking into it!
>
>
> Andrei

There is a bug, if it is one, in qznc's `manual_find`. It doesn't find strings at the end of a string, as "ing" in "string", and it does not find a string that is equal to the search string as in "abba" == "abba".

This outperforms both `manual_find` and the improved `std find`

string findStringS_Manual(string haystack, string needle)
{

	if (needle.length > haystack.length)
		return haystack[$..$];
	outer:
	for (auto i = 0; i < haystack.length; i++)
	{
		if (haystack[i] != needle[0])
			continue;
		for (size_t j = i+1, k = 1; k < needle.length; ++j, ++k)
			if (haystack[j] != needle[k])
			continue outer;
		return haystack[i..$];
	}
	return haystack[$..$];
}

A typical example would be:

std find    	took     25642297
manual find 	took      7426676  // manual_find
my std find 	took      6654496  // improved find
findStringS 	took      7159556  // foreach(i, c; haystack)
findStringS_  	took      5315498  // for (auto i = 0 ...) see above

I think it's clear that `std find` is veeeery slow. If anyone wants to test my function, please do so. I don't want to spread wrong data, but this is what I get at the moment (ldc latest version).

On Friday, 27 May 2016 at 14:06:09 UTC, Chris wrote: > > This outperforms both `manual_find` and the improved `std find` > > string findStringS_Manual(string haystack, string needle) > { > > if (needle.length > haystack.length) > return haystack[$..$]; > outer: > for (auto i = 0; i < haystack.length; i++) > { > if (haystack[i] != needle[0]) > continue; > for (size_t j = i+1, k = 1; k < needle.length; ++j, ++k) > if (haystack[j] != needle[k]) > continue outer; > return haystack[i..$]; > } > return haystack[$..$]; > } > Little bug fix: for (size_t j = (i+1 < haystack.length) ? i+1 : i, k = 1; k < needle.length; ++j, ++k) Else it will be out or range when you look for "za" and the string ends with "z". There is a slight performance penalty, but it's still faster than the rest. Maybe there are cleverer ways. But every little check and decision comes with a performance penalty. The improved `std find` comes very close to the manual function above now, sometimes it's even faster or at least as fast. std find took 12573666 manual find took 7418454 my std find took 6903854 <=== findStringS took 7166720 findStringS_ took 6579529 <=== std find took 11849892 manual find took 7407091 my std find took 6573102 <=== findStringS took 7296610 findStringS_ took 6573214 <=== std find took 10744361 manual find took 7576133 my std find took 6332014 <=== findStringS took 7224946 findStringS_ took 6555976 <===

On Friday, 27 May 2016 at 14:06:09 UTC, Chris wrote: > I think it's clear that `std find` is veeeery slow. If anyone wants to test my function, please do so. I don't want to spread wrong data, but this is what I get at the moment (ldc latest version). If you want to see find (current or my improved version) get really slow, then reduce the alphabet for the haystack (e.g. --alphabet=ab). ;)

On Friday, 27 May 2016 at 14:41:29 UTC, Chris wrote: > The improved `std find` comes very close to the manual function above now, sometimes it's even faster or at least as fast. I think it really depends on the use case. The manual algorithm is really naive and std-find is slightly more advanced. My benchmark creates a random haystack and needle, but this is not real world data. I believe the "slightly more advanced" approach is a good idea, but there is no data to prove it. It might be interesting what algorithms other language's standard libraries (C++, Python, Java, etc) use. Thanks for looking at the benchmarking, Chris! The more eyes, the better. :)

On 5/27/16 10:41 AM, Chris wrote: > The improved `std find` comes very close to the manual function above > now, sometimes it's even faster or at least as fast. What you want to do here for indexed access is to match the last element first. If no match, continue etc. If there's a match, enter an inner loop where you don't need to check for the end of the haystack. -- Andrei

On Friday, 27 May 2016 at 14:41:29 UTC, Chris wrote: > On Friday, 27 May 2016 at 14:06:09 UTC, Chris wrote: >> >> This outperforms both `manual_find` and the improved `std find` >> >> string findStringS_Manual(string haystack, string needle) >> { >> >> if (needle.length > haystack.length) >> return haystack[$..$]; >> outer: >> for (auto i = 0; i < haystack.length; i++) >> { >> if (haystack[i] != needle[0]) >> continue; >> for (size_t j = i+1, k = 1; k < needle.length; ++j, ++k) >> if (haystack[j] != needle[k]) >> continue outer; >> return haystack[i..$]; >> } >> return haystack[$..$]; >> } >> > > Little bug fix: > > for (size_t j = (i+1 < haystack.length) ? i+1 : i, k = 1; k < needle.length; ++j, ++k) > The upper bound of the outer loop should be haystack.length-needle.length. There's no point in searching after that point as the needle would not fit anymore and it avoids the buffer overrun you have in your code when the first character of the needle is found after that point. You can also remove then your ?: test in the j loop as that case is handled automatically then.

On Friday, 27 May 2016 at 19:17:44 UTC, Chris wrote: > Oops, I've been found out. :-) Thanks. You're right of course, and I've already noticed that bug as it fails on not found. I got the bounds wrong. I had the same "bug" when I wrote my search function on the project at work. I also found out that using the simplest loop is the generally the best way to go. The processors nowaday have loop-caches, fast level1 caches etc, all the fancy search algorithms like Boyer-Moore etc. can only overcome their memory and initialisation overhead for really huge haystacks. Here's the C code of my memmem() function that replaced the Boyer-Moore I had before. This implementation outran the BM search for sizes up to several hundreds megaytes on SPARC64 VII+ machine. On x86-64 I'm sure it would even be worse. void * memmem(const void *haystack, size_t haystack_len, const void *needle, size_t needle_len) { if(!haystack_len || !needle_len || needle_len > haystack_len) return NULL; uint8_t u8 = *((uint8_t *)needle); const uint8_t *p = (const uint8_t *) haystack; for(size_t i = haystack_len - needle_len + 1; i; --i) { if(*p == u8 && !memcmp(p, needle, needle_len)) return (void *)p; else p++; } return NULL; } (sorry for posting C code, as I'm only beginning to learn D).

On Friday, 27 May 2016 at 14:41:29 UTC, Chris wrote: > The improved `std find` comes very close to the manual function above now, sometimes it's even faster or at least as fast. > > std find took 12573666 > manual find took 7418454 > my std find took 6903854 <=== > findStringS took 7166720 > findStringS_ took 6579529 <=== I just updated my benchmark. It now checks lots of different scenarios instead of you having to specify one. You can only set the number of iterations now. It generates a random scenario (haystack length, needle length, alphabet, etc) and runs it once with all algorithms. Instead of recording the absolute runtime, it records the slowdown compared to the fastest of the three. Average those and also compute the absolute deviation (the second number). Example: std find: 153 ±33 manual find: 112 ±19 my std find: 119 ±16 So manual find is on average 12% slower than the fastest ±19%. I would interpret that as no significant difference between manual and improved find.

Forums