May 27, 2016
On Friday, 27 May 2016 at 13:29:00 UTC, Andrei Alexandrescu wrote:
> On 5/27/16 8:28 AM, Chris wrote:
>
> This is surprising. It should be easy to construct examples where brute force search performs arbitrarily poor, e.g. searching for "aaaaaaa...aab" in a long string with only "a"s.

I will look into this. Atm, I'm using qznc's benchmark.

>> NB: I've found that `foreach` is usually faster than a manual `for`-loop.
>
> That's surprising too. I'd be interested to look at a disassembly if you have one.

I have to correct myself. A manual loop is faster, I couldn't believe it either, so I benchmarked the same function with `foreach` and `for`. Turns out that `for` is _way_ faster.

>> I used qznc's benchmarking, and it's clear that the current
>> implementation of std.algorithm.find is quite a poor performer. We do
>> have to work on that. Library functions should be fast, no matter what.
>> If someone uses `find`, they wanna be sure it's optimized for speed.
>
> That's definitely the way to go. Thanks for looking into it!
>
>
> Andrei

There is a bug, if it is one, in qznc's `manual_find`. It doesn't find strings at the end of a string, as "ing" in "string", and it does not find a string that is equal to the search string as in "abba" == "abba".

This outperforms both `manual_find` and the improved `std find`

string findStringS_Manual(string haystack, string needle)
{
	
	if (needle.length > haystack.length)
		return haystack[$..$];
	outer:
	for (auto i = 0; i < haystack.length; i++)
	{
		if (haystack[i] != needle[0])
			continue;
		for (size_t j = i+1, k = 1; k < needle.length; ++j, ++k)
			if (haystack[j] != needle[k])
			continue outer;
		return haystack[i..$];
	}
	return haystack[$..$];
}

A typical example would be:

std find    	took     25642297
manual find 	took      7426676  // manual_find
my std find 	took      6654496  // improved find
findStringS 	took      7159556  // foreach(i, c; haystack)
findStringS_  	took      5315498  // for (auto i = 0 ...) see above

I think it's clear that `std find` is veeeery slow. If anyone wants to test my function, please do so. I don't want to spread wrong data, but this is what I get at the moment (ldc latest version).
May 27, 2016
On Friday, 27 May 2016 at 14:06:09 UTC, Chris wrote:
>
> This outperforms both `manual_find` and the improved `std find`
>
> string findStringS_Manual(string haystack, string needle)
> {
> 	
> 	if (needle.length > haystack.length)
> 		return haystack[$..$];
> 	outer:
> 	for (auto i = 0; i < haystack.length; i++)
> 	{
> 		if (haystack[i] != needle[0])
> 			continue;
> 		for (size_t j = i+1, k = 1; k < needle.length; ++j, ++k)
> 			if (haystack[j] != needle[k])
> 			continue outer;
> 		return haystack[i..$];
> 	}
> 	return haystack[$..$];
> }
>

Little bug fix:

for (size_t j = (i+1 < haystack.length) ? i+1 : i, k = 1; k < needle.length; ++j, ++k)

Else it will be out or range when you look for "za" and the string ends with "z". There is a slight performance penalty, but it's still faster than the rest. Maybe there are cleverer ways. But every little check and decision comes with a performance penalty.

The improved `std find` comes very close to the manual function above now, sometimes it's even faster or at least as fast.

std find    	took     12573666
manual find 	took      7418454
my std find 	took      6903854 <===
findStringS 	took      7166720
findStringS_  	took      6579529 <===

std find    	took     11849892
manual find 	took      7407091
my std find 	took      6573102 <===
findStringS 	took      7296610
findStringS_  	took      6573214 <===

std find    	took     10744361
manual find 	took      7576133
my std find 	took      6332014 <===
findStringS 	took      7224946
findStringS_  	took      6555976 <===


May 27, 2016
On Friday, 27 May 2016 at 14:06:09 UTC, Chris wrote:
> I think it's clear that `std find` is veeeery slow. If anyone wants to test my function, please do so. I don't want to spread wrong data, but this is what I get at the moment (ldc latest version).

If you want to see find (current or my improved version) get really slow, then reduce the alphabet for the haystack (e.g. --alphabet=ab). ;)
May 27, 2016
On Friday, 27 May 2016 at 14:41:29 UTC, Chris wrote:
> The improved `std find` comes very close to the manual function above now, sometimes it's even faster or at least as fast.

I think it really depends on the use case. The manual algorithm is really naive and std-find is slightly more advanced. My benchmark creates a random haystack and needle, but this is not real world data. I believe the "slightly more advanced" approach is a good idea, but there is no data to prove it.

It might be interesting what algorithms other language's standard libraries (C++, Python, Java, etc) use.

Thanks for looking at the benchmarking, Chris! The more eyes, the better. :)
May 27, 2016
On 5/27/16 10:41 AM, Chris wrote:
> The improved `std find` comes very close to the manual function above
> now, sometimes it's even faster or at least as fast.

What you want to do here for indexed access is to match the last element first. If no match, continue etc. If there's a match, enter an inner loop where you don't need to check for the end of the haystack. -- Andrei
May 27, 2016
On Friday, 27 May 2016 at 14:41:29 UTC, Chris wrote:
> On Friday, 27 May 2016 at 14:06:09 UTC, Chris wrote:
>>
>> This outperforms both `manual_find` and the improved `std find`
>>
>> string findStringS_Manual(string haystack, string needle)
>> {
>> 	
>> 	if (needle.length > haystack.length)
>> 		return haystack[$..$];
>> 	outer:
>> 	for (auto i = 0; i < haystack.length; i++)
>> 	{
>> 		if (haystack[i] != needle[0])
>> 			continue;
>> 		for (size_t j = i+1, k = 1; k < needle.length; ++j, ++k)
>> 			if (haystack[j] != needle[k])
>> 			continue outer;
>> 		return haystack[i..$];
>> 	}
>> 	return haystack[$..$];
>> }
>>
>
> Little bug fix:
>
> for (size_t j = (i+1 < haystack.length) ? i+1 : i, k = 1; k < needle.length; ++j, ++k)
>

The upper bound of the outer loop should be haystack.length-needle.length. There's no point in searching after that point as the needle would not fit anymore and it avoids the buffer overrun you have in your code when the first character of the needle is found after that point.
You can also remove then your ?: test in the j loop as that case is handled automatically then.
May 27, 2016
Oops, I've been found out. :-) Thanks. You're right of course, and I've already noticed that bug as it fails on not found. I got the bounds wrong.
May 27, 2016
On Friday, 27 May 2016 at 19:17:44 UTC, Chris wrote:
> Oops, I've been found out. :-) Thanks. You're right of course, and I've already noticed that bug as it fails on not found. I got the bounds wrong.

I had the same "bug" when I wrote my search function on the project at work. I also found out that using the simplest loop is the generally the best way to go. The processors nowaday have loop-caches, fast level1 caches etc, all the fancy search algorithms like Boyer-Moore etc. can only overcome their memory and initialisation overhead for really huge haystacks.

Here's the C code of my memmem() function that replaced the Boyer-Moore I had before. This implementation outran the BM search for sizes up to several hundreds megaytes on SPARC64 VII+ machine. On x86-64 I'm sure it would even be worse.

void * memmem(const void *haystack, size_t haystack_len, const void *needle, size_t needle_len)
{
  if(!haystack_len || !needle_len || needle_len > haystack_len)
    return NULL;

  uint8_t u8 = *((uint8_t *)needle);
  const uint8_t *p = (const uint8_t *) haystack;
  for(size_t i = haystack_len - needle_len + 1; i; --i) {
    if(*p == u8 && !memcmp(p, needle, needle_len))
      return (void *)p;
    else
      p++;
  }
  return NULL;
}

(sorry for posting C code, as I'm only beginning to learn D).
May 27, 2016
On Friday, 27 May 2016 at 14:41:29 UTC, Chris wrote:
> The improved `std find` comes very close to the manual function above now, sometimes it's even faster or at least as fast.
>
> std find    	took     12573666
> manual find 	took      7418454
> my std find 	took      6903854 <===
> findStringS 	took      7166720
> findStringS_  	took      6579529 <===

I just updated my benchmark. It now checks lots of different scenarios instead of you having to specify one. You can only set the number of iterations now.

It generates a random scenario (haystack length, needle length, alphabet, etc) and runs it once with all algorithms. Instead of recording the absolute runtime, it records the slowdown compared to the fastest of the three. Average those and also compute the absolute deviation (the second number).

Example:
std find:    153 ±33
manual find: 112 ±19
my std find: 119 ±16

So manual find is on average 12% slower than the fastest ±19%. I would interpret that as no significant difference between manual and improved find.
May 27, 2016
Now added Chris' algo to the benchmark:

std find:    155 ±33
manual find: 112 ±19
qznc find:   122 ±18  <--- improved find
Chris find:  133 ±20  <--- findStringS_

4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19