May 27, 2016
On 05/27/2016 04:39 PM, qznc wrote:
> Now added Chris' algo to the benchmark:
>
> std find:    155 ±33
> manual find: 112 ±19
> qznc find:   122 ±18  <--- improved find
> Chris find:  133 ±20  <--- findStringS_

Does any of these feature simultaneously:

* check the last element first
* consequently stop early when the length of the haystack falls below the length of the needle
* consequently no check against the haystack limit is necessary in the inner loop
* use only one induction variable

?


Andrei
May 27, 2016
On Friday, 27 May 2016 at 20:50:52 UTC, Andrei Alexandrescu wrote:
> On 05/27/2016 04:39 PM, qznc wrote:
>> Now added Chris' algo to the benchmark:
>>
>> std find:    155 ±33
>> manual find: 112 ±19
>> qznc find:   122 ±18  <--- improved find
>> Chris find:  133 ±20  <--- findStringS_
>
> Does any of these feature simultaneously:
>
> * check the last element first
> * consequently stop early when the length of the haystack falls below the length of the needle
> * consequently no check against the haystack limit is necessary in the inner loop

Std find (currently in Phobos) and qznc find (see pull request) fulfill those three. Manual and Chris find are simpler.

> * use only one induction variable

None of those. I made such a variant of qznc find and it gives me the same numbers.

May 27, 2016
On Friday, 27 May 2016 at 14:06:09 UTC, Chris wrote:
> I have to correct myself. A manual loop is faster, I couldn't believe it either, so I benchmarked the same function with `foreach` and `for`. Turns out that `for` is _way_ faster.

Just about the only reason I could think of for this to happen is if the compiler fails to inline the range primitives from std.array. Otherwise, the loops should be pretty much equivalent to LLVM's optimiser.

This is so fundamental to D's strategic promise that we can't afford to get it wrong.

 — David
May 27, 2016
On Friday, 27 May 2016 at 18:21:06 UTC, Andrei Alexandrescu wrote:
> What you want to do here for indexed access is to match the last element first. If no match, continue etc. If there's a match, enter an inner loop where you don't need to check for the end of the haystack. -- Andrei

Another plot twist: Sentinels look interesting.

Wilzbach [0] mentioned it in the pull request:
"Btw on a related issue - it seems like no one has actually bother to implement Andrei's "Fastware" sentinels in Phobos. If you are already in benchmarking, you might give this specialization for mutable RandomAccessRanges a try ;-)"

I tried it [1]. It changes the numbers from

std find:    153 ±32
manual find: 112 ±19
qznc find:   121 ±18
Chris find:  132 ±20

to

std find:    160 ±31
manual find: 118 ±24
qznc find:   109 ±13  <--- using the sentinel trick
Chris find:  142 ±27

It is normal that the numbers of the other tests change, since those are relative to the fastest one in each run. When qznc find 'wins' more often, the others get more slowdown.

This means another special case for mutable ranges could be worthwhile.

[0] https://github.com/dlang/phobos/pull/4362#issuecomment-222224387
[1] https://github.com/qznc/find-is-too-slow/commit/043d1d2f6dced625e964e8df883ad5a45d7fe654
May 27, 2016
On 05/27/2016 06:19 PM, qznc wrote:
> manual find: 118 ±24
> qznc find:   109 ±13  <--- using the sentinel trick
> Chris find:  142 ±27
>
> It is normal that the numbers of the other tests change, since those are
> relative to the fastest one in each run. When qznc find 'wins' more
> often, the others get more slowdown.

What compiler?

> This means another special case for mutable ranges could be worthwhile.

Also mind the throwing possibilities.


Andrei
May 28, 2016
On Friday, 27 May 2016 at 20:39:03 UTC, qznc wrote:
> Now added Chris' algo to the benchmark:
>
> std find:    155 ±33
> manual find: 112 ±19
> qznc find:   122 ±18  <--- improved find
> Chris find:  133 ±20  <--- findStringS_

Did you fix the bugs in my algorithm? S
May 28, 2016
On Saturday, 28 May 2016 at 01:30:11 UTC, Andrei Alexandrescu wrote:
> On 05/27/2016 06:19 PM, qznc wrote:
>> manual find: 118 ±24
>> qznc find:   109 ±13  <--- using the sentinel trick
>> Chris find:  142 ±27
>>
>> It is normal that the numbers of the other tests change, since those are
>> relative to the fastest one in each run. When qznc find 'wins' more
>> often, the others get more slowdown.
>
> What compiler?

LDC. For DMD the numbers show no improvement:

std find:    170 ±31
manual find: 132 ±25
qznc find:   103 ±6
Chris find:  157 ±30

to

std find:    168 ±30
manual find: 131 ±25
qznc find:   104 ±8    <--- sentinel
Chris find:  155 ±29

(Caveat: Chris find has a bug, which is triggered once in the 10000 runs each.)

The sentinel trick is only proof of concept. The sentinel value is `needleBack+1`, but range elements need not support addition. Finding a sentinel is hard and most certainly requires more assumptions about the ranges.

>> This means another special case for mutable ranges could be worthwhile.
>
> Also mind the throwing possibilities.

What do you mean? Using exceptions?


If anybody wants to replicate. It only takes a few seconds and there are no dependencies except dmd/ldc.

  git clone https://github.com/qznc/find-is-too-slow
  cd find-is-too-slow
  make dmd
  make ldc
May 28, 2016
On 5/28/16 6:56 AM, qznc wrote:
> The sentinel value is `needleBack+1`, but range elements need not
> support addition. Finding a sentinel is hard and most certainly requires
> more assumptions about the ranges.

No need for a sentinel really so long as you first search for the last element of the needle, in the haystack.

Allow me to put on the table a simple brute force routine, specialized for arrays with copyable elements, no custom predicate etc. Far as I can tell it does no redundant work:

T[] find(T)(T[] haystack, T[] needle)
{
    if (needle.length == 0) return haystack;
    immutable lastIndex = needle.length - 1;
    auto last = needle[lastIndex];
    size_t j = lastIndex;
    for (; j < haystack.length; ++j)
    {
        if (haystack[j] != last) continue;
        immutable k = j - lastIndex;
        // last elements match
        for (size_t i = 0; ; ++i)
        {
            if (i == lastIndex) return haystack[k .. $];
            if (needle[i] != haystack[k + i]) break;
        }
    }
    return haystack[$ .. $];
}

unittest
{
    string s1 = "hello abc world";
    assert(find(s1, "abc") == "abc world");
    assert(find(s1, "def") == "");
}

void main(){}


Andrei

May 28, 2016
On Friday, 27 May 2016 at 21:31:48 UTC, David Nadlinger wrote:
>
> Just about the only reason I could think of for this to happen is if the compiler fails to inline the range primitives from std.array. Otherwise, the loops should be pretty much equivalent to LLVM's optimiser.
>
> This is so fundamental to D's strategic promise that we can't afford to get it wrong.
>
>  — David

I agree, and it has been said on this forum that tight manual loops are faster than the range paradigm (I think bearophile was one of them, if I remember correctly). Maybe we should benchmark this too, just in case.

May 28, 2016
On Saturday, 28 May 2016 at 12:47:59 UTC, Andrei Alexandrescu wrote:
> On 5/28/16 6:56 AM, qznc wrote:
>> The sentinel value is `needleBack+1`, but range elements need not
>> support addition. Finding a sentinel is hard and most certainly requires
>> more assumptions about the ranges.
>
> No need for a sentinel really so long as you first search for the last element of the needle, in the haystack.
>
> Allow me to put on the table a simple brute force routine, specialized for arrays with copyable elements, no custom predicate etc. Far as I can tell it does no redundant work:
>
> T[] find(T)(T[] haystack, T[] needle)
> {
>     if (needle.length == 0) return haystack;
>     immutable lastIndex = needle.length - 1;
>     auto last = needle[lastIndex];
>     size_t j = lastIndex;
>     for (; j < haystack.length; ++j)
>     {
>         if (haystack[j] != last) continue;
>         immutable k = j - lastIndex;
>         // last elements match
>         for (size_t i = 0; ; ++i)
>         {
>             if (i == lastIndex) return haystack[k .. $];
>             if (needle[i] != haystack[k + i]) break;
>         }
>     }
>     return haystack[$ .. $];
> }
>
> unittest
> {
>     string s1 = "hello abc world";
>     assert(find(s1, "abc") == "abc world");
>     assert(find(s1, "def") == "");
> }
>
> void main(){}
>
>
> Andrei

I included your function (I also fixed the bug in my findStringS_). Here's a typical result:

std find:    208 ±61
manual find: 127 ±29
qznc find:   108 ±13
Chris find:  140 ±32
Andrei find: 126 ±27
=====
std find:    207 ±59
manual find: 128 ±30
qznc find:   108 ±13
Chris find:  141 ±32
Andrei find: 125 ±27

I used dmd, because I don't have ldc on my laptop. qznc's find is clearly the winner.