May 31, 2016
On 5/31/16 1:54 PM, qznc wrote:
> On Tuesday, 31 May 2016 at 01:55:16 UTC, Andrei Alexandrescu wrote:
>> I agree it's difficult to characterize the behavior of substring
>> search with one number. There are many dimensions of variation. (But
>> there's no reason for an emotional response.) A few possible baselines
>> come to mind:
>>
>> * Search a long string for a one-character string, match and fail.
>
> There is a special version of find for searching a single char in a
> string. Using a one-letter needle string is more like a user mistake
> than something to optimize for.

Oh, I see what you mean - it's a string consisting only of one character, not one character 'x'. I agree, but I'm just saying it's a sound baseline.

>> * Take an English text string. Search for a substring consisting of
>> its last portion (e.g. 5% of the length).
>
> How long should the english text be? A Tweet? A book? A Gigabyte of log
> files?

I'm thinking of a few exponentially increasing sizes. e.g. 100, 1000, 10_000, 100_000, 1_000_000.

> English text means basically ASCII and no Unicode?

My understanding (and correct me if I'm wrong) is that we're discussing search with the default predicate, ignoring equivalent grapheme representations etc. So there's no need to mind encoding at all, which means the text could be in any language. Foreign languages would further increase the vocabulary, but that's the only effect.

>> * Take an English text string. Search for a substring consisting of a
>> fraction of the text (e.g. 3%) with additional characters prepended.
>> Repeat for appended.
>
> Why the prepend/append? To force a mismatch?

Yah, and it's a good test for strings that have some equal portion yet they are different, which seems to me puts a greater strain on the algorithms. It is arguably a likely situation in the real world.


Thanks,

Andrei

May 31, 2016
On Tuesday, 31 May 2016 at 18:31:47 UTC, Andrei Alexandrescu wrote:
> On 5/31/16 1:54 PM, qznc wrote:
>> Using a one-letter needle string is more like a user mistake than
>> something to optimize for.
>
> How is splitting on ',' a user mistake? -- Andrei

The mistake is to split on "," instead of ','. The slow splitter at the start of this thread searches for "\r\n".

Your lazy-skip algorithm looks great!

./benchmark.ldc
Search in Alice in Wonderland
       std: 168 ±6     +29 ( 107)   -3 ( 893)
    manual: 112 ±3     +28 (  81)   -1 ( 856)
      qznc: 149 ±4     +30 (  79)   -1 ( 898)
     Chris: 142 ±5     +28 ( 102)   -2 ( 898)
    Andrei: 153 ±3     +28 (  79)   -1 ( 919)
   Andrei2: 101 ±2     +34 (  31)   -1 ( 969)
Search random haystack with random needle
       std: 172 ±19    +61 ( 161)  -11 ( 825)
    manual: 161 ±47    +73 ( 333)  -35 ( 666)
      qznc: 163 ±21    +33 ( 314)  -15 ( 661)
     Chris: 190 ±47    +80 ( 302)  -33 ( 693)
    Andrei: 140 ±37    +60 ( 315)  -27 ( 676)
   Andrei2: 103 ±6     +57 (  64)   -2 ( 935)
 (avg slowdown vs fastest; absolute deviation)
CPU ID: GenuineIntel Intel(R) Core(TM) i7 CPU       M 620  @ 2.67GHz

The Alice benchmark searches Alice in Wonderland for "find a pleasure in all their simple joys" and
finds it in the last sentence.
May 31, 2016
On Tuesday, 31 May 2016 at 18:56:14 UTC, qznc wrote:

>
> The mistake is to split on "," instead of ','. The slow splitter at the start of this thread searches for "\r\n".
>
> Your lazy-skip algorithm looks great!
>
> ./benchmark.ldc
> Search in Alice in Wonderland
>        std: 168 ±6     +29 ( 107)   -3 ( 893)
>     manual: 112 ±3     +28 (  81)   -1 ( 856)
>       qznc: 149 ±4     +30 (  79)   -1 ( 898)
>      Chris: 142 ±5     +28 ( 102)   -2 ( 898)
>     Andrei: 153 ±3     +28 (  79)   -1 ( 919)
>    Andrei2: 101 ±2     +34 (  31)   -1 ( 969)
> Search random haystack with random needle
>        std: 172 ±19    +61 ( 161)  -11 ( 825)
>     manual: 161 ±47    +73 ( 333)  -35 ( 666)
>       qznc: 163 ±21    +33 ( 314)  -15 ( 661)
>      Chris: 190 ±47    +80 ( 302)  -33 ( 693)
>     Andrei: 140 ±37    +60 ( 315)  -27 ( 676)
>    Andrei2: 103 ±6     +57 (  64)   -2 ( 935)
>  (avg slowdown vs fastest; absolute deviation)
> CPU ID: GenuineIntel Intel(R) Core(TM) i7 CPU       M 620  @ 2.67GHz
>
> The Alice benchmark searches Alice in Wonderland for "find a pleasure in all their simple joys" and
> finds it in the last sentence.

Would it speed things up even more, if we put the function `computeSkip` into the loop or is this done automatically by the compiler?
May 31, 2016
On Tuesday, 31 May 2016 at 19:29:25 UTC, Chris wrote:
> Would it speed things up even more, if we put the function `computeSkip` into the loop or is this done automatically by the compiler?

LDC inlines it. DMD does not.

More numbers:

./benchmark.ldc
Search in Alice in Wonderland
       std: 147 ±1
    manual: 100 ±0
      qznc: 121 ±1
     Chris: 103 ±1
    Andrei: 144 ±1
   Andrei2: 105 ±1
Search in random short strings
       std: 125 ±15
    manual: 117 ±10
      qznc: 104 ±6
     Chris: 123 ±14
    Andrei: 104 ±5
   Andrei2: 103 ±4
Mismatch in random long strings
       std: 140 ±22
    manual: 164 ±64
      qznc: 115 ±13
     Chris: 167 ±63
    Andrei: 161 ±68
   Andrei2: 106 ±9
Search random haystack with random needle
       std: 138 ±27
    manual: 135 ±33
      qznc: 116 ±16
     Chris: 141 ±36
    Andrei: 131 ±33
   Andrei2: 109 ±12
 (avg slowdown vs fastest; absolute deviation)
CPU ID: GenuineIntel Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz

Random short strings has haystacks of 10 to 300 characters and needles of 2 to 10. Basically, no time for initialisation.

Random long strings has haystacks of size 1000, 10_000, 100_000, or 1_000_000 and needles 50 to 500. It inserts a character into a random index of the needle to force a mismatch.

The last one is the configuration as before.

Overall, Andrei2 (the lazy compute skip) is really impressive. :)
May 31, 2016
On Tuesday, 31 May 2016 at 19:59:50 UTC, qznc wrote:
> On Tuesday, 31 May 2016 at 19:29:25 UTC, Chris wrote:
>> Would it speed things up even more, if we put the function `computeSkip` into the loop or is this done automatically by the compiler?
>
> LDC inlines it. DMD does not.
>
> More numbers:
>
> ./benchmark.ldc
> Search in Alice in Wonderland
>        std: 147 ±1
>     manual: 100 ±0
>       qznc: 121 ±1
>      Chris: 103 ±1
>     Andrei: 144 ±1
>    Andrei2: 105 ±1
> Search in random short strings
>        std: 125 ±15
>     manual: 117 ±10
>       qznc: 104 ±6
>      Chris: 123 ±14
>     Andrei: 104 ±5
>    Andrei2: 103 ±4
> Mismatch in random long strings
>        std: 140 ±22
>     manual: 164 ±64
>       qznc: 115 ±13
>      Chris: 167 ±63
>     Andrei: 161 ±68
>    Andrei2: 106 ±9
> Search random haystack with random needle
>        std: 138 ±27
>     manual: 135 ±33
>       qznc: 116 ±16
>      Chris: 141 ±36
>     Andrei: 131 ±33
>    Andrei2: 109 ±12
>  (avg slowdown vs fastest; absolute deviation)
> CPU ID: GenuineIntel Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
>
> Random short strings has haystacks of 10 to 300 characters and needles of 2 to 10. Basically, no time for initialisation.
>
> Random long strings has haystacks of size 1000, 10_000, 100_000, or 1_000_000 and needles 50 to 500. It inserts a character into a random index of the needle to force a mismatch.
>
> The last one is the configuration as before.
>
> Overall, Andrei2 (the lazy compute skip) is really impressive. :)

Yep. It's really impressive. I actually thought that dmd didn't place `computeSkip` inside of the loop. This begs the question if it should be moved to the loop, in case we use it in Phobos, to make sure that it is as fast as possible even with dmd. However, I like it the way it is now.

`Adrei2` is that it performs consistently well.
May 31, 2016
On 05/31/2016 04:18 PM, Chris wrote:
> I actually thought that dmd didn't place
> `computeSkip` inside of the loop. This begs the question if it should be
> moved to the loop, in case we use it in Phobos, to make sure that it is
> as fast as possible even with dmd. However, I like it the way it is now.

You may want to then try https://dpaste.dzfl.pl/392710b765a9, which generates inline code on all compilers. -- Andrei
May 31, 2016
On Tuesday, 31 May 2016 at 21:29:34 UTC, Andrei Alexandrescu wrote:
> You may want to then try https://dpaste.dzfl.pl/392710b765a9, which generates inline code on all compilers. -- Andrei

In general, it might be beneficial to use ldc.intrinsics.llvm_expect (cf. __builtin_expect) for things like that in order to optimise basic block placement. (We should probably have a compiler-independent API for that in core.*, by the way.) In this case, the skip computation path is probably small enough for that not to matter much, though.

Another thing that might be interesting to do (now that you have a "clever" baseline) is to start counting cycles and make some comparisons against manual asm/intrinsics implementations. For short(-ish) needles, PCMPESTRI is probably the most promising candidate, although I suspect that for \r\n scanning in long strings in particular, an optimised AVX2 solution might have higher throughput.

Of course these observations are not very valuable without backing them up with measurements, but it seems like before optimising a generic search algorithm for short-needle test cases, having one's eyes on a solid SIMD baseline would be a prudent thing to do.

 — David
June 01, 2016
On Tuesday, 31 May 2016 at 21:29:34 UTC, Andrei Alexandrescu wrote:
>
> You may want to then try https://dpaste.dzfl.pl/392710b765a9, which generates inline code on all compilers. -- Andrei

I've added it as `Andrei3`. It runs faster with dmd, but it's not as good with ldc. Seems like ldc performs some extra optimization when moving `computeSkip` into the loop, something it doesn't bother to do when it's already there.

dmd -O -release -inline -noboundscheck *.d -ofbenchmark.dmd
./benchmark.dmd
Search in Alice in Wonderland
       std: 190 ±4
    manual: 115 ±4
      qznc: 106 ±2
     Chris: 160 ±4
    Andrei: 159 ±4
   Andrei2: 108 ±2
   Andrei3: 100 ±0
Search in random short strings
       std: 222 ±27
    manual: 193 ±49
      qznc: 120 ±12
     Chris: 224 ±57
    Andrei: 114 ±9
   Andrei2: 106 ±5
   Andrei3: 102 ±3
Mismatch in random long strings
       std: 186 ±28
    manual: 206 ±85
      qznc: 118 ±14
     Chris: 265 ±104
    Andrei: 194 ±85
   Andrei2: 116 ±18
   Andrei3: 102 ±4
Search random haystack with random needle
       std: 189 ±38
    manual: 171 ±45
      qznc: 118 ±11
     Chris: 225 ±52
    Andrei: 149 ±32
   Andrei2: 110 ±7
   Andrei3: 103 ±6
 (avg slowdown vs fastest; absolute deviation)
CPU ID: GenuineIntel Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz

./benchmark.ldc
Search in Alice in Wonderland
       std: 170 ±1
    manual: 143 ±2
      qznc: 133 ±1
     Chris: 144 ±1
    Andrei: 196 ±10
   Andrei2: 100 ±0
   Andrei3: 111 ±1
Search in random short strings
       std: 223 ±30
    manual: 211 ±51
      qznc: 124 ±12
     Chris: 223 ±61
    Andrei: 115 ±10
   Andrei2: 102 ±3
   Andrei3: 105 ±4
Mismatch in random long strings
       std: 181 ±17
    manual: 253 ±109
      qznc: 146 ±24
     Chris: 248 ±108
    Andrei: 228 ±96
   Andrei2: 101 ±2
   Andrei3: 108 ±6
Search random haystack with random needle
       std: 187 ±22
    manual: 208 ±60
      qznc: 152 ±27
     Chris: 202 ±58
    Andrei: 173 ±35
   Andrei2: 102 ±4
   Andrei3: 110 ±8
 (avg slowdown vs fastest; absolute deviation)
CPU ID: GenuineIntel Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz

June 01, 2016
On Tuesday, 31 May 2016 at 17:54:34 UTC, qznc wrote:
> There is a special version of find for searching a single char in a string. Using a one-letter needle string is more like a user mistake than something to optimize for.

At compile time you may not know the length of the needle, like in the grep command.


June 01, 2016
On Wednesday, 1 June 2016 at 12:14:07 UTC, Patrick Schluter wrote:
> On Tuesday, 31 May 2016 at 17:54:34 UTC, qznc wrote:
>> There is a special version of find for searching a single char in a string. Using a one-letter needle string is more like a user mistake than something to optimize for.
>
> At compile time you may not know the length of the needle, like in the grep command.

1) how about a CTFE find?

s.find!(needle, pred)

If we can initialize boyer-moore or KMP at compile time - it should be the fastest!

At ndslice such functions are called bifacial:

http://dlang.org/phobos/std_experimental_ndslice_iteration.html

Imho a lot more in std.algorithm should be able to profit from facts known at compile-time.

2) Even for a runtime runtime one-letter needle I am pretty sure it's worth to specialize