May 30, 2016
On Monday, 30 May 2016 at 18:20:39 UTC, Andrei Alexandrescu wrote:
>
> Please throw this hat into the ring as well: it should improve average search on large vocabulary dramatically.
>
> https://dpaste.dzfl.pl/dc8dc6e1eb53
>
> It uses a BM-inspired trick - once the last characters matched, if the match subsequently fails it needn't start from the next character in the haystack. The "skip" is computed lazily and in a separate function so as to keep the loop tight. All in all a routine worth a look. I wanted to write this for a long time. -- Andrei

Cool. So far, my experience with separate functions has been that they slow down the loop. But this might do the trick.
May 30, 2016
On Monday, 30 May 2016 at 18:57:15 UTC, Chris wrote:
> On Monday, 30 May 2016 at 18:20:39 UTC, Andrei Alexandrescu wrote:
>>
>> Please throw this hat into the ring as well: it should improve average search on large vocabulary dramatically.
>>
>> https://dpaste.dzfl.pl/dc8dc6e1eb53
>>
>> It uses a BM-inspired trick - once the last characters matched, if the match subsequently fails it needn't start from the next character in the haystack. The "skip" is computed lazily and in a separate function so as to keep the loop tight. All in all a routine worth a look. I wanted to write this for a long time. -- Andrei
>
> Cool. So far, my experience with separate functions has been that they slow down the loop. But this might do the trick.

Congratulations!

DMD:

./benchmark.dmd
       std: 178 ±31    +36 (4475)  -29 (5344)
    manual: 167 ±46    +82 (2883)  -32 (7054)
      qznc: 114 ±7     +18 (1990)   -5 (7288)
     Chris: 228 ±49    +82 (3050)  -35 (6780)
    Andrei: 103 ±5     +47 ( 658)   -2 (9295)
(avg slowdown vs fastest; absolute deviation)
CPU ID: GenuineIntel Intel(R) Core(TM) i5-4200U CPU @ 1.60GHz

LDC:

       std: 184 ±19    +28 (3420)  -14 (6523)
    manual: 205 ±59   +120 (2463)  -39 (7443)
      qznc: 151 ±25    +44 (2983)  -17 (6911)
     Chris: 194 ±57    +78 (3702)  -46 (6251)
    Andrei: 101 ±2     +42 ( 435)   -1 (9542)
 (avg slowdown vs fastest; absolute deviation)
CPU ID: GenuineIntel Intel(R) Core(TM) i5-4200U CPU @ 1.60GHz
May 30, 2016
On 05/30/2016 04:00 PM, Chris wrote:
> ./benchmark.dmd
>         std: 178 ±31    +36 (4475)  -29 (5344)
>      manual: 167 ±46    +82 (2883)  -32 (7054)
>        qznc: 114 ±7     +18 (1990)   -5 (7288)
>       Chris: 228 ±49    +82 (3050)  -35 (6780)
>      Andrei: 103 ±5     +47 ( 658)   -2 (9295)
> (avg slowdown vs fastest; absolute deviation)
> CPU ID: GenuineIntel Intel(R) Core(TM) i5-4200U CPU @ 1.60GHz
>
> LDC:
>
>         std: 184 ±19    +28 (3420)  -14 (6523)
>      manual: 205 ±59   +120 (2463)  -39 (7443)
>        qznc: 151 ±25    +44 (2983)  -17 (6911)
>       Chris: 194 ±57    +78 (3702)  -46 (6251)
>      Andrei: 101 ±2     +42 ( 435)   -1 (9542)
>   (avg slowdown vs fastest; absolute deviation)
> CPU ID: GenuineIntel Intel(R) Core(TM) i5-4200U CPU @ 1.60GHz

Thanks for looking into this! @qznc, could you please look into updating https://github.com/dlang/phobos/pull/4362 with this result? One possible tweak is see whether replacing the function call with inline code helps. Thanks! -- Andrei
May 30, 2016
On Monday, 30 May 2016 at 20:08:46 UTC, Andrei Alexandrescu wrote:
> On 05/30/2016 04:00 PM, Chris wrote:
>> ./benchmark.dmd
>>         std: 178 ±31    +36 (4475)  -29 (5344)
>>      manual: 167 ±46    +82 (2883)  -32 (7054)
>>        qznc: 114 ±7     +18 (1990)   -5 (7288)
>>       Chris: 228 ±49    +82 (3050)  -35 (6780)
>>      Andrei: 103 ±5     +47 ( 658)   -2 (9295)
>> (avg slowdown vs fastest; absolute deviation)
>> CPU ID: GenuineIntel Intel(R) Core(TM) i5-4200U CPU @ 1.60GHz
>>
>> LDC:
>>
>>         std: 184 ±19    +28 (3420)  -14 (6523)
>>      manual: 205 ±59   +120 (2463)  -39 (7443)
>>        qznc: 151 ±25    +44 (2983)  -17 (6911)
>>       Chris: 194 ±57    +78 (3702)  -46 (6251)
>>      Andrei: 101 ±2     +42 ( 435)   -1 (9542)
>>   (avg slowdown vs fastest; absolute deviation)
>> CPU ID: GenuineIntel Intel(R) Core(TM) i5-4200U CPU @ 1.60GHz
>
> Thanks for looking into this! @qznc, could you please look into updating https://github.com/dlang/phobos/pull/4362 with this result? One possible tweak is see whether replacing the function call with inline code helps. Thanks! -- Andrei

What function call should be replaced with inline code?

Overall, I'm very confused and displeased by the benchmark right now. Initially, the problem was that a naive manual find was slower than std find in Phobos. Look at the LDC numbers–manual find is slower than std find. By tweaking the benchmark, the numbers can be shifted around arbitrarily. Even without tweaking, it is very processor dependent. Here are the numbers from my laptop:

./benchmark.ldc
       std: 149 ±24    +29 (4275)  -21 (5604)
    manual: 129 ±37    +90 (2055)  -23 (7917)
      qznc: 131 ±22    +26 (4381)  -20 (5488)
     Chris: 154 ±35    +81 (2192)  -22 (7672)
    Andrei: 118 ±28    +82 (1762)  -16 (8194)
 (avg slowdown vs fastest; absolute deviation)
CPU ID: GenuineIntel Intel(R) Core(TM) i7 CPU       M 620  @ 2.67GHz

./benchmark.dmd
       std: 143 ±22    +26 (4219)  -19 (5602)
    manual: 158 ±31    +47 (3436)  -24 (6492)
      qznc: 101 ±2     +12 (1349)   -1 (7851)
     Chris: 216 ±38    +56 (3393)  -28 (6541)
    Andrei: 135 ±31    +48 (3326)  -24 (6589)
 (avg slowdown vs fastest; absolute deviation)
CPU ID: GenuineIntel Intel(R) Core(TM) i7 CPU       M 620  @ 2.67GHz

And Desktop:

./benchmark.ldc
       std: 129 ±24    +40 (3121)  -17 (6767)
    manual: 129 ±31    +59 (2668)  -21 (7244)
      qznc: 112 ±14    +30 (2542)   -9 (7312)
     Chris: 134 ±33    +58 (2835)  -23 (7068)
    Andrei: 123 ±27    +53 (2679)  -18 (7225)
 (avg slowdown vs fastest; absolute deviation)
CPU ID: GenuineIntel Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz

./benchmark.dmd
       std: 157 ±31    +44 (3693)  -24 (6234)
    manual: 143 ±41    +73 (2854)  -28 (7091)
      qznc: 116 ±21    +35 (3092)  -14 (6844)
     Chris: 181 ±50    +74 (3452)  -38 (6510)
    Andrei: 136 ±38    +64 (2975)  -27 (6953)
 (avg slowdown vs fastest; absolute deviation)
CPU ID: GenuineIntel Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz

May 30, 2016
On 05/30/2016 06:16 PM, qznc wrote:
> What function call should be replaced with inline code?

The call to computeSkip.

> Overall, I'm very confused and displeased by the benchmark right now.

I agree it's difficult to characterize the behavior of substring search with one number. There are many dimensions of variation. (But there's no reason for an emotional response.) A few possible baselines come to mind:

* Search a long string for a one-character string, match and fail.

* Take an English text string. Search for a substring consisting of its last portion (e.g. 5% of the length).

* Take an English text string. Search for a substring consisting of a fraction of the text (e.g. 3%) with additional characters prepended. Repeat for appended.


Andrei

May 31, 2016
On Monday, 30 May 2016 at 22:16:27 UTC, qznc wrote:
>
> And Desktop:
>
> ./benchmark.ldc
>        std: 129 ±24    +40 (3121)  -17 (6767)
>     manual: 129 ±31    +59 (2668)  -21 (7244)
>       qznc: 112 ±14    +30 (2542)   -9 (7312)
>      Chris: 134 ±33    +58 (2835)  -23 (7068)
>     Andrei: 123 ±27    +53 (2679)  -18 (7225)
>  (avg slowdown vs fastest; absolute deviation)
> CPU ID: GenuineIntel Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
>
> ./benchmark.dmd
>        std: 157 ±31    +44 (3693)  -24 (6234)
>     manual: 143 ±41    +73 (2854)  -28 (7091)
>       qznc: 116 ±21    +35 (3092)  -14 (6844)
>      Chris: 181 ±50    +74 (3452)  -38 (6510)
>     Andrei: 136 ±38    +64 (2975)  -27 (6953)
>  (avg slowdown vs fastest; absolute deviation)
> CPU ID: GenuineIntel Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz

Benchmark from desktop machine:

DMD:
       std: 164 ±34    +43 (4054)  -29 (5793)
    manual: 150 ±41    +72 (2889)  -29 (7032)
      qznc: 103 ±6     +42 ( 878)   -2 (9090)
     Chris: 205 ±43    +81 (2708)  -29 (7232)
    Andrei: 136 ±31    +53 (2948)  -22 (6977)
 (avg slowdown vs fastest; absolute deviation)
CPU ID: GenuineIntel Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz

===

LDC:

       std: 138 ±23    +35 (3457)  -18 (6360)
    manual: 145 ±33    +45 (3748)  -27 (6181)
      qznc: 105 ±7     +17 (2267)   -4 (7534)
     Chris: 135 ±33    +56 (3061)  -23 (6882)
    Andrei: 121 ±27    +52 (2630)  -18 (7301)
 (avg slowdown vs fastest; absolute deviation)
CPU ID: GenuineIntel Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz

On my laptop Andrei's was the fasted (see post above).
May 31, 2016
On Tuesday, 31 May 2016 at 08:43:59 UTC, Chris wrote:
> On Monday, 30 May 2016 at 22:16:27 UTC, qznc wrote:
>>
>> And Desktop:
>>
>> ./benchmark.ldc
>>        std: 129 ±24    +40 (3121)  -17 (6767)
>>     manual: 129 ±31    +59 (2668)  -21 (7244)
>>       qznc: 112 ±14    +30 (2542)   -9 (7312)
>>      Chris: 134 ±33    +58 (2835)  -23 (7068)
>>     Andrei: 123 ±27    +53 (2679)  -18 (7225)
>>  (avg slowdown vs fastest; absolute deviation)
>> CPU ID: GenuineIntel Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
>>
>> ./benchmark.dmd
>>        std: 157 ±31    +44 (3693)  -24 (6234)
>>     manual: 143 ±41    +73 (2854)  -28 (7091)
>>       qznc: 116 ±21    +35 (3092)  -14 (6844)
>>      Chris: 181 ±50    +74 (3452)  -38 (6510)
>>     Andrei: 136 ±38    +64 (2975)  -27 (6953)
>>  (avg slowdown vs fastest; absolute deviation)
>> CPU ID: GenuineIntel Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
>
> Benchmark from desktop machine:
>
> DMD:
>        std: 164 ±34    +43 (4054)  -29 (5793)
>     manual: 150 ±41    +72 (2889)  -29 (7032)
>       qznc: 103 ±6     +42 ( 878)   -2 (9090)
>      Chris: 205 ±43    +81 (2708)  -29 (7232)
>     Andrei: 136 ±31    +53 (2948)  -22 (6977)
>  (avg slowdown vs fastest; absolute deviation)
> CPU ID: GenuineIntel Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
>
> ===
>
> LDC:
>
>        std: 138 ±23    +35 (3457)  -18 (6360)
>     manual: 145 ±33    +45 (3748)  -27 (6181)
>       qznc: 105 ±7     +17 (2267)   -4 (7534)
>      Chris: 135 ±33    +56 (3061)  -23 (6882)
>     Andrei: 121 ±27    +52 (2630)  -18 (7301)
>  (avg slowdown vs fastest; absolute deviation)
> CPU ID: GenuineIntel Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
>
> On my laptop Andrei's was the fasted (see post above).

Comparing the chips involved, could it be cache related?

3M cache; Andrei wins both:
http://ark.intel.com/products/75459/Intel-Core-i5-4200U-Processor-3M-Cache-up-to-2_60-GHz

4M cache; qznc wins DMD (and is faster than the LDC's best? What?); Andrei wins LDC:
http://ark.intel.com/products/43560/Intel-Core-i7-620M-Processor-4M-Cache-2_66-GHz

8M cache; qznc wins both:
http://ark.intel.com/products/65719/Intel-Core-i7-3770-Processor-8M-Cache-up-to-3_90-GHz
http://ark.intel.com/products/75122/Intel-Core-i7-4770-Processor-8M-Cache-up-to-3_90-GHz

Normally, I'd expect the 4200U to be similar to the desktop parts.  Unless...

Say, for the laptops (and I guess the desktops too, but it's more important in a mobile), did you verify the CPU frequency scaling wasn't interfering?

-Wyatt
May 31, 2016
On Tuesday, 31 May 2016 at 17:31:29 UTC, Wyatt wrote:
> qznc wins DMD (and is faster than the LDC's best?

Careful! These are not absolute numbers, but relative slowdowns. You cannot compare the numbers between LDC and DMD.



May 31, 2016
On Tuesday, 31 May 2016 at 01:55:16 UTC, Andrei Alexandrescu wrote:
> I agree it's difficult to characterize the behavior of substring search with one number. There are many dimensions of variation. (But there's no reason for an emotional response.) A few possible baselines come to mind:
>
> * Search a long string for a one-character string, match and fail.

There is a special version of find for searching a single char in a string. Using a one-letter needle string is more like a user mistake than something to optimize for.

> * Take an English text string. Search for a substring consisting of its last portion (e.g. 5% of the length).

How long should the english text be? A Tweet? A book? A Gigabyte of log files?

English text means basically ASCII and no Unicode?

> * Take an English text string. Search for a substring consisting of a fraction of the text (e.g. 3%) with additional characters prepended. Repeat for appended.

Why the prepend/append? To force a mismatch?


May 31, 2016
On 5/31/16 1:54 PM, qznc wrote:
> Using a one-letter needle string is more like a user mistake than
> something to optimize for.

How is splitting on ',' a user mistake? -- Andrei