May 24, 2016
On 05/24/2016 08:38 AM, Andrei Alexandrescu wrote:
> One somewhat unrelated thing that can be done with Boyer-Moore: use a
> constant-length skip array instead of dynamic allocation. Upon
> saturation, continue with brute force. -- Andrei

Another idea: in every searched string there is one specific index that offers the most information to a failed search, allowing for a large skip. Roughly speaking it's the largest index around which there are many characters not seen elsewhere in the searched string. (In a string with all distinct characters, that would be the last character.) If that offset is cheap to compute and offers a good skip, a search starting from that index would be very fast. -- Andrei
May 24, 2016
On Tuesday, 24 May 2016 at 10:44:12 UTC, qznc wrote:
> Unfortunately, it is slower. My current measurements [0]:
>
> $ ./benchmark 10000000 10 # large haystack, few iterations
> std find    took    753837231
> manual find took    129561980
> $ ./benchmark 10 10000000 # small haystack, many iterations
> std find    took    721676721
> manual find took    216418870
>
> The nested-for-loop implementation is roughly 4 times faster!
>
> Caveat: I'm not completely sure my benchmark is fair yet.
>
> [0] https://github.com/qznc/find-is-too-slow

Plot twist: manual find is not always faster.

My benchmark now generates random haystack and needle. Sometimes Phobos clearly wins. Example:

$ ./benchmark.ldc -n39 -l10000 -i10000
Found at 10000
std find    took    289529102
manual find took    368958679

This is compiled with LDC, needle is 39 chars, haystack 10000 chars, and 10000 iterations. "Found at 10000" means the needle was not found.

Very often manual find wins though. Takeaway: Manual find is too naive. However, something slows down std find in general. More research needed.

PS: Any recommendations for a Linux dissassembler? Objdump is ok, but some arrows for jumps would be nice.
May 24, 2016
On 05/24/2016 03:47 PM, qznc wrote:
>
> PS: Any recommendations for a Linux dissassembler? Objdump is ok, but
> some arrows for jumps would be nice.

Try http://asm.dlang.org for short tests. -- Andrei
May 24, 2016
On Tuesday, 24 May 2016 at 19:47:52 UTC, qznc wrote:
> PS: Any recommendations for a Linux dissassembler? Objdump is ok, but some arrows for jumps would be nice.

I discovered radare2. While it is actually a debugger (like gdb), it includes a nice disassembler. Example view, which shows the loop calling std.algorithm.find repeatedly:

       |   0x00402e08,   0 |           nop [rax+rax+0x0]
      .--> 0x00402e10,   0 |           mov rdx, [rbx]
      ||   0x00402e13    0 |           mov rcx, [rbx+0x8]
      ||   0x00402e17    0 |           mov rdi, r15
      ||   0x00402e1a    0 |           mov rsi, r12
      ||   0x00402e1d    0 |           call 0x409160  ; 4 = sym._D3std9algorithm9searching34__T4fin
      ||   0x00402e22    0 |           mov rcx, [rbx]
      ||   0x00402e25    0 |           sub rcx, rax
      ||   0x00402e28,   0 |           mov [rbx+0x20], rcx
      ||   0x00402e2c,   0 |           dec ebp
      `==< 0x00402e2e    0 |           jnz 0x402e10  ; 5 = 0x00402e10
       |   0x00402e30,   0 |           lea r15, [rsp]

Note the ASCII arrow on the left! Objdump should do that. :)
May 25, 2016
On Tuesday, 24 May 2016 at 22:01:17 UTC, qznc wrote:
> On Tuesday, 24 May 2016 at 19:47:52 UTC, qznc wrote:
>
> I discovered radare2. While it is actually a debugger (like gdb), it includes a nice disassembler. Example view, which shows the loop calling std.algorithm.find repeatedly:
>
>        |   0x00402e08,   0 |           nop [rax+rax+0x0]
>       .--> 0x00402e10,   0 |           mov rdx, [rbx]
>       ||   0x00402e13    0 |           mov rcx, [rbx+0x8]
>       ||   0x00402e17    0 |           mov rdi, r15
>       ||   0x00402e1a    0 |           mov rsi, r12
>       ||   0x00402e1d    0 |           call 0x409160  ; 4 = sym._D3std9algorithm9searching34__T4fin
>       ||   0x00402e22    0 |           mov rcx, [rbx]
>       ||   0x00402e25    0 |           sub rcx, rax
>       ||   0x00402e28,   0 |           mov [rbx+0x20], rcx
>       ||   0x00402e2c,   0 |           dec ebp
>       `==< 0x00402e2e    0 |           jnz 0x402e10  ; 5 = 0x00402e10
>        |   0x00402e30,   0 |           lea r15, [rsp]
>
> Note the ASCII arrow on the left! Objdump should do that. :)

Any suggestions how we can make find more efficient?
May 25, 2016
On Wednesday, 25 May 2016 at 10:28:47 UTC, Chris wrote:
> Any suggestions how we can make find more efficient?

I will send a pull request soon.

My current patch [0] improves it, but still contains a bug.

[0] https://github.com/qznc/find-is-too-slow/commit/76efc706a2c1d9e885800ca9b830850935700a95
May 25, 2016
On Wednesday, 25 May 2016 at 10:45:35 UTC, qznc wrote:
> On Wednesday, 25 May 2016 at 10:28:47 UTC, Chris wrote:
>> Any suggestions how we can make find more efficient?
>
> I will send a pull request soon.
>
> My current patch [0] improves it, but still contains a bug.
>
> [0] https://github.com/qznc/find-is-too-slow/commit/76efc706a2c1d9e885800ca9b830850935700a95

And here it is: https://github.com/dlang/phobos/pull/4362

Destroy!
May 25, 2016
On Wednesday, 25 May 2016 at 11:30:33 UTC, qznc wrote:
> On Wednesday, 25 May 2016 at 10:45:35 UTC, qznc wrote:
>> On Wednesday, 25 May 2016 at 10:28:47 UTC, Chris wrote:
>>> Any suggestions how we can make find more efficient?
>>
>> I will send a pull request soon.
>>
>> My current patch [0] improves it, but still contains a bug.
>>
>> [0] https://github.com/qznc/find-is-too-slow/commit/76efc706a2c1d9e885800ca9b830850935700a95
>
> And here it is: https://github.com/dlang/phobos/pull/4362
>
> Destroy!

Great stuff! I really appreciate your efforts. As I said, my code uses find/canFind a lot.
May 27, 2016
On Monday, 23 May 2016 at 22:19:18 UTC, Andrei Alexandrescu wrote:
> On 05/23/2016 03:11 PM, qznc wrote:
>
> Conventional wisdom has it that find() is brute force and that's that, but probably it's time to destroy. Selectively using advanced searching algorithms for the appropriate inputs is very DbI-ish.
>
> There are a few nice precedents of blend algorithms, see e.g. http://effbot.org/zone/stringlib.htm.
>
> Writing a generic subsequence search blend algorithm, one that chooses the right algorithm based on a combination of static and dynamic decisions, is quite a fun and challenging project. Who wanna?
>
>
> Andrei

I've played around with some algorithms and kept them as simple as possible, but I've found that a linear brute force for-loop always beats them (it's the extra decision(s), I suppose). It looks nice in theory, but in hardware reality a stupid loop is more efficient.

NB: I've found that `foreach` is usually faster than a manual `for`-loop.

I used qznc's benchmarking, and it's clear that the current implementation of std.algorithm.find is quite a poor performer. We do have to work on that. Library functions should be fast, no matter what. If someone uses `find`, they wanna be sure it's optimized for speed.
May 27, 2016
On 5/27/16 8:28 AM, Chris wrote:
> I've played around with some algorithms and kept them as simple as
> possible, but I've found that a linear brute force for-loop always beats
> them (it's the extra decision(s), I suppose). It looks nice in theory,
> but in hardware reality a stupid loop is more efficient.

This is surprising. It should be easy to construct examples where brute force search performs arbitrarily poor, e.g. searching for "aaaaaaa...aab" in a long string with only "a"s.

> NB: I've found that `foreach` is usually faster than a manual `for`-loop.

That's surprising too. I'd be interested to look at a disassembly if you have one.

> I used qznc's benchmarking, and it's clear that the current
> implementation of std.algorithm.find is quite a poor performer. We do
> have to work on that. Library functions should be fast, no matter what.
> If someone uses `find`, they wanna be sure it's optimized for speed.

That's definitely the way to go. Thanks for looking into it!


Andrei