Thread overview
Too slow readln
Jul 16, 2017
unDEFER
Jul 16, 2017
Jon Degenhardt
Jul 16, 2017
unDEFER
Jul 16, 2017
unDEFER
Jul 17, 2017
Ali Çehreli
July 16, 2017
Hello there!

I have the following "grep" code:
https://dpaste.dzfl.pl/7b7273f96ab2

And I have a directory to run it on:
$ time /home/undefer/MyFiles/Projects/TEST/D/grep "HELLO" .
./strace.log: [pid 18365] write(1, "HELLO\n", 6HELLO

real	1m17.096s
user	0m54.828s
sys	0m13.340s

I get the same result with ldc2.

The same search with bash, `file` and grep:
$ time for i in `find .`; do file -b "$i" | grep -q text && grep -a "HELLO" "$i"; done
[pid 18365] write(1, "HELLO\n", 6HELLO

real	0m42.461s
user	0m23.244s
sys	0m22.300s

Only `file` on all files:
$ time find . -exec file {} + >/dev/null

real	0m15.013s
user	0m14.556s
sys	0m0.436s

Only grep on all files (after first saving the list of text files to LIST1):
$ for i in `find .`; do file -b "$i" | grep -q text && echo "$i"; done > LIST1
$ time for i in `cat LIST1`; do grep -a "HELLO" "$i"; done
[pid 18365] write(1, "HELLO\n", 6HELLO

real	0m4.431s
user	0m1.112s
sys	0m3.148s

So 15 s + 4.4 s is much less than 42.46 s. Why? How can "find" run "file" so many times so quickly?
And why is 42.461s so much less than 1m17.096s?

The second version of grep:
https://dpaste.dzfl.pl/9db5bc2f0a26

$ time /home/undefer/MyFiles/Projects/TEST/D/grep2 "HELLO" `cat LIST1`
./strace.log: [pid 18365] write(1, "HELLO\n", 6HELLO

real	0m1.871s
user	0m1.824s
sys	0m0.048s

$ time grep -a "HELLO" `cat LIST1`
./strace.log:[pid 18365] write(1, "HELLO\n", 6HELLO

real	0m0.075s
user	0m0.044s
sys	0m0.028s

The profiler says that readln eats most of the CPU time. So why is 0m0.075s so much less than 0m1.871s?

How can I write a grep in D that is not slower than GNU grep?
July 16, 2017
On Sunday, 16 July 2017 at 17:03:27 UTC, unDEFER wrote:
> [snip]
>
> How can I write a grep in D that is not slower than GNU grep?

GNU grep is pretty fast; it's tough to beat it by reading one line at a time. That's because it can play a bit of a trick: it does the initial match ignoring line boundaries and corrects for line boundaries later. There's a good discussion in this thread ("Why GNU grep is fast" by Mike Haertel): https://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html

--Jon
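A minimal D sketch of the trick described above, under stated assumptions: each file is read into memory in one piece with `std.file.read`, the pattern is a fixed string (no regex) containing no newline, and `matchingLines` is a hypothetical helper name. The buffer is searched as a whole and line boundaries are located only around each hit, so there is no per-line `readln` call. The search here is a plain substring search (`countUntil`), not the Boyer-Moore search and large-block reads that Haertel's post attributes to GNU grep; it only illustrates the "match first, find the line later" order.

```d
import std.algorithm.searching : countUntil;
import std.file : read;
import std.stdio : writeln;
import std.string : indexOf, lastIndexOf, representation;

/// Hypothetical helper: returns every line of `buf` that contains `needle`.
/// The whole buffer is searched first; line boundaries are located only
/// around each match, instead of splitting the input into lines up front.
/// Assumes `needle` is a non-empty fixed string with no '\n' in it.
const(char)[][] matchingLines(const(char)[] buf, const(char)[] needle)
{
    const(char)[][] result;
    size_t pos = 0;
    while (pos < buf.length)
    {
        // Byte-wise search of the remaining input, ignoring line structure
        // (works on raw bytes, so binary files do not trip UTF decoding).
        immutable hit = buf[pos .. $].representation.countUntil(needle.representation);
        if (hit < 0)
            break;
        immutable matchAt = pos + cast(size_t) hit;

        // Walk back to the start of the line containing the match.
        immutable nl = buf[0 .. matchAt].lastIndexOf('\n');
        immutable lineStart = nl < 0 ? 0 : cast(size_t)(nl + 1);

        // Walk forward to the end of that line (or the end of the buffer).
        immutable eol = buf[matchAt .. $].indexOf('\n');
        immutable lineEnd = eol < 0 ? buf.length : matchAt + cast(size_t) eol;

        result ~= buf[lineStart .. lineEnd];
        // Resume after this line so a line with several hits is reported once.
        pos = lineEnd + 1;
    }
    return result;
}

void main(string[] args)
{
    // Hypothetical usage: grep3 PATTERN FILE...
    if (args.length < 3)
        return;
    foreach (path; args[2 .. $])
    {
        // Read the whole file at once; no per-line readln calls.
        auto buf = cast(const(char)[]) read(path);
        foreach (line; matchingLines(buf, args[1]))
            writeln(path, ":", line);
    }
}
```

Reading whole files keeps the sketch short; for very large inputs a real tool would read big fixed-size blocks and handle a line that straddles two blocks.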
July 16, 2017
On Sunday, 16 July 2017 at 17:37:34 UTC, Jon Degenhardt wrote:
> On Sunday, 16 July 2017 at 17:03:27 UTC, unDEFER wrote:
>> [snip]
>>
>> How can I write a grep in D that is not slower than GNU grep?
>
> GNU grep is pretty fast; it's tough to beat it by reading one line at a time. That's because it can play a bit of a trick: it does the initial match ignoring line boundaries and corrects for line boundaries later. There's a good discussion in this thread ("Why GNU grep is fast" by Mike Haertel): https://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html
>
> --Jon

Thank you. I've figured out one more trick:
$ find . -exec file -bi {} +
is essentially the same as
$ file -bi `find .`
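This is why `find . -exec file {} +` is fast: the `+` form passes many file names to a single `file` process, while the bash loop in the first post starts separate `file` and `grep` processes for every file. A sketch of the same batching from D with `std.process.execute`, assuming a fixed batch size of 1000 names (find itself limits each batch by command-line length, the way xargs does); `runBatched` is a hypothetical name.

```d
import std.process : execute;
import std.range : chunks;
import std.stdio : write;

/// Hypothetical helper: runs `cmd` once per batch of `paths` (at most
/// `batchSize` names per call), similar in spirit to `find -exec cmd {} +`.
/// One process per batch instead of one per file avoids most fork/exec cost.
string runBatched(string[] cmd, string[] paths, size_t batchSize = 1000)
{
    string output;
    foreach (batch; paths.chunks(batchSize))
    {
        // execute() waits for the child and captures its stdout and stderr.
        auto r = execute(cmd ~ batch);
        output ~= r.output;
    }
    return output;
}

void main(string[] args)
{
    // Hypothetical usage: batchfile FILE...  (runs `file -bi` on all of them)
    write(runBatched(["file", "-bi"], args[1 .. $]));
}
```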
July 16, 2017
I found the main problem: dirEntries follows symlinks by default.
Without that, my first grep takes only 28.338s. That's really cool!
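Passing `false` as the third argument (`followSymlink`) of `std.file.dirEntries` turns that default off. A minimal sketch, assuming the goal is a list of regular files under a root directory; `listFiles` is a hypothetical name, and skipping the symlink entries themselves is an extra choice not taken from the post.

```d
import std.file : dirEntries, SpanMode;
import std.stdio : writeln;

/// Hypothetical helper: lists regular files under `root` without following
/// symbolic links. The third argument of dirEntries is followSymlink, which
/// defaults to true; passing false keeps the walk from descending into
/// symlinked directories (and from looping on symlink cycles).
string[] listFiles(string root)
{
    string[] files;
    foreach (entry; dirEntries(root, SpanMode.depth, false))
    {
        // Also skip the links themselves, so only files reached through
        // real directories are listed.
        if (entry.isSymlink)
            continue;
        if (entry.isFile)
            files ~= entry.name;
    }
    return files;
}

void main(string[] args)
{
    foreach (name; listFiles(args.length > 1 ? args[1] : "."))
        writeln(name);
}
```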
July 16, 2017
On 07/16/2017 10:37 AM, Jon Degenhardt wrote:

> There's a good discussion in this thread ("Why GNU grep is fast" by Mike
> Haertel):
> https://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html
>
> --Jon

Another fast GNU utility (`yes`) came up on Reddit a month ago:

https://www.reddit.com/r/programming/comments/6gxf02/how_is_gnus_yes_so_fast_xpost_runix/

Ali