May 11, 2018
On Friday, May 11, 2018 11:44:04 Steven Schveighoffer via Digitalmars-d-announce wrote:
> Shameful note: macOS grep is BSD grep, and is not NEARLY as fast as GNU grep, which has much better performance (and is 2x as fast as iopipe_search on my Linux VM, even when printing line numbers).

Curiously, the grep on FreeBSD seems to be GNU's grep with some additional patches, though I expect that it's a ways behind whatever GNU is releasing now, because while the FreeBSD folks were willing to put some GPLv2 code in FreeBSD, they have not been willing to have anything to do with GPLv3. FreeBSD's grep reports its version as 2.5.1-FreeBSD, whereas ports has the gnugrep package at version 2.27, which implies a fairly large version gap between the two. I have no idea how they compare in terms of performance. Either way, I would have expected FreeBSD to use its own implementation rather than GNU's, especially since they seem to be trying to purge GPL code from FreeBSD, so the fact that FreeBSD uses GNU's grep is a bit surprising. If I had to guess, they switched to the GNU version at some point in the past because it was easier to grab it than to make what they had faster, but I don't know. In any case, it sounds like Mac OS X either didn't take its grep from FreeBSD, or took it from an older version from before FreeBSD switched to GNU's grep.

- Jonathan M Davis
May 11, 2018
On 5/11/18 11:44 AM, Steven Schveighoffer wrote:
> On 5/10/18 7:22 PM, Steven Schveighoffer wrote:
> 
>> However, this example *does* show the power of iopipe: it handles all flavors of Unicode with one template function and is quite straightforward (though I want to abstract the line tracking code; that stuff is really tricky to get right). Oh, and it's roughly 10x faster than grep, and a bunch faster than fgrep, at least on my machine ;) I'm tempted to add regex processing to see if it still beats grep.
> 
> Shameful note: macOS grep is BSD grep, and is not NEARLY as fast as GNU grep, which has much better performance (and is 2x as fast as iopipe_search on my Linux VM, even when printing line numbers).
> 
> So at least there is something to strive for :)

More testing reveals that as I increase the context lines to print, iopipe performs better than GNU grep. A shocking thing is that at 9 lines of context, grep goes up slightly, but all of a sudden at 10 lines of context, it doubles in the time taken (and is now slower than the iopipe_search).
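Printing N lines of context before a match only requires remembering the last N lines, so the context itself behaves like a small ring buffer. The following is a minimal Python sketch of that bookkeeping, not iopipe's code: it uses a plain substring test (like the canFind-based search program described in this thread) instead of regex, `search_with_context` is a hypothetical name, and grep's `--` separators between groups are omitted.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def search_with_context(lines: Iterable[str], needle: str,
                        context: int) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for each matching line plus up to
    `context` lines before and after it, never yielding a line twice."""
    before = deque(maxlen=context)  # most recent lines not yet yielded
    after_left = 0                  # trailing-context lines still to yield
    for lineno, line in enumerate(lines, start=1):
        if needle in line:
            while before:           # flush the leading context first
                yield before.popleft()
            yield lineno, line
            after_left = context
        elif after_left > 0:
            yield lineno, line
            after_left -= 1
        else:
            before.append((lineno, line))  # oldest entry drops off when full


text = ["a", "b", "match 1", "c", "d", "e", "f", "match 2", "g"]
print([n for n, _ in search_with_context(text, "match", 1)])  # [2, 3, 4, 7, 8, 9]
```

Because `deque(maxlen=context)` discards its oldest entry automatically, memory stays bounded by the context size no matter how large the input is.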

Also noting: my Linux VM does not have ldc, so these are dmd numbers.

-Steve
May 11, 2018
On Friday, 11 May 2018 at 15:44:04 UTC, Steven Schveighoffer wrote:
> On 5/10/18 7:22 PM, Steven Schveighoffer wrote:
>
> Shameful note: macOS grep is BSD grep, and is not NEARLY as fast as GNU grep, which has much better performance (and is 2x as fast as iopipe_search on my Linux VM, even when printing line numbers).

Yeah, the macOS default versions of the Unix text processing tools are really slow. It's worth installing the GNU versions if you're doing performance comparisons on macOS, or if you work with large files. Homebrew and MacPorts both have the GNU versions. Some relevant packages: coreutils, grep, gsed (sed), gawk (awk).

Most tools are in coreutils. Many are installed with a 'g' prefix by default, leaving the existing tools in place. For example, 'cut' is installed as 'gcut' unless specified otherwise.

--Jon

May 11, 2018
On Friday, 11 May 2018 at 16:07:26 UTC, Steven Schveighoffer wrote:
> On 5/11/18 11:44 AM, Steven Schveighoffer wrote:
>> On 5/10/18 7:22 PM, Steven Schveighoffer wrote:
>> 
>>> [...]
>> 
>> Shameful note: macOS grep is BSD grep, and is not NEARLY as fast as GNU grep, which has much better performance (and is 2x as fast as iopipe_search on my Linux VM, even when printing line numbers).
>> 
>> So at least there is something to strive for :)
>
> More testing reveals that as I increase the context lines to print, iopipe performs better than GNU grep. A shocking thing is that at 9 lines of context, grep goes up slightly, but all of a sudden at 10 lines of context, it doubles in the time taken (and is now slower than the iopipe_search).
>
> Also noting: my Linux VM does not have ldc, so these are dmd numbers.
>
> -Steve

What stops you from downloading a Linux release from here?

https://github.com/ldc-developers/ldc/releases
May 11, 2018
On Thursday, 10 May 2018 at 23:22:02 UTC, Steven Schveighoffer wrote:
> OK, so at dconf I spoke with a few very smart guys about how I can use mmap to make a zero-copy buffer. And I implemented this on the plane ride home.
>
> However, I am struggling to find a use case for this that showcases why you would want to use it. While it does work, and works beautifully, it doesn't show any measurable difference vs. the array allocated buffer that copies data when it needs to extend.
>
> If anyone has any good use cases for it, I'm open to suggestions. The kind of application likely to benefit is one that needs to keep the buffer mostly full when extending it (i.e. something like 75% full or more).
>
> The buffer is selected by using `rbufd` instead of just `bufd`. Everything should be a drop-in replacement except for that.
>
> Note: I have ONLY tested on macOS, so if you find bugs on other OSes, let me know. This is still a Posix-only library for now, but more on that later...
>
> As a test for ring buffers, I implemented a simple "grep-like" search program that doesn't use regex, but Phobos' canFind, to look for matching lines. It also prints some lines of context, configurable on the command line. I thought the context lines would show better performance with the RingBuffer than with the standard buffer, since the program has to keep a bunch of lines in the buffer. But alas, it's roughly the same, even with a large number of context lines (like 200).
>
> However, this example *does* show the power of iopipe: it handles all flavors of Unicode with one template function and is quite straightforward (though I want to abstract the line tracking code; that stuff is really tricky to get right). Oh, and it's roughly 10x faster than grep, and a bunch faster than fgrep, at least on my machine ;) I'm tempted to add regex processing to see if it still beats grep.
>
> Next up (when my bug fix for dmd is merged, see https://issues.dlang.org/show_bug.cgi?id=17968) I will be migrating iopipe to depend on https://github.com/MartinNowak/io, which should unlock Windows support (and I will add RingBuffer Windows support at that point).
>
> Enjoy!
>
> https://github.com/schveiguy/iopipe
> https://code.dlang.org/packages/iopipe
> http://schveiguy.github.io/iopipe/
>
> -Steve

Since mmap is involved, it would be interesting to see if this can be extended for interprocess communication, akin to boost::interprocess https://www.boost.org/doc/libs/1_67_0/doc/html/interprocess.html

By default, boost::interprocess creates the shared memory object with shm_open[2] and then maps it with mmap[1] (unless told to use SysV shm).

[1] https://github.com/boostorg/interprocess/blob/4f8459e868617f88ff105633a9aa82221d5e9bb1/include/boost/interprocess/mapped_region.hpp#L698
[2] https://github.com/boostorg/interprocess/blob/develop/include/boost/interprocess/shared_memory_object.hpp#L315
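Python's standard library exposes the same shm_open + mmap combination: on POSIX systems, CPython's `multiprocessing.shared_memory.SharedMemory` creates a named segment with shm_open and maps it with mmap. A minimal sketch of two handles sharing one segment by name (both live in one process here for brevity; in real IPC the second handle would be opened by another process that was told the name). `round_trip` is a hypothetical helper name.

```python
from multiprocessing import shared_memory


def round_trip(payload: bytes) -> bytes:
    """Write `payload` through one handle and read it back through a second
    handle attached to the same named shared memory segment."""
    owner = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        owner.buf[:len(payload)] = payload
        # Attach by name, as a second process would.
        peer = shared_memory.SharedMemory(name=owner.name)
        try:
            return bytes(peer.buf[:len(payload)])  # copy out before closing
        finally:
            peer.close()
    finally:
        owner.close()
        owner.unlink()  # remove the name once nobody needs it
```

Whether iopipe's buffers could be backed by such a named segment is the open question raised above; this only shows the OS mechanism.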

May 11, 2018
On Friday, 11 May 2018 at 13:28:58 UTC, Steven Schveighoffer wrote:
> On 5/11/18 1:30 AM, Dmitry Olshansky wrote:
>> On Thursday, 10 May 2018 at 23:22:02 UTC, Steven Schveighoffer wrote:
>>> OK, so at dconf I spoke with a few very smart guys about how I can use mmap to make a zero-copy buffer. And I implemented this on the plane ride home.
>>>
>>> However, I am struggling to find a use case for this that showcases why you would want to use it. While it does work, and works beautifully, it doesn't show any measurable difference vs. the array allocated buffer that copies data when it needs to extend.
>> 
>> I'd start with something clinically synthetic.
>> Say your record size is exactly half of the buffer plus 1 byte. (If you were to extend the size of the buffer instead, the copying would amortize away.)
>
> Hm... this wouldn't work, because the idea is to keep some of the buffer full. What will happen here is that the buffer will extend to be able to accommodate the extra byte, and then you are back to having less of the buffer full at once. Iopipe is not afraid to increase the buffer :)

Then you can't test it that way.

>
>> 
>> Basically:
>> a fixed 16 MB buffer
>> vs.
>> a 16 MB mmap-ed ring buffer
>> 
>> where you read pieces in blocks of 8 MB + 1 byte. Yes, we are aiming to blow the CPU cache there. Otherwise the CPU cache is so fast that an occasional copy is zilch; once we hit main memory, it's not. Adjust sizes for your CPU.
>
> This isn't how it will work. The system looks at the buffer and says "oh, I can just read 8MB - 1 byte," which gives you 2 bytes less than you need. Then you need the extra 2 bytes, so it will increase the buffer to hold at least 2 records.
>
> I do get the point of having to go outside the cache. I'll look and see if maybe specifying a 1000 line context helps ;)

Nope. Consider reading binary records whose length you know in advance, so you can skip over each one without touching every byte. There it might help. If you touch every byte and do some work on it, the cost of copying the tail is zilch.

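One example is the netstring, which looks like:

13:Hello, world!,

Basically: the length in ASCII digits, a ':', then exactly that many UTF-8 code units (bytes), then a terminating ','.
No decoding necessary.

BitTorrent's bencoding uses the same length-prefix scheme for strings (`13:Hello, world!`, without the trailing comma), and other formats probably do too. It's a nice example that avoids scanning to find delimiters.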
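Because the length comes first, a reader can jump straight to the end of each payload without searching the payload for a delimiter, which is the access pattern where avoiding buffer copies could actually show up. A minimal Python sketch of a parser for the netstring format as D. J. Bernstein specifies it, `<length>:<bytes>,` (e.g. `13:Hello, world!,`). The function names are hypothetical, and the spec's ban on leading zeros in the length is not enforced here.

```python
from typing import Iterator, Tuple


def parse_netstring(data: bytes, pos: int = 0) -> Tuple[bytes, int]:
    """Parse one netstring starting at `pos`; return (payload, next_pos)."""
    colon = data.find(b":", pos)
    if colon < 0:
        raise ValueError("missing ':' after length")
    digits = data[pos:colon]
    if not digits.isdigit():
        raise ValueError("length must be ASCII decimal digits")
    start = colon + 1
    end = start + int(digits)
    # Jump directly past the payload; no delimiter search inside it.
    if data[end:end + 1] != b",":
        raise ValueError("missing ',' terminator (or truncated payload)")
    # Note: slicing copies in Python; a zero-copy reader would hand out views.
    return data[start:end], end + 1


def iter_netstrings(data: bytes) -> Iterator[bytes]:
    """Yield each payload from a concatenation of netstrings."""
    pos = 0
    while pos < len(data):
        payload, pos = parse_netstring(data, pos)
        yield payload


print(parse_netstring(b"13:Hello, world!,"))  # (b'Hello, world!', 17)
```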

>
> Update: nope, still pretty much the same.
>
>> The amount of work done per byte though has to be minimal to actually see anything.
>
> Right, this is another part of the problem -- if copying is so rare compared to the other operations, then the difference is going to be lost in the noise.
>
> What I have learned here is:
>
> 1. Ring buffers are really cool (I still love how it works) and perform as well as normal buffers

That's also good. Normal ring buffers usually suck in the speed department.

> 2. The use cases are much smaller than I thought
> 3. In most real-world applications, they are a wash, and not worth the OS tricks needed to use it.
> 4. iopipe makes testing with a different kind of buffer really easy, which was one of my original goals. So I'm glad that works!
>
> I'm going to (obviously) leave them there, hoping that someone finds a good use case, but I can say that my extreme excitement at getting it to work was depressed quite a bit when I found it didn't really gain much in terms of performance for the use cases I have been doing.
>> Should be mostly trivial, in fact. I mean, our first designs for iopipe were where I wanted regex to work with it.
>> 
>> Basically: if we've started a match, extend the window until we get it or lose it. Then release up to the next point of potential start.
>
> I'm thinking it's even simpler than that. No match can span a line break (that's how grep normally works), so you simply have to parse the lines and run each one through the regex. What I don't know is how much it costs regex to start up and run on an individual line.

It's a malloc/free/addRange/removeRange for each call. In 2.080 I optimized it to reuse the most recently used engine without these costs, but I'll have to check whether that covers all cases.
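The window idea quoted above (extend the window until a match completes or fails, then release up to the next potential start) is easiest to see with a fixed literal instead of a regex: after a window has been searched, any match not yet reported can only start within its last `len(needle) - 1` bytes, so everything before that can be released. A minimal Python sketch under that simplification; `find_in_chunks` is a hypothetical name and the chunks stand in for successive buffer refills.

```python
from typing import Iterable, Iterator


def find_in_chunks(chunks: Iterable[bytes], needle: bytes) -> Iterator[int]:
    """Yield the stream offset of every (possibly overlapping) occurrence
    of `needle` in a byte stream delivered as a sequence of chunks."""
    if not needle:
        raise ValueError("needle must be non-empty")
    keep = len(needle) - 1   # unreported matches can only start in this tail
    window = b""
    window_start = 0         # stream offset of window[0]
    for chunk in chunks:
        window += chunk      # "extend" the window with newly read data
        i = window.find(needle)
        while i >= 0:
            yield window_start + i
            i = window.find(needle, i + 1)
        # "Release" everything that can no longer begin a new match.
        drop = max(len(window) - keep, 0)
        window = window[drop:]
        window_start += drop


print(list(find_in_chunks([b"abca", b"bcab", b"c"], b"abc")))  # [0, 3, 6]
```

A regex needs more care than this (the engine has to say where the earliest still-viable match could start), which is presumably where integrating iopipe awareness into the engine would pay off.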

>
> One thing I could do to amortize is keep 2N lines in the buffer, and run the regex on a whole context's worth of lines, then dump them all.

I believe integrating iopipe awareness into regex will easily make it 50% faster. A guesstimate, though.
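Steve's idea of running the regex over a whole block of lines at once, rather than once per line, needs one extra step: mapping each match offset back to a line number and reporting a line only once even if it matches several times. A minimal Python sketch with the `re` module standing in for std.regex; it assumes the pattern never matches across a newline, and `matching_lines` is a hypothetical name.

```python
import bisect
import re
from typing import List


def matching_lines(block: str, pattern: re.Pattern) -> List[int]:
    """Return 0-based indices of lines in `block` containing a match, using
    one regex scan over the whole block instead of one call per line.
    Assumes `pattern` never matches across a newline."""
    starts = [0] + [i + 1 for i, ch in enumerate(block) if ch == "\n"]
    hits = []
    pos = 0
    while pos < len(block):
        m = pattern.search(block, pos)
        if m is None:
            break
        line = bisect.bisect_right(starts, m.start()) - 1
        hits.append(line)
        # Skip the rest of this line: grep reports each line once.
        pos = starts[line + 1] if line + 1 < len(starts) else len(block)
    return hits


print(matching_lines("foo\nbar\nbaz foo\nqux\n", re.compile("foo")))  # [0, 2]
```

With `re.MULTILINE`, `^` and `$` still anchor at line boundaries inside the block, so line-anchored patterns keep their per-line meaning.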

>
> I don't get why grep is so bad at this, since it is supposedly

grep on Mac is terrible, sadly, and I don't know exactly why (too old?). Use some third-party tool like ‘sift’, written in Go.

>
> -Steve

May 12, 2018
On Friday, 11 May 2018 at 23:46:16 UTC, Dmitry Olshansky wrote:
> On Friday, 11 May 2018 at 13:28:58 UTC, Steven Schveighoffer wrote:
>> On 5/11/18 1:30 AM, Dmitry Olshansky wrote:
>>> On Thursday, 10 May 2018 at 23:22:02 UTC, Steven Schveighoffer wrote:
> grep on Mac is terrible, sadly, and I don't know exactly why (too old?). Use some third-party tool like ‘sift’, written in Go.

You can always use GNU grep. The one that comes with macOS is pretty old and slow. If you have MacPorts, it's just `port install grep`. I'm sure Homebrew has a similar package for GNU grep.
May 12, 2018
On 5/11/18 5:42 PM, Joakim wrote:
> On Friday, 11 May 2018 at 16:07:26 UTC, Steven Schveighoffer wrote:
>> On 5/11/18 11:44 AM, Steven Schveighoffer wrote:
>>> On 5/10/18 7:22 PM, Steven Schveighoffer wrote:
>>>
>>>> [...]
>>>
>>> Shameful note: macOS grep is BSD grep, and is not NEARLY as fast as GNU grep, which has much better performance (and is 2x as fast as iopipe_search on my Linux VM, even when printing line numbers).
>>>
>>> So at least there is something to strive for :)
>>
>> More testing reveals that as I increase the context lines to print, iopipe performs better than GNU grep. A shocking thing is that at 9 lines of context, grep goes up slightly, but all of a sudden at 10 lines of context, it doubles in the time taken (and is now slower than the iopipe_search).
>>
>> Also noting: my Linux VM does not have ldc, so these are dmd numbers.
>>
> 
> What stops you from downloading a Linux release from here?
> 
> https://github.com/ldc-developers/ldc/releases

So I did that; it's only a few milliseconds faster. Still about half as fast as GNU grep.

But I am not expecting any miracles here. GNU grep does pretty much everything it can to achieve performance -- including eschewing the standard library buffering system, as I am doing. I can probably match its performance at some point, but I doubt it's worth worrying about. It's still really, really fast without trying to do anything crazy.
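Bypassing the standard library's buffered I/O, as described here for both GNU grep and iopipe, means issuing raw reads into your own buffer and handling lines that straddle two reads yourself. A minimal Python sketch of that bookkeeping using `os.read` (a thin wrapper over the read system call); it is not tuned, and the `pending + chunk` concatenation is exactly the kind of tail copy discussed earlier in this thread. `read_lines_raw` is a hypothetical name.

```python
import os
from typing import Iterator


def read_lines_raw(fd: int, bufsize: int = 1 << 16) -> Iterator[bytes]:
    """Yield lines (without their trailing newline) from a file descriptor,
    reading with os.read instead of a buffered file object."""
    pending = b""                    # incomplete line left over from the last read
    while True:
        chunk = os.read(fd, bufsize)
        if not chunk:                # EOF
            break
        lines = (pending + chunk).split(b"\n")
        pending = lines.pop()        # last piece has no newline yet
        yield from lines
    if pending:                      # final line without a trailing newline
        yield pending
```

A tiny `bufsize` makes the straddling case easy to exercise; real code would use a large buffer, as GNU grep and iopipe do.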

I hope at some point, however, to work with Dmitry on an iopipe-based regex engine so we can see how much better we can make regex.

-Steve
May 12, 2018
On Saturday, 12 May 2018 at 12:14:28 UTC, Steven Schveighoffer wrote:
> On 5/11/18 5:42 PM, Joakim wrote:
>> On Friday, 11 May 2018 at 16:07:26 UTC, Steven Schveighoffer wrote:
>>>[...]
>> 
>> What stops you from downloading a Linux release from here?
>> 
>> https://github.com/ldc-developers/ldc/releases
>
> So I did that; it's only a few milliseconds faster. Still about half as fast as GNU grep.
>
> But I am not expecting any miracles here. GNU grep does pretty much everything it can to achieve performance -- including eschewing the standard library buffering system, as I am doing. I can probably match its performance at some point, but I doubt it's worth worrying about. It's still really, really fast without trying to do anything crazy.
>

I could offer a few tricks to fix that without getting too dirty. GNU grep is fast, but std.regex is faster than that in raw speed on a significant class of quite common patterns. But I loaded the whole file at once.

> I hope at some point, however, to work with Dmitry on an iopipe-based regex engine so we can see how much better we can make regex.

As initiatives like this go, it's either now or never. Please get in touch directly over Slack or something; let's get it rolling. I've wanted to write a grep-like utility since 2012. Now, at long last, we have all the building blocks.

>
> -Steve

May 12, 2018
On Saturday, 12 May 2018 at 12:45:16 UTC, Dmitry Olshansky wrote:
> On Saturday, 12 May 2018 at 12:14:28 UTC, Steven Schveighoffer wrote:
>>[...]
>
> I could offer a few tricks to fix that without getting too dirty. GNU grep is fast, but std.regex is faster than that in raw speed on a significant class of quite common patterns. But I loaded the whole file at once.
>
>> [...]
>
> As initiatives like this go, it's either now or never. Please get in touch directly over Slack or something; let's get it rolling. I've wanted to write a grep-like utility since 2012. Now, at long last, we have all the building blocks.

If you're talking about writing a grep prototype in D, that's a great idea, especially for publicizing D. :)