April 04, 2019 Poor regex performance?

The following code, which just runs a regex against a large exim log to report on top senders, is 140 times slower than similar C code using PCRE when compiled with just -O. With a bunch of other flags I got it down to only 13x slower than C code that's using libc regcomp/regexec.

    import std.stdio, std.string, std.regex, std.array, std.algorithm;

    T min(T)(T a, T b) {
        if (a < b) return a;
        return b;
    }

    void main() {
        ulong[string] emailcounts;
        auto re = ctRegex!(r"(?:\S+ ){3,4}<= ([^@]+@(\S+))");

        foreach (line; File("exim_mainlog").byLine()) {
            auto m = line.match(re);
            if (m) {
                ++emailcounts[m.front[1].idup];
            }
        }

        string[] senders = emailcounts.keys;
        sort!((a, b) { return emailcounts[a] > emailcounts[b]; })(senders);
        foreach (i; 0 .. min(senders.length, 5)) {
            writefln("%5s %s", emailcounts[senders[i]], senders[i]);
        }
    }

The other code is available at https://github.com/jrfondren/topsender-bench

I get D down to 1.2x slower with PCRE and getline().

I wrote this part of the way through chapter 1 of "The D Programming Language", so my question is mainly: is this a fair result? Is std.regex really this slow, so that I should reach for PCRE if regex speed matters? Or is this code severely flawed somehow? I'm using a random production log; I'm not trying to make things difficult.

Relatedly, how can I add custom compiler flags to rdmd, in a D script? For example, -L-lpcre
April 04, 2019 Re: Poor regex performance?

Posted in reply to Julian

If you need performance, use ldc, not dmd (assumed).

LLVM has far better code optimizers than dmd does.
April 04, 2019 Re: Poor regex performance?

Posted in reply to Julian

On Thursday, 4 April 2019 at 09:53:06 UTC, Julian wrote:
>
> Relatedly, how can I add custom compiler flags to rdmd, in a D script?
> For example, -L-lpcre

Use the configuration variable "DFLAGS". On Windows you can specify it in the sc.ini file. On Linux, see https://dlang.org/dmd-linux.html
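Nobody in the thread suggested it, but a possible alternative for the -L-lpcre case is to request the library from inside the script with `pragma(lib, ...)`, so no extra flag is needed on the rdmd command line. The sketch below assumes that PCRE (the original libpcre, not PCRE2) is installed where the linker can find it and that the compiler honors the pragma when compiling and linking in one step, as rdmd normally does. The `pcre_version` declaration is copied by hand from PCRE's C API.

```d
// Ask the compiler to hand libpcre to the linker (assumption: the
// library is installed and the toolchain honors pragma(lib)).
pragma(lib, "pcre");

// Hand-written binding for one function of PCRE's C API.
extern (C) const(char)* pcre_version();

void main()
{
    import std.stdio : writeln;
    import std.string : fromStringz;

    // Calling into the library shows that the link step succeeded.
    writeln("linked against PCRE ", pcre_version().fromStringz);
}
```

If the pragma is not honored in a given setup, passing the flag directly, as in `rdmd -L-lpcre script.d`, remains the fallback.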
April 04, 2019 Re: Poor regex performance?

Posted in reply to rikki cattermole

On Thursday, 4 April 2019 at 09:57:26 UTC, rikki cattermole wrote:
> If you need performance, use ldc, not dmd (assumed).
>
> LLVM has far better code optimizers than dmd does.

Thanks! I already had dmd installed from a brief look at D a long time ago, so I missed the details at https://dlang.org/download.html

ldc2 -O3 does a lot better, but the result is still 30x slower without PCRE.
April 04, 2019 Re: Poor regex performance?

Posted in reply to Julian

On Thursday, 4 April 2019 at 10:31:43 UTC, Julian wrote:
> On Thursday, 4 April 2019 at 09:57:26 UTC, rikki cattermole wrote:
>> If you need performance, use ldc, not dmd (assumed).
>>
>> LLVM has far better code optimizers than dmd does.
>
> Thanks! I already had dmd installed from a brief look at D a long
> time ago, so I missed the details at https://dlang.org/download.html
>
> ldc2 -O3 does a lot better, but the result is still 30x slower
> without PCRE.

You need to disable the GC by importing core.memory : GC and calling GC.disable().

The next thing is to avoid the .idup and cast to string instead.
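A minimal sketch of these two suggestions follows, with one deliberate deviation that is my assumption, not the poster's advice. `File.byLine` is documented to reuse its buffer between lines, so a slice that is merely cast to `string` would stop being a stable key once the next line is read. The sketch therefore looks each sender up without copying and calls `.idup` only the first time that sender is seen. The helper name `countSenders` and the sample log lines are hypothetical, and `matchFirst` stands in for the original `match(...).front`.

```d
import core.memory : GC;
import std.regex : ctRegex, matchFirst;
import std.stdio : writefln;

/// Tally sender addresses, copying a key only when it is new.
ulong[string] countSenders(Lines)(Lines lines)
{
    ulong[string] counts;
    auto re = ctRegex!(r"(?:\S+ ){3,4}<= ([^@]+@(\S+))");
    foreach (line; lines)
    {
        auto m = line.matchFirst(re);
        if (m.empty)
            continue;
        auto sender = m[1];          // slice of the current line
        if (auto hit = sender in counts)
            ++*hit;                  // known sender: no allocation
        else
            counts[sender.idup] = 1; // new sender: copy once so the key outlives the line
    }
    return counts;
}

void main()
{
    // Ask the runtime not to run collections during this short-lived job.
    GC.disable();

    // Hypothetical sample data; a real run would pass
    // File("exim_mainlog").byLine() instead.
    auto lines = [
        "2019-04-04 09:53:06 1hBzAb-0001Xy-Ab <= alice@example.com H=mail.example.com",
        "2019-04-04 09:53:07 1hBzAc-0001Xz-Cd <= bob@example.org H=mx.example.org",
        "2019-04-04 09:53:08 1hBzAd-0001Y0-Ef <= alice@example.com H=mail.example.com",
    ];
    foreach (sender, n; countSenders(lines))
        writefln("%5s %s", n, sender);
}
```

With this shape, the number of allocations scales with the number of distinct senders, not with the number of matching lines. Whether that or the disabled GC matters more for a given log is something only a measurement can tell.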
April 04, 2019 Re: Poor regex performance?

Posted in reply to Julian

On Thu, Apr 04, 2019 at 09:53:06AM +0000, Julian via Digitalmars-d-learn wrote:
[...]
> auto re = ctRegex!(r"(?:\S+ ){3,4}<= ([^@]+@(\S+))");
[...]

ctRegex is a crock; use regex() instead and it might actually work better.

T

--
Stop staring at me like that! It's offens... no, you'll hurt your eyes!
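One way to check this suggestion on your own data is to time the same pattern built both ways. The following is a rough sketch, not a rigorous benchmark: the helper `countMatches`, the two sample lines, and the iteration count are all made up for illustration, and real numbers will depend on the compiler, its flags, and the log contents.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.regex : ctRegex, matchFirst, regex;
import std.stdio : writefln;

enum pattern = r"(?:\S+ ){3,4}<= ([^@]+@(\S+))";

/// Count the lines that match `re`; works with either regex flavor.
size_t countMatches(RegEx)(string[] lines, RegEx re)
{
    size_t hits;
    foreach (line; lines)
        if (!line.matchFirst(re).empty)
            ++hits;
    return hits;
}

void main()
{
    // Hypothetical exim-style lines: the first is a "<=" arrival record
    // and matches, the second is a "=>" delivery record and does not.
    auto lines = [
        "2019-04-04 09:53:06 1hBzAb-0001Xy-Ab <= alice@example.com H=mail.example.com",
        "2019-04-04 09:53:07 1hBzAb-0001Xy-Ab => bob@example.org R=dnslookup",
    ];

    auto runtimeRe = regex(pattern);       // pattern compiled at run time
    auto compileTimeRe = ctRegex!pattern;  // matcher generated at compile time

    size_t total;
    auto sw = StopWatch(AutoStart.yes);
    foreach (_; 0 .. 10_000)
        total += countMatches(lines, runtimeRe);
    writefln("regex():  %s", sw.peek);

    sw.reset();
    foreach (_; 0 .. 10_000)
        total += countMatches(lines, compileTimeRe);
    writefln("ctRegex:  %s", sw.peek);

    writefln("matches counted: %s", total);
}
```

Because `countMatches` is a template over the regex type, swapping one flavor for the other in the original program is a one-line change, which makes the comparison cheap to repeat against the real log.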
April 04, 2019 Re: Poor regex performance?

Posted in reply to Julian

On Thursday, 4 April 2019 at 10:31:43 UTC, Julian wrote:
> On Thursday, 4 April 2019 at 09:57:26 UTC, rikki cattermole wrote:
>> If you need performance, use ldc, not dmd (assumed).
>>
>> LLVM has far better code optimizers than dmd does.
>
> Thanks! I already had dmd installed from a brief look at D a long
> time ago, so I missed the details at https://dlang.org/download.html
>
> ldc2 -O3 does a lot better, but the result is still 30x slower
> without PCRE.

Try:

    ldc2 -O3 -release -flto=thin -defaultlib=phobos2-ldc-lto,druntime-ldc-lto -enable-inlining

This will improve inlining and optimization across the runtime library boundaries, which can help in certain types of code.