September 26, 2014 [Issue 13532] New: std.regex performance (enums; regex vs ctRegex)
https://issues.dlang.org/show_bug.cgi?id=13532

          Issue ID: 13532
           Summary: std.regex performance (enums; regex vs ctRegex)
           Product: D
           Version: D2
          Hardware: All
                OS: All
            Status: NEW
          Keywords: performance
          Severity: enhancement
          Priority: P5
         Component: Phobos
          Assignee: nobody@puremagic.com
          Reporter: thecybershadow@gmail.com

I noticed something strange after accidentally introducing a performance regression in a program using std.regex.

Benchmark program:

///////////////////////////////////////////
import std.algorithm;
import std.array;
import std.conv;
import std.datetime;
import std.file;
import std.regex;
import std.stdio;
import std.string;

enum expr = `;.*`;
enum repl = "";
enum fn = `alice30.txt`;
enum N = 5000;

string[] lines;

// regex() called inside the lambda: the pattern is compiled for every line.
void regexInline()
{
    lines
        .map!(line => line
            .replaceAll(regex(expr), repl)
        )
        .array
    ;
}

// Compiled once per call and kept in a local variable.
void regexAuto()
{
    auto r = regex(expr);
    lines
        .map!(line => line
            .replaceAll(r, repl)
        )
        .array
    ;
}

// Stored in a static variable.
void regexStatic()
{
    static r = regex(expr);
    lines
        .map!(line => line
            .replaceAll(r, repl)
        )
        .array
    ;
}

// Manifest constant: the value is substituted at each use.
void regexEnum()
{
    enum r = regex(expr);
    lines
        .map!(line => line
            .replaceAll(r, repl)
        )
        .array
    ;
}

// The same four variants, using ctRegex.
void ctRegexInline()
{
    lines
        .map!(line => line
            .replaceAll(ctRegex!expr, repl)
        )
        .array
    ;
}

void ctRegexAuto()
{
    auto r = ctRegex!expr;
    lines
        .map!(line => line
            .replaceAll(r, repl)
        )
        .array
    ;
}

void ctRegexStatic()
{
    static r = ctRegex!expr;
    lines
        .map!(line => line
            .replaceAll(r, repl)
        )
        .array
    ;
}

void ctRegexEnum()
{
    enum r = ctRegex!expr;
    lines
        .map!(line => line
            .replaceAll(r, repl)
        )
        .array
    ;
}

// Caches the compiled regex per pattern on first use.
Regex!char re(string pattern)()
{
    static Regex!char r;
    if (r.empty)
        r = regex(pattern);
    return r;
}

void reInline()
{
    lines
        .map!(line => line
            .replaceAll(re!expr, repl)
        )
        .array
    ;
}

alias funcs = TypeTuple!(
    regexInline,
    regexAuto,
    regexStatic,
    regexEnum,
    ctRegexInline,
    ctRegexAuto,
    ctRegexStatic,
    ctRegexEnum,
    reInline,
);

void main()
{
    auto text = cast(string)read(fn);
    lines = text.splitLines();
    auto results = benchmark!funcs(N);
    foreach (i, func; funcs)
        writeln(
            __traits(identifier, func),
            "\t",
            to!Duration(results[i]),
        );
}
///////////////////////////////////////////

Here are my results:

regexInline     10 secs, 174 ms, 254 μs, and 2 hnsecs
regexAuto       8 secs, 249 ms, 92 μs, and 5 hnsecs
regexStatic     8 secs, 155 ms, 231 μs, and 1 hnsec
regexEnum       19 secs, 358 ms, 66 μs, and 8 hnsecs
ctRegexInline   21 secs, 399 ms, 346 μs, and 5 hnsecs
ctRegexAuto     10 secs, 57 ms, and 418 μs
ctRegexStatic   10 secs, 66 ms, 489 μs, and 9 hnsecs
ctRegexEnum     21 secs, 593 ms, 486 μs, and 9 hnsecs
reInline        8 secs, 430 ms, 852 μs, and 3 hnsecs

The first surprise for me was that declaring a regex object (either Regex or StaticRegex) with "enum" was so much slower. It makes sense now that I think about it: creating a struct literal inside a loop will be more expensive than referencing one already residing somewhere in memory. It might be worth mentioning in the documentation that "enum" should be avoided with compiled regexes.

The second surprise was that ctRegex was slower than regular regex, although the difference is not significant.

I don't know whether this needs any action, feel free to WONTFIX.
Copyright © 1999-2021 by the D Language Foundation