[Issue 17161] [REG 2.072.2] Massive Regex Slowdown
February 09, 2017
https://issues.dlang.org/show_bug.cgi?id=17161

--- Comment #1 from Jack Stouffer <jack@jackstouffer.com> ---
Introduced here: https://github.com/dlang/phobos/pull/4995

--
February 09, 2017
https://issues.dlang.org/show_bug.cgi?id=17161

--- Comment #2 from Jack Stouffer <jack@jackstouffer.com> ---
Bad news: I see a similar performance decrease for run-time regex as well.

# 2.073.0
$ dmd -O -inline -release test2.d && cat input5000000.txt | time ./test2
./test2  4.44s user 0.09s system 98% cpu 4.591 total

# 2.072.2
~/digger/result/bin/dmd -O -inline -release test2.d && cat input5000000.txt | time ./test2
./test2  3.20s user 0.09s system 98% cpu 3.344 total

I consistently get around a second and a half longer run time with 2.073.

Code

import std.algorithm;
import std.array;
import std.range;
import std.regex;
import std.stdio;
import std.typecons;
import std.utf;

static variants = [
    "agggtaaa|tttaccct",
    "[cgt]gggtaaa|tttaccc[acg]",
    "a[act]ggtaaa|tttacc[agt]t",
    "ag[act]gtaaa|tttac[agt]ct",
    "agg[act]taaa|ttta[agt]cct",
    "aggg[acg]aaa|ttt[cgt]ccct",
    "agggt[cgt]aa|tt[acg]accct",
    "agggta[cgt]a|t[acg]taccct",
    "agggtaa[cgt]|[acg]ttaccct",
];

void main()
{
    auto app = appender!string;
    app.reserve(5_000_000);
    app.put(stdin
        .byLineCopy(KeepTerminator.yes)
        .joiner
        .byChar);

    auto seq = app.data;

    auto regexLineFeeds = regex(">.*\n|\n");
    seq = seq.replaceAll(regexLineFeeds, "");

    foreach (pattern; variants)
    {
        writeln(pattern, " ", seq.matchAll(pattern).walkLength);
    }
}
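
The timings above come from the shell's `time`, so they include reading the input. A minimal sketch for timing only the matching step in-process is shown below. It assumes a compiler recent enough to ship `std.datetime.stopwatch`; `timedCount` is a hypothetical helper, and the short input string is a stand-in, not the benchmark data.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.range : walkLength;
import std.regex : matchAll, regex;
import std.stdio : writefln;

// Hypothetical helper: counts the matches of `pattern` in `seq`
// and prints the wall-clock time spent in the matching loop.
size_t timedCount(string seq, string pattern)
{
    auto re = regex(pattern);               // regex built at run time
    auto sw = StopWatch(AutoStart.yes);     // start timing just before matching
    immutable n = seq.matchAll(re).walkLength;
    sw.stop();
    writefln("%s: %s matches in %s ms", pattern, n, sw.peek.total!"msecs");
    return n;
}

void main()
{
    // Stand-in input: two "agggtaaa" occurrences and one "tttaccct".
    immutable seq = "agggtaaatttaccctagggtaaa";
    assert(timedCount(seq, "agggtaaa|tttaccct") == 3);
}
```

Comparing the printed times between two compiler versions would show whether the difference sits in the matching itself rather than in input handling.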

--
February 09, 2017
https://issues.dlang.org/show_bug.cgi?id=17161

Dmitry Olshansky <dmitry.olsh@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dmitry.olsh@gmail.com

--- Comment #3 from Dmitry Olshansky <dmitry.olsh@gmail.com> ---
(In reply to Jack Stouffer from comment #2)
> Bad news: I see a similar performance decrease for run-time regex as well.
> 
> # 2.073.0
> $ dmd -O -inline -release test2.d && cat input5000000.txt | time ./test2
> ./test2  4.44s user 0.09s system 98% cpu 4.591 total
> 
> # 2.072.2
> ~/digger/result/bin/dmd -O -inline -release test2.d && cat input5000000.txt | time ./test2
> ./test2  3.20s user 0.09s system 98% cpu 3.344 total
> 
> I consistently get around a second and a half longer run time with 2.073.
> 

This is an interesting find, thanks for sharing!

I will investigate the R-T issue; the C-T slowdown is (sadly) to be expected.

--
February 09, 2017
https://issues.dlang.org/show_bug.cgi?id=17161

--- Comment #4 from Dmitry Olshansky <dmitry.olsh@gmail.com> ---
(In reply to Dmitry Olshansky from comment #3)
> (In reply to Jack Stouffer from comment #2)
> > Bad news: I see a similar performance decrease for run-time regex as well.
> > 
> > # 2.073.0
> > $ dmd -O -inline -release test2.d && cat input5000000.txt | time ./test2
> > ./test2  4.44s user 0.09s system 98% cpu 4.591 total
> > 
> > # 2.072.2
> > ~/digger/result/bin/dmd -O -inline -release test2.d && cat input5000000.txt | time ./test2
> > ./test2  3.20s user 0.09s system 98% cpu 3.344 total
> > 
> > I consistently get around a second and a half longer run time with 2.073.
> > 
> 
> This is an interesting find, thanks for sharing!
> 
> Will investigate the R-T issue, C-T is (sadly) to be expected.

Mystery solved: in the R-T version the regex is parsed at C-T (because of static), so disabling Kickstart affects it too.
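
The "because of static" point can be sketched without std.regex at all. In D, the initializer of a static or module-level variable has to be evaluated at compile time, while the same call in a local variable runs at run time. The example below is only an illustration of that rule under this assumption; `howEvaluated`, `viaStatic`, and `viaLocal` are hypothetical names, not anything from the test case or from Phobos.

```d
import std.stdio : writeln;

// Hypothetical helper: reports whether it ran under CTFE or at run time.
// `__ctfe` is true only while the compiler is interpreting the function.
string howEvaluated()
{
    return __ctfe ? "compile time" : "run time";
}

// A static (module-level) initializer must be a compile-time constant,
// so this call is evaluated by the compiler through CTFE.
static immutable string viaStatic = howEvaluated();

void main()
{
    // The same call in a local variable is an ordinary run-time call.
    auto viaLocal = howEvaluated();

    writeln("static initializer: ", viaStatic);
    writeln("local variable:     ", viaLocal);

    assert(viaStatic == "compile time");
    assert(viaLocal == "run time");
}
```

By the same rule, a regex built in a static initializer goes through the compile-time parsing path even though matching still happens at run time.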

--
February 09, 2017
https://issues.dlang.org/show_bug.cgi?id=17161

--- Comment #5 from Jack Stouffer <jack@jackstouffer.com> ---
(In reply to Dmitry Olshansky from comment #3)
> Will investigate the R-T issue, C-T is (sadly) to be expected.

Is there any way to revert the CT regex to 2.072 behavior? It would be great if a performance regression of this size on one of the selling points of D could be fixed immediately.

--
February 12, 2017
https://issues.dlang.org/show_bug.cgi?id=17161

--- Comment #6 from github-bugzilla@puremagic.com ---
Commits pushed to stable at https://github.com/dlang/phobos

https://github.com/dlang/phobos/commit/5a2491a847beb035b37ee2a270029499065b1919
Fix Issue 17161 - Revert all changes to std.regex from 2.072.2 onwards

https://github.com/dlang/phobos/commit/c4f4cfeda6ba60e2df6eef05bc1f8946982e9a99
Merge pull request #5113 from JackStouffer/revert-regex

Issue 17161 - [REG 2.072.2] Massive Regex Slowdown

--
February 12, 2017
https://issues.dlang.org/show_bug.cgi?id=17161

github-bugzilla@puremagic.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--
February 16, 2017
https://issues.dlang.org/show_bug.cgi?id=17161

--- Comment #7 from github-bugzilla@puremagic.com ---
Commits pushed to master at https://github.com/dlang/phobos

https://github.com/dlang/phobos/commit/5a2491a847beb035b37ee2a270029499065b1919
Fix Issue 17161 - Revert all changes to std.regex from 2.072.2 onwards

https://github.com/dlang/phobos/commit/c4f4cfeda6ba60e2df6eef05bc1f8946982e9a99
Merge pull request #5113 from JackStouffer/revert-regex

--
February 24, 2017
https://issues.dlang.org/show_bug.cgi?id=17161

--- Comment #8 from github-bugzilla@puremagic.com ---
Commits pushed to newCTFE at https://github.com/dlang/phobos

https://github.com/dlang/phobos/commit/5a2491a847beb035b37ee2a270029499065b1919
Fix Issue 17161 - Revert all changes to std.regex from 2.072.2 onwards

https://github.com/dlang/phobos/commit/c4f4cfeda6ba60e2df6eef05bc1f8946982e9a99
Merge pull request #5113 from JackStouffer/revert-regex

--