[Issue 8725] New: segmentation fault with negative-lookahead in module-level regex
September 26, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8725

           Summary: segmentation fault with negative-lookahead in
                    module-level regex
           Product: D
           Version: D2
          Platform: x86_64
        OS/Version: Mac OS X
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody@puremagic.com
        ReportedBy: val@markovic.io


--- Comment #0 from Val Markovic <val@markovic.io> 2012-09-25 22:31:39 PDT ---
The following program crashes with a segmentation fault:

-------------
#!/usr/bin/env rdmd

import std.stdio;
import std.regex;

auto italic = regex( r"\*
                    (?!\s+)
                    (.*?)
                    (?!\s+)
                    \*", "gx" );

void main() {
  string input = "this * is* interesting, *very* interesting";
  writeln( replace( input, italic, "<i>$1</i>" ) );
}
--------------

If one removes the first line with (?!\s+), then the program doesn't crash.

I was under the impression that this snippet of code operates under the SafeD subset and therefore shouldn't cause a segmentation fault. A thrown exception on problems or something, that I can understand. But a segfault?

In other sad news, these are the first lines of D I've ever written :( ... so much for experimentation...

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
September 26, 2012

--- Comment #1 from Val Markovic <val@markovic.io> 2012-09-25 22:33:03 PDT ---
Oh, and the segfault goes away if I put the regex creation directly in the call, like so:

  writeln( replace( input, regex( r"\*
                                  (?!\s+)
                                  (.*?)
                                  (?!\s+)
                                  \*", "gx" ), "<i>$1</i>" ) );

September 26, 2012

Dmitry Olshansky <dmitry.olsh@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dmitry.olsh@gmail.com


--- Comment #2 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2012-09-26 06:46:49 PDT ---
I suspect this is a long-standing bug with compile-time evaluation: the compiler parses the regex pattern incorrectly at compile time (unlike at run time). See also: http://d.puremagic.com/issues/show_bug.cgi?id=7810

The problem is that once the D compiler sees an initialized global variable, it has to constant-fold the initializer:

int fact10 = factorial(10);
// computes and hardcodes the value of factorial(10) at compile time

Then with regex:
auto italic = regex( ... );
// *parses* the pattern and *generates* the compiled regex object,
// with all the data structures for matching it
All of this happens *at compile time* via CTFE; see the description near the bottom of:
http://dlang.org/function.html
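
The constant-folding rule described above can be seen in a small sketch. The body of `factorial` is an assumption (the comment only mentions the function by name), and `fact5` is added purely for illustration. The point is that a module-level initializer and an `enum` are both evaluated through CTFE, while a local variable inside `main` is computed at run time.

```d
import std.stdio;

// Hypothetical helper: the thread names factorial() but does not show it.
ulong factorial(uint n)
{
    return n <= 1 ? 1 : n * factorial(n - 1);
}

// Module-level variable with an initializer: the compiler must know the
// value at compile time, so factorial(10) is run through CTFE.
ulong fact10 = factorial(10);

// An enum initializer is evaluated at compile time as well, which lets
// the result be checked with static assert.
enum fact5 = factorial(5);
static assert(fact5 == 120);

void main()
{
    // Local variable: the call happens at run time.
    ulong atRunTime = factorial(10);
    assert(fact10 == atRunTime);
    writeln(fact10);
}
```

The same mechanism applies to a module-level `auto italic = regex( ... );`, only with a much larger function being interpreted by the compiler.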

Previously, though, this only caused unexpectedly long compilation times (CTFE is
slow), and in select cases it failed with an assert *during compilation*; it
never segfaulted.
Probably the internal structure has a subtle corruption that the self-test failed to
catch.

E.g., this one also works because the italic regex is created at run time:

import std.stdio;
import std.regex;


void main() {
  auto italic = regex( r"\*
                    (?!\s+)
                    (.*?)
                    (?!\s+)
                    \*", "gx" );
  string input = "this * is* interesting, *very* interesting";
  writeln( replace( input, italic, "<i>$1</i>" ) );
}
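
If the regex needs to stay at module level, one possible workaround (a sketch, not something proposed in the thread) is to declare the variable without an initializer and assign it in a module constructor, so that the pattern is still parsed at run time. The `italicize` wrapper is a hypothetical name used only to keep the example testable.

```d
import std.regex;
import std.stdio;

// Declared at module level but without an initializer, so there is
// nothing for the compiler to evaluate through CTFE.
Regex!char italic;

// Module constructor: runs at thread start-up, and for the main thread
// that is before main() is entered.
static this()
{
    italic = regex( r"\*
                    (?!\s+)
                    (.*?)
                    (?!\s+)
                    \*", "gx" );
}

// Hypothetical wrapper around the replace call from the report.
string italicize(string input)
{
    return replace( input, italic, "<i>$1</i>" );
}

void main()
{
    writeln( italicize( "this * is* interesting, *very* interesting" ) );
}
```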

Also a tip: the second lookahead should be a lookbehind! As is, it only tests that the following \* is not whitespace, which is always true. Also, both can be just \s, because \s+ matches whenever \s matches. And since you don't capture the contents of a lookahead/lookbehind, it's faster and simpler to use a single \s.
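
A minimal sketch of the pattern with that tip applied, assuming the intent is "no whitespace right after the opening \* and none right before the closing \*". The function name `italicizeFixed` is hypothetical, and the regex is built at run time inside the function to stay clear of the compile-time issue.

```d
import std.regex;
import std.stdio;

// Hypothetical helper showing the corrected pattern:
//   (?!\s)   negative lookahead: next character must not be whitespace
//   (?<!\s)  negative lookbehind: previous character must not be whitespace
string italicizeFixed(string input)
{
    auto italic = regex( r"\*
                        (?!\s)
                        (.*?)
                        (?<!\s)
                        \*", "gx" );
    return replace( input, italic, "<i>$1</i>" );
}

void main()
{
    writeln( italicizeFixed( "this * is* interesting, *very* interesting" ) );
    writeln( italicizeFixed( "*bold * text*" ) );
}
```

With the original pattern, a closing \* preceded by a space is still accepted, because the second lookahead looks at the \* itself; the lookbehind version skips such a \* and keeps searching for one that directly follows a non-space character.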

About SafeD: it shouldn't segfault, but the program listed is @system (as this
is the default) :). Otherwise, since regex is @trusted, it's my responsibility to
verify that it is memory safe, so blame me (or rather the compiler).

To actually be in SafeD, try putting @safe: at the top of your code, or just tag
main and all functions with @safe.
AFAIK writeln wouldn't work in SafeD, as it's still @system (obviously it
should be safe/trusted). To be honest, SafeD hasn't been addressed properly in
the standard library yet.
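
As a rough illustration of the per-function tagging mentioned above, the sketch below marks a hypothetical `sum` helper and `main` as @safe. It deliberately avoids writeln, since the comment above says writeln was still @system at the time. The static assert relies on pointer arithmetic being one of the operations the compiler rejects in @safe code.

```d
// Hypothetical helper; the compiler checks its body for memory safety.
int sum(const int[] xs) @safe
{
    int total = 0;
    foreach (x; xs)
        total += x;
    return total;
}

// Pointer arithmetic is not allowed in @safe functions, so this
// function literal is expected not to compile.
static assert(!__traits(compiles, (int* p) @safe { return p + 1; }));

void main() @safe
{
    // Array indexing and iteration are bounds-checked and allowed.
    assert(sum([1, 2, 3]) == 6);
}
```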

September 26, 2012

--- Comment #3 from Val Markovic <val@markovic.io> 2012-09-26 09:39:30 PDT ---
Thanks for the explanation!

WRT the regex string being faulty, I was aware of that; I was just experimenting when I encountered a segfault.

Thanks for the pointer about adding @safe: at the top; too bad writeln is still @system. That kinda kills the usefulness of SafeD, doesn't it? I mean if I literally can't write a Hello World program in SafeD, then SafeD is quite far from ready. :)

I read TDPL last week and this is my first encounter with writing real D code; all in all, the language is freaking awesome (goodbye C++) and I'm even willing to live with esoteric bugs in the compiler/libs if I can work around them. I understand that D is still a work-in-progress language.

I intend to write a substantial (multi KLOC) D program as a learning experience; will report any bugs I find as I find them.

Anyway, good luck fixing this. :)

November 30, 2012

Dmitry Olshansky <dmitry.olsh@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |DUPLICATE


--- Comment #4 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2012-11-30 12:49:42 PST ---
Works with current git master.
Must have been fixed along with the compiler bug in 7810.

*** This issue has been marked as a duplicate of issue 7810 ***

December 01, 2012

--- Comment #5 from github-bugzilla@puremagic.com 2012-12-01 00:12:43 PST ---
Commit pushed to master at https://github.com/D-Programming-Language/phobos

https://github.com/D-Programming-Language/phobos/commit/0f2947d4d1360f0a0f797279e6f13f95695e45ec bugfixes for compile-time regex

fix issue 8725

fix issue 8349
