[Issue 11350] New: libphobos2 regex match segfaults when a rare HTTP header is received
October 25, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=11350

           Summary: libphobos2 regex match segfaults when a rare HTTP
                    header is received
           Product: D
           Version: D2
          Platform: x86
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody@puremagic.com
        ReportedBy: sha0@badchecksum.net


--- Comment #0 from sha0coder <sha0@badchecksum.net> 2013-10-25 03:30:36 PDT ---
A simple std.net.curl.get() is performed against a remote host that returns some
rare HTTP headers. I don't define the onReceiveHeader callback, but libphobos2
calls the default onReceiveHeader(), which applies a regex to the header and
then crashes.

I connect this way:

    auto conn = HTTP();
    conn.connectTimeout(dur!"seconds"(4));
    conn.addRequestHeader("User-agent", "Mozilla/5.0 (Windows NT 6.1; rv:20.0) Gecko/20100101 Firefox/20.0");
    char[] html = get(url,conn);


It seems the bug is at:

/usr/include/dmd/phobos/std/regex.d  line 6348

6537 public auto match(R, RegEx)(R input, RegEx re)
6538     if(isSomeString!R && is(RegEx == Regex!(BasicElementOf!R)))
6539 {
6540     return RegexMatch!(Unqual!(typeof(input)),ThompsonMatcher)(re, input);
6541 }

Maybe it is an encoding problem; the input seems to be:
>>> print "%c%c%c%c%c%c%c%c%c" % (0x64,0x61,0x97,0x48,0x34,0x53,0x54,0x65,0x46)
da�H4STeF



(gdb) bt
#0  0xb76c8d13 in rt.deh2.terminate() () from
/usr/lib/i386-linux-gnu/libphobos2.so.0.63
#1  0xb76c8ee3 in _d_throwc () from /usr/lib/i386-linux-gnu/libphobos2.so.0.63
#2  0x080b04cc in
_D3std5regex49__T10RegexMatchTAaS273std5regex15ThompsonMatcherZ10RegexMatch43__T6__ctorTS3std5regex12__T5RegexTaZ5RegexZ6__ctorMFNcNeS3std5regex12__T5RegexTaZ5RegexAaZS3std5regex49__T10RegexMatchTAaS273std5regex15ThompsonMatcherZ10RegexMatch
(this=0x95ac0774, input=646197483453546546, prog=...)
    at /usr/include/dmd/phobos/std/regex.d:6348
#3  0x080a09a2 in
_D3std5regex45__T5matchTAaTS3std5regex12__T5RegexTaZ5RegexZ5matchFNfAaS3std5regex12__T5RegexTaZ5RegexZS3std5regex49__T10RegexMatchTAaS273std5regex15ThompsonMatcherZ10RegexMatch
(__HID46=0x95ac0b18, re=..., input=646197483453546546) at
/usr/include/dmd/phobos/std/regex.d:6540
#4  0xb768e20f in std.net.curl.HTTP.onReceiveHeader() () from
/usr/lib/i386-linux-gnu/libphobos2.so.0.63
#5  0xb769125a in std.net.curl.Curl.onReceiveHeader() () from
/usr/lib/i386-linux-gnu/libphobos2.so.0.63
#6  0xb7691665 in std.net.curl.Curl._receiveHeaderCallback() () from
/usr/lib/i386-linux-gnu/libphobos2.so.0.63
#7  0xb72a5e7a in Curl_client_write () from
/usr/lib/i386-linux-gnu/libcurl.so.4
#8  0xb72a4912 in Curl_http_readwrite_headers () from
/usr/lib/i386-linux-gnu/libcurl.so.4
#9  0xb72bbf6d in Curl_readwrite () from /usr/lib/i386-linux-gnu/libcurl.so.4
#10 0xb72bde4d in ?? () from /usr/lib/i386-linux-gnu/libcurl.so.4
#11 0xb72be793 in curl_easy_perform () from
/usr/lib/i386-linux-gnu/libcurl.so.4
#12 0xb7691093 in std.net.curl.Curl.perform() () from
/usr/lib/i386-linux-gnu/libphobos2.so.0.63
#13 0xb768d8e1 in std.net.curl.HTTP._perform() () from
/usr/lib/i386-linux-gnu/libphobos2.so.0.63
#14 0xb768d734 in std.net.curl.HTTP.perform() () from
/usr/lib/i386-linux-gnu/libphobos2.so.0.63
#15 0x08081aac in
_D3std3net4curl18__T10_basicHTTPTaZ10_basicHTTPFAxaAxvS3std3net4curl4HTTPZAa
(client=..., sendData=579669917507256320,
    url=10576998119117946914) at /usr/include/dmd/phobos/std/net/curl.d:762
#16 0x08081948 in
_D3std3net4curl30__T3getTS3std3net4curl4HTTPTaZ3getFAxaS3std3net4curl4HTTPZAa
(conn=..., url=10576998119117946914)
    at /usr/include/dmd/phobos/std/net/curl.d:364

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------


Dmitry Olshansky <dmitry.olsh@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dmitry.olsh@gmail.com


--- Comment #1 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2013-10-25 11:21:26 PDT ---
(In reply to comment #0)
> 
> It seems the bug is at:
> 
> /usr/include/dmd/phobos/std/regex.d  line 6348
> 
> 6537 public auto match(R, RegEx)(R input, RegEx re)
> 6538     if(isSomeString!R && is(RegEx == Regex!(BasicElementOf!R)))
> 6539 {
> 6540     return RegexMatch!(Unqual!(typeof(input)),ThompsonMatcher)(re, input);
> 6541 }
> 
> Maybe it is an encoding problem; the input seems to be:
> >>> print "%c%c%c%c%c%c%c%c%c" % (0x64,0x61,0x97,0x48,0x34,0x53,0x54,0x65,0x46)
> da�H4STeF
>

It would be nice to see what the pattern is and exactly what the argument passed to it looks like.

I tried to reproduce with this:

void main()
{
    import std.regex;

    ubyte[] header = [0x64,0x61,0x97,0x48,0x34,0x53,0x54,0x65,0x46];
    auto m = match(cast(char[]) header, regex("(.*?): (.*)$"));
    assert(m.empty);
}

I get:

std.utf.UTFException@C:\dmd2\windows\bin\..\..\src\phobos\std\utf.d(1113):
Invalid UTF-8 sequence (at index 1)

No crashes.
Now it may have to do with shared object / PIC code for all I know, as I'm
testing on Win32.

But without a smaller, or at least complete, reproducible test case there is nothing to work on.




--- Comment #2 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2013-10-25 11:40:08 PDT ---
(In reply to comment #0)
> It seems the bug is at:

No, and I think I know what it is.

> Maybe it is an encoding problem; the input seems to be:
> >>> print "%c%c%c%c%c%c%c%c%c" % (0x64,0x61,0x97,0x48,0x34,0x53,0x54,0x65,0x46)
> da�H4STeF

Yes, this is broken UTF-8 and hence...
> 
> 
> 
> (gdb) bt
> #0  0xb76c8d13 in rt.deh2.terminate() () from
> /usr/lib/i386-linux-gnu/libphobos2.so.0.63

> #1  0xb76c8ee3 in _d_throwc () from /usr/lib/i386-linux-gnu/libphobos2.so.0.63

it throws an exception ...

> #2  0x080b04cc in
> _D3std5regex49__T10RegexMatchTAaS273std5regex15ThompsonMatcherZ10RegexMatch43__T6__ctorTS3std5regex12__T5RegexTaZ5RegexZ6__ctorMFNcNeS3std5regex12__T5RegexTaZ5RegexAaZS3std5regex49__T10RegexMatchTAaS273std5regex15ThompsonMatcherZ10RegexMatch
> (this=0x95ac0774, input=646197483453546546, prog=...)
>     at /usr/include/dmd/phobos/std/regex.d:6348

... inside of std.regex.match. But the thing is, we are doing it inside a callback of the C library cURL (follow the call stack down to curl_easy_perform). IT HAS NO IDEA what to do with a D exception, hence the crash.

So the fix would be to insulate it with try/catch inside that onReceive callback.
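
A minimal sketch of that kind of insulation, not the actual std.net.curl code. The names headerCallback and handleHeader are hypothetical; the callback only mimics the shape of a libcurl header callback, and std.utf.validate is called explicitly so that the invalid-UTF-8 failure is deterministic in the sketch:

```d
import std.regex : match, regex;
import std.utf : validate;

// Hypothetical stand-in for the D-side header parsing; may throw.
void handleHeader(const(char)[] header)
{
    validate(header);   // throws UTFException on malformed UTF-8
    auto m = match(header, regex("(.*?): (.*)$"));
    // ... a real handler would store m.captures here ...
}

// Hypothetical callback with C linkage, shaped like a libcurl header
// callback. It is marked nothrow, so the compiler rejects the function
// if an Exception could escape into the C caller.
extern (C) size_t headerCallback(const(char)* ptr, size_t size,
                                 size_t nmemb, void* userdata) nothrow
{
    immutable len = size * nmemb;
    try
    {
        handleHeader(ptr[0 .. len]);
    }
    catch (Exception)
    {
        // Report the failure through the return value instead of
        // unwinding through C frames. A value different from len
        // (true for any non-empty header) signals an error to libcurl.
        return 0;
    }
    return len;
}
```

With the bytes from comment #0 the callback returns 0 rather than throwing; a well-formed header returns its full length. Whether to abort the transfer or to skip the bad header and return len anyway is a design choice that the sketch leaves open.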

> #3  0x080a09a2 in
> _D3std5regex45__T5matchTAaTS3std5regex12__T5RegexTaZ5RegexZ5matchFNfAaS3std5regex12__T5RegexTaZ5RegexZS3std5regex49__T10RegexMatchTAaS273std5regex15ThompsonMatcherZ10RegexMatch
> (__HID46=0x95ac0b18, re=..., input=646197483453546546) at
> /usr/include/dmd/phobos/std/regex.d:6540
> #4  0xb768e20f in std.net.curl.HTTP.onReceiveHeader() () from
> /usr/lib/i386-linux-gnu/libphobos2.so.0.63
> #5  0xb769125a in std.net.curl.Curl.onReceiveHeader() () from
> /usr/lib/i386-linux-gnu/libphobos2.so.0.63
> #6  0xb7691665 in std.net.curl.Curl._receiveHeaderCallback() () from
> /usr/lib/i386-linux-gnu/libphobos2.so.0.63
> #7  0xb72a5e7a in Curl_client_write () from
> /usr/lib/i386-linux-gnu/libcurl.so.4
> #8  0xb72a4912 in Curl_http_readwrite_headers () from
> /usr/lib/i386-linux-gnu/libcurl.so.4
> #9  0xb72bbf6d in Curl_readwrite () from /usr/lib/i386-linux-gnu/libcurl.so.4
> #10 0xb72bde4d in ?? () from /usr/lib/i386-linux-gnu/libcurl.so.4
> #11 0xb72be793 in curl_easy_perform () from
> /usr/lib/i386-linux-gnu/libcurl.so.4
> #12 0xb7691093 in std.net.curl.Curl.perform() () from
> /usr/lib/i386-linux-gnu/libphobos2.so.0.63
> #13 0xb768d8e1 in std.net.curl.HTTP._perform() () from
> /usr/lib/i386-linux-gnu/libphobos2.so.0.63
> #14 0xb768d734 in std.net.curl.HTTP.perform() () from
> /usr/lib/i386-linux-gnu/libphobos2.so.0.63
> #15 0x08081aac in
> _D3std3net4curl18__T10_basicHTTPTaZ10_basicHTTPFAxaAxvS3std3net4curl4HTTPZAa
> (client=..., sendData=579669917507256320,
>     url=10576998119117946914) at /usr/include/dmd/phobos/std/net/curl.d:762
> #16 0x08081948 in
> _D3std3net4curl30__T3getTS3std3net4curl4HTTPTaZ3getFAxaS3std3net4curl4HTTPZAa
> (conn=..., url=10576998119117946914)
>     at /usr/include/dmd/phobos/std/net/curl.d:364
