Jump to page: 1 2
Thread overview
[Issue 9438] New: Strange RefCounted stack overflow
Feb 01, 2013
Maxim Fomin
Feb 01, 2013
Maxim Fomin
Feb 01, 2013
Maxim Fomin
Feb 01, 2013
Maxim Fomin
Feb 01, 2013
Maxim Fomin
Feb 01, 2013
Maxim Fomin
Feb 04, 2013
Maxim Fomin
Feb 09, 2013
Maxim Fomin
Feb 11, 2013
Walter Bright
February 01, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=9438

           Summary: Strange RefCounted stack overflow
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: major
          Priority: P2
         Component: DMD
        AssignedTo: nobody@puremagic.com
        ReportedBy: monarchdodra@gmail.com


--- Comment #0 from monarchdodra@gmail.com 2013-02-01 07:26:12 PST ---
I've been chasing this on and off for a couple months now. Basically, trying to access the RefCounted.refCountedStore.isInitialized of a non-initialized RefCounted in a field of a temporary will create a stack overflow. I know that's not clear, but here is the reduced usecase:

//----
import std.container, std.stdio, std.typecons, std.exception;

struct S
{
  RefCounted!int _data;

  this(int)
  {_data.refCountedStore.ensureInitialized();}

  int get() @property
  {
      writeln("here");
      enforce(_data.refCountedStore.isInitialized); //OH NOES!!!
      writeln("there");
      return _data.refCountedPayload;
  }
}

void main()
{
    // 1)
    writeln(S(1).get);

    // 2)
    S s;
    writeln(s.get).collectException();

    // 3)
    writeln(S().get);
}
//----

1) This will create a temporary S, and intialize the ref counted. the writeln works.

2) The creates a non-temporary S. Trying to access the ref counted will
(correctly) throw an exception.

3) This will stack overflow at the "//OH NOES!!!" line: It will first call:

ref inout(RefCountedStore) refCountedStore() inout

To get the store, and then will recursively call "isInitialized" until the program stack overflows. I have no idea why:

//----
        @property nothrow @safe
        bool isInitialized() const
        {
            return _store !is null;
        }
//----

This seems to me like the tip of a more serious bug somewhere. I would be very pleased if someone with more knowledge than me could try to look into it?

I think it might also create problems with things such as arrays of arrays: Every time I've tried to fix http://d.puremagic.com/issues/show_bug.cgi?id=6153 I've had crashes (NOT asserts/enforeces), and I think this might be the reason.

Originally found with this code:
//----
void main()
{
    writeln(Array!int()[0]);
}
//----

Yes, the code is wrong, but it should *assert*. Currently, it just dies.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
February 01, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=9438


Maxim Fomin <maxim@maxim-fomin.ru> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |maxim@maxim-fomin.ru


--- Comment #1 from Maxim Fomin <maxim@maxim-fomin.ru> 2013-02-01 08:11:50 PST ---
The situation is more complicated.

import std.container, std.stdio, std.typecons, std.exception;

struct S
{
  RefCounted!int _data;

  this(int)
  {_data.refCountedStore.ensureInitialized();}

  int get() @property
  {
      writeln("here");
      enforce(_data.refCountedStore.isInitialized); //OH NOES!!! //13
      writeln("there");
      return _data.refCountedPayload;
  }
}

void main()
{
    version (A) {
      writeln(S(1).get);
     }

    version (B) {
      S s;
      writeln(s.get).collectException();
    }

    version (C) {
        writeln(S().get);
    }
}

When compiling with version A or B, everything is fine. Version C fails enforcement on line 13. Both A and C throws failed enforcement. Both B and C segfault in Refcounted dtor. Tested on linux64 git head.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
February 01, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=9438



--- Comment #2 from Maxim Fomin <maxim@maxim-fomin.ru> 2013-02-01 08:25:53 PST ---
Oh, it depends on compiler switches. Reduced from previous by combining B and C:

import std.container, std.stdio, std.typecons, std.exception;

struct S
{
  RefCounted!int _data;

  int get() @property
  {
      writeln("here");
      enforce(_data.refCountedStore.isInitialized); //OH NOES!!!
      writeln("there");
      return _data.refCountedPayload;
  }
}

void main()
{
      S s;
      writeln(s.get).collectException();
        writeln(S().get);
}

When compiling with -g, it prints "here", "here" and enforcement failure, without -g it segfaults as like above. Since valgrind does not detect for -g version memory errors, the issue may be dmd codegen bug.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
February 01, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=9438


monarchdodra@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |wrong-code
           Severity|major                       |critical


--- Comment #3 from monarchdodra@gmail.com 2013-02-01 09:09:48 PST ---
I wanted to try to get RefCounted out of the way, and I was able to get a massively reduced test case. That said, the result is mind bogglingly wtf...

FYI: win32 on win7.64

//----
import std.stdio, std.exception;

//S is merely a struct with a pointer. And a destructor.
struct S
{
    int* _payload = null;

    ~this()
    {
        writeln("~this: ", _payload);
        if (!_payload) return;
        writeln("Should not be here...");
    }
}

void foo(int* p)
{
  throw new Exception("bla");
}

void main()
{
    int* p = S()._payload;
    writeln("wait for it...");
    foo(S()._payload);
}
//----
~this: null
wait for it...
~this: FFFF05E8
Should not be here...
object.Exception@main.d(17): bla
//----

Apparently, we create a temporary, an exception is thrown, the temporary gets corrupted, and then things start getting awry in the destructor.

I'm getting this also for as far back as 2.55 (didn't try anything earlier), so it doesn't seem to be a regression.

In any case, exceptions silently corrupting stack temporaries? That's a critical in my book.

Who was it again that said our destructors weren't very well tested?

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
February 01, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=9438



--- Comment #4 from monarchdodra@gmail.com 2013-02-01 09:17:05 PST ---
(In reply to comment #3)
> I wanted to try to get RefCounted out of the way, and I was able to get a massively reduced test case. That said, the result is mind bogglingly wtf...

Just to add, I just tried compiling with different flags, including -O -release -debug -g:

The scenario occurs in all cases, except when the flag "-g" is set, in which case things work correctly.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
February 01, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=9438



--- Comment #5 from Maxim Fomin <maxim@maxim-fomin.ru> 2013-02-01 09:18:14 PST ---
(In reply to comment #3)
> I wanted to try to get RefCounted out of the way, and I was able to get a massively reduced test case. That said, the result is mind bogglingly wtf...
> 
> FYI: win32 on win7.64
> 
> //----

Cannot reproduce on linux 64 githead.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
February 01, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=9438



--- Comment #6 from Maxim Fomin <maxim@maxim-fomin.ru> 2013-02-01 10:07:03 PST ---
(In reply to comment #3)
> I wanted to try to get RefCounted out of the way, and I was able to get a massively reduced test case. That said, the result is mind bogglingly wtf...
> 
> FYI: win32 on win7.64
> 

Can reproduce. And -g fixes the program.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
February 01, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=9438



--- Comment #7 from Maxim Fomin <maxim@maxim-fomin.ru> 2013-02-01 12:56:04 PST ---
Reduced for linux

struct RefCounted
{
    void *p;
    ~this()
    {
        p = null;
    }
}

struct S
{
  RefCounted _data;

  int get() @property
  {
      throw new Exception("");
  }
}

void main()
{
      S s;
      S().get;
}

With -g it throws, without -g segfaults.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
February 01, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=9438



--- Comment #8 from Maxim Fomin <maxim@maxim-fomin.ru> 2013-02-01 13:24:49 PST ---
It really seems to be codegen bug. The problem is that presence of code like in main function (struct temporary + simple stack struct) makes dmd generate wrong exception handler table.

If you compile main.d one version with -release -O -noboundcheck and other version with the same switches and additionally with -g, you will have absolutely identical asm (obj2asm output) except the single difference is in data segment.

In segfaulting version you have

.data    segment
_HandlerTable0:
    db    050h,000h,000h,000h,063h,000h,000h,000h    ;P...c...
    db    002h,000h,000h,000h,000h,000h,000h,000h    ;........
    db    019h,000h,000h,000h,048h,000h,000h,000h    ;....H...
    db    0ffffffffh,0ffffffffh,0ffffffffh,0ffffffffh,000h,000h,000h,000h
;........
    db    057h,000h,000h,000h,000h,000h,000h,000h    ;W.......
    db    02bh,000h,000h,000h,037h,000h,000h,000h    ;+...7...
    db    000h,000h,000h,000h,000h,000h,000h,000h    ;........
    db    042h,000h,000h,000h,000h,000h,000h,000h    ;B....... // 42h

and in throwing version you will have

_HandlerTable0:
    db    050h,000h,000h,000h,063h,000h,000h,000h    ;P...c...
    db    002h,000h,000h,000h,000h,000h,000h,000h    ;........
    db    019h,000h,000h,000h,048h,000h,000h,000h    ;....H...
    db    0ffffffffh,0ffffffffh,0ffffffffh,0ffffffffh,000h,000h,000h,000h
;........
    db    057h,000h,000h,000h,000h,000h,000h,000h    ;W.......
    db    02bh,000h,000h,000h,037h,000h,000h,000h    ;+...7...
    db    000h,000h,000h,000h,000h,000h,000h,000h    ;........
    db    03eh,000h,000h,000h,000h,000h,000h,000h    ;>....... //3eh

If you patch incorrect binary, the bug goes away.

Corrupted handler table leads to following problem (asm snippet from main):

0x0000000000418888 <+60>:    jmp    <_Dmain+72>
0x000000000041888a <+62>:    lea    -0x10(%rbp),%rdi //3Eh
0x000000000041888e <+66>:    callq  <_D4main1S11__fieldDtorMFZv> //42h
0x0000000000418893 <+71>:    retq
0x0000000000418894 <+72>:    sub    $0x8,%rsp
0x0000000000418898 <+76>:    callq  0x4188a3 <_Dmain+87>
0x000000000041889d <+81>:    add    $0x8,%rsp
0x00000000004188a1 <+85>:    jmp    0x4188ad <_Dmain+97>
0x00000000004188a3 <+87>:    lea    -0x18(%rbp),%rdi
0x00000000004188a7 <+91>:    callq  0x418810 <_D4main1S11__fieldDtorMFZv>
0x00000000004188ac <+96>:    retq
0x00000000004188ad <+97>:    xor    %eax,%eax
0x00000000004188af <+99>:    pop    %r15

In segfaulting version druntime unwinds up to _Dmain+66, after instruction which sets into %rdi this reference, hence dtor receives corrupted pointer. In correct version druntime unwinds up to _Dmain+62, so the this pointer is correct.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
February 03, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=9438



--- Comment #9 from monarchdodra@gmail.com 2013-02-03 09:26:40 PST ---
(In reply to comment #8)
> It really seems to be codegen bug.

Thankyou for investigating this. As I said, it really is out of my league. Do you know what the next step is for fixing this? Who we should forward it to?

We should really get this fixed. Stack corruption when an exception is thrown? Nothing good can come out of this.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
« First   ‹ Prev
1 2