Thread overview
Trouble understanding crash when class is returned by value from C++
Sep 03, 2012
Andrej Mitrovic
Sep 03, 2012
Daniel Green
Sep 03, 2012
Iain Buclaw
Sep 03, 2012
Andrej Mitrovic
Sep 03, 2012
Iain Buclaw
September 03, 2012
I'm trying to add at least *some* type of pass-by-value support for C++ classes when wrapping C++ libraries for D. I figured I could fake a value class by using a D struct with a thunk field which matches the size of the C++ object.

Returning a C++ object by value works in this plain C++ example (using
g++ on win32):
test.cpp: http://codepad.org/55pttk3I
$ g++ -m32 -g test.cpp -o main.exe -lstdc++
$ main.exe

If I take the same code but remove main and instead use a D driver app like so:
test.cpp: http://codepad.org/ZqieSXrb
main.d: http://codepad.org/6E5sbc7e

I compile it:
$ g++ -m32 -g -c ./test.cpp -o test.obj
$ gdc -m32 -g main.d test.obj -o main.exe -lstdc++
$ main.exe

and then I get a crash:
The instruction at "0x6fc8ea39" referenced memory at "0x006f6f62". The
memory could not be "read".

GDB tells me:
Program received signal SIGSEGV, Segmentation fault.
0x6fc8ea39 in libstdc++-6!_ZNSsC1ERKSs ()
   from C:\MinGW\bin\libstdc++-6.dll

If I replace the std::string field with an ordinary 'char*' the crash is gone, so my wild guess is that the crash happens in one of std::string's special member functions (ctor/dtor/etc.).

C++ sizeof() tells me FileName is 4 bytes long, so I've matched that in the fake D struct. If I increase the 'thunk' field to 9 bytes the crash disappears. I have a hunch stack corruption might be to blame.
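
The size match described above can be checked at compile time on the C++ side. The following is only a minimal sketch: the FileName definition is a hypothetical stand-in for the class in test.cpp (the real one is in the codepad link), and FakeFileName is an illustrative name, not something from the original code. The actual size depends on the compiler and standard library, so the sketch never hard-codes it.

```cpp
#include <string>

// Hypothetical stand-in for the FileName class from test.cpp.
class FileName
{
public:
    explicit FileName(const std::string& s) : str(s) {}
    std::string str;
};

// Opaque blob with the same size and alignment as FileName
// (illustrative name, assumed for this sketch).
struct FakeFileName
{
    alignas(FileName) char thunk[sizeof(FileName)];
};

// Fail the build if the blob and the real class ever drift apart.
static_assert(sizeof(FakeFileName) == sizeof(FileName), "size mismatch");
static_assert(alignof(FakeFileName) == alignof(FileName), "alignment mismatch");
```

These checks only establish that the two types occupy the same amount of storage; they say nothing about how a compiler passes or returns either type.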

I can notice some difference in the ASM listings:
C++ plain sample: http://pastebin.com/xw3BhwwR
D driver sample: http://pastebin.com/TLa8k5A3

The suspicious thing there is the missing LEA instruction in the D listing. If I change the thunk field to 9 bytes the LEA instruction appears again (and this is when the crash disappears).

My ASM-fu is really weak though, so I don't know what any of this means. Does anyone know what's going on?
September 03, 2012
My best guess is that the issue is related to the struct being 4 bytes.
A similar segfault occurs if you attempt the same kind of access from C++.

A 4-byte struct fits into a single register, which makes pointers unnecessary (and slower). It's likely that some part of the ABI takes this into account and that the compiler optimizes access accordingly.

However, I would imagine that such optimizations are not allowed for C++ classes, so using a class requires a pointer to memory rather than the register-optimized struct.

The following returns 4 instead of the correct address when I inspect the variable refValue, and then segfaults.

http://codepad.org/eepFTfbX
September 03, 2012
On 3 September 2012 16:52, Daniel Green <venix1@gmail.com> wrote:
> My best guess, is the issue is related to the struct being 4 bytes.
> A similar segfault occurs if you attempt to access in a similar manner using
> c++.
>
> A 4 byte struct will fit into a single register making pointers unnecessary/slower and it's likely some part of the ABI has taken this into consideration and the compiler is optimizing access to this.
>
> However, I would imagine that such optimizations would not be allowed with C++ and so by using a class it requires a pointer type and not the optimized struct.
>
> The following returns the value of 4 when I inspect the variable refValue instead of the correct address and segfaults.
>
> http://codepad.org/eepFTfbX


Indeed, C++ classes are always passed in memory by design, whereas pointers can be passed in registers. The difference between the ABI handling of void* and FileName* here matters a lot. And this is one reason why you need to ensure that function signatures match in both D and C/C++ code.

extern "C"
FileName value_FileName(void* refVal)
{
    return *(FileName*)refVal;
}
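
The memory-versus-register distinction can be probed from C++ itself. As far as I know, under the Itanium C++ ABI that g++ follows, a class with a non-trivial copy constructor or destructor is returned through a hidden pointer to caller-provided memory, while a small trivially copyable struct may come back in a register, depending on the platform. The sketch below assumes a stand-in FileName that holds a std::string, and a hypothetical Fake blob of the same size:

```cpp
#include <string>
#include <type_traits>

// Hypothetical stand-in for the FileName class from test.cpp.
class FileName
{
public:
    std::string str;
};

// Plain blob of the same size (illustrative, assumed for this sketch).
struct Fake
{
    char thunk[sizeof(FileName)];
};

// FileName inherits std::string's non-trivial copy constructor and
// destructor, so it is not trivially copyable.
static_assert(!std::is_trivially_copyable<FileName>::value,
              "FileName is expected to be non-trivial");

// The blob is a plain aggregate of chars and is trivially copyable.
static_assert(std::is_trivially_copyable<Fake>::value,
              "Fake is expected to be trivial");
```

Two types of identical size can therefore still be returned by different mechanisms, which is why a signature that differs only in the return type may not be interchangeable across the language boundary.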

By the way, why extern "C" when extern (C++) works just fine? :-)


Regards
-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
September 03, 2012
On 9/3/12, Iain Buclaw <ibuclaw@ubuntu.com> wrote:
> Indeed,  C++ classes are always passed in memory by design.  Whereas
> pointers could be passed in registers.

That's cool. I learn something new every day. :)

> And this is one
> reason why you need to ensure that function signatures match in both D
> and C/C++ code.

Yeah, that's doable when the type is a POD, but when it's a class returned by value there is no equivalent in D, since D classes are always references. So I can't match the D function signature to the C one.

> extern "C"
> FileName value_FileName(void* refVal)
> {
>     return *(FileName*)refVal;
> }

That won't work either since FileName is still in the return type and I can't match the function signature on the D side (it still crashes). The only thing I can think of is to match the C++ function signature to the D side via something like:

C++:
class FileName { ... } // same as before

struct Fake
{
    char __thunk[4];
};

Fake value_FileName(void* refVal)
{
    return *(Fake*)(&(*(FileName*)refVal));
}

It's ugly but it does seem to work and matches the D function signature. It would be a lot simpler if the return type was castable to (char[4]), but C/C++ doesn't support returning arrays by value. :)
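
For comparison, a different glue shape avoids returning the class by value at all: the caller hands in a buffer and the C++ side copy-constructs into it with placement new. This is not the approach used above, just a minimal sketch of a common alternative. The FileName definition is a stand-in, and sizeof_FileName, copy_FileName, destroy_FileName and roundtrip_demo are hypothetical names invented for the illustration.

```cpp
#include <cassert>
#include <new>
#include <string>

// Hypothetical stand-in for the FileName class from test.cpp.
class FileName
{
public:
    explicit FileName(const std::string& s) : str(s) {}
    std::string str;
};

// Lets the caller size its buffer without knowing the class layout.
extern "C" unsigned sizeof_FileName()
{
    return static_cast<unsigned>(sizeof(FileName));
}

// Copy-constructs a FileName into caller-provided storage, so the
// real copy constructor runs instead of a raw byte copy.
extern "C" void copy_FileName(const void* src, void* dst)
{
    new (dst) FileName(*static_cast<const FileName*>(src));
}

// Runs the destructor on an object created by copy_FileName.
extern "C" void destroy_FileName(void* obj)
{
    static_cast<FileName*>(obj)->~FileName();
}

// Round trip done in C++, standing in for what a foreign caller would do.
bool roundtrip_demo()
{
    FileName original("foo");
    alignas(FileName) unsigned char storage[sizeof(FileName)];
    copy_FileName(&original, storage);
    bool same = reinterpret_cast<FileName*>(storage)->str == original.str;
    assert(same);
    destroy_FileName(storage);
    return same;
}
```

Every function here takes and returns only pointers, integers or void, so the signatures can be declared identically on both sides. The cost is that the caller becomes responsible for calling the destroy function exactly once per copy.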

> By the way, why extern "C" when extern (C++) works just fine? :-)

I'm working on a code generator which uses C as the glue language, similar to SWIG. But the plan is to support more features than SWIG and to have a faster and less memory-intensive cross-language virtual method invocation mechanism. Unlike SWIG I support passing PODs by value, but passing non-POD classes by value was problematic, and I can see now why.

Thanks for your help guys!
September 03, 2012
On 3 September 2012 18:15, Andrej Mitrovic <andrej.mitrovich@gmail.com> wrote:
> On 9/3/12, Iain Buclaw <ibuclaw@ubuntu.com> wrote:
>> Indeed,  C++ classes are always passed in memory by design.  Whereas
>> pointers could be passed in registers.
>
> That's cool. I learn something new every day. :)
>
>> And this is one
>> reason why you need to ensure that function signatures match in both D
>> and C/C++ code.
>
> Yeah that's doable when the type is a POD but when it's a class returned by value there is no equivalent in D since D classes are always references, so I can't match the D function signature to the C one.
>
>> extern "C"
>> FileName value_FileName(void* refVal)
>> {
>>     return *(FileName*)refVal;
>> }
>
> That won't work either since FileName is still in the return type and I can't match the function signature on the D side (it still crashes). The only thing I can think of is to match the C++ function signature to the D side via something like:
>

Ah, sorry, my bad. I was testing marking D structs as addressable (meaning they are always passed in memory) whilst in the middle of looking at the difference between D and C++ codegen. I must have left that turned on in my copy of gdc. ;-)


-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';