Thread overview
Odd behavior in fiber unittest - printf stops optimization crash
Mar 01, 2015
Kevin Brogan
Mar 01, 2015
Kevin Brogan
Mar 01, 2015
Kai Nacke
Mar 01, 2015
Kai Nacke
Mar 02, 2015
Kevin Brogan
March 01, 2015
msvc 64 bit build.

Trying to figure out why the unit test for core.thread is failing for release but not for debug.

First I wanted to figure out which tests were causing the crash. Turns out that only the fiber related tests are crashing. All thread related unit tests are fine.

Thoughts are that it would be an optimization problem, since release is crashing but not debug.

I switch out the -O3 flag in build scripts for core.thread with -02. No more crash.

So that's confirmed...  but what line of code is causing the crash? Switch back to -O3 and disable all of the fibre unit tests except for what looked to be the simplest one.

unittest
{
    int x = 0;

    (new Fiber({
        x++;
    })).call();
    assert( x == 1 );
}

Using printf statements I determine that the crash is happening inside of a naked assembly function called fiber_switchContext.

There's almost zero comments so I don't understand exactly why it's doing what it's doing. Seems to be saving the win64 abi non-volatile registers as well as something to do with thread local storage using the GS register which is completely undocumented on msdn from what I can tell.

I have questions about the function. Why is RBX pushed after the XMM registers. (xmm alignment i guess?) Why are RDI and RSI not saved, as they are non-volatile as well according to https://msdn.microsoft.com/en-us/library/9z1stfyw.aspx.

So while I'm writing this I realize I haven't tested what happens if I modify the context switch to save RDI and RSI as well...  unit test is working now!!! next!

Earlier I had figured out that placing a printf on either side of the call to fiber_switchContext would eliminate the crash. I guess it changed the optimized code...

On to the next failing fiber related unit test!
March 01, 2015
Seems that all of them are working now except for two. testNonvolatileRegister is failing as is "Multiple threads running shared fibers".

I'm not sure that the RSI RDI additions are what was required since they are clearly not saved in the non volatile register test.
March 01, 2015
On Sunday, 1 March 2015 at 00:30:03 UTC, Kevin Brogan wrote:
> msvc 64 bit build.
>
> Trying to figure out why the unit test for core.thread is failing for release but not for debug.
>
> First I wanted to figure out which tests were causing the crash. Turns out that only the fiber related tests are crashing. All thread related unit tests are fine.
>
> Thoughts are that it would be an optimization problem, since release is crashing but not debug.
>
> I switch out the -O3 flag in build scripts for core.thread with -02. No more crash.
>
> So that's confirmed...  but what line of code is causing the crash? Switch back to -O3 and disable all of the fibre unit tests except for what looked to be the simplest one.
>
> unittest
> {
>     int x = 0;
>
>     (new Fiber({
>         x++;
>     })).call();
>     assert( x == 1 );
> }
>
> Using printf statements I determine that the crash is happening inside of a naked assembly function called fiber_switchContext.
>
> There's almost zero comments so I don't understand exactly why it's doing what it's doing. Seems to be saving the win64 abi non-volatile registers as well as something to do with thread local storage using the GS register which is completely undocumented on msdn from what I can tell.
>
> I have questions about the function. Why is RBX pushed after the XMM registers. (xmm alignment i guess?) Why are RDI and RSI not saved, as they are non-volatile as well according to https://msdn.microsoft.com/en-us/library/9z1stfyw.aspx.
>
> So while I'm writing this I realize I haven't tested what happens if I modify the context switch to save RDI and RSI as well...  unit test is working now!!! next!
>
> Earlier I had figured out that placing a printf on either side of the call to fiber_switchContext would eliminate the crash. I guess it changed the optimized code...
>
> On to the next failing fiber related unit test!

Hi Kevin!

fiber_switchContext has to parameters: the stack pointer of the old fiber (oldp) and the stack pointer of the new fiber (newp).
In essence, the state of the old fiber is pushed on the stack pointed to by oldp. Then the state of the new fiber is loaded from the stack pointed to by newp.
The context switch is finished with a JMP to the new fiber code.

The GS: prefix points to some Windows internal structure. The first element is the Thread Information Block:
Definition from <winnt.h>:

typedef struct _NT_TIB {
    struct _EXCEPTION_REGISTRATION_RECORD *ExceptionList;
    PVOID StackBase;
    PVOID StackLimit;
    PVOID SubSystemTib;
....
}

The saved elements are therefore the exception chain, the stack base and the stack limit.

The order of the XMM register is simply dictated by alignment as you guessed.

Please note the comment from start of function: if you change the stack layout then you also have to change function Fiber.initStack because here the initial layout is created.

A context switch with fiber_switchContext looks like a normal function call to the compiler. The nonvolatile registers must be preserved. That RDI/RSI are not saved is a bug. Compare with the function from druntime 2.067: the bug is fixed there.

Regards,
Kai
March 01, 2015
On Sunday, 1 March 2015 at 13:35:54 UTC, Kai Nacke wrote:
> A context switch with fiber_switchContext looks like a normal function call to the compiler. The nonvolatile registers must be preserved. That RDI/RSI are not saved is a bug. Compare with the function from druntime 2.067: the bug is fixed there.

I pushed this to master. Maybe you wanna try it.

Regards,
Kai

March 02, 2015
On Sunday, 1 March 2015 at 14:56:18 UTC, Kai Nacke wrote:
> On Sunday, 1 March 2015 at 13:35:54 UTC, Kai Nacke wrote:
>> A context switch with fiber_switchContext looks like a normal function call to the compiler. The nonvolatile registers must be preserved. That RDI/RSI are not saved is a bug. Compare with the function from druntime 2.067: the bug is fixed there.
>
> I pushed this to master. Maybe you wanna try it.
>
> Regards,
> Kai

thanks for the info