Jump to page: 1 2
Thread overview
GDC generates invalid assembly around fiber yield operations (Not re-reading data from clobberedd memory to registers)
May 03, 2015
Liran Zvibel
May 05, 2015
Liran Zvibel
May 14, 2015
Liran Zvibel
May 14, 2015
Johannes Pfau
May 15, 2015
Johannes Pfau
May 15, 2015
Iain Buclaw
May 15, 2015
Iain Buclaw
May 15, 2015
Johannes Pfau
May 15, 2015
Liran Zvibel
May 15, 2015
Liran Zvibel
May 18, 2015
ketmar
May 03, 2015
Hi,

We are trying to port a large fiber based application to GDC.
Our application works well when compiled with DMD with optimizations.
It fails very quickly with GDC (even before we tried optimizations), and we were able to narrow it to how GDC treats yields.

Please look at the following small program (minor changes from the core.thread.Fiber example):

import std.stdio;
import core.thread;

enum numAddition = 3;
long globalSum = 0;

class DerivedFiber : Fiber
{
    int index;
    this(int _index)
    {
        index = _index;
        super( &run );
    }

private :
    void run()
    {
        foreach(int k; 0 .. numAddition) {
            globalSum += otherFunc();
            writefln("DerivedFiber(%d) iteration %d globalSum is %d", index, k, globalSum);
        }
    }

    long otherFunc() {
        yield();
        return index;
    }
}

int main()
{
    Fiber derived1 = new DerivedFiber(1);
    Fiber derived2 = new DerivedFiber(2);

    foreach(j; 0 .. (numAddition+1)) {
        derived1.call();
        derived2.call();
    }

    assert(globalSum == (1+2)*numAddition);
    return 0;
}


And the following output when compiling first with DMD and then with GDC:

bash-4.3# dmd -release -O fiber.d -ofdfb
bash-4.3# ./dfb
DerivedFiber(1) iteration 0 globalSum is 1
DerivedFiber(2) iteration 0 globalSum is 3
DerivedFiber(1) iteration 1 globalSum is 4
DerivedFiber(2) iteration 1 globalSum is 6
DerivedFiber(1) iteration 2 globalSum is 7
DerivedFiber(2) iteration 2 globalSum is 9
bash-4.3# /opt/gdc/bin/gdc  -ggdb fiber.d -ogfb
bash-4.3# ./gfb
DerivedFiber(1) iteration 0 globalSum is 1
DerivedFiber(2) iteration 0 globalSum is 2
DerivedFiber(1) iteration 1 globalSum is 2
DerivedFiber(2) iteration 1 globalSum is 4
DerivedFiber(1) iteration 2 globalSum is 3
DerivedFiber(2) iteration 2 globalSum is 6
core.exception.AssertError@fiber.d(41): Assertion failure
[ Removed the very simple stack...]


When looking at the generated assembly, it's very easy to see that the value of 'globalSum' is read to register before the call to `otherFunc` (and also does not refresh it across the 'foreach'). The problem is that otherFunc calls 'yield' which changes context and causes 'globalSum' to be updated.
GDC has to know that after `yield` is called memory is potentially clobbered, and re-read memory back to registers.

I tried some things that did not help:
- declaring 'globalSum' as 'shared' or '__gshared'.
- declaring 'globalSum' as 'static' inside DerivedFiber, also tried adding 'shared' or '__gshared'.
- adding 'asm  {"nop;" :  :  : "memory" ;}' after the 'yield'.
- defining 'otherFunc' with @attribute("returns_twice") -- got 'error: unknown attribute returns_twice'.

Two things did work, but are unpractical when porting a big application (over 100k loc) that heavily relies on Fibers:
- using a temporary variable to hold the 'otherFunc' return, then add it to 'globalSum'
- calling core.atomic.otomicOp!"+="(globalSum, otherFunc())


We COULD identify all functions that immediately call 'yield' and mark them with @attribute("returns_twice"), but that does not seem to be supported by GDC.

Any ideas?

Thanks!

Liran
May 05, 2015
Hi,

Another thing that I noticed about this issue, is that in this isolated test case, the problem only happens if the added value is returned from the function that performs the 'yield()'.
If I call 'yield()' directly, then add to 'globalSum', or if I just call 'otherFunc()', ignore the return value and then add to 'globalSum' then the result is correct.

An example of run version that works:
    void run()
    {
        foreach(int k; 0 .. numAddition) {
            otherFunc();
            globalSum += index;
            writefln("DerivedFiber(%d) iteration %d globalSum is %d", index, k, globalSum);
        }
    }

An example of run version that breaks (exactly like the one from my previous mail):
    void run()
    {
        foreach(int k; 0 .. numAddition) {
            globalSum += otherFunc();
            writefln("DerivedFiber(%d) iteration %d globalSum is %d", index, k, globalSum);
        }
    }

Maybe this can help you understand something ...

Thanks,

Liran
May 14, 2015
Hi,

I forgot to mention is before, I'm running with GDC commit b022dd4cac195d85e9c3a6a37f2501a07ade455a from April 7th based on GCC 4.9.

Does anyone have experience compiling (and then running) Fiber based applications with GDC?
What are you doing?

Is there a plan to add support for @attribute("returns_twice") to GDC?

Thanks!

Liran

On Sunday, 3 May 2015 at 18:45:36 UTC, Liran Zvibel wrote:
> Hi,
>
> We are trying to port a large fiber based application to GDC.
> Our application works well when compiled with DMD with optimizations.
> It fails very quickly with GDC (even before we tried optimizations), and we were able to narrow it to how GDC treats yields.
>
> Please look at the following small program (minor changes from the core.thread.Fiber example):
>
> import std.stdio;
> import core.thread;
>
> enum numAddition = 3;
> long globalSum = 0;
>
> class DerivedFiber : Fiber
> {
>     int index;
>     this(int _index)
>     {
>         index = _index;
>         super( &run );
>     }
>
> private :
>     void run()
>     {
>         foreach(int k; 0 .. numAddition) {
>             globalSum += otherFunc();
>             writefln("DerivedFiber(%d) iteration %d globalSum is %d", index, k, globalSum);
>         }
>     }
>
>     long otherFunc() {
>         yield();
>         return index;
>     }
> }
>
> int main()
> {
>     Fiber derived1 = new DerivedFiber(1);
>     Fiber derived2 = new DerivedFiber(2);
>
>     foreach(j; 0 .. (numAddition+1)) {
>         derived1.call();
>         derived2.call();
>     }
>
>     assert(globalSum == (1+2)*numAddition);
>     return 0;
> }
>
>
> And the following output when compiling first with DMD and then with GDC:
>
> bash-4.3# dmd -release -O fiber.d -ofdfb
> bash-4.3# ./dfb
> DerivedFiber(1) iteration 0 globalSum is 1
> DerivedFiber(2) iteration 0 globalSum is 3
> DerivedFiber(1) iteration 1 globalSum is 4
> DerivedFiber(2) iteration 1 globalSum is 6
> DerivedFiber(1) iteration 2 globalSum is 7
> DerivedFiber(2) iteration 2 globalSum is 9
> bash-4.3# /opt/gdc/bin/gdc  -ggdb fiber.d -ogfb
> bash-4.3# ./gfb
> DerivedFiber(1) iteration 0 globalSum is 1
> DerivedFiber(2) iteration 0 globalSum is 2
> DerivedFiber(1) iteration 1 globalSum is 2
> DerivedFiber(2) iteration 1 globalSum is 4
> DerivedFiber(1) iteration 2 globalSum is 3
> DerivedFiber(2) iteration 2 globalSum is 6
> core.exception.AssertError@fiber.d(41): Assertion failure
> [ Removed the very simple stack...]
>
>
> When looking at the generated assembly, it's very easy to see that the value of 'globalSum' is read to register before the call to `otherFunc` (and also does not refresh it across the 'foreach'). The problem is that otherFunc calls 'yield' which changes context and causes 'globalSum' to be updated.
> GDC has to know that after `yield` is called memory is potentially clobbered, and re-read memory back to registers.
>
> I tried some things that did not help:
> - declaring 'globalSum' as 'shared' or '__gshared'.
> - declaring 'globalSum' as 'static' inside DerivedFiber, also tried adding 'shared' or '__gshared'.
> - adding 'asm  {"nop;" :  :  : "memory" ;}' after the 'yield'.
> - defining 'otherFunc' with @attribute("returns_twice") -- got 'error: unknown attribute returns_twice'.
>
> Two things did work, but are unpractical when porting a big application (over 100k loc) that heavily relies on Fibers:
> - using a temporary variable to hold the 'otherFunc' return, then add it to 'globalSum'
> - calling core.atomic.otomicOp!"+="(globalSum, otherFunc())
>
>
> We COULD identify all functions that immediately call 'yield' and mark them with @attribute("returns_twice"), but that does not seem to be supported by GDC.
>
> Any ideas?
>
> Thanks!
>
> Liran
May 14, 2015
Am Thu, 14 May 2015 15:52:05 +0000
schrieb "Liran Zvibel" <liran@weka.io>:

> Hi,
> 
> I forgot to mention is before, I'm running with GDC commit b022dd4cac195d85e9c3a6a37f2501a07ade455a from April 7th based on GCC 4.9.
> 
> Does anyone have experience compiling (and then running) Fiber
> based applications with GDC?
> What are you doing?
> 
> Is there a plan to add support for @attribute("returns_twice") to GDC?
> 
> Thanks!

I can reproduce this using the latest master branch. This exact issue is kinda weird: GCC can't cache the globalSum across a call to otherFunc. otherFunc calls yield which is in a different module. yield could modify the globalSum value and the generated code would be invalid even without Fibers.

Iain: I think our codegen might be wrong. -fdump-tree-original:

@61     var_decl         name: @76      mngl: @77      type: @60
                         scpe: @54      srcp: test.d:5
                         size: @39      algn: 64       used: 1
@62     plus_expr        type: @60      op 0: @61      op 1: @78
@78     call_expr        type: @60      fn  : @96      0   : @23

Does GCC guarantee an evaluation order for plus_expr? This is
kinda weird btw. The rewrite might already happen in the frontend. If we
rewrite
globalSum += otherFunc();
to
globalSum = globalSum + otherFunc();
and follow strict LTR evaluation the code we generate is correct and
the code DMD generates is incorrect.

OTOH I don't know the exact rules for += but intuitively it should first evaluate the RHS, then load the LHS.




Note: The issue the OP assumed is also worth discussing. It can't happen with global variables, but a second Fiber could modify a local variable. I'm not sure if it's possible to construct a case where GCC legitimately assumes a variable can't escape and we modify it through yield.
May 15, 2015
Am Thu, 14 May 2015 19:02:48 +0200
schrieb Johannes Pfau <nospam@example.com>:

> ...

TLDR
As a workaround replace

globalSum += otherFunc();

which GDC currently treats as

globalSum = globalSum + otherFunc();

with

globalSum = otherFunc() + globalSum;

May 15, 2015
On 15 May 2015 at 09:08, Johannes Pfau via D.gnu <d.gnu@puremagic.com> wrote:
> Am Thu, 14 May 2015 19:02:48 +0200
> schrieb Johannes Pfau <nospam@example.com>:
>
>> ...
>
> TLDR
> As a workaround replace
>
> globalSum += otherFunc();
>
> which GDC currently treats as
>
> globalSum = globalSum + otherFunc();
>
> with
>
> globalSum = otherFunc() + globalSum;
>

That is an interesting workaround, should we perhaps reconsider our code generation here?

Iain.
May 15, 2015
On 15 May 2015 at 09:13, Iain Buclaw <ibuclaw@gdcproject.org> wrote:
> On 15 May 2015 at 09:08, Johannes Pfau via D.gnu <d.gnu@puremagic.com> wrote:
>> Am Thu, 14 May 2015 19:02:48 +0200
>> schrieb Johannes Pfau <nospam@example.com>:
>>
>>> ...
>>
>> TLDR
>> As a workaround replace
>>
>> globalSum += otherFunc();
>>
>> which GDC currently treats as
>>
>> globalSum = globalSum + otherFunc();
>>
>> with
>>
>> globalSum = otherFunc() + globalSum;
>>
>
> That is an interesting workaround, should we perhaps reconsider our code generation here?
>

I can confirm that C codegen does infact emit 'foo += bar()'  as  'foo
= bar() + foo'

Which only strengthens the reasoning to change it.

Liran, can you raise a bug report?  Also, can we use your small sample (names will be anonymised) to put into the testsuite?


Regards
Iain.
May 15, 2015
Am Fri, 15 May 2015 09:23:33 +0200
schrieb "Iain Buclaw via D.gnu" <d.gnu@puremagic.com>:

> On 15 May 2015 at 09:13, Iain Buclaw <ibuclaw@gdcproject.org> wrote:
> > On 15 May 2015 at 09:08, Johannes Pfau via D.gnu <d.gnu@puremagic.com> wrote:
> >> Am Thu, 14 May 2015 19:02:48 +0200
> >> schrieb Johannes Pfau <nospam@example.com>:
> >>
> >>> ...
> >>
> >> TLDR
> >> As a workaround replace
> >>
> >> globalSum += otherFunc();
> >>
> >> which GDC currently treats as
> >>
> >> globalSum = globalSum + otherFunc();
> >>
> >> with
> >>
> >> globalSum = otherFunc() + globalSum;
> >>
> >
> > That is an interesting workaround, should we perhaps reconsider our code generation here?
> >
> 
> I can confirm that C codegen does infact emit 'foo += bar()'  as  'foo
> = bar() + foo'
> 
> Which only strengthens the reasoning to change it.
> 
> Liran, can you raise a bug report?  Also, can we use your small sample (names will be anonymised) to put into the testsuite?
> 
> 
> Regards
> Iain.

BTW: For commutative operations we can simply change the operands. For non-commutative operations we'll have to explicitly evaluate the side effects of the RHS before assigning. (-=, ...)

Relevant code is in d-elem.cc: AddAssignExp::toElem etc
May 15, 2015
Hi,
I’m trying to port a 120k loc fiber based application from dmd to gdc, so applying this work around is not a good option for me (also, there are many engineers that will continue generating code…)

Thanks
Liran
> On May 15, 2015, at 10:08, Johannes Pfau via D.gnu <d.gnu@puremagic.com> wrote:
> 
> Am Thu, 14 May 2015 19:02:48 +0200
> schrieb Johannes Pfau <nospam@example.com>:
> 
>> ...
> 
> TLDR
> As a workaround replace
> 
> globalSum += otherFunc();
> 
> which GDC currently treats as
> 
> globalSum = globalSum + otherFunc();
> 
> with
> 
> globalSum = otherFunc() + globalSum;
> 


May 15, 2015
Please feel free to use my code sample in the test suite.
I fill post a bug report.

Thanks!


Liran
> On May 15, 2015, at 10:23, Iain Buclaw via D.gnu <d.gnu@puremagic.com> wrote:
> 
> On 15 May 2015 at 09:13, Iain Buclaw <ibuclaw@gdcproject.org> wrote:
>> On 15 May 2015 at 09:08, Johannes Pfau via D.gnu <d.gnu@puremagic.com> wrote:
>>> Am Thu, 14 May 2015 19:02:48 +0200
>>> schrieb Johannes Pfau <nospam@example.com>:
>>> 
>>>> ...
>>> 
>>> TLDR
>>> As a workaround replace
>>> 
>>> globalSum += otherFunc();
>>> 
>>> which GDC currently treats as
>>> 
>>> globalSum = globalSum + otherFunc();
>>> 
>>> with
>>> 
>>> globalSum = otherFunc() + globalSum;
>>> 
>> 
>> That is an interesting workaround, should we perhaps reconsider our code generation here?
>> 
> 
> I can confirm that C codegen does infact emit 'foo += bar()'  as  'foo
> = bar() + foo'
> 
> Which only strengthens the reasoning to change it.
> 
> Liran, can you raise a bug report?  Also, can we use your small sample (names will be anonymised) to put into the testsuite?
> 
> 
> Regards
> Iain.


« First   ‹ Prev
1 2