November 14, 2010 Compiler optimization breaks multi-threaded code
There is a question on SO that looks like a serious problem for atomic ops:

http://stackoverflow.com/questions/4165149/compiler-optimization-breaks-multi-threaded-code

In short:

shared uint cnt;
void atomicInc ( ) { uint o; while ( !cas( &cnt, o, o + 1 ) ) o = cnt; }

is compiled with dmd -O to something like:

shared uint cnt;
void atomicInc ( ) { while ( !cas( &cnt, cnt, cnt + 1 ) ) { } }

See the web page for details.

November 14, 2010 Re: Compiler optimization breaks multi-threaded code
Posted in reply to Michal Minich | Michal Minich Wrote:
> There is one question on SO which seems like a serious problem for atomic ops.
>
> http://stackoverflow.com/questions/4165149/compiler-optimization-breaks-multi-threaded-code
>
> in short:
>
> shared uint cnt;
> void atomicInc ( ) { uint o; while ( !cas( &cnt, o, o + 1 ) ) o = cnt; }
>
> is compiled with dmd -O to something like:
>
> shared uint cnt;
> void atomicInc ( ) { while ( !cas( &cnt, cnt, cnt + 1 ) ) { } }
>
> see the web page for details.
What a mess. DMD isn't supposed to optimize across asm blocks.

November 15, 2010 Re: Compiler optimization breaks multi-threaded code
Posted in reply to Sean Kelly | Sean Kelly Wrote:
> > shared uint cnt;
> > void atomicInc ( ) { uint o; while ( !cas( &cnt, o, o + 1 ) ) o = cnt; }
> >
> > is compiled with dmd -O to something like:
> >
> > shared uint cnt;
> > void atomicInc ( ) { while ( !cas( &cnt, cnt, cnt + 1 ) ) { } }
> What a mess. DMD isn't supposed to optimize across asm blocks.
There're no asm blocks in the code. It's a violated contract of shared data access.

November 16, 2010 Re: Compiler optimization breaks multi-threaded code
Posted in reply to Kagamin | Kagamin <spam@here.lot> wrote:
> Sean Kelly Wrote:
>
>>> shared uint cnt;
>>> void atomicInc ( ) { uint o; while ( !cas( &cnt, o, o + 1 ) ) o = cnt; }
>>>
>>> is compiled with dmd -O to something like:
>>>
>>> shared uint cnt;
>>> void atomicInc ( ) { while ( !cas( &cnt, cnt, cnt + 1 ) ) { } }
>> What a mess. DMD isn't supposed to optimize across asm blocks.
>
> There're no asm blocks in the code. It's a violated contract of shared data access.
cas() contains an asm block. Though I guess in this case the compiler isn't actually optimizing across it. Does atomic!"+="(&cnt, 1) work correctly? I know the issue with shared would still have to be fixed, but that code uses asm for the load as well, so it probably won't be optimized the same way.
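The two approaches mentioned in this post can be sketched side by side. This is a minimal sketch, assuming a current `core.atomic` (where the increment is spelled `atomicOp!"+="(cnt, 1)`); `incWithOp`, `incWithCas` and `runWorkers` are hypothetical names that do not appear in the thread, and the code has not been checked against the dmd release discussed here.

```d
import core.atomic : atomicLoad, atomicOp, atomicStore, cas;
import core.thread : Thread;

shared uint cnt;

// Variant 1: a single atomic read-modify-write on the shared counter.
void incWithOp()
{
    atomicOp!"+="(cnt, 1u);
}

// Variant 2: the retry loop from the original post, but with an explicit
// atomicLoad on every iteration instead of a plain read of `cnt`.
void incWithCas()
{
    uint o;
    do
    {
        o = atomicLoad(cnt);
    } while (!cas(&cnt, o, o + 1));
}

// Hypothetical harness (an assumption, not from the thread): resets the
// counter, starts `threads` workers that each call `inc` `perThread` times,
// waits for all of them, and returns the final count.
uint runWorkers(void function() inc, uint threads, uint perThread)
{
    atomicStore(cnt, 0u);

    Thread[] workers;
    foreach (t; 0 .. threads)
    {
        auto w = new Thread({
            foreach (i; 0 .. perThread)
                inc();
        });
        w.start();
        workers ~= w;
    }

    foreach (w; workers)
        w.join();

    return atomicLoad(cnt);
}
```

If no increments are lost, `runWorkers(&incWithOp, 4, 10_000)` and `runWorkers(&incWithCas, 4, 10_000)` should both return 40000.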

November 16, 2010 Re: Compiler optimization breaks multi-threaded code
Posted in reply to Sean Kelly | On 16.11.2010 18:09, Sean Kelly wrote:
> cas() contains an asm block. Though I guess in this case the compiler
> isn't actually optimizing across it. Does atomic!"+="(&cnt, 1) work
> correctly? I know the issue with shared would still have to be fixed,
> but that code uses asm for the load as well, so it probably won't be
> optimized the same way.
Thanks for looking into the issue around here. Just three comments from my side, Sean.
Disclaimer: based on a couple of hours chasing a bug and not much D experience (but some optimizing C++ compiler experience - so the issue looked familiar :-) )
1) atomicOp is not affected. It reads memory only once per call. Whether that read goes through a local variable loaded from a global, or directly from the global, doesn't really matter (except for timing, maybe).
2) You are right, the compiler seems to not optimize across asm statements. So, the example can be fixed with the following hack:
void atomicInc ( ) {
    uint o;
    while ( !cas( &cnt, o, o + 1 ) ) {
        asm { nop; } o = cnt;
    }
}
This is however more brittle than it looks, because it is not always clear what "optimizing across an asm block" means. This version has the issue again:
void atomicInc ( ) {
    uint o = cnt;
    do {
        asm { nop; } o = cnt;
    } while ( !cas( &cnt, o, o + 1 ) );
}
While this case might look somewhat obvious, I encountered some problems in more complex code, and finally went for the all-inline-assembler solution to be on the safe side.
3) While debugging, I believe I saw the optimizer re-order not only reads of shared variables, but also writes to shared variables. IIRC, my Dekker example on SO (which fails because of the missing s/l/mfence instructions) also shows a re-ordering of the lines
    cnt++;
    turn2 = true;
    flag1 = false;
into
    turn2 = true;
    cnt++;
    flag1 = false;
which in this case is not really important, but might introduce another bug if I were prepared to live with the risk of starvation (and removed turn2). If the compiler still re-ordered the lines in that case (haven't tested), cnt++ would end up outside of the critical section.
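The requirement in point 3 is that `cnt++` stays between acquiring and releasing the lock. A minimal sketch of that requirement, assuming a simple `cas`-based spin lock rather than the Dekker algorithm from the SO post: `locked`, `lock`, `unlock`, `guardedInc` and `runGuarded` are hypothetical names, and the sketch relies on the default sequentially consistent ordering of `core.atomic` instead of hand-placed fences. It has not been checked against the dmd release discussed here.

```d
import core.atomic : atomicStore, cas;
import core.thread : Thread;

shared bool locked;   // hypothetical lock flag, not part of the SO example
__gshared uint cnt;   // plain counter, only touched while the lock is held

// Acquire: spin until the flag flips from false to true.
void lock()
{
    while (!cas(&locked, false, true))
        Thread.yield();
}

// Release: an atomic store, so the counter update is meant to be visible
// before the flag is cleared.
void unlock()
{
    atomicStore(locked, false);
}

// The increment is the critical section and sits between lock and unlock.
void guardedInc()
{
    lock();
    cnt++;
    unlock();
}

// Hypothetical harness (an assumption, not from the thread): runs `threads`
// workers, each doing `perThread` guarded increments, and returns the count.
uint runGuarded(uint threads, uint perThread)
{
    cnt = 0;

    Thread[] workers;
    foreach (t; 0 .. threads)
    {
        auto w = new Thread({
            foreach (i; 0 .. perThread)
                guardedInc();
        });
        w.start();
        workers ~= w;
    }

    foreach (w; workers)
        w.join();

    return cnt;
}
```

If the increment really stays inside the critical section, `runGuarded(4, 10_000)` should return 40000. A lower result would point to the kind of re-ordering described above.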
Hope this helps & cheers,
Stephan