[Issue 20838] on modern (x86_64) CPUs, dmd emit cmpxchg8b instead of CMPXCHG16B (page 2)

May 18, 2020

Posted by mw

https://issues.dlang.org/show_bug.cgi?id=20838

--- Comment #11 from mw <mingwu@gmail.com> ---
> this is solely an issue with your 'workflow' involving obj2asm.

Thank you. It is indeed a problem with obj2asm. Using GNU's objdump instead:

--------------------------------------------------------------------------------
$ ldc2 -m64 c.d
$ objdump -S --disassemble c > c.asm
$ grep -i cmpxch c.asm
    f5b3:       f0 48 0f c7 0e          lock cmpxchg16b (%rsi)
   1ecc3:       f0 48 0f b1 3a          lock cmpxchg %rdi,(%rdx)
   1ed32:       f0 40 0f b0 3a          lock cmpxchg %dil,(%rdx)
   1ed42:       66 f0 0f b1 3a          lock cmpxchg %di,(%rdx)
   30563:       48 8d 15 ba ea 01 00    lea    0x1eaba(%rip),%rdx        # 4f024 <_D4core5cpuid13_hasCmpxchg8byb>
   30573:       48 8d 15 ab ea 01 00    lea    0x1eaab(%rip),%rdx        # 4f025 <_D4core5cpuid14_hasCmpxchg16byb>
   30da6:       f0 48 0f b1 3c ce       lock cmpxchg %rdi,(%rsi,%rcx,8)
--------------------------------------------------------------------------------

I found the cmpxchg16b instruction.

But I'm not sure what the other 'cmpxchg' is. Can some asm expert help explain?


BTW: I found another issue with LDC, using the code from
https://d.godbolt.org/z/HesA24
with the import of std.stdio and the writeln call removed:

--------------------------------------------------------------------------------
$ cat c.d
//import std.stdio;
import core.atomic;

struct N {
  N* prev;
  N* next;
}

shared(N) n;

void main() {
  cas(&n, n, n);
  //writeln(size_t.sizeof*2, N.sizeof);
}

$ ldc2 -m64 c.d
$ ./c
Segmentation fault (core dumped)

$ ldc2 --version
LDC - the LLVM D compiler (1.20.0):
  based on DMD v2.090.1 and LLVM 9.0.1
  built with LDC - the LLVM D compiler (1.20.0)
  Default target: x86_64-unknown-linux-gnu
  Host CPU: skylake
  http://dlang.org - http://wiki.dlang.org/LDC
--------------------------------------------------------------------------------

Although the "Output" dropdown on https://d.godbolt.org/z/HesA24
has an option "Run the compiled binary", selecting it didn't show me any
result.


With import std.stdio and writeln, the LDC output behaves normally (no
segfault):
--------------------------------------------------------------------------------
$ ldc2 -m64 c.d
$ ./c
1616
--------------------------------------------------------------------------------


Can you try if you can reproduce this segfault on a local Linux box?

--
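On the other 'cmpxchg' lines in the objdump output of comment #11: the register names give the operand width (%dil is 8-bit, %di is 16-bit, %rdi is 64-bit), so those are ordinary single-width compare-and-swap instructions, presumably from the runtime code linked into the binary. A minimal sketch, assuming a core.atomic like the one used in this thread, of D-level cas calls at those three widths (the variable names are made up for illustration):

```d
import core.atomic;

// Hypothetical globals, one per operand width seen in the disassembly.
shared ubyte  b;  // 1 byte:  would correspond to `lock cmpxchg %dil,(...)`
shared ushort w;  // 2 bytes: would correspond to `lock cmpxchg %di,(...)`
shared size_t q;  // 8 bytes on x86_64: `lock cmpxchg %rdi,(...)`

void main()
{
    // Each cas succeeds because the variables start out zero-initialized.
    assert(cas(&b, cast(ubyte) 0, cast(ubyte) 1));
    assert(cas(&w, cast(ushort) 0, cast(ushort) 1));
    assert(cas(&q, size_t(0), size_t(1)));

    // A second attempt with the stale expected value 0 must fail.
    assert(!cas(&q, size_t(0), size_t(2)));
    assert(atomicLoad(q) == 1);
}
```

Only a 16-byte operand such as the struct N in the reproduction needs the separate cmpxchg16b instruction.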

https://issues.dlang.org/show_bug.cgi?id=20838

--- Comment #12 from kinke <kinke@gmx.net> ---
(In reply to mw from comment #11)
> Can you try if you can reproduce this segfault on a local Linux box?

We're abusing DMD's bug tracker, but anyway: you need to manually take care of
required 16-bytes alignment:

align(16) shared(N) n; // or `align(2 * size_t.sizeof)`

--
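For reference, a sketch of the reproduction from comment #11 with the alignment fix from comment #12 applied. The added assert is an illustration, not part of either comment, and whether the segfault disappears is assumed from kinke's explanation rather than re-tested here:

```d
import core.atomic;

struct N {
  N* prev;
  N* next;
}

// 16-byte alignment for the 16-byte CAS operand, as suggested in comment #12.
align(16) shared(N) n;

void main() {
  // Illustration only: the address of n is now a multiple of 16.
  assert((cast(size_t) &n) % 16 == 0);

  // Same call as in the original reproduction.
  cas(&n, n, n);
}
```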

https://issues.dlang.org/show_bug.cgi?id=20838

--- Comment #13 from mw <mingwu@gmail.com> ---
> you need to manually take care of required 16-bytes alignment:
> align(16) shared(N) n; // or `align(2 * size_t.sizeof)`

Thank you again! (I'm a newbie to D and not sure where the best place is to
continue discussing this; please let me know.)

BUT: can the DMD compiler (after seeing the 'cas' call) take care of this
alignment, either silently or by issuing a warning message to the programmer?

Can I log another bug for this suggestion of DMD compiler improvement?

The current behavior that I just discovered is definitely a puzzle for a D
newbie like me. A smarter compiler would help new users.

--

https://issues.dlang.org/show_bug.cgi?id=20838

--- Comment #14 from kinke <kinke@gmx.net> ---
(In reply to mw from comment #13)
> Can I log another bug for this suggestion of DMD compiler improvement?

Sure. The druntime library is supposed to take care of this already, at least
with enabled contracts, see
https://github.com/dlang/druntime/blob/48082ac4e4aa1a3c9f1a1ef87659c941dae0f7f6/src/core/atomic.d#L624-L655.
It only checks for insufficient size_t alignment though, so that needs to be
fixed.

Wrt. original DMD issue here, DMD is supposed to use cmpxchg16b already, see
https://github.com/dlang/druntime/blob/48082ac4e4aa1a3c9f1a1ef87659c941dae0f7f6/src/core/internal/atomic.d#L582.
As it apparently doesn't, I guess the bug is in DMD's codegen.

--
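A rough sketch of the stricter check that comment #14 says is missing. The helper `isCasAligned` is hypothetical and not druntime's actual code; it assumes the CAS operand must be aligned to its full size (16 bytes for the struct N on x86_64) rather than only to size_t:

```d
struct N {
    N* prev;
    N* next;
}

// Hypothetical helper, not part of druntime: true when `p` is aligned to
// the full size of T. Assumes T.sizeof is a size that cas supports.
bool isCasAligned(T)(shared(T)* p)
{
    return (cast(size_t) p) % T.sizeof == 0;
}

align(16) shared(N) aligned;

void main()
{
    // A properly aligned global passes.
    assert(isCasAligned(&aligned));

    // An address one machine word past it would still pass a size_t-only
    // check but fails this one. The pointer is never dereferenced.
    auto shifted = cast(shared(N)*) ((cast(size_t) &aligned) + size_t.sizeof);
    assert((cast(size_t) shifted) % size_t.sizeof == 0);
    assert(!isCasAligned(shifted));
}
```

Such a predicate could sit in an `in` contract on cas, so a misaligned operand would trip an assertion instead of faulting at the instruction.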

https://issues.dlang.org/show_bug.cgi?id=20838

mhh <maxhaton@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |maxhaton@gmail.com

--- Comment #15 from mhh <maxhaton@gmail.com> ---
I think this is a bug in the dmd inline assembler
implementation............................................ Fun.

--

https://issues.dlang.org/show_bug.cgi?id=20838

--- Comment #16 from mhh <maxhaton@gmail.com> ---
Actually I should've read the thread. Turns out it is indeed a problem with
Walter's disassembler. Even more fun.

--
