LLVM asm with constraints, and 2 operands
Jul 18, 2021
Guillaume Piolat
July 18, 2021

Is anyone versed in LLVM inline asm?

I know how to generate a SIMD unary op with:

return __asm!int4("pmovsxwd $1,$0","=x,x",a);

but I struggle to generate two-operand SIMD ops like:

return __asm!int4("paddd $1,$0","=x,x",a, b);

If you know how to do it (see https://d.godbolt.org/z/ccM38bfMT), it would probably help the build speed of SIMD-heavy code, as well as -O0 performance.
Generating the right instruction is good, but it must also survive optimization, so proper LLVM constraints are needed. It would be really helpful if someone has understood the cryptic rules of LLVM assembly constraints.

July 18, 2021

On Sunday, 18 July 2021 at 11:42:24 UTC, Guillaume Piolat wrote:

> [...]

Yeah, I can confirm it's awful. It took me hours to understand how to use it even a bit (my PL has an interface for LLVM asm).

You need to add an "x" to the constraint string:

return __asm!int4("paddd $1,$0","=x,x,x",a, b);
  • =x says "returns in whatever it has to"
  • x (1) is the constraint for input a, which is passed as operand $0
  • x (2) is the constraint for input b, which is passed as operand $1

So the thing to get is that the output constraint does not consume any argument; it stands alone.

July 18, 2021

On Sunday, 18 July 2021 at 16:32:46 UTC, Basile B. wrote:

> [...]

Thanks.

Indeed, that seems to work even when inlined and optimized. Registers are spilled to the stack.
A minor concern is what happens when the enclosing function is extern(C): https://d.godbolt.org/z/s6dM3a3de
I need to check that more...

July 18, 2021

On Sunday, 18 July 2021 at 17:45:05 UTC, Guillaume Piolat wrote:

> [...]

I think this should be rejected, just like when you use D arrays in the interface of an extern(C) function, as C has no equivalent of __vector (AFAIK).

July 18, 2021

On Sunday, 18 July 2021 at 18:47:50 UTC, Basile B. wrote:

> [...]
>
> I think this should be rejected, just like when you use D arrays in the interface of an extern(C) function, as C has no equivalent of __vector (AFAIK).

but in any case there's a bug.

July 18, 2021

On Sunday, 18 July 2021 at 18:48:47 UTC, Basile B. wrote:

> [...]
>
> but in any case there's a bug.

I checked, and thankfully it works when the enclosed function is inlined into an extern(C) function, which respects the extern(C) ABI.

July 19, 2021

On Sunday, 18 July 2021 at 16:32:46 UTC, Basile B. wrote:

> • =x says "returns in whatever it has to"
> • x (1) is the constraint for input a, which is passed as operand $0
> • x (2) is the constraint for input b, which is passed as operand $1

$0 is actually the output operand, $1 is a, and $2 is b.

The official docs are here, but IMO not very user-friendly: https://llvm.org/docs/LangRef.html#inline-assembler-expressions
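
For illustration, here is a minimal sketch of that operand numbering applied to paddd. It is not code from this thread: the function name addInt4 is hypothetical, and it assumes LDC targeting x86-64 with SSE2. It relies on the tied-operand constraint "0" described in the LLVM docs linked above, which asks for input a to be placed in the same register as output $0, so the destructive two-operand instruction has a well-defined destination.

```d
import core.simd : int4;
import ldc.llvmasm; // provides the LDC-specific __asm template

// Hypothetical helper, not from the thread: adds two int4 vectors with paddd.
int4 addInt4(int4 a, int4 b)
{
    // Constraint string "=x,0,x":
    //   =x -> $0, the output, in any XMM register
    //   0  -> $1, input a, tied to the same register as $0
    //   x  -> $2, input b, in any XMM register
    // AT&T operand order is "paddd src, dst", i.e. dst += src,
    // so $0 (which starts out holding a) accumulates b.
    return __asm!int4("paddd $2, $0", "=x,0,x", a, b);
}
```

Calling addInt4 with [1, 2, 3, 4] and [10, 20, 30, 40] is expected to produce the element-wise sum. Whether this survives inlining and optimization as well as the variants discussed here would still need to be checked on godbolt.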

I recommend using GDC/GCC inline asm instead, where you'll find more examples. For the given paddd example, I'd have gone with

int4 _mm_add_int4(int4 a, int4 b)
{
    asm { "paddd %1, %0" : "=*x" (a) : "x" (b); }
    // the above is equivalent to:
    // __asm!void("paddd $1, $0","=*x,x", &a, b);
    return a;
}

but the produced asm is rubbish (apparently an LLVM issue):

movaps	%xmm1, -24(%rsp)
paddd	%xmm0, %xmm0 // WTF?
movaps	%xmm0, -24(%rsp)
retq

What works reliably is a manual mov:

int4 _mm_add_int4(int4 a, int4 b)
{
    int4 r;
    asm { "paddd %1, %2; movdqa %2, %0" : "=x" (r) : "x" (a), "x" (b); }
    return r;
}

=>

paddd	%xmm1, %xmm0
movdqa	%xmm0, %xmm0 // useless but cannot be optimized away
retq

Note: both the inline asm and the resulting asm are in AT&T syntax, not Intel syntax.

July 19, 2021

On Monday, 19 July 2021 at 10:21:58 UTC, kinke wrote:

> What works reliably is a manual mov:

OK that's what I feared. It's very easy to get that wrong. Thankfully I haven't used __asm a lot.

July 19, 2021

On Monday, 19 July 2021 at 10:21:58 UTC, kinke wrote:

> What works reliably is a manual mov:
>
> int4 _mm_add_int4(int4 a, int4 b)
> {
>     int4 r;
>     asm { "paddd %1, %2; movdqa %2, %0" : "=x" (r) : "x" (a), "x" (b); }
>     return r;
> }

This workaround is actually missing the clobber constraint for %2, which might be problematic after inlining.

You can also specify the registers explicitly like so (here exploiting ABI knowledge about a being passed in XMM1, and b in XMM0 for extern(D)):

int4 _mm_add_int4(int4 a, int4 b)
{
    asm { "paddd %1, %0" : "=xmm0" (b) : "xmm1" (a), "xmm0" (b); }
    return b;
}

=>

paddd   xmm0, xmm1
ret

But this is likely to interfere with LLVM's register allocation optimizations after inlining...

July 19, 2021

On Monday, 19 July 2021 at 10:49:56 UTC, kinke wrote:

> [snip]

Is LDC still compatible with GDC/GCC inline asm? I remember Johan saying they will break compatibility in the near future...
