July 19

On Monday, 19 July 2021 at 10:21:58 UTC, kinke wrote:

>

On Sunday, 18 July 2021 at 16:32:46 UTC, Basile B. wrote:

>
  • =x says "returns in whatever it has to"
  • x (1) is the constraint for input a, which is passed as operand $0
  • x (2) is the constraint for input b, which is passed as operand $1

$0 is actually the output operand, $1 is a, and $2 is b.
[...]
Note: inline asm syntax and resulting asm in AT&T syntax, not Intel syntax.
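To make the operand numbering concrete, here is a minimal sketch using LDC's `__asm` from `ldc.llvmasm`, assuming an x86 target with SSE2. The name `addInt4` and the tied `0` constraint are illustrative choices, not the code from the original post. Tying the first input to the output register lets a single two-operand `paddd` do the work.

```d
import core.simd;
import ldc.llvmasm;

// Hypothetical helper, for illustration only (LDC, x86/x86-64 with SSE2).
int4 addInt4(int4 a, int4 b)
{
    // Constraint string "=x,0,x":
    //   $0  "=x"  output, any SSE register
    //   $1  "0"   input a, tied to the same register as $0
    //   $2  "x"   input b, any SSE register
    // AT&T operand order puts the destination last, so this is $0 += $2.
    return __asm!int4("paddd $2, $0", "=x,0,x", a, b);
}
```

Because `a` already sits in the output register, only `$0` and `$2` appear in the template.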

Yeah, thanks for the clarification, I totally forgot about that.

And what about the extern(C) issue? Does it make sense to use it when the parameters are int4?

July 19

On Monday, 19 July 2021 at 11:16:49 UTC, Tejas wrote:

>

On Monday, 19 July 2021 at 10:49:56 UTC, kinke wrote:

>

[snip]

Is LDC still compatible with GDC/GCC inline asm? I remember Johan saying they will break compatibility in the near future...

I'm not aware of any of that; who'd be 'they'? GCC breaking their syntax is IMO unimaginable. LDC supporting it (to some extent) is pretty recent; it was introduced with v1.21.

July 19

On Monday, 19 July 2021 at 11:39:02 UTC, Basile B. wrote:

>

And what about the extern(C) issue? Does it make sense to use it when the parameters are int4?

The original inline asm was buggy and only 'worked' by accident (not using the 2nd input operand at all...) with extern(D) reversed parameters. At least for Posix x64, the C calling convention is well-defined for vectors and equivalent to extern(D) (except for the latter's parameter reversal). Windows and 32-bit x86 are different; for Windows, extern(D) pays off, as LDC's ABI is similar to the MSVC++ __vectorcall calling convention (passing vectors in SIMD registers).

July 19

On Monday, 19 July 2021 at 16:05:57 UTC, kinke wrote:

> >

Is LDC still compatible with GDC/GCC inline asm? I remember Johan saying they will break compatibility in the near future...

I'm not aware of any of that; who'd be 'they'? GCC breaking their syntax is IMO unimaginable. LDC supporting it (to some extent) is pretty recent; it was introduced with v1.21.

It went under my radar. Thanks for the tips in this thread.

July 19

On Monday, 19 July 2021 at 10:49:56 UTC, kinke wrote:

>

This workaround is actually missing the clobber constraint for %2, which might be problematic after inlining.

Another, unrelated issue with asm/__asm is that it doesn't follow a consistent VEX encoding compared to normal compiler output.

sometimes you might want: paddq x, y
          at other times: vpaddq x, y, z

but rarely both in the same program.
So mixing the two can easily nullify any gain because of VEX transition costs (if they are still a thing).

July 19

On Monday, 19 July 2021 at 16:05:57 UTC, kinke wrote:

>

On Monday, 19 July 2021 at 11:16:49 UTC, Tejas wrote:

>

On Monday, 19 July 2021 at 10:49:56 UTC, kinke wrote:

>

[snip]

Is LDC still compatible with GDC/GCC inline asm? I remember Johan saying they will break compatibility in the near future...

I'm not aware of any of that; who'd be 'they'? GCC breaking their syntax is IMO unimaginable. LDC supporting it (to some extent) is pretty recent; it was introduced with v1.21.

'They' meant the LDC developers as a whole.

Seems like I might have misunderstood what he wrote, if GCC-style asm support is that recent.

July 19

On Monday, 19 July 2021 at 16:44:35 UTC, Guillaume Piolat wrote:

>

On Monday, 19 July 2021 at 10:49:56 UTC, kinke wrote:

>

This workaround is actually missing the clobber constraint for %2, which might be problematic after inlining.

Another, unrelated issue with asm/__asm is that it doesn't follow a consistent VEX encoding compared to normal compiler output.

sometimes you might want: paddq x, y
          at other times: vpaddq x, y, z

but rarely both in the same program.
So mixing the two can easily nullify any gain because of VEX transition costs (if they are still a thing).

You know that asm is to be avoided whenever possible, but unfortunately, AFAIK intel-intrinsics doesn't fit the usual 'don't worry, simply compile all your code with an appropriate -mattr/-mcpu option' recommendation, as it employs runtime detection of available CPU instructions.

I've just tried another option, but that doesn't play nice with inlining:

import core.simd;
import ldc.attributes;

@target("sse2") // use SSE2 for this function
int4 _mm_add_int4(int4 a, int4 b)
{
    return a + b; // perfect: paddd %xmm1, %xmm0
}

int4 wrapper(int4 a, int4 b)
{
    return _mm_add_int4(a, b);
}

Compiling with -O -mtriple=i686-linux-gnu -mcpu=i686 (=> no SSE2 by default) shows that the inlined version inside wrapper() is the mega slow one, so the extra instructions aren't applied transitively unfortunately.

July 19

On Monday, 19 July 2021 at 17:20:21 UTC, kinke wrote:

>

Compiling with -O -mtriple=i686-linux-gnu -mcpu=i686 (=> no SSE2 by default) shows that the inlined version inside wrapper() is the mega slow one, so the extra instructions aren't applied transitively unfortunately.

Erm, sorry, I should have looked more closely: it's not inlined, and the call seems extremely expensive too, with state pushing and popping going on, apparently to account for the different targets. Brrr, to be avoided at all costs for such tiny functions. :)

July 19

On Monday, 19 July 2021 at 17:20:21 UTC, kinke wrote:

>

You know that asm is to be avoided whenever possible, but unfortunately, AFAIK intel-intrinsics doesn't fit the usual 'don't worry, simply compile all your code with an appropriate -mattr/-mcpu option' recommendation, as it employs runtime detection of available CPU instructions.

intel-intrinsics employs compile-time detection of CPU instructions.
If they are not available, it will work anyway(tm) with alternate, slower paths (and it does indeed need the right -mattr, so this is the one worry you do get).

So I'm not using @target("feature") right now. I figured it would be helpful for runtime dispatch, but that means littering the code with __traits(targetHasFeature).
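For illustration, compile-time selection with LDC's `__traits(targetHasFeature, ...)` could look roughly like the sketch below. The function `addInt4` and its scalar fallback are hypothetical and not taken from intel-intrinsics. They only show the pattern of choosing a code path from the features enabled via -mattr/-mcpu.

```d
import core.simd;

// Hypothetical example (LDC only): the branch is chosen at compile time
// from the target features enabled on the command line.
int4 addInt4(int4 a, int4 b)
{
    static if (__traits(targetHasFeature, "sse2"))
    {
        return a + b; // vector add; SSE2 is known to be available
    }
    else
    {
        // Slower fallback path: add lane by lane.
        int4 r;
        foreach (i; 0 .. 4)
            r.ptr[i] = a.array[i] + b.array[i];
        return r;
    }
}
```

Both branches return the same result, so callers do not need to know which one was compiled in.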
