Using YMM registers causes an undefined label error
March 05, 2021
XMM registers work, but as soon as they are changed to YMM, DMD outputs "bad type/size of operands %s" and LDC outputs a "label YMM0 is undefined" error. Are they not supported?
To illustrate: https://run.dlang.io/is/IqDHlK

By the way, how can I use instructions that are not listed in [1] (vfmaddxxxps, for example)? And how are function parameters accessed if they are not on the stack? (Looking at my own code in a debugger, I see that the majority of pointer parameters are already in registers rather than on the stack.)
I need those so that I can write a better answer for [2].

Big thanks
[1] https://dlang.org/spec/iasm.html#supported_opcodes
[2] https://forum.dlang.org/thread/qyybpvwvbfkhlvulvuxa@forum.dlang.org
March 05, 2021
On Friday, 5 March 2021 at 12:57:43 UTC, z wrote:
> XMM registers work, but as soon as they are changed to YMM, DMD outputs "bad type/size of operands %s" and LDC outputs a "label YMM0 is undefined" error. Are they not supported?
> To illustrate: https://run.dlang.io/is/IqDHlK
>
> By the way, how can I use instructions that are not listed in [1] (vfmaddxxxps, for example)? And how are function parameters accessed if they are not on the stack? (Looking at my own code in a debugger, I see that the majority of pointer parameters are already in registers rather than on the stack.)
> I need those so that I can write a better answer for [2].
>
> Big thanks
> [1] https://dlang.org/spec/iasm.html#supported_opcodes
> [2] https://forum.dlang.org/thread/qyybpvwvbfkhlvulvuxa@forum.dlang.org

First of all, in the 64-bit ABIs the first few parameters are passed in registers, not on the stack, so a[RBP] is nonsense.

void complement32(simdbytes* a, simdbytes* b)

a is in RCX, b is in RDX on Windows
a is in RDI, b is in RSI on Linux

Secondly, there is no such thing as movaps YMMX, [RAX]; the AVX form is vmovaps YMM3, [RAX].
The same goes for vxorps, except that it takes 3 operands, not 2.






March 05, 2021
On Friday, 5 March 2021 at 16:10:02 UTC, Rumbu wrote:
> First of all, in the 64-bit ABIs the first few parameters are passed in registers, not on the stack, so a[RBP] is nonsense.
>
> void complement32(simdbytes* a, simdbytes* b)
>
> a is in RCX, b is in RDX on Windows
> a is in RDI, b is in RSI on Linux
I'm confused. With your help I've been able to find the function calling convention, but in LDC-generated code I sometimes see the layout being reversed. (The function I was looking at takes 7 arguments, all pointers. The first argument is on the stack; the seventh and last is in RCX.) The offsets don't seem to make sense either (first argument at ss:[rsp+38], second at ss:[rsp+30], and third at ss:[rsp+28]).

> Secondly, there is no such thing as movaps YMMX, [RAX]; the AVX form is vmovaps YMM3, [RAX].
> The same goes for vxorps, except that it takes 3 operands, not 2.
You're absolutely right, but apparently it only accepts the two-operand version from SSE.
Other AVX/AVX2/AVX-512 instructions with the «v» prefix aren't recognized either ("Error: unknown opcode vmovaps"). Is AVX(2) with YMM registers supported in «asm{}» statements?


March 06, 2021
On Friday, 5 March 2021 at 21:47:49 UTC, z wrote:
> On Friday, 5 March 2021 at 16:10:02 UTC, Rumbu wrote:
>> First of all, in the 64-bit ABIs the first few parameters are passed in registers, not on the stack, so a[RBP] is nonsense.
>>
>> void complement32(simdbytes* a, simdbytes* b)
>>
>> a is in RCX, b is in RDX on Windows
>> a is in RDI, b is in RSI on Linux
> I'm confused. With your help I've been able to find the function calling convention, but in LDC-generated code I sometimes see the layout being reversed. (The function I was looking at takes 7 arguments, all pointers. The first argument is on the stack; the seventh and last is in RCX.) The offsets don't seem to make sense either (first argument at ss:[rsp+38], second at ss:[rsp+30], and third at ss:[rsp+28]).
>
>> Secondly, there is no such thing as movaps YMMX, [RAX]; the AVX form is vmovaps YMM3, [RAX].
>> The same goes for vxorps, except that it takes 3 operands, not 2.
> You're absolutely right, but apparently it only accepts the two-operand version from SSE.
> Other AVX/AVX2/AVX-512 instructions with the «v» prefix aren't recognized either ("Error: unknown opcode vmovaps"). Is AVX(2) with YMM registers supported in «asm{}» statements?


I just ran some tests, and it seems that D has invented its own calling convention. And it's not documented. If you decorate your function with extern(C), it should respect the x86-64 ABI conventions. This is what I got for a 7-parameter function. The two compilers seem to do the same thing:

param no.  extern(C)  extern(D)
1          RCX        RSP + 56
2          RDX        RSP + 48
3          R8         RSP + 40
4          R9         R9
5          RSP + 40   R8
6          RSP + 48   RDX
7          RSP + 56   RCX

I would stick to extern(C). The extern(D) convention seems completely illogical: the first 3 parameters are pushed on the stack from left to right, but if there are fewer than 4, they are passed in registers. WTF.

Note: tested on Windows. On Linux, both conventions will probably use the registers of the Linux (System V) ABI and will not reserve 32 bytes on the stack.

On the other hand, it seems that LDC is one step behind DMD because, as you said, it doesn't support AVX2 instructions operating on YMM registers.


March 06, 2021
On Saturday, 6 March 2021 at 10:45:08 UTC, Rumbu wrote:
> On Friday, 5 March 2021 at 21:47:49 UTC, z wrote:
>> [...]
>
>
> I just ran some tests, and it seems that D has invented its own calling convention. And it's not documented. If you decorate your function with extern(C), it should respect the x86-64 ABI conventions. This is what I got for a 7-parameter function. The two compilers seem to do the same thing:
>
> [...]

What... Is this really how it's supposed to be? Makes no sense to not use any of the existing conventions.
March 06, 2021
On Friday, 5 March 2021 at 12:57:43 UTC, z wrote:
> XMM registers work, but as soon as they are changed to YMM, DMD outputs "bad type/size of operands %s" and LDC outputs a "label YMM0 is undefined" error. Are they not supported?
> To illustrate: https://run.dlang.io/is/IqDHlK

LDC's support for DMD-style inline asm is limited; GDC-style inline asm is the preferred way (e.g., not restricted to x86[_64] and no need to worry about calling convention details).

Your example can be reduced to the trivial:

import core.simd;
ubyte32 complement32(ubyte32 a, ubyte32 b)
{
    return a ^ b;
}

which yields the following asm with `ldc2 -mattr=avx -O` (see https://d.godbolt.org/z/ex7YE7):

_D7example12complement32FNhG32hQgZQj:
        vxorps  ymm0, ymm1, ymm0
        ret
March 06, 2021
On Saturday, 6 March 2021 at 11:57:13 UTC, Imperatorn wrote:

> What... Is this really how it's supposed to be? Makes no sense to not use any of the existing conventions.

extern(C) and extern(D) are both documented to be the same as the platform's C calling convention everywhere except x86 Windows:

https://dlang.org/spec/abi.html#function_calling_conventions

There have been times when differences were noted (I recall a particularly bad one related to passing structs by value on 64-bit Linux) and there may be more. When they are found, they should be reported in Bugzilla.
March 06, 2021
On Saturday, 6 March 2021 at 12:15:43 UTC, Mike Parker wrote:
> On Saturday, 6 March 2021 at 11:57:13 UTC, Imperatorn wrote:
>
>> What... Is this really how it's supposed to be? Makes no sense to not use any of the existing conventions.
>
> extern(C) and extern(D) are both documented to be the same as the platform's C calling convention everywhere except x86 Windows:
>
> https://dlang.org/spec/abi.html#function_calling_conventions
>
> There have been times when differences were noted (I recall a particularly bad one related to passing structs by value on 64-bit Linux) and there may be more. When they are found, they should be reported in Bugzilla.

The main difference is that the params are reversed for extern(D), at least with DMD and LDC, not with GDC. And that can't be easily changed because of all the naked DMD-style inline asm code (GDC doesn't support that, so no problem for GDC). This comes up regularly here in this forum whenever people experiment with DMD-style asm.

There are other slight breakages of that 'spec', e.g., LDC's extern(D) ABI is very similar to Microsoft's __vectorcall (so that e.g. vectors are passed in registers).
March 06, 2021
On Saturday, 6 March 2021 at 12:29:07 UTC, kinke wrote:
> There are other slight breakages of that 'spec', e.g., LDC's extern(D) ABI is very similar to Microsoft's __vectorcall (so that e.g. vectors are passed in registers).

[Windows only, to prevent any more confusion.]
March 06, 2021
On Saturday, 6 March 2021 at 12:15:43 UTC, Mike Parker wrote:
> On Saturday, 6 March 2021 at 11:57:13 UTC, Imperatorn wrote:
>
>> What... Is this really how it's supposed to be? Makes no sense to not use any of the existing conventions.
>
> extern(C) and extern(D) are both documented to be the same as the platform's C calling convention everywhere except x86 Windows:
>
> https://dlang.org/spec/abi.html#function_calling_conventions
>
> There have been times when differences were noted (I recall a particularly bad one related to passing structs by value on 64-bit Linux) and there may be more. When they are found, they should be reported in Bugzilla.

Where exactly is the extern(D) x86-64 calling convention documented? Because currently it seems like a mess according to the disassembly. The first X parameters go on the stack from left to right, the last 4 in registers. But wait, if you have fewer than 4 parameters, they are passed in registers. Again, WTF?