Search
How do I do a 64 bits mulhi in D?
Nov 27, 2022
Nov 27, 2022
Nov 27, 2022
Nov 27, 2022
kinke
Nov 27, 2022
Nov 27, 2022
Nov 27, 2022
Iain Buclaw
Nov 27, 2022
Nov 27, 2022
Nov 27, 2022
Walter Bright

Mulhi is an instruction that is effectively available on any 64 bits CPU that multiply that effectively does the following:

``````ulong mulhi(ulong a, ulong b) {
return (ucent(a) * ucent(b)) >> 64;
}
``````

And this is how you'd implement it in say, C++, using `__int128`.

This operation is absolutely capital to do hashing fast, as it allows to do a ton of shit and adds in one go. You'd ask, isn't that what the good old regular multiplication does too? and no, it isn't, because it never shift right, so high bits can never influence lowers ones.

On Sunday, 27 November 2022 at 01:26:36 UTC, deadalnix wrote:

>

Mulhi is an instruction that is effectively available on any 64 bits CPU that multiply that effectively does the following:

There's also a BSL-licensed copy in DustMite, used for polynomial hashing.

BTW, not that any of us are students new to D, but I think such questions would be better suitable for the D.learn group. E.g., Ali often asks questions about how to achieve non-trivial goals in D there.

On Sunday, 27 November 2022 at 01:53:11 UTC, Vladimir Panteleev wrote:

>

On Sunday, 27 November 2022 at 01:26:36 UTC, deadalnix wrote:

>

Mulhi is an instruction that is effectively available on any 64 bits CPU that multiply that effectively does the following:

There's also a BSL-licensed copy in DustMite, used for polynomial hashing.

BTW, not that any of us are students new to D, but I think such questions would be better suitable for the D.learn group. E.g., Ali often asks questions about how to achieve non-trivial goals in D there.

That's a clever trick, but that only work on x86, but also will blind the optimizer as to what's going on.

On Sunday, 27 November 2022 at 01:55:45 UTC, deadalnix wrote:

>

That's a clever trick, but that only work on x86, but also will blind the optimizer as to what's going on.

A solution with LDC, using inline-(LLVM-)IR:

``````ulong mulhi(ulong a, ulong b)
{
import ldc.llvmasm;
return __ir!(`
%a = zext i64 %0 to i128
%b = zext i64 %1 to i128
%r = mul i128 %a, %b
%r2 = lshr i128 %r, 64
%r3 = trunc i128 %r2 to i64
ret i64 %r3`, ulong)(a, b);
}
``````

On Sunday, 27 November 2022 at 03:26:30 UTC, kinke wrote:

>

On Sunday, 27 November 2022 at 01:55:45 UTC, deadalnix wrote:

>

That's a clever trick, but that only work on x86, but also will blind the optimizer as to what's going on.

A solution with LDC, using inline-(LLVM-)IR:

``````ulong mulhi(ulong a, ulong b)
{
import ldc.llvmasm;
return __ir!(`
%a = zext i64 %0 to i128
%b = zext i64 %1 to i128
%r = mul i128 %a, %b
%r2 = lshr i128 %r, 64
%r3 = trunc i128 %r2 to i64
ret i64 %r3`, ulong)(a, b);
}
``````

I wasn't aware of that feature. Not very portable, but way more portable than the previous solution. Still compiler specific, but at least not plateform specific anymore.

On Sunday, 27 November 2022 at 14:22:28 UTC, deadalnix wrote:

>

I wasn't aware of that feature. Not very portable, but way more portable than the previous solution. Still compiler specific, but at least not plateform specific anymore.

There is still a problem, but this is getting better (assuming one uses LDC):

https://github.com/llvm/llvm-project/issues/59217

On Sunday, 27 November 2022 at 01:55:45 UTC, deadalnix wrote:

>

That's a clever trick, but that only work on x86, but also will blind the optimizer as to what's going on.

For DMD, yes. For GDC and LDC, this uses the syntax which declares input and output registers, so it's inlinable and should compile to one (inlined) instruction.

On Sunday, 27 November 2022 at 03:26:30 UTC, kinke wrote:

>

On Sunday, 27 November 2022 at 01:55:45 UTC, deadalnix wrote:

>

That's a clever trick, but that only work on x86, but also will blind the optimizer as to what's going on.

A solution with LDC, using inline-(LLVM-)IR:

For completeness sake, the X86 backend has built-in functions with GDC (LDC might have the same too).

``````import gcc.builtins;

ulong mulhi(ulong a, ulong b)
{
alias __m64 = __vector(short[4]);
__m64 res = __builtin_ia32_pmulhw(*cast(__m64*)&a, *cast(__m64*)&b);
return *cast(ulong*)&res;
}
__EOF__
_D3vec5mulhiFmmZm:
.cfi_startproc
movq	%rdi, %xmm0
movq	%rsi, %xmm1
pmulhw	%xmm1, %xmm0
movq	%xmm0, %rax
ret
.cfi_endproc
``````

On Sunday, 27 November 2022 at 19:20:16 UTC, Iain Buclaw wrote:

>

For completeness sake, the X86 backend has built-in functions with GDC (LDC might have the same too).

This is great to have (and I wish we had more standardized intrinsics across compilers) but I think this is doable without MMX.

```On 11/26/2022 5:53 PM, Vladimir Panteleev wrote:
> On Sunday, 27 November 2022 at 01:26:36 UTC, deadalnix wrote:
>> Mulhi is an instruction that is effectively available on any 64 bits CPU that multiply that effectively does the following:
>