Thread overview
How do I do a 64 bits mulhi in D?
Nov 27, 2022
deadalnix
Nov 27, 2022
Vladimir Panteleev
Nov 27, 2022
deadalnix
Nov 27, 2022
kinke
Nov 27, 2022
deadalnix
Nov 27, 2022
deadalnix
Nov 27, 2022
Iain Buclaw
Nov 27, 2022
Vladimir Panteleev
Nov 27, 2022
Vladimir Panteleev
Nov 27, 2022
Walter Bright
November 27, 2022

Mulhi is an instruction that is effectively available on any 64 bits CPU that multiply that effectively does the following:

ulong mulhi(ulong a, ulong b) {
    return (ucent(a) * ucent(b)) >> 64;
}

And this is how you'd implement it in say, C++, using __int128.

This operation is absolutely capital to do hashing fast, as it allows to do a ton of shit and adds in one go. You'd ask, isn't that what the good old regular multiplication does too? and no, it isn't, because it never shift right, so high bits can never influence lowers ones.

November 27, 2022

On Sunday, 27 November 2022 at 01:26:36 UTC, deadalnix wrote:

>

Mulhi is an instruction that is effectively available on any 64 bits CPU that multiply that effectively does the following:

For x86 I made this: https://github.com/cybershadow/ae/blob/master/utils/math/longmul.d

There's also a BSL-licensed copy in DustMite, used for polynomial hashing.

BTW, not that any of us are students new to D, but I think such questions would be better suitable for the D.learn group. E.g., Ali often asks questions about how to achieve non-trivial goals in D there.

November 27, 2022

On Sunday, 27 November 2022 at 01:53:11 UTC, Vladimir Panteleev wrote:

>

On Sunday, 27 November 2022 at 01:26:36 UTC, deadalnix wrote:

>

Mulhi is an instruction that is effectively available on any 64 bits CPU that multiply that effectively does the following:

For x86 I made this: https://github.com/cybershadow/ae/blob/master/utils/math/longmul.d

There's also a BSL-licensed copy in DustMite, used for polynomial hashing.

BTW, not that any of us are students new to D, but I think such questions would be better suitable for the D.learn group. E.g., Ali often asks questions about how to achieve non-trivial goals in D there.

That's a clever trick, but that only work on x86, but also will blind the optimizer as to what's going on.

November 27, 2022

On Sunday, 27 November 2022 at 01:55:45 UTC, deadalnix wrote:

>

That's a clever trick, but that only work on x86, but also will blind the optimizer as to what's going on.

A solution with LDC, using inline-(LLVM-)IR:

ulong mulhi(ulong a, ulong b)
{
    import ldc.llvmasm;
    return __ir!(`
        %a = zext i64 %0 to i128
        %b = zext i64 %1 to i128
        %r = mul i128 %a, %b
        %r2 = lshr i128 %r, 64
        %r3 = trunc i128 %r2 to i64
        ret i64 %r3`, ulong)(a, b);
}
November 27, 2022

On Sunday, 27 November 2022 at 03:26:30 UTC, kinke wrote:

>

On Sunday, 27 November 2022 at 01:55:45 UTC, deadalnix wrote:

>

That's a clever trick, but that only work on x86, but also will blind the optimizer as to what's going on.

A solution with LDC, using inline-(LLVM-)IR:

ulong mulhi(ulong a, ulong b)
{
    import ldc.llvmasm;
    return __ir!(`
        %a = zext i64 %0 to i128
        %b = zext i64 %1 to i128
        %r = mul i128 %a, %b
        %r2 = lshr i128 %r, 64
        %r3 = trunc i128 %r2 to i64
        ret i64 %r3`, ulong)(a, b);
}

I wasn't aware of that feature. Not very portable, but way more portable than the previous solution. Still compiler specific, but at least not plateform specific anymore.

November 27, 2022

On Sunday, 27 November 2022 at 14:22:28 UTC, deadalnix wrote:

>

I wasn't aware of that feature. Not very portable, but way more portable than the previous solution. Still compiler specific, but at least not plateform specific anymore.

There is still a problem, but this is getting better (assuming one uses LDC):

https://github.com/llvm/llvm-project/issues/59217

November 27, 2022

On Sunday, 27 November 2022 at 01:55:45 UTC, deadalnix wrote:

>

That's a clever trick, but that only work on x86, but also will blind the optimizer as to what's going on.

For DMD, yes. For GDC and LDC, this uses the syntax which declares input and output registers, so it's inlinable and should compile to one (inlined) instruction.

November 27, 2022

On Sunday, 27 November 2022 at 03:26:30 UTC, kinke wrote:

>

On Sunday, 27 November 2022 at 01:55:45 UTC, deadalnix wrote:

>

That's a clever trick, but that only work on x86, but also will blind the optimizer as to what's going on.

A solution with LDC, using inline-(LLVM-)IR:

For completeness sake, the X86 backend has built-in functions with GDC (LDC might have the same too).

import gcc.builtins;

ulong mulhi(ulong a, ulong b)
{
    alias __m64 = __vector(short[4]);
    __m64 res = __builtin_ia32_pmulhw(*cast(__m64*)&a, *cast(__m64*)&b);
    return *cast(ulong*)&res;
}
__EOF__
_D3vec5mulhiFmmZm:
	.cfi_startproc
	movq	%rdi, %xmm0
	movq	%rsi, %xmm1
	pmulhw	%xmm1, %xmm0
	movq	%xmm0, %rax
	ret
	.cfi_endproc
November 27, 2022

On Sunday, 27 November 2022 at 19:20:16 UTC, Iain Buclaw wrote:

>

For completeness sake, the X86 backend has built-in functions with GDC (LDC might have the same too).

This is great to have (and I wish we had more standardized intrinsics across compilers) but I think this is doable without MMX.

November 27, 2022
On 11/26/2022 5:53 PM, Vladimir Panteleev wrote:
> On Sunday, 27 November 2022 at 01:26:36 UTC, deadalnix wrote:
>> Mulhi is an instruction that is effectively available on any 64 bits CPU that multiply that effectively does the following:
> 
> For x86 I made this: https://github.com/cybershadow/ae/blob/master/utils/math/longmul.d


Should work that into:

https://github.com/dlang/dmd/blob/master/druntime/src/core/int128.d#L407