Emulate 64-bit mulh instruction - D Programming Language Discussion Forum



March 13, 2019

Emulate 64-bit mulh instruction

Posted by Kagamin

Apparently this has no intrinsic, so I wrote this code for x86 to compute the 128-bit product:

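ulong[2] mul(ulong a, ulong b)
{
    import ldc.intrinsics;
    // Split each operand into 32-bit halves: a = a2*2^32 + a1, b = b2*2^32 + b1.
    ulong a1 = cast(uint) a, a2 = a >> 32;
    ulong b1 = cast(uint) b, b2 = b >> 32;
    // Partial products; each comment gives the bit offset + width.
    ulong c1 = a1 * b1; // 0+64
    ulong c2 = a1 * b2; // 32+64
    ulong c3 = a2 * b1; // 32+64
    ulong c4 = a2 * b2; // 64+64
    // Low word: add the low halves of the cross terms, folding each carry into c4.
    auto d1o = llvm_uadd_with_overflow(c1, c2 << 32);
    ulong d1 = d1o.result;
    c4 += d1o.overflow;
    auto d2o = llvm_uadd_with_overflow(d1, c3 << 32);
    ulong d2 = d2o.result;
    c4 += d2o.overflow;
    //ulong d1=c1+(c2<<32);
    //ulong d2=d1+(c3<<32);
    // High word: add the high halves of the cross terms.
    ulong d3 = c4 + (c2 >> 32);
    ulong d4 = d3 + (c3 >> 32);
    return [d4, d2]; // [high 64 bits, low 64 bits]
}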

but the compiler doesn't recognize it as a multiplication and doesn't generate a single imul instruction. Is the code wrong, or is the compiler unable to recognize it?
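On the "is the code wrong" half of the question, the arithmetic can be checked without any LDC intrinsics. The sketch below restates the same 32-bit-limb decomposition in portable D and compares a few products against values worked out by hand. The name `mulPortable` and the explicit carry tests are assumptions made for this illustration, not part of the code above.

```d
// Portable restatement of the 32-bit-limb decomposition.
// `mulPortable` is a hypothetical name used only for this sketch.
// Returns [high, low], the same order as the code in the post above.
ulong[2] mulPortable(ulong a, ulong b)
{
    ulong a1 = cast(uint) a, a2 = a >> 32;
    ulong b1 = cast(uint) b, b2 = b >> 32;
    ulong c1 = a1 * b1; // weight 2^0
    ulong c2 = a1 * b2; // weight 2^32
    ulong c3 = a2 * b1; // weight 2^32
    ulong c4 = a2 * b2; // weight 2^64

    // An unsigned sum that wraps around is smaller than its first operand,
    // which is how each carry out of the low word is detected here.
    ulong d1 = c1 + (c2 << 32);
    ulong carry1 = d1 < c1 ? 1 : 0;
    ulong d2 = d1 + (c3 << 32);
    ulong carry2 = d2 < d1 ? 1 : 0;

    ulong hi = c4 + (c2 >> 32) + (c3 >> 32) + carry1 + carry2;
    return [hi, d2];
}

unittest
{
    // (2^64 - 1)^2 = 2^128 - 2^65 + 1
    assert(mulPortable(ulong.max, ulong.max) == [0xFFFF_FFFF_FFFF_FFFEUL, 1UL]);
    // 2^32 * 2^32 = 2^64
    assert(mulPortable(1UL << 32, 1UL << 32) == [1UL, 0UL]);
    // (2^64 - 1) * 2 = 2^65 - 2
    assert(mulPortable(ulong.max, 2) == [1UL, 0xFFFF_FFFF_FFFF_FFFEUL]);
}
```

Compiling with `-unittest` runs the checks. If they pass for the intrinsic-based version as well, the decomposition itself is sound and the remaining question is only whether the optimizer recognizes the pattern.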

March 13, 2019

Re: Emulate 64-bit mulh instruction

Posted by Johan Engelen
in reply to Kagamin

On Wednesday, 13 March 2019 at 16:06:34 UTC, Kagamin wrote:
> Apparently this has no intrinsic, so I wrote this code for x86 to compute the 128-bit product:
> ...
> but the compiler doesn't recognize it as a multiplication and doesn't generate a single imul instruction. Is the code wrong, or is the compiler unable to recognize it?

I think the compiler can't recognize it, judging from other posts online.

-Johan

March 13, 2019

Re: Emulate 64-bit mulh instruction

Posted by lithium iodate
in reply to Kagamin

On Wednesday, 13 March 2019 at 16:06:34 UTC, Kagamin wrote:
> Apparently this has no intrinsic, so I wrote this code for x86 to compute the 128-bit product:

I cannot help you with your code directly, but I can propose an alternative:

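import ldc.intrinsics;
// Declare a template through which LDC accepts inline LLVM IR.
pragma(LDC_inline_ir)
    R inlineIR(string s, R, P...)(P);

ulong[2] mul(ulong a, ulong b)
{
    ulong[2] result;
    // %0, %1 and %2 are the arguments a, b and &result.
    // Widen both operands to i128, multiply, and store the product into result.
    inlineIR!(`
    %a = zext i64 %0 to i128
    %b = zext i64 %1 to i128
    %c = mul i128 %a, %b
    %d = bitcast [2 x i64]* %2 to i128*
    store i128 %c, i128* %d
    ret void`, void)(a, b, &result);
    return result; // on little-endian x86: [low 64 bits, high 64 bits]
}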

This is optimized down to a single mul instruction.
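If only the high half is needed, as the thread title suggests, the same approach can return it directly as a value instead of storing through a pointer. This is a minimal sketch that assumes the same `inlineIR` declaration as above; the function name `mulhi` is hypothetical, and the IR uses only `zext`, `mul`, `lshr` and `trunc`.

```d
// Same LDC inline-IR template declaration as in the post above.
pragma(LDC_inline_ir)
    R inlineIR(string s, R, P...)(P);

// Hypothetical helper: high 64 bits of the unsigned 128-bit product a * b.
ulong mulhi(ulong a, ulong b)
{
    // %0 and %1 are the arguments a and b.
    return inlineIR!(`
    %a = zext i64 %0 to i128
    %b = zext i64 %1 to i128
    %c = mul i128 %a, %b
    %h = lshr i128 %c, 64
    %r = trunc i128 %h to i64
    ret i64 %r`, ulong)(a, b);
}

unittest
{
    // (2^64 - 1)^2 = 2^128 - 2^65 + 1, so the high word is 2^64 - 2.
    assert(mulhi(ulong.max, ulong.max) == 0xFFFF_FFFF_FFFF_FFFEUL);
    // 2^32 * 2^32 = 2^64, so the high word is 1.
    assert(mulhi(1UL << 32, 1UL << 32) == 1);
}
```

A signed variant would presumably use `sext` and `ashr` in place of `zext` and `lshr`. Whether this compiles to a single instruction depends on the LDC version and optimization level, so it is worth checking the generated assembly.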
