I asked Grok to write me a portable 64x64->128 multiplication function in D to get the pattern, and this is what it wrote:

// Multiplies two 64-bit unsigned integers, returning a 128-bit result as (low, high)
(ulong, ulong) mul64x64to128(ulong x, ulong y) {
    // Split into 32-bit halves
    uint x_l = cast(uint)x;
    uint x_h = cast(uint)(x >> 32);
    uint y_l = cast(uint)y;
    uint y_h = cast(uint)(y >> 32);
   
    // Compute partial products
    ulong p_ll = cast(ulong)x_l * y_l;        // Bits 0-63
    ulong p_lh = cast(ulong)x_l * y_h;        // Bits 32-95
    ulong p_hl = cast(ulong)x_h * y_l;        // Bits 32-95
    ulong p_hh = cast(ulong)x_h * y_h;        // Bits 64-127
   
    ulong low = p_ll;                         // Lower 64 bits
    uint p_lh_low = cast(uint)p_lh;
    uint p_hl_low = cast(uint)p_hl;
    uint p_ll_high = cast(uint)(p_ll >> 32);
   
    // Compute carry from middle terms
    ulong sum_mid = cast(ulong)p_lh_low + p_hl_low + p_ll_high;
    ulong carry = sum_mid >> 32;
   
    // Upper 64 bits
    ulong high = p_hh + (p_lh >> 32) + (p_hl >> 32) + carry;
   
    return (low, high);
}


What I loved is that its function returned a pair of ulongs, because apparently it assumed that we had in-language tuples! I wonder where it got that idea?
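
For comparison, here is a rough sketch of how the same routine might look in D as it stands today, assuming `std.typecons.Tuple` as the stand-in for in-language tuples. The field names `low` and `high` are my own choice, not something from the generated code. The sketch also builds the low word from the middle column, because taking `p_ll` alone as the low 64 bits drops the low halves of the two cross terms (for example, 2^32 * 1 would come back with a low word of 0).

```d
import std.typecons : Tuple;

/// Multiplies two 64-bit unsigned integers and returns the 128-bit
/// product as a library tuple with (hypothetical) fields low and high.
Tuple!(ulong, "low", ulong, "high") mul64x64to128(ulong x, ulong y)
{
    // Split into 32-bit halves, kept in ulong so each product is 64-bit.
    ulong x_l = x & 0xFFFF_FFFF;
    ulong x_h = x >> 32;
    ulong y_l = y & 0xFFFF_FFFF;
    ulong y_h = y >> 32;

    // Partial products.
    ulong p_ll = x_l * y_l;   // weight 2^0
    ulong p_lh = x_l * y_h;   // weight 2^32
    ulong p_hl = x_h * y_l;   // weight 2^32
    ulong p_hh = x_h * y_h;   // weight 2^64

    // Column for bits 32-63: low halves of the cross terms plus the
    // high half of p_ll. At most 3 * (2^32 - 1), so it cannot overflow.
    ulong sum_mid = (p_lh & 0xFFFF_FFFF) + (p_hl & 0xFFFF_FFFF) + (p_ll >> 32);

    // Low word: low 32 bits of p_ll, with the middle column on top.
    ulong low = (sum_mid << 32) | (p_ll & 0xFFFF_FFFF);

    // High word: p_hh plus the high halves of the cross terms plus the
    // carry out of the middle column.
    ulong high = p_hh + (p_lh >> 32) + (p_hl >> 32) + (sum_mid >> 32);

    return typeof(return)(low, high);
}

unittest
{
    auto r = mul64x64to128(1UL << 32, 1);
    assert(r.low == (1UL << 32) && r.high == 0);
}
```

With in-language tuples the return type and the return statement would presumably shrink to something close to what Grok wrote; until then the caller reads `r.low` and `r.high`, or indexes the tuple as `r[0]` and `r[1]`.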

We should learn from our AI overlords! :P