I asked Grok to write me a portable 64x64->128 multiplication function in D to see the general pattern, and this is what it wrote:

```d
// Multiplies two 64-bit unsigned integers, returning a 128-bit result as (low, high)
(ulong, ulong) mul64x64to128(ulong x, ulong y) {
    // Split into 32-bit halves
    uint x_l = cast(uint)x;
    uint x_h = cast(uint)(x >> 32);
    uint y_l = cast(uint)y;
    uint y_h = cast(uint)(y >> 32);

    // Compute partial products
    ulong p_ll = cast(ulong)x_l * y_l; // Bits 0-63
    ulong p_lh = cast(ulong)x_l * y_h; // Bits 32-95
    ulong p_hl = cast(ulong)x_h * y_l; // Bits 32-95
    ulong p_hh = cast(ulong)x_h * y_h; // Bits 64-127

    ulong low = p_ll; // Lower 64 bits
    uint p_lh_low = cast(uint)p_lh;
    uint p_hl_low = cast(uint)p_hl;
    uint p_ll_high = cast(uint)(p_ll >> 32);

    // Compute carry from middle terms
    ulong sum_mid = cast(ulong)p_lh_low + p_hl_low + p_ll_high;
    ulong carry = sum_mid >> 32;

    // Upper 64 bits
    ulong high = p_hh + (p_lh >> 32) + (p_hl >> 32) + carry;

    return (low, high);
}
```
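For comparison, here is a minimal sketch of what a compiling version might look like in current D. It differs from the quoted code in two ways. First, `low = p_ll` leaves out the middle partial products: with `x = 1UL << 32` and `y = 1` the quoted code returns `low == 0` instead of `1UL << 32`, so here the low word is rebuilt from the middle column. Second, because `(ulong, ulong)` is not a valid D return type, `std.typecons.Tuple` stands in for it. The field names `low` and `high` are my own choice, not anything standard.

```d
import std.typecons : Tuple;

/// Full 64x64 -> 128-bit unsigned multiply, returned as (low, high).
/// std.typecons.Tuple stands in for the `(ulong, ulong)` syntax D doesn't have.
Tuple!(ulong, "low", ulong, "high") mul64x64to128(ulong x, ulong y)
{
    // Split each operand into 32-bit halves (kept in ulong to avoid casts later).
    immutable ulong xl = x & 0xFFFF_FFFF, xh = x >> 32;
    immutable ulong yl = y & 0xFFFF_FFFF, yh = y >> 32;

    // Four 32x32 -> 64 partial products; none of them can overflow a ulong.
    immutable ulong ll = xl * yl; // weight 2^0
    immutable ulong lh = xl * yh; // weight 2^32
    immutable ulong hl = xh * yl; // weight 2^32
    immutable ulong hh = xh * yh; // weight 2^64

    // Middle column (weight 2^32): at most 3 * (2^32 - 1), so it fits easily.
    immutable ulong mid = (ll >> 32) + (lh & 0xFFFF_FFFF) + (hl & 0xFFFF_FFFF);

    // Low word: bits 0-31 come from ll, bits 32-63 from the middle column.
    immutable ulong low = (mid << 32) | (ll & 0xFFFF_FFFF);
    // High word: upper halves of the cross terms, hh, and the carry out of mid.
    immutable ulong high = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);

    return typeof(return)(low, high);
}

unittest
{
    // 2^32 * 1: the whole result comes from the middle column.
    auto r = mul64x64to128(1UL << 32, 1);
    assert(r.low == 1UL << 32 && r.high == 0);

    // (2^64 - 1)^2 = 2^128 - 2^65 + 1, so high = 2^64 - 2 and low = 1.
    r = mul64x64to128(ulong.max, ulong.max);
    assert(r.low == 1 && r.high == ulong.max - 1);
}
```

Building with `dmd -unittest -main` should run the checks in the `unittest` block.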

What I loved is that its function returned a pair of `ulong`s, because it apparently assumed we had in-language tuples! I wonder where it got that idea?

We should learn from our AI overlords! :P