On 20 June 2012 13:59, Don Clugston <dac@nospam.com> wrote:
> You and I seem to be from different planets. I have almost never written an asm function which was suitable for inlining.
>
> Take a look at std.internal.math.biguintX86.d
>
> I do not know how to write that code without inline asm.

Interesting.
I wish I could paste some counter-examples, but they're all proprietary >_<

I think the key detail here is your point that they _always_ include a loop. Is this because it's hard to coax the compiler into the correct interaction with the flags register?
I'd be interested to compare the compiled D code with your hand-written asm, to see exactly where the optimiser goes wrong. At a brief glance it doesn't look like you're exploiting too many tricks; it's just nice, tight hand-written code, which the optimiser should theoretically be able to get right...
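
To make the flags question concrete, here is a rough sketch of the kind of loop I have in mind. It is NOT taken from std.internal.math.biguintX86; `addWithCarry` is a hypothetical name I made up for illustration, and I'm assuming 32-bit limbs stored least significant first. In plain D the carry has to travel between iterations as ordinary data, whereas hand-written asm can presumably leave it in the carry flag and chain ADC instructions:

```d
// Hypothetical helper (not from Phobos): dest[] = a[] + b[] + carry.
// Limbs are 32 bits wide, least significant limb first.
// Returns the carry out of the most significant limb (0 or 1).
uint addWithCarry(uint[] dest, const(uint)[] a, const(uint)[] b, uint carry = 0)
{
    assert(dest.length == a.length && a.length == b.length);
    foreach (i; 0 .. a.length)
    {
        // Widen to 64 bits so the carry out of bit 31 lands in bit 32.
        // The flags register is not visible from D, so the carry is
        // extracted with a shift instead of being left in the carry flag.
        ulong sum = cast(ulong) a[i] + b[i] + carry;
        dest[i] = cast(uint) sum;        // low 32 bits of the sum
        carry = cast(uint)(sum >> 32);   // 0 or 1
    }
    return carry;
}

unittest
{
    uint[2] r;
    uint[2] a = [0xFFFF_FFFF, 0x0000_0001];
    uint[2] b = [0x0000_0001, 0x0000_0000];
    // The low limb overflows and the carry ripples into the high limb.
    uint c = addWithCarry(r[], a[], b[]);
    assert(r[0] == 0 && r[1] == 2 && c == 0);
}
```

Whether a given optimiser turns the widen-and-shift pattern back into an ADC chain is exactly the thing I'd like to see in the side-by-side disassembly.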

I find optimisers are very good at code simplification, provided you massage the code/expressions to neatly match any architectural quirks.
I also appreciate that x86 is possibly the hardest architecture for an optimiser to generate good code for...