On 20 June 2012 03:58, Walter Bright <newshound2@digitalmars.com> wrote:
   Do a grep for "asm" across the druntime library sources. Can you justify all
   of that with some other scheme?


I think almost all the blocks I just browsed through could be easily written
with nothing more than the register alias feature I suggested, and perhaps a
couple of opcode intrinsics.

But I see nothing gained by that.

The gain is that by not using IA, the compiler could much better optimise and inline your code. Your code is likely more readable by more people.
Also, since Iain is proposing removing the inline assembler from GDC, it's clearly hard to maintain across different compilers. A higher level language defined construct may be simpler...


And as a bonus, they would also be readable.

I don't agree. The point of IA to me is so I can specify exactly what I want. If I wanted to do it at a higher level, I'd use normal D syntax.

In many cases, you need to write a big block of asm to do one single operation that's not expressible at the higher level... and in my experience, most of the time, that operation is addressing a register directly; most commonly, dealing with the stack pointer or argument registers.


I can imagine cases where the
optimiser would have more freedom too.

But if I'm writing IA, I want to do it my way. Not the optimizer's way, which may or may not be able to give me what I want.

I think you typically want to do one very small detail your way, and let the optimiser make the best of the rest of the function.
The result is very much comparable to the use of intrinsics in high level code.


Yes. C has a register keyword, and nobody uses it anymore. The troubles are many, starting with people always "register"ed the wrong variables, and it really didn't work out too well when compilers started doing live range register assignments. It's ignored by modern C compilers, and hasn't been carried forward into other languages.

You miss the point of the suggestion; it's a mechanism to directly address particular registers in high level code, allowing you to eliminate many small asm blocks. C's failing is unrelated, the goal was totally different.


Really? I've never seen that. What about it was fail?

It's actually in DMC, believe it or not. It was a giant failure because nobody used it. It was in Borland's TurboC, too. It pretty much just throws a wrench into the gears of more sophisticated code generators.

I'm not surprised nobody used it in a niche compiler like DMC, especially when it's not supported by major compilers like GCC or MSC... It's not a feature of C, so most people wouldn't ever consider it, or even realise it's possible.

Of course it throws a wrench in the works, it's a reasonably complex feature, but IA blocks themselves throw an equally large (and rather similar) wrench in the works. The most naive implementation could probably do precisely what IA does, that is, to stop reordering across the IA block.
That should be just as safe when using intrinsics or explicit register aliasing as it is with inline asm. And that's only a start, I think the compiler could do better with time.
The compiler doesn't have much opportunity for improvement with IA, unless the compiler attempts to understand the IA block, which is in a totally different language, and architecture specific. Well defined high-level constructs help the compiler with the understanding it needs to do a good/safe job.
It's the same logic that supports opcode intrinsics, which became almost universally preferred to IA in appropriate situations, and are an undeniable success.
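
To illustrate what an opcode intrinsic looks like in practice, here is a minimal sketch in C using `__builtin_bswap32`, a builtin provided by GCC and Clang. The function names `swap_bytes` and `checksum_be` are made up for this example. On x86 the builtin typically compiles to a single `bswap` instruction, but because the compiler understands its semantics it remains free to inline it, fold constants through it, and schedule the surrounding loop.

```c
#include <stdint.h>

/* Byte-swap through a compiler builtin (GCC/Clang) rather than an asm block.
   The compiler knows exactly what this does, so it can inline and reorder. */
static inline uint32_t swap_bytes(uint32_t x)
{
    return __builtin_bswap32(x);
}

/* Ordinary high-level code mixed freely with the intrinsic: sums an array
   of words after swapping the byte order of each one. */
uint32_t checksum_be(const uint32_t *words, int count)
{
    uint32_t sum = 0;
    for (int i = 0; i < count; ++i)
        sum += swap_bytes(words[i]);
    return sum;
}
```

Register allocation for `sum`, `i`, and the swapped value is left entirely to the compiler, which is the mix of explicit operation and automatic allocation being argued for here.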


   I really don't understand preferring all these rather convoluted
   enhancements to avoid something simple and straightforward like the inline
   assembler. The use of IA in the D runtime library, for example, has been
   quite successful.


I agree, IA is useful and has been successful, but it has drawbacks too.
  * IA ruins optimisation around the IA block

dmd's optimizer is not so sensitive to that.

How can you safely reorder across an IA block? Is there a well defined mechanism to determine it's safe?
GCC has been failing at that forever. It takes a very conservative approach.
I guess the main problem is that GCC doesn't attempt to understand the asm block, it just pastes it in the output.


This one seems trivial, you just need one intrinsic:

  size_t reqsize = size * newcapacity;
  __jc(&Loverflow);

That's highly risky. The optimizer knows nothing at all about the state of the flags register, and does not take into account a dependency on the C flag when doing code motion. Nor would the compiler guarantee that the C flag is even set by however it chose to do the previous multiply (for example, the LEA instruction is often used to do multiplies, which leaves the C flag untouched. Oops!). Nothing connects the __jc intrinsic to that multiply operation.

True, but you could also perform the multiply explicitly with another intrinsic.
This reordering problem is perhaps the most difficult issue, but not necessarily insurmountable. And it's only really relevant where explicit interaction with the flags is involved.
I suspect it wouldn't be too much trouble to make that intrinsic encode some information that fuses it with the preceding operation as written in the source.
Alternatively, use a __noreorder {} scope block or something similar surrounding the mul and jc.
Another possibility might be to make the intrinsic combine both operations as a compound, which also eliminates the need to take the address of a label:

  if(__mul_getc(T a, T b, ref in T res)) goto blah;
There are lots of different approaches, I'm sure an elegant solution is possible.
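
As a rough illustration of the compound form, GCC (5 and later) and Clang provide `__builtin_mul_overflow`, which performs the multiply and reports overflow as a single operation, so there is no separate flag test for the optimiser to move away from the multiply. The sketch below is C, not D, and `mul_overflows` and `required_size` are hypothetical names standing in for the `__mul_getc` idea and the reqsize computation discussed above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the proposed __mul_getc: multiply and report
   overflow in one step. Wraps the GCC/Clang builtin __builtin_mul_overflow. */
static inline bool mul_overflows(size_t a, size_t b, size_t *res)
{
    return __builtin_mul_overflow(a, b, res);
}

/* Returns true and stores size * newcapacity in *reqsize,
   or returns false if the product does not fit in size_t. */
bool required_size(size_t size, size_t newcapacity, size_t *reqsize)
{
    if (mul_overflows(size, newcapacity, reqsize))
        goto Loverflow;
    return true;

Loverflow:
    return false;
}
```

Whether the compiler implements the check with the carry flag, the overflow flag, or a widening multiply is its own business; the source only states the dependency between the multiply and the branch.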


 Although it depends on a '&codeLabel' mechanism to get the label address (GCC
supports this in C, I'd love to see this in D too).

Note that supporting such will wind up disabling a lot of the data flow analysis, which is not set up to handle unknown edges between basic blocks.

No doubt, but it only affects code where that operation appears, which would be rather rare.
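
For readers who haven't seen the GCC mechanism referred to above: it is the "labels as values" extension, spelled `&&label` rather than `&label`, with `goto *ptr` to jump. A minimal sketch in C (GCC or Clang only; the function `run` and its tiny opcode set are invented for this example):

```c
/* GCC/Clang "labels as values": &&label yields a void * that
   "goto *" can jump to. Each byte of ops must be 0 (halt), 1 (increment)
   or 2 (double); the string's terminating NUL serves as the halt opcode. */
int run(const char *ops)
{
    static void *const table[] = { &&op_halt, &&op_inc, &&op_dbl };
    const unsigned char *p = (const unsigned char *)ops;
    int acc = 0;

    goto *table[*p++];
op_inc:
    acc += 1;
    goto *table[*p++];
op_dbl:
    acc *= 2;
    goto *table[*p++];
op_halt:
    return acc;
}
```

Every `goto *` here is an edge to a block the compiler can only know through the table, which is the kind of unknown edge between basic blocks that the data flow analysis concern is about.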


To summarize, I see a lot of complex new features, a significant rewrite of the optimizer, and a rewrite of a lot of existing code, and at the end of all that we're pretty much at the same state we are at now.

I agree, it's not trivial. It was just something to think about.
It's not quite the same place. The examples that have come up here are relatively trivial, so it doesn't add so much to those. It would add an awful lot to larger uses of asm, where it's really nice to be able to mix the explicit pseudo-asm code with regular automatic register assignment and the use of standard control structures (if/for/etc.).