On 20 June 2012 11:32, Tobias Pankrath <tobias@pankrath.net> wrote:

Inline assembly has been relatively useless in GCC for years. Inline asm
interferes with the optimiser's ability to do a good job, which basically
makes use of inline assembly self-defeating.
The only time I ever need to use inline-asm is to interface with an arch feature
that has no API. As long as there are intrinsics for all the opcodes one
might want, then it's better to use them.

That said, as stated above, if use of this stuff is for performance, then
using an inline-asm block will ruin the surrounding code anyway.

Could someone explain to me why inline asm screws up the optimizer? My naive view on the matter is that the optimizer has full knowledge of what is going on, regardless of whether intrinsics or asm are used. I could also imagine an optimizer that optimizes inline asm too, for example by reassigning registers.

It's because the compiler doesn't understand assembly code. It has no knowledge of what it actually does, and as a result, just treats it as a black box.

Since it has no idea what it does, and doesn't know how it may or may not relate to the surrounding code, the compiler conservatively preserves the order of operations on either side of the asm block for safety.

Worse, the asm block may write to memory, which potentially invalidates any values currently resident in registers. Most compilers will force a store and reload of non-local variables on either side of the asm block.
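To make the store/reload point concrete, here is a minimal sketch in C, assuming GCC or Clang extended asm syntax. The names `counter`, `bump_twice_plain` and `bump_twice_with_asm` are made up for illustration. The asm block is empty but declares a "memory" clobber, which is roughly how an opaque block looks to the optimiser: it might touch any memory.

```c
/* Non-local variable: an opaque asm block could, as far as the
   compiler knows, read or write it. */
int counter = 0;

/* No asm: with optimisation on, the compiler is free to keep
   `counter` in a register and fold the two increments into one add. */
int bump_twice_plain(void) {
    counter += 1;
    counter += 1;
    return counter;
}

/* Empty asm with a "memory" clobber: the compiler must assume memory
   may have changed, so it has to store `counter` before the block and
   reload it afterwards. The two increments can no longer be merged. */
int bump_twice_with_asm(void) {
    counter += 1;
    __asm__ volatile("" ::: "memory");
    counter += 1;
    return counter;
}
```

Both functions compute the same result; the difference only shows up in the generated code. Comparing the output of `gcc -O2 -S` for the two should show the extra store and load around the asm block, though the exact instructions depend on compiler and target.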

This is the main reason opcode intrinsics became popular in place of inline asm (IA), particularly for things like maths and SIMD, where asm is typically used for optimisation. There is no point using SSE code within an IA block as an optimisation if the use of IA itself causes optimisation to fail in the surrounding code. In most cases of that type, IA blocks will result in slower code.
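For comparison, a rough sketch of the intrinsic route in C, assuming the SSE intrinsics from `<xmmintrin.h>` on an x86 target. The function name `add4` is hypothetical, and the scalar fallback is only there so the sketch still builds where SSE is unavailable. Because each intrinsic is an operation the compiler understands, it can allocate registers, schedule, and inline this like any other code.

```c
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Adds four floats from a and b into out. */
void add4(const float *a, const float *b, float *out) {
#if defined(__SSE__)
    /* Unaligned loads, one packed add, one unaligned store. The
       compiler picks the registers and may reorder or inline freely. */
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
#else
    /* Assumed fallback for targets without SSE. */
    for (int i = 0; i < 4; ++i) {
        out[i] = a[i] + b[i];
    }
#endif
}
```

Writing the same `addps` in an IA block would work, but the block would be opaque to the optimiser in the way described above.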