June 20, 2012
On 20 June 2012 11:32, Tobias Pankrath <tobias@pankrath.net> wrote:

>> Inline assembly has been relatively useless in GCC for years. Inline asm
>> interferes with the optimiser's ability to do a good job, which basically
>> makes use of inline assembly self-defeating.
>> The only time I ever need to use inline-asm is to interface an arch
>> feature
>> that has no API. As long as there are intrinsics for all the opcodes one
>> might want, then it's better to use them.
>>
>
>> That said, as stated above, if use of this stuff is for performance, then
>> using an inline-asm block will ruin the surrounding code anyway,
>>
>
> Could someone explain to me, why inline asm screws up the optimizer? My naive view on the matter is, that the optimizer has full knowledge of what is going on regardless of whether intrinsics or asm is used. I could also think of an optimizer that optimizes inline asm, too. For example by reassigning registers etc.
>

It's because the compiler doesn't understand assembly code. It has no
knowledge of what it actually does, and as a result, just treats it as a
black box.
Since it has no idea what it does, and doesn't know how it may or may not
relate to the surrounding code, the compiler conservatively preserves the
order of operations on either side of the asm block for safety.
Worse, the asm block may write to memory, which potentially invalidates any
values currently held in registers. Most compilers will force a store
and reload of non-local variables on either side of the asm block.
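A minimal sketch of that store/reload effect, assuming GCC or Clang extended asm syntax. The global `counter` and both function names are made up for illustration. The asm statement is empty, so it emits no instructions, yet the "memory" clobber is enough to stop the compiler reusing a value of `counter` it already has in a register:

```c
#include <stdint.h>

/* Hypothetical global, used only for this illustration. */
uint32_t counter = 21;

/* No asm: the compiler is free to load `counter` once and reuse it. */
uint32_t read_twice_plain(void)
{
    uint32_t a = counter;
    uint32_t b = counter;
    return a + b;
}

/* Empty asm with a "memory" clobber: the compiler must assume memory
   may have changed, so `counter` is loaded again after the asm. */
uint32_t read_twice_barrier(void)
{
    uint32_t a = counter;
    __asm__ __volatile__("" ::: "memory");
    uint32_t b = counter;
    return a + b;
}
```

Both functions return the same value; the difference only shows up in the generated code, where the second version would be expected to keep two separate loads.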

This is the main reason opcode intrinsics became popular rather than using the inline assembler (IA), particularly for things like maths/simd/etc, where use of asm is typically for optimisation. You can't use SSE code within an IA block as an optimisation if your use of IA itself causes optimisation to fail in the surrounding code. Usage of IA blocks in most cases of that type will result in slower code.
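For comparison, here is roughly what the intrinsics route looks like for a simple SSE case. This is only a sketch: `add_floats` is a made-up name, and it assumes a compiler that ships `<xmmintrin.h>` and defines `__SSE__` when SSE is enabled (a plain scalar loop is used otherwise). Because `_mm_add_ps` and friends are visible to the optimiser, it can still schedule, inline and allocate registers around them:

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Element-wise add: out[i] = a[i] + b[i] for i in [0, n). */
void add_floats(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    /* Four floats per iteration, using unaligned loads and stores. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar tail, and the whole loop when SSE is not available. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```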


June 20, 2012
On 19/06/12 20:19, Iain Buclaw wrote:
> Hi,
>
> Had round one of the code review process, so I'm going to post the main
> issues here that most affect D users / the platforms they want to run on
> / the compiler version they want to use.
>
>
>
> 1) D Inline Asm and naked function support is raising far too many alarm
> bells. So would just be easier to remove it and avoid all the other
> comments on why we need middle-end and backend headers in gdc.

You seem to be conflating a couple of unrelated issues here.
One is the calling convention. The other is inline asm.

Comments in the thread about "asm is mostly used for short things which get inlined" leave me completely baffled, as it is completely wrong.

There are two uses for asm, and they are very different:
(1) Functionality. This happens when there are gaps in the language, and you get an abstraction inversion. You can address these with intrinsics.
(2) Speed. High-speed, all-asm functions. These _always_ include a loop.


You seem to be focusing on (1), but case (2) is completely different.

Case (2) cannot be replaced with intrinsics. For example, you can't write asm code using MSVC intrinsics (because the compiler rewrites your code).
Currently, D is the best way to write (2). It is much, much better than an external assembler.
June 20, 2012
On 20/06/12 03:01, Alex Rønne Petersen wrote:
> On 20-06-2012 02:58, Timon Gehr wrote:
>> On 06/20/2012 02:04 AM, Alex Rønne Petersen wrote:
>>> On 20-06-2012 01:55, Timon Gehr wrote:
>>>> On 06/20/2012 12:47 AM, Alex Rønne Petersen wrote:
>>>>> On 19-06-2012 23:52, Walter Bright wrote:
>>>>>> On 6/19/2012 1:36 PM, bearophile wrote:
>>>>>>>> No, but the idea was to allow D to innovate on calling
>>>>>>>> conventions without disturbing code that needed to
>>>>>>>> interface with C.
>>>>>>>
>>>>>>> The idea is nice, but ideas aren't enough. Where are the benchmarks
>>>>>>> that show a
>>>>>>> performance improvement over the C calling convention? And even if
>>>>>>> such
>>>>>>> improvement is present, is it worth it in the face of people that
>>>>>>> don't want to
>>>>>>> add it to GCC?
>>>>>>
>>>>>> GDC can certainly define its D calling convention to match GCC's.
>>>>>> It's
>>>>>> an "implementation defined" thing, not a language defined one.
>>>>>>
>>>>>
>>>>> Then let's please rename it to the DMD ABI instead of calling it the D
>>>>> ABI and making it look like it's part of the language on the website.
>>>>> Further, D mangling rules should be separate from calling convention.
>>>>>
>>>>
>>>> IIRC currently, the calling convention is mangled into the symbol name.
>>>> Do you want to remove this?
>>>
>>> Not that I can see from http://dlang.org/abi.html ?
>>>
>>
>> TypeFunction:
>> CallConvention FuncAttrs Arguments ArgClose Type
>>
>> CallConvention:
>> F // D
>> U // C
>> W // Windows
>> V // Pascal
>> R // C++
>>
>
> I see. I think it's a mistake to call that calling convention "D". I'm
> not against removing it, but the description is highly misleading.

And "C++ calling convention" doesn't make any sense. There is no such thing. On Windows, every vendor does it differently (even the ones who claim to be compatible with one another!).
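For reference, the CallConvention letters quoted above amount to a simple lookup. The sketch below only restates that table; `call_convention_name` is a hypothetical helper, not part of any D tool:

```c
#include <stddef.h>

/* Maps the CallConvention letter from the quoted mangling grammar to
   the name given there. Returns NULL for any other letter. */
const char *call_convention_name(char code)
{
    switch (code) {
    case 'F': return "D";
    case 'U': return "C";
    case 'W': return "Windows";
    case 'V': return "Pascal";
    case 'R': return "C++";
    default:  return NULL;
    }
}
```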
June 20, 2012
On 20/06/12 00:55, Manu wrote:
> On 20 June 2012 01:07, Walter Bright <newshound2@digitalmars.com
> <mailto:newshound2@digitalmars.com>> wrote:
>
>     On 6/19/2012 1:58 PM, Manu wrote:
>
>         I find a thorough suite of architecture intrinsics are usually
>         the fastest and
>         cleanest way to the best possible code, although 'naked' may be
>         handy in this
>         circumstance too...
>
>
>     Do a grep for "naked" across the druntime library sources. For
>     example, its use in druntime/src/rt/alloca.d, where it is very much
>     needed, as alloca() is one of those "magic" functions.
>
>
> I never argued against naked... I agree it's mandatory.
>
>
>     Do a grep for "asm" across the druntime library sources. Can you
>     justify all of that with some other scheme?
>
>
> I think almost all the blocks I just browsed through could be easily
> written with nothing more than the register alias feature I suggested,
> and perhaps a couple of opcode intrinsics.
> And as a bonus, they would also be readable. I can imagine cases where
> the optimiser would have more freedom too.
>
>
>         Thinking more about the implications of removing the inline asm,
>         what would
>         REALLY roxors, would be a keyword to insist a variable is
>         represented by a
>         register, and by extension, to associate it with a specific
>         register:
>
>
>     This was a failure in C.
>
>
> Really? This is the missing link between mandatory asm blocks, and being
> able to do it in high level code with intrinsics.
> The 'register' keyword was similarly fail as 'inline'.. __forceinline
> was not fail, it is actually mandatory. I'd argue that __forceregister
> would be similarly useful in C aswell, but the real power would come
> from being able to specify the particular register to alias.
>
>         This would almost entirely eliminate the usefulness of an inline
>         assembler.
>         Better yet, this could use the 'new' attribute syntax, which
>         most agree will
>         support arguments:
>         @register(rsp) int x;
>
>
>     Some C compilers did have such pseudo-register abilities. It was a
>     failure in practice.
>
>
> Really? I've never seen that. What about it was fail?
>
>     I really don't understand preferring all these rather convoluted
>     enhancements to avoid something simple and straightforward like the
>     inline assembler. The use of IA in the D runtime library, for
>     example, has been quite successful.
>
>
> I agree, IA is useful and has been successful, but it has drawbacks too.
>    * IA ruins optimisation around the IA block
>    * IA doesn't inline well. intrinsics allow much greater opportunity
> for efficient integration into the calling context
>    * most IA functions are small, and prime candidates for inlining (see
> points 1 and 2)

You and I seem to be from different planets. I have almost never written an asm function which was suitable for inlining.

Take a look at std.internal.math.biguintX86.d

I do not know how to write that code without inline asm.
June 20, 2012
On 20 June 2012 13:51, Don Clugston <dac@nospam.com> wrote:

> On 19/06/12 20:19, Iain Buclaw wrote:
>
>> Hi,
>>
>> Had round one of the code review process, so I'm going to post the main issues here that most affect D users / the platforms they want to run on / the compiler version they want to use.
>>
>>
>>
>> 1) D Inline Asm and naked function support is raising far too many alarm bells. So would just be easier to remove it and avoid all the other comments on why we need middle-end and backend headers in gdc.
>>
>
> You seem to be conflating a couple of unrelated issues here. One is the calling convention. The other is inline asm.
>
> Comments in the thread about "asm is mostly used for short things which get inlined" leave me completely baffled, as it is completely wrong.
>
> There are two uses for asm, and they are very different:
> (1) Functionality. This happens when there are gaps in the language, and
> you get an abstraction inversion. You can address these with intrinsics.
> (2) Speed. High-speed, all-asm functions. These _always_ include a loop.
>
>
> You seem to be focusing on (1), but case (2) is completely different.
>
> Case (2) cannot be replaced with intrinsics. For example, you can't write
> asm code using MSVC intrinsics (because the compiler rewrites your code).
> Currently, D is the best way to write (2). It is much, much better than an
> external assembler.
>

Case 1 has no alternative to inline asm. I've thrown out some crazy ideas to think about (but nobody seems to like them). I still think it could be addressed though.

Case 2; I'm not convinced. Such long functions are the type I'm
generally interested in as well, and have the most experience with. But in
my experience, they're almost always best written with intrinsics.
If they're small enough to be inlined, then you can't afford not to use
intrinsics. If they are truly big functions, then you begin to sacrifice
readability and maintainability, and certainly limit the number of
programmers that can maintain the code.
I rarely fail to produce identical code with intrinsics to that which I
would write with hand written asm. The flags are always the biggest
challenge, as discussed prior in this thread. I think that could be
addressed with better intrinsics.
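To make the flags point concrete, here is a sketch of a multi-word add in plain C. It is an illustration only, not taken from std.internal.math.biguintX86, and `add_limbs` is a made-up name. A hand-written x86 loop would keep the carry in the flags register and use `adc`; in C the carry has to be rebuilt from comparisons, and it is then up to the optimiser to turn that back into something as tight:

```c
#include <stddef.h>
#include <stdint.h>

/* dest[i] = a[i] + b[i] + carry, for n 32-bit limbs, least significant
   limb first. Returns the final carry (0 or 1). */
uint32_t add_limbs(uint32_t *dest, const uint32_t *a,
                   const uint32_t *b, size_t n)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; ++i) {
        uint32_t s  = a[i] + carry;  /* may wrap when carry is 1 */
        uint32_t c1 = s < carry;     /* 1 if that addition wrapped */
        uint32_t r  = s + b[i];      /* may wrap as well */
        uint32_t c2 = r < s;         /* 1 if that addition wrapped */
        dest[i] = r;
        carry = c1 | c2;             /* at most one of them can be set */
    }
    return carry;
}
```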


June 20, 2012
> It's because the compiler doesn't understand assembly code. It has no
> knowledge of what it actually does, and as a result, just treats it as a
> black box.

But this is not set in stone. If I can teach a compiler how to optimize intrinsics, can't I teach it to understand and optimize a (maybe small) subset of assembler, too? This must happen in the backend anyway, since intrinsics are platform-dependent, no?
June 20, 2012
On 20 June 2012 13:59, Don Clugston <dac@nospam.com> wrote:

> You and I seem to be from different planets. I have almost never written an asm function which was suitable for inlining.
>
> Take a look at std.internal.math.biguintX86.d
>
> I do not know how to write that code without inline asm.
>

Interesting.
I wish I could paste some counter-examples, but they're all proprietary >_<

I think the key detail here is where you stated that they _always_ include a
loop. Is this because it's hard to manipulate the compiler into the correct
interaction with the flags register?
I'd be interested to compare the compiled D code, and your hand written asm
code, to see where exactly the optimiser goes wrong. It doesn't look like
you're exploiting too many tricks (at a brief glance), it's just nice tight
hand written code, which the optimiser should theoretically be able to get
right...

I find optimisers are very good at code simplification, assuming that you
massage the code/expressions to neatly match any architectural quirks.
I also appreciate that x86 is possibly the hardest architecture for an
optimiser to generate good code for...


June 20, 2012
On Wednesday, 20 June 2012 at 02:35:10 UTC, Walter Bright wrote:
> On 6/19/2012 6:06 PM, Alex Rønne Petersen wrote:
>> On 20-06-2012 03:01, Walter Bright wrote:
>>> On 6/19/2012 3:47 PM, Alex Rønne Petersen wrote:
>>>> On 19-06-2012 23:52, Walter Bright wrote:
>>>>> GDC can certainly define its D calling convention to match GCC's. It's
>>>>> an "implementation defined" thing, not a language defined one.
>>>> Then let's please rename it to the DMD ABI instead of calling it the D
>>>> ABI
>>>> and
>>>> making it look like it's part of the language on the website.
>>>
>>> The ABI is not part of the language. For example, the C Standard says
>>> nothing whatsoever about the C ABI.
>>
>> Then it's very misleading that it's under the language reference area of the
>> website and calls it the "D ABI" and not the "DMD ABI". This might have been
>> fine back when there was only DMD, but it really needs to be made clear that
>> this is not an ABI that compilers are required to follow.
>
> You're probably right.

He's definitely right. To have the mangling rules on the same page as the ABI and then act confused when people think it's part of the language? I was sputtering with rage. Sputtering!


June 20, 2012
On 2012-06-20 12:51, Don Clugston wrote:

> You seem to be conflating a couple of unrelated issues here.
> One is the calling convention. The other is inline asm.
>
> Comments in the thread about "asm is mostly used for short things which
> get inlined" leave me completely baffled, as it is completely wrong.
>
> There are two uses for asm, and they are very different:
> (1) Functionality. This happens when there are gaps in the language, and
> you get an abstraction inversion. You can address these with intrinsics.
> (2) Speed. High-speed, all-asm functions. These _always_ include a loop.
>
>
> You seem to be focusing on (1), but case (2) is completely different.
>
> Case (2) cannot be replaced with intrinsics. For example, you can't
> write asm code using MSVC intrinsics (because the compiler rewrites your
> code).
> Currently, D is the best way to write (2). It is much, much better than
> an external assembler.

You do understand that the GCC-style inline assembly will still be available?

-- 
/Jacob Carlborg
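
For anyone who has not used it, GCC-style (extended) inline assembly looks roughly like the sketch below, written in C. `add_u32` is a made-up example; the asm path is guarded so that it is only used with GCC-compatible compilers on x86, and plain C is used elsewhere. The operand constraints are what tell the compiler which registers the block reads and writes, so it is less of a black box than a raw asm block:

```c
#include <stdint.h>

/* Returns a + b. On x86 with a GCC-compatible compiler this goes
   through an extended asm statement; elsewhere it is plain C. */
uint32_t add_u32(uint32_t a, uint32_t b)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__("addl %1, %0"   /* AT&T syntax: %0 = %0 + %1 */
            : "+r"(a)       /* %0: read and written, any register */
            : "r"(b)        /* %1: read only, any register */
            : "cc");        /* the flags register is modified */
    return a;
#else
    return a + b;
#endif
}
```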
June 20, 2012
Jacob Carlborg:

> You do understand that the GCC-style inline assembly will still be available?

Are DMD and LDC2 going to accept that GCC-style inline assembly?

Bye,
bearophile