June 20, 2012
On 20 June 2012 17:15, Don Clugston <dac@nospam.com> wrote:

> On 20/06/12 13:22, Manu wrote:
>
>> I find optimisers are very good at code simplification, assuming that
>> you massage the code/expressions to neatly match any architectural quirks.
>> I also appreciate that x86 is possibly the hardest architecture for an optimiser to get right...
>>
>
> Optimizers improved enormously during the '80s and '90s, but the rate of improvement seems to have slowed.
>
> With x86, out-of-order execution has made it very easy to get reasonably good code, and much harder to achieve perfection. Still, Core i7 is much easier than Core2, since Intel removed one of the most complicated bottlenecks (on Core2 and earlier, there is a maximum of three reads per cycle of registers that haven't been written to in the previous three cycles).
>

Yeah okay, I can easily imagine the complexity for an x86 codegen. RISC architectures are so much more predictable.

How do you define 'perfection'? Performance as measured on what particular machine? :)


June 20, 2012
On 20/06/12 14:51, Manu wrote:
> On 20 June 2012 14:44, Don Clugston <dac@nospam.com> wrote:
>
>     On 20/06/12 13:04, Manu wrote:
>
>         On 20 June 2012 13:51, Don Clugston <dac@nospam.com> wrote:
>
>             On 19/06/12 20:19, Iain Buclaw wrote:
>
>                 Hi,
>
>                 Had round one of the code review process, so I'm going to
>                 post the main issues here that most affect D users / the
>                 platforms they want to run on / the compiler version they
>                 want to use.
>
>
>
>                 1) D Inline Asm and naked function support is raising far
>                 too many alarm bells. So it would just be easier to remove
>                 it and avoid all the other comments on why we need
>                 middle-end and backend headers in gdc.
>
>
>             You seem to be conflating a couple of unrelated issues here.
>             One is the calling convention. The other is inline asm.
>
>             Comments in the thread about "asm is mostly used for short
>             things which get inlined" leave me completely baffled, as it
>             is completely wrong.
>
>             There are two uses for asm, and they are very different:
>             (1) Functionality. This happens when there are gaps in the
>             language, and you get an abstraction inversion. You can address
>             these with intrinsics.
>             (2) Speed. High-speed, all-asm functions. These _always_
>             include a loop.
>
>
>             You seem to be focusing on (1), but case (2) is completely
>             different.
>
>             Case (2) cannot be replaced with intrinsics. For example, you
>             can't write asm code using MSVC intrinsics (because the compiler
>             rewrites your code).
>             Currently, D is the best way to write (2). It is much, much
>             better than an external assembler.
>
>
>         Case 1 has no alternative to inline asm. I've thrown out some
>         crazy ideas to think about (but nobody seems to like them). I still
>         think it could be addressed, though.
>
>         Case 2: I'm not convinced. Such long functions are the type I'm
>         generally interested in as well, and the type I have the most
>         experience with. But in my experience, they're almost always best
>         written with intrinsics. If they're small enough to be inlined,
>         then you can't afford not to use intrinsics. If they are truly big
>         functions, then you begin to sacrifice readability and
>         maintainability, and you certainly limit the number of programmers
>         who can maintain the code.
>
>
>     I don't agree with that. In the situations I'm used to, using
>     intrinsics would not make it easier to read, and would definitely
>     not make it easier to maintain. I find it inconceivable that
>     somebody could understand the processor well enough to maintain the
>     code, and yet not understand asm.
>
>
> These functions of yours are 100% asm, that's not really what I would
> usually call 'inline asm'. That's really just 'asm' :)
> I think you've just illustrated one of my key points actually; that is
> that you can't just insert small inline asm blocks within regular code,
> the optimiser can't deal with it in most cases, so inevitably, the
> entire function becomes asm from start to end.

Personally I call it "inline asm" if I don't need to use a separate assembler. If you're using a different definition, then we don't actually disagree.

>
> I find I can typically produce equivalent code using carefully crafted
> intrinsics within regular C language structures. Also, often enough, the
> code outside the hot loop can be written in normal C for readability,
> since it barely affects performance, and trivial setup code will usually
> optimise perfectly anyway.
>
> You're correct that a person 'maintaining' such code, who doesn't have
> such a thorough understanding of the codegen, may ruin its perfectly
> tuned efficiency. That may be the case, but in a commercial coding
> environment, where a build MUST be delivered yesterday, the guy who
> understands it is on holiday, and you need to tweak the behaviour
> immediately, this is a much safer position to be in.
> This is a very real scenario. I can't afford to ignore this practical
> reality.

OK, it sounds like your use case is a bit different. The kinds of things I deal with are

> I might have a go at compiling the regular D code tonight, and seeing if
> I can produce identical assembly. I haven't tried this so much with x86
> as I have with RISC architectures, which have much more predictable codegen.
>
>
>         I rarely fail to produce code with intrinsics identical to what I
>         would write in hand-written asm. The flags are always the biggest
>         challenge, as discussed earlier in this thread. I think that could
>         be addressed with better intrinsics.
>
>
>     Again, look at std.internal.math.BiguintX86. There are many cases
>     there where you can swap two instructions, and the code will still
>     produce the correct result, but it will be 30% slower.
>
>
> But that's precisely the sort of thing optimisers/schedulers are best
> at. Can you point at a particular example where that is the case, that
> the scheduler would get it wrong if left to its own ordering algorithm?
> The opcode tables should have thorough information about the opcode
> timings and latencies.

I don't know. I can just tell you that they don't get it right. I suspect they don't take all of the bottlenecks into account.

For x86, I think the primary difficulty is that you cannot do it in independent passes. E.g., you won't find a register contention bottleneck until you've assigned registers, and the only way to get rid of it is to change the instructions you're using, which involves backtracking through several passes. Very messy.

> The only thing that I find usually trips it up is
> not having knowledge of the probability of the data being in nearby
> cache. If it has 2 loads, and one is less likely to be in cache, it
> should be scheduled earlier.

Yes, that's definitely true.

>
> As a side question, x86 architectures perform wildly differently from
> each other. How do you reliably say some block of hand written x86 code
> is the best possible code on all available processors?
> Do you just benchmark on a suite of common processors available at the
> time? I can imagine the opcode timing tables, which are presumably
> rather different for every cpu, could easily feed wrong data to the
> codegen...

Yes. You can fairly easily determine a theoretical limit for a piece of code, and if you've reached that, you're optimal.

It's not possible to be simultaneously optimal on Pentium 4 and something else, but my experience is that code optimized for PPro-series Intel machines is usually near-optimal on AMD. (The reverse is not true; it's much easier to be optimal on AMD.)


>     I think that the SIMD case gives you a misleading impression,
>     because on x86 they are very easy to schedule (they nearly all take
>     the same number of cycles, etc). So it's not hard for the compiler
>     to do a good job of it.
>
>
> True, but it's one of the most common usage scenarios, so it can't be
> ignored. Some other case studies I feel close to are hardware emulation,
> software rasterisation, particles, fluid dynamics, rigid body dynamics,
> FFTs, and audio signal processing. In each, I rarely need inline asm;
> usually only when there is a hole in the high-level language, as you
> said earlier. I find this typically surfaces when needing to interact
> with the flags registers directly.

I agree with that. I think the need for asm in those cases could be greatly reduced. I'm just saying that there are cases where eliminating asm is not realistic.
June 20, 2012
On 20 June 2012 15:07, Joseph Rushton Wakeling <joseph.wakeling@webdrake.net> wrote:
> On 20/06/12 14:31, Iain Buclaw wrote:
>>
>> Hands are tied, sorry.
>
>
> Is this planned as a short-term change for which a long-term solution will be developed, or is it likely to be a permanent split with DMD?
>

Likely a permanent move away from having a good portion of the frontend be one big special case for i386.  I don't see it as a huge problem, though.  However, one or two people in IRC have asked whether the GDC Extended Assembler could be renamed to __gcc_asm or __asm, to make it a special/reserved feature of GDC rather than competing with the D spec's namespace.


-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
June 20, 2012
On Tue, Jun 19, 2012 at 12:19 PM, Iain Buclaw <ibuclaw@ubuntu.com> wrote:

> Hi,
>
> Had round one of the code review process, so I'm going to post the main issues here that most affect D users / the platforms they want to run on / the compiler version they want to use.
>
>
>
> 1) D Inline Asm and naked function support is raising far too many alarm bells. So it would just be easier to remove it and avoid all the other comments on why we need middle-end and backend headers in gdc.
>
>
> 2) Code with #if V1 and V2 raised another bell, with the request to replace all code that relies on internal macros with proper if() conditions. If something is always going to be turned off, remove it.
>
> So, we shall also be saying bye bye D1 in GDC.  We'll miss you!
>
>
> 3) For anyone who has submitted patches for MinGW and Apple - sorry, but I'm going to have to yank out or alter certain bits.  Apple GCC is irrelevant now, and some MinGW checks look for if(target) when they should really be checking if(host), and vice versa!
>
>
> I imagine most discussion will be on the decision to remove D inline assembler support from gdc.  So, naysayers, do your worst, but unfortunately there is a +1 here for removal.
>
>
> Regards
> Iain
>

I'm very much outside of my area of understanding but would it be possible to use CTFE+mixin to generate GCC asm from DMD style asm allowing people to still use a single version of the asm for both DMD and GDC?

Regards,
Brad Anderson


June 20, 2012
On Wednesday, June 20, 2012 13:33:53 Jacob Carlborg wrote:
> You do understand that the GCC-style inline assembly will still be available?

But inline assembler with the syntax that dmd uses is supposed to be part of the language. So, if gdc doesn't support it, it's not a fully compliant D compiler. It would be like if gdc didn't do

auto a = expression;

but instead did

expression = a auto;

except that the problem is more localized, because inline assembly is rather rare (unlike variable declarations). So, this is a _huge_ deal.

- Jonathan M Davis
June 20, 2012
On 20 June 2012 17:00, Brad Anderson <eco@gnuk.net> wrote:
> I'm very much outside of my area of understanding but would it be possible to use CTFE+mixin to generate GCC asm from DMD style asm allowing people to still use a single version of the asm for both DMD and GDC?
>
> Regards,
> Brad Anderson

Hmm... doable, yes, but it would require a construct as complex as the implementation in the compiler.  GCC extended assembler is much more expressive than the D inline assembler, and requires you to describe everything a given asm statement is doing: inputs, outputs, clobbers, and any labels it may jump to.  My only worry is that CTFE is not powerful enough to process a long set of instructions fast enough to make it beneficial.
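To make Brad's CTFE+mixin idea concrete, here is a toy sketch. The functions `intelToAtt`, `asciiLower` and `trimSpaces` are hypothetical (not part of DMD, GDC or Phobos); they rewrite a single register-to-register Intel-syntax instruction into 32-bit AT&T syntax entirely at compile time. Only the `op DST, SRC` shape is handled: no memory operands, immediates, labels or 64-bit registers.

```d
import std.stdio : writeln;

// Hypothetical helper: ASCII-only lower-casing with a plain loop, so it is
// trivially evaluable at compile time (CTFE).
string asciiLower(string s)
{
    string result;
    foreach (c; s)
    {
        char lower = c;
        if (lower >= 'A' && lower <= 'Z')
            lower += 'a' - 'A';
        result ~= lower;
    }
    return result;
}

// Hypothetical helper: strip leading and trailing spaces.
string trimSpaces(string s)
{
    size_t b = 0, e = s.length;
    while (b < e && s[b] == ' ') ++b;
    while (e > b && s[e - 1] == ' ') --e;
    return s[b .. e];
}

// Toy translator: "op DST, SRC" (Intel) becomes "opl %src, %dst" (AT&T):
// operands swapped, registers prefixed with '%', 'l' size suffix appended.
string intelToAtt(string insn)
{
    size_t sp = 0;
    while (sp < insn.length && insn[sp] != ' ') ++sp;
    size_t comma = sp;
    while (comma < insn.length && insn[comma] != ',') ++comma;
    assert(comma < insn.length, "expected \"op DST, SRC\"");

    string op  = asciiLower(insn[0 .. sp]);
    string dst = asciiLower(trimSpaces(insn[sp + 1 .. comma]));
    string src = asciiLower(trimSpaces(insn[comma + 1 .. $]));
    return op ~ "l %" ~ src ~ ", %" ~ dst;
}

// An enum initialiser forces compile-time evaluation: `att` is an ordinary
// string constant that could later be spliced into an asm statement via mixin.
enum att = intelToAtt("mov EAX, EBX");
static assert(att == "movl %ebx, %eax");

void main()
{
    writeln(att); // movl %ebx, %eax
}
```

The part shown is the easy part: reordering operands and renaming registers. To actually emit a usable GDC asm statement through `mixin`, each instruction's output, input and clobber lists (here, that `%eax` is overwritten) would also have to be derived, which is the heavier analysis described above.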

-- 
Iain Buclaw

June 20, 2012
On 20/06/2012 18:18, Iain Buclaw wrote:
> On 20 June 2012 17:00, Brad Anderson<eco@gnuk.net>  wrote:
>> I'm very much outside of my area of understanding but would it be possible
>> to use CTFE+mixin to generate GCC asm from DMD style asm allowing people to
>> still use a single version of the asm for both DMD and GDC?
>>
>> Regards,
>> Brad Anderson
>
> Hmm... doable, yes, but it would require a construct as complex as the
> implementation in the compiler.  GCC extended assembler is much more
> expressive than the D inline assembler, and requires you to describe
> everything a given asm statement is doing: inputs, outputs, clobbers,
> and any labels it may jump to.  My only worry is that CTFE is not
> powerful enough to process a long set of instructions fast enough to
> make it beneficial.
>

Can't the gdc frontend translate D asm into GCC asm and go from there?
June 20, 2012
On 20 June 2012 17:08, Jonathan M Davis <jmdavisProg@gmx.com> wrote:
> On Wednesday, June 20, 2012 13:33:53 Jacob Carlborg wrote:
>> You do understand that the GCC-style inline assembly will still be available?
>
> But inline assembler with the syntax that dmd uses is supposed to be part of the language. So, if gdc doesn't support it, it's not a fully compliant D compiler. It would be like if gdc didn't do
>
> auto a = expression;
>
> but instead did
>
> expression = a auto;
>
> except that the problem is more localized, because inline assembly is rather rare (unlike variable declarations). So, this is a _huge_ deal.
>

1) DMD is capable of parsing both D Inline and GCC Extended assembler
without throwing errors in the lexer/parser.
2) GDC defines GNU_InlineAsm, and does *not* define D_InlineAsm,
D_InlineAsm_X86, or D_InlineAsm_X86_64.

Not a huge deal if you follow standard coding practices: putting inline asm inside version (D_InlineAsm_X86) / version (D_InlineAsm_X86_64) blocks, etc.
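A minimal sketch of that practice, assuming an x86_64 target: the same operation written once for compilers that predefine `D_InlineAsm_X86_64` (such as DMD), once for compilers that predefine `GNU_InlineAsm` (GDC), with a plain D fallback for everything else. The function `addInts` is hypothetical, and the GDC branch uses the GCC extended-asm form `"template" : outputs : inputs : clobbers`, whose details may vary between GDC releases.

```d
import std.stdio : writeln;

// Hypothetical example: every branch computes a + b; which branch is
// compiled depends on the version identifiers the compiler predefines.
int addInts(int a, int b)
{
    version (D_InlineAsm_X86_64)
    {
        // D inline assembler (Intel syntax). The compiler works out for
        // itself which registers the block touches.
        int r;
        asm
        {
            mov EAX, a;
            add EAX, b;
            mov r, EAX;
        }
        return r;
    }
    else version (GNU_InlineAsm)
    {
        version (X86_64)
        {
            // GCC extended assembler (AT&T syntax). The output ("+r": read
            // and written), the input ("r") and the clobbered condition
            // flags ("cc") must all be spelled out explicitly.
            int r = a;
            asm { "addl %1, %0" : "+r" (r) : "r" (b) : "cc"; }
            return r;
        }
        else
        {
            return a + b; // GNU_InlineAsm on a non-x86_64 target: plain D
        }
    }
    else
    {
        return a + b; // any other compiler/target: plain D fallback
    }
}

void main()
{
    writeln(addInts(2, 3)); // 5
}
```

Since GDC does not define `D_InlineAsm_X86_64`, it never compiles the DMD-style block, and the module still builds on every compiler; the fallback branch also keeps the code usable on targets where neither assembler applies.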

-- 
Iain Buclaw

June 20, 2012
On 20-06-2012 18:08, Jonathan M Davis wrote:
> On Wednesday, June 20, 2012 13:33:53 Jacob Carlborg wrote:
>> You do understand that the GCC-style inline assembly will still be
>> available?
>
> But inline assembler with the syntax that dmd uses is supposed to be part of
> the language. So, if gdc doesn't support it, it's not a fully compliant D
> compiler. It would be like if gdc didn't do
>
> auto a = expression;
>
> but instead did
>
> expression = a auto;
>
> except that the problem is more localized, because inline assembly is rather
> rare (unlike variable declarations). So, this is a _huge_ deal.
>
> - Jonathan M Davis

In practice, no, it isn't. Do you really think all C/C++ compilers are truly standard-compliant in every single aspect of the standard, for instance? And besides, how many of D's users actually write inline assembly in the first place?

In reality, I don't think removing inline assembly support from GDC is going to be as problematic as you make it sound, especially when GDC does provide its own syntax based on the very well-established GCC syntax. And I think the comparison you offer is very exaggerated.

Besides, the D spec has always been incredibly x86-centric, something I've been screaming about for a long time now (see my rants on shared). Making it less x86-centric is a *good* thing IMHO. Implementing a D compiler shouldn't require implementing an inline assembler for x86. It just doesn't make any sense, as much as it is neat to have a standard inline assembler.

Actually, why would we even have the inline assembly version identifiers if compilers weren't allowed to omit inline assembly syntax?

And let's not forget interpreters, JITs, ...

-- 
Alex Rønne Petersen
alex@lycus.org
http://lycus.org
June 20, 2012
On 20 June 2012 17:23, deadalnix <deadalnix@gmail.com> wrote:
> On 20/06/2012 18:18, Iain Buclaw wrote:
>>
>> On 20 June 2012 17:00, Brad Anderson<eco@gnuk.net>  wrote:
>>> I'm very much outside of my area of understanding but would it be
>>> possible to use CTFE+mixin to generate GCC asm from DMD style asm
>>> allowing people to still use a single version of the asm for both DMD
>>> and GDC?
>>>
>>> Regards,
>>> Brad Anderson
>>
>>
>> Hmm... doable, yes, but it would require a construct as complex as the implementation in the compiler.  GCC extended assembler is much more expressive than the D inline assembler, and requires you to describe everything a given asm statement is doing: inputs, outputs, clobbers, and any labels it may jump to.  My only worry is that CTFE is not powerful enough to process a long set of instructions fast enough to make it beneficial.
>>
>
> Can't the gdc frontend translate D asm into GCC asm and go from there?

That's what we did, but we require a lot of information (e.g. about the function frame pointer) that is not available to the frontend when trying to re-create exactly what the assembly code requires us to do.

-- 
Iain Buclaw
