June 20, 2012
On 6/19/2012 6:06 PM, Alex Rønne Petersen wrote:
> On 20-06-2012 03:01, Walter Bright wrote:
>> On 6/19/2012 3:47 PM, Alex Rønne Petersen wrote:
>>> On 19-06-2012 23:52, Walter Bright wrote:
>>>> GDC can certainly define its D calling convention to match GCC's. It's
>>>> an "implementation defined" thing, not a language defined one.
>>> Then let's please rename it to the DMD ABI instead of calling it the D
>>> ABI and making it look like it's part of the language on the website.
>>
>> The ABI is not part of the language. For example, the C Standard says
>> nothing whatsoever about the C ABI.
>
> Then it's very misleading that it's under the language reference area of the
> website and calls it the "D ABI" and not the "DMD ABI". This might have been
> fine back when there was only DMD, but it really needs to be made clear that
> this is not an ABI that compilers are required to follow.

You're probably right.


>>> Further, D mangling rules should be separate from calling convention.
>>
>> I disagree. The mangling rules are not part of the language
>> specification, either. But they are necessary so that a function with
>> one convention won't be connected to one with another.
>>
>
> If compilers employed their own mangling schemes, debuggers and other tools
> would never be able to properly demangle names. I think it is important that the
> mangling is at least emphasized as a highly recommended (but not required) part
> of the language to implementors.

I don't think we need to worry about that. Implementers tend to follow existing practice unless there is a very, very good reason.

June 20, 2012
On 2012-06-19 21:14, Russel Winder wrote:

> I never used V1 (though I do have the Tango book) so enforcing V2 will
> not be a problem. Actually this means I can delete 65% of the SCons D
> tool :-)))))))))

You can use the Tango book with D2.

>> 3) For anyone who has submitted patches for Mingw and Apple -
>> sorry, but I'm going to have to yank out or alter certain bits.
>> Apple GCC is irrelevant now, and some Mingw checks look for
>> if(target) when it should really be checking if(host) and vice
>> versa!
>
> Is Apple GCC irrelevant? Apple itself has switched to Clang but GCC is
> still available via MacPorts – or am I missing something obvious, I am
> only an occasional Mac OS X user.

Probably not so much. Note that GCC is still available from Apple, still stuck at 4.2.x.

-- 
/Jacob Carlborg
June 20, 2012
Le 19/06/2012 22:58, Manu a écrit :
> Thinking more about the implications of removing the inline asm, what
> would REALLY roxors, would be a keyword to insist a variable is
> represented by a register, and by extension, to associate it with a
> specific register:
>    register int x;        // compiler assigns an unused register, promises it
>                           // will remain resident, error if it can't maintain promise.
>    register int x : rsp;  // x aliases RSP; can now produce a function
>                           // pre/postamble in high level code.
> Repeat for the argument registers -> readable, high-level custom calling
> conventions!
>
> This would almost entirely eliminate the usefulness of an inline assembler.
> Better yet, this could use the 'new' attribute syntax, which most agree
> will support arguments:
> @register(rsp) int x;

Choosing registers is something the compiler is better at than us most of the time.

For this very reason, I think we want to go in the exact opposite direction: asm with compiler-chosen registers when possible.
June 20, 2012
Le 20/06/2012 04:34, Walter Bright a écrit :
> I don't think we need to worry about that. Implementers tend to follow
> existing practice unless there is a very, very good reason.
>

Have you ever heard of a company named Microsoft?
June 20, 2012
On 20 June 2012 10:42, deadalnix <deadalnix@gmail.com> wrote:

> Le 19/06/2012 22:58, Manu a écrit :
>>
>> This would almost entirely eliminate the usefulness of an inline
>> assembler.
>> Better yet, this could use the 'new' attribute syntax, which most agree
>> will support arguments:
>> @register(rsp) int x;
>>
>
> Choosing registers is something the compiler is better at than us most of the time.
>
> For this very reason, I think we want to go in the exact opposite direction: asm with compiler-chosen registers when possible.
>

...I think you've missed the entire point of my suggestion. But that's okay. I give up ;)


June 20, 2012
Le 20/06/2012 09:58, Manu a écrit :
> On 20 June 2012 10:42, deadalnix <deadalnix@gmail.com> wrote:
>
>     Le 19/06/2012 22:58, Manu a écrit :
>
>         This would almost entirely eliminate the usefulness of an inline
>         assembler.
>         Better yet, this could use the 'new' attribute syntax, which
>         most agree
>         will support arguments:
>         @register(rsp) int x;
>
>
>     Choosing registers is something the compiler is better at than us
>     most of the time.
>
>     For this very reason, I think we want to go in the exact opposite
>     direction: asm with compiler-chosen registers when possible.
>
>
> ...I think you've missed the entire point of my suggestion.
> But that's okay. I give up ;)

We presented example code where your approach isn't going to do the trick. You are free to ignore it.
June 20, 2012
> Inline assembly has been relatively useless in GCC for years. Inline asm
> interferes with the optimiser's ability to do a good job, which basically
> makes use of inline assembly self-defeating.
> The only time I ever need to use inline-asm is to interface an arch feature
> that has no API. As long as there are intrinsics for all the opcodes one
> might want, then it's better to use them.

> That said, as stated above, if use of this stuff is for performance, then
> using an inline-asm block will ruin the surrounding code anyway,

Could someone explain to me why inline asm screws up the optimizer? My naive view on the matter is that the optimizer has full knowledge of what is going on, regardless of whether intrinsics or asm are used. I could also imagine an optimizer that optimizes inline asm too, for example by reassigning registers.


June 20, 2012
On 20 June 2012 11:14, deadalnix <deadalnix@gmail.com> wrote:

> Le 20/06/2012 09:58, Manu a écrit :
>
>> On 20 June 2012 10:42, deadalnix <deadalnix@gmail.com> wrote:
>>
>>    Le 19/06/2012 22:58, Manu a écrit :
>>
>>        This would almost entirely eliminate the usefulness of an inline
>>        assembler.
>>        Better yet, this could use the 'new' attribute syntax, which
>>        most agree
>>        will support arguments:
>>        @register(rsp) int x;
>>
>>
>>    Choosing registers is something the compiler is better at than us
>>    most of the time.
>>
>>    For this very reason, I think we want to go in the exact opposite
>>    direction: asm with compiler-chosen registers when possible.
>>
>>
>> ...I think you've missed the entire point of my suggestion. But that's okay. I give up ;)
>>
>
> We presented example code where your approach isn't going to do the trick. You are free to ignore it.
>

No, the entire point of my suggestion IS to allow seamless mixing with
conventional code, which includes compiler register assignment.
The main problem with IA is its interference with the optimiser, and its
inability to make automatic register selection.

Walter claimed push/pop intrinsics wouldn't work due to alignment issues,
but I think that's a moot argument, since it's identical to writing your
code in asm anyway. If the asm works, then it'll work using an intrinsic
exactly the same.
The neat bonus is, you can interleave it with structured code, any
non-critical variables can be automatically assigned by the compiler as
usual... and if the compiler feels comfortable to reorder the code, it can
do so.


June 20, 2012
On 20 June 2012 09:32, Tobias Pankrath <tobias@pankrath.net> wrote:
>> Inline assembly has been relatively useless in GCC for years. Inline asm
>> interferes with the optimiser's ability to do a good job, which basically
>> makes use of inline assembly self-defeating.
>> The only time I ever need to use inline-asm is to interface an arch
>> feature
>> that has no API. As long as there are intrinsics for all the opcodes one
>> might want, then it's better to use them.
>
>
>> That said, as stated above, if use of this stuff is for performance, then using an inline-asm block will ruin the surrounding code anyway,
>
>
> Could someone explain to me why inline asm screws up the optimizer? My naive view on the matter is that the optimizer has full knowledge of what is going on, regardless of whether intrinsics or asm are used. I could also imagine an optimizer that optimizes inline asm too, for example by reassigning registers.
>
>

Actually, the compiler has little knowledge of what the assembly does at all, other than the input/output constraints and which registers get clobbered. That is enough for the compiler to know how to avoid stepping on your toes when trying to work around it.
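
As a rough illustration of what "constraints and clobbers" means in practice, here is a minimal sketch in C using GCC-style extended asm. The function name add_asm is made up for this example, and the x86 addl instruction is only a stand-in; on other targets the sketch falls back to plain C. The compiler never parses the instruction string, it only sees the operand lists after the colons.

```c
#include <stdint.h>

/* Hypothetical helper: adds b to a with a single x86 instruction.
 * The "r" constraints let the compiler pick the registers itself. */
static uint32_t add_asm(uint32_t a, uint32_t b)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__("addl %1, %0"
            : "+r"(a)   /* output: read and written, any general register */
            : "r"(b)    /* input: any general register */
            : "cc");    /* clobber: the flags register is modified */
#else
    a += b;             /* fallback for non-x86 targets */
#endif
    return a;
}
```

Because the statement is not marked volatile and declares its outputs, the compiler is free to move it or drop it when the result is unused, but it cannot, for instance, fold add_asm(2, 3) into a constant the way it could with a plain addition.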


-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
June 20, 2012
On 20 June 2012 03:58, Walter Bright <newshound2@digitalmars.com> wrote:

>>    Do a grep for "asm" across the druntime library sources. Can you justify all
>>    of that with some other scheme?
>>
>>
>> I think almost all the blocks I just browsed through could be easily written
>> with nothing more than the register alias feature I suggested, and perhaps a
>> couple of opcode intrinsics.
>>
>
> But I see nothing gained by that.


The gain is that by not using IA, the compiler could much better optimise and inline your code. Your code is likely more readable by more people. Also, since Iain is proposing removing the inline assembler from GDC, it's clearly hard to maintain across different compilers. A higher level language defined construct may be simpler...


>> And as a bonus, they would also be readable.
>>
>
> I don't agree. The point of IA to me is so I can specify exactly what I want. If I wanted to do it at a higher level, I'd use normal D syntax.


In many cases, you need to write a big block of asm to do one single operation that's not expressible at the higher level... and in my experience, most of the time that operation is addressing a register directly; most commonly, dealing with the stack pointer or the argument registers.


>> I can imagine cases where the optimiser would have more freedom too.
>>
>>
>
> But if I'm writing IA, I want to do it my way. Not the optimizer's way, which may or may not be able to give me what I want.
>

I think you typically want to do one very small detail your way, and let
the optimiser make the best of the rest of the function.
The result is very much comparable to the use of intrinsics in high level
code.


>> Yes. C has a register keyword, and nobody uses it anymore. The troubles are
>> many, starting with people always "register"ing the wrong variables, and it really didn't work out too well when compilers started doing live range register assignments. It's ignored by modern C compilers, and hasn't been carried forward into other languages.
>>
>
You miss the point of the suggestion: it is a mechanism to directly address particular registers in high level code, allowing you to eliminate many small asm blocks. C's failing is unrelated; the goal was totally different.


>> Really? I've never seen that. What about it was fail?
>>
>
> It's actually in DMC, believe it or not. It was a giant failure because nobody used it. It was in Borland's TurboC, too. It pretty much just throws a wrench into the gears of more sophisticated code generators.
>

I'm not surprised nobody used it in a niche compiler like DMC, especially when it's not supported by major compilers like GCC or MSC... It's not a feature of C, so most people wouldn't ever consider it, or even realise it's possible.

Of course it throws a gear in the works, it's a reasonably complex feature,
but IA blocks themselves throw an equally large (and rather similar) gear
in the works. The most naive implementation could probably do precisely
what IA does, that is, to stop reordering across the IA block.
That should be just as safe when using intrinsics or explicit register
aliasing as it is with inline asm. And that's only a start, I think the
compiler could do better with time.
The compiler doesn't have much opportunity for improvement with IA, unless
the compiler attempts to understand the IA block, which is in a totally
different language, and architecture specific. Well defined high-level
constructs help the compiler with the understanding it needs to do a
good/safe job.
It's the same logic that supports opcode intrinsics, which became almost
universally preferred to IA in appropriate situations, and are an
undeniable success.
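
To make the comparison concrete, here is a minimal sketch in C of the same operation written once as an opcode intrinsic and once as an inline asm block. The function names are invented for the example; __builtin_bswap32 is a GCC/Clang builtin, and the asm variant assumes an x86 target (it falls back to the builtin elsewhere).

```c
#include <stdint.h>

/* Intrinsic form: the compiler knows exactly what this does, so it can
 * constant-fold, schedule, and inline it like any other expression. */
static uint32_t bswap_intrinsic(uint32_t x)
{
    return __builtin_bswap32(x);
}

/* Inline asm form: the compiler sees only one read-write register operand. */
static uint32_t bswap_asm(uint32_t x)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__("bswap %0" : "+r"(x));   /* bswap leaves the flags untouched */
    return x;
#else
    return __builtin_bswap32(x);     /* fallback for non-x86 targets */
#endif
}
```

Both give the same result, but a call such as bswap_intrinsic(0x11223344) can typically be evaluated at compile time, while the asm version stays opaque to the optimiser.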


>>    I really don't understand preferring all these rather convoluted
>>    enhancements to avoid something simple and straightforward like the inline
>>    assembler. The use of IA in the D runtime library, for example, has been
>>    quite successful.
>>
>>
>> I agree, IA is useful and has been successful, but it has drawbacks too.
>>   * IA ruins optimisation around the IA block
>>
>
> dmd's optimizer is not so sensitive to that.


How can you safely reorder across an IA block? Is there a well defined
mechanism to determine it's safe?
GCC has been failing at that forever. It takes a very conservative approach.
I guess the main problem is that GCC doesn't attempt to understand the
asm block; it just pastes it into the output.


>> This one seems trivial, you just need one intrinsic:
>>
>>   size_t reqsize = size * newcapacity;
>>   __jc(&Loverflow);
>>
>
> That's highly risky. The optimizer knows nothing at all about the state of the flags register, and does not take into account a dependency on the C flag when doing code motion. Nor would the compiler guarantee that the C flag is even set by however it chose to do the previous multiply (for example, the LEA instruction is often used to do multiplies, which leaves the C flag untouched. Oops!). Nothing connects the __jc intrinsic to that multiply operation.


True, but you could also perform the multiply explicitly with another
intrinsic.
This reordering problem is perhaps the most difficult issue, but not
necessarily insurmountable. And it's only really relevant where explicit
interaction with the flags is involved.
I suspect it wouldn't be too much trouble to make that intrinsic encode
some information that fuses it with the preceding operation as written in
the source.
Alternatively use a __noreorder {} scope block or something surrounding the
mul and jc..
Another possibility might be to make the intrinsic combine both operations
as a compound:
    if(__mul_getc(T a, T b, ref in T res)) goto blah; // <- eliminates the need to take the address of a label
There are lots of different approaches, I'm sure an elegant solution is
possible.
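
For what it's worth, the compound form can be sketched in C with __builtin_mul_overflow, which GCC (5 and later) and Clang provide. This is only an illustration of the idea, not the druntime code: the wrapper name checked_reqsize is made up, while size, newcapacity and reqsize follow the snippet quoted above. The multiply and the overflow test are a single operation, so there is no separate flags dependency for the optimiser to break.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical wrapper: stores size * newcapacity in *reqsize and
 * returns true if the product does not fit in a size_t. */
static bool checked_reqsize(size_t size, size_t newcapacity, size_t *reqsize)
{
    assert(reqsize != NULL);
    /* One builtin performs the multiply and reports the overflow. */
    return __builtin_mul_overflow(size, newcapacity, reqsize);
}
```

A caller would then write something like `if (checked_reqsize(size, newcapacity, &reqsize)) goto Loverflow;`, with no need to take the address of a label.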


>> Although it depends on a '&codeLabel' mechanism to get the label address (GCC
>> supports this in C, I'd love to see this in D too).
>>
>
> Note that supporting such will wind up disabling a lot of the data flow analysis, which is not set up to handle unknown edges between basic blocks.
>

No doubt, but it only affects code where that operation appears, which would be rather rare.
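
For readers who haven't seen the GCC feature: it is the "labels as values" extension, spelled &&label in C rather than &label. Below is a minimal sketch of a dispatcher built on it; the opcode set and the function name run are invented for the example. Every `goto *` is an indirect jump whose target the compiler cannot name statically, which is the kind of unknown edge between basic blocks referred to above.

```c
#include <assert.h>

/* Runs a tiny made-up bytecode: 0 = increment, 1 = double, 2 = halt. */
static int run(const unsigned char *code)
{
    /* GCC extension: &&label yields the address of a label as a void *. */
    static void *const dispatch[] = { &&op_inc, &&op_dbl, &&op_halt };
    int acc = 0;

/* Jump to the handler of the next opcode (indirect goto). */
#define NEXT() do { assert(*code < 3); goto *dispatch[*code++]; } while (0)

    NEXT();
op_inc:
    acc += 1;
    NEXT();
op_dbl:
    acc *= 2;
    NEXT();
op_halt:
    return acc;
#undef NEXT
}
```

Only the function containing the indirect jumps pays the analysis cost; the rest of the program is compiled as usual.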


> To summarize, I see a lot of complex new features, a significant rewrite of
> the optimizer, and a rewrite of a lot of existing code, and at the end of all that we're pretty much at the same state we are at now.
>

I agree, it's not trivial. It was just something to think about.
It's not quite the same place. The examples that have come up here are
relatively trivial, so it doesn't add so much to those. It would add an
awful lot to larger uses of asm, where it's really nice to be able to mix
the explicit pseudo-asm code with regular automatic register assignments,
and use of standard control structures (if/for/etc).