Inlining (page 2) - D Programming Language Discussion Forum

May 04, 2003

Posted by Scott Wood
in reply to Walter

On Sat, 3 May 2003 14:38:10 -0700, Walter <walter@digitalmars.com> wrote:
> "Scott Wood" <scott@buserror.net> wrote in message news:slrnbaoctt.3jp.scott@ti.buserror.net...
>> It'd still be nice to have a way of explicitly saying that a function either must or must not be inlined.  For example, the dynamic linker in the GNU libc will break if certain functions are not inlined, because the relocation has not yet been done.  The schedule() function in Linux will break on sparc (and perhaps some other platforms) if it is inlined, if you switch to a task that entered the scheduler via a different containing function.
> 
> I suspect those functions are heavily dependent on how a *particular* compiler generates code for that.

Not particularly, at least in the case of the scheduler.  The scheduler's only concern with inlining is that the destination thread doesn't resume in the wrong inlined instance.  The inline assembly is non-portable as well, but only because inline assembly is not part of C.

> Depending on that is going outside of the language definition.

That depends on what the language definition is. :-)

> It makes successful operation of the code overly sensitive to particular compiler versions, etc. (Some linux kernel developers are open about the kernel code being heavily dependent on how a particular revision of GCC generates code.)

Some bits have been, but it's mainly been due to Linux developers ignoring GCC's own rules for things like inline assembly constraints, or making assumptions about weird stuff like "inline" assembly outside of any function.

> Those things are what the inline assembler is for, and D has very strong support for inline assembler.

How do you use the inline assembler to tell the compiler not to inline a certain function written in D, not assembly?

> The C language itself has no support at all
> for inline assembler, and GCC's support for it is very weak and error-prone
> (for example, there's an arcane syntax you have to add to say which
> registers were read and which were written by each asm block - get that
> wrong, and your code will behave unpredictably. D, on the other hand, keeps
> track of that automatically).

Is there a way in D inline assembly to ask for a temporary register without mandating a specific one?  How about specifying clobbers that aren't explicitly in the code, such as when calling a function with an unusual calling convention, or when switching threads?

Also, one of the example code sequences is this:

    void *pc;
    asm
    {
        call L1             ;
     L1:                    ;
        pop EBX             ;
        mov pc[EBP],EBX     ;       // pc now points to code at L1
    }

Why do you need to specify EBP when accessing pc?  Shouldn't the compiler know what the best way to access pc is?  It might want to get rid of the frame pointer, or it might want to keep it around in a register for use after the asm block, etc.

GCC's inline assembly also has the sometimes desirable attribute that the compiler doesn't touch the instructions you specify, other than to schedule the block and substitute the things you asked it to. Will a D compiler be allowed to stick code in the middle of it, in order to satisfy symbolic references, or to schedule instructions? Is it allowed to optimize away mov instructions if it can get the data there on its own?  Can it move memory accesses across the asm block?

Usually, those sorts of things would be beneficial, but there should be a way to tell it not to do it.

-Scott

May 07, 2003

Posted by Walter
in reply to Scott Wood

"Scott Wood" <scott@buserror.net> wrote in message news:slrnbbalqd.ud.scott@ti.buserror.net...
> Some bits have been, but it's mainly been due to Linux developers ignoring GCC's own rules for things like inline assembly constraints, or making assumptions about weird stuff like "inline" assembly outside of any function.
> > Those things are what the inline assembler is for, and D has very strong support for inline assembler.
> How do you use the inline assembler to tell the compiler not to inline a certain function written in D, not assembly?

The compiler does not optimize inline assembly that you write. Therefore, if you use the inline assembler to call a function, that function won't be inlined.


> > The C language itself has no support at all
> > for inline assembler, and GCC's support for it is very weak and error-prone
> > (for example, there's an arcane syntax you have to add to say which registers were read and which were written by each asm block - get that wrong, and your code will behave unpredictably. D, on the other hand, keeps
> > track of that automatically).
> Is there a way in D inline assembly to ask for a temporary register without mandating a specific one?

No. The idea is "what you write is what you get" with the inline assembler.


>  How about specifying clobbers that
> aren't explicitly in the code, such as when calling a function with
> an unusual calling convention, or when switching threads?

Called functions must follow the normal register saving convention. If it is an unusual function that clobbers other registers, you'll need to save/restore them in the inline assembler.


> Also, one of the example code sequences is this:
>     void *pc;
>     asm
>     {
>         call L1             ;
>      L1:                    ;
>         pop EBX             ;
>         mov pc[EBP],EBX     ;       // pc now points to code at L1
>     }
> Why do you need to specify EBP when accessing pc?  Shouldn't the
> compiler know what the best way to access pc is?  It might want to
> get rid of the frame pointer, or it might want to keep it around in a
> register for use after the asm block, etc.

The compiler doesn't do frame pointer optimization when the inline assembler is used, because the results of the inline assembler shouldn't be affected by whether optimization is on or off. If you want, though, you can use the 'naked' pseudo-op and write the entire function in assembler, and what you write is what you get.


> GCC's inline assembly also has the sometimes desirable attribute that the compiler doesn't touch the instructions you specify, other than to schedule the block and substitute the things you asked it to. Will a D compiler be allowed to stick code in the middle of it, in order to satisfy symbolic references, or to schedule instructions? Is it allowed to optimize away mov instructions if it can get the data there on its own?  Can it move memory accesses across the asm block?

The D compiler does not schedule, move around, optimize, or alter the inline assembler instructions. The assumption is that if the programmer is going to use inline assembler, the programmer knows exactly what he wants, and will write it that way. What you write is what you get.

> Usually, those sorts of things would be beneficial, but there should be a way to tell it not to do it.

I guess I'm philosophically opposed to such things. I much prefer the straightforward approach of inline assembler that what you write is what you get. I also find it odd that gcc provides such things, yet still requires me to specify which registers were read/written for the simplest inline asm.

May 08, 2003

Posted by Scott Wood
in reply to Walter

On Wed, 7 May 2003 11:11:40 -0700, Walter <walter@digitalmars.com> wrote:
> The compiler does not optimize inline assembly that you write. Therefore, if you use the inline assembler to call a function, that function won't be inlined.

I suppose, though it'd be a little awkward to use the assembler just to call a function without it being inlined.

Still, I'm a bit uncomfortable with the idea that the compiler's always right and cannot be corrected, even explicitly.  I've seen GCC silently decide not to inline a function (on which inlining was requested) because it was "too big", even though it was just a large switch statement on a constant, which ended up being one or two instructions after optimization.  Given that no compiler is going to make the right choice all the time, it's nice to be able to declare one's intent when there's a clear reason to do so.
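
As a point of comparison, GCC-compatible C compilers let this kind of intent be declared with function attributes. The following is only a sketch, assuming GCC or Clang; the function names are made up for illustration and come from neither glibc nor Linux.

```c
/* Assumes GCC or Clang; all function names are hypothetical. */

/* Must never be inlined: every use goes through a real call. */
__attribute__((noinline))
int switch_point(int x)
{
    return x + 1;
}

/* Must always be inlined, even when optimization is turned off. */
static inline __attribute__((always_inline))
int early_helper(int x)
{
    return x * 2;
}

/* Combines both: early_helper is expanded in place, switch_point is called. */
int use_both(int x)
{
    return switch_point(early_helper(x));
}
```

Here `noinline` keeps `switch_point` out of line at every call site, while `always_inline` asks the compiler to expand `early_helper` even where its size heuristics would otherwise decline.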

>>  How about specifying clobbers that
>> aren't explicitly in the code, such as when calling a function with
>> an unusual calling convention, or when switching threads?
> 
> Called functions must follow the normal register saving convention. If it is an unusual function that clobbers other registers, you'll need to save/restore them in the inline assembler.

Which would defeat the purpose of using a special convention.  For example, on a mutex implementation, one might want to make the contended case call a function that saves all registers, so that the common case doesn't have to spill any registers (other than whatever's needed to test the mutex).

Thread switching would also be slower on architectures with a reasonable number of registers if you have to manually save all of them just because you can't tell the compiler to save (or reconstruct) the 2 or 3 it might still care about.

BTW, will there be any way to tell the inline assembler to put some code out-of-line?  Something like:

inline int lock_mutex(Mutex m)
{
   int new = whatever_goes_in_there;

   asm {
      eax = 0;  /* This tells the compiler to get a zero into eax,
                   in whatever way it chooses.  Maybe the caller
                   (which is inlining this function) had one lying
                   around in a register, and it can now choose to use
                   eax for that variable. */
      lock; cmpxchg [m.lock], new;
      jz failed;

      outofline {
         failed: /* I hope this label isn't visible outside of this
                    instantiation of this assembly block... */
            push ecx;
            push edx;
            call handle_failed;
            pop edx;
            pop ecx;
            return; /* This tells the compiler to exit the assembly
                       block.  Alternatively, a return label could
                       be declared. */
      }

      /* Tell the compiler that these registers were not, in fact,
         clobbered.  It can't assume it automatically, though, since
         it has no idea what handle_failed might be doing to those
         values on the stack.  Or, to save space, I may have buried
         those pushes into a wrapper assembly function instead,
         where the compiler probably won't see them. */

      noclobber ecx, edx;

      /* Tell the compiler that, since this thing acts as a mutex,
         no memory accesses can be reordered across it.  It's
         probably not necessary in this case, though, as it contains
         a function call. */

      clobber memory;
   }
}
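
The fast-path/slow-path split sketched above can be approximated in GCC-flavoured C without any inline assembly. This is a rough sketch under assumptions: `mutex_t`, `lock_contended`, and `slow_path_calls` are hypothetical names, the `__atomic` builtins and the `noinline`/`cold` attributes are GCC/Clang extensions, and it does not reproduce the register-preserving calling convention discussed in this post.

```c
/* Hypothetical spinlock-style mutex; assumes GCC or Clang builtins. */
typedef struct { int locked; } mutex_t;

static int slow_path_calls;   /* illustration only: counts contended entries */

/* Contended case: kept out of line and marked unlikely to run. */
static void __attribute__((noinline, cold)) lock_contended(mutex_t *m)
{
    int expected = 0;
    slow_path_calls++;
    /* Spin until the lock word changes from 0 to 1 under our hand. */
    while (!__atomic_compare_exchange_n(&m->locked, &expected, 1, 0,
                                        __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
        expected = 0;   /* a failed exchange overwrote it with the seen value */
}

/* Common case: a single compare-and-swap, inlined into the caller. */
static inline void lock_mutex(mutex_t *m)
{
    int expected = 0;
    if (__builtin_expect(!__atomic_compare_exchange_n(&m->locked, &expected,
                                                      1, 0, __ATOMIC_ACQUIRE,
                                                      __ATOMIC_RELAXED), 0))
        lock_contended(m);
}

static inline void unlock_mutex(mutex_t *m)
{
    __atomic_store_n(&m->locked, 0, __ATOMIC_RELEASE);
}
```

The compiler still treats the call to `lock_contended` as clobbering the usual caller-saved registers, which is exactly the cost the proposed `noclobber` annotation is meant to avoid.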

>> Also, one of the example code sequences is this:
>>     void *pc;
>>     asm
>>     {
>>         call L1             ;
>>      L1:                    ;
>>         pop EBX             ;
>>         mov pc[EBP],EBX     ;       // pc now points to code at L1
>>     }
>> Why do you need to specify EBP when accessing pc?  Shouldn't the
>> compiler know what the best way to access pc is?  It might want to
>> get rid of the frame pointer, or it might want to keep it around in a
>> register for use after the asm block, etc.
> 
> The compiler doesn't do frame pointer optimization when the inline assembler is used, because the results of the inline assembler shouldn't be affected by whether optimization is on or off.

But it wouldn't affect the results, if the compiler handles the assignment to pc rather than the programmer.  And what if I move to a compiler that *never* uses frame pointers?  The code is now broken, because I had to make an assumption about what the compiler was doing with its registers.

Plus, pc is probably going to be used soon after the asm block; why force it onto the stack and then back?
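
For comparison, GCC's extended asm lets the compiler decide where `pc` lives by making it an output operand. A hedged sketch, assuming GCC or Clang with the default AT&T syntax; `current_pc` is a hypothetical helper, and non-x86 targets fall back to GCC's labels-as-values extension instead of assembly.

```c
/* Returns an address inside this function.  Assumes GCC or Clang. */
void *current_pc(void)
{
    void *pc = 0;
#if defined(__x86_64__)
    /* RIP-relative lea: %0 is whatever register the compiler picked. */
    __asm__ __volatile__("leaq 0(%%rip), %0" : "=r"(pc));
#elif defined(__i386__)
    /* Same call/pop trick as the D example, but with no fixed register
       and no explicit frame-pointer reference. */
    __asm__ __volatile__("call 1f\n1:\tpopl %0" : "=r"(pc));
#else
    /* Fallback without assembly: take the address of a label. */
    pc = &&here;
here:
#endif
    return pc;
}
```

The `"=r"` constraint only says that the result ends up in some general register; whether `pc` is then spilled to the stack or kept in that register is left to the compiler.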

> If you want, though, you can use the
> 'naked' pseudo-op and write the entire function in assembler, and what you
> write is what you get.

Yes, but you can get that by using an external assembler as well. The point of inline assembly is to, well, be inline. :-)

> The D compiler does not schedule, move around, optimize, or alter the inline assembler instructions. The assumption is that if the programmer is going to use inline assembler, the programmer knows exactly what he wants, and will write it that way. What you write is what you get.

The problem is that the programmer can't know exactly what he wants, without knowing some decisions that the compiler will make.  GCC's syntax allows the programmer to tell the compiler exactly where to substitute those decisions.  Removing the ability of the compiler to make the decisions will lead to slower code.
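
The kind of substitution meant here can be sketched with GCC's extended asm, where `"r"` constraints leave the register choice to the compiler. A minimal example, assuming GCC or Clang on x86 with the default AT&T syntax; `add_in_asm` is an illustrative name, and other targets fall back to plain C.

```c
/* Adds b to a.  On x86 the add is written in extended asm and the
   compiler chooses both registers; elsewhere plain C is used. */
int add_in_asm(int a, int b)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__("addl %1, %0"
            : "+r"(a)    /* %0: read and written, any general register */
            : "r"(b)     /* %1: read only, any general register */
            : "cc");     /* the condition flags are modified */
#else
    a += b;
#endif
    return a;
}
```

Because the operands are declared rather than hard-coded, the compiler can inline this function and reuse whatever registers already hold `a` and `b` at the call site.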

> I guess I'm philosophically opposed to such things. I much prefer the straightforward approach of inline assembler that what you write is what you get. I also find it odd that gcc provides such things, yet still requires me to specify which registers were read/written for the simplest inline asm.

It's not really that odd, seeing as it needs those features to make up for its inability to parse the assembly code itself.  However, those features end up granting the programmer more power than what they replace.

-Scott

May 08, 2003

Posted by Walter
in reply to Scott Wood

"Scott Wood" <scott@buserror.net> wrote in message news:slrnbbjfj0.1a2.scott@ti.buserror.net...
> On Wed, 7 May 2003 11:11:40 -0700, Walter <walter@digitalmars.com> wrote:
> I suppose, though it'd be a little awkward to use the assembler just
> to call a function without it being inlined.

I'd agree with that.

> Still, I'm a bit uncomfortable with the idea that the compiler's always right and cannot be corrected, even explicitly.  I've seen GCC silently decide not to inline a function (on which inlining was requested) because it was "too big", even though it was just a large switch statement on a constant, which ended up being one or two instructions after optimization.  Given that no compiler is going to make the right choice all the time, it's nice to be able to declare one's intent when there's a clear reason to do so.

I think that comes with the territory of using a high level language. If a particular routine is a major bottleneck in your program (and it does usually come down to one!), and you want to make the effort to tune it to the max, write it in inline assembler.

> >>  How about specifying clobbers that
> >> aren't explicitly in the code, such as when calling a function with
> >> an unusual calling convention, or when switching threads?
> > Called functions must follow the normal register saving convention. If it is
> > an unusual function that clobbers other registers, you'll need to save/restore them in the inline assembler.
> Which would defeat the purpose of using a special convention.  For example, on a mutex implementation, one might want to make the contended case call a function that saves all registers, so that the common case doesn't have to spill any registers (other than whatever's needed to test the mutex).
>
> Thread switching would also be slower on architectures with a reasonable number of registers if you have to manually save all of them just because you can't tell the compiler to save (or reconstruct) the 2 or 3 it might still care about.
>
> BTW, will there be any way to tell the inline assembler to put some code out-of-line?  Something like:
>
> inline int lock_mutex(Mutex m)
> {
>    int new = whatever_goes_in_there;
>
>    asm {
>       eax = 0;  /* This tells the compiler to get a zero into eax,
>                    in whatever way it chooses.  Maybe the caller
>                    (which is inlining this function) had one lying
>                    around in a register, and it can now choose to use
>                    eax for that variable. */

The Digital Mars C++ compiler can do this, but after having that capability for 15 years it just never proved out to be very useful.

>       lock; cmpxchg [m.lock], new;
>       jz failed;
>
>       outofline {
>          failed: /* I hope this label isn't visible outside of this
>                     instantiation of this assembly block... */

Yes, it is visible outside. All labels are in one scope per function, including the inline asm labels.

>             push ecx;
>             push edx;
>             call handle_failed;
>             pop edx;
>             pop ecx;
>             return; /* This tells the compiler to exit the assembly
>                        block.  Alternatively, a return label could
>                        be declared. */

Exit the assembly block? I don't know what you mean by that.

>       }
>
>       /* Tell the compiler that these registers were not, in fact,
>          clobbered.  It can't assume it automatically, though, since
>          it has no idea what handle_failed might be doing to those
>          values on the stack.  Or, to save space, I may have buried
>          those pushes into a wrapper assembly function instead,
>          where the compiler probably won't see them. */
>
>       noclobber ecx, edx;

That might be a reasonable addition.

>       /* Tell the compiler that, since this thing acts as a mutex,
>          no memory accesses can be reordered across it.  It's
>          probably not necessary in this case, though, as it contains
>          a function call. */
>
>       clobber memory;

Unnecessary, as the inline assembler assumes memory is clobbered.

>    }
> }
>
> >> Also, one of the example code sequences is this:
> >>     void *pc;
> >>     asm
> >>     {
> >>         call L1             ;
> >>      L1:                    ;
> >>         pop EBX             ;
> >>         mov pc[EBP],EBX     ;       // pc now points to code at L1
> >>     }
> >> Why do you need to specify EBP when accessing pc?  Shouldn't the
> >> compiler know what the best way to access pc is?  It might want to
> >> get rid of the frame pointer, or it might want to keep it around in a
> >> register for use after the asm block, etc.
> >
> > The compiler doesn't do frame pointer optimization when the inline assembler
> > is used, because the results of the inline assembler shouldn't be affected
> > by whether optimization is on or off.
>
> But it wouldn't affect the results, if the compiler handles the assignment to pc rather than the programmer.  And what if I move to a compiler that *never* uses frame pointers?  The code is now broken, because I had to make an assumption about what the compiler was doing with its registers.

When using inline asm, you'll always run the risk of nonportability between compilers - after all, things like register conventions, calling conventions, etc., are not defined by the language. Only the syntax of the inline assembler is.


> Plus, pc is probably going to be used soon after the asm block; why force it onto the stack and then back?

Because the inline assembler assembles the code long before any register assignments are done.


> > If you want, though, you can use the
> > 'naked' pseudo-op and write the entire function in assembler, and what you
> > write is what you get.
> Yes, but you can get that by using an external assembler as well. The point of inline assembly is to, well, be inline. :-)

I'm currently porting D to linux. Believe me, the inline assembler is a great boon to that. Just try converting MASM files to gas files! To me, using gas is like trying to write code looking in a mirror.


> > The D compiler does not schedule, move around, optimize, or alter the inline
> > assembler instructions. The assumption is that if the programmer is going to
> > use inline assembler, the programmer knows exactly what he wants, and will
> > write it that way. What you write is what you get.
> The problem is that the programmer can't know exactly what he wants, without knowing some decisions that the compiler will make.  GCC's syntax allows the programmer to tell the compiler exactly where to substitute those decisions.  Removing the ability of the compiler to make the decisions will lead to slower code.

You are correct in the abstract. In my experience, I believe the difference to be negligible. I profile code extensively to make it faster. The bottlenecks turn out to be maybe 30 lines of code out of a few thousand. Those I just write completely in hand-tuned inline assembler.


> > I guess I'm philosophically opposed to such things. I much prefer the straightforward approach of inline assembler that what you write is what you
> > get. I also find it odd that gcc provides such things, yet still requires me
> > to specify which registers were read/written for the simplest inline asm.
> It's not really that odd, seeing as it needs those features to make up for its inability to parse the assembly code itself.  However, those features end up granting the programmer more power than what they replace.

I understand what you're driving at. It is heavily integrated with how gcc parses, optimizes, and generates code. I don't think that's a good thing to put in a language spec, as it may unnecessarily constrain how the compiler is built.

May 08, 2003

Posted by C
in reply to Walter

Walter wrote:
> "Scott Wood" <scott@buserror.net> wrote in message
> news:slrnbbjfj0.1a2.scott@ti.buserror.net...

[-snip-]

>>            return; /* This tells the compiler to exit the assembly
>>                       block.  Alternatively, a return label could
>>                       be declared. */
> 
> 
> Exit the assembly block? I don't know what you mean by that.

If that means what I think is intended, wouldn't 'break' be more
appropriate?

>>      }
>>
>>      /* Tell the compiler that these registers were not, in fact,
>>         clobbered.  It can't assume it automatically, though, since
>>         it has no idea what handle_failed might be doing to those
>>         values on the stack.  Or, to save space, I may have buried
>>         those pushes into a wrapper assembly function instead,
>>         where the compiler probably won't see them. */
>>
>>      noclobber ecx, edx;
> 
> 
> That might be a reasonable addition.

Agreed, though I would change the keyword, maybe 'retain' would be good,
or the list could be added to the assembler declaration ..

assembler: 'asm' '(' '!' noClobberList ')' '{' assemblerStatements '}'
	| 'asm' '{' assemblerStatements '}'
	;

noClobberList : registerName ',' noClobberList
	| registerName
	;

such as ...

asm (! ecx, edx ) {
	xor eax, eax
	push ecx
	call myFunc;
}

This is efficient, but its meaning is not immediately clear.

C 2003/5/8

May 09, 2003

Posted by Scott Wood
in reply to Walter

On Thu, 8 May 2003 10:50:21 -0700, Walter <walter@digitalmars.com> wrote:
> "Scott Wood" <scott@buserror.net> wrote in message news:slrnbbjfj0.1a2.scott@ti.buserror.net...
>> Still, I'm a bit uncomfortable with the idea that the compiler's always right and cannot be corrected, even explicitly.  I've seen GCC silently decide not to inline a function (on which inlining was requested) because it was "too big", even though it was just a large switch statement on a constant, which ended up being one or two instructions after optimization.  Given that no compiler is going to make the right choice all the time, it's nice to be able to declare one's intent when there's a clear reason to do so.
> 
> I think that comes with the territory of using a high level language. If a particular routine is a major bottleneck in your program (and it does usually come down to one!), and you want to make the effort to tune it to the max, write it in inline assembler.

Except that in this case, using inline assembler would have made it worse.  The code was expecting to have the switch(constant) optimized away to just the relevant case.  Writing the containing functions in assembly would not have been realistic, as the to-be-inlined functions were used all over the source tree (they were used to move data to/from userspace).
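
The pattern described here looks roughly like the following sketch. It is not the kernel's actual code: `copy_sized` and `copy_word` are hypothetical names, and `always_inline` is a GCC/Clang attribute. When the size argument is a compile-time constant and the helper is inlined, the switch can fold down to the single matching case.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical size-dispatching copy helper, forced inline so that a
   constant size lets the optimizer discard every other case. */
static inline __attribute__((always_inline))
void copy_sized(void *dst, const void *src, size_t size)
{
    switch (size) {
    case 1:  memcpy(dst, src, 1); break;
    case 2:  memcpy(dst, src, 2); break;
    case 4:  memcpy(dst, src, 4); break;
    case 8:  memcpy(dst, src, 8); break;
    default: memcpy(dst, src, size); break;
    }
}

/* Caller with a constant size: sizeof out is known at compile time. */
unsigned int copy_word(const unsigned int *src)
{
    unsigned int out;
    copy_sized(&out, src, sizeof out);
    return out;
}
```

If the compiler declines to inline such a helper, every call pays for the full switch at run time, which is the failure mode described above.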

>>       eax = 0;  /* This tells the compiler to get a zero into eax,
>>                    in whatever way it chooses.  Maybe the caller
>>                    (which is inlining this function) had one lying
>>                    around in a register, and it can now choose to use
>>                    eax for that variable. */
> 
> The Digital Mars C++ compiler can do this, but after having that capability for 15 years it just never proved out to be very useful.

It's a pretty small gain in this case, but what if it were a non-constant, that is almost guaranteed to be in some register before the asm statement?

>>       lock; cmpxchg [m.lock], new;
>>       jz failed;
>>
>>       outofline {
>>          failed: /* I hope this label isn't visible outside of this
>>                     instantiation of this assembly block... */
> 
> Yes, it is visible outside. All labels are in one scope per function, including the inline asm labels.

I was more worried about it being visible throughout the file (or caller of the inline function), like it would have been in GCC, since there's no support for find-the-first-one-in-a-given-direction labels.

>>             push ecx;
>>             push edx;
>>             call handle_failed;
>>             pop edx;
>>             pop ecx;
>>             return; /* This tells the compiler to exit the assembly
>>                        block.  Alternatively, a return label could
>>                        be declared. */
> 
> Exit the assembly block? I don't know what you mean by that.

Just a shortcut for declaring a new label at the end and branching there, which is a rather common construct (especially when using out-of-line sections).  I agree with "C" that break would be a better keyword, though.

>>       /* Tell the compiler that, since this thing acts as a mutex,
>>          no memory accesses can be reordered across it.  It's
>>          probably not necessary in this case, though, as it contains
>>          a function call. */
>>
>>       clobber memory;
> 
> Unnecessary, as the inline assembler assumes memory is clobbered.

It'd be nice if the language didn't force the compiler to do this in all cases, though.  For instance, it's not necessary when just reading timestamps, or making use of some fancy computational instruction for which the compiler doesn't have an intrinsic, or as a touch-up in a critical function that the compiler doesn't optimize well enough.  At the very least, "noclobber memory" should exist, but a compiler should also be allowed to look for itself.  If the compiler doesn't support this, it could always fall back on assuming "clobber memory" for everything.
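
GCC's extended asm makes this distinction explicit through the clobber list. A small sketch, assuming GCC or Clang; the helper names are illustrative, and both asm statements have empty templates, so no instructions are emitted on any target.

```c
static int shared_value;   /* illustrative global */

/* "memory" clobber: the compiler may not move memory accesses across
   this point or keep memory values cached in registers over it. */
static inline void compiler_barrier(void)
{
    __asm__ __volatile__("" : : : "memory");
}

/* No memory clobber: the asm only claims to read and write x in a
   register, so surrounding loads and stores may still be reordered. */
static inline int opaque(int x)
{
    __asm__("" : "+r"(x));
    return x;
}

int store_then_reload(int v)
{
    shared_value = v;
    compiler_barrier();            /* forces the store before the reload */
    return opaque(shared_value);
}
```

Note that `compiler_barrier` constrains only the compiler; it is not a CPU memory fence.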

> When using inline asm, you'll always run the risk of nonportability between compilers - after all, things like register conventions, calling conventions, etc., are not defined by the language. Only the syntax of the inline assembler is.

But would it not be better to reduce the potential sources of nonportability, by letting the programmer tell the compiler to handle certain details?  If the compiler can know the offset from EBP at assembly time, it presumably knows that it's on the stack, and thus that it should index off of EBP.

>> Plus, pc is probably going to be used soon after the asm block; why force it onto the stack and then back?
> 
> Because the inline assembler assembles the code long before any register assignments are done.

That's a compiler implementation detail.  Other compilers might not have that restriction (for example, they may allow the registers to be patched into the assembled code later on, or use an external assembler).

If the compiler has to choose registers for the asm block in advance, it could just add the store instruction itself at the time it handles the inline assembly (in which case you get exactly the same code as you do now), or it could remember which register the asm block used and use that in the subsequent non-asm code.

>> The problem is that the programmer can't know exactly what he wants, without knowing some decisions that the compiler will make.  GCC's syntax allows the programmer to tell the compiler exactly where to substitute those decisions.  Removing the ability of the compiler to make the decisions will lead to slower code.
> 
> You are correct in the abstract. In my experience, I believe the difference to be negligible. I profile code extensively to make it faster. The bottlenecks turn out to be maybe 30 lines of code out of a few thousand. Those I just write completely in hand-tuned inline assembler.

It's a little harder when it's 30,000 lines out of a few million, and most of that needs to stay portable, so any assembler has to be buried in separate inline functions.  In any case, I don't think the language should throw away the opportunity for such optimizations just because they don't help the majority of programs.  The compiler is free to not implement them if it doesn't feel they're important.

-Scott

May 09, 2003

Posted by Walter
in reply to Scott Wood

"Scott Wood" <scott@buserror.net> wrote in message news:slrnbbluah.1cq.scott@ti.buserror.net...
> On Thu, 8 May 2003 10:50:21 -0700, Walter <walter@digitalmars.com> wrote:
> > "Scott Wood" <scott@buserror.net> wrote in message news:slrnbbjfj0.1a2.scott@ti.buserror.net...
> >> Still, I'm a bit uncomfortable with the idea that the compiler's always right and cannot be corrected, even explicitly.  I've seen GCC silently decide not to inline a function (on which inlining was requested) because it was "too big", even though it was just a large switch statement on a constant, which ended up being one or two instructions after optimization.  Given that no compiler is going to make the right choice all the time, it's nice to be able to declare one's intent when there's a clear reason to do so.
> > I think that comes with the territory of using a high level language. If a
> > particular routine is a major bottleneck in your program (and it does usually come down to one!), and you want to make the effort to tune it to
> > the max, write it in inline assembler.
> Except that in this case, using inline assembler would have made it worse.  The code was expecting to have the switch(constant) optimized away to just the relevant case.  Writing the containing functions in assembly would not have been realistic, as the to-be-inlined functions were used all over the source tree (they were used to move data to/from userspace).

I see the inline/not inline as a quality of implementation issue. The language design should specify semantics, and the semantics should not change if something is inlined or not. I want to allow the compiler writer to be as free as possible to innovate how D is implemented. Trying to specify exactly what optimizations are performed in the language spec can forestall that. Note that DMD has a compiler switch to turn inlining on or off.

> >>       eax = 0;  /* This tells the compiler to get a zero into eax,
> >>                    in whatever way it chooses.  Maybe the caller
> >>                    (which is inlining this function) had one lying
> >>                    around in a register, and it can now choose to use
> >>                    eax for that variable. */
> > The Digital Mars C++ compiler can do this, but after having that capability
> > for 15 years it just never proved out to be very useful.
> It's a pretty small gain in this case, but what if it were a non-constant, that is almost guaranteed to be in some register before the asm statement?

It's not worth it. I have a lot of practice writing fast applications (DMC is the fastest compiler, and has been for 15 years).

> >>       /* Tell the compiler that, since this thing acts as a mutex,
> >>          no memory accesses can be reordered across it.  It's
> >>          probably not necessary in this case, though, as it contains
> >>          a function call. */
> >>       clobber memory;
> > Unnecessary, as the inline assembler assumes memory is clobbered.
> It'd be nice if the language didn't force the compiler to do this in all cases, though.  For instance, it's not necessary when just reading timestamps, or making use of some fancy computational instruction for which the compiler doesn't have an intrinsic, or as a touch-up in a critical function that the compiler doesn't optimize well enough.  At the very least, "noclobber memory" should exist, but a compiler should also be allowed to look for itself.  If the compiler doesn't support this, it could always fall back on assuming "clobber memory" for everything.

I misspoke. It doesn't do it in cases where none of the asm instructions could possibly modify memory.
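
For illustration, here is a minimal sketch in C of the distinction being discussed, using GCC-style extended asm. This is an assumption for the sake of the example: it is not D's inline assembler, and the helper names are hypothetical. The empty asm templates emit no instructions, so only the clobber and operand lists matter.

```c
#include <assert.h>
#include <stddef.h>

/* Compiler-only barrier. The empty template emits no instructions; the
   "memory" clobber tells the optimizer that memory may have been read or
   written, so it must not reorder memory accesses across this point or
   keep memory values cached in registers over it. It does not order the
   hardware itself. */
static inline void compiler_barrier(void)
{
    __asm__ __volatile__("" : : : "memory");
}

/* No memory clobber: the statement only declares that it reads and
   writes the register operand x, so unrelated memory accesses may still
   be moved around it. */
static inline unsigned touch_register_only(unsigned x)
{
    __asm__("" : "+r"(x));
    return x;
}

/* Hypothetical helper: store the data, then set the flag, with the
   barrier keeping the compiler from swapping the two stores. */
static inline void publish(int *data, int *ready, int value)
{
    assert(data != NULL && ready != NULL);
    *data = value;
    compiler_barrier();
    *ready = 1;
}
```

A caller would use `publish(&data, &ready, 42)` where the ordering of the two stores matters to the compiler, and something like `touch_register_only` where the asm only works on a register value and a memory clobber would needlessly pessimize the surrounding code.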

> > When using inline asm, you'll always run the risk of nonportability between compilers - after all, things like register conventions, calling conventions, etc., are not defined by the language. Only the syntax of the inline assembler is.
> But would it not be better to reduce the potential sources of nonportability, by letting the programmer tell the compiler to handle certain details?  If the compiler can know the offset from EBP at assembly time, it presumably knows that it's on the stack, and thus that it should index off of EBP.

One thing I do in inline asm sometimes is muck with stack and the frame registers. The variable name gives me an offset as if I hadn't - I then adjust it as necessary.

> >> Plus, pc is probably going to be used soon after the asm block; why force it onto the stack and then back?
> > Because the inline assembler assembles the code long before any register assignments are done.
> That's a compiler implementation detail.  Other compilers might not have that restriction (for example, they may allow the registers to be patched into the assembled code later on, or use an external assembler).

They may not have that restriction, yes, but I don't want to force the compiler to be built that way. I want to keep the bar low for building a basic spec compliant D compiler, while making it possible to build very advanced spec compliant ones.

> >> The problem is that the programmer can't know exactly what he wants, without knowing some decisions that the compiler will make.  GCC's syntax allows the programmer to tell the compiler exactly where to substitute those decisions.  Removing the ability of the compiler to make the decisions will lead to slower code.
> > You are correct in the abstract. In my experience, I believe the difference to be negligible. I profile code extensively to make it faster. The bottlenecks turn out to be maybe 30 lines of code out of a few thousand. Those I just write completely in hand-tuned inline assembler.
> It's a little harder when it's 30,000 lines out of a few million, and most of that needs to stay portable, so any assembler has to be buried in separate inline functions.  In any case, I don't think the language should throw away the opportunity for such optimizations just because they don't help the majority of programs.  The compiler is free to not implement them if it doesn't feel they're important.

If the compiler is free not to implement it, then it can't be part of the language spec. D doesn't preclude any vendors from adding extensions, though. Extensions are important as they're how new innovations get tried out. The good ones will wind up getting folded into D. I'm not sure what you mean by portable, as GCC's way of doing inline assembler is not portable to any other compiler. As far as I've been able to figure out (with google), most of it isn't even documented. I figured out how to use it by reading the kernel listings.

I'm currently in the process of building a linux version of D. It's pretty sweet to be able to take the inline asm code from win32 and recompile it under linux and it works just the same with no modification. That's a hopeless task if you're using separate asm files, or if you're using the inline assembler from a C compiler. I've even got obj2asm to work on elf files, so now you can disassemble .o files and see it in intel syntax!

P.S. How I write a whole function in hand-tuned asm is write it in C, compile it, disassemble it with obj2asm, cut & paste the code back into the C source in an asm block, and then tune.

May 10, 2003

Posted by Scott Wood
in reply to Walter

On Fri, 9 May 2003 01:23:17 -0700, Walter <walter@digitalmars.com> wrote:
> I see the inline/not inline as a quality of implementation issue.

For the default case, sure.  I'll wait until compilers have a full, working AI built in before I trust even the best compiler to *always* get it right, though.

> The language design should specify semantics, and the semantics should not change if something is inlined or not. I want to allow the compiler writer to be as free as possible to innovate how D is implemented. Trying to specify exactly what optimizations are performed in the language spec can forestall that.

I'm not suggesting that the language mandate certain optimizations; just that there be a standard way of communicating one's intentions to the compiler.  If the compiler doesn't support inlining at all, then fine, don't inline; however, if it does support it, it should pay attention to the programmer's request.
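
As a concrete illustration of what such a request can look like, here is a minimal sketch in C using GCC's function attributes. The macro and function names are hypothetical, and the attribute spellings are GCC/Clang-specific; nothing here is D syntax.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* GCC/Clang-specific spellings, assumed here for illustration. */
#define MUST_INLINE  static inline __attribute__((always_inline))
#define NEVER_INLINE __attribute__((noinline))

/* Hypothetical copy helper. The switch looks large, but when `size` is a
   compile-time constant at the call site, inlining lets the optimizer
   keep only the matching case. */
MUST_INLINE void copy_small(void *dst, const void *src, size_t size)
{
    switch (size) {
    case 1:  memcpy(dst, src, 1);    break;
    case 2:  memcpy(dst, src, 2);    break;
    case 4:  memcpy(dst, src, 4);    break;
    case 8:  memcpy(dst, src, 8);    break;
    default: memcpy(dst, src, size); break;
    }
}

/* The opposite request: this function must remain a real call, however
   profitable the compiler thinks inlining would be. */
NEVER_INLINE int checked_sum(const int *values, size_t count)
{
    int total = 0;
    size_t i;

    assert(values != NULL || count == 0);
    for (i = 0; i < count; i++)
        total += values[i];
    return total;
}
```

A call such as `copy_small(&dst, &src, sizeof src)` passes a constant size, which is the situation where a size-based inlining heuristic is most likely to misjudge the function.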

>> It's a pretty small gain in this case, but what if it were a non-constant, that is almost guaranteed to be in some register before the asm statement?
> 
> It's not worth it.

If there's no cost to it (as is the case with compilers which already implement such things, including GCC), then any optimization is worth it.  It doesn't make the language any harder to write a compiler for, as a compiler can choose to always interpret an assignment as a mov statement.

> I have a lot of practice writing fast applications (DMC
> is the fastest compiler, and has been for 15 years).

But how much do you need to use assembly in a compiler?  Take something like a kernel instead, which often needs to use assembly for various things, including the aforementioned copying of data between user and kernel.  This is done a lot, and saving a few cycles on every such occurrence *does* show up in the benchmarks, especially since so many of them are just copying one or two words (making the overhead very visible).  Loading the value from userspace, then storing it on the stack, then loading it again immediately after the asm block is over will be noticeable.  If you're on anything but a non-regparm x86, add the cost of storing the user address to the stack (since it was passed in a register) and then loading it again.

The compiler will generally do these sorts of things for its own generated code; it doesn't strike me as a freak occurrence for a compiler to allow the user access to the same thing when using inline assembly.
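
A minimal sketch of that kind of access, in C with GCC-style operand constraints (assumed for illustration; `load_word` is a hypothetical stand-in for a one-word user copy, and the asm path is limited to x86 with a plain C fallback elsewhere):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical one-word load. The asm names no register and no stack
   slot: the compiler chooses where the result lives and how the source
   is addressed, so the value need not be bounced through the stack. */
static inline unsigned long load_word(const unsigned long *src)
{
    unsigned long value;

    assert(src != NULL);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* "=r"(value): output in any general register the compiler picks.
       "m"(*src):   input as a memory operand, addressed as it sees fit.
       AT&T operand order: source first, destination second. */
    __asm__("mov %1, %0" : "=r"(value) : "m"(*src));
#else
    /* Fallback for other targets or compilers: ordinary load. */
    value = *src;
#endif
    return value;
}
```

Because the result is already in a register of the compiler's choosing, code that uses the value right after the asm block can continue from that register directly.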

>> But would it not be better to reduce the potential sources of nonportability, by letting the programmer tell the compiler to handle certain details?  If the compiler can know the offset from EBP at assembly time, it presumably knows that it's on the stack, and thus that it should index off of EBP.
> 
> One thing I do in inline asm sometimes is muck with stack and the frame registers. The variable name gives me an offset as if I hadn't - I then adjust it as necessary.

If you can specify that the value must be in a register in the beginning and/or end of the block, you don't need to worry about the validity of the address in the middle of the block.

>> >> Plus, pc is probably going to be used soon after the asm block; why force it onto the stack and then back?
>> > Because the inline assembler assembles the code long before any register assignments are done.
>> That's a compiler implementation detail.  Other compilers might not have that restriction (for example, they may allow the registers to be patched into the assembled code later on, or use an external assembler).
> 
> They may not have that restriction, yes, but I don't want to force the compiler to be built that way.

If the compiler isn't built that way, just act as if the user put a mov instruction there.  If the syntax allows the user to ask the compiler to choose the register, it can pick one arbitrarily if it's not capable of picking a good one.

>> It's a little harder when it's 30,000 lines out of a few million, and most of that needs to stay portable, so any assembler has to be buried in separate inline functions.  In any case, I don't think the language should throw away the opportunity for such optimizations just because they don't help the majority of programs.  The compiler is free to not implement them if it doesn't feel they're important.
> 
> If the compiler is free not to implement it, then it can't be part of the language spec.

The semantics behind what the programmer requests must be implemented; it's the optimization that the semantics allow that does not need to be there in simpler compilers.

> D doesn't preclude any vendors from adding extensions, though. Extensions are important as they're how new innovations get tried out. The good ones will wind up getting folded into D.

Sure.  However, this often leads to different compilers implementing the same feature in incompatible ways, requiring programs that want to use the feature to use lots of conditional compilation to remain semi-portable.
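
The forced-inlining request is a familiar instance of this in C, where each compiler has its own spelling. A minimal sketch of the usual workaround follows; the macro name `FORCE_INLINE` and the function are hypothetical, while the two compiler-specific keywords are the ones MSVC and GCC/Clang actually provide.

```c
#include <assert.h>

/* The same request, spelled differently per compiler, hidden behind one
   macro chosen by conditional compilation. */
#if defined(_MSC_VER)
#  define FORCE_INLINE static __forceinline
#elif defined(__GNUC__)
#  define FORCE_INLINE static inline __attribute__((always_inline))
#else
   /* Unknown compiler: fall back to a plain hint. */
#  define FORCE_INLINE static inline
#endif

/* Hypothetical example function using the portable macro. */
FORCE_INLINE int scale_percent(int value, int percent)
{
    assert(percent >= 0 && percent <= 100);
    return value * percent / 100;
}
```

Every additional vendor extension a program relies on tends to need another block like this, which is the maintenance cost being described.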

If the new feature would require significant effort to implement correctly (not necessarily efficiently), then I agree that it should stay out of the language unless it is demonstrated to be sufficiently useful (though it might sometimes be beneficial to formalize it into an optional yet standardized extension, so that if it is implemented, it's implemented in the same way).  However, some of these things could be implemented (poorly, but correctly and no worse than if the feature weren't used) with a sed script if one were so inclined.

> I'm not sure what you mean by portable, as GCC's way of doing inline assembler is not portable to any other compiler.

Intel's compiler claims to support GCC inline assembly on x86 (their IA64 compiler apparently doesn't support inline assembly at all). However, in general, the lack of portability of inline assembly between compilers for the same architecture is a bit annoying.

I was hoping that, with D's placing it into the language itself, it would cease to be an issue.  However, once extensions to the basic syntax are relied on, you're right back to the current state of incompatibility.

> As far as I've been able to figure out (with google), most of it isn't even documented. I figured out how to use it by reading the kernel listings.

It's documented in the GCC info pages.  Look for the "Extended Asm" node, as well as the section on constraints.

> I'm currently in the process of building a linux version of D. It's pretty sweet to be able to take the inline asm code from win32 and recompile it under linux and it works just the same with no modification. That's a hopeless task if you're using separate asm files,

Not really.  There are Intel-syntax assemblers for Linux (even gas can be told to use it now), and gas is available for Windows should one want to go the other way.

> or if you're using the inline assembler from a C compiler.

Unless you're using the same C compiler on both platforms.

> I've even got obj2asm to work on elf files, so now you can disassemble .o files and see it in intel syntax!

GNU objdump can do that as well, by passing "-m i386:intel".

> P.S. How I write a whole function in hand-tuned asm is write it in C, compile it, disassemble it with obj2asm, cut & paste the code back into the C source in an asm block, and then tune.

And do it over again every time the C code changes, or when a header it depends on changes (if you notice!).  Each time, doing it for every supported architecture.  It's still a useful technique for certain situations, but it's not a replacement for flexible inline assembly.

-Scott

May 10, 2003

Posted by Ilya Minkov
in reply to Walter

Walter wrote:

> I'm currently porting D to linux. Believe me, the inline assembler is a
> great boon to that. Just try converting MASM files to gas files! To me,
> using gas is like trying to write code looking in a mirror.

Why are you using GAS? You can use NASM (or maybe FASM) instead! Both use a (cleaned-up?) Intel syntax.

There have also been a number of converters NASM <-> GAS <-> MASM. And besides, newer versions of GAS are said to be able to use Intel syntax.

BTW, I didn't find a reliable way to use NASM with the Digital Mars compilers for Windows. It has a Borland format, but it somehow didn't work. I'll try to reproduce this problem sometime later.

-i.

May 10, 2003

Posted by Nic Tiger
in reply to Ilya Minkov

I did find a reliable way to use NASM with Digital Mars for Win32 and DOSX targets.

The problem is that the common statement
    section .data
or
    section .code
in COFF and other formats is expanded to something like 'dword aligned 32-bit segment of code (or text)'.

When the same statement is used for the OBJ format, it is not treated the same way.
To make them identical, you should write
    section .code align=4 use32

As for the DOSX target, the above is not sufficient. You should write
    section _DATA class=DATA align=4 use32
or
    section _CODE class=CODE align=4 use32
Moreover, you should place somewhere the directive
    group DGROUP _DATA
to tell the linker to group the data segment in this module with the others.

The last technique described (I mean the one for the DOSX target) is fully compatible with Win32 target code.
I used this to compile the XVID codec sources for both Win32 and DOSX with DMC, and it works.

BTW, with optimizations turned on, the C version of the codec (when asm is not used) runs almost twice as fast as the unoptimized one. I think the DMC optimizer is cool!

Nic Tiger.

"Ilya Minkov" <midiclub@8ung.at> wrote in message news:b9j6pl$32m$1@digitaldaemon.com...
> Walter wrote:
>
> > I'm currently porting D to linux. Believe me, the inline assembler is a great boon to that. Just try converting MASM files to gas files! To me, using gas is like trying to write code looking in a mirror.
>
> Why are you using GAS? You can use NASM (or maybe FASM) instead! Both use a (cleaned-up?) Intel syntax.
>
> There have also been a number of converters NASM <-> GAS <-> MASM. And besides, newer versions of GAS are said to be able to use Intel syntax.
>
> BTW, I didn't find a reliable way to use NASM with the Digital Mars compilers for Windows. It has a Borland format, but it somehow didn't work. I'll try to reproduce this problem sometime later.
>
> -i.
>

Copyright © 1999-2021 by the D Language Foundation