-O2/-O3 Optimization bug?
January 22, 2014
Hello again,

I'm continuing my work on an ARM Cortex-M port of the D Runtime.  I now have a repository (https://github.com/JinShil/D_Runtime_ARM_Cortex-M_study) and a wiki (https://github.com/JinShil/D_Runtime_ARM_Cortex-M_study/wiki/1.0-Introduction) for anyone interested.  I'm doing my best to document the entire process.

I tried playing with GDC/GCC optimizations recently and noticed that they break the following simple code from my "Hello World" experiment (http://wiki.dlang.org/Extremely_minimal_semihosted_%22Hello_World%22):

void OnReset()
{
  while(true)
  {
    // Create semihosting message
    uint[3] message =
      [
        2,                            //stderr
        cast(uint)"hello\r\n".ptr,    //ptr to string
	7                             //size of string
      ];

    //Send semihosting command
    SendCommand(0x05, &message);
  }
}

Compiling with...
  arm-none-eabi-gdc -O1 start.d -o start.o
... works fine, but compiling with...
  arm-none-eabi-gdc -O2 start.d -o start.o
... or ...
  arm-none-eabi-gdc -O3 start.d -o start.o
... does not.

I traced this down to the -finline-small-functions and -fipa-cp-clone options, so if I compile with...
  arm-none-eabi-gdc -O2 -fno-inline-small-functions start.d -o start.o
... or ...
  arm-none-eabi-gdc -O3 -fno-inline-small-functions -fno-ipa-cp-clone start.d -o start.o
... it works fine.

Comparing the assembly generated with...
  arm-none-eabi-gdc -O1 start.d -o start.o
... and ...
  arm-none-eabi-gdc -O2 start.d -o start.o
... I can see that the "hello\r\n" string constant vanishes from the assembly file with the -O2 option.

"So what's the question, Mike?" I hear you say:
1.  Is this just one of the consequences of using -O2/-O3, and I should just suck it up and deal with it?
2.  Is this potentially a bug in the GCC backend?
3.  Is this potentially a bug in GDC or the DMD frontend?

Thanks for the help,
Mike
January 22, 2014
On Wednesday, 22 January 2014 at 00:28:34 UTC, Mike wrote:
>
> "So what's the question, Mike?" I hear you say:
> 1.  Is this just one of the consequences of using -O2/-O3, and I should just suck it up and deal with it?
> 2.  Is this potentially a bug in the GCC backend?
> 3.  Is this potentially a bug in GDC or the DMD frontend?
>

I always forget to add the most important piece of information:
I'm using the GDC 4.8 branch, back-ported at the beginning of the year, compiled for arm-none-eabi.

January 22, 2014
On 22 January 2014 00:28, Mike <none@none.com> wrote:
> Hello again,
>
> I'm continuing my work on an ARM Cortex-M port of the D Runtime.  I now have
> a repository (https://github.com/JinShil/D_Runtime_ARM_Cortex-M_study) and a
> wiki
> (https://github.com/JinShil/D_Runtime_ARM_Cortex-M_study/wiki/1.0-Introduction)
> for anyone interested.  I'm doing my best to document the entire process.
>
> I tried playing with GDC/GCC optimizations recently, and noticed that it breaks the following simple code from my "Hello World" experiment (http://wiki.dlang.org/Extremely_minimal_semihosted_%22Hello_World%22)
>
> void OnReset()
> {
>   while(true)
>   {
>     // Create semihosting message
>     uint[3] message =
>       [
>         2,                            //stderr
>         cast(uint)"hello\r\n".ptr,    //ptr to string
>         7                             //size of string
>       ];
>
>     //Send semihosting command
>     SendCommand(0x05, &message);
>   }
> }
>
> Compiling with...
>   arm-none-eabi-gdc -O1 start.d -o start.o
> ... works fine, but compiling with...
>   arm-none-eabi-gdc -O2 start.d -o start.o
> ... or ...
>   arm-none-eabi-gdc -O3 start.d -o start.o
> ... does not.
>
> I traced this down to the -finline-small-functions and -fipa-cp-clone
> options, so if I compile with...
>   arm-none-eabi-gdc -O2 -fno-inline-small-functions start.d -o start.o
> ... or ...
>   arm-none-eabi-gdc -O3 -fno-inline-small-functions -fno-ipa-cp-clone
> start.d -o start.o
> ... it works fine.
>
> Comparing the assembly generated with...
>   arm-none-eabi-gdc -O1 start.d -o start.o
> ... and ...
>   arm-none-eabi-gdc -O2 start.d -o start.o
> ... I can see that the "hello\r\n" string constant vanishes from the
> assembly file with the -O2 option.
>
> "So what's the question, Mike?" I hear you say:
> 1.  Is this just one of the consequences of using -O2/-O3, and I should just
> suck it up and deal with it?
> 2.  Is this potentially a bug in the GCC backend?
> 3.  Is this potentially a bug in GDC or the DMD frontend?
>

Personally, I would never use -O3 for low-level start.o kernel stuff. As you are coding on a small board, wouldn't you use -Os instead?
January 22, 2014
On Wed, 22 Jan 2014 00:28:32 +0000,
"Mike" <none@none.com> wrote:

> 
> "So what's the question, Mike?" I hear you say:
> 1.  Is this just one of the consequences of using -O2/-O3, and I
> should just suck it up and deal with it?
> 2.  Is this potentially a bug in the GCC backend?
> 3.  Is this potentially a bug in GDC or the DMD frontend?
> 
> Thanks for the help,
> Mike

I can only guess, but this looks like another 'volatile' problem. You'd have to post the ASM of the optimized version somewhere, and probably the output of -fdump-tree-optimized for the optimized version.

But anyway, I guess it inlines 'SendCommand' and then thinks you're not using the message and probably completely optimizes the call away. Then it sees you're never using message and removes the rest of your code. If SendCommand was written in D you'd have to mark the target of the copy volatile (or shared).

I'm not sure how this applies to inline asm, though. In C you have asm volatile, but I've never used it. This answer seems to say that you have to use asm volatile: http://stackoverflow.com/a/5057270/471401



So the questions for Iain:
 * should we mark all inline ASM blocks as volatile?
 * shared can't replace volatile in this case as `shared asm{...}`
   isn't valid
 * Should we add some GDC specific way to mark extended ASM blocks as
   volatile? As DMD doesn't optimize ASM blocks at all there's probably
   no need for a standard solution?
January 22, 2014
On 22 January 2014 15:03, Johannes Pfau <nospam@example.com> wrote:
> On Wed, 22 Jan 2014 00:28:32 +0000,
> "Mike" <none@none.com> wrote:
>
>>
>> "So what's the question, Mike?" I hear you say:
>> 1.  Is this just one of the consequences of using -O2/-O3, and I
>> should just suck it up and deal with it?
>> 2.  Is this potentially a bug in the GCC backend?
>> 3.  Is this potentially a bug in GDC or the DMD frontend?
>>
>> Thanks for the help,
>> Mike
>
> I can only guess, but this looks like another 'volatile' problem. You'd have to post the ASM of the optimized version somewhere, and probably the output of -fdump-tree-optimized for the optimized version.
>
> But anyway, I guess it inlines 'SendCommand' and then thinks you're not using the message and probably completely optimizes the call away. Then it sees you're never using message and removes the rest of your code. If SendCommand was written in D you'd have to mark the target of the copy volatile (or shared).
>
> But I'm not sure how this applies to the inline asm though. In C you have asm volatile, but I never used that. This answer seems to state that you have to use asm volatile: http://stackoverflow.com/a/5057270/471401
>
>
>
> So the questions for Iain:
>  * should we mark all inline ASM blocks as volatile?
>  * shared can't replace volatile in this case as `shared asm{...}`
>    isn't valid
>  * Should we add some GDC specific way to mark extended ASM blocks as
>    volatile? As DMD doesn't optimize ASM blocks at all there's probably
>    no need for a standard solution?

We already do (ExtAsmStatement::toIR -> ASM_VOLATILE_P (exp) = 1;)

Regards
Iain
January 22, 2014
On Wednesday, 22 January 2014 at 15:03:49 UTC, Johannes Pfau wrote:
> On Wed, 22 Jan 2014 00:28:32 +0000,
> "Mike" <none@none.com> wrote:
>
>> 
>> "So what's the question, Mike?" I hear you say:
>> 1.  Is this just one of the consequences of using -O2/-O3, and I should just suck it up and deal with it?
>> 2.  Is this potentially a bug in the GCC backend?
>> 3.  Is this potentially a bug in GDC or the DMD frontend?
>> 
>> Thanks for the help,
>> Mike
>
> I can only guess, but this looks like another 'volatile' problem. You'd
> have to post the ASM of the optimized version somewhere, and probably
> the output of -fdump-tree-optimized for the optimized version.
>
> But anyway, I guess it inlines 'SendCommand' and then thinks you're not
> using the message and probably completely optimizes the call away. Then
> it sees you're never using message and removes the rest of your code.
> If SendCommand was written in D you'd have to mark the target of the
> copy volatile (or shared).
>
> But I'm not sure how this applies to the inline asm though. In C you
> have asm volatile, but I never used that. This answer seems to state
> that you have to use asm volatile:
> http://stackoverflow.com/a/5057270/471401
>
>
>
> So the questions for Iain:
>  * should we mark all inline ASM blocks as volatile?
>  * shared can't replace volatile in this case as `shared asm{...}`
>    isn't valid
>  * Should we add some GDC specific way to mark extended ASM blocks as
>    volatile? As DMD doesn't optimize ASM blocks at all there's probably
>    no need for a standard solution?

Thanks for the response, Johannes.  Defining message as "shared uint[3] message" and defining SendCommand as "void SendCommand(int command, shared void* message)" did the trick.
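
For anyone reading later, here is a rough sketch of that change. The body of SendCommand is an assumption, reconstructed from the -fdump-tree-optimized output posted later in this thread rather than copied from start.d; only the two shared qualifiers are new. The effect is what was observed with this particular GDC 4.8 build and may differ with other compiler versions.

```d
// Sketch only: GDC extended asm, ARM Cortex-M target with semihosting.
// The asm body is reconstructed from the compiler dump in this thread
// (an assumption), not taken from the original start.d.
void SendCommand(int command, shared void* message)
{
  // No outputs; command and message are register inputs; r0/r1 are clobbered.
  asm
  {
    "mov r0, %[cmd];
     mov r1, %[msg];
     bkpt #0xAB"
      :
      : [cmd] "r" command, [msg] "r" message
      : "r0", "r1";
  }
}

void OnReset()
{
  while(true)
  {
    // With this GDC build, marking the array shared kept the optimizer
    // from dropping its initialization.
    shared uint[3] message =
      [
        2,                            //stderr
        cast(uint)"hello\r\n".ptr,    //ptr to string
        7                             //size of string
      ];

    //Send semihosting command
    SendCommand(0x05, &message);
  }
}
```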
January 23, 2014
On Wednesday, 22 January 2014 at 13:08:53 UTC, Iain Buclaw wrote:
> Personally, I would never use -O3 for low level start.o kernel stuff.

In my simple D program, however, -O2 also doesn't work.

> As you are coding on a small board, wouldn't you instead use -Os ?

I sometimes use -Os and sometimes use -O2/-O3.

If I'm controlling something low-speed like a refrigerator or other kitchen appliance, I use -Os so I can use the cheapest chip available.

However, for my current project, I'm making an HMI/industrial controller.  The HMI will have a software-rendered graphics engine with vector graphics, alpha blending, TrueType fonts, etc.  I've already built this in C++, and -O2/-O3 made a very significant difference in my performance benchmarks.  I didn't notice any difference between -O2 and -O3, though.  It uses about 700KB of Flash memory, and most of that is the TrueType font data, so I'm quite satisfied with my C++ results.

Interestingly, since I started using GCC 4.8 in my C++ project, -O3 breaks my memset function, but -O2 does not, so I'm sticking with -O2 at the moment.  With GCC 4.7, -O3 worked fine.

If you can't see any error in my D code and the compiler and optimizer are working properly, shouldn't my program work at these optimization levels without resorting to special qualifiers like shared/volatile?

NOTE: I'll post assembly and the optimization tree when I get home from work today.
January 23, 2014
On Wednesday, 22 January 2014 at 15:03:49 UTC, Johannes Pfau wrote:
> I can only guess, but this looks like another 'volatile' problem. You'd
> have to post the ASM of the optimized version somewhere, and probably
> the output of -fdump-tree-optimized for the optimized version.

Here's the output with -fdump-tree-optimized
*******
;; Function start.OnReset (OnReset, funcdef_no=1, decl_uid=3544, cgraph_uid=1) (executed once)

start.OnReset ()
{
  uint message[3];

  <bb 2>:

  <bb 3>:
  __asm__ __volatile__("mov r0, %[cmd];
       mov r1, %[msg];
       bkpt #0xAB" :  : "cmd" "r" 5, "msg" "r" &message : "r0", "r1", "r1");

  <bb 4>:
  goto <bb 3>;

}

;; Function start.SendCommand (_D5start11SendCommandFiPvZv, funcdef_no=0, decl_uid=3545, cgraph_uid=0)

start.SendCommand (int command, void * message)
{
  <bb 2>:
  __asm__ __volatile__("mov r0, %[cmd];
       mov r1, %[msg];
       bkpt #0xAB" :  : "cmd" "r" command_1(D), "msg" "r" message_2(D) : "r0", "r1", "r1");
  return;

}
*******

Here's the output of the unoptimized version
*******
;; Function start.SendCommand (_D5start11SendCommandFiPvZv, funcdef_no=0, decl_uid=3545, cgraph_uid=0)

start.SendCommand (int command, void * message)
{
  <bb 2>:
  __asm__ __volatile__("mov r0, %[cmd];
       mov r1, %[msg];
       bkpt #0xAB" :  : "cmd" "r" command_1(D), "msg" "r" message_2(D) : "r0", "r1", "r1");
  return;

}

;; Function start.OnReset (OnReset, funcdef_no=1, decl_uid=3544, cgraph_uid=1)

start.OnReset ()
{
  uint message[3];
  <unnamed type> D.3562;
  <unnamed type> _1;

  <bb 2>:
  message = *.LC1;

  <bb 3>:
  _1 = 0;
  if (_1 != 0)
    goto <bb 5>;
  else
    goto <bb 4>;

  <bb 4>:
  start.SendCommand (5, &message);
  goto <bb 3>;

  <bb 5>:
  message ={v} {CLOBBER};
  return;

}
********


The __asm__ __volatile__ seems to indicate Iain is right.  Notice the message = *.LC1 in the unoptimized version, but not the optimized version.  This is the first time I've seen this kind of output, so can you decipher what's going on?


And here's the optimized assembly.  I wasn't sure of the best way to generate it, so I used -fverbose-asm -Wa,-adhln
**********
http://pastebin.com/NY2PNWzS
**********


And here's the unoptimized assembly for comparison
**********
http://pastebin.com/hbThtCsP
**********


Thanks for taking the time.
Mike


January 23, 2014
On Thu, 23 Jan 2014 11:30:41 +0000,
"Mike" <none@none.com> wrote:

> 
> The __asm__ __volatile__ seems to indicate Iain is right.  Notice the message = *.LC1 in the unoptimized version, but not the optimized version.  This is the first time I've seen this kind of output, so can you decipher what's going on?

We can see a few things in that output:
 * The function really got inlined
 * The ASM is still there and marked as volatile
 * In the optimized version, the uint[3] message
   variable is still there, but it's not initialized
   (message = *.LC1 is pseudo code for 'initialize message on the
   stack with the data stored at .LC1')

> And here's the optimized assembly.  I'm not sure how to do this,
> so I used -fverbose-asm -Wa,-adhln
> **********
> http://pastebin.com/NY2PNWzS
> **********

Nice, I always used -S but this output is better of course ;-)


I think what could be happening here is that GCC doesn't know what memory you're accessing via the message pointer in SendCommand.

See http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
and search for "If your assembler instructions access memory in an
unpredictable fashion"

Maybe typing "message" as uint* or uint[3]* instead of void* is already good enough. Otherwise try using a memory input as described on that page.
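
An untested sketch of the "memory" clobber route from the GCC page linked above. It assumes SendCommand's body matches the -fdump-tree-optimized output posted earlier (reconstructed from that dump, not copied from start.d). The only change is "memory" in the clobber list. According to the GCC documentation, that tells the compiler the asm may read or write memory it cannot see, so it should not cache memory values in registers across the asm or optimize away stores to that memory, such as the initialization of message.

```d
// Sketch only: GDC extended asm, ARM Cortex-M target with semihosting.
// The asm body is reconstructed from the compiler dump in this thread
// (an assumption); "memory" is the single addition.
void SendCommand(int command, void* message)
{
  // No outputs; command and message are register inputs.
  // Clobbers: r0, r1, and "memory" because the semihosting call reads
  // the buffer that message points to.
  asm
  {
    "mov r0, %[cmd];
     mov r1, %[msg];
     bkpt #0xAB"
      :
      : [cmd] "r" command, [msg] "r" message
      : "r0", "r1", "memory";
  }
}
```

OnReset would stay exactly as in the first post, with a plain uint[3] message and no shared qualifier.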
January 23, 2014
On Wed, 22 Jan 2014 16:49:54 +0000,
Iain Buclaw <ibuclaw@gdcproject.org> wrote:

> 
> We already do (ExtAsmStatement::toIR -> ASM_VOLATILE_P (exp) = 1;)
> 
> Regards
> Iain

I guess I should have looked that up before posting wild speculations ;-)