Thread overview
ARM Cortex-M Broken Binaries with -O2 and -O3 (-fschedule-insns)
Jul 21, 2017
Mike
Jul 22, 2017
Mike
Jul 22, 2017
Timo Sintonen
Jul 22, 2017
Johannes Pfau
Jul 22, 2017
Iain Buclaw
Jul 22, 2017
Mike
Jul 22, 2017
Johannes Pfau
Jul 22, 2017
Johannes Pfau
Jul 22, 2017
Iain Buclaw
Jul 22, 2017
Mike
Jul 22, 2017
Iain Buclaw
July 21, 2017
My stm32 demo has now been updated and is working with GDC/GCC 7.1.0.  Thanks for all your improvements.

However, I'm getting broken binaries with -O2 and -O3.  I've nailed the culprit down to -fschedule-insns (i.e. if I add -fno-schedule-insns to -O2 or -O3, the binary works fine).

I disassembled '-O2' and '-O2 -fno-schedule-insns' and compared them, but they were quite different all the way through.  Not only because of address locations, but also different registers and even different opcodes.  (e.g. 'str r2, [sp, #12]' vs 'strd r1, r2, [sp, #8]')

Is there anything I can do to provide more actionable information to help identify the underlying cause?

Mike
July 22, 2017
On Friday, 21 July 2017 at 23:44:53 UTC, Mike wrote:

> I'm getting broken binaries with -O2 and -O3.  I've nailed the culprit down to -fschedule-insns (i.e. if I add -fno-schedule-insns to -O2 or -O3, the binary works fine).
>
> I disassembled '-O2' and '-O2 -fno-schedule-insns' and compared them, but they were quite different all the way through.  Not only because of address locations, but also different registers and even different opcodes.  (e.g. 'str r2, [sp, #12]' vs 'strd r1, r2, [sp, #8]')

Interestingly, I added a strategically placed `asm { "nop"; }` and my binary was able to execute further.  Comparing the disassembly of the function I modified still showed quite a significant difference.

Working Binary
-------------
ldr	r2, [pc, #188]	; (8000c50 <hardwareInit+0x104>)
ldr	r1, [pc, #188]	; (8000c54 <hardwareInit+0x108>)
ldr	r3, [r2, #0]
and.w	r3, r3, #780	; 0x30c
orr.w	r3, r3, #37888	; 0x9400
movs	r0, #0
str	r3, [r2, #0]
strb	r0, [r1, #0]
;-------------------------------------------------------
nop                     ; My strategically placed nop
;-------------------------------------------------------
ldr	r3, [pc, #172]	; (8000c58 <hardwareInit+0x10c>)
ldr	r0, [pc, #176]	; (8000c5c <hardwareInit+0x110>)
ldr	r4, [pc, #176]	; (8000c60 <hardwareInit+0x114>)
ldr	r2, [pc, #180]	; (8000c64 <hardwareInit+0x118>)
movs	r1, #1
strb	r1, [r3, #0]
ldr	r3, [r0, #0]
orr.w	r3, r3, #49152	; 0xc000
str	r3, [r0, #0]
strb	r1, [r4, #0]

Not Working Binary
------------------
ldr	r0, [pc, #184]	; (8000c4c <hardwareInit+0x100>)
ldr	r1, [pc, #184]	; (8000c50 <hardwareInit+0x104>)
ldr	r2, [pc, #188]	; (8000c54 <hardwareInit+0x108>)
ldr	r3, [r1, #0]
ldr	r4, [pc, #188]	; (8000c58 <hardwareInit+0x10c>)
movs	r5, #0
strb	r5, [r0, #0]
movs	r0, #1
strb	r0, [r2, #0]
ldr	r2, [r4, #0]
ldr	r5, [pc, #180]	; (8000c5c <hardwareInit+0x110>)
orr.w	r2, r2, #49152	; 0xc000
and.w	r3, r3, #780	; 0x30c
str	r2, [r4, #0]
orr.w	r3, r3, #37888	; 0x9400
ldr	r2, [pc, #168]	; (8000c60 <hardwareInit+0x114>)
strb	r0, [r5, #0]
str	r3, [r1, #0]

By "Not Working" I mean this code gets stuck in the while loop:

PWR.CR.ODEN.value = true;
while(!PWR.CSR.ODRDY.value) { }

This is simply setting the "Overdrive Enable" register on the power control peripheral of my hardware.  The documentation states:

To set or reset the ODEN bit, the HSI or HSE must be selected as system clock.

I'm selecting HSI as the system clock prior to setting ODEN, but it appears that the compiler may be reordering the instructions.  I still need to investigate that further, but hopefully that provides a little more insight.

Mike

July 22, 2017
On Saturday, 22 July 2017 at 01:11:02 UTC, Mike wrote:
> [...]
> By "Not Working" I mean this code gets stuck in the while loop
>
> PWR.CR.ODEN.value = true;
> while(!PWR.CSR.ODRDY.value) { }
> [...]

A quick answer without looking at your code: this never worked properly because the compiler assumes the value does not change, so the load may be hoisted out of the loop. After that, the test in the while condition may be moved out of the loop as well. The whole while statement, which then has nothing to test and an empty body, can be optimized out.

I had this working because 'shared' used to mean 'volatile' in gdc, but that is no longer true.
I have not yet looked at how you define your data type and how you access the data, but it seems the compiler treats it as an ordinary variable.

I made a custom Volatile data type that uses the new compiler intrinsics, and my sample program seems to work with gdc 7.
The only thing that does not work is exceptions. The exception code in the runtime has changed a lot, so I do not know whether it should work or not.
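
To illustrate the hoisting concern above, here is a minimal sketch of a polling loop that reads a status flag through `volatileLoad` on every iteration. It is not taken from anyone's code in this thread: `fakeStatus`, `READY_BIT`, and `waitReady` are hypothetical names, and an ordinary variable stands in for the peripheral register so the example runs on a host. The import assumes a recent druntime where the intrinsics live in `core.volatile` (older releases, as far as I know, exposed them from `core.bitop`).

```d
import core.volatile : volatileLoad, volatileStore;

// Hypothetical stand-in for a status register. On real hardware this would
// be a fixed peripheral address rather than a module-level variable.
__gshared uint fakeStatus;

// Hypothetical position of the "ready" flag inside the status register.
enum uint READY_BIT = 1u << 16;

/// Polls `reg` until READY_BIT is set or `maxPolls` reads have been made.
/// Returns true if the flag was observed.
bool waitReady(uint* reg, uint maxPolls)
{
    foreach (_; 0 .. maxPolls)
    {
        // volatileLoad must perform a real read on each iteration, so the
        // compiler may not hoist it out of the loop.
        if (volatileLoad(reg) & READY_BIT)
            return true;
    }
    return false;
}

void main()
{
    // Flag is clear: polling gives up after the retry budget.
    assert(!waitReady(&fakeStatus, 10));

    // Simulate the hardware setting the flag.
    volatileStore(&fakeStatus, READY_BIT);
    assert(waitReady(&fakeStatus, 10));
}
```

The bounded retry count is a design choice for the sketch only; it keeps the example from hanging on a host where nothing ever sets the flag.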

July 22, 2017
Am Sat, 22 Jul 2017 07:07:33 +0000
schrieb Timo Sintonen <t.sintonen@luukku.com>:

> On Saturday, 22 July 2017 at 01:11:02 UTC, Mike wrote:
> > [...]
> 
> A quick answer without looking your code: this never worked properly because the compiler thinks the value is not changing and may be optimized out of the loop. [...]

There's a small thinko here ;-) In Mike's code, value is a property using volatileLoad/volatileStore internally. So the real problem is likely more complicated.

-- Johannes

July 22, 2017
On Friday, 21 July 2017 at 23:44:53 UTC, Mike wrote:

> However, I'm getting broken binaries with -O2 and -O3.  I've nailed the culprit down to -fschedule-insns (i.e. if I add -fno-schedule-insns to -O2 or -O3, the binary works fine).

I've confirmed that -fschedule-insns is reordering register accesses even though they are being made with volatileLoad/volatileStore.  Read the comments in the following code for details.  FYI, a bit-banded address is a 32-bit address that maps to a single bit.

Here's the D code
-----------------
// This is a single atomic store to bit-banded address 0x42470048
RCC.CR.HSEBYP.value = false;

// This is a single read-modify-write to non-bit-banded 32-bit address 0x40023808
with(RCC.CFGR)
{
    setValue
    !(
          MCO2,    0
        , MCO2PRE, 0
        , MCO1PRE, 0
        , I2SSRC,  0
        , MCO1,    0
        , RTCPRE,  0
        , HPRE,    0b000
        , PPRE2,   0b100
        , PPRE1,   0b101
        , SW,      0
    )();
}

And here's the disassembly
--------------------------
8000b92:    ldr      r2, [pc, #188]  ; (8000c50 <hardwareInit+0x104>)  0x40023808 - RCC.CFGR
8000b94:    ldr      r1, [pc, #188]  ; (8000c54 <hardwareInit+0x108>)  0x42470048 - RCC.CR.HSEBYP

; Read-modify of RCC.CFGR
8000b96:    ldr      r3, [r2, #0]
8000b98:    and.w    r3, r3, #780    ; 0x30c
8000b9c:    orr.w    r3, r3, #37888  ; 0x9400

8000ba0:    movs     r0, #0          ; #0 is `false` value for RCC.CR.HSEBYP
8000ba2:    str      r3, [r2, #0]    ; This is the store to RCC.CFGR
8000ba4:    strb     r0, [r1, #0]    ; This is the store to RCC.CR.HSEBYP
...
8000c50:    .word    0x40023808
8000c54:    .word    0x42470048


You can see that at 8000ba2 and 8000ba4 RCC.CFGR is written first, whereas in the D code RCC.CR.HSEBYP is written first.

Mike

July 22, 2017
Am Fri, 21 Jul 2017 23:44:53 +0000
schrieb Mike <none@none.com>:

> My stm32 demo has now been updated and working with GDC/GCC 7.1.0.  Thanks for all your improvements.
> 
> However, I'm getting broken binaries with -O2 and -O3.  I've nailed the culprit down to -fschedule-insns (i.e. if I add -fno-schedule-insns to -O2 or -O3, the binary works fine).
> 
> I disassembled '-O2' and '-O2 -fno-schedule-insns' and compared them, but they were quite different all the way through.  Not only because of address locations, but also different registers and even different opcodes.  (e.g. 'str r2, [sp, #12]' vs 'strd r1, r2, [sp, #8]')

This can unfortunately happen if scheduling allows further optimizations. Then the generated code might look nothing like the original code. It's also possible that this is only caused by a combination of optimization passes. I guess it doesn't happen when using -fschedule-insns without other -O flags?

> 
> Is there anything I can do to provide more actionable information to help identify the underlying cause?

As I don't have an ARM bare metal compiler ready to test: The output of -fdump-tree-all and -fdump-rtl-all would be useful. The tree output is usually quite readable, rtl not so much...

There might be some more useful switches on https://gcc.gnu.org/onlinedocs/gcc/Developer-Options.html

-- Johannes

July 22, 2017
Am Sat, 22 Jul 2017 08:11:28 +0000
schrieb Mike <none@none.com>:

> On Friday, 21 July 2017 at 23:44:53 UTC, Mike wrote:
> 
> > However, I'm getting broken binaries with -O2 and -O3.  I've nailed the culprit down to -fschedule-insns (i.e. if I add -fno-schedule-insns to -O2 or -O3, the binary works fine).
> 
> I've confirmed that -fschedule-insns is reordering register access even though they are being accessed with volatileLoad/Store.  Read the comments in the following code for understanding.  FYI A bit-banded address is a 32-bit address to a single bit.
> 
> Here's the D code
> -----------------
> // This is a single atomic store to bit-banded address 0x42470048
> RCC.CR.HSEBYP.value = false;
> 
> // This is a single read-modify-write to non-bit-banded 32-bit address 0x40023808
> with(RCC.CFGR)
> {
>      setValue
>      !(
>            MCO2,    0
>          , MCO2PRE, 0
>          , MCO1PRE, 0
>          , I2SSRC,  0
>          , MCO1,    0
>          , RTCPRE,  0
>          , HPRE,    0b000
>          , PPRE2,   0b100
>          , PPRE1,   0b101
>          , SW,      0
>      )();
> }
> 

I guess this doesn't happen for a reduced example directly using volatileLoad/volatileStore? If you could provide such a reduced example that'd be very useful.

-- Johannes
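
A reduced test case of the kind requested above might look roughly like the following sketch. It is an assumption about the shape such a test could take, not code from the demo repository, and it has not been shown to reproduce the problem. `fakeHsebyp`, `fakeCfgr`, and `initClocks` are hypothetical names; ordinary variables stand in for the two addresses so the example runs on a host. The mask `0x30C` and value `0x9400` are taken from the `and.w`/`orr.w` instructions in the disassembly, and the byte-wide store mirrors the `strb` seen there. The import assumes a recent druntime (`core.volatile`).

```d
import core.volatile : volatileLoad, volatileStore;

// Hypothetical host-side stand-ins. On the target these would be the fixed
// addresses 0x42470048 (bit-band alias of RCC.CR.HSEBYP) and 0x40023808
// (RCC.CFGR) quoted in the thread.
__gshared ubyte fakeHsebyp = 1;
__gshared uint fakeCfgr = 0xFFFF_FFFF;

/// Performs the two accesses in source order: first the single store to the
/// bit-band alias, then the read-modify-write of the 32-bit register.
void initClocks(ubyte* hsebyp, uint* cfgr)
{
    // 1. Single byte store (compare `strb r0, [r1, #0]` in the disassembly).
    volatileStore(hsebyp, cast(ubyte) 0);

    // 2. Read-modify-write using the constants seen in the disassembly.
    uint v = volatileLoad(cfgr);
    v = (v & 0x30C) | 0x9400;
    volatileStore(cfgr, v);
}

void main()
{
    initClocks(&fakeHsebyp, &fakeCfgr);
    assert(fakeHsebyp == 0);
    assert(fakeCfgr == 0x970C); // (0xFFFFFFFF & 0x30C) | 0x9400
}
```

On the target, compiling such a file with and without -fno-schedule-insns and comparing the order of the two stores in the disassembly would show whether the volatile accesses themselves are being reordered.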

July 22, 2017
On 22 July 2017 at 01:44, Mike via D.gnu <d.gnu@puremagic.com> wrote:
> My stm32 demo has now been updated and working with GDC/GCC 7.1.0.  Thanks for all your improvements.
>
> However, I'm getting broken binaries with -O2 and -O3.  I've nailed the culprit down to -fschedule-insns (i.e. if I add -fno-schedule-insns to -O2 or -O3, the binary works fine).
>
> I disassembled '-O2' and '-O2 -fno-schedule-insns' and compared them, but they were quite different all the way through.  Not only because of address locations, but also different registers and even different opcodes.  (e.g. 'str r2, [sp, #12]' vs 'strd r1, r2, [sp, #8]')
>
> Is there anything I can do to provide more actionable information to help identify the underlying cause?
>
> Mike

Hi Mike,

Is the stm discovery repository up to date on Github?

https://github.com/JinShil/stm32f42_discovery_demo/search?utf8=%E2%9C%93&q=cast%28shared&type=

Those should probably be volatileLoad, as they look to be used by setValue().

Iain.
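
As a rough illustration of the suggestion above, the sketch below routes every register access through `volatileLoad`/`volatileStore` instead of dereferencing a `cast(shared ...)` pointer. The `Register` struct, its `update` method, and `fakeReg` are hypothetical and are not the demo's actual `setValue` implementation; an ordinary variable stands in for the hardware register so the example runs on a host. The import assumes a recent druntime (`core.volatile`).

```d
import core.volatile : volatileLoad, volatileStore;

/// Hypothetical minimal register wrapper, not the demo's actual API.
struct Register
{
    uint* address;

    /// Reads the register with a volatile load.
    @property uint value()
    {
        return volatileLoad(address);
    }

    /// Read-modify-write: clears the bits in `mask`, then sets `bits`.
    /// Both the read and the write go through the volatile intrinsics,
    /// rather than through `*cast(shared uint*) address`.
    void update(uint mask, uint bits)
    {
        immutable uint current = volatileLoad(address);
        volatileStore(address, (current & ~mask) | bits);
    }
}

// Hypothetical stand-in for a hardware register.
__gshared uint fakeReg = 0x0000_00F0;

void main()
{
    auto r = Register(&fakeReg);
    r.update(0xF0, 0x05);
    assert(r.value == 0x05); // (0xF0 & ~0xF0) | 0x05
}
```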
July 22, 2017
On 22 July 2017 at 10:09, Johannes Pfau via D.gnu <d.gnu@puremagic.com> wrote:
> Am Sat, 22 Jul 2017 07:07:33 +0000
> schrieb Timo Sintonen <t.sintonen@luukku.com>:
>
>> On Saturday, 22 July 2017 at 01:11:02 UTC, Mike wrote:
>> > [...]
>>
>> A quick answer without looking your code: this never worked properly because the compiler thinks the value is not changing and may be optimized out of the loop. [...]
>
> There's a small thinko here ;-) In Mike's code, value is a property using volatileLoad/volatileStore internally. So the real problem is likely more complicated.
>

While I'm confident that the current implementation of volatileLoad/volatileStore should prevent such reordering, a memory barrier could also be inserted before each generated volatileLoad/volatileStore to really hammer it into the gcc optimizer.

Iain.
July 22, 2017
On Saturday, 22 July 2017 at 09:07:31 UTC, Iain Buclaw wrote:

> https://github.com/JinShil/stm32f42_discovery_demo/search?utf8=%E2%9C%93&q=cast%28shared&type=
>
> Those should probably be volatileLoad, as they look to be used by setValue().

I am such an idiot. Problem solved.  Thank you, and I'm terribly sorry for the noise.

Mike
