July 21, 2017 ARM Cortex-M Broken Binaries with -O2 and -O3 (-fschedule-insns)

My stm32 demo has now been updated and is working with GDC/GCC 7.1.0. Thanks for all your improvements.

However, I'm getting broken binaries with -O2 and -O3. I've nailed the culprit down to -fschedule-insns (i.e. if I add -fno-schedule-insns to -O2 or -O3, the binary works fine).

I disassembled '-O2' and '-O2 -fno-schedule-insns' and compared them, but they were quite different all the way through. Not only because of address locations, but also different registers and even different opcodes (e.g. 'str r2, [sp, #12]' vs 'strd r1, r2, [sp, #8]').

Is there anything I can do to provide more actionable information to help identify the underlying cause?

Mike
July 22, 2017 Re: ARM Cortex-M Broken Binaries with -O2 and -O3 (-fschedule-insns)
Posted in reply to Mike
On Friday, 21 July 2017 at 23:44:53 UTC, Mike wrote:
> I'm getting broken binaries with -O2 and -O3. I've nailed the culprit down to -fschedule-insns (i.e. if I add -fno-schedule-insns to -O2 or -O3, the binary works fine).
>
> I disassembled '-O2' and '-O2 -fno-schedule-insns' and compared them, but they were quite different all the way through. Not only because of address locations, but also different registers and even different opcodes (e.g. 'str r2, [sp, #12]' vs 'strd r1, r2, [sp, #8]').

Interestingly, I added a strategically placed `asm { "nop"; }` and my binary was able to execute further. Comparing the disassembly of the function I modified still showed quite a significant difference.

Working Binary
--------------
ldr r2, [pc, #188] ; (8000c50 <hardwareInit+0x104>)
ldr r1, [pc, #188] ; (8000c54 <hardwareInit+0x108>)
ldr r3, [r2, #0]
and.w r3, r3, #780 ; 0x30c
orr.w r3, r3, #37888 ; 0x9400
movs r0, #0
str r3, [r2, #0]
strb r0, [r1, #0]
;-------------------------------------------------------
nop ; My strategically placed nop
;-------------------------------------------------------
ldr r3, [pc, #172] ; (8000c58 <hardwareInit+0x10c>)
ldr r0, [pc, #176] ; (8000c5c <hardwareInit+0x110>)
ldr r4, [pc, #176] ; (8000c60 <hardwareInit+0x114>)
ldr r2, [pc, #180] ; (8000c64 <hardwareInit+0x118>)
movs r1, #1
strb r1, [r3, #0]
ldr r3, [r0, #0]
orr.w r3, r3, #49152 ; 0xc000
str r3, [r0, #0]
strb r1, [r4, #0]

Not Working Binary
------------------
ldr r0, [pc, #184] ; (8000c4c <hardwareInit+0x100>)
ldr r1, [pc, #184] ; (8000c50 <hardwareInit+0x104>)
ldr r2, [pc, #188] ; (8000c54 <hardwareInit+0x108>)
ldr r3, [r1, #0]
ldr r4, [pc, #188] ; (8000c58 <hardwareInit+0x10c>)
movs r5, #0
strb r5, [r0, #0]
movs r0, #1
strb r0, [r2, #0]
ldr r2, [r4, #0]
ldr r5, [pc, #180] ; (8000c5c <hardwareInit+0x110>)
orr.w r2, r2, #49152 ; 0xc000
and.w r3, r3, #780 ; 0x30c
str r2, [r4, #0]
orr.w r3, r3, #37888 ; 0x9400
ldr r2, [pc, #168] ; (8000c60 <hardwareInit+0x114>)
strb r0, [r5, #0]
str r3, [r1, #0]

By "Not Working" I mean this code gets stuck in the while loop

PWR.CR.ODEN.value = true;
while(!PWR.CSR.ODRDY.value) { }

This is simply setting the "Overdrive Enable" bit on the power control peripheral of my hardware. The documentation states:

"To set or reset the ODEN bit, the HSI or HSE must be selected as system clock."

I'm selecting the HSI prior to setting ODEN, but it appears that maybe the compiler is reordering the instructions. I still need to investigate that further, but hopefully that provides a little more insight.

Mike
July 22, 2017 Re: ARM Cortex-M Broken Binaries with -O2 and -O3 (-fschedule-insns)
Posted in reply to Mike
On Saturday, 22 July 2017 at 01:11:02 UTC, Mike wrote:
> [...]
> By "Not Working" I mean this code gets stuck in the while loop
>
> PWR.CR.ODEN.value = true;
> while(!PWR.CSR.ODRDY.value) { }
>
> [...]
A quick answer without looking at your code: this never worked properly because the compiler assumes the value does not change, so the read may be hoisted out of the loop. After that, the test inside the while may also be hoisted out. Then the whole while statement, which now has nothing to test and an empty body, can be optimized away.
I used to get this working because 'shared' meant 'volatile' in gdc, but this is no longer true.
I have not yet looked at how you define your data type and how you access the data, but it seems the compiler thinks it is an ordinary variable.
I made a custom Volatile data type that uses the new compiler intrinsics, and my sample program seems to work with gdc 7.
The only thing that does not work is exceptions. The exception code in the runtime has changed a lot, so I do not know whether it should work or not.
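
The custom Volatile type mentioned above is not shown in the thread. As a rough illustration only, a wrapper of that kind might look like the sketch below. The names `Volatile`, `load`, `store` and `waitForBits` are hypothetical and not taken from Timo's code. The intrinsics are imported from `core.volatile`, where current druntime keeps them; at the time of this thread they lived in `core.bitop`.

```d
import core.volatile : volatileLoad, volatileStore;

/// Hypothetical wrapper: every access to `raw` goes through the volatile
/// intrinsics, so the optimizer may not cache, merge or drop the access.
struct Volatile(T)
    if (is(T == ubyte) || is(T == ushort) || is(T == uint) || is(T == ulong))
{
    private T raw;

    /// Read the current value with a volatile load.
    T load() { return volatileLoad(&raw); }

    /// Write a new value with a volatile store.
    void store(T value) { volatileStore(&raw, value); }
}

/// Spin until at least one bit of `mask` is set in `reg`.
/// Because `load` is volatile, the read is repeated on every iteration.
void waitForBits(ref Volatile!uint reg, uint mask)
{
    while ((reg.load() & mask) == 0) { }
}
```

On real hardware such a struct would be overlaid on a peripheral register by casting the register address to a `Volatile!uint*`; with a plain (non-volatile) field, the polling loop is exactly the kind of code the optimizer is allowed to collapse.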
July 22, 2017 Re: ARM Cortex-M Broken Binaries with -O2 and -O3 (-fschedule-insns)
Posted in reply to Timo Sintonen
On Sat, 22 Jul 2017 07:07:33 +0000, Timo Sintonen <t.sintonen@luukku.com> wrote:
> A quick answer without looking at your code: this never worked properly because the compiler thinks the value is not changing and may be optimized out of the loop. [...]
There's a small thinko here ;-) In Mike's code, value is a property using volatileLoad/volatileStore internally. So the real problem is likely more complicated.
-- Johannes
July 22, 2017 Re: ARM Cortex-M Broken Binaries with -O2 and -O3 (-fschedule-insns)
Posted in reply to Mike
On Friday, 21 July 2017 at 23:44:53 UTC, Mike wrote:
> However, I'm getting broken binaries with -O2 and -O3. I've nailed the culprit down to -fschedule-insns (i.e. if I add -fno-schedule-insns to -O2 or -O3, the binary works fine).
I've confirmed that -fschedule-insns is reordering register accesses even though they are performed with volatileLoad/volatileStore. Read the comments in the following code for context. FYI: a bit-banded address is a 32-bit address that maps to a single bit.

Here's the D code
-----------------
// This is a single atomic store to bit-banded address 0x42470048
RCC.CR.HSEBYP.value = false;

// This is a single read-modify-write to non-bit-banded 32-bit address 0x40023808
with(RCC.CFGR)
{
    setValue
    !(
          MCO2,    0
        , MCO2PRE, 0
        , MCO1PRE, 0
        , I2SSRC,  0
        , MCO1,    0
        , RTCPRE,  0
        , HPRE,    0b000
        , PPRE2,   0b100
        , PPRE1,   0b101
        , SW,      0
    )();
}

And here's the disassembly
--------------------------
8000b92: ldr r2, [pc, #188] ; (8000c50 <hardwareInit+0x104>) 0x40023808 - RCC.CFGR
8000b94: ldr r1, [pc, #188] ; (8000c54 <hardwareInit+0x108>) 0x42470048 - RCC.CR.HSEBYP
; Read-modify of RCC.CFGR
8000b96: ldr r3, [r2, #0]
8000b98: and.w r3, r3, #780 ; 0x30c
8000b9c: orr.w r3, r3, #37888 ; 0x9400
8000ba0: movs r0, #0 ; #0 is `false` value for RCC.CR.HSEBYP
8000ba2: str r3, [r2, #0] ; This is the store to RCC.CFGR
8000ba4: strb r0, [r1, #0] ; This is the store to RCC.CR.HSEBYP
...
8000c50: .word 0x40023808
8000c54: .word 0x42470048

You can see at 8000ba2 and 8000ba4 that RCC.CFGR is written first, but in the D code RCC.CR.HSEBYP should be written first.
Mike
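
For readers unfamiliar with bit-banding, the alias address quoted above can be derived with the usual Cortex-M formula. The sketch below is only an illustration: the function name `bitBandAlias` is hypothetical, and the RCC.CR address (0x40023800) and the HSEBYP bit position (18) are assumptions taken from the STM32F4 register map rather than from this thread.

```d
/// Bit-band alias address for bit `bit` of the peripheral register at `address`.
/// On Cortex-M3/M4 the peripheral bit-band region starts at 0x4000_0000 and its
/// alias region at 0x4200_0000, with one 32-bit alias word per bit.
uint bitBandAlias(uint address, uint bit) pure nothrow @nogc @safe
{
    enum uint peripheralBase = 0x4000_0000;
    enum uint aliasBase      = 0x4200_0000;
    return aliasBase + (address - peripheralBase) * 32 + bit * 4;
}

// Assumed: RCC.CR sits at 0x4002_3800 and HSEBYP is bit 18.
// The result matches the literal-pool word 0x42470048 in the disassembly.
static assert(bitBandAlias(0x4002_3800, 18) == 0x4247_0048);
```

A write to the alias word sets or clears just that one bit, which is why the store to HSEBYP shows up as a single `strb` with no preceding load.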
July 22, 2017 Re: ARM Cortex-M Broken Binaries with -O2 and -O3 (-fschedule-insns)
Posted in reply to Mike
On Fri, 21 Jul 2017 23:44:53 +0000, Mike <none@none.com> wrote:
> My stm32 demo has now been updated and working with GDC/GCC 7.1.0. Thanks for all your improvements.
>
> However, I'm getting broken binaries with -O2 and -O3. I've nailed the culprit down to -fschedule-insns (i.e. if I add -fno-schedule-insns to -O2 or -O3, the binary works fine).
>
> I disassembled '-O2' and '-O2 -fno-schedule-insns' and compared them, but they were quite different all the way through. Not only because of address locations, but also different registers and even different opcodes (e.g. 'str r2, [sp, #12]' vs 'strd r1, r2, [sp, #8]').

This can unfortunately happen if scheduling allows further optimizations. Then the generated code might look nothing like the original code. It's also possible that this is only caused by a combination of optimization passes. I guess it doesn't happen using -fschedule-insns without other -O flags?

> Is there anything I can do to provide more actionable information to help identify the underlying cause?

As I don't have an ARM bare metal compiler ready to test: the output of -fdump-tree-all and -fdump-rtl-all would be useful. The tree output is usually quite readable, rtl not so much... There might be some more useful switches at https://gcc.gnu.org/onlinedocs/gcc/Developer-Options.html

-- Johannes
July 22, 2017 Re: ARM Cortex-M Broken Binaries with -O2 and -O3 (-fschedule-insns)
Posted in reply to Mike
On Sat, 22 Jul 2017 08:11:28 +0000, Mike <none@none.com> wrote:
> I've confirmed that -fschedule-insns is reordering register access even though they are being accessed with volatileLoad/Store. [...]
>
I guess this doesn't happen for a reduced example directly using volatileLoad/volatileStore? If you could provide such a reduced example that'd be very useful.
-- Johannes
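
No reduced example was posted in the thread. As a hedged sketch of what one could look like, the function below performs the same two accesses directly through the intrinsics: a byte store followed by a read-modify-write of a second location. The function name `orderedAccess` is hypothetical; the masks 0x30C and 0x9400 are the ones visible in the disassembly earlier in the thread. The import uses `core.volatile` as in current druntime (the intrinsics were in `core.bitop` when this thread was written).

```d
import core.volatile : volatileLoad, volatileStore;

/// Hypothetical reduced test case: store to `first`, then read-modify-write
/// `second`. With volatile intrinsics the store to `first` is expected to be
/// emitted before the store to `second`.
void orderedAccess(ubyte* first, uint* second)
{
    volatileStore(first, cast(ubyte) 0);   // corresponds to the bit-band store

    uint v = volatileLoad(second);         // read
    v = (v & 0x30C) | 0x9400;              // modify (masks from the disassembly)
    volatileStore(second, v);              // write
}
```

On the target one would presumably call it as `orderedAccess(cast(ubyte*) 0x42470048, cast(uint*) 0x40023808)` and inspect the -O2 disassembly for the order of the `strb` and `str` instructions; on a host machine it can be exercised with ordinary variables.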
July 22, 2017 Re: ARM Cortex-M Broken Binaries with -O2 and -O3 (-fschedule-insns)
Posted in reply to Mike
On 22 July 2017 at 01:44, Mike via D.gnu <d.gnu@puremagic.com> wrote:
> [...]
> However, I'm getting broken binaries with -O2 and -O3. I've nailed the culprit down to -fschedule-insns (i.e. if I add -fno-schedule-insns to -O2 or -O3, the binary works fine).
> [...]

Hi Mike,

Is the stm discovery repository up to date on GitHub?

https://github.com/JinShil/stm32f42_discovery_demo/search?utf8=%E2%9C%93&q=cast%28shared&type=

Those should probably be volatileLoad, as they look to be used by setValue().

Iain.
July 22, 2017 Re: ARM Cortex-M Broken Binaries with -O2 and -O3 (-fschedule-insns)
Posted in reply to Johannes Pfau
On 22 July 2017 at 10:09, Johannes Pfau via D.gnu <d.gnu@puremagic.com> wrote:
> On Sat, 22 Jul 2017 07:07:33 +0000, Timo Sintonen <t.sintonen@luukku.com> wrote:
>
>> A quick answer without looking at your code: this never worked properly because the compiler thinks the value is not changing and may be optimized out of the loop. [...]
>
> There's a small thinko here ;-) In Mike's code, value is a property using volatileLoad/volatileStore internally. So the real problem is likely more complicated.
>
While I'm confident that the current implementation of volatileLoad/volatileStore should prevent such reordering, we could also insert a memory barrier before each generated volatileLoad/volatileStore to really hammer it into the GCC optimizer.
Iain.
July 22, 2017 Re: ARM Cortex-M Broken Binaries with -O2 and -O3 (-fschedule-insns)
Posted in reply to Iain Buclaw
On Saturday, 22 July 2017 at 09:07:31 UTC, Iain Buclaw wrote:
> https://github.com/JinShil/stm32f42_discovery_demo/search?utf8=%E2%9C%93&q=cast%28shared&type=
>
> Those should probably be volatileLoad, as they look to be used by setValue().
I am such an idiot. Problem solved. Thank you, and I'm terribly sorry for the noise.
Mike
Copyright © 1999-2021 by the D Language Foundation