Bad codegen for ARM (and maybe others) when optimizing

Feb 04, 2015

Dan Olson

Feb 05, 2015

Feb 05, 2015

Feb 06, 2015

Feb 07, 2015

Feb 08, 2015

Feb 10, 2015

Feb 10, 2015

Feb 10, 2015

Feb 07, 2015

Feb 08, 2015

February 04, 2015

Bad codegen for ARM (and maybe others) when optimizing

Posted by Dan Olson

Permalink

Dan Olson

Permalink

Just a warning to ARM and PPC users.  And potentially other CPUs with similar register files.  I've discovered a codegen problem with -O1 and higher.  What happens is that load instructions are being incorrectly reordered before stores.  It is a problem in the LLVM backend.  Versions LLVM-3.5, 3.4, 3.3 have the problem based on my testing, and maybe earlier.

The result is a program with incorrect results or crashes.

After much digging I discovered it happens in the "top-down list latency scheduler" pass and a search of llvm bugs brought up for PPC:

http://llvm.org/bugs/show_bug.cgi?id=20280

This looks to be the same problem.  All the fixes should be in LLVM-3.6 which I have not tried yet (does LDC work with 3.6?).

Here is a small program which exhibits the problem on ARM (iOS in my case).

---- badopt.d ----
struct A {int value;}

A fun(A a) {return a;}

void foo(A a)
{
    A z = fun(a);

    import core.stdc.stdio: printf;
    printf("%d %d\n", a.value, z.value);
    // -O1 prints: 12345678 1
    // -O0 prints: 12345678 12345678
}

void main()
{
    A a = A(12345678);
    foo(a);
}
----

In this case the, the -O1 assembly for foo() as it calls fun() is:

$ ldc2 -mtriple=thumbv7-apple-ios -mcpu=cortex-a8 -float-abi=softfp -O1 -output-s -c badopt.d

// foo gets arg a.value in r0
__D6badopt3fooFS6badopt1AZv:
	sub	sp, #4
	push	{r7, lr}
	mov	r7, sp
	sub	sp, #4
	ldr	r1, [r7, #8]  <--- out of order
	str	r0, [r7, #8]  <----'
	mov	r0, sp
	bl	__D6badopt3funFS6badopt1AZS6badopt1A

Hi Dan! On Wednesday, 4 February 2015 at 16:15:47 UTC, Dan Olson wrote: > This looks to be the same problem. All the fixes should be in LLVM-3.6 > which I have not tried yet (does LDC work with 3.6?). Thanks for the hint! LDC works with LLVM 3.6 out of the box. Regards, Kai

"Kai Nacke" <kai@redstar.de> writes: > Hi Dan! > > On Wednesday, 4 February 2015 at 16:15:47 UTC, Dan Olson wrote: >> This looks to be the same problem. All the fixes should be in >> LLVM-3.6 >> which I have not tried yet (does LDC work with 3.6?). > > Thanks for the hint! LDC works with LLVM 3.6 out of the box. > > Regards, > Kai Thanks. I updated to LLVM 3.6 and it does not fix the ARM problem. I looked at the changes for http://llvm.org/bugs/show_bug.cgi?id=20280 and they are specific to PPC target :-( I have been looking at IR. The bug is connected to using the byval attribute. I checked out clang's IR and it does not use byval for structs. It passes structs as an array. I am wondering if LDC could use a abi-arm.cpp to generate something like clang. For example, with these declarations: struct A {int value, x,y, z;} A fun(A a); clang IR for ARM: %struct.A = type { i32, i32, i32, i32 } declare void @fun(%struct.A* sret, [4 x i32]) #1 where as LDC is generating: %badopt.A = type { i32, i32, i32, i32 } declare fastcc void @_D6badopt3funFS6badopt1AZS6badopt1A(%badopt.A* noalias sret, %badopt.A* byval) The other alternative is to try to duplicate the PPC LLCM fix in my forked LLVM repo and submit a llvm bug report. -- Dan

Dan Olson <zans.is.for.cans@yahoo.com> writes: > > I have been looking at IR. The bug is connected to using the byval attribute. I checked out clang's IR and it does not use byval for structs. It passes structs as an array. I am wondering if LDC could use a abi-arm.cpp to generate something like clang. > I think an ARM specific abi class is the right solution. On a whim, I tried one small change in abi.cpp bool passByVal(Type* t) { // was // return t->toBasetype()->ty == Tstruct; return false; } This gets rid of LLVM byval attribute and fixes this codegen problem. The generated instructions are similar to clang. The parameter is not an array like clang, it is the IR struct type as a value, but it works. "%badopt.A" instead of "%badopt.A* byval" Another thing I noticed about clang is that structs that fit in 32-bits are returned directly in r0 instead of an sret parameter. There may be more, I need to study clang. -- Dan

Hi Dan! On Thursday, 5 February 2015 at 16:28:37 UTC, Dan Olson wrote: > The other alternative is to try to duplicate the PPC LLCM fix in my > forked LLVM repo and submit a llvm bug report. IMHO you should submit the bug report as LLVM does not work as specified. Regards, Kai

Hi Dan! On Friday, 6 February 2015 at 15:03:11 UTC, Dan Olson wrote: > I think an ARM specific abi class is the right solution. On a whim, I tried one small change in abi.cpp > > bool passByVal(Type* t) > { > // was > // return t->toBasetype()->ty == Tstruct; > return false; > } > > This gets rid of LLVM byval attribute and fixes this codegen problem. The generated instructions are similar to clang. The parameter is not an array like clang, it is the IR struct type as a value, but it works. > > "%badopt.A" instead of "%badopt.A* byval" > > Another thing I noticed about clang is that structs that fit in 32-bits are returned directly in r0 instead of an sret parameter. There may be more, I need to study clang. We can add an ARM specific abi class. Could you just provide your change as a pull request? (I tried similar things in the past with the PPC backend. But at some point it was easier to fix LLVM. :-) Regards, Kai

"Kai Nacke" <kai@redstar.de> writes: > Hi Dan! > > We can add an ARM specific abi class. Could you just provide your > change as a pull request? > (I tried similar things in the past with the PPC backend. But at some > point it was easier to fix LLVM. :-) > > Regards, > Kai I am working on it. After reading the D ABI http://dlang.org/abi.html which says: Function Calling Conventions The extern (C) and extern (D) calling convention matches the C calling convention used by the supported C compiler on the host system. Should I follow that? I noticed the default abi.cpp TargetABI::callingConv() returns fastcc for D calling convention and instead of ccc. I am wondering if I should cripple LDC for iOS to follow this rule. It seems to me, that when D is calling D, function calls should be as efficient as possible. -- Dan

"Kai Nacke" <kai@redstar.de> writes: > Hi Dan! > > On Thursday, 5 February 2015 at 16:28:37 UTC, Dan Olson wrote: >> The other alternative is to try to duplicate the PPC LLCM fix in my forked LLVM repo and submit a llvm bug report. > > IMHO you should submit the bug report as LLVM does not work as specified. > > Regards, > Kai True, I'll do that.

On Sunday, 8 February 2015 at 21:59:02 UTC, Dan Olson wrote: > I am working on it. After reading the D ABI http://dlang.org/abi.html > which says: > > Function Calling Conventions > The extern (C) and extern (D) calling convention matches the C calling > convention used by the supported C compiler on the host system. > > Should I follow that? I noticed the default abi.cpp > TargetABI::callingConv() returns fastcc for D calling convention and > instead of ccc. I am wondering if I should cripple LDC for iOS to follow > this rule. It seems to me, that when D is calling D, function calls > should be as efficient as possible. Hi Dan! I prefer if you follow the D ABI. My impression is that the difference between fastcc and the standard calling convention is not really great, at least on non-x86 systems. Regards, Kai

On Tuesday, 10 February 2015 at 06:07:54 UTC, Kai Nacke wrote: > I prefer if you follow the D ABI. My impression is that the difference between fastcc and the standard calling convention is not really great, at least on non-x86 systems. To further put this into context: LDC supported non-x86_32 platforms before the D spec ABI had that "just as C" clause. Walter's position once explicitly was that on non-x86 platforms, other compilers are "free to innovate", but that seems to have changed. And in my opinion, rightfully so. We need C ABI compatibility on all the platforms anyway, and there is little reason to also maintain a second ABI implementation. We can still relax the ABI on the IR transformation level for internal function when we are doing LTO anyway (LLVM does this by default). David

Forums