Thread overview
Android/ARM: fixing exception-handling
Jun 16, 2015
Joakim
Jun 17, 2015
David Nadlinger
Jun 17, 2015
Dan Olson
Jun 17, 2015
Joakim
Jul 08, 2015
Joakim
Jul 09, 2015
Dan Olson
Jul 25, 2015
Joakim
Jul 28, 2015
Dan Olson
Jun 17, 2015
Joakim
June 16, 2015
I've gotten pretty far along with ldc for Android/ARM, with the big remaining issue appearing to be the unfinished support for exception-handling.  Many exceptions seem to work just fine, while others cause segfaults.  I've just started looking at ldc.eh with one such failing exception from the unit tests for core.thread and it seems to error out when trying to find the landing pad and action offset, I think in the get_uleb128 helper.
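The call-site table that the personality routine walks to find the landing pad and action offset stores its fields in variable-length encodings, ULEB128 among them. A decoder that silently reads past the end of the table is one way a lookup like this could end in a segfault. Below is a minimal sketch of such a decoder with bounds checking, written in C++ rather than D. `decodeUleb128` is a hypothetical name and is not the actual get_uleb128 helper in ldc.eh:

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical helper (not the ldc.eh API): decodes one ULEB128 value
// starting at data[pos] and advances pos past it. Returns std::nullopt if
// the encoding runs off the end of the buffer or is longer than ten bytes.
std::optional<std::uint64_t> decodeUleb128(const std::uint8_t* data,
                                           std::size_t size,
                                           std::size_t& pos) {
    std::uint64_t result = 0;
    unsigned shift = 0;
    while (pos < size) {
        std::uint8_t byte = data[pos++];
        if (shift >= 64) return std::nullopt;  // more than ten bytes
        // Low 7 bits carry payload, least significant group first.
        result |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return result;  // high bit clear: last byte
        shift += 7;
    }
    return std::nullopt;  // truncated encoding
}
```

For example, the bytes E5 8E 26 decode to 624485 and advance pos by three.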

David, what remains to be done for ARM support, if you know anything more specific than simply finding and fixing the remaining stuff that doesn't work?
June 17, 2015
On Tuesday, 16 June 2015 at 23:07:45 UTC, Joakim wrote:
> David, what remains to be done for ARM support, if you know anything more specific than simply finding and fixing the remaining stuff that doesn't work?

Unfortunately, I don't know of anything more specific than the couple of EH-related test case failures on Linux/EABI.

It has been quite some while since I last worked on LDC/ARM to be honest; most of my ARM work is getting embedded stuff done with C++14 these days. Maybe Dan knows of some other codegen/math-related issues still to be solved?

 - David
June 17, 2015
"David Nadlinger" <code@klickverbot.at> writes:

> On Tuesday, 16 June 2015 at 23:07:45 UTC, Joakim wrote:
>> David, what remains to be done for ARM support, if you know anything more specific than simply finding and fixing the remaining stuff that doesn't work?
>
> Unfortunately, I don't know of anything more specific than the couple of EH-related test case failures on Linux/EABI.
>
> It has been quite some while since I last worked on LDC/ARM to be honest; most of my ARM work is getting embedded stuff done with C++14 these days. Maybe Dan knows of some other codegen/math-related issues still to be solved?

There might be some clues in the iOS branch for ldc/eh.d, even though it is dealing with SjLj-style exceptions and landing pads are interpreted differently.  It has a few version variations but uses much of the same code.  It seems to work OK during the unittests.  I haven't encountered any weirdness since I spent some late nights with a debugger a year ago.

https://github.com/smolt/druntime/blob/ios/src/ldc/eh.d

Diff with tag v0.15.1 to see where I changed stuff.  All my published ios branches are currently based on 0.15.1 and using LLVM 3.5.1.
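The point that SjLj and DWARF landing pads are interpreted differently can be sketched as two lookups over an already-decoded call-site table. This is a simplified model that loosely follows GCC's libsupc++ personality routine, not the code in ldc/eh.d. The struct and function names are hypothetical, and the ULEB128/pointer decoding of the real LSDA is omitted. In the DWARF case the personality matches the throwing address against code ranges. In the SjLj case the "ip" is a call-site index saved in the function context, so the table is walked by position:

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, already-decoded call-site records.
struct DwarfCallSite {
    std::uint64_t start;       // offset of the covered range from function start
    std::uint64_t length;      // length of the covered code range
    std::uint64_t landingPad;  // offset from LPStart; 0 means "no landing pad"
    std::uint64_t action;      // action-table byte offset + 1; 0 means none
};

struct SjljCallSite {
    std::uint64_t landingPad;  // dispatch value, not a code offset
    std::uint64_t action;
};

// DWARF-style lookup: find the record whose address range covers ip.
std::optional<std::uint64_t> dwarfLandingPad(const std::vector<DwarfCallSite>& table,
                                             std::uint64_t funcStart,
                                             std::uint64_t lpStart,
                                             std::uint64_t ip) {
    for (const auto& cs : table) {
        if (ip < funcStart + cs.start) break;  // table is sorted; passed ip
        if (ip < funcStart + cs.start + cs.length) {
            if (cs.landingPad == 0) return std::nullopt;  // nothing to run here
            return lpStart + cs.landingPad;
        }
    }
    return std::nullopt;
}

// SjLj-style lookup: callSiteIndex is 1-based and comes from the function
// context, not from the program counter.
std::optional<std::uint64_t> sjljLandingPad(const std::vector<SjljCallSite>& table,
                                            std::uint64_t callSiteIndex) {
    if (callSiteIndex == 0 || callSiteIndex > table.size()) return std::nullopt;
    // Following GCC's personality routine, the value handed back is the
    // stored landing pad + 1, which the function's dispatch code switches on.
    return table[callSiteIndex - 1].landingPad + 1;
}
```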

Joakim, what branch of LDC are you basing your Android stuff on?

I can publish my iOS merges with 0.15.2-beta and 0.16.0 (branch merge-2.067) to GitHub, but I don't think there is any additional help there with regard to EH, even though ldc/eh.d did change in the druntime ldc branch.

As far as codegen problems go, there is nothing related to EH that I can think of.  The optimizer occasionally gets some alignment wrong with neon instructions in LLVM 3.5.1, but that does not show up as an EH problem.  Currently neon is disabled when building optimized libs.

If you haven't created a gen/abi-arm.{h,cpp}, you will need to, as the default has a few problems on ARM, though they are not related to EH.  If you are on LLVM 3.5.1, try the one on the ios branch named abi-ios.{h,cpp}. There are additional abi-ios changes for 0.15.2 because the handling of D variadic functions changed.
-- 
Dan
June 17, 2015
On Wednesday, 17 June 2015 at 06:50:52 UTC, Dan Olson wrote:
> There might be some clues in the iOS branch for ldc/eh.d, even though it is dealing with SjLj-style exceptions and landing pads are interpreted differently.  It has a few version variations but uses much of the same code.  It seems to work OK during the unittests.  I haven't encountered any weirdness since I spent some late nights with a debugger a year ago.
>
> https://github.com/smolt/druntime/blob/ios/src/ldc/eh.d
>
> Diff with tag v0.15.1 to see where I changed stuff.  All my published ios branches are currently based on 0.15.1 and using LLVM 3.5.1.
>
> Joakim, what branch of LDC are you basing your Android stuff on?

I'm currently using the merge-2.067 branch linked against a lightly patched llvm 3.6, the one that's used in the Android NDK, and compiled by clang 3.6.1.

> I can publish my iOS merges with 0.15.2-beta and 0.16.0 (branch merge-2.067) to GitHub, but I don't think there is any additional help there with regard to EH, even though ldc/eh.d did change in the druntime ldc branch.

I hadn't bothered looking at how your iOS branch dealt with exceptions, since you had said a while back that it uses setjmp/longjmp exceptions, but I'll take a look now and see if there's anything helpful.

> As far as codegen problems go, there is nothing related to EH that I can think of.  The optimizer occasionally gets some alignment wrong with neon instructions in LLVM 3.5.1, but that does not show up as an EH problem.  Currently neon is disabled when building optimized libs.
>
> If you haven't created a gen/abi-arm.{h,cpp}, you will need to, as the default has a few problems on ARM, though they are not related to EH.  If you are on LLVM 3.5.1, try the one on the ios branch named abi-ios.{h,cpp}. There are additional abi-ios changes for 0.15.2 because the handling of D variadic functions changed.

I'll take a look.  Right now, the only change I made to gen/abi.cpp is to use the C calling convention everywhere.
June 17, 2015
On Wednesday, 17 June 2015 at 01:03:19 UTC, David Nadlinger wrote:
> On Tuesday, 16 June 2015 at 23:07:45 UTC, Joakim wrote:
>> David, what remains to be done for ARM support, if you know anything more specific than simply finding and fixing the remaining stuff that doesn't work?
>
> Unfortunately, I don't know of anything more specific than the couple of EH-related test case failures on Linux/EABI.

OK, I'll look into those.  It does seem that there are a lot more unit tests that throw exceptions in 2.067 though, so a lot more than a couple fail.

I've also found one or two tests unrelated to exceptions that may have ARM codegen issues.  I'll look into those further and file the appropriate issues, if necessary.
July 08, 2015
On Wednesday, 17 June 2015 at 07:32:35 UTC, Joakim wrote:
> I hadn't bothered looking at how your iOS branch dealt with exceptions, since you had said a while back that it uses setjmp/longjmp exceptions, but I'll take a look now and see if there's anything helpful.

Took a look, don't think it's relevant to DWARF exceptions.

> I'll take a look.  Right now, the only change I made to gen/abi.cpp is to use the C calling convention everywhere.

It appears that the only change you made is to turn off passing structs by value?

https://github.com/smolt/ldc/blob/ios/gen/abi-ios.cpp#L53

The fast C calling convention works for you?  It always caused problems for me on ARM, including causing a segfault in llvm when compiling, the last time I tried it.

I spent some time looking into the ARM EH issues and it appears that disabling inlining fixes a lot of it:

--- a/gen/optimizer.cpp
+++ b/gen/optimizer.cpp
@@ -163,8 +163,8 @@ static unsigned sizeLevel() {

 // Determines whether or not to run the normal, full inlining pass.
 bool willInline() {
-    return enableInlining == cl::BOU_TRUE ||
-        (enableInlining == cl::BOU_UNSET && optLevel() > 1);
+    return enableInlining == cl::BOU_TRUE;// ||
+        //(enableInlining == cl::BOU_UNSET && optLevel() > 1);
 }

 bool isOptimizationEnabled() {
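The effect of the patch above is easiest to see as a decision table. Here is a standalone sketch of the before and after logic. It assumes an enum that mirrors LLVM's cl::boolOrDefault values (BOU_UNSET, BOU_TRUE, BOU_FALSE), defined locally, and it passes the optimization level explicitly instead of reading it from LDC's options. The function names are hypothetical:

```cpp
// Mirrors llvm::cl::boolOrDefault; defined locally so the sketch is standalone.
enum BoolOrDefault { BOU_UNSET, BOU_TRUE, BOU_FALSE };

// Original logic: inline when explicitly requested, or by default above -O1.
bool willInlineOriginal(BoolOrDefault enableInlining, unsigned optLevel) {
    return enableInlining == BOU_TRUE ||
           (enableInlining == BOU_UNSET && optLevel > 1);
}

// Patched logic: inline only when explicitly requested; optLevel is ignored.
bool willInlinePatched(BoolOrDefault enableInlining, unsigned /*optLevel*/) {
    return enableInlining == BOU_TRUE;
}
```

With the patch, -O2/-O3 builds no longer inline by default, while an explicit request to enable inlining still takes effect.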

I also get proper backtraces in gdb much more often after turning off inlining, not to mention actual error output on the command-line as opposed to segfaults.  I'm guessing something is screwed up in the generation or handling of DWARF exception data by function inlining.  Almost all of druntime now passes tests on Android/ARM, with the exception of some codegen issues in core.time.

For comparison, running the phobos tests with logging turned on in the ldc/eh.d code showed that only about 67 exceptions were thrown with -O2/-O3 -release and inlining turned on.  With inlining turned off, that jumps to 658 exceptions, an order of magnitude more, because many more tests run once EH starts working.  A couple of exceptions might still be uncaught and need to be fixed, but it appears that EH is not the bottleneck anymore; it's codegen and other ARM issues.

David, Kai, or whoever else runs tests on linux/Android/ARM, can you turn inlining off and verify the same results on your ARM hardware?
July 09, 2015
"Joakim" <dlang@joakim.fea.st> writes:

> On Wednesday, 17 June 2015 at 07:32:35 UTC, Joakim wrote:
> It appears that the only change you made is to turn off passing
> structs by value?
>
> https://github.com/smolt/ldc/blob/ios/gen/abi-ios.cpp#L53

Hi Joakim,

Yes, that little change had a big impact.

http://forum.dlang.org/post/m2r3u5ac0c.fsf@comcast.net

Structs are still passed by value, just in a different way.  The LLVM "byval" attribute non-intuitively passes a pointer to a struct instead of passing its contents in registers and on the stack.

http://llvm.org/docs/LangRef.html#parameter-attributes

> The fast C calling convention works for you?  It always caused problems for me on ARM, including causing a segfault in llvm when compiling, the last time I tried it.

fastcc has worked quite well, and an attempt to change to the C calling convention (ccc) led to funny codegen for some aggregate function return values (e.g. complex reals) when optimization was enabled.  But that problem seemed to go away with LLVM 3.6.

In the end I have abandoned fastcc for ccc with my 0.15.2 and 2.067 merge branches because LDC adopted a different variadic approach and fastcc doesn't support it.
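LLVM's LangRef states that fastcc does not support varargs, which is why a variadic scheme built on C-style va_list forces a switch to ccc. For illustration only, here is a minimal C++ sketch of a C-style variadic callee, the kind of va_list-based function that depends on the platform C calling convention. `sumInts` is hypothetical and is not LDC's actual lowering of D variadic functions:

```cpp
#include <cstdarg>

// Hypothetical C-style variadic function: the callee walks its extra
// arguments with va_list, which relies on the caller having placed them
// where the platform's C calling convention says they live.
int sumInts(int count, ...) {
    va_list args;
    va_start(args, count);
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += va_arg(args, int);  // each extra argument is read as an int
    }
    va_end(args);
    return total;
}
```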
-- 
Dan
July 25, 2015
On Wednesday, 8 July 2015 at 16:14:43 UTC, Joakim wrote:
> I spent some time looking into the ARM EH issues and it appears that disabling inlining fixes a lot of it:
>
> --- a/gen/optimizer.cpp
> +++ b/gen/optimizer.cpp
> @@ -163,8 +163,8 @@ static unsigned sizeLevel() {
>
>  // Determines whether or not to run the normal, full inlining pass.
>  bool willInline() {
> -    return enableInlining == cl::BOU_TRUE ||
> -        (enableInlining == cl::BOU_UNSET && optLevel() > 1);
> +    return enableInlining == cl::BOU_TRUE;// ||
> +        //(enableInlining == cl::BOU_UNSET && optLevel() > 1);
>  }
>
>  bool isOptimizationEnabled() {
>
> I also get proper backtraces in gdb much more often after turning off inlining, not to mention actual error output on the command-line as opposed to segfaults.  I'm guessing something is screwed up in the generation or handling of DWARF exception data by function inlining.  Almost all of druntime now passes tests on Android/ARM, with the exception of some codegen issues in core.time.
>
> For comparison, running the phobos tests with logging turned on in the ldc/eh.d code showed that only about 67 exceptions were thrown with -O2/-O3 -release and inlining turned on.  With inlining turned off, that jumps to 658 exceptions, an order of magnitude more, because many more tests run once EH starts working.  A couple of exceptions might still be uncaught and need to be fixed, but it appears that EH is not the bottleneck anymore; it's codegen and other ARM issues.
>
> David, Kai, or whoever else runs tests on linux/Android/ARM, can you turn inlining off and verify the same results on your ARM hardware?

I spent some more time looking into this and it appears that an ARM optimization pass in llvm is the real issue, not inlining.  It turns out that enabling the EH_personality debug output in ldc.eh and turning off inlining happened to generate ARM code that worked earlier, but I can get it to work without those two hacks by turning off one call to an ARM optimization pass in llvm instead.  Specifically, if I disable this second call to createARMLoadStoreOptimizationPass() and then compile only ldc/eh.d with the resulting ldc2, ARM EH will work, because the second "while" loop in eh_personality_common doesn't segfault anymore:

https://github.com/llvm-mirror/llvm/blob/release_36/lib/Target/ARM/ARMTargetMachine.cpp#L312

Otherwise, it will often, though not always, fail at an ldmib instruction, similar to the other codegen issue I brought up in another thread, which Dan provided a workaround for.  With this second pass turned off, that ldmib instruction isn't there and EH starts working.  I haven't looked further into exactly what that ARM optimization pass is screwing up, but this is probably an llvm codegen issue.
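For readers unfamiliar with the instruction: LDMIB ("load multiple, increment before") fills the listed registers from consecutive words starting at base + 4. On ARMv7, LDM-family instructions always require a word-aligned address and raise an alignment fault otherwise, even where plain unaligned LDR is permitted. That is one plausible way a pass that merges individual loads into an LDM could turn working code into a crash, though nothing in this thread confirms it is what happens here. Below is a minimal C++ model of the addressing; `ldmibAddresses` is a hypothetical helper, not LLVM code:

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical model of ARM "LDMIB Rn, {reglist}" addressing: the i-th
// register in the list (lowest-numbered first) is loaded from
// Rn + 4 * (i + 1). Returns std::nullopt where hardware would raise an
// alignment fault because the base address is not word-aligned.
std::optional<std::vector<std::uint32_t>> ldmibAddresses(std::uint32_t base,
                                                         unsigned registerCount) {
    if (base % 4 != 0) return std::nullopt;  // LDM requires word alignment
    std::vector<std::uint32_t> addrs;
    for (unsigned i = 0; i < registerCount; ++i) {
        addrs.push_back(base + 4 * (i + 1));  // "increment before": skips [base]
    }
    return addrs;
}
```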
July 28, 2015
"Joakim" <dlang@joakim.fea.st> writes:
> I spent some more time looking into this and it appears that an ARM optimization pass in llvm is the real issue, not inlining.  It turns out that enabling the EH_personality debug output in ldc.eh and turning off inlining happened to generate ARM code that worked earlier, but I can get it to work without those two hacks by turning off one call to an ARM optimization pass in llvm instead.

Good puzzle solving.

There might be clues in the clang source code on how to set everything up to make that optimization pass work.  Clang does a lot of interesting stuff, like coercing args and changing alignment that I don't think is done in LDC.