December 30, 2015
Jacob Carlborg <doob@me.com> writes:

> On 2015-12-30 08:02, Dan Olson wrote:
>
>> I know some of it from hacking dyld for iOS, but not all.  How does this fit in with "Plan B.2"?
>
> If you need to figure out how TLS works, I can give you some help, that's all I'm saying :)

Oh, good.  Always like help.  I'm going to start with Plan B.1 though because LLVM does nice optimizations for TLS.
December 30, 2015
Dan Olson <gorox@comcast.net> writes:

> A little progress report. More to come later when I get something pushed to github.
>
> I bought a returned Apple Watch yesterday at a discount for $223.99 US and tried to see how much of D would work on it using my iOS fork of LDC. There were a few bumps, like dealing with embedded bitcode (a watchOS requirement). After 4 hours of baby steps, little D programs with incremental druntime support, I was able to download a huge watch app extension with all druntime and phobos unittests and run most of them alphabetically. Everything zipped along fine, only a std.math error, then mysteriously an exit after running the std.parallelism test a long time. It was late for me so I decided that was enough progress.
>
> This means all of druntime worked and probably most of phobos.

Played with this a little more and learned a bit about watchOS memory. A little test that allocated memory in 5 MB chunks was terminated at 30 MB data RAM.  The combined unittests in phobos suck up much more than that; std.uri by itself uses over 50 MB.  By tailoring memory usage and running phobos unittests in smaller blocks, they all work.  The std.math failure was my own coding error: a missing version block for WatchOS.
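A probe of this kind might look roughly like the sketch below. It is written in C rather than D, and the name probe_heap and its parameters are made up for illustration; the actual test used on the watch is not shown in this thread. Each chunk is touched so the pages are really committed, and a progress line is printed first because the OS may kill the process outright instead of letting malloc fail.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate up to max_chunks blocks of chunk_bytes,
 * touching every byte so the memory is actually committed.
 * Returns how many blocks were obtained before malloc failed. */
size_t probe_heap(size_t chunk_bytes, size_t max_chunks)
{
    assert(chunk_bytes > 0);

    /* Table of live blocks so nothing is released until the end. */
    void **chunks = calloc(max_chunks ? max_chunks : 1, sizeof *chunks);
    if (chunks == NULL)
        return 0;

    size_t n = 0;
    while (n < max_chunks) {
        /* Log before allocating: if the OS terminates the process,
         * the last line printed shows how far the probe got. */
        printf("allocating chunk %zu (%zu bytes held so far)\n",
               n + 1, n * chunk_bytes);
        fflush(stdout);

        void *p = malloc(chunk_bytes);
        if (p == NULL)
            break;
        memset(p, 0xA5, chunk_bytes);   /* force the pages to be backed */
        chunks[n++] = p;
    }

    size_t obtained = n;
    while (n > 0)
        free(chunks[--n]);
    free(chunks);
    return obtained;
}
```

Calling probe_heap(5u * 1024 * 1024, 16) would mirror the 5 MB steps described above; the cap keeps the probe bounded on systems that have no such limit.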

In the end, good news: druntime and phobos fully work on watchOS with LLVM optimizations disabled.  With optimizations on, there are alignment problems.  For example, compact unwind data generated by LLVM isn't aligned, but some of our eh unwinding code casts these to uint.  Not so good when the optimizer selects instructions requiring special alignment.  I'll track these down gradually.
-- 
Dan

December 30, 2015
On Wednesday, 30 December 2015 at 21:56:46 UTC, Dan Olson wrote:
> Dan Olson <gorox@comcast.net> writes:
>
>> A little progress report. More to come later when I get something pushed to github.
>>
>> I bought a returned Apple Watch yesterday at a discount for $223.99 US and tried to see how much of D would work on it using my iOS fork of LDC. There were a few bumps, like dealing with embedded bitcode (a watchOS requirement). After 4 hours of baby steps, little D programs with incremental druntime support, I was able to download a huge watch app extension with all druntime and phobos unittests and run most of them alphabetically. Everything zipped along fine, only a std.math error, then mysteriously an exit after running the std.parallelism test a long time. It was late for me so I decided that was enough progress.
>>
>> This means all of druntime worked and probably most of phobos.
>
> Played with this a little more and learned a bit about watchOS memory. A little test that allocated memory in 5 MB chunks was terminated at 30 MB data RAM.  The combined unittests in phobos suck up much more than that; std.uri by itself uses over 50 MB.  By tailoring memory usage and running phobos unittests in smaller blocks, they all work.  The std.math failure was my own coding error: a missing version block for WatchOS.
>
> In the end, good news: druntime and phobos fully work on watchOS with LLVM optimizations disabled.  With optimizations on, there are alignment problems.  For example, compact unwind data generated by LLVM isn't aligned, but some of our eh unwinding code casts these to uint.  Not so good when the optimizer selects instructions requiring special alignment.  I'll track these down gradually.

That sounds like this issue I ran into with ARM EH:

https://github.com/ldc-developers/ldc/issues/489#issuecomment-143560075

I was able to work around it by disabling the mentioned llvm optimization pass:

https://gist.github.com/joakim-noah/1fb23fba1ba5b7e87e1a#file-android_tls-L42

https://gist.github.com/joakim-noah/63693ead3aa62216e1d9#file-ldc_android_arm-L3133
December 31, 2015
On Wednesday, 30 December 2015 at 23:11:06 UTC, Joakim wrote:
> That sounds like this issue I ran into with ARM EH:
>
> https://github.com/ldc-developers/ldc/issues/489#issuecomment-143560075
>
> I was able to work around it by disabling the mentioned llvm optimization pass:
>
> https://gist.github.com/joakim-noah/1fb23fba1ba5b7e87e1a#file-android_tls-L42
>
> https://gist.github.com/joakim-noah/63693ead3aa62216e1d9#file-ldc_android_arm-L3133

Yup, that's exactly it!  The approach I took was to leave optimization on, remove the casts, and byte-load the data into the uint vars.  If the DWARF data is not guaranteed to be aligned to the data type, then I think this is the approach to take.
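For readers following along, the byte-load idea can be sketched like this. It is in C rather than the D used in druntime, and the function names are invented for the example; it is not the actual patch. The first function assumes the field is stored little-endian; the second copies in host byte order and leaves it to the compiler to pick a safe load.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build a 32-bit value from four single-byte loads, so no instruction
 * ever requires the source address to be 4-byte aligned.
 * Assumes the data is stored little-endian. */
uint32_t load_u32_le(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Alternative in host byte order: memcpy has no alignment requirement,
 * and compilers typically lower it to whatever load is safe on the target. */
uint32_t load_u32_host(const void *p)
{
    uint32_t value;
    assert(p != NULL);
    memcpy(&value, p, sizeof value);
    return value;
}
```

Either form avoids the cast from a byte pointer to a uint pointer, which is what lets the optimizer assume alignment in the first place.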
December 31, 2015
On Thursday, 31 December 2015 at 00:11:34 UTC, Dan Olson wrote:
> On Wednesday, 30 December 2015 at 23:11:06 UTC, Joakim wrote:
>> That sounds like this issue I ran into with ARM EH:
>>
>> https://github.com/ldc-developers/ldc/issues/489#issuecomment-143560075
>>
>> I was able to work around it by disabling the mentioned llvm optimization pass:
>>
>> https://gist.github.com/joakim-noah/1fb23fba1ba5b7e87e1a#file-android_tls-L42
>>
>> https://gist.github.com/joakim-noah/63693ead3aa62216e1d9#file-ldc_android_arm-L3133
>
> Yup, that's exactly it!  The approach I took was to leave optimization on, remove the casts, and byte-load the data into the uint vars.  If the DWARF data is not guaranteed to be aligned to the data type, then I think this is the approach to take.

Sounds good, submit a PR and let's get it in.
December 31, 2015
On Wednesday, 30 December 2015 at 20:55:44 UTC, Dan Olson wrote:

> I'm going to start with Plan B.1 though because LLVM does nice optimizations for TLS.

What is Plan B.1?

--
/Jacob Carlborg
December 31, 2015
On Thursday, 31 December 2015 at 10:10:20 UTC, Jacob Carlborg wrote:
> On Wednesday, 30 December 2015 at 20:55:44 UTC, Dan Olson wrote:
>
>> I'm going to start with Plan B.1 though because LLVM does nice optimizations for TLS.
>
> What is Plan B.1?
>
> --
> /Jacob Carlborg

Getting it into llvm:

http://forum.dlang.org/post/m237um75x7.fsf@comcast.net
January 04, 2016
Joakim <dlang@joakim.fea.st> writes:

> On Thursday, 31 December 2015 at 00:11:34 UTC, Dan Olson wrote:
>> On Wednesday, 30 December 2015 at 23:11:06 UTC, Joakim wrote:
>>> That sounds like this issue I ran into with ARM EH:
>>>
>>> https://github.com/ldc-developers/ldc/issues/489#issuecomment-143560075
>>>
>>> I was able to work around it by disabling the mentioned llvm optimization pass:
>>>
>>> https://gist.github.com/joakim-noah/1fb23fba1ba5b7e87e1a#file-android_tls-L42
>>>
>>> https://gist.github.com/joakim-noah/63693ead3aa62216e1d9#file-ldc_android_arm-L3133
>>
>> Yup, that's exactly it!  The approach I took was to leave optimization on, remove the casts, and byte-load the data into the uint vars.  If the DWARF data is not guaranteed to be aligned to the data type, then I think this is the approach to take.
>
> Sounds good, submit a PR and let's get it in.

Was planning to get that PR going, then got sidetracked by a more difficult ARM exception unwinding bug.  It happens in the std.random unittest at LDC -O2 or higher.  Does this sound familiar, Joakim?

The bug is a bad stack pointer which blows up when the last unittest returns.  This unittest has all the right conditions to generate stack adjustments around some of the function calls that throw exceptions. The exception landing pad does not fix up the stack adjustment, thus a stack leak on each caught exception.  The unittest function epilog restores the stack by adding a fixed offset to match the prolog, so the stack pointer stays wrong when the saved registers and return address are popped.

Really looks like LLVM is not doing the right thing with landing pads. In the meantime I patched LLVM to generate an epilog that always uses the frame pointer to restore the stack pointer.  WatchOS requires a frame pointer, so this isn't too bad.  Now all unittests pass at -O3 for watchOS.

I am guessing iOS is not affected since it uses SjLj to restore the stack after an exception is thrown.  I'll have to pursue this later.  My mind is freed up for the original PR.
-- 
Dan
January 04, 2016
On Monday, 4 January 2016 at 09:26:39 UTC, Dan Olson wrote:
> Joakim <dlang@joakim.fea.st> writes:
>
>> On Thursday, 31 December 2015 at 00:11:34 UTC, Dan Olson wrote:
>>> [...]
>>
>> Sounds good, submit a PR and let's get it in.
>
> Was planning to get that PR going, then got sidetracked by a more difficult ARM exception unwinding bug.  It happens in the std.random unittest at LDC -O2 or higher.  Does this sound familiar, Joakim?

Yep, except tests were failing in three unittest blocks with -O1 too, but I never looked into exactly why:

https://gist.github.com/joakim-noah/63693ead3aa62216e1d9#file-ldc_android_arm-L3139

> The bug is a bad stack pointer which blows up when the last unittest returns.  This unittest has all the right conditions to generate stack adjustments around some of the function calls that throw exceptions. The exception landing pad does not fix up the stack adjustment, thus a stack leak on each caught exception.  The unittest function epilog restores the stack by adding a fixed offset to match the prolog, so the stack pointer stays wrong when the saved registers and return address are popped.
>
> Really looks like LLVM is not doing the right thing with landing pads. In the meantime I patched LLVM to generate an epilog that always uses the frame pointer to restore the stack pointer.  WatchOS requires a frame pointer, so this isn't too bad.  Now all unittests pass at -O3 for watchOS.

Could be the same issue for me, not sure.  If you put your fix online, I can try it and see.

> I am guessing iOS is not affected since it uses SjLj to restore the stack after an exception is thrown.  I'll have to pursue this later.  My mind is freed up for the original PR.

That one is much simpler, let's get it in.
January 04, 2016
Joakim <dlang@joakim.fea.st> writes:

> On Monday, 4 January 2016 at 09:26:39 UTC, Dan Olson wrote:
>> Joakim <dlang@joakim.fea.st> writes:
>
>> The bug is a bad stack pointer which blows up when the last unittest returns.  This unittest has all the right conditions to generate stack adjustments around some of the function calls that throw exceptions. The exception landing pad does not fix up the stack adjustment, thus a stack leak on each caught exception.  The unittest function epilog restores the stack by adding a fixed offset to match the prolog, so the stack pointer stays wrong when the saved registers and return address are popped.
>>
>> Really looks like LLVM is not doing the right thing with landing pads. In the meantime I patched LLVM to generate an epilog that always uses the frame pointer to restore the stack pointer.  WatchOS requires a frame pointer, so this isn't too bad.  Now all unittests pass at -O3 for watchOS.
>
> Could be the same issue for me, not sure.  If you put your fix online, I can try it and see.

It is this commit, based on a 2-week-old LLVM 3.8 trunk:

https://github.com/smolt/llvm/commit/91a4420615c6ec83b227b63d36054f12ccffb00f

A small change, but it took me a long time in the debugger on an Apple Watch to figure out.  Something the x86 simulator can't show.  It is tailored to watchOS, which uses Thumb-2 instructions.  watchOS always has a frame pointer, so hasFP() is always true. You will want to add Android to hasFP() or disable frame pointer elimination some other way.  I noticed that -disable-fp-elim for LDC with LLVM 3.7 and above is broken, so you can't use that.

The pattern to look for if you have a suspect is this:

A function that throws an exception is codegened with a stack adjustment surrounding the call:

	sub	sp, #16
	str	r1, [sp]
	mov	r1, r2
	movs	r0, #66
	movw	r2, #2424
	blx	__D3std9exception25__T7bailOutHTC9ExceptionZ7bailOutFNaNfAyakxAaZv
	add	sp, #16    <--- This adjustment is missed on exception

Epilog without hack (llvm 3.8 git 0838b1f Add iOS TLS support for WatchOS)
	itttt	ne
	addne.w	sp, sp, #9984 <-- stack adjust matches prolog, but stack is off by 16 bytes if above throws
	addne	sp, #48
	popne.w	{r8, r10, r11}
	popne	{r4, r5, r6, r7, pc}

Epilog with hack (commit 91a4420)
	itttt	ne
	subne.w	r4, r7, #24   <-- stack set from frame pointer (r7)
	movne	sp, r4
	popne.w	{r8, r10, r11}
	popne	{r4, r5, r6, r7, pc}
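To make the arithmetic in the two epilogs concrete, here is a toy model in C that tracks only the stack pointer value. The prolog is not shown in this thread, so the pushes are inferred from the pops in the listings, and r7 is assumed to point at the saved r7 slot (which is consistent with the "sub r4, r7, #24" in the hacked epilog). The function names are invented for the example; nothing here is LLVM code.

```c
#include <assert.h>
#include <stdint.h>

enum {
    SAVED_LOW   = 20,        /* push {r4, r5, r6, r7, lr}, inferred */
    SAVED_HIGH  = 12,        /* push {r8, r10, r11}, inferred       */
    LOCALS      = 9984 + 48, /* matches the two adds in the epilog   */
    CALL_ADJUST = 16         /* sub sp, #16 before the throwing call */
};

/* Model the prolog and body. Returns sp on entry to the epilog and
 * stores the frame pointer in *r7. Each caught exception skips the
 * "add sp, #16" after the call, leaking CALL_ADJUST bytes. */
static uint32_t run_body(uint32_t entry_sp, unsigned caught, uint32_t *r7)
{
    assert(entry_sp % 4 == 0);       /* stack is word aligned */
    uint32_t sp = entry_sp - SAVED_LOW;
    *r7 = sp + 12;                   /* address of the saved r7 slot */
    sp -= SAVED_HIGH;
    sp -= LOCALS;
    for (unsigned i = 0; i < caught; i++)
        sp -= CALL_ADJUST;           /* adjustment never undone */
    return sp;
}

/* Epilog without the hack: add back the fixed frame size, then pop. */
uint32_t exit_sp_fixed_offset(uint32_t entry_sp, unsigned caught)
{
    uint32_t r7;
    uint32_t sp = run_body(entry_sp, caught, &r7);
    sp += LOCALS;                    /* addne.w sp, sp, #9984 ; addne sp, #48 */
    sp += SAVED_HIGH;                /* popne.w {r8, r10, r11} */
    sp += SAVED_LOW;                 /* popne {r4, r5, r6, r7, pc} */
    return sp;
}

/* Epilog with the hack: recompute sp from the frame pointer, then pop. */
uint32_t exit_sp_frame_pointer(uint32_t entry_sp, unsigned caught)
{
    uint32_t r7;
    (void)run_body(entry_sp, caught, &r7);
    uint32_t sp = r7 - 24;           /* subne.w r4, r7, #24 ; movne sp, r4 */
    sp += SAVED_HIGH;
    sp += SAVED_LOW;
    return sp;
}
```

In this model the fixed-offset epilog ends 16 bytes low for every caught exception, so the pops read the wrong slots, while the frame-pointer epilog always lands back on the entry value no matter how many adjustments were leaked.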

-- 
Dan