Thread overview
Openwrt Linux Uclibc ARM GC issue
Dec 15, 2017
Radu
Dec 15, 2017
David Nadlinger
Dec 17, 2017
Radu
Dec 17, 2017
Joakim
Dec 18, 2017
Suliman
Jan 10, 2018
Radu
Jan 10, 2018
David Nadlinger
Jan 10, 2018
Radu
Jan 10, 2018
Joakim
Jan 14, 2018
Radu
Jan 15, 2018
Joakim
Jan 15, 2018
David Nadlinger
Dec 16, 2017
Joakim
Dec 17, 2017
Radu
December 15, 2017
I'm trying to run some D code on OpenWrt with uClibc and got stuck on a broken GC.

Using LDC 1.6
====================================
LDC - the LLVM D compiler (1.6.0):
  based on DMD v2.076.1 and LLVM 5.0.0
  built with LDC - the LLVM D compiler (1.6.0)
  Default target: x86_64-unknown-linux-gnu
  Host CPU: broadwell
  http://dlang.org - http://wiki.dlang.org/LDC

  Registered Targets:
    aarch64    - AArch64 (little endian)
    aarch64_be - AArch64 (big endian)
    arm        - ARM
    arm64      - ARM64 (little endian)
    armeb      - ARM (big endian)
    nvptx      - NVIDIA PTX 32-bit
    nvptx64    - NVIDIA PTX 64-bit
    ppc32      - PowerPC 32
    ppc64      - PowerPC 64
    ppc64le    - PowerPC 64 LE
    thumb      - Thumb
    thumbeb    - Thumb (big endian)
    x86        - 32-bit X86: Pentium-Pro and above
    x86-64     - 64-bit X86: EM64T and AMD64
====================================

The runtime libraries were compiled with:

====================================
ldc-build-runtime --dFlags="-w;-mtriple=armv7-linux-gnueabihf -mcpu=cortex-a7 -L-lstdc++" --cFlags="-mcpu=cortex-a7 -mfloat-abi=hard -D__UCLIBC_HAS_BACKTRACE__ -D__UCLIBC_HAS_TLS__" --targetSystem="Linux;UNIX" BUILD_SHARED_LIBS=OFF
====================================


The minimal program is:

++++++++++++++++++++
import core.memory;

void main()
{
  GC.collect();
}
++++++++++++++++++++

Compiled with `ldc2 -mtriple=armv7-linux-gnueabihf -mcpu=cortex-a7 -gcc=arm-openwrt-linux-gcc`

When run, I intermittently get this error:

====================================
core.exception.AssertError@rt/sections_elf_shared.d(116): Assertion failure
Fatal error in EH code: _Unwind_RaiseException failed with reason code: 9
Aborted (core dumped)
====================================


GDB on the coredump:
====================================
(gdb) bt
#0  _dl_setup_progname (argv0=<optimized out>) at ldso/ldso/ldso.c:418
#1  0xb6f55e34 in map_writeable (libaddr=<optimized out>, flags=<optimized out>, piclib=-1225360472, ppnt=0x21, infile=<optimized out>) at ldso/ldso/dl-elf.c:442
#2  _dl_load_elf_shared_library (rflags=<optimized out>, rpnt=0xbeea5d9c, libname=0x0) at ldso/ldso/dl-elf.c:703
#3  0x0001c718 in _d_dso_registry ()
#4  0x00016b14 in ldc.register_dso ()
#5  0x00016b4c in ldc.dso_ctor.4test ()
#6  0xb6f54548 in __GI__dl_tls_setup () at ldso/ldso/dl-tls.c:451
#7  0xb6ea79e4 in _pthread_cleanup_pop_restore (buffer=<optimized out>, execute=<optimized out>) at libpthread/nptl/forward.c:152
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
====================================

Any idea what might be wrong?

December 15, 2017
On 15 Dec 2017, at 14:06, Radu via digitalmars-d-ldc wrote:
> When run, I get this error spuriously:
>
> ====================================
> core.exception.AssertError@rt/sections_elf_shared.d(116): Assertion failure
> Fatal error in EH code: _Unwind_RaiseException failed with reason code: 9
> Aborted (core dumped)
> ====================================

The assert is inside an invariant which checks that the TLS information has been extracted successfully. Perhaps uclibc uses a TLS implementation that is not ABI-compatible with glibc? (druntime needs to determine the TLS ranges to register them with the GC, for the main thread as well as newly spawned ones.)
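For context, here is a C sketch of the kind of lookup involved, assuming a glibc-compatible `<link.h>` that provides `dl_iterate_phdr`. On ELF systems, an object's TLS block size and alignment come from its `PT_TLS` program header. This only illustrates the general technique and is not druntime's code; whether uClibc reports the same information in the same way is exactly what would need checking. The names `main_tls_segment` and `tls_marker` are hypothetical.

```c
/* Sketch: find the main executable's PT_TLS segment, the kind of
 * information needed before TLS can be registered with a GC.
 * Assumes a glibc-compatible <link.h>; not druntime's actual code. */
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stddef.h>

/* A thread-local variable guarantees this executable has a PT_TLS segment. */
__thread int tls_marker = 42;

struct tls_segment {
    int found;
    size_t memsz; /* full TLS block size: .tdata plus zero-filled .tbss */
    size_t align; /* required alignment of the block */
};

/* dl_iterate_phdr reports the main program first, so we inspect that
 * object only and stop the iteration by returning non-zero. */
static int inspect_main_program(struct dl_phdr_info *info, size_t size, void *data)
{
    struct tls_segment *out = data;
    (void)size;
    for (size_t i = 0; i < info->dlpi_phnum; ++i) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type == PT_TLS) {
            out->found = 1;
            out->memsz = ph->p_memsz;
            out->align = ph->p_align;
            break;
        }
    }
    return 1;
}

struct tls_segment main_tls_segment(void)
{
    struct tls_segment seg = {0, 0, 0};
    dl_iterate_phdr(inspect_main_program, &seg);
    return seg;
}
```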

Where in the program lifecycle does the error occur? From the backtrace, it looks like during C runtime startup, in which case I am not quite seeing the connection to the GC.

Why unwinding fails is another question, but not one I would be terribly worried about – it is possible that the error e.g. just occurs too early for the EH machinery to be properly set up yet. Other low-level parts of druntime have been converted to directly abort (e.g. using assert(0)) instead. In fact, I am about to overhaul sections_elf_shared in that respect anyway to improve error reporting when mixing shared and non-shared builds.

 — David
December 16, 2017
On Friday, 15 December 2017 at 14:06:37 UTC, Radu wrote:
> Trying to run some D code on Openwrt with Uclibc and got stuck by broken GC.
> Run time libs where compiled with:
>
> ====================================
> ldc-build-runtime --dFlags="-w;-mtriple=armv7-linux-gnueabihf -mcpu=cortex-a7 -L-lstdc++" --cFlags="-mcpu=cortex-a7 -mfloat-abi=hard -D__UCLIBC_HAS_BACKTRACE__ -D__UCLIBC_HAS_TLS__" --targetSystem="Linux;UNIX" BUILD_SHARED_LIBS=OFF
> ====================================

First thing I'd do is build and run the test runners, then make sure no tests are failing, particularly in druntime.  Another thing I notice is that you don't separate many of those C and D flags with semi-colons: not sure how that worked for you, as I get errors if I try something similar.  Also, you need to specify the C cross-compiler with CC=arm-openwrt-linux-gcc before running ldc-build-runtime: maybe you did that but forgot to mention it.
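To illustrate the point about separators: `ldc-build-runtime` describes `--dFlags` and `--cFlags` as `;`-separated lists. The C sketch below assumes plain splitting on `;`, with no quoting or escaping. It shows that the original `--dFlags` value yields only two entries. The second entry still contains spaces, so it would presumably reach the compiler as a single argument. `split_flag_list` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Split a ';'-separated flag list in place, treating it like a CMake-style
 * list (no quoting or escaping handled in this sketch). Each resulting entry
 * would be passed on as ONE argument, even if it contains spaces.
 * Returns the number of entries stored in out (at most max). */
size_t split_flag_list(char *list, char **out, size_t max)
{
    size_t n = 0;
    char *entry = list;
    while (entry != NULL && n < max) {
        char *sep = strchr(entry, ';');
        if (sep != NULL)
            *sep = '\0';            /* terminate the current entry */
        if (*entry != '\0')         /* skip empty entries, e.g. a trailing ';' */
            out[n++] = entry;
        entry = sep ? sep + 1 : NULL;
    }
    return n;
}
```

By the same logic, the flags would likely need to be written as `--dFlags="-w;-mtriple=armv7-linux-gnueabihf;-mcpu=cortex-a7;-L-lstdc++"`, with the `--cFlags` entries separated the same way.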

It is fairly easy to cross-compile the test runners too if you pass the --testrunners flag, see the instructions for the RPi and Android for examples:

https://wiki.dlang.org/Building_LDC_runtime_libraries
https://wiki.dlang.org/Build_D_for_Android

You may need to make some modifications to druntime or Phobos to get everything to compile, and you may have to specify some linker flags too, to get the test runners to link.  Let us know how it works out.

While you could reuse most of the glibc declarations for now, you may eventually need to patch druntime for Uclibc, as was done before for Bionic and the NetBSD libc for example:

https://github.com/dlang/druntime/pull/734
https://github.com/dlang/druntime/pull/1494
December 17, 2017
On Friday, 15 December 2017 at 14:24:08 UTC, David Nadlinger wrote:
> The assert is inside an invariant which checks that the TLS information has been extracted successfully. Perhaps uclibc uses a TLS implementation that is not ABI-compatible with glibc? (druntime needs to determine the TLS ranges to register them with the GC, for the main thread as well as newly spawned ones.)
>
> Where in the program lifecycle does the error occur? From the backtrace, it looks like during C runtime startup, in which case I am not quite seeing the connection to the GC.
>
> Why unwinding fails is another question, but not one I would be terribly worried about – it is possible that the error e.g. just occurs too early for the EH machinery to be properly set up yet. Other low-level parts of druntime have been converted to directly abort (e.g. using assert(0)) instead. In fact, I am about to overhaul sections_elf_shared in that respect anyway to improve error reporting when mixing shared and non-shared builds.
>
>  — David

My various attempts at getting it to run behaved very erratically.
So I changed the cross-compilation parameters: basically, I removed all architecture-specific flags, leaving only `-mtriple=arm-linux-gnueabihf` on the D side and `-mfloat-abi=hard` on the C side.

My test hardware is an ARM Cortex-A7, http://linux-sunxi.org/A33

With the changed compiler switches, I could run my test program and try the druntime test runner (albeit with some changes to math and stdio to get it to link):

./druntime-test-runner
0.000s PASS release32 core.atomic
0.000s PASS release32 core.bitop
0.000s PASS release32 core.checkedint
0.005s PASS release32 core.demangle
0.000s PASS release32 core.exception
0.002s PASS release32 core.internal.arrayop
0.000s PASS release32 core.internal.convert
0.000s PASS release32 core.internal.hash
0.000s PASS release32 core.internal.string
0.000s PASS release32 core.math
0.000s PASS release32 core.memory
0.002s PASS release32 core.sync.barrier
0.015s PASS release32 core.sync.condition
0.000s PASS release32 core.sync.config
0.016s PASS release32 core.sync.mutex
0.016s PASS release32 core.sync.rwmutex
0.002s PASS release32 core.sync.semaphore
Segmentation fault (core dumped)

The segfault comes from this unittest in core.thread (line 1351):

unittest
{
    auto t1 = new Thread({
        foreach (_; 0 .. 20)
            Thread.getAll;
    }).start;
    auto t2 = new Thread({
        foreach (_; 0 .. 20)
            GC.collect; // this seg faults
    }).start;
    t1.join();
    t2.join();
}

Calling GC.collect from the main thread doesn't seg fault.

The core dump is not very helpful because the stack is garbage. However, when I run a minimal program containing the unittest under gdbserver, I can see this:

Thread 1 "test" received signal SIGUSR1, User defined signal 1.
pthread_getattr_np (thread_id=0, attr=0xb6b302bc) at libpthread/nptl/pthread_getattr_np.c:47
47        iattr->schedpolicy = thread->schedpolicy;
(gdb) step

Thread 1 "test" received signal SIGUSR2, User defined signal 2.
0xb6e50d80 in epoll_wait (epfd=-1090521272, events=0x8, maxevents=2, timeout=-1224756080) at libc/sysdeps/linux/common/epoll.c:58
58      CANCELLABLE_SYSCALL(int, epoll_wait, (int epfd, struct epoll_event *events, int maxevents, int timeout),
(gdb) step

Thread 1 "test" received signal SIGSEGV, Segmentation fault.
0xfffffffc in ?? ()
(gdb)


December 17, 2017
On Saturday, 16 December 2017 at 14:14:40 UTC, Joakim wrote:
> First thing I'd do is build and run the test runners, then make sure no tests are failing, particularly in druntime.  Another thing I notice is that you don't separate many of those C and D flags with semi-colons: not sure how that worked for you, as I get errors if I try something similar.  Also, you need to specify the C cross-compiler with CC=arm-openwrt-linux-gcc before running ldc-build-runtime: maybe you did that but forgot to mention it.
>
> It is fairly easy to cross-compile the test runners too if you pass the --testrunners flag, see the instructions for the RPi and Android for examples:
>
> https://wiki.dlang.org/Building_LDC_runtime_libraries
> https://wiki.dlang.org/Build_D_for_Android
>
> You may need to make some modifications to druntime or Phobos to get everything to compile, and you may have to specify some linker flags too, to get the test runners to link.  Let us know how it works out.
>
> While you could reuse most of the glibc declarations for now, you may eventually need to patch druntime for Uclibc, as was done before for Bionic and the NetBSD libc for example:
>
> https://github.com/dlang/druntime/pull/734
> https://github.com/dlang/druntime/pull/1494

Test runners were out of the question, as no program would start. See my reply to David.
Yeah, I set up CC correctly, but curiously, specifying a more fitting platform triple and -march for GCC produced non-working binaries, so I had to revert to the defaults.

Yes, the latest LDC versions make cross-compiling a breeze, so kudos to you guys for making this happen. I'm using the Windows Subsystem for Linux, btw, so for me this is even more fun as I can work in both environments natively :)

The modifications needed are surface-deep and very few: some math and memory stream functions are missing.

The roadblock seems to be somewhere in the GC and TLS, or in the interaction between them (at least that's my feeling ATM).
December 17, 2017
On Sunday, 17 December 2017 at 17:12:41 UTC, Radu wrote:
> My various attempts on getting it to run behaved very erratic.
> So I changed the parameters for cross compile, basically I removed all architecture specifics leaving only `-mtriple=arm-linux-gnueabihf`, and `-mfloat-abi=hard` on C side.
>
> My testing hardware is a ARM Cortex-A7, http://linux-sunxi.org/A33

I believe that triple defaults to ARMv5; are you sure your OpenWrt kernel is built for ARMv7? Try running uname -m on the device to check. For example, most low- to mid-range smartphones these days ship with ARMv8 chips, but their kernels are only built for 32-bit ARMv7, so they can only run 32-bit apps.
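One way to compare what a binary was compiled for with what the kernel reports is a small C check like the one below. `uname(2)` returns the same machine string as `uname -m`. The ACLE macro `__ARM_ARCH`, which GCC and Clang define when targeting ARM, gives the architecture version fixed at compile time. This only checks the C side of the build; the D side's target is set separately by `-mtriple`/`-mcpu`. The helper names are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/utsname.h>

/* Runtime view: the machine string the kernel reports (same as `uname -m`),
 * e.g. "armv7l" on a 32-bit ARMv7 kernel. Returns 0 on success. */
int kernel_machine(char *buf, size_t len)
{
    struct utsname u;
    if (len == 0 || uname(&u) != 0)
        return -1;
    strncpy(buf, u.machine, len - 1);
    buf[len - 1] = '\0';
    return 0;
}

/* Compile-time view: the ARM architecture version this translation unit was
 * built for, via the ACLE macro __ARM_ARCH (0 when not targeting ARM). */
int compiled_arm_arch(void)
{
#ifdef __ARM_ARCH
    return __ARM_ARCH;
#else
    return 0;
#endif
}
```

A kernel reporting `armv7l` while the binary reports 5 would suggest the generic triple fell back to an older architecture.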

> With the compiler switches changed I could run my test program and try the druntime test runner (albeit with some changes on math and stdio to get it linking):
>
> ./druntime-test-runner
> 0.000s PASS release32 core.atomic
> 0.000s PASS release32 core.bitop
> 0.000s PASS release32 core.checkedint
> 0.005s PASS release32 core.demangle
> 0.000s PASS release32 core.exception
> 0.002s PASS release32 core.internal.arrayop
> 0.000s PASS release32 core.internal.convert
> 0.000s PASS release32 core.internal.hash
> 0.000s PASS release32 core.internal.string
> 0.000s PASS release32 core.math
> 0.000s PASS release32 core.memory
> 0.002s PASS release32 core.sync.barrier
> 0.015s PASS release32 core.sync.condition
> 0.000s PASS release32 core.sync.config
> 0.016s PASS release32 core.sync.mutex
> 0.016s PASS release32 core.sync.rwmutex
> 0.002s PASS release32 core.sync.semaphore
> Segmentation fault (core dumped)
>
> The seg fault is from core.thread:1351
>
> unittest
> {
>     auto t1 = new Thread({
>         foreach (_; 0 .. 20)
>             Thread.getAll;
>     }).start;
>     auto t2 = new Thread({
>         foreach (_; 0 .. 20)
>             GC.collect; // this seg faults
>     }).start;
>     t1.join();
>     t2.join();
> }
>
> Calling GC.collect from the main thread doesn't seg fault.

Try running core.thread alone (./druntime-test-runner core.thread) and see if it makes a difference, as I've sometimes seen tested modules interfere with each other. I see that there are a few places where Glibc is assumed in core.thread; make sure those are right on Uclibc too:

https://github.com/ldc-developers/druntime/blob/ldc-v1.6.0/src/core/thread.d#L3301
https://github.com/ldc-developers/druntime/blob/ldc-v1.6.0/src/core/thread.d#L3410
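On the Glibc assumptions: C code can tell the libraries apart at compile time through macros from `<features.h>`. One wrinkle worth checking is that uClibc's `<features.h>` has traditionally also defined `__GLIBC__` for source compatibility. A plain glibc check can therefore silently match uClibc, so `__UCLIBC__` has to be tested first. The sketch below is hedged: `c_runtime_name` and `c_runtime_version` are hypothetical helpers, and it assumes the `__UCLIBC_MAJOR__`/`__UCLIBC_MINOR__`/`__UCLIBC_SUBLEVEL__` version macros arrive via `<features.h>`.

```c
#include <assert.h>
#include <features.h> /* defines __GLIBC__ / __UCLIBC__ where applicable */
#include <stdio.h>

/* Which C library this translation unit was compiled against.
 * __UCLIBC__ is checked first because uClibc may also define __GLIBC__;
 * musl deliberately defines neither. */
const char *c_runtime_name(void)
{
#if defined(__UCLIBC__)
    return "uClibc";
#elif defined(__GLIBC__)
    return "glibc";
#else
    return "other (e.g. musl)";
#endif
}

/* Version of the detected library as a string, or "unknown". */
void c_runtime_version(char *buf, size_t len)
{
#if defined(__UCLIBC__)
    snprintf(buf, len, "%d.%d.%d",
             __UCLIBC_MAJOR__, __UCLIBC_MINOR__, __UCLIBC_SUBLEVEL__);
#elif defined(__GLIBC__)
    snprintf(buf, len, "%d.%d", __GLIBC__, __GLIBC_MINOR__);
#else
    snprintf(buf, len, "unknown");
#endif
}
```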

You can also try skipping those tests that segfault for now and make sure everything else works, by adding something like version(skip) before that failing unittest block, so you know the extent of the test problems.

> Core dump is not very helpful - stack is garbage, but running with gdbserver a minimal program with the unit test I can see this:
>
> Thread 1 "test" received signal SIGUSR1, User defined signal 1.
> pthread_getattr_np (thread_id=0, attr=0xb6b302bc) at libpthread/nptl/pthread_getattr_np.c:47
> 47        iattr->schedpolicy = thread->schedpolicy;
> (gdb) step
>
> Thread 1 "test" received signal SIGUSR2, User defined signal 2.
> 0xb6e50d80 in epoll_wait (epfd=-1090521272, events=0x8, maxevents=2, timeout=-1224756080) at libc/sysdeps/linux/common/epoll.c:58
> 58      CANCELLABLE_SYSCALL(int, epoll_wait, (int epfd, struct epoll_event *events, int maxevents, int timeout),
> (gdb) step
>
> Thread 1 "test" received signal SIGSEGV, Segmentation fault.
> 0xfffffffc in ?? ()
> (gdb)

The SIGUSR1/SIGUSR2 signals mean the GC ran fine.  You'd need to delve more into the code and the implementation details mentioned above to track this down.

On Sunday, 17 December 2017 at 17:20:32 UTC, Radu wrote:
> Yes - latest LDC versions make cross compiling a breeze so kudos to you guys for making this happening. I'm using Linux subsystem for Window btw. so for me this is even more fun as I can work on both environments natively :)

Yeah, you could just use the Windows ldc too, assuming you have a cross-compiler from that OS, as shown on the wiki for Windows with the Android NDK.

> The modifications need it surface deep are very few - some math and memory streams functions are missing.

I don't know how much it differs from Glibc, but we'd always be interested in a port, assuming you have the time to submit a pull like this recent one for Musl:

https://github.com/dlang/druntime/pull/1997

> The road block looks to be somewhere in the GC and TLS, or the interaction of them (at least this is my feeling ATM)

Not being able to do an explicit collect there isn't that big a deal: I'd skip that test for now and run everything else, then come back to that one once you have an idea of the bigger picture.
December 18, 2017
Off-topic: there is another interesting lib: https://uclibc-ng.org/
January 10, 2018
On Sunday, 17 December 2017 at 19:05:04 UTC, Joakim wrote:
> I believe that triple defaults to ARMv5, are you sure your Openwrt kernel is built for ARMv7?  Try running uname -m on the device to check.  For example, most low- to mid-level smartphones these days ship with ARMv8 chips but the kernel is only built for 32-bit ARMv7, so they can only run 32-bit apps.

I got some time to work on this. Just to clarify, I'm developing against uClibc-ng 1.0.9; I noticed others suggesting it and wanted to make that clear.

Re the architecture: it is ARMv7-A, as 'uname -a' shows:
'Linux fs 3.4.39 #249 SMP PREEMPT Wed Oct 4 12:07:05 MYT 2017 armv7l GNU/Linux'

I could not produce any working binary by specifying the armv7a architecture to ldc, so I used the generic arm architecture for gnueabihf, as previously stated.

I managed to get the druntime test runner working (minus some math functions and memstream), except for one specific blocking issue: Thread.suspend does not work; it produces a segfault.
To test this I commented out all suspendAll/resumeAll unittests from core.thread and stubbed out GC.collect().

This issue is not tied to the GC: the segfault happens even with GC.collect disabled, as long as the suspendAll/resumeAll unittests are enabled. The GC just happens to use the suspend mechanism and so exposes the core issue.
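To make the mechanism under discussion concrete, here is a minimal C sketch of a signal-based suspend/resume handshake. It is modeled loosely on the thread_suspendHandler/thread_resumeHandler logic discussed in this thread. The suspender sends SIGUSR1 and waits on a semaphore. The target's handler records its approximate stack top, posts the semaphore, and parks in `sigsuspend` until SIGUSR2 arrives. This is an illustration under those assumptions, not druntime's implementation. The real code also spills registers (callWithStackShell, visible in the backtraces) and handles many threads. All function names here are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Same defaults as druntime: SIGUSR1 suspends, SIGUSR2 resumes. */
#define SUSPEND_SIG SIGUSR1
#define RESUME_SIG  SIGUSR2

static sem_t suspend_ack;            /* posted by a thread once it is parked */
static volatile uintptr_t parked_sp; /* approximate stack top of the parked thread */

static void suspend_handler(int sig)
{
    (void)sig;
    int saved_errno = errno;
    char marker;                       /* its address approximates the stack top */
    parked_sp = (uintptr_t)&marker;    /* a GC would scan from here to the stack base */

    sigset_t wait_mask;
    sigfillset(&wait_mask);
    sigdelset(&wait_mask, RESUME_SIG); /* only the resume signal may wake us */

    sem_post(&suspend_ack);            /* async-signal-safe: tell the suspender we're parked */
    sigsuspend(&wait_mask);            /* returns once RESUME_SIG's handler has run */
    errno = saved_errno;
}

static void resume_handler(int sig)
{
    (void)sig; /* empty: it only exists to end the sigsuspend above */
}

int install_suspend_handlers(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sigfillset(&sa.sa_mask); /* block everything (incl. RESUME_SIG) inside handlers */
    sa.sa_flags = SA_RESTART;

    sa.sa_handler = suspend_handler;
    if (sigaction(SUSPEND_SIG, &sa, NULL) != 0)
        return -1;
    sa.sa_handler = resume_handler;
    if (sigaction(RESUME_SIG, &sa, NULL) != 0)
        return -1;
    return sem_init(&suspend_ack, 0, 0);
}

/* Signal one thread and block until its handler acknowledges it is parked. */
int suspend_thread(pthread_t t)
{
    if (pthread_kill(t, SUSPEND_SIG) != 0)
        return -1;
    while (sem_wait(&suspend_ack) != 0)
        if (errno != EINTR)
            return -1;
    return 0;
}

int resume_thread(pthread_t t)
{
    return pthread_kill(t, RESUME_SIG) == 0 ? 0 : -1;
}

/* Demo worker: keeps bumping a counter until asked to stop. */
static atomic_long work_counter;
static atomic_int stop_worker;

void *spin_worker(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop_worker))
        atomic_fetch_add(&work_counter, 1);
    return NULL;
}
```

In this model, the parked thread sits inside its own signal handler. Both handlers eventually return through the signal-return path that was set up when the signal was delivered.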

From what I can see in gdb, 'thread_resumeHandler' is to blame: it looks like 'sem_post( &suspendCount )' immediately triggers the resume signal, and the call to 'sigsuspend( &sigres )' is never made.

Like:

464                     status = sem_post( &suspendCount );
(gdb) n

Thread 2 "druntime-test-r" received signal SIGUSR2, User defined signal 2.
0x001b46d0 in core.thread.thread_suspendHandler(int).op(void*) (sp=0xb572f900 "$F\033") at thread.d:464
464                     status = sem_post( &suspendCount );
(gdb) info threads
  Id   Target Id         Frame
  1    Thread 16005.16005 "druntime-test-r" 0x001ba7a0 in _D4core6thread5Fiber5stateMxFNaNbNdNiNfZEQBnQBlQBh5State (this=0xb6d34980) at thread.d:4533
* 2    Thread 16005.16273 "druntime-test-r" 0x001b46d0 in core.thread.thread_suspendHandler(int).op(void*) (sp=0xb572f900 "$F\033") at thread.d:464
(gdb) bt
#0  0x001b46d0 in core.thread.thread_suspendHandler(int).op(void*) (sp=0xb572f900 "$F\033") at thread.d:464
#1  0x001b483c in core.thread.callWithStackShell(scope void(void*) nothrow delegate) (fn=...) at thread.d:2600
#2  0x001b45f8 in thread_suspendHandler (sig=10) at thread.d:487
#3  0xfffffffe in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) n

Thread 2 "druntime-test-r" received signal SIGSEGV, Segmentation fault.
0xfffffffc in ?? ()
(gdb) bt
#0  0xfffffffc in ?? ()
#1  0xfffffffe in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)



January 10, 2018
On 10 Jan 2018, at 0:27, Radu via digitalmars-d-ldc wrote:
> From what I can see in gdb 'thread_resumeHandler' is to blame, it looks like 'sem_post( &suspendCount )' will immediately trigger the resumeSignal and the call for 'sigsuspend( &sigres )' is never made.

You mean thread_suspendHandler? Perhaps single-stepping through the code and having a look where the stack is corrupted would yield some insight? Is there possibly some ABI incompatibility caused by callWithStackShell?
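On the callWithStackShell question: its job is to get the callee-saved registers onto the stack before the stack is scanned, so that pointers held only in registers are not missed. druntime's version is architecture-specific. A common portable approximation in conservative collectors is `setjmp`. The C sketch below illustrates only that approximation. It assumes a downward-growing stack and GCC/Clang's `noinline` attribute. `call_with_stack_shell` and `record_stack_top` are hypothetical names, and the sketch says nothing about the ABI of druntime's actual routine.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdint.h>

/* Callback receiving the lowest address to scan (stack assumed to grow down). */
typedef void (*stack_fn)(void *stack_top, void *ctx);

/* Spill callee-saved registers into a jmp_buf in this frame, then call fn.
 * Scanning upward from the jmp_buf covers the spilled registers plus every
 * caller frame. noinline keeps this frame distinct from the caller's. */
__attribute__((noinline))
void call_with_stack_shell(stack_fn fn, void *ctx)
{
    jmp_buf regs;             /* setjmp stores the register state here, on the stack */
    if (setjmp(regs) == 0) {  /* never longjmp'd to; used only for the spill */
        fn((void *)regs, ctx);
    }
}

/* Demo callback: records the stack address it was handed. */
void record_stack_top(void *stack_top, void *ctx)
{
    *(uintptr_t *)ctx = (uintptr_t)stack_top;
}
```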

sem_post shouldn't cause anything to happen on the calling thread itself; and it is explicitly documented to be re-entrant w.r.t. signals.

 —David
January 10, 2018
On Wednesday, 10 January 2018 at 11:13:17 UTC, David Nadlinger wrote:
> On 10 Jan 2018, at 0:27, Radu via digitalmars-d-ldc wrote:
>> From what I can see in gdb 'thread_resumeHandler' is to blame, it looks like 'sem_post( &suspendCount )' will immediately trigger the resumeSignal and the call for 'sigsuspend( &sigres )' is never made.
>
> You mean thread_suspendHandler? Perhaps single-stepping through the code and having a look where the stack is corrupted would yield some insight? Is there possibly some ABI incompatibility caused by callWithStackShell?
>
> sem_post shouldn't cause anything to happen on the calling thread itself; and it is explicitly documented to be re-entrant w.r.t. signals.
>
>  —David

David, indeed sem_post works correctly; I guess gdb showed the sequence in the wrong order.

Moving the breakpoint to thread_resumeHandler, I can see that the handler gets called, but I think you are right about the ABI. Observe:

Thread 2 "druntime-test-r" received signal SIGUSR2, User defined signal 2.
0xb6e88648 in ?? () from target:/lib/libc.so.1
(gdb) bt
#0  0xb6e88648 in ?? () from target:/lib/libc.so.1
#1  0xb6e50dd0 in sigsuspend () from target:/lib/libc.so.1
#2  0x001b46e8 in core.thread.thread_suspendHandler(int).op(void*) (sp=0xb572f900 "$F\033") at thread.d:467
#3  0x001b483c in core.thread.callWithStackShell(scope void(void*) nothrow delegate) (fn=...) at thread.d:2600
#4  0x001b45f8 in thread_suspendHandler (sig=10) at thread.d:487
#5  0xfffffffe in ?? ()
(gdb) c
Thread 2 "druntime-test-r" hit Breakpoint 1, thread_resumeHandler (sig=12) at thread.d:494
warning: Source file is more recent than executable.
494                 assert( sig == resumeSignalNumber );
(gdb) i f
Stack level 0, frame at 0xb572f4d8:
 pc = 0x1b487c in thread_resumeHandler (thread.d:494); saved pc = 0xfffffffe
 called by frame at 0xb572f4d8
 source language d.
 Arglist at 0xb572f4c8, args: sig=12
 Locals at 0xb572f4c8, Previous frame's sp is 0xb572f4d8
 Saved registers:
  r11 at 0xb572f4d0, lr at 0xb572f4d4
.......
(gdb) disas
Dump of assembler code for function thread_resumeHandler:
   0x001b4864 <+0>:     push    {r11, lr}
   0x001b4868 <+4>:     mov     r11, sp
   0x001b486c <+8>:     sub     sp, sp, #8
   0x001b4870 <+12>:    ldr     r1, [pc, #52]   ; 0x1b48ac <thread_resumeHandler+72>
   0x001b4874 <+16>:    ldr     r1, [pc, r1]
   0x001b4878 <+20>:    str     r0, [sp, #4]
   0x001b487c <+24>:    ldr     r0, [sp, #4]
   0x001b4880 <+28>:    ldr     r1, [r1]
   0x001b4884 <+32>:    cmp     r0, r1
   0x001b4888 <+36>:    bne     0x1b4894 <thread_resumeHandler+48>
   0x001b488c <+40>:    mov     sp, r11
=> 0x001b4890 <+44>:    pop     {r11, pc}
   0x001b4894 <+48>:    ldr     r0, [pc, #20]   ; 0x1b48b0 <thread_resumeHandler+76>
   0x001b4898 <+52>:    add     r1, pc, r0
   0x001b489c <+56>:    mov     r0, #13
   0x001b48a0 <+60>:    mov     r2, #238        ; 0xee
   0x001b48a4 <+64>:    orr     r2, r2, #256    ; 0x100
   0x001b48a8 <+68>:    bl      0xf00c8 <_d_assert>
   0x001b48ac <+72>:    mulseq  r4, r8, r5
   0x001b48b0 <+76>:                    ; <UNDEFINED> instruction: 0x00117bd1

(gdb) ni
0x001b4890 in thread_resumeHandler (sig=-2) at thread.d:499
499             }
Warning:
Cannot insert breakpoint 0.
Cannot access memory at address 0xfffffffe

It looks like the PC restored on return from the handler (0xfffffffe) is invalid, which causes the segmentation fault.
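One cheap way to follow up on the ABI suspicion is to have the target's own C compiler report how the C library lays out the types the suspend/resume path relies on (`sigset_t`, `struct sigaction`, `sem_t`). Those numbers can then be compared with druntime's D declarations in `core.sys.posix.signal` and `core.sys.posix.semaphore`, which were written with glibc in mind. For example, a different `sizeof(sigset_t)` would shift the offsets of any fields declared after `sa_mask`. This is a suggested diagnostic under that assumption, not a confirmed cause. `describe_signal_abi` is a hypothetical helper; build it with the target cross-compiler (e.g. arm-openwrt-linux-gcc) and run it on the device.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <semaphore.h>
#include <signal.h>
#include <stddef.h>
#include <stdio.h>

/* Format the sizes and field offsets of C types used on the suspend/resume
 * path, as the C library being compiled against defines them.
 * Returns the snprintf result (number of characters that would be written). */
int describe_signal_abi(char *buf, size_t len)
{
    return snprintf(buf, len,
                    "sigset_t=%zu sigaction=%zu sa_mask@%zu sa_flags@%zu sem_t=%zu",
                    sizeof(sigset_t),
                    sizeof(struct sigaction),
                    offsetof(struct sigaction, sa_mask),
                    offsetof(struct sigaction, sa_flags),
                    sizeof(sem_t));
}
```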
