Thread overview
Re: Illegal Instruction
Nov 07, 2017  David Nadlinger
Nov 07, 2017  Russel Winder
Nov 07, 2017  David Nadlinger
Nov 07, 2017  Russel Winder
Nov 07, 2017  kinke
Nov 07, 2017  David Nadlinger
Dec 12, 2017  Bottled Gin
Dec 23, 2017  David Nadlinger
Mar 24, 2018  Russel Winder
Mar 24, 2018  David Nadlinger
Mar 25, 2018  Russel Winder
Mar 25, 2018  David Nadlinger
Mar 26, 2018  Russel Winder
November 07, 2017
On 7 Nov 2017, at 16:03, Russel Winder via digitalmars-d-ldc wrote:
> Using the LDC2 compiler up to date on Debian Sid:
>
> LDC - the LLVM D compiler (1.4.0):
>   based on DMD v2.074.1 and LLVM 5.0.0
>   built with LDC - the LLVM D compiler (0.17.5)
>   Default target: x86_64-pc-linux-gnu
>
> in debug mode I get a program that runs (albeit the thread messaging
> fails to work), whereas if I use release mode I get an Illegal
> Instruction. I am guessing this is an LDC2 problem?

What is the failing instruction (use gdb)? How can we reproduce the issue?

SIGILL can be due to hitting an assert(0) or similarly unreachable code (LLVM emits ud2 as a trap instruction), or due to genuine instructions not supported on the target (e.g. AVX2, etc.).
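
As a rough aid for spotting this case from a script: a shell reports a child killed by a signal as exit status 128 plus the signal number, and SIGILL is signal 4 on Linux, so the status is 132. The sketch below assumes a POSIX shell on Linux; the helper name died_of_sigill is made up for illustration, and a child shell that sends itself SIGILL stands in for the crashing program.

```shell
#!/bin/sh
# Assumption: Linux, where SIGILL is signal number 4.

# died_of_sigill STATUS (hypothetical helper): succeeds when STATUS is
# the exit status a shell reports for a child terminated by SIGILL,
# i.e. 128 + 4.
died_of_sigill() {
    [ "$1" -eq 132 ]
}

# Stand-in for the crashing program: a child shell that sends itself
# SIGILL. The "|| status=$?" form keeps the script going under "set -e".
status=0
sh -c 'kill -s ILL $$' || status=$?

if died_of_sigill "$status"; then
    echo "exit status $status: killed by SIGILL"
else
    echo "exit status $status: not SIGILL"
fi
```

The exit status only says that a SIGILL was delivered. Telling a ud2 trap apart from an unsupported instruction still requires looking at the faulting instruction in a debugger.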

 — David
November 07, 2017
David,

> What is the failing instruction (use gdb)? How can we reproduce the issue?

To date gdb has failed me, but this is most certainly because I am effectively a newbie at using gdb. What I can show is:

(gdb) r
Starting program: /home/users/russel/BuildArea/Me-TV_D/me-tv
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffe7c15700 (LWP 22993)]
[New Thread 0x7fffe7414700 (LWP 22994)]
[New Thread 0x7fffe50f4700 (LWP 22995)]
[New Thread 0x7fffe48f3700 (LWP 22996)]
[New Thread 0x7fffd7fff700 (LWP 22997)]
Control window daemon going into receive.

Thread 4 "me-tv" received signal SIGUSR1, User defined signal 1.
[Switching to Thread 0x7fffe50f4700 (LWP 22995)]
pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185	../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: No such file or directory.
(gdb) bt
#0  0x00007ffff54b315f in pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007ffff595fc90 in core.sync.condition.Condition.wait() () at /usr/lib/x86_64-linux-gnu/libdruntime-ldc.so.74
#2  0x00005555555ee9bf in std.concurrency.MessageBox.get!(void(frontend_manager.FrontendAppeared) @safe function, void(frontend_manager.AdapterDisappeared) @safe function, void(frontend_manager.FrontendDisappeared) @safe function, void(std.concurrency.OwnerTerminated) @safe delegate).get(scope void(frontend_manager.FrontendAppeared) @safe function, scope void(frontend_manager.AdapterDisappeared) @safe function, scope void(frontend_manager.FrontendDisappeared) @safe function, scope void(std.concurrency.OwnerTerminated) @safe delegate) ()
#3  0x00005555555ed529 in control_window.runControlWindowDaemon() ()
#4  0x00007ffff596199a in thread_entryPoint () at /usr/lib/x86_64-linux-gnu/libdruntime-ldc.so.74
#5  0x00007ffff54ad494 in start_thread (arg=0x7fffe50f4700) at pthread_create.c:333
#6  0x00007ffff4eedabf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
(gdb)

The problem of reproducibility has been challenging me for a day as what's the point of a bug report without a bug to be shown. I guess this is why I arrived here first. I have no clue what to drag out of my code as a small exemplar of the problem, and I am sure no-one wants to work with my code.

> SIGILL can be due to hitting an assert(0) or similarly unreachable code
> (LLVM emits ud2 as a trap instruction), or due to genuine instructions
> not supported on the target (e.g. AVX2, etc.).

In case there is any doubt about whether it is an illegal instruction (and it may not be), this is what I get when running outside gdb:

Control window daemon going into receive.
/dev/dvb/adapter0/frontend0 being added.
Illegal instruction

I think what I am really asking here is for guidance on providing the data that will be helpful to people.

-- 
Russel.
==========================================
Dr Russel Winder      t: +44 20 7585 2200
41 Buckmaster Road    m: +44 7770 465 077
London SW11 1EN, UK   w: www.russel.org.uk


November 07, 2017
Hi Russel,

On 7 Nov 2017, at 16:54, Russel Winder wrote:
> To date gdb has failed me, but this is most certainly because I am
> effectively a newbie at using gdb.

Running "disas(semble)" once you hit the illegal instruction should print the code surrounding the instruction pointer in disassembled form. Seeing which instruction fails and where it is would be a first step towards reducing/tracking down the issue.

The output you show has gdb stopping on SIGUSR1, which is (was?) used by the GC internally to synchronise between threads. You might need to run "handle SIGUSR1 noprint nostop" to avoid having to manually continue each time until you actually hit the illegal instruction.

GDB should automatically switch to the faulting thread, but if it doesn't, "info threads" to display a list of all threads and "thread <n>" to switch between them might be helpful.
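
The steps above could also be collected into a gdb command file so the session does not have to be typed by hand. This is only a sketch: the file name sigill.gdb and the helper write_gdb_commands are made up, and the SIGUSR2 line is an assumption on top of what is described above (druntime's GC is understood to use SIGUSR2 to resume threads it suspended with SIGUSR1).

```shell
#!/bin/sh
# Sketch only: sigill.gdb and write_gdb_commands are hypothetical names.

# write_gdb_commands FILE: write a gdb command file that stops gdb from
# halting on the signals the D runtime uses for thread suspension, runs
# the program, and then reports where it stopped.
write_gdb_commands() {
    cat > "$1" <<'EOF'
handle SIGUSR1 noprint nostop
handle SIGUSR2 noprint nostop
run
info threads
bt
x/i $pc
EOF
}

cmd_file="${TMPDIR:-/tmp}/sigill.gdb"
write_gdb_commands "$cmd_file"
echo "wrote $cmd_file"

# Usage (needs gdb and the program under test, so it is not run here):
#   gdb -batch -x "$cmd_file" --args ./me-tv
```

"x/i $pc" prints just the instruction at the current program counter; replace it with "disas" to see the surrounding code as well.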

Best,
David
November 07, 2017
David,

> Running "disas(semble)" once you hit the illegal instruction should
> print the code surrounding the instruction pointer in disassembled
> form. Seeing which instruction fails and where it is would be a first
> step towards reducing/tracking down the issue.

I think I may now have a clue for you.

> The output you show has gdb stopping on SIGUSR1, which is (was?) used
> by the GC internally to synchronise between threads. You might need to
> run "handle SIGUSR1 noprint nostop" to avoid having to manually
> continue each time until you actually hit the illegal instruction.

Eminently successful, :-)

> GDB should automatically switch to the faulting thread, but if it doesn't, "info threads" to display a list of all threads and "thread <n>" to switch between them might be helpful.

(gdb) info threads
  Id   Target Id         Frame
  1    Thread 0x7ffff7fb2700 (LWP 27525) "me-tv" 0x00007ffff4ee466d in poll () at ../sysdeps/unix/syscall-template.S:84
  2    Thread 0x7fffe7c15700 (LWP 27529) "gmain" 0x00007ffff4ee466d in poll () at ../sysdeps/unix/syscall-template.S:84
  3    Thread 0x7fffe7414700 (LWP 27530) "gdbus" 0x00007ffff4ee466d in poll () at ../sysdeps/unix/syscall-template.S:84
* 4    Thread 0x7fffe50f4700 (LWP 27531) "me-tv" 0x0000555555600c9a in std.variant.VariantN!(32uL).VariantN.handler!(frontend_manager.FrontendAppeared).handler(std.variant.VariantN!(32uL).VariantN.OpID, ubyte[32]*, void*) ()
  5    Thread 0x7fffe48f3700 (LWP 27532) "me-tv" __lll_unlock_wake () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:371
  6    Thread 0x7fffd7fff700 (LWP 27533) "me-tv" clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:80

So it seems to have switched as expected since:

Thread 4 "me-tv" received signal SIGILL, Illegal instruction.
[Switching to Thread 0x7fffe50f4700 (LWP 27531)]
0x0000555555600c9a in std.variant.VariantN!(32uL).VariantN.handler!(frontend_manager.FrontendAppeared).handler(std.variant.VariantN!(32uL).VariantN.OpID, ubyte[32]*, void*) ()

I won't put the full disassembly here just now, in case the answer is in this fragment:


(gdb) disas
Dump of assembler code for function _D3std7variant18__T8VariantNVmi32Z8VariantN50__T7handlerTS16frontend_manager16FrontendAppearedZ7handlerFE3std7variant18__T8VariantNVmi32Z8VariantN4OpIDPG32hPvZl:
   0x0000555555600800 <+0>:	push   %rbp
   0x0000555555600801 <+1>:	push   %r15
   0x0000555555600803 <+3>:	push   %r14
   0x0000555555600805 <+5>:	push   %r13
   0x0000555555600807 <+7>:	push   %r12
   0x0000555555600809 <+9>:	push   %rbx
   0x000055555560080a <+10>:	sub    $0xb8,%rsp
   0x0000555555600811 <+17>:	mov    %edx,%ebp
   0x0000555555600813 <+19>:	mov    %rsi,%r14
   0x0000555555600816 <+22>:	mov    %rdi,%r12

…

   0x0000555555600c6c <+1132>:	movups %xmm3,0x40(%rax)
   0x0000555555600c70 <+1136>:	movups %xmm2,0x30(%rax)
   0x0000555555600c74 <+1140>:	movups %xmm1,0x20(%rax)
   0x0000555555600c78 <+1144>:	movups %xmm0,0x10(%rax)
   0x0000555555600c7c <+1148>:	mov    0x23c145(%rip),%rsi        # 0x55555583cdc8
   0x0000555555600c83 <+1155>:	lea    0x23ce06(%rip),%rdx        # 0x55555583da90 <_D46TypeInfo_S16frontend_manager16FrontendAppeared6__initZ>
   0x0000555555600c8a <+1162>:	mov    %rax,%rdi
   0x0000555555600c8d <+1165>:	callq  0x5555555ec440 <_D3std7variant16VariantException6__ctorMFC8TypeInfoC8TypeInfoZC3std7variant16VariantException@plt>
   0x0000555555600c92 <+1170>:	mov    %rax,%rdi
   0x0000555555600c95 <+1173>:	callq  0x5555555ec180 <_d_throw_exception@plt>
=> 0x0000555555600c9a <+1178>:	ud2
   0x0000555555600c9c <+1180>:	mov    %rax,%rbx
   0x0000555555600c9f <+1183>:	xor    %edi,%edi
   0x0000555555600ca1 <+1185>:	mov    $0xd,%edx
   0x0000555555600ca6 <+1190>:	lea    0x38(%rsp),%rsi
   0x0000555555600cab <+1195>:	callq  *0x30(%rsp)
   0x0000555555600caf <+1199>:	mov    %rbx,%rdi
   0x0000555555600cb2 <+1202>:	callq  0x5555555ec600 <_d_eh_resume_unwind@plt>
End of assembler dump.
(gdb)



-- 
Russel.


November 07, 2017
On Tuesday, 7 November 2017 at 18:06:44 UTC, Russel Winder wrote:
>    0x0000555555600c95 <+1173>:	callq  0x5555555ec180 <_d_throw_exception@plt>
> => 0x0000555555600c9a <+1178>:	ud2

I'm definitely no EH expert, but this looks as if Russel's hitting an unreachable right after calling _d_throw_exception(), where we certainly aren't supposed to resume from afterwards. So something bad must be happening during unwinding.

November 07, 2017
On 7 Nov 2017, at 18:06, Russel Winder wrote:
>    0x0000555555600c8d <+1165>:	callq  0x5555555ec440 <_D3std7variant16VariantException6__ctorMFC8TypeInfoC8TypeInfoZC3std7variant16VariantException@plt>
>    0x0000555555600c92 <+1170>:	mov    %rax,%rdi
>    0x0000555555600c95 <+1173>:	callq  0x5555555ec180 <_d_throw_exception@plt>
> => 0x0000555555600c9a <+1178>:	ud2
>    0x0000555555600c9c <+1180>:	mov    %rax,%rbx

Interesting… For all the world, this looks like _d_throw_exception in fact returned, which it never should (we put an unreachable instruction there, which probably gets translated into ud2).

You could try using GDB's reverse debugging support to step backwards through the code to figure out whether that's really the case, and if so, where unwinding goes wrong.

 — David

December 12, 2017
A simple testcase:

// test.d
void foo() {}


// main.d
import test;
void main() {
  foo();
}


$ ldmd2 -fPIC -g -shared test.d
$ ldmd2 main.d -g -L-ltest -L-L/tmp -L-R/tmp
$ ./main
Illegal instruction (core dumped)

$ gdb ./main
(gdb) bt
#0  0x00007ffff7b73158 in _d_dso_registry () from /tmp/libtest.so
#1  0x0000000000400822 in ldc.register_dso ()
#2  0x0000000000400842 in ldc.dso_ctor.4main ()
#3  0x00000000004008ed in __libc_csu_init ()
#4  0x00007ffff776a7bf in __libc_start_main (main=0x400870 <main>, argc=1,
    argv=0x7fffffffdf58, init=0x4008a0 <__libc_csu_init>, fini=<optimized out>,
    rtld_fini=<optimized out>, stack_end=0x7fffffffdf48) at ../csu/libc-start.c:247
#5  0x00000000004006c9 in _start ()
(gdb) quit
A debugging session is active.

        Inferior 1 [process 8038] will be killed.

December 23, 2017
On Tuesday, 12 December 2017 at 13:55:45 UTC, Bottled Gin wrote:
> $ ldmd2 -fPIC -g -shared test.d
> $ ldmd2 main.d -g -L-ltest -L-L/tmp -L-R/tmp
> $ ./main
> Illegal instruction (core dumped)

Just for the record (I had responded on Gitter already), the problem here is trying to mix shared libraries and the static druntime build; https://github.com/ldc-developers/ldc/pull/2454 converts the assert(0) trap into a nice error message.

 — David
March 24, 2018
On Sat, 2017-12-23 at 20:29 +0000, David Nadlinger via digitalmars-d-ldc wrote:
> On Tuesday, 12 December 2017 at 13:55:45 UTC, Bottled Gin wrote:
> > $ ldmd2 -fPIC -g -shared test.d
> > $ ldmd2 main.d -g -L-ltest -L-L/tmp -L-R/tmp
> > $ ./main
> > Illegal instruction (core dumped)
> 
> Just for the record (I had responded on Gitter already), the problem here is trying to mix shared libraries and the static druntime build; https://github.com/ldc-developers/ldc/pull/2454 converts the assert(0) trap into a nice error message.
> 
>   — David

Apologies for the delay in getting back to this, long story but involves working with Rust for a while. <shock-horror/>

I just did a brand new rebuild on Debian Sid and this still seems to happen. Does anyone have a workaround that doesn't involve using dmd instead of ldc2?

-- 
Russel.


March 24, 2018
On Saturday, 24 March 2018 at 16:07:47 UTC, Russel Winder wrote:
> I just did a brand new rebuild on Debian Sid and this still seems to happen. Does anyone have a workaround that doesn't involve using dmd instead of ldc2?

Ubuntu 17.10 works as expected:

---
$ . ~/dlang/ldc-1.8.0/activate
$ cat > test.d
void foo() {}
^D
$ cat > main.d
import test;
void main() {
  foo();
}
^D
$ ldmd2 -fPIC -g -shared test.d
$ ldmd2 main.d -g -L-ltest -L-L. -L-R.
$ ./main
Aborting from rt/sections_elf_shared.d(477) Only one D shared object allowed for static runtime. Link with shared runtime via LDC switch '-link-defaultlib-shared'.
Aborted (core dumped)
$ ldmd2 main.d -g -L-ltest -L-L. -L-R. -link-defaultlib-shared
$ ./main
---
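
To check which runtime a given binary or shared library ended up with, one option is to filter its ldd output for the druntime shared object. A minimal sketch: the helper name links_shared_druntime is made up, the pattern assumes the library file name starts with "libdruntime-ldc" (as in the libdruntime-ldc.so.74 path from the backtrace earlier in the thread), and the sample ldd line is illustrative rather than captured output.

```shell
#!/bin/sh
# links_shared_druntime (hypothetical helper): reads ldd output on stdin
# and succeeds if a shared druntime library is among the dependencies.
# Assumption: the library file name starts with "libdruntime-ldc".
links_shared_druntime() {
    grep -q 'libdruntime-ldc'
}

# Illustrative sample line, not real captured output.
sample='libdruntime-ldc.so.74 => /usr/lib/x86_64-linux-gnu/libdruntime-ldc.so.74 (0x00007ffff5900000)'

if printf '%s\n' "$sample" | links_shared_druntime; then
    echo "shared druntime"
else
    echo "no shared druntime found (static runtime or not a D binary)"
fi

# Usage against real files (needs ldd and the built artefacts):
#   ldd ./main       | links_shared_druntime && echo "main: shared druntime"
#   ldd ./libtest.so | links_shared_druntime && echo "libtest.so: shared druntime"
```

If the executable reports no shared druntime while it loads a D shared library, that matches the mixed static/shared setup the abort message above complains about.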

 — David