Re: Illegal Instruction
Nov 07
kinke
November 07
On 7 Nov 2017, at 16:03, Russel Winder via digitalmars-d-ldc wrote:
> Using the LDC2 compiler up to date on Debian Sid:
>
> LDC - the LLVM D compiler (1.4.0):
>   based on DMD v2.074.1 and LLVM 5.0.0
>   built with LDC - the LLVM D compiler (0.17.5)
>   Default target: x86_64-pc-linux-gnu
>
> in debug mode I get a program that runs (albeit the thread messaging
> fails to work), whereas if I use release mode I get an Illegal
> Instruction. I am guessing this is an LDC2 problem?

What is the failing instruction (use gdb)? How can we reproduce the issue?

SIGILL can be due to hitting an assert(0) or similarly unreachable code (LLVM emits ud2 as a trap instruction), or due to genuine instructions not supported on the target (e.g. AVX2, etc.).

 — David
November 07
David,

> What is the failing instruction (use gdb)? How can we reproduce the issue?

To date gdb has failed me, but this is most certainly because I am effectively a newbie at using gdb. What I can show is:

(gdb) r
Starting program: /home/users/russel/BuildArea/Me-TV_D/me-tv
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffe7c15700 (LWP 22993)]
[New Thread 0x7fffe7414700 (LWP 22994)]
[New Thread 0x7fffe50f4700 (LWP 22995)]
[New Thread 0x7fffe48f3700 (LWP 22996)]
[New Thread 0x7fffd7fff700 (LWP 22997)]
Control window daemon going into receive.

Thread 4 "me-tv" received signal SIGUSR1, User defined signal 1.
[Switching to Thread 0x7fffe50f4700 (LWP 22995)]
pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185	../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: No such file or directory.
(gdb) bt
#0  0x00007ffff54b315f in pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007ffff595fc90 in core.sync.condition.Condition.wait() () at /usr/lib/x86_64-linux-gnu/libdruntime-ldc.so.74
#2  0x00005555555ee9bf in std.concurrency.MessageBox.get!(void(frontend_manager.FrontendAppeared) @safe function, void(frontend_manager.AdapterDisappeared) @safe function, void(frontend_manager.FrontendDisappeared) @safe function, void(std.concurrency.OwnerTerminated) @safe delegate).get(scope void(frontend_manager.FrontendAppeared) @safe function, scope void(frontend_manager.AdapterDisappeared) @safe function, scope void(frontend_manager.FrontendDisappeared) @safe function, scope void(std.concurrency.OwnerTerminated) @safe delegate) ()
#3  0x00005555555ed529 in control_window.runControlWindowDaemon() ()
#4  0x00007ffff596199a in thread_entryPoint () at /usr/lib/x86_64-linux-gnu/libdruntime-ldc.so.74
#5  0x00007ffff54ad494 in start_thread (arg=0x7fffe50f4700) at pthread_create.c:333
#6  0x00007ffff4eedabf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
(gdb)

Reproducibility has been challenging me for a day, since there is little point in a bug report without a bug to show. I guess this is why I arrived here first. I have no clue what to extract from my code as a small exemplar of the problem, and I am sure no-one wants to work with my code.

> SIGILL can be due to hitting an assert(0) or similarly unreachable code
> (LLVM emits ud2 as a trap instruction), or due to genuine instructions
> not supported on the target (e.g. AVX2, etc.).

In case there is any doubt about it being an illegal instruction (which it may not be), this is the output when run outside gdb:

Control window daemon going into receive.
/dev/dvb/adapter0/frontend0 being added.
Illegal instruction
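
As a cross-check, the shell prints that message when the process has been killed by SIGILL. Below is a minimal Python sketch of how one might confirm the terminating signal from a script. The helper name `terminating_signal` is made up for illustration, and the child process is only a stand-in that sends SIGILL to itself, not the real me-tv binary.

```python
import signal
import subprocess
import sys


def terminating_signal(argv):
    """Run argv and return the name of the signal that killed it, or None."""
    returncode = subprocess.run(argv).returncode
    # On POSIX, subprocess reports death by signal N as return code -N.
    if returncode < 0:
        return signal.Signals(-returncode).name
    return None


if __name__ == "__main__":
    # Hypothetical stand-in for the crashing program: a child that
    # sends SIGILL to itself instead of executing a bad instruction.
    child = [sys.executable, "-c",
             "import os, signal; os.kill(os.getpid(), signal.SIGILL)"]
    print(terminating_signal(child))
```

This only says which signal ended the process; it cannot tell a deliberate trap from a genuinely unsupported instruction, which is what the disassembly is for.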

I think what I am really asking for here is guidance on providing the data that will be helpful to people.

-- 
Russel.
==========================================
Dr Russel Winder      t: +44 20 7585 2200
41 Buckmaster Road    m: +44 7770 465 077
London SW11 1EN, UK   w: www.russel.org.uk


November 07
Hi Russel,

On 7 Nov 2017, at 16:54, Russel Winder wrote:
> To date gdb has failed me, but this is most certainly because I am
> effectively a newbie at using gdb.

Running "disas(semble)" once you hit the illegal instruction should print the code surrounding the instruction pointer in disassembled form. Seeing which instruction fails and where it is would be an inroad towards reducing/tracking down the issue.

The output you show has gdb stopping on SIGUSR1, which is (was?) used by the GC internally to synchronise between threads. You might need to run "handle SIGUSR1 noprint nostop" to avoid having to manually continue each time until you actually hit the illegal instruction.

GDB should automatically switch to the faulting thread, but if it doesn't, "info threads" to display a list of all threads and "thread <n>" to switch between them might be helpful.
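
If typing these commands interactively is awkward, they could be collected into one non-interactive gdb run. The following is a rough Python sketch that only assembles such a command line. The function name `gdb_batch_command` and the `./me-tv` path are placeholders, and adding `bt` alongside the commands mentioned above is my own choice.

```python
import shlex


def gdb_batch_command(binary, program_args=()):
    """Build a gdb command line that runs `binary` until it stops, then dumps state."""
    gdb_commands = [
        "handle SIGUSR1 noprint nostop",  # do not stop on the GC's signal
        "run",                            # run until the program stops or exits
        "info threads",                   # list threads; '*' marks the current one
        "bt",                             # backtrace of the current thread
        "disassemble",                    # code around the current instruction pointer
    ]
    argv = ["gdb", "-batch"]
    for command in gdb_commands:
        argv += ["-ex", command]
    # Everything after --args is the program and its own arguments.
    return argv + ["--args", binary, *program_args]


if __name__ == "__main__":
    # "./me-tv" is a placeholder path; substitute the real binary.
    print(shlex.join(gdb_batch_command("./me-tv")))
```

Pasting the printed command into a shell and redirecting its output to a file should give something that can be attached to a bug report.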

Best,
David
November 07
David,

> Running "disas(semble)" once you hit the illegal instruction should
> print the code surrounding the instruction pointer in disassembled form.
> Seeing which instruction fails and where it is would be an inroad
> towards reducing/tracking down the issue.

I think I may now have a clue for you.

> The output you show has gdb stopping on SIGUSR1, which is (was?) used by
> the GC internally to synchronise between threads. You might need to run
> "handle SIGUSR1 noprint nostop" to avoid having to manually continue
> each time until you actually hit the illegal instruction.

Eminently successful, :-)

> GDB should automatically switch to the faulting thread, but if it doesn't, "info threads" to display a list of all threads and "thread <n>" to switch between them might be helpful.

(gdb) info threads
  Id   Target Id         Frame
  1    Thread 0x7ffff7fb2700 (LWP 27525) "me-tv" 0x00007ffff4ee466d in poll () at ../sysdeps/unix/syscall-template.S:84
  2    Thread 0x7fffe7c15700 (LWP 27529) "gmain" 0x00007ffff4ee466d in poll () at ../sysdeps/unix/syscall-template.S:84
  3    Thread 0x7fffe7414700 (LWP 27530) "gdbus" 0x00007ffff4ee466d in poll () at ../sysdeps/unix/syscall-template.S:84
* 4    Thread 0x7fffe50f4700 (LWP 27531) "me-tv" 0x0000555555600c9a in std.variant.VariantN!(32uL).VariantN.handler!(frontend_manager.FrontendAppeared).handler(std.variant.VariantN!(32uL).VariantN.OpID, ubyte[32]*, void*) ()
  5    Thread 0x7fffe48f3700 (LWP 27532) "me-tv" __lll_unlock_wake () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:371
  6    Thread 0x7fffd7fff700 (LWP 27533) "me-tv" clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:80

So it seems to have switched as expected since:

Thread 4 "me-tv" received signal SIGILL, Illegal instruction.
[Switching to Thread 0x7fffe50f4700 (LWP 27531)]
0x0000555555600c9a in std.variant.VariantN!(32uL).VariantN.handler!(frontend_manager.FrontendAppeared).handler(std.variant.VariantN!(32uL).VariantN.OpID, ubyte[32]*, void*) ()

I won't put the full disassembly here just now, in case the answer is in this fragment:


(gdb) disas
Dump of assembler code for function _D3std7variant18__T8VariantNVmi32Z8VariantN50__T7handlerTS16frontend_manager16FrontendAppearedZ7handlerFE3std7variant18__T8VariantNVmi32Z8VariantN4OpIDPG32hPvZl:
   0x0000555555600800 <+0>:	push   %rbp
   0x0000555555600801 <+1>:	push   %r15
   0x0000555555600803 <+3>:	push   %r14
   0x0000555555600805 <+5>:	push   %r13
   0x0000555555600807 <+7>:	push   %r12
   0x0000555555600809 <+9>:	push   %rbx
   0x000055555560080a <+10>:	sub    $0xb8,%rsp
   0x0000555555600811 <+17>:	mov    %edx,%ebp
   0x0000555555600813 <+19>:	mov    %rsi,%r14
   0x0000555555600816 <+22>:	mov    %rdi,%r12

…

   0x0000555555600c6c <+1132>:	movups %xmm3,0x40(%rax)
   0x0000555555600c70 <+1136>:	movups %xmm2,0x30(%rax)
   0x0000555555600c74 <+1140>:	movups %xmm1,0x20(%rax)
   0x0000555555600c78 <+1144>:	movups %xmm0,0x10(%rax)
   0x0000555555600c7c <+1148>:	mov    0x23c145(%rip),%rsi        # 0x55555583cdc8
   0x0000555555600c83 <+1155>:	lea    0x23ce06(%rip),%rdx        # 0x55555583da90 <_D46TypeInfo_S16frontend_manager16FrontendAppeared6__initZ>
   0x0000555555600c8a <+1162>:	mov    %rax,%rdi
   0x0000555555600c8d <+1165>:	callq  0x5555555ec440 <_D3std7variant16VariantException6__ctorMFC8TypeInfoC8TypeInfoZC3std7variant16VariantException@plt>
   0x0000555555600c92 <+1170>:	mov    %rax,%rdi
   0x0000555555600c95 <+1173>:	callq  0x5555555ec180 <_d_throw_exception@plt>
=> 0x0000555555600c9a <+1178>:	ud2
   0x0000555555600c9c <+1180>:	mov    %rax,%rbx
   0x0000555555600c9f <+1183>:	xor    %edi,%edi
   0x0000555555600ca1 <+1185>:	mov    $0xd,%edx
   0x0000555555600ca6 <+1190>:	lea    0x38(%rsp),%rsi
   0x0000555555600cab <+1195>:	callq  *0x30(%rsp)
   0x0000555555600caf <+1199>:	mov    %rbx,%rdi
   0x0000555555600cb2 <+1202>:	callq  0x5555555ec600 <_d_eh_resume_unwind@plt>
End of assembler dump.
(gdb)



-- 
Russel.


November 07
On Tuesday, 7 November 2017 at 18:06:44 UTC, Russel Winder wrote:
>    0x0000555555600c95 <+1173>:	callq  0x5555555ec180 <_d_throw_exception@plt>
> => 0x0000555555600c9a <+1178>:	ud2

I'm definitely no EH expert, but this looks as if Russel is hitting an unreachable right after the call to _d_throw_exception(), from which execution is certainly never supposed to resume. So something bad must be happening during unwinding.

November 07
On 7 Nov 2017, at 18:06, Russel Winder wrote:
>    0x0000555555600c8d <+1165>:	callq  0x5555555ec440 <_D3std7variant16VariantException6__ctorMFC8TypeInfoC8TypeInfoZC3std7variant16VariantException@plt>
>    0x0000555555600c92 <+1170>:	mov    %rax,%rdi
>    0x0000555555600c95 <+1173>:	callq  0x5555555ec180 <_d_throw_exception@plt>
> => 0x0000555555600c9a <+1178>:	ud2
>    0x0000555555600c9c <+1180>:	mov    %rax,%rbx

Interesting… For all the world, this looks like _d_throw_exception in fact returned, which it never should (we put an unreachable instruction there, which probably gets translated into ud2).
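
To make the shape of that code concrete, here is a loose analogy in Python. It is not what LDC generates, and every name in it is invented for illustration. A call that is expected never to return is followed by a trap, so the trap can only fire if the "thrower" comes back.

```python
class Trap(Exception):
    """Stands in for the ud2 placed after a call that must not return."""


def well_behaved_throw():
    # Behaves as a throw should: control leaves via unwinding.
    raise RuntimeError("exception propagates to a handler")


def broken_throw():
    # Hypothetical failure mode: the "throw" simply returns.
    return None


def call_then_trap(thrower):
    thrower()  # expected never to return normally
    raise Trap("reached code marked unreachable")


def describe(thrower):
    """Report whether the call unwound normally or fell through to the trap."""
    try:
        call_then_trap(thrower)
    except Trap:
        return "trap"
    except RuntimeError:
        return "unwound"


if __name__ == "__main__":
    print(describe(well_behaved_throw))
    print(describe(broken_throw))
```

In the disassembly, the `callq` to `_d_throw_exception` plays the role of `thrower()` and the `ud2` plays the role of the trap.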

You could try using GDB's reverse debugging support to step backwards through the code to figure out whether that's really the case, and if so, where unwinding goes wrong.

 — David