Thread overview
Issues with Socket.accept() and SIGUSR1
Dec 08, 2017
LeqxLeqx
Dec 08, 2017
Adam D. Ruppe
Dec 08, 2017
Nemanja Boric
Dec 09, 2017
LeqxLeqx
Dec 08, 2017
Nemanja Boric
December 08, 2017
Hello,

I've been trying to create a small server-client program, and I've run into a rather strange problem. There's a thread separate from the main which accepts incoming connections. The established connections are then passed over to the main thread for the actual logic of the interaction, and the accept thread loops back to listen for further connections.

Normally the accept will throw a timeout and then the loop will continue to listen, but sometimes (and I can't find a decent pattern) the Socket.accept() method will raise a SIGUSR1 rather than throwing an exception of any kind.

Can anyone help with this? I have no idea why this happens. Below is the output from GDB when I've been testing it.


  Thread 2 "sverse" received signal SIGUSR1, User defined signal 1.
  [Switching to Thread 0x7ffff6a16700 (LWP 1562)]
  0x00007ffff72a5840 in __libc_accept (fd=3, addr=addr@entry=..., len=len@entry=0x0) at ../sysdeps/unix/sysv/linux/accept.c:26
  26	../sysdeps/unix/sysv/linux/accept.c: No such file or directory.
  (gdb) backtrace
  #0  0x00007ffff72a5840 in __libc_accept (fd=3, addr=addr@entry=..., len=len@entry=0x0) at ../sysdeps/unix/sysv/linux/accept.c:26
  #1  0x00005555555be5d5 in std.socket.Socket.accept() (this=0x7ffff7edc0e0) at ../../../../src/libphobos/src/std/socket.d:2817
  #2  0x00005555555801c1 in sverse.server.server.Server.acceptCallback() (this=0x7ffff7eda100) at ./sverse/server/server.d:264
  #3  0x00005555555fa842 in core.thread.Thread.run() (this=0x7ffff7eda200) at ../../../../src/libphobos/libdruntime/core/thread.d:1403
  #4  thread_entryPoint (arg=0x7ffff7eda200) at ../../../../src/libphobos/libdruntime/core/thread.d:392
  #5  0x00007ffff6c227fc in start_thread (arg=0x7ffff6a16700) at pthread_create.c:465
  #6  0x00007ffff72a4b0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
  (gdb) info locals
  resultvar = 18446744073709551612
  sc_cancel_oldtype = 0
  (gdb) up
  #1  0x00005555555be5d5 in std.socket.Socket.accept() (this=0x7ffff7edc0e0) at ../../../../src/libphobos/src/std/socket.d:2817
  2817	../../../../src/libphobos/src/std/socket.d: No such file or directory.
  (gdb) info locals
  newSocket = <optimized out>
  newsock = <optimized out>


Using GDC if that helps.

Any and all assistance will be greatly appreciated.

Thanks,

December 08, 2017
On Friday, 8 December 2017 at 22:27:41 UTC, LeqxLeqx wrote:
> Normally the accept will throw a timeout and then the loop will continue to listen, but sometimes (and I can't find a decent pattern) the Socket.accept() method will raise a SIGUSR1 rather than throwing an exception of any kind.

That probably means the *other* thread started a garbage collection cycle. The D GC uses that signal to pause threads while it scans memory, so they don't change out from under it mid-scan.

All you need to do is try the accept again if that happens. It isn't really an exception, it is just an EINTR (interrupted system call), and you are supposed to just try again when that happens (unless the interruption means the program has been told to terminate, e.g. by SIGINT).
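
A minimal sketch of that retry, assuming the failure surfaces from Phobos as a `SocketOSException` whose `errorCode` holds the errno value. `acceptRetrying` is a hypothetical helper name, not something Phobos provides, and it only handles EINTR; every other error is left to the caller.

```d
import core.stdc.errno : EINTR;
import std.socket : Socket, SocketOSException;

/// Hypothetical helper (not part of Phobos): calls accept() on the
/// listening socket and repeats the call when it failed only because
/// a signal interrupted it.
Socket acceptRetrying(Socket listener)
{
    for (;;)
    {
        try
        {
            return listener.accept();
        }
        catch (SocketOSException e)
        {
            // Assumption: errorCode carries errno from the failed accept().
            // Anything other than EINTR (timeouts, real errors) is rethrown.
            if (e.errorCode != EINTR)
                throw e;
            // EINTR: loop around and call accept() again.
        }
    }
}
```

In the accept thread this would stand in for the direct `accept()` call. Timeouts still arrive as exceptions, so the existing timeout handling in the loop should not need to change.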
December 08, 2017
On Friday, 8 December 2017 at 22:27:41 UTC, LeqxLeqx wrote:
> Hello,
>
> I've been trying to create a small server-client program, and I've run into a rather strange problem. There's a thread separate from the main which accepts incoming connections. The established connections are then passed over to the main thread for the actual logic of the interaction, and the accept thread loops back to listen for further connections.
>
> [...]

Looking at the trace, your thread 2 did not raise a signal; it received one. I have a feeling a GC collection started from another thread, and the GC sends SIGUSR1 to all threads registered with the runtime to pause them while the collection is running. As Adam said, repeat the call (check how Phobos sockets handle this and what you should do), and you probably want to instruct the debugger to ignore SIGUSR1/2.
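
For the debugger part, something along these lines at the GDB prompt (or in a `.gdbinit` file) should keep GDB from stopping on the two signals while still delivering them to the program. This is a sketch using GDB's standard `handle` command and assumes the runtime is using its default SIGUSR1/SIGUSR2 pair.

```
handle SIGUSR1 nostop noprint pass
handle SIGUSR2 nostop noprint pass
```

With `pass` set the signals still reach the threads, so the GC keeps working normally under the debugger.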
December 08, 2017
On Friday, 8 December 2017 at 23:11:47 UTC, Adam D. Ruppe wrote:
> On Friday, 8 December 2017 at 22:27:41 UTC, LeqxLeqx wrote:
>> Normally the accept will throw a timeout and then the loop will continue to listen, but sometimes (and I can't find a decent pattern) the Socket.accept() method will raise a SIGUSR1 rather than throwing an exception of any kind.
>
> That probably means the *other* thread started a garbage collection cycle. The D GC uses that signal to pause threads while it scans memory, so they don't change out from under it mid-scan.
>
> All you need to do is try the accept again if that happens. It isn't really an exception, it is just an EINTR (interrupted system call), and you are supposed to just try again when that happens (unless the interruption means the program has been told to terminate, e.g. by SIGINT).

Sorry, I've completely missed your first paragraph! It's after midnight here, good night!
December 09, 2017
On Friday, 8 December 2017 at 23:11:47 UTC, Adam D. Ruppe wrote:
> On Friday, 8 December 2017 at 22:27:41 UTC, LeqxLeqx wrote:
>> Normally the accept will throw a timeout and then the loop will continue to listen, but sometimes (and I can't find a decent pattern) the Socket.accept() method will raise a SIGUSR1 rather than throwing an exception of any kind.
>
> That probably means the *other* thread started a garbage collection cycle. The D GC uses that signal to pause threads while it scans memory, so they don't change out from under it mid-scan.
>
> All you need to do is try the accept again if that happens. It isn't really an exception, it is just an EINTR (interrupted system call), and you are supposed to just try again when that happens (unless the interruption means the program has been told to terminate, e.g. by SIGINT).

Thank you both for answering my stupid question.

Nonetheless, it seems that there is still something very strange going on. I'm getting a segfault (which was the error I got before I opened GDB and ran into the SIGUSR1 thing) in the middle of an object.opEquals call. It seems to be triggered right after the GC's SIGUSR1.

Perhaps this is just another stupid question, but is it possible that the D GC is collecting a resource which my program is still attempting to use? I'm not using pointers directly at all in this program.



  Thread 2 "sverse" received signal SIGUSR1, User defined signal 1.

  Thread 3 "sverse" received signal SIGUSR1, User defined signal 1.

  Thread 2 "sverse" received signal SIGUSR2, User defined signal 2.

  Thread 3 "sverse" received signal SIGUSR2, User defined signal 2.

  Thread 1 "sverse" received signal SIGSEGV, Segmentation fault.
  0x00007ffff7edc200 in ?? ()
  (gdb) backtracfe
  Undefined command: "backtracfe".  Try "help".
  (gdb) backtrace
  #0  0x00007ffff7edc200 in ?? ()
  #1  0x0000555555606242 in object.opEquals(Object, Object) (lhs=0x7ffff7edc120, rhs=0x7ffff7edc080)
    at ../../../../src/libphobos/libdruntime/object.d:152
  #2  0x0000555555581fb9 in sverse.server.serverpanel.ServerPanel.canMove(sverse.core.entity.Entity) (this=0x7ffff7ede000,
    movedEntity=0x7ffff7edf1c0) at ./sverse/server/serverpanel.d:129
  #3  0x0000555555581e13 in sverse.server.serverpanel.ServerPanel.attemptToApplyMove(sverse.core.entity.Entity, dmath.vector.Vector!(int).Vector) (this=0x7ffff7ede000, entity=0x7ffff7edf1c0, originalPosition=0x7ffff7fd6ec0) at ./sverse/server/serverpanel.d:107
  #4  0x0000555555581afb in sverse.server.serverpanel.ServerPanel.update() (this=0x7ffff7ede000) at ./sverse/server/serverpanel.d:72
  #5  0x000055555557f971 in sverse.server.server.Server.updateAllPanels() (this=0x7ffff7eda100) at ./sverse/server/server.d:190
  #6  0x000055555557f15e in sverse.server.server.Server.tick() (this=0x7ffff7eda100) at ./sverse/server/server.d:113
  #7  0x000055555557c926 in sverse.sverse.runServer(immutable(char)[], ushort) (addressString=..., port=6001) at ./sverse/sverse.d:99
  #8  0x000055555557c398 in D main (args=...) at ./sverse/sverse.d:26