threading issues with D -> C -> Python
December 03, 2014
Hi. I'm new here and this is my first post. I'm not sure this is the right subforum for it, but wasn't sure where else to put it either.

I've written a library to talk to some external hardware using a socket. It uses the std.concurrency threads to send messages between the main D-object for the hardware and the D-object for the sockets. I then wanted to be able to call these functions from Python. PyD appeared to be out of date, so I've been using a D -> C interface, and a C -> Python interface. The python code will often run from different python threads, so I then added yet another message-passing layer between the D->C interface and the D->hardware interface.

My problem is that this code routinely causes segmentation faults. I've spent a long time trying to figure out exactly what the causes are. I think some were related to D exceptions not being handled gracefully by the C/Python code, and some more to writing to stdout from multiple threads (which surprised me).

I'm fairly sure I have tackled both of these issues, but it still seems like Python threads and D threads don't mix well. When running the same functions from D, I am able to get no errors, but when run from Python/C it causes segfaults reliably.

Sorry for the long exposition. I am currently at the point of suspecting bugs in Phobos, but I am not skilled enough to tell for sure, and would appreciate any help.

The latest core dump gives a backtrace made up almost entirely of Phobos and runtime functions:

#0  0x00007fe789ad3b97 in gc.gc.Gcx.fullcollect() () from /lib/libphobos2.so.0.66
#1  0x00007fe789ad3294 in gc.gc.Gcx.bigAlloc() () from /lib/libphobos2.so.0.66
#2  0x00007fe789ad0df1 in gc.gc.GC.mallocNoSync() () from /lib/libphobos2.so.0.66
#3  0x00007fe789ad0c15 in gc.gc.GC.malloc() () from /lib/libphobos2.so.0.66
#4  0x00007fe789ad6470 in gc_malloc () from /lib/libphobos2.so.0.66
#5  0x00007fe789ae6d36 in _d_newitemT () from /lib/libphobos2.so.0.66
#6  0x00007fe789e57112 in std.array.__T8AppenderTAaZ.Appender.__T3putTAxaZ.put() () from /usr/lib/libv5camera.so
#7  0x00007fe789e570b5 in std.array.__T8AppenderTAaZ.Appender.__T3putTAxaZ.put() () from /usr/lib/libv5camera.so
#8  0x00007fe789e562dc in std.array.__T8AppenderTAaZ.Appender.__T3putTAaZ.put() () from /usr/lib/libv5camera.so
#9  0x00007fe789e561ea in std.array.__T8AppenderTAaZ.Appender.__T3putTxwZ.put() () from /usr/lib/libv5camera.so
#10 0x00007fe789e5617d in std.format.__T10formatCharTS3std5array16__T8AppenderTAaZ8AppenderZ.formatChar() () from /usr/lib/libv5camera.so
#11 0x00007fe789e56132 in std.format.__T10formatCharTS3std5array16__T8AppenderTAaZ8AppenderZ.formatChar() () from /usr/lib/libv5camera.so
#12 0x00007fe789e61f09 in std.concurrency.MessageBox.__T3getTS4core4time8DurationTDFNfAyaiZvZ.get() () from /usr/lib/libv5camera.so
#13 0x00007fe789e5b4ac in std.concurrency.MessageBox.__T3getTS4core4time8DurationTDFNaNbNiNfAyaiZvZ.get() () from /usr/lib/libv5camera.so
#14 0x00007fe789e57e8d in std.typecons.__T5TupleTAyaTiTG65536kZ.Tuple.__T6__ctorTS3std8typecons24__T5TupleTAyaTiTG65536kZ5TupleZ.__ctor() ()
   from /usr/lib/libv5camera.so
#15 0x00007fe789e581f1 in std.variant.__T8VariantNVmi32Z.VariantN.__T7handlerTS3std8typecons24__T5TupleTAyaTiTG65536kZ5TupleZ.handler() ()
   from /usr/lib/libv5camera.so
#16 0x00007fe789e57d0f in std.typecons.__T5TupleTAyaTiTG65536kZ.Tuple.__T8opEqualsTS3std8typecons24__T5TupleTAyaTiTG65536kZ5TupleZ.opEquals() ()
   from /usr/lib/libv5camera.so
#17 0x00007fe789e57ba8 in std.typecons.__T5TupleTAyaTiTG65536kZ.injectNamedFields() () from /usr/lib/libv5camera.so
#18 0x00007fe789e62087 in std.concurrency.MessageBox.__T3getTS4core4time8DurationTDFNfAyaiZvZ.get() () from /usr/lib/libv5camera.so
#19 0x00007fe789e621a3 in std.concurrency.MessageBox.__T3getTS4core4time8DurationTDFNfAyaiZvZ.get() () from /usr/lib/libv5camera.so
#20 0x00007fe789e5b7f6 in std.concurrency.MessageBox.__T3getTS4core4time8DurationTDFNaNbNiNfAyaiZvZ.get() () from /usr/lib/libv5camera.so
#21 0x00007fe789ac7d51 in core.thread.Thread.run() () from /lib/libphobos2.so.0.66
#22 0x00007fe789ac6f95 in thread_entryPoint () from /lib/libphobos2.so.0.66
#23 0x00007fe79cee5182 in start_thread (arg=0x7fe77aca5700) at pthread_create.c:312
#24 0x00007fe79cc11fbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Cheers,
Michael.

December 03, 2014
On Wed, 03 Dec 2014 01:07:42 +0000
Michael via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com>
wrote:

> I'm fairly sure I have tackled both of these issues, but it still seems like Python threads and D threads don't mix well. When running the same functions from D, I am able to get no errors, but when run from Python/C it causes segfaults reliably.
you are right, D threads and other language/library threads don't mix well. at the very least you have to use `thread_attachThis()` and `thread_detachThis()` from the core.thread module to make sure that the GC is aware of "alien" threads. and i assume that calling these functions from python will not be very easy.

but it's better to not mix 'em at all if it is possible.


December 03, 2014
Thanks for this. It's definitely a step in the right direction. Would you mind explaining a bit more about the problem here, if you can? I don't fully understand why the garbage collector needs to know about the threads, and if so, for how long does it need to know? If I put in "thread_attachThis();scope(exit)thread_detachThis();" it doesn't appear to fix my problems, so I'm definitely curious as to what is going on under the hood.

Cheers,
Michael.

On Wednesday, 3 December 2014 at 01:17:43 UTC, ketmar via Digitalmars-d-learn wrote:
> On Wed, 03 Dec 2014 01:07:42 +0000
> Michael via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com>
> wrote:
>
>> I'm fairly sure I have tackled both of these issues, but it still seems like Python threads and D threads don't mix well. When running the same functions from D, I am able to get no errors, but when run from Python/C it causes segfaults reliably.
> you are right, D threads and other language/library threads don't mix
> well. at the very least you have to use `thread_attachThis()` and
> `thread_detachThis()` from the core.thread module to make sure that the
> GC is aware of "alien" threads. and i assume that calling these
> functions from python will not be very easy.
>
> but it's better to not mix 'em at all if it is possible.

December 03, 2014
On Wed, 03 Dec 2014 02:21:45 +0000
Michael via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com>
wrote:

> Thanks for this. It's definitely a step in the right direction. Would you mind explaining a bit more about the problem here, if you can? I don't fully understand why the garbage collector needs to know about the threads, and if so, for how long does it need to know? If I put in "thread_attachThis();scope(exit)thread_detachThis();" it doesn't appear to fix my problems, so I'm definitely curious as to what is going on under the hood.
you have to call `thread_attachThis();` in "alien" thread, not in D
thread. i.e. if you created thread from python code, you have to call
`thread_attachThis();` in that python thread (i don't know how you'll
do that, but you must ;-). and you must call `thread_detachThis();`
from the same python thread before exiting from it.

the garbage collector must know about all running threads so it can scan their stacks, variables and so on. as there is no portable way to set application-wide hooks on thread creation and termination, you must inform the GC about those events manually.

the other thing you can do is to not use any D allocated data in "alien" threads. i.e. don't pass anything that was allocated by D code to a python thread and vice versa. if you want to pass some data to an "alien" thread, `malloc()` the necessary space, copy the data to it and pass the malloc'ed pointer. don't forget to free that data in the "alien" thread. but i think that this is not what you really want, as it means a lot of allocations and copying, and complicates the whole thing a lot.
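
(A minimal C sketch of the malloc-and-copy handoff described above, added for illustration and not taken from anyone's actual code. The names `handoff_msg`, `handoff_create` and `handoff_consume` are hypothetical. The sending side copies the payload into plain `malloc()` memory that the D GC neither owns nor needs to keep alive, and the receiving side, which would run in the "alien" thread, uses the copy and frees it. Thread creation is left out to keep the sketch short.)

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical message type: a malloc'ed copy of the payload.
   Whoever holds the pointer owns it and must free it. */
typedef struct {
    unsigned char *data;
    size_t         len;
} handoff_msg;

/* Sending side: copy `len` bytes from `src` into fresh malloc'ed memory.
   Returns NULL if an allocation fails. */
handoff_msg *handoff_create(const void *src, size_t len)
{
    assert(src != NULL || len == 0);

    handoff_msg *msg = malloc(sizeof *msg);
    if (msg == NULL)
        return NULL;

    msg->data = malloc(len ? len : 1);   /* never request 0 bytes */
    if (msg->data == NULL) {
        free(msg);
        return NULL;
    }
    if (len > 0)
        memcpy(msg->data, src, len);
    msg->len = len;
    return msg;
}

/* Receiving side: takes ownership, uses the copy, then frees it.
   As a stand-in for real work it returns the sum of the bytes. */
size_t handoff_consume(handoff_msg *msg)
{
    size_t sum = 0;
    for (size_t i = 0; i < msg->len; i++)
        sum += msg->data[i];

    free(msg->data);
    free(msg);
    return sum;
}
```

(The point of the design is ownership: after `handoff_create` returns, nothing in the message refers back to the sender's memory, so the sender's buffer can go away at any time without affecting the receiver.)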


"alien" is the thread that was created outside of D code.


December 03, 2014
On Wednesday, 3 December 2014 at 02:41:11 UTC, ketmar via Digitalmars-d-learn wrote:
> On Wed, 03 Dec 2014 02:21:45 +0000
> Michael via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com>
> wrote:
>
>> Thanks for this. It's definitely a step in the right direction. Would you mind explaining a bit more about the problem here, if you can? I don't fully understand why the garbage collector needs to know about the threads, and if so, for how long does it need to know? If I put in "thread_attachThis();scope(exit)thread_detachThis();" it doesn't appear to fix my problems, so I'm definitely curious as to what is going on under the hood.
> you have to call `thread_attachThis();` in "alien" thread, not in D
> thread. i.e. if you created thread from python code, you have to call
> `thread_attachThis();` in that python thread (i don't know how you'll
> do that, but you must ;-). and you must call `thread_detachThis();`
> from the same python thread before exiting from it.
>
> the garbage collector must know about all running threads so it can
> scan their stacks, variables and so on. as there is no portable way to
> set application-wide hooks on thread creation and termination, you must
> inform the GC about those events manually.
>
> the other thing you can do is to not use any D allocated data in
> "alien" threads. i.e. don't pass anything that was allocated by D code
> to python thread and vice versa. if you want to pass some data to
> "alien" thread, `malloc()` the necessary space, copy data to it and
> pass malloc'ed pointer. don't forget to free that data in "alien"
> thread. but i think that this is not what you really want, as it means
> a lot of allocations and copying, and complicates the whole thing a lot.
>
>
> "alien" is the thread that was created outside of D code.

Okay. Well I am already not passing any D-allocated data. I'm specifically creating variables/arrays on the C-stack, and then passing the pointer of that to D and overwriting the data of the C-stack pointer for any return values. I was worried about that specific problem and I thought this would be a solution. I am then able to tell python to use the C-stack variable without having to worry about D trying to run any garbage collection on it.

Going the other way, I probably am passing some Python strings etc. into D, but I would assume they are valid for the lifetime of the function call, and that D would have no reason to try and perform any garbage collection on them.
December 03, 2014
On Wed, 03 Dec 2014 02:52:27 +0000
Michael via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com>
wrote:

> Okay. Well I am already not passing any D-allocated data. I'm specifically creating variables/arrays on the C-stack, and then passing the pointer of that to D and overwriting the data of the C-stack pointer for any return values. I was worried about that specific problem and I thought this would be a solution. I am then able to tell python to use the C-stack variable without having to worry about D trying to run any garbage collection on it.
if D code has any pointer to that data stored anywhere, the GC will walk it and hit another thread's stack. and now it doesn't know where it is, and it can't pause that unknown thread to keep it from mutating the area the GC is scanning. this *may* work, but it will segfault sooner or later.

> Going the other way, I probably am passing some Python strings etc. into D, but I would assume they are valid for the lifetime of the function call, and that D would have no reason to try and perform any garbage collection on them.
D has a conservative GC, so it will try to walk the unknown data just in case that data contains some pointers. and the GC can hit "false positives" there (something that *looks* like a pointer to some area) and other things.
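
(A toy C sketch of what a "false positive" means for a conservative scan, added for illustration. It is not how druntime is implemented; `heap_range`, `looks_like_pointer` and `count_candidates` are hypothetical names. The idea is only that a scanner without type information must treat any word whose value falls inside the managed heap as if it were a pointer.)

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of one GC-managed heap range [lo, hi). */
typedef struct {
    uintptr_t lo;
    uintptr_t hi;
} heap_range;

/* Without type information, any word whose value lies inside the heap
   range has to be treated as a possible pointer. */
static int looks_like_pointer(uintptr_t word, heap_range heap)
{
    return word >= heap.lo && word < heap.hi;
}

/* Walk `len` bytes one machine word at a time and count the words
   that a conservative scanner would consider pointers. */
size_t count_candidates(const void *mem, size_t len, heap_range heap)
{
    assert(mem != NULL || len == 0);

    const unsigned char *p = mem;
    size_t hits = 0;
    for (size_t off = 0; off + sizeof(uintptr_t) <= len;
         off += sizeof(uintptr_t)) {
        uintptr_t word;
        memcpy(&word, p + off, sizeof word);  /* safe for any alignment */
        if (looks_like_pointer(word, heap))
            hits++;
    }
    return hits;
}
```

(With a heap range of 0x1000 to 0x2000, a plain integer such as 0x1abc sitting in the scanned memory is counted as a candidate even though nobody meant it as a pointer. That is the false positive: the block it seems to point at is kept alive.)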

so to make the long story short: you should either register and deregister *all* your threads in GC (for D threads it's automatic process; for other threads you must do it manually), or don't use GC at all.

besides, if you are using your D library from C code, you must call `rt_init()` once before calling any D code. this function will initialize D runtime. and you have to call `rt_term()` before exiting your program to deinitialize D runtime.


December 03, 2014
On Wed, 03 Dec 2014 02:52:27 +0000
Michael via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com>
wrote:

by "using from C code" i mean that your main program is not written in D, it has no D `main()` and so on. i.e. you wrote, for example, some .a library in D and now you want to use that library in C code.


December 03, 2014
On Wed, 03 Dec 2014 01:07:42 +0000
Michael via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com>
wrote:

all in all, you'd better not mix D code with "alien" multithreaded code, and not use .a/.so libraries written in D from another language, until you are familiar with the D runtime and GC. those mixes are very fragile.


December 03, 2014
On Wed, 03 Dec 2014 01:07:42 +0000
Michael via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com>
wrote:

btw, Adam Ruppe's "D Cookbook" has a chapter which describes how to call a D library from C code. i don't remember if it describes threading, though.


December 03, 2014
On 12/02/2014 05:07 PM, Michael wrote:
> Hi. I'm new here and this is my first post. I'm not sure this is the
> right subforum for it, but wasn't sure where else to put it either.
>
> I've written a library to talk to some external hardware using a socket.
> It uses the std.concurrency threads to send messages between the main
> D-object for the hardware and the D-object for the sockets. I then
> wanted to be able to call these functions from Python. PyD appeared to
> be out of date, so I've been using a D -> C interface, and a C -> Python
> interface. The python code will often run from different python threads,
> so I then added yet another message-passing layer between the D->C
> interface and the D->hardware interface.
>

are you looking at this pyd: https://bitbucket.org/ariovistus/pyd
