toStringz lifetime
October 25, 2020
toStringz documentation is clear on why, when, and how to extend the lifetime of a D string:

  https://dlang.org/phobos/std_string.html#.toStringz

Assume foo is a D library function that passes a "string" result to e.g. C:

import std.format : format;
import std.string : toStringz;

extern(C)
void foo(ref const(char) * name) {
  name = format!"file%s.txt"(42).toStringz;  // Allocates from GC memory
}

This may be fine for "immediate use" on the C side because at first glance no garbage collection can take place between our returning the result and their using it:

// C caller:
  const char * name = NULL;
  foo(&name);                 // Calls us
  printf("%s", name);         // Uses 'name' immediately

Is it really safe? Imagine a multi-threaded environment where another D function is executed that triggers a GC collection right before the printf.

Does the GC see that local variable 'name' that is on the C side? What I don't know is whether the GC is aware only of the stack frames of D functions or the entire thread, which would include the C caller's 'name'.

Ali
October 25, 2020
On 25/10/2020 11:03 PM, Ali Çehreli wrote:
> Does the GC see that local variable 'name' that is on the C side? What I don't know is whether the GC is aware only of the stack frames of D functions or the entire thread, which would include the C caller's 'name'.

The thread stack frame that is registered with the D GC will know about the D side and may know about the C side.

It depends on what the C side is doing.

If the C side went ahead and made a new stack frame via a fiber... it won't know about it. But even if it did, the D stack frame is still alive and pinning that bit of memory.

Ultimately, if the C side puts that pointer some place like a global or sends it to another thread, there are no guarantees that things will play out well.
October 25, 2020
On 10/25/20 3:19 AM, rikki cattermole wrote:
> On 25/10/2020 11:03 PM, Ali Çehreli wrote:
>> Does the GC see that local variable 'name' that is on the C side? What I don't know is whether the GC is aware only of the stack frames of D functions or the entire thread, which would include the C caller's 'name'.
> 
> The thread stack frame that is registered with the D GC will know about the D side and may know about the C side.
> 
> It depends on what the C side is doing.
> 
> If the C side went ahead and made a new stack frame via a fiber... it won't know about it. But even if it did, the D stack frame is still alive and pinning that bit of memory.
> 
> Ultimately, if the C side puts that pointer some place like a global or sends it to another thread, there are no guarantees that things will play out well.

Thanks. That's reassuring. :) So, as long as the D function documents that the C side should make a copy if they want to extend the string's lifetime, it's their responsibility. And from your description I understand that they have time to make that copy.
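
A minimal sketch of what that copy could look like on the C side. Everything named here is an assumption for illustration: foo_stub is a hypothetical stand-in for the D function foo above (so the example builds without the D library), and copy_borrowed and fetch_name_copy are made-up helper names. The idea is only that the C code copies the bytes right away and from then on owns its own malloc'ed buffer, which the D GC never touches.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the D function foo(). In the real program
   the pointer would come from toStringz and point into GC memory. */
void foo_stub(const char **name) {
    static const char borrowed[] = "file42.txt";
    *name = borrowed;
}

/* Copies a borrowed C string into memory owned by the C side.
   Returns NULL if src is NULL or if allocation fails.
   The caller releases the result with free(). */
char *copy_borrowed(const char *src) {
    if (src == NULL) {
        return NULL;
    }
    size_t len = strlen(src) + 1;  /* include the terminating '\0' */
    char *dst = malloc(len);
    if (dst != NULL) {
        memcpy(dst, src, len);
    }
    return dst;
}

/* Receives the string and copies it immediately, while the borrowed
   pointer is still a local variable on this thread's stack. */
char *fetch_name_copy(void) {
    const char *name = NULL;
    foo_stub(&name);
    assert(name != NULL);  /* the stub always sets the out-parameter */
    return copy_borrowed(name);
}
```

A caller would keep the pointer returned by fetch_name_copy() for as long as it likes (in a global, on another thread) and call free() on it when done. Only the copy is stored; the original pointer is never kept past the immediate use.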

Ali

October 25, 2020
On Sunday, 25 October 2020 at 10:03:44 UTC, Ali Çehreli wrote:
>
> Is it really safe? Imagine a multi-threaded environment where another D function is executed that triggers a GC collection right before the printf.
>
> Does the GC see that local variable 'name' that is on the C side? What I don't know is whether the GC is aware only of the stack frames of D functions or the entire thread, which would include the C caller's 'name'.

Small note: besides the stack, it is crucial that the GC is aware of the CPU register values.

-Johan

November 08, 2020
On 10/25/20 3:19 AM, rikki cattermole wrote:
> On 25/10/2020 11:03 PM, Ali Çehreli wrote:
>> Does the GC see that local variable 'name' that is on the C side? What I don't know is whether the GC is aware only of the stack frames of D functions or the entire thread, which would include the C caller's 'name'.
> 
> The thread stack frame that is registered with the D GC will know about the D side and may know about the C side.
> 
> It depends on what the C side is doing.
> 
> If the C side went ahead and made a new stack frame via a fiber... it won't know about it. But even if it did, the D stack frame is still alive and pinning that bit of memory.
> 
> Ultimately, if the C side puts that pointer some place like a global or sends it to another thread, there are no guarantees that things will play out well.

Sorry to bring this up again but I want to understand this fully before I say something wrong during my DConf presentation. :)

The D code is a library. The actual program is e.g. written in C. When the D library is loaded into the program, the following function is executed and the D GC is initialized:

import core.runtime : rt_init;

pragma (crt_constructor)
extern(C) int initialize() {
  return rt_init();
}

Does the D GC know the complete function call stack of the C program all the way up from 'main'? Is there the concept of "bottom of the stack", or can the D GC only know the value of the stack pointer at the time rt_init() was called? If the latter, then I think a toStringz string may not be alive in a C function.

Imagine the C program dlopens our library from inside a function called from main. Then the program calls one of our library functions from another function called from main:

// C program
int main() {
  initializeDlibrary();  // This does dlopen()
  useDlibrary();    // This receives a string returned from
                    // toStringz and uses that string.
}

So, the question is, does the D GC know the stack only from initializeDlibrary's frame up, because it was initialized there?

I know threads complicate matters and they need to be attached to the GC with core.thread.osthread.thread_attachThis but I am not there yet. :) I want to understand the basic single thread stack pointer issue first.

Thank you,
Ali

November 09, 2020
On 09/11/2020 2:58 PM, Ali Çehreli wrote:
> Does the D GC know the complete function call stack of the C program all the way up from 'main'? Is there the concept of "bottom of the stack", or can the D GC only know the value of the stack pointer at the time rt_init() was called? If the latter, then I think a toStringz string may not be alive in a C function.

https://github.com/dlang/druntime/blob/master/src/core/thread/context.d#L16
https://github.com/dlang/druntime/blob/master/src/core/thread/threadbase.d#L469
https://github.com/dlang/druntime/blob/master/src/core/thread/osthread.d#L1455
https://github.com/dlang/druntime/blob/master/src/core/thread/osthread.d#L1208

I'm tired, so here is the code related to your questions.

Note: the GC will use this abstraction for dealing with stack frames (otherwise it would be duplicated).
November 08, 2020
On 11/8/20 6:58 PM, rikki cattermole wrote:

> On 09/11/2020 2:58 PM, Ali Çehreli wrote:
>> Does the D GC know the complete function call stack of the C program
>> all the way up from 'main'? Is there the concept of "bottom of the
>> stack"

> https://github.com/dlang/druntime/blob/master/src/core/thread/osthread.d#L1455 


> I'm tired, so here is the code related to your questions.

Hey, I'm tired too! :p

Thank you. By the presence of getStackTop() and getStackBottom() above, I'm convinced that the entire stack is available. So, the pointer returned by toStringz will be kept alive by the C caller during their immediate use. (They obviously cannot store it for later use.)

Ali