Thread overview
[Issue 12417] `toStringz` is fundamentally broken
Mar 24, 2015
John Colvin
Mar 24, 2015
Denis Shelomovskij
Mar 24, 2015
John Colvin
Mar 24, 2015
Denis Shelomovskij
Dec 17, 2022
Iain Buclaw
March 24, 2015
https://issues.dlang.org/show_bug.cgi?id=12417

John Colvin <john.loughran.colvin@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |john.loughran.colvin@gmail.
                   |                            |com

--- Comment #3 from John Colvin <john.loughran.colvin@gmail.com> ---
You mean something like this?

//D

extern(C) void foo(int* a);

void main()
{
    foo(new int);
}

//C

#include<stdlib.h>

void gc_collect();

// I solemnly swear this function does not
// hold any references to `a`
void foo(int* a)
{
    int** m = (int**)malloc(sizeof(int*));
    *m = a;
    a = 0;

    //Doesn't have to be explicit, could be from
    //another thread or triggered in a D callback
    gc_collect();

    int b = **m; //oops!

    free(m);
}


The question is, are you always guaranteed to have a reference on the caller side? If not, then I think we have a problem.

E.g. what DMD and clang spit out without optimisations (edited):

__Dmain:
0000000100001610    pushq    %rbp
0000000100001611    movq    %rsp, %rbp
0000000100001614    movq    0x199e5(%rip), %rdi     ## literal pool symbol
address: _D11TypeInfo_Pi6__initZ
000000010000161b    callq    0x1000199d0             ## symbol stub for:
__d_newitemT
0000000100001620    movq    %rax, %rdi
    # only references to the memory are in %rax and %rdi
0000000100001623    callq    _foo
0000000100001628    xorl    %eax, %eax
000000010000162a    popq    %rbp
000000010000162b    retq

_foo:
0000000100001650    pushq    %rbp
0000000100001651    movq    %rsp, %rbp
0000000100001654    subq    $0x20, %rsp
0000000100001658    movabsq    $0x8, %rax
    # %rax has been overwritten, only references is in %rdi
0000000100001662    movq    %rdi, -0x8(%rbp)
    # %rdi and %rbp-8
0000000100001666    movq    %rax, %rdi
    # %rbp-8
0000000100001669    callq    0x100019b56             ## symbol stub for:
_malloc
000000010000166e    movq    %rax, -0x10(%rbp)
0000000100001672    movq    -0x8(%rbp), %rax
    # %rax and %rbp-8
0000000100001676    movq    -0x10(%rbp), %rdi
000000010000167a    movq    %rax, (%rdi)
    # %rax, %rbp-8 and on C heap at (%rdi)
000000010000167d    movq    $0x0, -0x8(%rbp)
    # %rax and on C heap at (%rdi)
0000000100001685    movb    $0x0, %al
    # %rax lowest byte stomped, only reference on C heap at (%rdi)
0000000100001687    callq    0x10001987a             ## symbol stub for:
_gc_collect
    # GC has no references it can see, frees the memory
000000010000168c    movq    -0x10(%rbp), %rdi
0000000100001690    movq    (%rdi), %rdi
    # %rdi is now dangling pointer
0000000100001693    movl    (%rdi), %ecx
    # BOOM!
0000000100001695    movl    %ecx, -0x14(%rbp)
0000000100001698    movq    -0x10(%rbp), %rdi
000000010000169c    callq    0x100019b3e             ## symbol stub for: _free
00000001000016a1    addq    $0x20, %rsp
00000001000016a5    popq    %rbp
00000001000016a6    retq


Am I reading that right? I think it might be OK in this example because of the remaining bytes of %rax but I don't see why we can rely on that.

--
March 24, 2015
https://issues.dlang.org/show_bug.cgi?id=12417

--- Comment #4 from Denis Shelomovskij <verylonglogin.reg@gmail.com> ---
(In reply to John Colvin from comment #3)
> You mean something like this?

Yes, this is an example of what may go wrong. But in a general case we don't even know what calling convention (e.i. ABI) so the reference can be easily lost.

Also now most `toStringz`/`toUTF16z`/`toUTFz` usages in Phobos are replaced with `tempCString*` ones [1] so this doesn't affect much Phobos itself.

The problem is `toStringz` and friends aren't properly documented as dangerous (and also inefficient by the way) and `tempCString*` replacement isn't public (addition of new public features to Phobos looks too complicated for me).

[1] https://github.com/D-Programming-Language/phobos/pull/2332

--
March 24, 2015
https://issues.dlang.org/show_bug.cgi?id=12417

--- Comment #5 from John Colvin <john.loughran.colvin@gmail.com> ---
Well in that case we have a problem with all calls to C that follow this pattern:

allocate some GC memory
pass a pointer to said memory to C
never refer to the memory again

as there is nothing to guarantee that there will be any reference held on the D side during the execution of the C function and no guarantee about where the C function will hold the reference during its execution. There will be cases where the optimiser will play nice, but in general it has no obligation to preserve a reference on the D side to memory that is never accessed again.

--
March 24, 2015
https://issues.dlang.org/show_bug.cgi?id=12417

--- Comment #6 from Denis Shelomovskij <verylonglogin.reg@gmail.com> ---
(In reply to John Colvin from comment #5)
> Well in that case we have a problem with all calls to C that follow this pattern:
> 
> allocate some GC memory
> pass a pointer to said memory to C
> never refer to the memory again

As D's GC may eventually become a moving GC one mustn't do it anyway.

--
February 22, 2019
https://issues.dlang.org/show_bug.cgi?id=12417

hsteoh@quickfur.ath.cx changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hsteoh@quickfur.ath.cx

--
December 17, 2022
https://issues.dlang.org/show_bug.cgi?id=12417

Iain Buclaw <ibuclaw@gdcproject.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P2                          |P3

--