Thread overview
extern __gshared const(char)* symbol fails
Aug 31, 2018
James Blachly
Aug 31, 2018
Neia Neutuladh
Aug 31, 2018
James Blachly
Sep 02, 2018
Laeeth Isharc
Aug 31, 2018
James Blachly
Sep 01, 2018
Edgar Huckert
Aug 31, 2018
nkm1
August 31, 2018
Hi all,

I am linking to a C library which defines a symbol,

const char seq_nt16_str[] = "=ACMGRSVTWYHKDBN";

In the C sources, this is an array of 16 bytes (17 I guess, because it is written as a string).

In the C headers, it is listed as extern const char seq_nt16_str[];

When linking to this library from another C program, I am able to treat seq_nt16_str as any other array, and being defined as [] fundamentally it is a pointer.

When linking to this library from D, I have declared it as:

extern __gshared const(char)* seq_nt16_str;

***But this segfaults when I treat it like an array (e.g. by accessing members by index).***

Because I know the length, I can instead declare:

extern __gshared const(char)[16] seq_nt16_str;

My question is: Why can I not treat it opaquely and use it declared as char* ? Does this have anything to do with it being a global stored in the static data segment?

August 31, 2018
On Friday, 31 August 2018 at 06:20:09 UTC, James Blachly wrote:
> Hi all,
>
> I am linking to a C library which defines a symbol,
>
> const char seq_nt16_str[] = "=ACMGRSVTWYHKDBN";
>
> In the C sources, this is an array of 16 bytes (17 I guess, because it is written as a string).
>
> In the C headers, it is listed as extern const char seq_nt16_str[];
>
> When linking to this library from another C program, I am able to treat seq_nt16_str as any other array, and being defined as [] fundamentally it is a pointer.
>
> When linking to this library from D, I have declared it as:
>
> extern __gshared const(char)* seq_nt16_str;
>
> ***But this segfaults when I treat it like an array (e.g. by accessing members by index).***

I believe this should be extern extern(C)? I'm surprised that this segfaults rather than having a link error.

A bare `extern` means "this symbol is defined somewhere else".

`extern(C)` means "this symbol should have C linkage".

When I try it with just `extern`, I see a link error:

scratch.o: In function `_Dmain':
scratch.d:(.text._Dmain[_Dmain]+0x7): undefined reference to `_D7scratch5cdataPa'
collect2: error: ld returned 1 exit status
Error: linker exited with status 1


August 31, 2018
On 8/31/18 2:20 AM, James Blachly wrote:
> Hi all,
> 
> I am linking to a C library which defines a symbol,
> 
> const char seq_nt16_str[] = "=ACMGRSVTWYHKDBN";
> 
> In the C sources, this is an array of 16 bytes (17 I guess, because it is written as a string).
> 
> In the C headers, it is listed as extern const char seq_nt16_str[];
> 
> When linking to this library from another C program, I am able to treat seq_nt16_str as any other array, and being defined as [] fundamentally it is a pointer.
> 
> When linking to this library from D, I have declared it as:
> 
> extern __gshared const(char)* seq_nt16_str;
> 
> ***But this segfaults when I treat it like an array (e.g. by accessing members by index).***
> 
> Because I know the length, I can instead declare:
> 
> extern __gshared const(char)[16] seq_nt16_str;
> 
> My question is: Why can I not treat it opaquely and use it declared as char* ? Does this have anything to do with it being a global stored in the static data segment?
> 

The C compiler stores the array as data and makes the symbol refer to the address of the first element of that data.

When you declare it as const char* in D, the compiler expects a *pointer* to be stored at that address, not the data itself. Indexing through it therefore treats the first bytes of the string as an address, hence the segfault. The static array is the correct translation, even though it leaks implementation details.

In C this works because C has the notion of a symbol marking where an array starts. D has no concept of a C array like that; every array must have a length. So there is no direct equivalent you can use in D: you have to supply the length.

Alternatively, you can treat it as a const char:

extern(C) extern const(char) seq_nt16_str;

void main()
{
   import core.stdc.stdio;
   printf("%s\n", &seq_nt16_str); // should print the string
}

You could wrap it like this:

pragma(mangle, "seq_nt16_str") // no semicolon, so the pragma applies to the next declaration
private extern(C) extern const(char) _seq_nt16_str_STORAGE;

@property const(char)* seq_nt16_str()
{
   return &_seq_nt16_str_STORAGE;
}

To make the code look similar.

-Steve
August 31, 2018
On 8/31/18 1:18 PM, Neia Neutuladh wrote:
> On Friday, 31 August 2018 at 06:20:09 UTC, James Blachly wrote:
>> Hi all,
>>
>> I am linking to a C library which defines a symbol,
>>
>> const char seq_nt16_str[] = "=ACMGRSVTWYHKDBN";
>>
>> In the C sources, this is an array of 16 bytes (17 I guess, because it is written as a string).
>>
>> In the C headers, it is listed as extern const char seq_nt16_str[];
>>
>> When linking to this library from another C program, I am able to treat seq_nt16_str as any other array, and being defined as [] fundamentally it is a pointer.
>>
>> When linking to this library from D, I have declared it as:
>>
>> extern __gshared const(char)* seq_nt16_str;
>>
>> ***But this segfaults when I treat it like an array (e.g. by accessing members by index).***
> 
> I believe this should be extern extern(C)? I'm surprised that this segfaults rather than having a link error.

Yeah, I had to add extern(C) in my tests to get it to link. I think he must have extern(C): somewhere above.

-Steve
August 31, 2018
On Friday, 31 August 2018 at 06:20:09 UTC, James Blachly wrote:
> Hi all,
>
> I am linking to a C library which defines a symbol,
>
> const char seq_nt16_str[] = "=ACMGRSVTWYHKDBN";
>
> In the C sources, this is an array of 16 bytes (17 I guess, because it is written as a string).
>
> In the C headers, it is listed as extern const char seq_nt16_str[];
>
> When linking to this library from another C program, I am able to treat seq_nt16_str as any other array, and being defined as [] fundamentally it is a pointer.

No. This is a misconception. Fundamentally, it's an array.

>
> When linking to this library from D, I have declared it as:
>
> extern __gshared const(char)* seq_nt16_str;
>
> ***But this segfaults when I treat it like an array (e.g. by accessing members by index).***
>
> Because I know the length, I can instead declare:
>
> extern __gshared const(char)[16] seq_nt16_str;
>
> My question is: Why can I not treat it opaquely and use it declared as char* ? Does this have anything to do with it being a global stored in the static data segment?

For the same reason you can't do it in C.

--- main.c ---
#include <stdio.h>

extern const char* array; /* then try array[] */

int main(void)
{
    printf("%.5s\n", array);
    return 0;
}

--- lib.c ---
const char array[] = "hello world";


# gcc -o main main.c lib.c
# ./main
Segmentation fault

You need to declare your extern array as an array in D and also in C, so that the compiler knows what it is (an array, not a pointer). In many situations a C compiler will silently convert an array into a pointer (when it already knows it's dealing with an array), but it won't convert a pointer into an array.
August 31, 2018
On Friday, 31 August 2018 at 17:18:58 UTC, Neia Neutuladh wrote:
> On Friday, 31 August 2018 at 06:20:09 UTC, James Blachly wrote:
>> Hi all,
>>
>> ...
>>
>> When linking to this library from D, I have declared it as:
>>
>> extern __gshared const(char)* seq_nt16_str;
>>
>> ***But this segfaults when I treat it like an array (e.g. by accessing members by index).***
>
> I believe this should be extern extern(C)? I'm surprised that this segfaults rather than having a link error.
>
> A bare `extern` means "this symbol is defined somewhere else".
>
> `extern(C)` means "this symbol should have C linkage".
>


I am so sorry -- I should have been more clear that this is in the context of a large header-to-D translation .d file, so the whole thing is wrapped in extern(C) via an extern(C): at the top of the file.

August 31, 2018
On Friday, 31 August 2018 at 17:50:17 UTC, Steven Schveighoffer wrote:
> What the C compiler is doing is storing it as data, and then storing the symbol to point at the first element in the data.
>
> When you use const char* in D, it's expecting a *pointer* to be stored at that address, not the data itself. So using it means segfault. The static array is the correct translation, even though it leaks implementation details.
>
> In C, it's working because C has the notion of a symbol being where an array starts. D has no concept of a C array like that, every array must have a length. So there is no equivalent you can use in D -- you have to supply the length.


nkm1 also wrote:
> You need to declare your extern array as an array in D and also in C, so that the compiler knows what it is (an array, not a pointer). In many situations a C compiler will silently convert an array into a pointer (when it already knows it's dealing with an array), but it won't convert a pointer into an array.

Thank you Steve and nkm1 for your very clear and concise answers. This makes perfect sense.

I would rather not declare it as a static array in D, because I cannot guarantee that a future version of the library I am linking to won't change the length of the data. Steve's trick, below, looks like the ticket.


> Alternatively, you can treat it as a const char:
>
> extern(C) extern const(char) seq_nt16_str;
>
> void main()
> {
>    import core.stdc.stdio;
>    printf("%s\n", &seq_nt16_str); // should print the string
> }
>
> You could wrap it like this:
>
> pragma(mangle, "seq_nt16_str") // no semicolon, so the pragma applies to the next declaration
> private extern(C) extern const(char) _seq_nt16_str_STORAGE;
>
> @property const(char)* seq_nt16_str()
> {
>    return &_seq_nt16_str_STORAGE;
> }
>
> To make the code look similar.
>
> -Steve

That is a great trick, and I will use it.



September 01, 2018
On Friday, 31 August 2018 at 17:50:17 UTC, Steven Schveighoffer wrote:
...
> When you use const char* in D, it's expecting a *pointer* to be stored at that address, not the data itself. So using it means segfault. The static array is the correct translation, even though it leaks implementation details.
>
> In C, it's working because C has the notion of a symbol being where an array starts. D has no concept of a C array like that, every array must have a length. So there is no equivalent you can use in D -- you have to supply the length.
>

I think this is only correct for dynamic arrays. For static arrays I have the impression that it works exactly as in C, i.e. the address of the array is the address of the first array element. See this simple code:

import std.stdio;
import std.array;

void main()
{
  // static array
  ulong [4] ulArr1 = [0,1,2,3];
  ulong *p1 = ulArr1.ptr;
  ulong *p2 = &(ulArr1[0]);
  ulong [4] *p3 = &ulArr1;
  writeln("same pointers: ", cast(void *)p1 == cast(void *)p2);
  writeln("same pointers: ", cast(void *)p3 == cast(void *)p2);
  writeln("");
  // dynamic array
  ulong [] ulArr2 = [0,1,2,3];
  p1 = ulArr2.ptr;
  p2 = &(ulArr2[0]);
  ulong [] *p5 = &ulArr2;
  writeln("same pointers: ", cast(void *)p1 == cast(void *)p2);
  writeln("same pointers: ", cast(void *)p5 == cast(void *)p2);
}   // end main()

This produces (with dmd):

same pointers: true
same pointers: true

same pointers: true
same pointers: false

September 02, 2018
On Friday, 31 August 2018 at 18:49:26 UTC, James Blachly wrote:
> On Friday, 31 August 2018 at 17:18:58 UTC, Neia Neutuladh wrote:
>> On Friday, 31 August 2018 at 06:20:09 UTC, James Blachly wrote:
>>> Hi all,
>>>
>>> ...
>>>
>>> When linking to this library from D, I have declared it as:
>>>
>>> extern __gshared const(char)* seq_nt16_str;
>>>
>>> ***But this segfaults when I treat it like an array (e.g. by accessing members by index).***
>>
>> I believe this should be extern extern(C)? I'm surprised that this segfaults rather than having a link error.
>>
>> A bare `extern` means "this symbol is defined somewhere else".
>>
>> `extern(C)` means "this symbol should have C linkage".
>>
>
>
> I am so sorry -- I should have been more clear that this is in the context of a large header-to-D translation .d file, so the whole thing is wrapped in extern(C) via an extern(C): at the top of the file.

In case you weren't aware of it, take a look at atilaneves' DPP on GitHub or code.dlang.org. It auto-translates C headers at build time, and mostly it just works. If it doesn't, file an issue and in time it will be fixed.