Passing D strings to C
December 21, 2005
For a library wrapper I'm working on, there is
a need to pass some strings from D over to C...


This C library is rather flexible, and accepts either
a single "char *" parameter (zero-terminated),
or a pair of parameters: "char *", "size_t"
(for this particular application, the strings will
be in UTF-8/ASCII format and won't contain any NULs)

Will it work to just pass the D strings over as they
are then, or do I need to copy/zero-terminate them
going out and then count/strlen them when coming in ?
Sounds like that means some extra looping and copying ?
(something I was hoping to avoid here, now that I can)
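
A minimal C sketch of what "copy/zero-terminate going out, strlen coming in" costs, assuming the D side hands over a pointer and a length. The helper names counted_to_cstr and cstr_length are made up for illustration and are not part of any library:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Going out: one allocation plus one pass to copy the bytes and append
   the NUL. The caller owns the result and must free() it.
   Returns NULL if malloc fails. */
char *counted_to_cstr(const char *ptr, size_t length)
{
    assert(ptr != NULL || length == 0);
    char *z = malloc(length + 1);
    if (z == NULL)
        return NULL;
    if (length > 0)
        memcpy(z, ptr, length);
    z[length] = '\0';
    return z;
}

/* Coming in: one pass over the bytes to rediscover the length. */
size_t cstr_length(const char *z)
{
    assert(z != NULL);
    return strlen(z);
}
```

So each string costs one copy on the way out and one scan on the way back, which is the overhead the counted-string variant would avoid.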


But I ran into something of a snag with the (missing) ABI for
arrays in D, so I'm wondering what the best approach is here.
(i.e. DMD and GDC have two *different* approaches to passing
arrays; the main difference is that GDC is going 64-bit...)

So, the main question here is should I use a) :
=============================================
C:
extern void foo(unsigned char *str);
extern unsigned char *bar(void);

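D:
import std.string;

extern(C) void foo(char* str);
extern(C) char* bar();

char[] s;
foo(std.string.toStringz(s));   // may copy s in order to append the NUL
s = std.string.toString(bar()); // finds the length by scanning for the NUL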


Or, should I go with the more "efficient" b) :
============================================
C:
typedef struct dstr {
	size_t length;
	const unsigned char *ptr;
} dstr;

#ifndef __GDC__
typedef unsigned long long dstrret; // DMD
#else
typedef dstr dstrret; // GDC
#endif

extern void foo(dstr str);
extern dstrret bar(void);

D:
extern(C) void foo(char[] str);
extern(C) char[] bar();

char[] s;
foo(s);
s = bar();
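
For approach b), the C side must never assume a terminating NUL, since a D array slice does not carry one. A small sketch of a C function that consumes such a string by length only; dstr_format is a hypothetical helper, and the struct simply repeats the dstr definition from above:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

typedef struct dstr {
    size_t length;
    const unsigned char *ptr;
} dstr;

/* Formats the counted string into buf, wrapped in brackets.
   "%.*s" reads at most `length` bytes, so it never runs past the end
   of the D array even though there is no NUL there.
   Returns what snprintf returns: the length the full output would have. */
int dstr_format(char *buf, size_t bufsize, dstr s)
{
    assert(buf != NULL && bufsize > 0);
    assert(s.ptr != NULL);
    assert(s.length <= (size_t)INT_MAX);  /* "%.*s" takes an int precision */
    return snprintf(buf, bufsize, "[%.*s]", (int)s.length, (const char *)s.ptr);
}
```

Passing a slice of a larger buffer works the same way: only the first `length` bytes from `ptr` are ever read.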


That is: the ugly and slow D, or the neat and fast D ? :)

(but thinking it could be a problem if DMD ever goes 64?
 So maybe it is "safer" to use the toString(z) approach?)

--anders


PS. I'm leaning towards approach b), even if less portable
    (i.e. the C code must know which D compiler is used...)
December 21, 2005
"Anders F Björklund" <afb@algonet.se> wrote in message news:dobf2q$1jnf$1@digitaldaemon.com...
> That is: the ugly and slow D, or the neat and fast D ? :)
>
> (but thinking it could be a problem if DMD ever goes 64?
>  So maybe it is "safer" to use the toString(z) approach?)
>
> --anders
>
>
> PS. I'm leaning towards approach b), even if less portable
>     (i.e. the C code must know which D compiler is used...)


I'd say go with approach B as well, since I'd imagine the counted-string C functions will run faster too.  But I'm not really getting the problem: the dstr struct and an unsigned long long are exactly the same size, aren't they?  So should it really matter which structure the C version uses, as long as it's 64 bits of data?  Or has the "64-bit change in GDC" (which I know nothing about) not happened yet?


December 21, 2005
Jarrett Billingsley wrote:

> I'd say go with approach B as well, since I'd imagine the counted-string C functions will run faster too.  But I'm not really getting the problem: the dstr struct and an unsigned long long are exactly the same size, aren't they?  So should it really matter which structure the C version uses, as long as it's 64 bits of data?  Or has the "64-bit change in GDC" (which I know nothing about) not happened yet?

They are the same size (i.e 8 bytes), but seem to be passed differently?
Or at least DMD doesn't work with a struct and GDC not with a uint128_t.

The "64-bit change" refers to David getting GDC ready for 64-bit...
http://www.digitalmars.com/drn-bin/wwwnews?D.gnu/1567

Note:
There could be other reasons for using a struct instead of an integer.
I just assumed it had something to do with being able to move easier ?

--anders
December 21, 2005
I wrote, mistakenly:

> They are the same size (i.e 8 bytes), but seem to be passed differently?
> Or at least DMD doesn't work with a struct and GDC not with a uint128_t.

Make that "uint64_t" (i.e. "ulong" in D, and "unsigned long long" in C)

And this was for the return value, using a struct as a parameter works
OK in both compilers. (I assume they're just passed on the stack anyway)
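
To illustrate why both C declarations describe the same 8 bytes on a 32-bit target, here is a sketch that models the array as two 32-bit fields (so it also runs on a 64-bit host). It assumes a little-endian x86 layout with the length in the low half of the integer and the pointer in the high half; dstr32, pack_dstr32 and unpack_dstr32 are illustrative names only. Whether a given compiler returns the struct in registers or through a hidden pointer is an ABI decision that this code cannot show:

```c
#include <assert.h>
#include <stdint.h>

/* Model of a 32-bit D array: length first, then pointer. */
typedef struct dstr32 {
    uint32_t length;
    uint32_t ptr;      /* a 32-bit address, kept as a plain integer here */
} dstr32;

/* Combine the two fields into one 64-bit value:
   length in the low 32 bits, pointer in the high 32 bits. */
uint64_t pack_dstr32(dstr32 s)
{
    assert(sizeof(dstr32) == sizeof(uint64_t));
    return ((uint64_t)s.ptr << 32) | s.length;
}

/* Split a 64-bit value back into length and pointer. */
dstr32 unpack_dstr32(uint64_t bits)
{
    dstr32 s;
    s.length = (uint32_t)(bits & 0xFFFFFFFFu);
    s.ptr = (uint32_t)(bits >> 32);
    return s;
}
```

The two types carry the same information, so the mismatch seen between the compilers would come from how each one hands back an 8-byte struct, not from the data itself.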

--anders
December 22, 2005
> Or, should I go with the more "efficient" b) :
> ============================================
> C:
> typedef struct dstr {
>     size_t length;
>     const unsigned char *ptr;
> } dstr;
> 
> #ifndef __GDC__
> typedef unsigned long long dstrret; // DMD
> #else
> typedef dstr dstrret; // GDC
> #endif
> 
> extern void foo(dstr str);
> extern dstrret bar(void);
> 
> D:
> extern(C) void foo(char[] str);
> extern(C) char[] bar();
> 
> char[] s;
> foo(s);
> s = bar();

> That is: the ugly and slow D, or the neat and fast D ? :)

This was the chosen solution, and it worked out OK...


Here is the Makefile snippet that I used to choose
between the two methods of returning arrays back to D:

OS=$(shell uname)
ARCH=$(shell arch | sed -e s/i.86/x86/ )

ifeq ("$(OS) $(ARCH)","Linux x86")
CXX = g++ -D__DMD__
DMD = dmd
else
ifeq ("$(OS) $(ARCH)","Darwin ppc")
CXX = g++ -D__GDC__
DMD = gdmd
else
CXX = g++ -D__GDC__
DMD = gdmd
endif
endif

Windows is handled in a separate Makefile altogether.
(which is using CXX = dmc and DMD = dmd, naturally...)

--anders