D2 toStringz Return Type
Nov 07, 2008
Mike Parker
November 07, 2008
I'm curious as to why toStringz in D2 returns const(char)* instead of just a plain char*. Considering that the primary use case for the function is interfacing with C code, it seems rather pointless to return a const(char)*.

Consider the D2 versions of the function declarations in the std.c package. The 'const char *' declarations from C are replaced with 'in char*' from D. To what end? The C side doesn't know and doesn't care about const declarations in D. You could add any sort of modifier you want on the D side and it would serve no real purpose.

Then consider all of the existing (and future) bindings to C libraries out there. The vast majority have this sort of function prototype:

=====================
// from the C header
void someFunc(const char*);

// in D
extern(C) void someFunc(char*);
======================

This works with D1, but anyone using this library with D2 will either have to change such prototypes to include the 'in' modifier, or cast every call to toStringz.

I'm updating Derelict now to work with D2 and found this to be incredibly annoying. But before I go through every API Derelict binds to in order to figure out which char* parameters should get the 'in' modifier, I thought I'd ask here: is there any real purpose in requiring this sort of thing?

Is there any major use case for toStringz other than passing null-terminated strings to C functions? If not, then having a const(char)* return type is really superfluous. It doesn't do anything! Can we just remove the const?
November 07, 2008
Mike Parker wrote:
> I'm curious as to why toStringz in D2 returns const(char)* instead of just a plain char*. Considering that the primary use case for the function is interfacing with C code, it seems rather pointless to return a const(char)*.
> 
> Consider the D2 versions of the function declarations in the std.c package. The 'const char *' declarations from C are replaced with 'in char*' from D. To what end? The C side doesn't know and doesn't care about const declarations in D. You could add any sort of modifier you want on the D side and it would serve no real purpose.

It affects what values may be passed as parameters to these routines. String literals, for example, may only be passed as const or invariant parameters in D2.
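
A minimal sketch of that point, assuming a current D2 compiler, where toStringz returns immutable(char)* (which converts implicitly to const(char)*). The two extern(C) functions are hypothetical stand-ins for C library functions; they are given D bodies only so that the example links on its own.

```d
import std.string : toStringz;

// Hypothetical stand-in for a binding written D1-style, without const.
extern (C) size_t lenMutable(char* s)
{
    size_t n = 0;
    while (s[n] != '\0') ++n;
    return n;
}

// Hypothetical stand-in for the same binding with a const-correct prototype.
extern (C) size_t lenConst(const(char)* s)
{
    size_t n = 0;
    while (s[n] != '\0') ++n;
    return n;
}

void main()
{
    string s = "hello";

    // Const-correct prototype: literals and toStringz results pass as-is.
    assert(lenConst("hello") == 5);
    assert(lenConst(toStringz(s)) == 5);

    // D1-style prototype: every call needs a cast, which throws away
    // the guarantee that the callee will not write through the pointer.
    assert(lenMutable(cast(char*) toStringz(s)) == 5);

    // lenMutable(toStringz(s)); // rejected: the pointer does not convert to char*
}
```

The thread spells the parameter as 'in char*'; const(char)* is used here because it expresses the same read-only contract without depending on how 'in' is defined.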

> Then consider all of the existing (and future) bindings to C libraries out there. The vast majority have this sort of function prototype:
> 
> =====================
> // from the C header
> void someFunc(const char*);
> 
> // in D
> extern(C) void someFunc(char*);
> ======================
> 
> This works with D1, but anyone using this library with D2 will either have to change such prototypes to include the 'in' modifier, or cast every call to toStringz.

Yup.  I wish someone had said to use "in" for this from the outset.  But conversion isn't too terrible in most cases.  I converted all of Tango's C and Posix headers in an afternoon.

> I'm updating Derelict now to work with D2 and found this to be incredibly annoying. But before I go through every API Derelict binds to  in order to figure out which char* parameters should get the 'in' modifier, I thought I'd ask here if there is any real purpose in requiring this sort of thing?

Passing const variables.

> Is there any major use case for toStringz other than passing null-terminated strings to C functions? If not, then having a const(char)* return type is really superfluous. It doesn't do anything! Can we just remove the const?

See above :-)


Sean
November 07, 2008
Mike Parker wrote:
> I'm curious as to why toStringz in D2 returns const(char)* instead of just a plain char*. Considering that the primary use case for the function is interfacing with C code, it seems rather pointless to return a const(char)*.

We want to leave the opportunity open to not duplicate the actual memory underneath the string object. (Right now that opportunity is not effected.)

> Consider the D2 versions of the function declarations in the std.c package. The 'const char *' declarations from C are replaced with 'in char*' from D. To what end? The C side doesn't know and doesn't care about const declarations in D. You could add any sort of modifier you want on the D side and it would serve no real purpose.

C functions should have the most descriptive D signatures. "in" means "const" and "scope". The latter is not yet fully described at the moment, and may be dropped entirely. In that case, "in" remains a shorter synonym for "const".

> Then consider all of the existing (and future) bindings to C libraries out there. The vast majority have this sort of function prototype:
> 
> =====================
> // from the C header
> void someFunc(const char*);
> 
> // in D
> extern(C) void someFunc(char*);
> ======================

I think the const information should be there in the D version as well.

> This works with D1, but anyone using this library with D2 will either have to change such prototypes to include the 'in' modifier, or cast every call to toStringz.
> 
> I'm updating Derelict now to work with D2 and found this to be incredibly annoying. But before I go through every API Derelict binds to  in order to figure out which char* parameters should get the 'in' modifier, I thought I'd ask here if there is any real purpose in requiring this sort of thing?
> 
> Is there any major use case for toStringz other than passing null-terminated strings to C functions? If not, then having a const(char)* return type is really superfluous. It doesn't do anything! Can we just remove the const?

I think it's better to keep it.

Andrei
November 07, 2008
"Andrei Alexandrescu" wrote
> Mike Parker wrote:
>> I'm curious as to why toStringz in D2 returns const(char)* instead of just a plain char*. Considering that the primary use case for the function is interfacing with C code, it seems rather pointless to return a const(char)*.
>
> We want to leave the opportunity open to not duplicate the actual memory underneath the string object. (Right now that opportunity is not effected.)

My recommendation -- have 2 functions.  One which always copies (and returns char *), and one which does not.

This at least leaves a safe alternative for people who have headers that aren't properly constified, and don't want to go through the hassle of looking it up themselves.  Also good for those C functions which actually require a mutable char *, since D2 strings are mostly invariant.

-Steve


November 07, 2008
Steven Schveighoffer wrote:
> "Andrei Alexandrescu" wrote
>> Mike Parker wrote:
>>> I'm curious as to why toStringz in D2 returns const(char)* instead of just a plain char*. Considering that the primary use case for the function is interfacing with C code, it seems rather pointless to return a const(char)*.
>> We want to leave the opportunity open to not duplicate the actual memory underneath the string object. (Right now that opportunity is not effected.)
> 
> My recommendation -- have 2 functions.  One which always copies (and returns char *), and one which does not.
> 
> This at least leaves a safe alternative for people who have headers that aren't properly constified, and don't want to go through the hassle of looking it up themselves.  Also good for those C functions which actually require a mutable char *, since D2 strings are mostly invariant.

You can't quite do that because dynamic conditions establish whether it's safe to avoid copying or not.

Andrei
November 07, 2008
"Andrei Alexandrescu" wrote
> Steven Schveighoffer wrote:
>> "Andrei Alexandrescu" wrote
>>> Mike Parker wrote:
>>>> I'm curious as to why toStringz in D2 returns const(char)* instead of just a plain char*. Considering that the primary use case for the function is interfacing with C code, it seems rather pointless to return a const(char)*.
>>> We want to leave the opportunity open to not duplicate the actual memory underneath the string object. (Right now that opportunity is not effected.)
>>
>> My recommendation -- have 2 functions.  One which always copies (and returns char *), and one which does not.
>>
>> This at least leaves a safe alternative for people who have headers that aren't properly constified, and don't want to go through the hassle of looking it up themselves.  Also good for those C functions which actually require a mutable char *, since D2 strings are mostly invariant.
>
> You can't quite do that because dynamic conditions establish whether it's safe to avoid copying or not.

I can see how you interpreted it this way.

What I meant was one is the toStringz as it is today, which might copy and might leave it in-place.  This can be used to call C functions that take a const char *.  The other function will *always* copy, and will return a mutable char *.  This is for when you don't care to look at the function yourself (assuming the author got it correct), or the case where the C function actually does mutate the argument.

If the C function does actually require a mutable argument, you are forced to do an extra dup for no reason with today's toStringz.
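
A rough sketch of what such an always-copying function could look like, assuming a current D2 compiler. The name toStringzCopy is hypothetical and is not part of Phobos or Tango; the buffer is allocated from the GC heap, which is one possible choice among several.

```d
// Hypothetical helper: always copies, so the result never aliases the input
// and may be handed to a C function that writes to its argument.
char* toStringzCopy(const(char)[] s)
{
    auto buf = new char[s.length + 1]; // single GC allocation
    buf[0 .. s.length] = s[];          // single copy of the characters
    buf[s.length] = '\0';              // explicit terminator
    return buf.ptr;
}

void main()
{
    string s = "abc";
    char* p = toStringzCopy(s);

    p[0] = 'X'; // writing is fine because p points at a private copy
    assert(p[0 .. 4] == "Xbc\0");
    assert(s == "abc"); // the original string is untouched
}
```

With only the const-returning toStringz available, getting a writable buffer means copying its result a second time, which is the extra dup described above. A GC-allocated buffer also has to stay referenced from the D side for as long as the C code holds the pointer.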

-Steve


November 07, 2008
Steven Schveighoffer wrote:
> "Andrei Alexandrescu" wrote
>> Steven Schveighoffer wrote:
>>> "Andrei Alexandrescu" wrote
>>>> Mike Parker wrote:
>>>>> I'm curious as to why toStringz in D2 returns const(char)* instead of just a plain char*. Considering that the primary use case for the function is interfacing with C code, it seems rather pointless to return a const(char)*.
>>>> We want to leave the opportunity open to not duplicate the actual memory underneath the string object. (Right now that opportunity is not effected.)
>>> My recommendation -- have 2 functions.  One which always copies (and returns char *), and one which does not.
>>>
>>> This at least leaves a safe alternative for people who have headers that aren't properly constified, and don't want to go through the hassle of looking it up themselves.  Also good for those C functions which actually require a mutable char *, since D2 strings are mostly invariant.
>> You can't quite do that because dynamic conditions establish whether it's safe to avoid copying or not.
> 
> I can see how you interpreted it this way.
> 
> What I meant was one is the toStringz as it is today, which might copy and might leave it in-place.  This can be used to call C functions that take a const char *.  The other function will *always* copy, and will return a mutable char *.  This is for when you don't care to look at the function yourself (assuming the author got it correct), or the case where the C function actually does mutate the argument.
> 
> If the C function does actually require a mutable argument, you are forced to do an extra dup for no reason with today's toStringz.
> 
> -Steve 

I see. So:

const(char)* toStringzMayOrMayNotCopy(in char[]);
char* toStringzWillAlwaysCopy(in char[]);

Providing writable zero-terminated strings is a sure recipe for disaster (see the debates around sprintf, strcpy etc.). I think the need for such things is rare and at best avoided entirely by the standard library. If you so wish, you can always use malloc by hand.


Andrei
November 07, 2008
"Andrei Alexandrescu" wrote
> Steven Schveighoffer wrote:
>> "Andrei Alexandrescu" wrote
>>> Steven Schveighoffer wrote:
>>>> "Andrei Alexandrescu" wrote
>>>>> Mike Parker wrote:
>>>>>> I'm curious as to why toStringz in D2 returns const(char)* instead of just a plain char*. Considering that the primary use case for the function is interfacing with C code, it seems rather pointless to return a const(char)*.
>>>>> We want to leave the opportunity open to not duplicate the actual memory underneath the string object. (Right now that opportunity is not effected.)
>>>> My recommendation -- have 2 functions.  One which always copies (and returns char *), and one which does not.
>>>>
>>>> This at least leaves a safe alternative for people who have headers that aren't properly constified, and don't want to go through the hassle of looking it up themselves.  Also good for those C functions which actually require a mutable char *, since D2 strings are mostly invariant.
>>> You can't quite do that because dynamic conditions establish whether it's safe to avoid copying or not.
>>
>> I can see how you interpreted it this way.
>>
>> What I meant was one is the toStringz as it is today, which might copy and might leave it in-place.  This can be used to call C functions that take a const char *.  The other function will *always* copy, and will return a mutable char *.  This is for when you don't care to look at the function yourself (assuming the author got it correct), or the case where the C function actually does mutate the argument.
>>
>> If the C function does actually require a mutable argument, you are forced to do an extra dup for no reason with today's toStringz.
>>
>> -Steve
>
> I see. So:
>
> const(char)* toStringzMayOrMayNotCopy(in char[]);
> char* toStringzWillAlwaysCopy(in char[]);
>
> Providing writable zero-terminated strings is a sure recipe for disaster
>  (see the debates around sprintf, strcpy etc.). I think the need for
> such things is rare and at best avoided entirely by the standard
> library. If you so wish, you can always use malloc by hand.

Using zero-terminated strings, even const ones, is a recipe for disaster. Yet, there it is.  And it's making me do 2 duplications.

The reality is that as soon as you cross the boundary from D to C, you have lost all the safety benefits that D provides, even if the signature is const.  The reality is, people are still going to call these functions, either with an extra dup (which buys you nothing in safety), or by editing the bindings to be const (which makes it even more unsafe).  The reality is, most of these calls are pretty innocuous.  People aren't using sprintf or strcpy, they are using C libraries that do things that D doesn't already do. Most of these are just using char * as a way to pass const strings, it isn't too much to ask for a function that complies.

But you probably won't add it.  That's ok, I don't use Phobos anyways.  I'll be sure to add an appropriate function to Tango while porting it to D2.

-Steve


November 07, 2008
Steven Schveighoffer wrote:
> "Andrei Alexandrescu" wrote
>> Steven Schveighoffer wrote:
>>> "Andrei Alexandrescu" wrote
>>>> Steven Schveighoffer wrote:
>>>>> "Andrei Alexandrescu" wrote
>>>>>> Mike Parker wrote:
>>>>>>> I'm curious as to why toStringz in D2 returns const(char)* instead of just a plain char*. Considering that the primary use case for the function is interfacing with C code, it seems rather pointless to return a const(char)*.
>>>>>> We want to leave the opportunity open to not duplicate the actual memory underneath the string object. (Right now that opportunity is not effected.)
>>>>> My recommendation -- have 2 functions.  One which always copies (and returns char *), and one which does not.
>>>>>
>>>>> This at least leaves a safe alternative for people who have headers that aren't properly constified, and don't want to go through the hassle of looking it up themselves.  Also good for those C functions which actually require a mutable char *, since D2 strings are mostly invariant.
>>>> You can't quite do that because dynamic conditions establish whether it's safe to avoid copying or not.
>>> I can see how you interpreted it this way.
>>>
>>> What I meant was one is the toStringz as it is today, which might copy and might leave it in-place.  This can be used to call C functions that take a const char *.  The other function will *always* copy, and will return a mutable char *.  This is for when you don't care to look at the function yourself (assuming the author got it correct), or the case where the C function actually does mutate the argument.
>>>
>>> If the C function does actually require a mutable argument, you are forced to do an extra dup for no reason with today's toStringz.
>>>
>>> -Steve
>> I see. So:
>>
>> const(char)* toStringzMayOrMayNotCopy(in char[]);
>> char* toStringzWillAlwaysCopy(in char[]);
>>
>> Providing writable zero-terminated strings is a sure recipe for disaster
>>  (see the debates around sprintf, strcpy etc.). I think the need for
>> such things is rare and at best avoided entirely by the standard
>> library. If you so wish, you can always use malloc by hand.
> 
> Using zero-terminated strings, even const ones, is a recipe for disaster. Yet, there it is.

Well, writable ones are even more of a disaster. Reading random characters can cause the program to fail but does not corrupt its state arbitrarily. So it's good to limit the damage. The C and C++ communities have much more beef with writable stringz's than read-only ones.

> And it's making me do 2 duplications.

Not at all.

string s = ...;
auto sz = cast(char*) malloc(s.length + 1);
sz[0 .. s.length] = s[];
sz[s.length] = 0;

If you use it often in an application, put it in a function. I'm not putting it in the standard library.
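
Wrapped in a function, the snippet above might look roughly like this. It assumes a current D2 compiler with the core.stdc modules; the name toMutableStringz is hypothetical, and the null check and the free call are additions that the original snippet leaves out.

```d
import core.stdc.stdlib : malloc, free;
import core.stdc.string : strlen;

// Hypothetical helper: copies s into a malloc'ed, zero-terminated buffer.
// The caller owns the result and must release it with free().
char* toMutableStringz(const(char)[] s)
{
    auto sz = cast(char*) malloc(s.length + 1);
    if (sz is null)
        return null; // allocation failed
    sz[0 .. s.length] = s[];
    sz[s.length] = '\0';
    return sz;
}

void main()
{
    char* p = toMutableStringz("hello");
    assert(p !is null);
    scope (exit) free(p); // not GC memory, so it is released by hand

    p[0] = 'J'; // the buffer is writable
    assert(strlen(p) == 5);
    assert(p[0 .. 5] == "Jello");
}
```

Because the buffer comes from malloc rather than the GC heap, it will not be collected while C code still holds the pointer, at the cost of manual cleanup.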

> The reality is that as soon as you cross the boundary from D to C, you have lost all the safety benefits that D provides, even if the signature is const.

I disagree. You lost automatic checking from the D side when interfacing with C, but if a C function is reliably not mutating its arguments its D signature is better tagged as const. It's a net win.

> The reality is, people are still going to call these functions, either with an extra dup (which buys you nothing in safety), or by editing the bindings to be const (which makes it even more unsafe).  The reality is, most of these calls are pretty innocuous.  People aren't using sprintf or strcpy, they are using C libraries that do things that D doesn't already do. Most of these are just using char * as a way to pass const strings, it isn't too much to ask for a function that complies.

Maybe I got lucky, but I haven't run across any C libraries that don't use const in signatures. Anyhow, the point is superfluous, as you, not they, get to write the D interfacing signatures. Const conveys a world of information. True, that is not 100% enforceable in D and in C alike, as a cast could always ruin things. But it's good if the signature reflects a guarantee that is reasonable and also reasonably easy to observe.

> But you probably won't add it.  That's ok, I don't use Phobos anyways.  I'll be sure to add an appropriate function to Tango while porting it to D2.

You may want to rethink before putting dangerous functions in widely-used libraries. Returning a writable zero-terminated char* is as dangerous as it gets, and it fosters bad coding style too.


Andrei
November 08, 2008
"Andrei Alexandrescu" wrote
> Steven Schveighoffer wrote:
>> But you probably won't add it.  That's ok, I don't use Phobos anyways. I'll be sure to add an appropriate function to Tango while porting it to D2.
>
> You may want to rethink before putting dangerous functions in widely-used libraries. Returning a writable zero-terminated char* is as dangerous as it gets, and it fosters bad coding style too.

Nonsense.  Tango currently has such a function with D 1.x, and I've never heard of any issues with it.  I think you have overblown the danger here.

-Steve

