toStringz and predictability

"parabolis" <parabolis@softhome.net> wrote in message news:csmiqa$edp$1@digitaldaemon.com...
> Ben Hinkle wrote:
>> There's something about toStringz that has me uncomfortable. Consider this code:
>
> There is something else that you should be uncomfortable about: the domains of C strings and D strings are not the same. The toStringz function is so named because C strings are 'Z'ero (or null) terminated. That implies they cannot contain a null character, yet D strings have no such silly limitations. So the toStringz function should probably look like this:
>
> ----------------------------------------------------------------
> char* toStringz(char[] dStr) {
>     char[] cStr = new char[dStr.length+1];
>     foreach(int i, char dChar; dStr) {
>         if(!(cStr[i] = dChar)) throw new Exception("Null char");
>     }
>     cStr[dStr.length] = 0; // new char[] is filled with 0xFF, not 0
>     return cStr.ptr;
> }
> ----------------------------------------------------------------
>
> Now seems like a great time for plugging the unless/until feature of Perl as being nice in this context, allowing:
>
>     unless(cStr[i] = dChar) throw new Exception("Null char");

Has there been debate about unless/until? If so, count me on the list of 'wanting'. :-)

Matthew wrote:
> "parabolis" <parabolis@softhome.net> wrote in message news:csmiqa$edp$1@digitaldaemon.com...
>
>> Now seems like a great time for plugging the unless/until feature of Perl as being nice in this context, allowing:
>>
>>     unless(cStr[i] = dChar) throw new Exception("Null char");
>
> Has there been debate about unless/until? If so, count me on the list of 'wanting'. :-)

Yes, back around the time the digitalmars.d newsgroup started:
http://www.digitalmars.com/d/archives/digitalmars/D/1714.html

Walter wrote:
> "Brian Hammond" <d at brianhammond dot comBrian_member xx pathlink.com> wrote in message news:c8lmu2$vdm$1 xx digitaldaemon.com...
>> I really like the unless because it reads so well.
>>
>> "do this unless this is true"
>
> That just seems backwards to me <g>. I like things to execute forwards, not backwards.

However, Walter's response was long before "is" replaced "===", so I think it at least deserves another consideration: Perl's unless construct would give us "unless(A is null)" instead of the awkward and much maligned "if(!(A is null))".

(Actually, I refer here to several examples in this thread.)

>>> char* toStringzz(char[] str) {
>>>     str.length = str.length+1;
>>>     str[length-1] = 0;
>>>     return str.ptr;
>>> }

What bothers me is, if a string gets repeatedly passed, say, between a library and the main program, and the library functions pass the string on to the OS or another library, every time using toStringz, then what keeps the string from growing at each iteration? Finally we end up with a (possibly short) string with a lot of zeros at the end.

It seems harmless at first glance, but what if later this kind of strings are concatenated (in D code) and passed on to a C-written parser? It would see a lot of "empty strings" between real data.

Or am I missing something?

In the same manner, should toStringz guarantee a valid C string? I.e. no internal zeros? At the _very least_ in the non-release build!

----

The name toStringz is misleading. Since the only use for it is to make strings edible for C code, it should be renamed toStringC. Normally, if a programmer _wants_ to slap a zero at the end, he'd use ~, wouldn't he.

Misnomers like this introduce parallax, and in this case so subtle that we don't even notice. And that's where it _really_ counts!
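A minimal sketch in current D (not from the thread) of why repeated calls do not, by themselves, make the caller's string grow: a `char[]` parameter is a slice passed by value, so `str.length = str.length + 1` changes only the callee's copy of the pointer and length. `toStringzz` mirrors the quoted example with `$` in place of the old `length` inside the index; `lengthAfterHops` is a hypothetical helper that simulates the library round trips. Zeros could still pile up if a caller rebuilt its own slice from the returned pointer so that it included the terminator.

```d
import core.stdc.string : strlen;

// Mirror of the toStringzz example quoted above, in current D syntax.
char* toStringzz(char[] str)
{
    // `str` is the callee's own copy of the slice (pointer + length).
    // Growing it may reallocate or extend in place, but either way the
    // caller's slice keeps its original length.
    str.length = str.length + 1;
    str[$ - 1] = 0;
    return str.ptr;
}

// Hypothetical helper: simulates a string bouncing between a library and the
// main program, converted for C on every hop, and returns the length the
// caller still sees afterwards.
size_t lengthAfterHops(char[] s, int hops)
{
    foreach (hop; 0 .. hops)
    {
        char* p = toStringzz(s);
        // The terminator lands right after the data (assumes no embedded NULs).
        assert(strlen(p) == s.length);
    }
    return s.length;
}
```

Whatever the callee does to its copy, the caller's `s` still has length 3 after five hops in the check below.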

January 24, 2005

Re: toStringz and predictability

Posted by Anders F Björklund
in reply to Georg Wrede

Georg Wrede wrote:

> It seems harmless at first glance, but what if later this kind of strings are concatenated (in D code) and passed on to a C-written parser? It would see a lot of "empty strings" between real data.
> 
> Or am I missing something?

It would probably be easier to remove the hack altogether and just copy?

>     body
>     {
> 	if (string.length == 0)
> 	    return "";
> 
> 	// Need to make a copy
> 	char[] copy = new char[string.length + 1];
> 	copy[0..string.length] = string;
> 	copy[string.length] = 0;
> 	return copy;
>     }

Isn't that just what "string.length = string.length + 1" does, anyway?

It would be neat if it could be optimized for string literals, but not
at the expense of making the whole function unstable (as it is now).

> In the same manner, should toStringz guarantee a valid C string? I.e. no internal zeros? At the _very least_ in the non-release build!

The contract for toStringz specifies that the char[] is *without* '\0':

>     in
>     {
> 	if (string)
> 	{
> 	    // No embedded 0's
> 	    for (uint i = 0; i < string.length; i++)
> 		assert(string[i] != 0);
> 	}
>     }
>     out (result)
>     {
> 	if (result)
> 	{   assert(strlen(result) == string.length);
> 	    assert(memcmp(result, string, string.length) == 0);
> 	}
>     }

It also (implicitly) returns a "" string for a null input parameter.
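Anders's suggestion to drop the hack and always copy, combined with the contract above, could look roughly like this in current D. Assumptions: `toStringzCopy` is a hypothetical name, and "the hack" is taken to mean a check (not shown in the quoted body) that peeks at the byte just past the slice and skips the copy when it happens to be 0. `do` is the current spelling of the D1 `body` keyword.

```d
import core.stdc.string : strlen;

// Hypothetical always-copy variant: never inspects memory past the slice.
// The contracts mirror the quoted Phobos ones and are only checked in
// non-release builds.
char* toStringzCopy(const(char)[] s)
in
{
    foreach (c; s)
        assert(c != 0, "embedded NUL: C code would see a truncated string");
}
out (result)
{
    assert(strlen(result) == s.length);
    assert(result[0 .. s.length] == s);
}
do
{
    // Null and empty input both yield a pointer to a lone 0.
    auto copy = new char[s.length + 1];
    copy[0 .. s.length] = s[];
    copy[$ - 1] = 0; // new char[] starts filled with char.init (0xFF), not 0
    return copy.ptr;
}
```

The cost is one allocation per call. In exchange, the result never depends on whatever happens to lie past the end of the caller's array, and it never aliases the caller's buffer.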

> The name toStringz is misleading. Since the only use for it is to make strings edible for C code, it should be renamed toStringC. Normally, if a programmer _wants_ to slap a zero at the end, he'd use ~, wouldn't he.

It converts a char[] to a zero-terminated char*. No "C" about that?
(I'm not sure why it doesn't just 'return (string ~ "\0");', anyone?)
==> body { return ((string.length == 0) ? "" : string ~ "\0"); }
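The concatenation one-liner above can be sketched in current D as follows. `toStringzCat` is a hypothetical name, and it returns `const(char)*` only because the input is taken as `const(char)[]`. The D spec states that binary `~` always creates a new array, even when one operand is empty, so the result never aliases the caller's buffer.

```d
import core.stdc.string : strlen;

// Hypothetical name for the suggested one-liner, in current D syntax.
// `s ~ '\0'` always allocates a fresh array, so no special case is needed
// for empty or null input.
const(char)* toStringzCat(const(char)[] s)
{
    return (s ~ '\0').ptr;
}
```

The `string.length == 0` branch in the suggestion is therefore not needed for correctness. Returning the literal "" simply avoids an allocation, since D string literals are already followed by a 0 byte.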

Besides, most C functions do not accept UTF-8 input anyway...
To be usable from regular C, it would need to be converted to byte*?
(and that would most likely involve charset encoding conversion too)

--anders
