Idea: Introduce zero-terminated string specifier (page 3) - D Programming Language Discussion Forum


October 01, 2012

Re: Idea: Introduce zero-terminated string specifier

Posted by Piotr Szturmaj
in reply to Jonathan M Davis

Jonathan M Davis wrote:
> On Monday, October 01, 2012 11:18:16 Piotr Szturmaj wrote:
>> Adam D. Ruppe wrote:
>>> On Saturday, 29 September 2012 at 02:11:12 UTC, Alex Rønne Petersen wrote:
>>>> While the idea is reasonable, the problem then becomes that if you
>>>> accidentally pass a non-zero terminated char* to %sz, all hell breaks
>>>> loose just like with printf.
>>>
>>> That's the same risk with to!string(), yes? We aren't really losing
>>> anything by adding it.
>>>
>>> Also this reminds me of the utter uselessness of the current behavior of
>>> "%s" and a pointer - it prints the address.
>>
>> Why not specialize current "%s" for character pointer types so it will
>> print null terminated strings? It's always possible to cast to void* to
>> print an address.
>
> Honestly? One of Phobos' best features is the fact that %s works for
> _everything_. Specializing it for _anything_ would be horrible. It would also
> break a _ton_ of code. Who even uses %d, %f, etc. if they don't need to use
> format specifiers? It's just way simpler to always use %s.

OK, I think you're right.

> I'm not completely against the idea of %zs, but I confess that I have to
> wonder what someone is doing if they really need to print zero-terminated
> strings all that often in D for anything other than quick debugging (in which
> case to!string works just fine), since only stuff directly interacting with C
> code will even care. And if it's really that big a deal, and you're constantly
> interacting with C code like that, you can always use the appropriate C
> function - printf - and then it's a non-issue.

Imagine you're serializing a large amount of text, some of which comes from a C library (as null-terminated char*), and you're using format() with %s specifiers. Handling C strings directly would simply be faster, because it avoids iterating over each string twice.

October 01, 2012

Re: Idea: Introduce zero-terminated string specifier

Posted by Johannes Pfau
in reply to Piotr Szturmaj

On Mon, 01 Oct 2012 13:22:46 +0200, Piotr Szturmaj <bncrbme@jadamspam.pl> wrote:

> Paulo Pinto wrote:
> > On Monday, 1 October 2012 at 09:42:08 UTC, Piotr Szturmaj wrote:
> >> Jakob Ovrum wrote:
> >>> On Monday, 1 October 2012 at 09:17:52 UTC, Piotr Szturmaj wrote:
> >>>> Adam D. Ruppe wrote:
> >>>>> On Saturday, 29 September 2012 at 02:11:12 UTC, Alex Rønne
> >>>>> Petersen wrote:
> >>>>> Also this reminds me of the utter uselessness of the current
> >>>>> behavior of
> >>>>> "%s" and a pointer - it prints the address.
> >>>>
> >>>> Why not specialize current "%s" for character pointer types so it will print null terminated strings? It's always possible to cast to void* to print an address.
> >>>
> >>> It's not safe to assume that pointers to characters are generally null terminated.
> >>
> >> Yes, but the programmer should know what he's passing anyway.
> >
> > The thinking "the programmer should" only works in one-man teams.
> >
> > As soon as you start having teams with disparate programming knowledge among team members, you can forget everything about "the programmer should".
> 
> I've worked on such a team at my previous job, and I know what you mean. My original thought was about telling writef that I want to print a null-terminated string rather than an address. to!string will surely work, but it implies double iteration: one in to!string to calculate the length (seeking the 0 char) and one in writef (printing). With long strings this is suboptimal. What about something like this:
> 
> struct CString(T)
>      if (isSomeChar!T)
> {
>      T* str;
> }
> 
> @property
> auto cstring(S : T*, T)(S str)
>      if (isSomeChar!T)
> {
>      return CString!T(str);
> }
> 
> string test = "abc";
> immutable(char)* p = test.ptr;
> 
> writefln("%s", p.cstring); // prints "abc"
> 
> Here the char pointer type is "annotated" as null terminated string and writefln can use this information.

If CString implemented a toString method (probably the variant taking a sink delegate), this would already work. I'm not sure about performance though: Isn't writing out bigger buffers a lot faster than writing single chars? You could print every char individually, but wouldn't a p[0 .. strlen(p)] usually be faster?
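Johannes's suggestion can be sketched as follows. This is a minimal version of the quoted CString that adds a sink-based toString, which std.format detects and calls for "%s". It assumes plain char data only (the quoted proposal is generic over isSomeChar), and the names CString and cstring are illustrative, not Phobos APIs. It uses the p[0 .. strlen(p)] approach: one scan to find the terminator, then one bulk write.

```d
import core.stdc.string : strlen;
import std.stdio : writefln;

// Hypothetical wrapper that marks a char pointer as zero-terminated.
struct CString
{
    const(char)* str;

    // std.format recognizes this sink-based overload and uses it for "%s".
    void toString(scope void delegate(const(char)[]) sink) const
    {
        if (str is null)
            return;                  // print nothing for a null pointer
        sink(str[0 .. strlen(str)]); // scan once for '\0', then write the whole slice
    }
}

// Illustrative UFCS helper mirroring the quoted `cstring` property.
@property CString cstring(const(char)* p) { return CString(p); }

void main()
{
    string test = "abc";             // D string literals carry a hidden trailing '\0'
    immutable(char)* p = test.ptr;
    writefln("%s", p.cstring);       // prints abc
}
```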

October 01, 2012

Re: Idea: Introduce zero-terminated string specifier

Posted by deadalnix
in reply to Vladimir Panteleev

On 01/10/2012 13:29, Vladimir Panteleev wrote:
> On Monday, 1 October 2012 at 10:56:36 UTC, deadalnix wrote:
>> On 30/09/2012 21:58, Vladimir Panteleev wrote:
>>> On Sunday, 30 September 2012 at 18:31:00 UTC, deadalnix wrote:
>>>> If you know that a string is 0 terminated, you can easily create a
>>>> slice from it as follows:
>>>>
>>>> char* myZeroTerminatedString;
>>>> char[] myZeroTerminatedString[0 .. strlen(myZeroTerminatedString)];
>>>>
>>>> It is clean and avoids modifying the stdlib in an unsafe way.
>>>
>>> That's what to!string already does.
>>
>> How does to!string know that the string is 0 terminated ?
>
> By convention (it doesn't).

It is unsafe as hell oO
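The quoted snippet reuses the name myZeroTerminatedString for both the pointer and the slice, and `char[] x[0 .. n];` is not a valid D declaration. A corrected sketch of what it presumably means, assuming a local buffer stands in for data handed over by C code:

```d
import core.stdc.string : strlen;
import std.stdio : writeln;

void main()
{
    char[4] buf = "abc\0";                   // zero-terminated data, e.g. filled by a C call
    char* myZeroTerminatedString = buf.ptr;

    // The slice needs its own name; it reuses the memory without copying.
    // strlen stops at the first '\0'; without one it would read past buf.
    char[] slice = myZeroTerminatedString[0 .. strlen(myZeroTerminatedString)];
    writeln(slice);                          // prints abc
}
```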

October 01, 2012

Re: Idea: Introduce zero-terminated string specifier

Posted by Steven Schveighoffer
in reply to Jonathan M Davis

On Mon, 01 Oct 2012 05:54:30 -0400, Jonathan M Davis <jmdavisProg@gmx.com> wrote:

> I'm not completely against the idea of %zs, but I confess that I have to
> wonder what someone is doing if they really need to print zero-terminated
> strings all that often in D for anything other than quick debugging (in which
> case to!string works just fine)

to!string necessarily allocates; I think that is not a small problem.

I think %s should treat char * as if it is zero-terminated.

Invariably, you will have two approaches to this problem:

1. writefln("%s", mycstring); => 0xptrlocation
2. hm.., I guess I'll just use to!string => vulnerable to non-zero-terminated strings!

or

2. hm.., to!string will allocate, I guess I'll just use writefln("%s", mycstring[0..strlen(mycstring)]); => vulnerable to non-zero-terminated strings!

So how is forcing the user to use one of these methods any safer?  I don't see any casts in there...

> , since only stuff directly interacting with C
> code will even care. And if it's really that big a deal, and you're constantly
> interacting with C code like that, you can always use the appropriate C
> function - printf - and then it's a non-issue.

Nobody should ever *ever* use printf, unless you are debugging druntime.

It's not a non-issue.  printf has no type checking whatsoever.  Using it means 1) non-typechecked code (i.e., accidentally pass an int instead of a string, or forget to pass an arg for a specifier, and you've crashed your code), and 2) you have locked yourself into using C's streams (something I hope to remedy in the future).

Besides, it doesn't *gain* you anything over having writef(ln) just support char *.

Bottom line -- if to!string(arg) is supported, writefln("%s", arg) should be supported, and do the same thing.

-Steve
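Steve's point that to!string allocates, while slicing does not, can be illustrated by comparing the two on the same buffer. This is a sketch assuming a local array stands in for memory owned by C code: both approaches scan for the terminator, but only to!string copies the bytes into a new GC-allocated string.

```d
import core.stdc.string : strlen;
import std.conv : to;
import std.stdio : writefln;

void main()
{
    char[6] buf = "hello\0";        // stands in for a C-owned buffer
    char* p = buf.ptr;

    char[] view = p[0 .. strlen(p)]; // no allocation: the slice points into buf
    string copy = p.to!string;       // scans the same bytes, then allocates a copy

    buf[0] = 'j';                    // the C side later modifies its buffer
    writefln("%s %s", view, copy);   // prints: jello hello
}
```

The slice is cheaper but only valid while the C buffer lives; the copy costs an allocation but is independent of it.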

October 01, 2012

Re: Idea: Introduce zero-terminated string specifier

Posted by Piotr Szturmaj
in reply to Johannes Pfau

Johannes Pfau wrote:
>> struct CString(T)
>>       if (isSomeChar!T)
>> {
>>       T* str;
>> }
>>
>> @property
>> auto cstring(S : T*, T)(S str)
>>       if (isSomeChar!T)
>> {
>>       return CString!T(str);
>> }
>>
>> string test = "abc";
>> immutable(char)* p = test.ptr;
>>
>> writefln("%s", p.cstring); // prints "abc"
>>
>> Here the char pointer type is "annotated" as null terminated string
>> and writefln can use this information.
>
> If CString implemented a toString method (probably the variant taking a
> sink delegate), this would already work.

I reworked this example to form a forward range:

http://dpaste.dzfl.pl/7ab1eeec

The major advantage over "%zs" is that it could be used anywhere, not only with writef().

For example, C binding writers could change:

extern(C) char* getstr();

to

extern(C) cstring getstr();

so the string can be used with writef() immediately.

> I'm not sure about performance
> though: Isn't writing out bigger buffers a lot faster than writing
> single chars? You could print every char individually, but wouldn't a
> p[0 .. strlen(p)] usually be faster?

I think it prints single characters internally anyway. At the very least it must test each character to see whether it is zero, which is what strlen() does.
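The dpaste code is not reproduced here, so the following is only a hypothetical sketch of what a forward range over a zero-terminated string might look like. CStringRange and cstring are illustrative names. Each element is read lazily, so a pass walks the string once with no strlen and no allocation, and save allows a second pass, which is what makes it usable with range algorithms beyond writef().

```d
import std.range.primitives : isForwardRange, walkLength;
import std.stdio : writefln;

// Hypothetical forward range over a zero-terminated C string.
struct CStringRange
{
    private const(char)* p;

    @property bool empty() const { return p is null || *p == '\0'; }
    @property char front() const { return *p; }
    void popFront() { ++p; }
    @property CStringRange save() const { return CStringRange(p); }
}

// Illustrative UFCS helper; the name echoes the `cstring` idea in the post.
CStringRange cstring(const(char)* p) { return CStringRange(p); }

static assert(isForwardRange!CStringRange);

void main()
{
    string s = "hello";              // literal data is followed by a '\0'
    auto r = s.ptr.cstring;
    // save gives an independent copy, so the length can be counted
    // without consuming r.
    writefln("%s has %s chars", r, r.save.walkLength);
}
```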

October 01, 2012

Re: Idea: Introduce zero-terminated string specifier

Posted by Andrej Mitrovic
in reply to Piotr Szturmaj

On 10/1/12, Piotr Szturmaj <bncrbme@jadamspam.pl> wrote:
> For example C binding writers may change:
>
> extern(C) char* getstr();
>
> to
>
> extern(C) cstring getstr();

I don't think you can reliably do that because of semantics w.r.t.
passing parameters on the stack vs in registers based on whether a
type is a pointer or not. I've had this sort of bug when wrapping C++
where the C++ compiler was passing a parameter in one way but the D
compiler expected the parameters to be passed, simply because I tried
to be clever and fake a return type. See:
http://forum.dlang.org/thread/mailman.1547.1346632732.31962.d.gnu@puremagic.com#post-mailman.1557.1346690320.31962.d.gnu:40puremagic.com
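One way to sidestep the ABI concern is to keep the binding's signature as a plain pointer, exactly as C declares it, and do the conversion in a D-side wrapper. Whether a struct holding one pointer is returned the same way as a bare pointer depends on the platform ABI, so this avoids relying on it. In the sketch below, getstr is defined in D with C linkage only so the example runs on its own; in a real binding it would be the declaration `extern(C) const(char)* getstr();` provided by the C library. getstrD is a hypothetical name.

```d
import core.stdc.string : strlen;
import std.stdio : writeln;

// Stand-in for the C library function, defined here only to keep the sketch
// self-contained. The signature stays a plain pointer, matching the C ABI.
extern(C) const(char)* getstr() { return "hello from C".ptr; }

// D-side wrapper: conversion to a slice happens after the C call returns.
@system const(char)[] getstrD()
{
    const(char)* p = getstr();
    return p is null ? null : p[0 .. strlen(p)];
}

void main()
{
    writeln(getstrD()); // prints hello from C
}
```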

October 01, 2012

Re: Idea: Introduce zero-terminated string specifier

Posted by Andrej Mitrovic

On 10/1/12, Andrej Mitrovic <andrej.mitrovich@gmail.com> wrote:
> but the D
> compiler expected the parameters to be passed

missing "in another way" there.

October 01, 2012

Re: Idea: Introduce zero-terminated string specifier

Posted by Vladimir Panteleev
in reply to deadalnix

On Monday, 1 October 2012 at 12:12:52 UTC, deadalnix wrote:
> On 01/10/2012 13:29, Vladimir Panteleev wrote:
>> On Monday, 1 October 2012 at 10:56:36 UTC, deadalnix wrote:
>>> How does to!string know that the string is 0 terminated ?
>>
>> By convention (it doesn't).
>
> It is unsafe as hell oO

Forcing the programmer to put strlen calls everywhere in his code is not any safer.

October 02, 2012

Re: Idea: Introduce zero-terminated string specifier

Posted by Walter Bright
in reply to deadalnix

On 9/30/2012 11:31 AM, deadalnix wrote:
> If you know that a string is 0 terminated, you can easily create a slice
> from it as follows:
>
> char* myZeroTerminatedString;
> char[] myZeroTerminatedString[0 .. strlen(myZeroTerminatedString)];
>
> It is clean and avoids modifying the stdlib in an unsafe way.

Of course, using strlen() is always going to be unsafe. But having %zs is equally unsafe for the same reason.

deadalnix's example shows that adding a new format specifier %zs adds little value, but it gets much worse. Since %zs is inherently unsafe, it hides such unsafety in a commonly used library function, which will infect everything else that transitively calls writefln with unsafety.

This makes %zs an unacceptable feature.
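The unsafety Walter describes comes from the unbounded scan: nothing tells strlen where the memory ends. When the buffer's length is known, for example a fixed-size char array that a C API fills in, the scan can be bounded and D can check the helper as @safe. untilZero is a hypothetical helper in the spirit of C's strnlen, not a Phobos function:

```d
import std.stdio : writeln;

// Hypothetical bounded scan: it never looks past the end of the slice,
// so the compiler accepts it as @safe.
inout(char)[] untilZero(inout(char)[] buf) @safe pure nothrow @nogc
{
    foreach (i, c; buf)
        if (c == '\0')
            return buf[0 .. i];   // stop at the first terminator
    return buf;                   // no terminator: use the whole buffer
}

void main()
{
    char[8] raw = "abc\0xyz\0";   // fixed-size buffer as a C API might fill it
    writeln(untilZero(raw[]));    // prints abc
}
```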

October 02, 2012

Re: Idea: Introduce zero-terminated string specifier

Posted by Steven Schveighoffer
in reply to Walter Bright

On Mon, 01 Oct 2012 21:13:47 -0400, Walter Bright <newshound1@digitalmars.com> wrote:

> On 9/30/2012 11:31 AM, deadalnix wrote:
>> If you know that a string is 0 terminated, you can easily create a slice
>> from it as follows:
>>
>> char* myZeroTerminatedString;
>> char[] myZeroTerminatedString[0 .. strlen(myZeroTerminatedString)];
>>
>> It is clean and avoids modifying the stdlib in an unsafe way.
>
>
> Of course, using strlen() is always going to be unsafe. But having %zs is equally unsafe for the same reason.
>
> deadalnix's example shows that adding a new format specifier %zs adds little value, but it gets much worse. Since %zs is inherently unsafe, it hides such unsafety in a commonly used library function, which will infect everything else that transitively calls writefln with unsafety.
>
> This makes %zs an unacceptable feature.

What about %s just working with zero-terminated strings?

I was going to argue this point, but I just thought of a very, very good counterexample.

string x = "abc".idup; // no zero-terminator!

writefln("%s", x.ptr);

What we don't want is for writefln to try and interpret the pointer as a C string.  Not only is it bad, but even the code seems to suggest "Hey, this should print a pointer!"

The large underlying issue here is that C considers char * to be a zero-terminated string, and D considers it to be a pointer.

This means any code which uses C calls heavily will have to awkwardly dance between both worlds.  I think there is some value in providing something that is *not* as commonly used as writefln to do the above work (converting char * to char[]).

Hm...

@system char[] zstr(char *s) { return s[0..strlen(s)]; }

provides:

writefln("%s", zstr(s));

vs.

writefln("%zs", s);

Arguably, nobody uses %zs, so even though writefln is common, the specifier is not.  However, we can't require an import to use a bizarre specifier, and you can't link un@safe code to a specifier, so the zstr concept is far superior in requiring the user to know what he is doing, and having the compiler enforce that.

Does it make sense for Phobos to provide such a shortcut in an obscure header somewhere?  Like std.cstring?  Or should we just say "roll your own if you need it"?

-Steve
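Steve's zstr one-liner needs the core.stdc.string import for strlen to compile. The sketch below uses an inout parameter, a small generalization of his char* signature so that pointers to const and immutable data are accepted too. Because it is marked @system, @safe code cannot call it without a @trusted wrapper, which is the compiler enforcement he describes. Later Phobos releases added std.string.fromStringz, which performs essentially this conversion.

```d
import core.stdc.string : strlen;
import std.stdio : writefln;

// Unsafe by nature: trusts that s points to a zero-terminated string.
@system inout(char)[] zstr(inout(char)* s)
{
    return s[0 .. strlen(s)];
}

void main()
{
    char[4] buf = "abc\0";          // e.g. a buffer filled in by a C call
    writefln("%s", zstr(buf.ptr));  // prints abc
}
```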


Copyright © 1999-2021 by the D Language Foundation