string types: const(char)[] and cstring (page 2) - D Programming Language Discussion Forum


May 26, 2007

Re: string types: const(char)[] and cstring

Posted by Walter Bright
in reply to Bill Baxter

Bill Baxter wrote:
> 'const string' automatically applies const to both the char and the [], right?

Right.

> Is that something to be worried about?

If you want to reassign another value, yes. I suggest:

	const(char)[]

instead.

May 26, 2007

Re: string types: const(char)[] and cstring

Posted by Walter Bright
in reply to Reiner Pope

Reiner Pope wrote:
> Also, I can't see any difference between const(char) and invariant(char), since neither can ever be rebound. In that case, if I assume that they are identical types, how can an array of const(char) be different from an array of invariant(char)?

The difference is when they are reference types, such as arrays of const char, or arrays of invariant chars.

May 26, 2007

Re: string types: const(char)[] and cstring

Posted by Daniel Keep
in reply to Reiner Pope

Reiner Pope wrote:
> Walter Bright wrote:
>> Under the new const/invariant/final regime, what are strings going to be? Experience with other languages suggests that strings should be immutable. To express an array of const chars, one would write:
>>
>>     const(char)[]
>>
> ....
>> String literals, on the other hand, will be invariant (which means
>> they can be stuffed into read-only memory). So,
>>     typeof("abc")
>> will be:
>>     invariant(char)[3]
> 
> The thing I don't get about this syntax is what happens when you take off the [].
> 
> 1.   invariant(char) c = 'b'; // c is 'b' now, and will never change.
> 2.   final(char) d = 'b';     // but calling it final means the same...
> 3.   const(char) e = 'b';     // ummm... what?
> 
> It seems like const(char) is a constant char -- one that can't change.
> Does that make final obsolete?
> 
> Also, I can't see any difference between const(char) and
> invariant(char), since neither can ever be rebound. In that case, if I
> assume that they are identical types, how can an array of const(char) be
> different from an array of invariant(char)?
> 
> -- Reiner

This is what I'm wondering; I thought const and invariant only applied
to reference types (which is why we have final as storage const), in
which case, const(char)[] doesn't make any sense...

	-- Daniel

-- 
int getRandomNumber()
{
    return 4; // chosen by fair dice roll.
              // guaranteed to be random.
}

http://xkcd.com/

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP  http://hackerkey.com/

May 26, 2007

Re: string types: const(char)[] and cstring

Posted by Walter Bright
in reply to Daniel Keep

Daniel Keep wrote:
> This is what I'm wondering; I thought const and invariant only applied
> to reference types (which is why we have final as storage const), in
> which case, const(char)[] doesn't make any sense...

If you know C++, then const(char)* is the same as:
	const char* p;		// C++
and const(char*) is the same as:
	const char * const p;	// C++


(using * because C++ doesn't have dynamic arrays)
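
For readers who want to check the C++ side of this analogy, here is a minimal sketch. The alias names and the rebind helper are made up for illustration; only the two C++ declarations come from the post above.

```cpp
#include <cassert>
#include <type_traits>

// C++ counterpart of the proposed const(char)*:
// the pointer may be rebound, the chars may not be written.
using RebindableView = const char*;

// C++ counterpart of the proposed const(char*):
// neither the pointer nor the chars may change.
using FixedView = const char* const;

// Only the first form accepts assignment to the pointer itself.
static_assert(std::is_assignable<RebindableView&, const char*>::value,
              "const char* can be rebound");
static_assert(!std::is_assignable<FixedView&, const char*>::value,
              "const char* const cannot be rebound");

// In both forms the pointed-to char is const.
static_assert(std::is_const<std::remove_pointer<RebindableView>::type>::value,
              "chars are read-only");
static_assert(std::is_const<std::remove_pointer<FixedView>::type>::value,
              "chars are read-only");

// Hypothetical helper: rebinding is legal, writing through p would not compile.
inline const char* rebind(const char* p, const char* other) {
    p = other;        // ok: the pointer is not const
    // p[0] = 'x';    // error: the chars are const
    return p;
}
```

The static_asserts fail at compile time if either reading of the analogy is wrong, so nothing needs to run to check it.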

May 26, 2007

Re: string types: const(char)[] and cstring

Posted by Anders F Björklund
in reply to Walter Bright

Walter Bright wrote:

> Why cstring? Because 'string' appears as both a module name and a common variable name. cstring also implies wstring for wchar strings, and dstring for dchars.

I think cstring is a horrible name. "string" is much better, and in use.
(else wouldn't those be wcstring and dcstring or cwstring and cdstring?)

That it is made up of constant characters, and that those aren't really
characters but instead UTF-8 code units is something that can be hidden.

alias const(char)[] string;

But "cstring" both sounds awkward, and also leads the mind to C strings.
Even if those (char*) would probably be "stringz" in the usual D lingo.

If any name conflict with previously existing "string" must be avoided,
then "str" is probably a better name... (character->char, integer->int)

As was discussed earlier.

--anders

May 26, 2007

Re: string types: const(char)[] and cstring

Posted by Anders F Björklund
in reply to Bill Baxter

Bill Baxter wrote:

> Some people already alias char[] to string.  As far as I've heard they haven't run into conflicts with the module name, or with people naming variables 'string'.

I think it would be a problem at the top of the namespace,
but it's OK if you use (for instance) "wx.common.string":

module wx.common;
alias char[] string;

Then you can do declarations like:
string string = "string";

At least that's how it has been working for the last couple
of years, and for Christopher E. Miller's dstring.d as well:

module dstring;
struct string { ... }

--anders

May 26, 2007

Re: string types: const(char)[] and cstring

Posted by Chris Miller
in reply to Anders F Björklund

On Sat, 26 May 2007 04:35:34 -0400, Anders F Björklund <afb@algonet.se> wrote:

> Walter Bright wrote:
>
>> Why cstring? Because 'string' appears as both a module name and a common variable name. cstring also implies wstring for wchar strings, and dstring for dchars.
>
> I think cstring is a horrible name. "string" is much better, and in use.
> (else wouldn't those be wcstring and dcstring or cwstring and cdstring?)
>
> That it is made up of constant characters, and that those aren't really
> characters but instead UTF-8 code units is something that can be hidden.
>
> alias const(char)[] string;
>
> But "cstring" both sounds awkward, and also leads the mind to C strings.
> Even if those (char*) would probably be "stringz" in the usual D lingo.
>
> If any name conflict with previously existing "string" must be avoided,
> then "str" is probably a better name... (character->char, integer->int)
>
> As was discussed earlier.
>
> --anders

I agree, except I don't care much for "str". I'd prefer it named string. If it's an alias in object.d and not a keyword, it shouldn't be too bad.

Actually, while we're at a change for strings, why not bring in something similar to my dstring module, where slicing and indexing never result in an invalid UTF sequence? http://www.dprogramming.com/dstring.php - the code may not be ideal, but it's the concept I'm referring to.

While on strings, I'll mention another problem I have with D's string handling: the "invalid utf8 sequence" exception (or, if you prefer, "4invalid utf8 sequence"). Other Unicode implementations I've used do not throw such an exception, but interpret the bad parts as replacement characters (U+FFFD). I believe I've heard that the Unicode standard also recommends being forgiving in this respect.
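
To make the forgiving behaviour concrete, here is a rough sketch of a decoder that substitutes U+FFFD instead of throwing. It is written in C++ only to keep it self-contained. The name decodeLossy is hypothetical, and emitting one replacement character per offending byte is a simplifying assumption, not necessarily the exact substitution practice the Unicode standard describes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decode UTF-8 into code points, replacing each byte that cannot start a
// valid sequence with U+FFFD instead of reporting an error.
std::u32string decodeLossy(const std::string& in) {
    const char32_t replacement = 0xFFFD;
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        char32_t minimum = 0;  // smallest code point allowed for this length
        if (lead < 0x80)                { len = 1; cp = lead;        minimum = 0; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; minimum = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; minimum = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; minimum = 0x10000; }
        else {
            // Stray continuation byte or impossible lead byte.
            out.push_back(replacement);
            ++i;
            continue;
        }
        bool ok = i + len <= in.size();  // truncated sequences are invalid
        for (std::size_t k = 1; ok && k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong forms, surrogates, and values past U+10FFFF.
        if (ok && (cp < minimum || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)))
            ok = false;
        if (ok) { out.push_back(cp); i += len; }
        else    { out.push_back(replacement); ++i; }
    }
    return out;
}
```

With this policy a lone 0xFF byte between 'a' and 'z' decodes to 'a', U+FFFD, 'z', and the caller never sees an exception.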

- Chris

May 26, 2007

Re: string types: const(char)[] and cstring

Posted by Marcin Kuszczak
in reply to Chris Miller

Chris Miller wrote:

> Actually, while we're at a change for strings, why not bring in something similar to my dstring module, where slicing and indexing never result in an invalid UTF sequence? http://www.dprogramming.com/dstring.php - the code may not be ideal, but it's the concept I'm referring to.

Yup. That's my opinion also...

For me advantages of such a string are quite obvious:
1. Easy slicing and indexing of utf8 sequences (without corrupting the
sequence - as mentioned above)
2. Common denominator for char[], wchar[] and dchar[]
3. For classes that don't need speed it simplifies the API (only one version
of each function instead of 3)
4. With some additional support from language (cast operators to different
types and opImplicitCast) it can be fully interchangeable with every method
taking char[], wchar[], dchar[].
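
Point 1 can be sketched roughly as follows. This is C++ rather than D, purely to keep the example self-contained, and it is not the dstring module's API: sliceCodePoints and isContinuation are hypothetical names, and the input is assumed to be valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True for bytes of the form 10xxxxxx, which never start a code point.
inline bool isContinuation(char c) {
    return (static_cast<unsigned char>(c) & 0xC0) == 0x80;
}

// Return code points [first, last) of a UTF-8 string. Indices count code
// points, not bytes, so a multi-byte sequence is never cut in half.
std::string sliceCodePoints(const std::string& s, std::size_t first, std::size_t last) {
    std::size_t begin = s.size();
    std::size_t end = s.size();
    std::size_t index = 0;  // index of the code point that starts at byte i
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (isContinuation(s[i])) continue;
        if (index == first) begin = i;
        if (index == last) { end = i; break; }
        ++index;
    }
    if (begin > end) begin = end;  // empty result when first >= last
    return s.substr(begin, end - begin);
}
```

The cost is a linear scan per slice instead of the constant-time byte slice that char[] gives, which is the speed trade-off mentioned in point 3.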

Having another 3 names for string is not very appealing to me. We would
have 9 official versions of string available in D:
char[], wchar[], dchar[], string, cwstring, cdstring, tango String!(char),
tango String!(wchar), tango String!(dchar)

To write a nice, fully functional library you have to write 3 versions of every function that takes a string, one per string type (I know, templates make it a little bit easier). I don't think I'm wrong in saying that in reality people just write one version for char[], because it is convenient (see: SWT ported from Java). As a result, wchar and dchar are treated as second-class citizens in D. Additionally, when people design their programs for char[], they mostly don't think about the issues with slicing a char[] utf8 sequence (warning! assumption!), so the default way of writing programs is *NOT SAFE*. When you write code and don't care about bare-metal speed, it is just tedious to do this additional work...

Having one string type that hides the differences between char[], wchar[] and dchar[] would solve the problem nicely. Adding constness would also be easy. And you would use only one reserved keyword - string - for everything.

I would be happy to hear other opinions from people on the NG. Maybe I am wrong about the above arguments, in which case someone can give counterarguments... I think it is a very important issue, as it seems that most developers around the world are non-native English speakers...

PS. See also thread on DWT NG.

-- 
Regards
Marcin Kuszczak (Aarti_pl)
-------------------------------------
Ask me why I believe in Jesus - http://zapytaj.dlajezusa.pl (en/pl)
Doost (port of few Boost libraries) - http://www.dsource.org/projects/doost/
-------------------------------------

May 26, 2007

Re: string types: const(char)[] and cstring

Posted by Leandro Lucarella
in reply to Bill Baxter

Bill Baxter, on May 26 at 14:59, you wrote to me:
> Plain 'string' really does make the most sense.

What about "text"?

Please see "The 'string' types" here[1] for an explanation.

[1] http://xlr.sourceforge.net/concept/diverge.html

-- 
Leandro Lucarella (luca) | Group blog: http://www.mazziblog.com.ar/blog/
 .------------------------------------------------------------------------,
  \  GPG: 5F5A8D05 // F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05 /
   '--------------------------------------------------------------------'
On the street I ran into a very proper gentleman who usually drives around
in a Falcon; he was running with two suitcases in hand and said: "I'm off to
Miami, do you have any message or ..." and I told him: "No, no, no..."
	-- Extra Tato (1983, Alfonsín's victory)

May 26, 2007

Re: string types: const(char)[] and cstring

Posted by Derek Parnell
in reply to Walter Bright

On Fri, 25 May 2007 19:47:24 -0700, Walter Bright wrote:

> Under the new const/invariant/final regime, what are strings going to be? Experience with other languages suggests that strings should be immutable.

We seem to have different experience. Most of the code I write deals with changing strings - in other words, manipulating strings is very very common in the sorts of programs I write.

> To express an array of const chars, one would write:
> 
> 	const(char)[]
> 
> but while that's clear, it doesn't just flow off the keyboard. Strings are so common this needs an alias, so:
> 
> 	alias const(char)[] cstring;
> 
> Why cstring? Because 'string' appears as both a module name and a common variable name. cstring also implies wstring for wchar strings, and dstring for dchars.

No it doesn't.

I have rarely seen 'string' used as a variable. In phobos it is used in boxer.d and regexp.d only. I use it as an alias for 'char[]'. I see 'str' used fairly often but not so much 'string'.

'cstring' is pronounced C-String which instantly brings to mind the 'string' implementation used by C language. Not something I imagine you wish to imply.

> String literals, on the other hand, will be invariant (which means they
> can be stuffed into read-only memory). So,
> 	typeof("abc")
> will be:
> 	invariant(char)[3]
> 
> Invariants can be implicitly cast to const.

So 'const(char)[] x' means that I can change x.ptr and x.length but I cannot change anything that x.ptr points to, right?

     void func(const(char)[] x)
     {
      x = "def"; // ok
      x.length = 0; // ok
      x[0] = 'd'; // fails
     }

And  'invariant(char)[] x' means that I cannot change x.ptr or x.length and I cannot change anything that x.ptr points to, right?

     void func(invariant(char)[] x)
     {
      x = "def"; // fails
      x.length = 0; // fails
      x[0] = 'd'; // ok
     }

So what syntax is to be used so that x.ptr and x.length cannot be changed but the characters referred to by 'x' can be changed?

     void func(char const([]) x) ???
     {
      x = "def"; // fails
      x.length = 0; // fails
      x[0] = 'd'; // ok
     }
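
In terms of the C++ analogy used earlier in this thread, the case being asked about (fixed pointer, writable characters) corresponds to char* const. The sketch below shows only that C++ side; it does not answer what the D syntax will be, and the alias and function names are made up for illustration.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical alias: the pointer cannot be rebound, the chars stay writable.
using FixedPointerMutableChars = char* const;

static_assert(!std::is_assignable<FixedPointerMutableChars&, char*>::value,
              "the pointer itself cannot be reassigned");
static_assert(!std::is_const<std::remove_pointer<FixedPointerMutableChars>::type>::value,
              "the pointed-to chars are not const");

// Mirrors the shape of the func examples above, for the C++ pointer case.
inline void overwriteFirst(char* const x, char c) {
    x[0] = c;          // ok: the chars are mutable
    // x = nullptr;    // error: x is a const pointer
}
```

C++ has no built-in counterpart of x.length, so this covers only the x.ptr half of the question.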

-- 
Derek Parnell
Melbourne, Australia
"Justice for David Hicks!"
skype: derek.j.parnell
