string types: const(char)[] and cstring (page 4) - D Programming Language Discussion Forum

Index » Announce » string types: const(char)[] and cstring (page 4)

May 27, 2007

Re: string types: const(char)[] and cstring

Posted by Chris Nicholson-Sauls
in reply to Walter Bright

Walter Bright wrote:
> Reiner Pope wrote:
>> Will there be something in the type system which enables you to safely say, "This is the only reference to this data, so it's ok for me to make this invariant" ?
> 
> Safely? No. You will be able to explicitly cast to invariant, however, the programmer will have to ensure it is safe to do so.
> 
>> Does 'scope' happen to have anything to do with that?
> 
> No. Scope just ensures that the reference does not 'escape' the scope it's in.
> 
>> invariant(char)[] createJunk()
>> {
>>     /* scope? */ char[] val = "aaaaa".dup;
>>     size_t index = rand() % 5;
>>     val[index] = rand();
>>
>>     return cast(invariant(char)[]) val;
>> }
>>
>> I mean, do I really need to cast it to invariant there? It's easy to see that there's only one copy of val's data in existence.
> 
> Easy for you to see, not so easy for the compiler to. And besides:
> 
>     return cast(invariant)val;
> 
> will do the trick more conveniently.

That's an interesting syntax, casting to a trait/attribute with the rest of the type inferred.  I presume cast(const) works as well.  (Maybe cast(scope)?  Then again, what's the use...)  Given cast(*) where * is invariant/const, is cast(*)T[] the same as cast(*(T)[]) or cast(*(T[]))?  That is, does the trait apply to the element type, or the array?

-- Chris Nicholson-Sauls
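A sketch of Reiner's `createJunk` for a current D compiler, where `invariant` is spelled `immutable`. It assumes Phobos' `std.exception.assumeUnique`, which performs the same unchecked conversion as the cast Walter shows but names the promise being made: no other reference to the buffer exists. The `makeJunk` variant is hypothetical and relies on a later language rule: the result of a `pure` function whose parameters hold no mutable references may convert to `immutable` without any cast.

```d
import std.exception : assumeUnique;
import std.random : uniform;

// Same shape as Reiner's example: build privately, then freeze.
immutable(char)[] createJunk()
{
    char[] val = "aaaaa".dup;                   // fresh buffer, no other references
    size_t index = uniform(0, val.length);      // random position in [0, 5)
    val[index] = cast(char)('a' + uniform(0, 26));
    return assumeUnique(val);                   // unchecked: we vouch val is unaliased
}

// Hypothetical variant: a pure function with no mutable-reference parameters
// returns data nobody else can see, so the caller may store it as immutable.
char[] makeJunk(size_t index) pure
{
    char[] val = "aaaaa".dup;
    val[index % val.length] = 'z';
    return val;
}

void main()
{
    immutable(char)[] a = createJunk();
    assert(a.length == 5);

    immutable(char)[] b = makeJunk(2);          // implicit conversion, no cast
    assert(b == "aazaa");
}
```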

May 27, 2007

Re: string types: const(char)[] and cstring

Posted by Reiner Pope
in reply to Walter Bright

Walter Bright wrote:
> Reiner Pope wrote:
>> Will there be something in the type system which enables you to safely say, "This is the only reference to this data, so it's ok for me to make this invariant" ?
> 
> Safely? No. You will be able to explicitly cast to invariant, however, the programmer will have to ensure it is safe to do so.
> 
>> Does 'scope' happen to have anything to do with that?
> 
> No. Scope just ensures that the reference does not 'escape' the scope it's in.

I must have misunderstood what scope specifies. I had thought that, to avoid being escaped, scope specified that your variable may not be aliased by another (non-scope) name. In that case, I thought, can't you say: "well, when I leave this function, I'm the only one holding a reference to this data, so it would be safe to call it invariant (or anything else I choose)." I thought a compiler could have a special case saying, "at the end of scope, you can safely turn any scope variables into whatever you want".

However, I was surprised to find out that the following code compiled fine, although it returns a dead object:

Foo foo()
{
    scope Foo f = new Foo();
    Foo g = f;
    return g;
}

  -- Reiner

May 27, 2007

Re: string types: const(char)[] and cstring

Posted by Walter Bright
in reply to Reiner Pope

Reiner Pope wrote:
> However, I was surprised to find out that the following code compiled fine, although it returns a dead object:

Sadly, it currently isn't enforced.

May 27, 2007

Re: string types: const(char)[] and cstring

Posted by Walter Bright
in reply to Chris Nicholson-Sauls

Chris Nicholson-Sauls wrote:
> That's an interesting syntax, casting to a trait/attribute with the rest of the type inferred.  I presume cast(const) works as well.  (Maybe cast(scope)?  Then again, what's the use...)  Given cast(*) where * is invariant/const, is cast(*)T[] the same as cast(*(T)[]) or cast(*(T[]))?  That is, does the trait apply to the element type, or the array?

Both.
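In current D the shorthand is `cast(immutable) x` (and `cast(const) x`). A small compile-time check, assuming a current compiler and an illustrative variable `buf`: the qualifier wraps the whole type of the operand, and because qualifiers are transitive it reaches the elements too, which matches the answer "both".

```d
char[] buf;   // illustrative module-level variable

// The qualifier wraps the whole array type...
static assert(is(typeof(cast(immutable) buf) == immutable(char[])));
static assert(is(typeof(cast(const) buf) == const(char[])));

// ...and, being transitive, it reaches the elements as well.
static assert(is(typeof((cast(immutable) buf)[0]) == immutable(char)));

void main()
{
    char[] local = "abc".dup;
    // The slice itself is copied into a head-mutable variable; the characters
    // stay immutable. The cast is unchecked: `local` must not be written again.
    immutable(char)[] s = cast(immutable) local;
    assert(s == "abc");
}
```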

May 27, 2007

Re: string types: const(char)[] and cstring

Posted by Derek Parnell
in reply to Walter Bright

On Sat, 26 May 2007 22:27:18 -0700, Walter Bright wrote:

> Derek Parnell wrote:
>> We seem to have different experience. Most of the code I write deals with changing strings - in other words, manipulating strings is very very common in the sorts of programs I write.
> 
> You'll still be able to concatenate and slice invariant strings. You can also cast a char[] to an invariant, when you're done building it.

While that is interesting, it doesn't have much to do with what I was saying.

You said "strings should be immutable", and I'm saying that seems odd because my experience is that most strings are meant to be changed.

So now I'm thinking that we are talking about different things when we use the word "string". I'm guessing you are really referring to compile-time generated string data (e.g. literals) rather than run-time generated string data.

>> So 'const(char)[] x' means that I can change x.ptr and x.length but I cannot change anything that x.ptr points to, right?
> 
> Right.
> 
>> And  'invariant(char)[] x' means that I cannot change x.ptr or x.length and I cannot change anything that x.ptr points to, right?
> 
> Wrong. The difference between const and invariant is that invariant is truly, absolutely, immutable.

Huh??? Isn't that what I just said? Now I'm even more confused about these terms. They are just not intuitive, are they?

> Const is only immutable through the reference - another reference to the same data can change it.

Ok ... so this below won't fail ...

  void func(const char[] parm)
  {
      char [] q;
      q = parm;
      q[0] = 'a';
  }

or is the "q = parm" not really permitted.

>> So what syntax is to be used so that x.ptr and x.length cannot be changed but the characters referred to by 'x' can be changed?
> 
> final char[] x;

Given the syntax on the form "  void func(<X> char[] parm) ", is the table
below true ...

*-------------------------------------*
| <X>         | parm.ptr  |  parm[0]  |
|-------------+-----------+-----------|
| const       | mutable   | immutable |
| final       | immutable | mutable   |
| invariant   | immutable | immutable |
| (none)      | mutable   | mutable   |
*-------------------------------------*

I'm sorry I'm a bit slow on this ... but what is the difference between "invariant" and "const final"? Is it that "invariant" has a sort of global effect, while "const final" is only in effect for the specific reference it occurs on?

I'm not looking forward to reading the docs on this. I hope you get a lot of people to edit the docs to make it understandable for everyone.

-- 
Derek Parnell
Melbourne, Australia
"Justice for David Hicks!"
skype: derek.j.parnell

May 27, 2007

Re: string types: const(char)[] and cstring

Posted by Walter Bright
in reply to Derek Parnell

Derek Parnell wrote:
> You said "strings should be immutable", and I'm saying that seems odd because
> my experience is that most strings are meant to be changed.

I'm going to argue that your experience is unusual. I do a lot of string manipulation (after all, that's what a compiler does) and the strings, once constructed, are essentially always immutable. In conversations with many others, my experience is commonplace.

But still, in D, nothing prevents you from using mutable strings.

> So now I'm thinking that we are talking about different things when we use
> the word "string". I'm guessing you are really referring to compile-time
> generated string data (e.g. literals) rather than run-time generated string
> data.

I'm referring to the arrays of characters, generated or literals.

>>> So 'const(char)[] x' means that I can change x.ptr and x.length but I
>>> cannot change anything that x.ptr points to, right?
>> Right.
>>
>>> And  'invariant(char)[] x' means that I cannot change x.ptr or x.length and
>>> I cannot change anything that x.ptr points to, right?
>> Wrong. The difference between const and invariant is that invariant is truly, absolutely, immutable. 
> 
> Huh??? Isn't that what I just said?

No. You said for const you could change x.ptr and x.length, but for invariant you could not. For both const and invariant, you can change x.ptr and x.length.

> Now I'm even more confused about these
> terms. They are just not intuitive, are they?

The problem is I have failed to explain them. Invariant data can go into read-only memory. Const data can be changed by another reference to the same data (just like in C++). In other words, const is a read-only *view* of the data, whereas invariant data is read-only for all views of it.

>> Const is only immutable through the reference - another reference to the same data can change it.
> 
> Ok ... so this below won't fail ...
> 
>   void func(const char[] parm)
>   {
>       char [] q;
>       q = parm;
error, q is not const.
>       q[0] = 'a';
>   }
> 
> or is the "q = parm" not really permitted.

Right.
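Walter's answer (the assignment `q = parm` is rejected) carries over to current D, written with `const(char)[]`. A minimal sketch, where `withFirstA` is a hypothetical name: the compile-time check confirms that a const view cannot be copied into a mutable slice, and `.dup` is the usual way to get a private, writable copy.

```d
import std.stdio : writeln;

// Current-D spelling of Derek's func: the parameter is a read-only view.
char[] withFirstA(const(char)[] parm)
{
    // A const view does not convert implicitly to a mutable slice.
    static assert(!is(typeof(parm) : char[]));

    char[] q = parm.dup;   // take a private mutable copy instead
    if (q.length)
        q[0] = 'a';
    return q;
}

void main()
{
    string s = "xyz";
    writeln(withFirstA(s)); // ayz
    assert(s == "xyz");     // the caller's data is untouched
}
```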

> 
>>> So what syntax is to be used so that x.ptr and x.length cannot be changed
>>> but the characters referred to by 'x' can be changed?
>> final char[] x;
> 
> 
> Given the syntax on the form "  void func(<X> char[] parm) ", is the table
> below true ...
> 
> *-------------------------------------*
> | <X>         | parm.ptr  |  parm[0]  |
> |-------------+-----------+-----------|
> | const       | mutable   | immutable |
> | final       | immutable | mutable   |
> | invariant   | immutable | immutable |
> | (none)      | mutable   | mutable   |
> *-------------------------------------*

You've got invariant wrong, it's mutable|immutable.

> I'm sorry I'm a bit slow on this ... but what is the difference between
> "invariant" and "const final" ? Is it that "invariant" is sort of a global
> effect but "const final" is only in effect for the specific reference it
> occurs on.

First, the differences: final is a *storage class*; const and invariant are *type constructors*.

final only refers to the actual value that a symbol has, and it means that, once a value is assigned to a symbol, that value can never change. If the value is a pointer or reference, what it points to *can* be changed.

int x = 3;
final int* p = &x;
p = null; // error, p is final
*p = 1; // ok

const(int)* q = null;
q = &x;  // ok, q is not const, and now *q is 1
*q = 2;  // error, *q is const
*p = 5;  // ok, but now *q is 5, too!
x = 6;   // ok, but now *q is 6

invariant(int)* s = null;
s = &x;  // error, cannot implicitly convert int* to invariant(int)*
int y = 4;
s = cast(invariant(int)*)&y; // ok, trust programmer that y is immutable
*s = 3;  // error, *s is immutable
y = 5;   // undefined behavior, as y is never supposed to change,
         // and compiler assumes *s is still 4

Note that int* can be implicitly converted to const(int)*, and invariant(int)* can be implicitly converted to const(int)*.
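Walter's example restated as a sketch for current D. Two assumptions differ from the 2007 proposal: `invariant` is spelled `immutable`, and released D has no head-const `final` storage class, so `p` is an ordinary pointer here. Each `static assert(!__traits(compiles, ...))` stands in for a line marked "error"; the cast-then-mutate lines are left out because they are undefined behavior.

```d
void main()
{
    int x = 3;
    int* p = &x;               // ordinary mutable pointer

    const(int)* q = &x;        // read-only *view* of x
    static assert(!__traits(compiles, *q = 2)); // cannot write through q
    *p = 5;                    // but another reference can change x...
    assert(*q == 5);           // ...and q sees the new value
    x = 6;
    assert(*q == 6);

    immutable(int)* s;
    static assert(!__traits(compiles, s = &x)); // int* does not convert to immutable(int)*
    immutable int y = 4;       // genuinely never changes
    s = &y;
    static assert(!__traits(compiles, *s = 3));
    assert(*s == 4);

    // Both mutable and immutable pointers convert implicitly to const.
    const(int)* fromMutable = p;
    const(int)* fromImmutable = s;
    assert(*fromMutable == 6 && *fromImmutable == 4);
}
```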

> I'm not looking forward to reading the docs on this. I hope you get a lot
> of people to edit the docs to make it understandable for everyone.

The thing is actually rather simple, but I am having trouble finding the right words to express it. Certainly, the mishmash of C++ const has badly muddied the waters about what const means.

May 27, 2007

Re: string types: const(char)[] and cstring

Posted by Anders F Björklund
in reply to Bill Baxter

Bill Baxter wrote:

>> The same here. I don't have much experience with Java and really don't know
>> why const strings are so useful...
>> Maybe someone could elaborate a little bit more?
> 
> Ditto here.  When I've used java I found it more annoying that strings were immutable than anything else.

When using Java (and Objective-C), I've found it very useful that strings (and others) are immutable since they are then thread-safe.

--anders
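Anders' point in D terms: immutable data can be handed to another thread without locking, because no thread can change it. A minimal sketch assuming Phobos' `std.concurrency` (`spawn`, `send`, `receiveOnly`, `thisTid`); `countChars` is a hypothetical worker. As I understand the library, `spawn` refuses arguments that alias mutable thread-local data, so passing a plain `char[]` here would be rejected, while a `string` is accepted.

```d
import std.concurrency : receiveOnly, send, spawn, thisTid, Tid;

// Hypothetical worker: only reads the message it was given.
void countChars(Tid owner, string msg)
{
    owner.send(msg.length);
}

void main()
{
    string greeting = "hello";              // immutable(char)[]
    spawn(&countChars, thisTid, greeting);  // sharing immutable data needs no lock
    assert(receiveOnly!size_t() == 5);
}
```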

May 27, 2007

Re: string types: const(char)[] and cstring

Posted by Derek Parnell
in reply to Walter Bright

On Sun, 27 May 2007 01:09:40 -0700, Walter Bright wrote:

Thanks for taking the time out to help me understand the proposed D changes. I really appreciate it.

I think that I'm going to have to wait until you have an implementation to try it on; to see how it fits with my terminology and needs.

> Derek Parnell wrote:
>> You said "strings should be immutable", and I'm saying that seems odd because my experience is that most strings are meant to be changed.
> 
> I'm going to argue that your experience is unusual. I do a lot of string manipulation (after all, that's what a compiler does) and the strings, once constructed, are essentially always immutable. In conversations with many others, my experience is commonplace.

OK, we'll leave it at that, then. However, I suspect the phrase "once constructed" is the key one. It's like saying that once I've finished changing things, I don't want them to change anymore; no argument there. So the idea would be to work with mutable strings until they are fully constructed and then cast them to immutable for the rest of the run time. I'm thinking here of things like changing case, macro expansion, standardizing file names, constructing message text, etc.
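The build-then-freeze workflow can often skip the cast entirely. A sketch assuming Phobos' `std.array.appender`; `buildMessage` is a hypothetical helper. The appender owns the growing buffer, the pieces appended are immutable strings, and the finished result comes out as a `string`.

```d
import std.array : appender;
import std.conv : to;
import std.stdio : writeln;

// Hypothetical helper: assemble message text piece by piece.
string buildMessage(string user, int count)
{
    auto buf = appender!string();   // growable buffer of immutable chars
    buf.put("Hello, ");
    buf.put(user);
    buf.put("! You have ");
    buf.put(count.to!string);
    buf.put(" new messages.");
    return buf.data;                // already a string; nothing to cast
}

void main()
{
    writeln(buildMessage("Derek", 3)); // Hello, Derek! You have 3 new messages.
}
```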

> But still, in D, nothing prevents you from using mutable strings.

That's why I can see that I'll be continuing to use 'alias char[] string', unless you make 'string' the immutable beastie of course <g>

>>>> So 'const(char)[] x' means that I can change x.ptr and x.length but I cannot change anything that x.ptr points to, right?
>>> Right.
>>>
>>>> And  'invariant(char)[] x' means that I cannot change x.ptr or x.length and I cannot change anything that x.ptr points to, right?
>>> Wrong. The difference between const and invariant is that invariant is truly, absolutely, immutable.
>> 
>> Huh??? Isn't that what I just said?
> 
> No. You said for const you could change x.ptr and x.length, but for invariant you could not. For both const and invariant, you can change x.ptr and x.length.

See, this is what is weird ... I can have an invariant string which can be changed, thus making it not really invariant in the English language sense. I'm still thinking that "invariant" means "does not change ever".

But it seems that I'm wrong ...

 invariant char[] x;
 x = "abc".dup;  // The string 'x' now contains "abc";
 x = "def".dup;  // The string (which is not supposed to change,
                 // i.e. invariant) has been changed to "def".

Now this is counter-intuitive (read: *WEIRD*), no?
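The behavior described here comes from where the qualifier's parentheses go. A sketch in current D syntax (`immutable` rather than `invariant`, `.idup` for an immutable copy): `immutable(char)[]`, which is what `string` aliases, freezes the characters but lets the variable be rebound to other immutable data, while `immutable(char[])` freezes the variable as well. A fully immutable variable gets its value at initialization.

```d
void main()
{
    immutable(char)[] x;        // same type as `string`
    x = "abc".idup;             // rebinding x to new immutable data is fine
    x = "def".idup;             // the old "abc" characters were never modified
    static assert(!__traits(compiles, x[0] = 'z'));   // the characters are frozen

    immutable(char[]) y = "abc";                      // populated at initialization
    static assert(!__traits(compiles, y = "def"));    // the slice itself is frozen too
    assert(x == "def" && y == "abc");
}
```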

>> Now I'm even more confused about these
>> terms. They are just not intuitive, are they?
> 
> The problem is I have failed to explain them. Invariant data can go into read-only memory. Const data can be changed by another reference to the same data (just like in C++). In other words, const is a read-only *view* of the data, whereas invariant data is read-only for all views of it.

Okay, I've got that now ... but how do I remember that two terms that mean the same thing in English actually mean different things in D? <G>

I think I read someone suggest that 'const' be read as a contraction of 'constrained' rather than 'constant'; that might help. Also, 'invariant' is longer than 'const', so its effect is 'bigger'.

  invariant char[] x; // The data pointed to by 'x' cannot be changed
                      // by anything anytime during the execution
                      // of the program.
                      // (So how do I populate it then? Hmmmm ...)

  const char[] y;    // The data pointed to by 'y' cannot be changed
                     // by anything anytime during the execution
                     // of the program when using the 'y' variable,
                     // however using another variable that also
                     // refers to y's data, or some of it, is ok.

For example ...

  void func (const char[] a, char[] b)
  {
        a[0] = 'a'; // fails
        b[0] = 'a'; // succeeds
  }

  char[] y = "def".dup;
  func( y, y);
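The `func` example, restated as a sketch in current D syntax (`const(char)[]` for the 2007 `const char[]`). Returning `a[0]` makes the "read-only view" wording visible: when `a` and `b` refer to the same buffer, a write through `b` is seen through `a`.

```d
char func(const(char)[] a, char[] b)
{
    static assert(!__traits(compiles, a[0] = 'a')); // the "fails" line
    b[0] = 'a';                                      // the "succeeds" line
    return a[0];
}

void main()
{
    char[] y = "def".dup;
    assert(func(y, y) == 'a');          // same buffer: the const view sees the change
    assert(y == "aef");

    char[] z = "ghi".dup;
    assert(func(z, "xyz".dup) == 'g');  // separate buffers: a is unaffected
}
```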

>> I'm sorry I'm a bit slow on this ... but what is the difference between "invariant" and "const final" ? Is it that "invariant" is sort of a global effect but "const final" is only in effect for the specific reference it occurs on.
> 
> First differences: final is a *storage class*. const and invariant are *type constructors*.

Thanks. So 'final' means that it can be changed (from its initial default
value) once and only once.

/* --- Scenario #1 --- */
  final int r;
  r = randomer(); // succeeds
  foo(); // fails

  int randomer() {
      // Get a random integer between -100 and 100.
      return cast(int)(std.random.rand() % 201) - 100;
  }
  void foo() {
    r = randomer(); // success depends on whether or not 'r'
                    // has already been set.
  }


/* --- Scenario #2 --- */
  final int r;

  foo(); // succeeds
  r = randomer(); // fails

  int randomer() {
      // Get a random integer between -100 and 100.
      return cast(int)(std.random.rand() % 201) - 100;
  }
  void foo() {
    r = randomer(); // success depends on whether or not 'r'
                    // has already been set.
  }

Is this a run-time check or a compile-time one? If run-time, would it be possible to somehow 'unfinal' a variable using some implementation-dependent trickery?
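For comparison only, and assuming a current compiler: released D dropped `final` variables, but the closest current idiom for "assign once at run time" is a module-level `immutable` written in a `shared static this()` module constructor. That restriction is enforced entirely at compile time, so there is no run-time flag to reset. `randomer` is kept from the scenarios above and now calls `std.random.uniform`.

```d
import std.random : uniform;

// Module-level immutable: written once, in the module constructor below.
immutable int r;

int randomer()
{
    return uniform(-100, 101);   // random integer in [-100, 100]
}

shared static this()
{
    r = randomer();   // the only place the compiler accepts a write to r
}

void foo()
{
    // A later write, as in Scenario #1's foo(), is a compile-time error.
    static assert(!__traits(compiles, r = randomer()));
}

void main()
{
    foo();
    assert(r >= -100 && r <= 100);
}
```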

>> I'm not looking forward to reading the docs on this. I hope you get a lot of people to edit the docs to make it understandable for everyone.
> 
> The thing is actually rather simple, but I am having trouble finding the right words to express it.

And thus my comment re editors.

> Certainly, the mishmash of C++ const has badly muddied the waters about what const means.

I have no real knowledge of C++ or its const, and I'm still weirded out by it all <G>

-- 
Derek Parnell

May 27, 2007

Re: string types: const(char)[] and cstring

Posted by Derek Parnell
in reply to Walter Bright

On Fri, 25 May 2007 19:47:24 -0700, Walter Bright wrote:

> Under the new const/invariant/final regime, what are strings going to be ? Experience with other languages suggest that strings should be immutable. To express an array of const chars, one would write:
> 
> 	const(char)[]
> 
> but while that's clear, it doesn't just flow off the keyboard. Strings are so common this needs an alias, so:
> 
> 	alias const(char)[] cstring;
> 

 const(char)[]  // A mutable array of immutable characters?
 const(char[])  // An immutable array of mutable characters?
 const(const(char)[]) // An immutable array of immutable characters?
 char[]         // A mutable array of mutable characters?

What will happen with the .reverse and .sort array properties when used with const, invariant, and final qualifiers?
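How these four spellings behave in current D, as a compile-time sketch. One point differs from the guesses above: `const` is transitive, so in `const(char[])` the characters are `const` as well, which makes it the same type as `const(const(char)[])`. On `.reverse` and `.sort`: the built-in array properties were later deprecated and removed in favor of `std.algorithm`, whose in-place `sort` needs mutable elements; `int` is used below to sidestep the UTF-8 handling of `char` arrays.

```d
import std.algorithm.sorting : sort;

void main()
{
    const(char)[] a = "abc";   // mutable slice of const chars
    a = a[1 .. $];             // rebinding the slice is fine
    static assert(!__traits(compiles, a[0] = 'x'));

    const(char[]) b = "abc";   // transitive: the slice AND the chars are const
    static assert(!__traits(compiles, b = "xyz"));
    static assert(is(typeof(b[0]) == const(char)));
    assert(a == "bc" && b == "abc");

    // const(const(char)[]) collapses to the same type as const(char[]).
    static assert(is(const(const(char)[]) == const(char[])));

    int[] nums = [3, 1, 2];
    sort(nums);                // library sort works in place on mutable data
    assert(nums == [1, 2, 3]);
    const(int)[] view = nums;
    static assert(!__traits(compiles, sort(view)));
}
```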

-- 
Derek Parnell

May 27, 2007

Re: string types: const(char)[] and cstring

Posted by renoX
in reply to Marcin Kuszczak

Marcin Kuszczak wrote:
> Chris Miller wrote:
> 
>> Actually, while we're at a change for strings, why not bring in something
>> similar to my dstring module, where slicing and indexing never result in
>> an invalid UTF sequence? http://www.dprogramming.com/dstring.php - the
>> code may not be ideal, but it's the concept I'm referring to.
> 
> Yup. That's my opinion also...
> 
> For me, the advantages of such a string are quite obvious:
> 1. Easy slicing and indexing of UTF-8 sequences (without corrupting the
> sequence, as mentioned above)
> 2. A common denominator for char[], wchar[] and dchar[]
> 3. For classes that don't need speed, it simplifies the API (only one
> version of each function instead of 3)
> 4. With some additional support from the language (cast operators to
> different types and opImplicitCast), it can be fully interchangeable with
> every method taking char[], wchar[] or dchar[].
> 
> Having another 3 names for string is not very appealing to me. We would
> have 9 official versions of string available in D:
> char[], wchar[], dchar[], string, cwstring, cdstring, tango String!(char),
> tango String!(wchar), tango String!(dchar)
> 
> To write a nice, fully functional library, you have to write 3 versions of
> every function that takes a string (I know, templates make it a little
> easier). I am probably not wrong in saying that, in reality, people just
> write one version for char[], because it is convenient (see: SWT ported
> from Java). As a result, wchar and dchar are treated as second-class
> citizens in D. Additionally, when people design their programs for
> char[], they mostly don't think about the problems of slicing a UTF-8
> char[] sequence (warning! assumption!), so the default way of writing
> programs is *NOT SAFE*. When you write code and don't care about bare-metal
> speed, it is just tedious to do this additional work...
> 
> Having one string, which hides differences between char[], wchar[] and
> dchar[] would solve problem nicely. Adding constness would also be easy.
> And you use only one reserved keyword - string - for everything.
> 
> I would be happy to hear other opinions from people on the NG. Maybe my
> arguments above are wrong, in which case someone can give
> counterarguments... I think this is a very important issue, as it seems that
> most developers around the world are non-native English speakers...
> 
> PS. See also thread on DWT NG.

I agree with you: I don't think string should be an alias for char[], whether const or not, but rather a class with a char[], wchar[] or dchar[] representation under the hood and safe slicing by default.

The difficulty is providing enough flexibility to manage the internal representation correctly: for example, there should be a way to request UTF-8 storage even when the text contains multibyte characters (a size optimization at some CPU cost).

renoX
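The slicing hazard discussed above is easy to reproduce. A sketch in current D, assuming Phobos' `std.utf.validate` and `std.utf.toUTFindex`; it shows the problem and a library-level workaround (converting a character position to a code-unit index before slicing), not the string class proposed here.

```d
import std.exception : assertThrown;
import std.utf : toUTFindex, validate, UTFException;

void main()
{
    string s = "h\u00e9llo";           // "héllo": 'é' needs two UTF-8 code units
    assert(s.length == 6);             // .length counts code units, not characters

    string bad = s[0 .. 2];            // naive slice cuts 'é' in half
    assertThrown!UTFException(validate(bad));

    // Workaround: turn a character position into a code-unit index first.
    string good = s[0 .. toUTFindex(s, 2)];
    assert(good == "h\u00e9");
}
```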

Copyright © 1999-2021 by the D Language Foundation