Thread overview
Another prayer for invariant strings
Jul 12, 2007
Robert Fraser
Jul 12, 2007
Christian Kamm
Jul 12, 2007
torhu
Jul 13, 2007
Robert Fraser
Jul 13, 2007
Christian Kamm
Jul 13, 2007
0ffh
Jul 13, 2007
Regan Heath
Jul 13, 2007
Robert Fraser
July 12, 2007
Invariant strings have been discussed before (briefly) in discussions of constness; however, I wish to bring up the subject again more directly.

The "string" alias as it is now (in D 2.0) is an odd beast. The problem is that it is invariant(char)[] instead of invariant(char[]), so that while the characters themselves are invariant, the array is mutable. This has two main problems:

1. It's confusing. There have been quite a few topics both in this newsgroup and in digitalmars.D.learn about how exactly to use the 2.0 string alias and where it's immutable/where it's not.

2. Performance. While writing my own code, I can pretend "string" is invariant (or use my own invariant(char[]) alias), but when passing strings to, or receiving them from, library functions, this is not possible. This means that in each of these situations I must take two performance-draining precautionary measures:
i. Duplicate the string every time it's passed in or out of my code.
ii. Synchronize multithreaded access to strings/acquire locks/etc.

Invariant strings have precedent: they're used in Java, .NET, Perl, Python, Ruby and quite a few other languages. And for when multiple string operations are going down, there's always char[] and .idup to fall back on, which are far better than Java's StringBuffer, etc.

So, please, Walter... consider Andrei's proposal and make "string" an alias to invariant(char[]). It'll make a lot of happiness happen.
July 12, 2007
> The problem is
> that it is invariant(char)[] instead of invariant(char[])

I was under the impression that invariant(char)[] was the same type as
invariant(char[]) as invariant/const never apply to the declaration itself?

So
invariant(int) == int,
invariant(int*) == invariant(int)*
invariant(int**) == invariant(int*)* != invariant(int)**

Or is that incorrect?

Christian
July 12, 2007
Christian Kamm wrote:
>> The problem is
>> that it is invariant(char)[] instead of invariant(char[])
> 
> I was under the impression that invariant(char)[] was the same type as
> invariant(char[]) as invariant/const never apply to the declaration itself?
> 
> So
> invariant(int) == int,
> invariant(int*) == invariant(int)*
> invariant(int**) == invariant(int*)* != invariant(int)**
> 
> Or is that incorrect?

That's my understanding too, but I'm a bit confused by the fact that Walter's examples use both variants.
July 13, 2007
Robert Fraser wrote:
> 2. Performance. While writing my own code, I can pretend "string" is
> invariant (or use my own invariant(char[]) alias), but when passing to,
> or receiving code from library functions, this is not possible. This
> means that in each of these situations I must take two,
> performance-draining precautionary measures: i. Duplicate the string
> every time it's passed in or out of my code. ii.Synchronize
> multithreaded access to strings/acquire locks/etc.

I don't quite see this point. The way I understand D 2.0 strings (which
may well be wrong), with invariant(char)[] you can be sure the
characters will never change, so there is no reason to duplicate that
string. Only the pointer to the characters and the length information
are mutable.

> Invariant strings have precedent: they're used in Java, .NET, Perl,
> Python, Ruby and quite a few other languages.

In my book, precedent in itself is no argument - except for lemmings. ;-)

Regards, Frank
July 13, 2007
(disclaimer, I have done only the testing shown at the end of this post)

Robert Fraser wrote:
> Invariant strings have been discussed before (briefly) in discussions
> of constness, however I wish to bring up the subject again more
> directly.
> 
> The "string" alias as it is now (in D 2.0) is an odd beast. The
> problem is that it is invariant(char)[] instead of invariant(char[]),
> so that while the characters themselves are invariant, the array is
> mutable. 

This makes sense if you think about it from the compiler's point of view.

It has placed the characters themselves in ROM, but the array reference is in RAM, so its pointer and length can change.  So, this is valid:

invariant(char)[] a = "foo";
invariant(char)[] b = "bar";
b = a;

But these are invalid:

char[] p;

a[0] = 'a'; //for any given rvalue
b[] = a[];  //and slicing variants
p = a;      //p cannot point to invariant(char)

If you want to prevent the reference from changing, make it 'final', e.g.

final invariant(char)[] a;

> This has two main problems:
> 
> 1. It's confusing. There have been quite a few topics both in this
> newsgroup and in digitalmars.D.learn about how exactly to use the 2.0
> string alias and where it's immutable/where it's not.

I won't argue as to whether it's confusing, but to me it seems the basic concept is:  "A 'string' reference isn't immutable (or rather 'final'), but its data is (immutable)".

> 2. Performance. While writing my own code, I can pretend "string" is
> invariant (or use my own invariant(char[]) alias), but when passing
> to, or receiving code from library functions, this is not possible.

When you pass a string to a function, that function gets a /copy/ of the reference.  So, there is technically no need for the copied reference to be invariant (or rather 'final').  Changes to the reference in the function *do not* propagate back to the caller.

The exception is a 'ref' parameter, in which case changes to the reference do propagate back to the caller.  If your reference is final, however, DMD will report an error; see the test case below.

In short, if you use 'final' on your strings, then even if you call a library function which takes a 'ref', the compiler will protect you.

> This means that in each of these situations I must take two,
> performance-draining precautionary measures: i. Duplicate the string
> every time it's passed in or out of my code. ii.Synchronize
> multithreaded access to strings/acquire locks/etc.

You do not need to sync access to invariant data, but you may need to sync access to an array reference (if your code or library code might change it).  To prevent changes, make your strings final.

> Invariant strings have precedent: they're used in Java, .NET, Perl,
> Python, Ruby and quite a few other languages. And for when multiple
> string operations are going down, there's always char[] and .idup to
> fall back on, which are far better than Java's StringBuffer, etc.

Does Java prevent you re-assigning an invariant string reference?  If so, are they implicitly 'final' then?

> So, please, Walter... consider Andrei's proposal and make "string" an
> alias to invariant(char[]). It'll make a lot of happiness happen.

I think a greater understanding of the current system is required before we start opting for changes.

 - Regan Heath

Test cases:

void main()
{
	invariant(char)[] p1 = "one";
	invariant(char[]) p2 = "two";
	final invariant(char[]) p3 = "three";
	char[] p4 = "four".dup;
	const(char)[] p5 = "five";
	const(char[]) p6 = "six";

	//p1[0] = 'a'; //Error: p1[0] is not mutable
	//p2[0] = 'a'; //Error: p2[0] is not mutable
	//p3[0] = 'a'; //Error: p3[0] is not mutable
	p4[0] = 'a'; //ok
	//p5[0] = 'a'; //Error: p5[0] is not mutable
	//p6[0] = 'a'; //Error: p6[0] is not mutable
	
	//p1[] = p2[]; //Error: slice p1[] is not mutable
	//p2[] = p1[]; //Error: slice p2[] is not mutable
	//p3[] = p1[]; //Error: slice p3[] is not mutable
	p4[] = p1[]; //ok
	//p5[] = p1[]; //Error: slice p5[] is not mutable
	//p6[] = p1[]; //Error: slice p6[] is not mutable
	
	p1 = p2; //ok
	p2 = p1; //ok	
	//p3 = p1; //variable invariant.p3 cannot modify final/const/invariant variable 'p3'
	//p4 = p1;  //Error: cannot implicitly convert expression (p1) of type invariant(char)[] to char[]
	p5 = p1; //ok
	p6 = p1; //ok

	//foo(p3); //Error: variable invariant.main.p3 cannot modify final/const/invariant variable 'p3'
}

/*
void foo(final invariant(char)[] param)
{
	//param = "test";  //variable invariant.foo.param cannot modify final/const/invariant variable 'param'
}
*/

void foo(ref invariant(char)[] param)
{
	param = "test";  //ok: param is 'ref' but not final, so this rebinds the caller's reference
}
July 13, 2007
Oh, sorry, guess I was quite wrong. So does this mean I don't need to be making defensive copies of every string?


July 13, 2007
Oh, didn't see your message. That's awesome, thanks! No, I didn't want the references to be final, just the data. Basically, I want to ensure that functions I call won't mess around with my data.

Thanks!
All the best,
Fraser


July 13, 2007
> So does this mean I don't need to be
> making defensive copies of every string?

Yep, dynamic arrays behave very much like pointers or classes:

void foo(const(char)[] str)
{
  // valid since str is not final
  // only changes local copy of array pointer and length
  str = "abc";

  // illegal! can't change the data of the array
  str[] = "abc";
}
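
For anyone trying this on a current compiler, the same point can be sketched in present-day D. This assumes the later spelling of the keyword (invariant was renamed immutable, and string is an alias for immutable(char)[]), so it is not the 2.0 syntax discussed above. The function names rebindCopy and rebindRef are hypothetical, chosen only for this illustration.

```d
// Sketch only: present-day D, where `invariant` is spelled `immutable`.
// rebindCopy and rebindRef are hypothetical names used for illustration.

// By-value parameter: the callee receives its own copy of (pointer, length).
void rebindCopy(string s)
{
    s = "changed"; // rebinds the local copy only
}

// ref parameter: the callee can rebind the caller's slice.
void rebindRef(ref string s)
{
    s = "changed";
}

void main()
{
    string a = "original";

    rebindCopy(a);
    assert(a == "original"); // the caller's slice is untouched

    rebindRef(a);
    assert(a == "changed"); // the caller's slice was rebound

    // Writing to an element is rejected at compile time,
    // because the characters themselves are immutable.
    static assert(!__traits(compiles, a[0] = 'x'));

    // Mutation needs an explicit mutable copy (.dup);
    // .idup produces an immutable copy again.
    char[] m = a.dup;
    m[0] = 'C';
    string b = m.idup;
    assert(b == "Changed");
    assert(a == "changed"); // the original characters never changed
}
```

In neither call can the callee alter the characters the caller sees, which is why no defensive copy is needed; only a ref parameter lets it change which characters the caller's variable refers to.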