peculiarities with char[] and std.string
June 19, 2006
Greetings.

I was poking around the std.string lib, and was wondering if someone could answer a few questions about it. I'm relatively new to D, so I'm sure there are pretty obvious answers.

I notice that most of the functions, like toStringz() and tolower(), implement the copy-on-write convention... but since the default function parameter mode is in, is there not already an implicit copy of the data being made? For example,

import std.stdio;

int main()
{
    char[] str, str2;
    str = "foo";
    str2 = bob(str);
    writefln("%s:%s", str, str2);  // should print "foo:keke"
    return 0;
}

char[] bob(in char[] str)
{
    str = "keke";
    return str;
}

Works fine with my copy of DMD. Or is this behavior not to be relied on, since you should never touch memory you didn't allocate (according to the FAQ)?


Also, why is the following the case:

printf("%s", "hello\0"); // Fails with access violation
printf("%s", cast(char *)"hello\0"); // OK

Is the implicit cast from char[] to char * doing something I'm not aware of in terms of the length of the string, like chopping off the \0?

My last question is which is the preferred method of making a copy of a string? Suppose I want str2 to be a copy of str, then:

str2.length = str.length;
str2[] = str;
//      These two equivalent?
str2 = str.dup;

Sorry for all the questions and thanks for the help, let me know if this info is somewhere obvious.. I wasn't able to find it in the spec.

Regards
Kyle K.


June 19, 2006
Kyle K wrote:
> Greetings.
> 
> I was poking around the std.string lib, and was wondering if someone could
> answer a few questions about it. I'm relatively new to D, so I'm sure there are
> pretty obvious answers.
> 
> I notice in most of the functions like toStringz() and tolower() it implements
> the copy-on-write convention... but since the default function parameter is in,
> is there not already an implicit copy of the data being made? 

No, just a copy of the _reference_ is made, but both point to the same data.

> For example,
> 
> import std.stdio;
> int main()
> {
> char []str, str2;
> str="foo";
> str2= bob(str);
> writefln("%s:%s", str, str2);  // should print "foo:keke"
> return 0;
> }
> char []bob(in char[] str)
> {
> str = "keke"; return str;
> }
> 
> Works fine with my copy of DMD. Is this behavior not to be relied on as you
> shouldn't ever touch memory you didnt allocate (according to the FAQ)?

Well, you didn't touch the memory you didn't allocate :) If you had

char[] bob(in char[] str)
{
    str[0] = 'a';
    return str;
}

you'd get "aoo:aoo" as output (or a crash, as you can't write into string constants on some platforms).
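
The difference between rebinding the parameter and writing through it can be sketched as follows. This is only an illustration written for a current D compiler, where string literals are immutable and `in` implies const, so it starts from a mutable `.dup` copy and uses plain parameters; the names `rebind` and `poke` are made up for the example.

```d
import std.stdio;

// Rebinding the parameter only changes the callee's local copy of the slice.
char[] rebind(char[] s)
{
    s = "keke".dup;   // s now refers to a fresh array; the caller's slice is untouched
    return s;
}

// Writing through the slice changes the elements that both slices refer to.
char[] poke(char[] s)
{
    s[0] = 'a';
    return s;
}

void main()
{
    char[] str = "foo".dup;      // mutable copy of the literal

    char[] r = rebind(str);
    writefln("%s:%s", str, r);   // foo:keke

    char[] p = poke(str);
    writefln("%s:%s", str, p);   // aoo:aoo
    assert(p.ptr is str.ptr);    // same underlying memory
}
```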


> Also, why is the following the case:
> 
> printf("%s", "hello\0"); // Fails with access violation
> printf("%s", cast(char *)"hello\0"); // OK
> 
> Is the implicit casting from char[] to char * doing something im not aware of in
> terms of the length of the string, like chopping off the \0?

"hello\0" is a D char[] array, which is passed as a length plus a char*. printf doesn't know about D arrays, so "%s" takes the length to be the pointer to the data and tries to read through it, hence the access violation. When you cast it to char*, you lose the length, keep the pointer, and it works. If you want to pass the array itself, I think you should use something like

printf("%.*s", "hello"); // no zero needed/wanted in this case..

Better yet, use writef/ln instead - it knows all about D's types..
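
For reference, here is a rough sketch of the same idea for a current D compiler, which no longer lets you hand a whole array to printf. It assumes `core.stdc.stdio` for printf and `std.string.toStringz` for the zero-terminated copy, and passes the length and pointer explicitly.

```d
import core.stdc.stdio : printf;
import std.string : toStringz;

void main()
{
    string s = "hello";

    // A slice is two words: a length and a pointer.
    static assert(string.sizeof == 2 * size_t.sizeof);

    // "%.*s" expects an int precision followed by a char pointer,
    // so the length and the pointer are passed as separate arguments.
    printf("%.*s\n", cast(int) s.length, s.ptr);

    // Alternatively, build a zero-terminated copy and use plain "%s".
    printf("%s\n", toStringz(s));
}
```

The `cast(int)` is there because the `*` precision in a C format string is read as an int, while `.length` is a size_t.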

> My last question is which is the preferred method of making a copy of a string?
> Suppose I want str2 to be a copy of str, then:
> 
> str2.length = str.length;
> str2[] = str;
> //      These two equivalent?
> str2 = str.dup;

Both leave str2 holding its own copy of the data. Generally, .dup is/could/should be faster, as it's obvious you want a copy, so there's no need to initialize the destination array on resizing, for example.
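
A small sketch of the two approaches side by side, for a current D compiler. The helper names `copyManual` and `copyDup` are invented for the illustration.

```d
// Manual copy: size the destination, then copy element-wise.
char[] copyManual(const(char)[] src)
{
    char[] dst;
    dst.length = src.length;   // new elements are default-initialised first
    dst[] = src[];             // both sides must have the same length
    return dst;
}

// .dup allocates and copies in one step.
char[] copyDup(const(char)[] src)
{
    return src.dup;
}

void main()
{
    char[] str = "foo".dup;
    char[] a = copyManual(str);
    char[] b = copyDup(str);

    assert(a == str && b == str);                    // same contents
    assert(a.ptr !is str.ptr && b.ptr !is str.ptr);  // separate storage

    b[0] = 'z';
    assert(str == "foo");                            // the original is unaffected
}
```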

Hope that helped :)


xs0
June 19, 2006
In article <e76aq8$qsr$1@digitaldaemon.com>, xs0 says...
>Well, you didn't touch the memory you didn't allocate :) If you had
>
>char[] bob(in char[] str)
>{
>     str[0] = 'a';
>     return str;
>}
>
>You'd get "aoo:aoo" as output (or a crash, as you can't write into constants on some platforms)

Ah ok, that makes sense. So using 'in' with arrays and aggregate types will always still give you a reference? I assume with primitives the semantics remain pass-by-value, such that foo(in int b) will never modify the caller's data?


>
>Hope that helped :)

It did, thanks a lot! :D


June 19, 2006
Kyle K wrote:
> In article <e76aq8$qsr$1@digitaldaemon.com>, xs0 says...
> 
>>Well, you didn't touch the memory you didn't allocate :) If you had
>>
>>char[] bob(in char[] str)
>>{
>>    str[0] = 'a';
>>    return str;
>>}
>>
>>You'd get "aoo:aoo" as output (or a crash, as you can't write into constants on some platforms)
> 
> 
> Ah ok, that makes sense. So using 'in' with arrays and aggregate types will
> always still give you a reference? I assume with primitives the semantics remain
> pass-by-value, such that foo(in int b) will never modify the caller's data?
> 
> 
Actually "in" always gives you a copy of the actual "thing". Arrays are reference types, so you get a copy of the reference. Same with objects, as they are also reference types. Structs, on the other hand, are not reference types and as such get passed by value:


import std.stdio;

class fooC{int i;}
struct fooS{int i;}


void main()
{
	fooC c1= new fooC, c2;
	c1.i = 0;
	c2 = fn(c1);
	writef(c1.i, " ", c2.i, \n);	// prints "1 1"

	fooS s1, s2;
	s1.i = 0;
	s2 = fn(s1);
	writef(s1.i, " ", s2.i, \n);	// prints "0 1"

}

fooC fn(in fooC v)
{
	v.i=1;
	return v;
}

fooS fn(in fooS v)
{
	v.i=1;
	return v;
}
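
A variant of the same experiment that should build with a current D compiler, as a sketch only. It drops `in` (which implies const in current D, so the assignment to `v.i` would be rejected) and uses `writeln` instead of the old `\n` literal; the names `FooC`, `FooS` and `touch` are chosen just for this example.

```d
import std.stdio;

class FooC { int i; }
struct FooS { int i; }

// The class parameter is a copy of the reference: caller and callee share one object.
FooC touch(FooC v) { v.i = 1; return v; }

// The struct parameter is a copy of the whole value: the caller's struct is untouched.
FooS touch(FooS v) { v.i = 1; return v; }

void main()
{
    auto c1 = new FooC;
    auto c2 = touch(c1);
    writeln(c1.i, " ", c2.i);   // 1 1
    assert(c1 is c2);           // both names refer to the same object

    FooS s1;
    auto s2 = touch(s1);
    writeln(s1.i, " ", s2.i);   // 0 1
}
```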

June 19, 2006
In article <e76jri$1ds7$1@digitaldaemon.com>, BCS says...

>> Ah ok, that makes sense. So using 'in' with arrays and aggregate types will always still give you a reference? I assume with primitives the semantics remain pass-by-value, such that foo(in int b) will never modify the caller's data?
>> 
>> 
>Actually "in" always gives you a copy of the actual "thing". Arrays are reference types so you get a copy of the reference. Same with objects, as they are also reference types. Stucts on the other hand are not reference types and as such will get passed by value
>
>
>class fooC{int i;}
>struct fooS{int i;}
>
>
>void main()
>{
>	fooC c1= new fooC, c2;
>	c1.i = 0;
>	c2 = fn(c1);
>	writef(c1.i, " ", c2.i, \n);	// prints "1 1"
>
>	fooS s1, s2;
>	s1.i = 0;
>	s2 = fn(s1);
>	writef(s1.i, " ", s2.i, \n);	// prints "0 1"
>
>}
>
>fooC fn(in fooC v)
>{
>	v.i=1;
>	return v;
>}
>
>fooS fn(in fooS v)
>{
>	v.i=1;
>	return v;
>}

Got it, thanks a bunch. I knew it had to be something simple... :D