Thread overview
Newbie Question about strings
May 10, 2004
hellcatv
May 10, 2004
Ben Hinkle
May 10, 2004
Sean Kelly
May 10, 2004
Ben Hinkle
May 10, 2004
Daniel Horn
May 11, 2004
Ben Hinkle
May 11, 2004
hellcatv
May 11, 2004
Norbert Nemec
COW, gc and strings
May 11, 2004
Walter
May 11, 2004
Sean Kelly
May 11, 2004
Norbert Nemec
May 11, 2004
Sean Kelly
May 10, 2004
Does the following result in undefined behavior (as if I had realloc'd the char* inp in C)?
That is, could the inp[0]='A' also affect the char[] s?


import std.string;

void mod (char[] inp) {
    inp ~= "8";      // append; may reallocate inp's storage
    inp[0] = 'A';    // does this write reach the caller's array?
    printf ("\n%s ", std.string.toStringz(inp));
    printf ("%d ", inp.length);
}

int main () {
    char[] s = "1234567";
    printf ("%s\n", std.string.toStringz(s));
    mod(s);
    printf ("%s\n", std.string.toStringz(s));
    return 0;
}


May 10, 2004
The "~=" operator will reallocate if there isn't space already. That is why std.string uses "copy-on-write" semantics - meaning if you don't "own" an array, you make a copy before changing it.

<hellcatv@hotmail.com> wrote in message news:c7of0m$c15$1@digitaldaemon.com...
> does the following result in undefined behavior (as if I had realloc'd the char
> * inp in C?)
> i.e. could the inp[0]='A' also affect the char[] s;
>
>
> import std.string;
> void mod (char [] inp) {
> inp~="8";
> inp[0]='A';
> printf ("\n%s ",std.string.toStringz(inp));
> printf ("%d ",inp.length);
> }
>
> int main () {
> char [] s = "1234567";
> printf ("%s\n",std.string.toStringz(s));
> mod(s);
> printf ("%s\n",std.string.toStringz(s));
> return 0;
> }
>
>


May 10, 2004
Ben Hinkle wrote:

> The "~=" operator will reallocate if there isn't space already. That is why
> the std.string uses "copy-on-write" semantics - meaning if you don't "own"
> an array you make a copy before changing it.

But D is a GC language.  Would there even be a dangling reference in this case?  I assumed that this would just result in a side-effect.

Sean
May 10, 2004
"Sean Kelly" <sean@f4.ca> wrote in message news:c7osgt$10f4$1@digitaldaemon.com...
> Ben Hinkle wrote:
>
> > The "~=" operator will reallocate if there isn't space already. That is why
> > the std.string uses "copy-on-write" semantics - meaning if you don't "own"
> > an array you make a copy before changing it.
>
> But D is a GC language.  Would there even be a dangling reference in this case?  I assumed that this would just result in a side-effect.

umm, I'm not sure what the GC has to do with it, but yeah, the GC will collect the copy if all the references go away. COW is to prevent side-effects.


May 10, 2004
right the docs say "you" but I wasn't sure if it means I must do it or by modifying it, the lib does a copy-on-write.

so I must specifically make a copy of it in order to guarantee that my function will not result in side effects?

could I make a wrapper struct that guaranteed it would copy when passed into a function (like C++ strings)?  in C I could wrap a static array into a struct in order to get pass-by-value semantics (of course the size of this array was known) and in C++ I could make a copy constructor that would implicitly get called when I passed the string into a wrapper function.

is there anything similar in D for char[] arrays?

Ben Hinkle wrote:
> "Sean Kelly" <sean@f4.ca> wrote in message
> news:c7osgt$10f4$1@digitaldaemon.com...
>
>> Ben Hinkle wrote:
>>
>>> The "~=" operator will reallocate if there isn't space already. That is why
>>> the std.string uses "copy-on-write" semantics - meaning if you don't "own"
>>> an array you make a copy before changing it.
>>
>> But D is a GC language.  Would there even be a dangling reference in
>> this case?  I assumed that this would just result in a side-effect.
>
> umm, I'm not sure what the GC has to do with it, but yeah, the GC will
> collect the copy if all the references go away. COW is to prevent
> side-effects.
May 11, 2004
Daniel Horn wrote:

> right the docs say "you" but I wasn't sure if it means I must do it or by modifying it, the lib does a copy-on-write.

copy-on-write is not enforced by the compiler but it is a technique used by std.string (and probably the rest of phobos). If you look at std.string.tolower, for example, you will see how it delays making a copy until it absolutely has to. I'm not sure which part of the doc you are looking at.
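
As an illustration of delaying the copy, here is a minimal sketch in present-day D (the 2004 compiler predates some of this syntax). `lowerAscii` is a made-up helper that handles ASCII only; it is not the Phobos `tolower` source, just the same idea of copying only once a change is actually needed:

```d
import std.stdio;

// Hypothetical helper, not the Phobos code: lowercases ASCII letters only.
char[] lowerAscii(char[] s)
{
    char[] r = s;          // start out sharing the caller's data
    bool copied = false;
    foreach (i, c; s)
    {
        if (c >= 'A' && c <= 'Z')
        {
            if (!copied)
            {
                r = s.dup; // first change needed: make the copy now
                copied = true;
            }
            r[i] = cast(char)(c + ('a' - 'A'));
        }
    }
    return r;              // either the original slice or a private copy
}

void main()
{
    char[] a = "hello".dup;
    char[] b = "Hello".dup;

    assert(lowerAscii(a) is a);   // nothing to change, so no copy was made

    char[] lb = lowerAscii(b);
    assert(lb == "hello");        // result is lowercased
    assert(b == "Hello");         // the caller's array is untouched
    writeln(lb);
}
```

The caller cannot tell from the signature whether a copy happened, which is why the result should be treated as possibly shared with the input.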

> so I must specifically make a copy of it in order to guarantee that my function will not result in side effects?

If you write a statement like
 str[3] = 'a';
then you should think about using COW. If you want to guarantee your
function has no side effect then you should make a copy. If you write
 str = tolower(str);
in your function then you don't have to make a copy since tolower uses COW
already and it will make a copy if it needs to.
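
A rough sketch of the "make a copy" option, again in present-day D and with a hypothetical function name (`appendAndMark`). It assumes the function should never touch the caller's data, so it duplicates the array before appending or writing:

```d
import std.stdio;

// Hypothetical example: works on a private copy so the caller sees no side effect.
char[] appendAndMark(const(char)[] inp)
{
    char[] copy = inp.dup;  // explicit copy; we now "own" this array
    copy ~= "8";            // may or may not reallocate, but only our copy is affected
    copy[0] = 'A';          // safe: this memory is not shared with the caller
    return copy;
}

void main()
{
    char[] s = "1234567".dup;
    char[] t = appendAndMark(s);

    assert(s == "1234567");   // caller's array unchanged
    assert(t == "A2345678");  // modified copy
    writeln(s, " ", t);
}
```

Without the `.dup`, whether the write to element 0 reaches the caller depends on whether the append happened to reallocate, so the copy is what makes the behavior predictable.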

> could I make a wrapper struct that guaranteed it would copy when passed into a function (like C++ strings)?  in C I could wrap a static array into a struct in order to get pass-by-value semantics (of course the size of this array was known) and in C++ I could make a copy constructor that would implicitly get called when I passed the string into a wrapper function.
> 
> is there anything similar in D for char[] arrays?

I suppose you could wrap the array in a struct that overloads opIndex assignment and do something funky, but I haven't really thought about it. Seems like a lot of trouble to avoid using strings.
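
For what it's worth, later versions of D can express the copy-on-pass wrapper directly. The sketch below assumes a present-day compiler with struct postblit support (`this(this)`), which was not available when this thread was written; `ValueString` is a hypothetical name:

```d
import std.stdio;

// Hypothetical wrapper: every copy of the struct gets its own copy of the data.
struct ValueString
{
    char[] data;

    // Postblit: runs after the struct has been copied (e.g. when passed by value),
    // so the new instance stops sharing the original's array.
    this(this)
    {
        data = data.dup;
    }
}

void mod(ValueString v)
{
    v.data ~= "8";
    v.data[0] = 'A';   // only touches the callee's private copy
}

void main()
{
    auto s = ValueString("1234567".dup);
    mod(s);
    assert(s.data == "1234567");  // unaffected by mod
    writeln(s.data);
}
```

This trades the ambiguity for an allocation on every pass-by-value, which is the cost copy-on-write is meant to avoid.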

May 11, 2004
You have some good points,
but what is a good way for a new person (or someone reading someone else's code)
to know whether a function follows copy-on-write?

If C++ had the same feature for strings, I would assume that a const string would not be modified and a non-const string would be...

Can I assume that all Phobos string functions that need to modify their input perform copy-on-write, then?  It's a potential pitfall for new programmers that ~= (opCatAssign) only sometimes copies, while the tolower function follows copy-on-write.

Perhaps this just needs to be mentioned carefully in the documentation...preferably in a consistent manner.

I also noticed that after
char [] blah="1234567";
char [] bleh=blah;
bleh~="";
bleh[0]='A';
blah[0] is still '1'.
Perhaps ~= also guarantees copy-on-write semantics? :-)
That would make Phobos quite a consistent library, then.


In article <c7pat1$1kk4$1@digitaldaemon.com>, Ben Hinkle says...
>
>Daniel Horn wrote:
>
>> right the docs say "you" but I wasn't sure if it means I must do it or by modifying it, the lib does a copy-on-write.
>
>copy-on-write is not enforced by the compiler but it is a technique used by std.string (and probably the rest of phobos). If you look at std.string.tolower, for example, you will see how it delays making a copy until it absolutely has to. I'm not sure which part of the doc you are looking at.
>
>> so I must specifically make a copy of it in order to guarantee that my function will not result in side effects?
>
>If you write a statement like
> str[3] = 'a';
>then you should think about using COW. If you want to guarantee your function has no side effect then you should make a copy. If you write
> str = tolower(str);
>in your function then you don't have to make a copy since tolower uses COW already and it will make a copy if it needs to.
>
>> could I make a wrapper struct that guaranteed it would copy when passed into a function (like C++ strings)?  in C I could wrap a static array into a struct in order to get pass-by-value semantics (of course the size of this array was known) and in C++ I could make a copy constructor that would implicitly get called when I passed the string into a wrapper function.
>> 
>> is there anything similar in D for char[] arrays?
>
>I suppose you could wrap the array in a struct that overloads opIndex assignment and do something funky, but I haven't really thought about it. Seems like a lot of trouble to avoid using strings.
>


May 11, 2004
Ben Hinkle wrote:
> "Sean Kelly" <sean@f4.ca> wrote in message
> news:c7osgt$10f4$1@digitaldaemon.com...
>>
>>But D is a GC language.  Would there even be a dangling reference in
>>this case?  I assumed that this would just result in a side-effect.
> 
> umm, I'm not sure what the GC has to do with it, but yeah, the GC will
> collect the copy if all the references go away. COW is to prevent
> side-effects.

By GC I meant that the string is effectively passed by reference, so a reallocation would not leave the passed variable pointing to bad memory as may happen in C using pointers.  I just wanted to clarify the semantics that the result is not "undefined" but rather merely that the function has a side-effect.

Sean
May 11, 2004
Ben Hinkle wrote:
> 
> copy-on-write is not enforced by the compiler but it is a technique used by
> std.string (and probably the rest of phobos). If you look at
> std.string.tolower, for example, you will see how it delays making a copy
> until it absolutely has to. I'm not sure which part of the doc you are
> looking at.

COW is great in many cases but it can be a nightmare with multithreaded programming.  It almost makes me wish that we could specify the behavior with a template parameter.


Sean
May 11, 2004
Sean Kelly wrote:

> COW is great in many cases but it can be a nightmare with multithreaded programming.  It almost makes me wish that we could specify the behavior with a template parameter.

Why is that? If you know that no other part of the program may hold a reference to some string, then you may write to it. Otherwise, you just have to copy the string first. I see no difference whether the "other part" is a local variable in the same routine or some code in another thread. Of course, if the reference itself is shared between threads, you have to lock it before writing anything, but that is the same with any variable.

