June 26, 2008
Steven Schveighoffer wrote:
> "Bill Baxter" wrote
>> Me Here wrote:
>>> Walter Bright wrote:
>>>
>>>> Perl has invariant strings, but they are implicitly invariant
>>>> and so nobody notices it, they just work.
>>>>
>>> Sorry Walter, but that is simply not the case. Viz.:
>>>
>>> [0] Perl> $x = 'x' x 500e6;
>>> [0] Perl> print length $x;;
>>> 500000000
>>> [0] Perl> substr $x, 250e6, 1, 'y';;
>>> [0] Perl> print length $x;;
>>> 500000000
>>> [0] Perl> print substr $x, 250e6-5, 10;;
>>> xxxxxyxxxx
>>>
>>> b.
>> What are you disagreeing with?
>>
>> The fact that they're invariant?
>> Or the fact that nobody notices?
>>
>> I have no idea if they're invariant in perl or not.  But I don't think your test above is conclusive proof that they're mutable.
> 
> No it's not.  The only conclusive proof comes from observing what happens when you copy strings from one to the other:
> 
> #!/usr/bin/perl
> 
> print "before x\n";
> sleep 20;
> $x = 'x' x 100000000;
> print "initialized x\n";
> sleep 5;
> $y = $x;
> print "copied to y\n";
> sleep 5;
> substr $x, 3, 1, 'y';
> print "did substring\n";
> print substr $y, 0, 5;
> print "\n";
> sleep 5;
> 
> OK, so what does this do?  I set x to a string of 100 million x's, then assign x to y, then replace the 4th character in x with a 'y', then print the first 5 characters of y to see if they changed too (see if x and y reference the same data).
> 
> So what does this output?
> before x
> initialized x
> copied to y
> did substring
> xxxxx
> 
> But this is not yet conclusive proof; we need to watch what happens with memory usage when each step occurs (hence the sleeps).  So using 'top', I observed this:
> 
> before x => mem usage 3k
> initialized x => 191MB (!)
> copied to y => 291MB
> did substring => 291MB
> xxxxx
> 
> So, what it looks like is on assignment, the string is copied, and the editing edits the string in-place.   But I can't really explain why it takes 191MB to store x, where it only takes 100MB to store y.
> 
> So I'd say perl does not have invariant strings.  'course, I'm not a perl hacker, so I don't know if I did this correctly :)

That is a bit more conclusive.  So it appears that what Perl has is not invariant strings, but rather mutable strings that are copied on assignment so that they can be thought of more or less as value types.

So what happens with function calls?  To keep the illusion of "string is a value type" those would need to dup string arguments too.
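
A minimal sketch of what duplicating arguments would mean, written in Python rather than Perl and making no claim about what Perl actually does on a call. A `bytearray` stands in for a mutable string, and `call_by_value` and `shout_in_place` are hypothetical names used only for illustration:

```python
def shout_in_place(buf):
    """Callee that edits its argument in place."""
    buf[:] = buf.upper()


def call_by_value(func, buf):
    """Hand the callee its own duplicate so the caller's data is untouched."""
    arg = bytearray(buf)   # the "dup" of the string argument
    func(arg)
    return arg


s = bytearray(b"hello")
result = call_by_value(shout_in_place, s)

print(s.decode())       # hello  (caller's string is unchanged)
print(result.decode())  # HELLO  (only the duplicate was edited)
```

Without the `bytearray(buf)` duplicate, the callee's edit would show up in the caller's `s`, which is the aliasing that value semantics is supposed to rule out.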

--bb
June 26, 2008
Simen Kjaeraas wrote:
> Me Here <p9e883002@sneakemail.com> wrote:
>> I *know* the above proves it, because I can monitor the memory usage and
>> addresses.
>> I used a very large string and then mutated a character in the middle of it. If
>> the original string was mutated, the memory consumption of the process would
>> have to (briefly) double. It does not.
> 
> Could not the garbage collector theoretically be intelligent enough to see
> that there's only one reference to the string, and thus not do CoW?
> 
> -- Simen

Right.  That's what I was thinking too.  But Steven's example with memory info rules out that possibility.

--bb
June 26, 2008
"Dee Girl" wrote
> Steven Schveighoffer Wrote:
>> So, what it looks like is on assignment, the string is copied, and the
>> editing edits the string in-place.   But I can't really explain why it
>> takes 191MB to store x, where it only takes 100MB to store y.
>>
>> So I'd say perl does not have invariant strings.  'course, I'm not a perl hacker, so I don't know if I did this correctly :)
>>
>> -Steve
>
> Hello! I think the first part of your message is correct. The second is maybe misguided. Walter is correct. Perl strings do not have mutable chars. They can be thought of as similar to D strings. Your example with $x and $y shows that.
>
> Perl can optimize copies sometimes. But it does not matter. Semantics is all that matters. And Perl strings can not mutate individual chars ever. Thanks, Dee Girl

I am not super-knowledgeable about perl, but I understand the workings of invariant strings and what they mean for memory usage.  The memory usage exhibited by perl when copying one string to another suggests an entire copy of the data, not just copying a reference.  If strings were immutable (like they are in D or Java), then memory usage should not go up by 100MB when simply assigning two variables to point to the same data.  It actually appears to me that perl has mutable strings but only allows one reference to the data at a time.  i.e. they are more like C++ std::strings (when not used with references).
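
The two models being contrasted can be sketched in Python (an illustration only, not a description of Perl internals). Python's `str` is immutable, so assignment shares one object; a `bytearray` is used here as an assumed stand-in for a mutable string, with the copy on assignment written out by hand:

```python
# Immutable string: assignment copies only a reference, so no extra
# storage is needed for the character data.
x = "x" * 1_000_000
y = x
shares_storage = y is x       # both names refer to the same object

# Mutable buffer copied on assignment. The copy is explicit here,
# because plain Python assignment would alias the bytearray.
mx = bytearray(b"x" * 1_000_000)
my = bytearray(mx)            # full duplicate of the data
mx[3:4] = b"y"                # in-place edit of the original

print(shares_storage)         # True
print(my[:5].decode())        # xxxxx  (the copy is unaffected)
print(mx[:5].decode())        # xxxyx
```

In the first model the second variable costs almost nothing; in the second it costs another full buffer, which is the kind of jump the `top` measurements showed.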

Perhaps you can explain how my example shows they are immutable?  Maybe I am not getting something.

-Steve


June 26, 2008
Steven Schveighoffer Wrote:

> I am not super-knowledgeable about perl, but I understand the workings of invariant strings and what they mean for memory usage.  The memory usage exhibited by perl when copying one string to another suggests an entire copy of the data, not just copying a reference.  If strings were immutable (like they are in D or Java), then memory usage should not go up by 100MB when simply assigning two variables to point to the same data.  It actually appears to me that perl has mutable strings but only allows one reference to the data at a time.  i.e. they are more like C++ std::strings (when not used with references).
> 
> Perhaps you can explain how my example shows they are immutable?  Maybe I am not getting something.

Perl has a mix of reference counting with duplication and many optimizations depending on the code. It is misguided to judge only by memory use. That is not relevant, just an implementation detail. Tomorrow Perl is better, yesterday it was bad. It is not relevant.

What you need to look at is semantics. Strings in Perl never alias; they have strict value semantics. It is the same as saying they have immutable characters, because you cannot distinguish the two.

You assign $y = $x. Two things could happen: a refcount is done or a full copy is done. You do not know. But you do not care! You care if changing one character in $x changes one character in $y. That never happens. Which means semantically Perl strings are as good as strings of invariant characters. They never alias mutable data. This is the important thing. Thank you, Dee Girl
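
The semantic test described here can be sketched in Python. `EagerString`, `LazyString` and `aliases` are hypothetical names, and neither class is claimed to match Perl's implementation; one copies on assignment, the other shares a buffer until the first write:

```python
class EagerString:
    """Mutable string that duplicates its data on every assignment."""

    def __init__(self, data):
        self._buf = bytearray(data)

    def assign(self):
        return EagerString(self._buf)          # full copy right away

    def set_char(self, i, ch):
        self._buf[i] = ord(ch)

    def text(self):
        return self._buf.decode()


class _Shared:
    """Buffer plus a count of the strings currently using it."""

    def __init__(self, buf):
        self.buf = buf
        self.owners = 1


class LazyString:
    """Mutable string that shares its data until one side writes."""

    def __init__(self, data):
        self._s = _Shared(bytearray(data))

    def assign(self):
        other = LazyString.__new__(LazyString)
        other._s = self._s                     # share, just bump the count
        self._s.owners += 1
        return other

    def set_char(self, i, ch):
        if self._s.owners > 1:                 # shared: copy before writing
            self._s.owners -= 1
            self._s = _Shared(bytearray(self._s.buf))
        self._s.buf[i] = ord(ch)

    def text(self):
        return self._s.buf.decode()


def aliases(cls):
    """Does editing x after y = x change what y holds?"""
    x = cls(b"xxxxx")
    y = x.assign()
    x.set_char(3, "y")
    return y.text() != "xxxxx"


print(aliases(EagerString), aliases(LazyString))   # False False
```

Both strategies answer `False`, so this check alone cannot tell them apart; only something like the memory measurements earlier in the thread can.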
June 26, 2008
Dee Girl wrote:
> Steven Schveighoffer Wrote:
> 
> You assign $y = $x. Two things could happen: a refcount is done or a full copy is done. You do not know. But you do not care! You care if changing one character in $x changes one character in $y. That never happens. Which means semantically Perl strings are as good as strings of invariant characters. They never alias mutable data. This is the important thing.

I certainly care if a 100MB string is getting duplicated.  It's most definitely going to change how I write the algorithm to manipulate that string.

"Act like value type" and "are immutable" are two categories which have overlap, but they are not identical.  Walter keeps saying strings in perl are immutable, but Steven's test demonstrates that (at least for 100MB strings) they are not immutable, but they do act like value types.

This difference is relevant since Walter has often argued that invariant strings are the way to go based on the "fact" that they have been such a success in languages like Perl.  And the implication is clear there (to me anyway) that by 'invariant' he means invariant in the sense that D2 strings are invariant.  If he wants to include Perl in his argument he should be saying "value type strings" rather than "invariant strings". Or he should just stick to using Java as his example.  :-)

--bb
June 26, 2008
"Dee Girl" wrote
> Steven Schveighoffer Wrote:
>
>> I am not super-knowledgeable about perl, but I understand the workings of
>> invariant strings and what they mean for memory usage.  The memory usage
>> exhibited by perl when copying one string to another suggests an entire
>> copy of the data, not just copying a reference.  If strings were immutable
>> (like they are in D or Java), then memory usage should not go up by 100MB
>> when simply assigning two variables to point to the same data.  It actually
>> appears to me that perl has mutable strings but only allows one reference
>> to the data at a time.  i.e. they are more like C++ std::strings (when not
>> used with references).
>>
>> Perhaps you can explain how my example shows they are immutable?  Maybe I
>> am not getting something.
>
> Perl has a mix of reference counting with duplication and many optimizations depending on the code. It is misguided to judge only by memory use. That is not relevant, just an implementation detail. Tomorrow Perl is better, yesterday it was bad. It is not relevant.
>
> What you need to look at is semantics. Strings in Perl never alias; they have strict value semantics. It is the same as saying they have immutable characters, because you cannot distinguish the two.

OK, I get what you are saying.  Immutability is not the important characteristic, it's value-semantics.  That still invalidates Walter's argument that immutability is essential to how perl strings 'just work'.

>
> You assign $y = $x. Two things could happen: a refcount is done or a full copy is done. You do not know. But you do not care!

I might care about whether perl decides to consume half my memory or not :) But technically I don't care since I don't use perl ;)

Thanks for the info!

-Steve


June 26, 2008
Bill Baxter Wrote:

> Dee Girl wrote:
> > Steven Schveighoffer Wrote:
> > 
> > You assign $y = $x. Two things could happen: a refcount is done or a full copy is done. You do not know. But you do not care! You care if changing one character in $x changes one character in $y. That never happens. Which means semantically Perl strings are as good as strings of invariant characters. They never alias mutable data. This is the important thing.
> 
> I certainly care if a 100MB string is getting duplicated.  It's most definitely going to change how I write the algorithm to manipulate that string.

Perl uses many strategies. Focusing on copy or no copy is missing my point and Walter's point.

> "Act like value type" and "are immutable" are two categories which have overlap, but they are not identical.  Walter keeps saying strings in perl are immutable, but Steven's test demonstrates that (at least for 100MB strings) they are not immutable, but they do act like value types.
>
> This difference is relevant since Walter has often argued that invariant strings are the way to go based on the "fact" that they have been such a success in languages like Perl.  And the implication is clear there (to me anyway) that by 'invariant' he means invariant in the sense that D2 strings are invariant.  If he wants to include Perl in his argument he should be saying "value type strings" rather than "invariant strings". Or he should just stick to using Java as his example.  :-)

I am sorry, but that is again missing the point. Walter's argument is very good. In Perl you may get a copy or just a reference. In D you always get a reference, and you can create a new copy yourself. So D even gives more control than Perl. In Perl you don't control when a copy is made.

Walter's argument was that value-like types are better because there is no mutable aliasing. Mutable aliasing makes things difficult because of non-local dependencies. This is why scripting languages make it so easy to play with strings. D does that in a different way from Perl, which is as powerful or even more so. I think his point is perfectly valid, even if the terminology is imprecise this time. And I am happy he does not fall again into his C++ comparison trap, where the argument becomes childish ^_^. Thank you, Dee Girl

June 26, 2008
Steven Schveighoffer Wrote:

> 
> "Dee Girl" wrote
> > Steven Schveighoffer Wrote:
> >
> >> I am not super-knowledgeable about perl, but I understand the workings of
> >> invariant strings and what they mean for memory usage.  The memory usage
> >> exhibited by perl when copying one string to another suggests an entire
> >> copy of the data, not just copying a reference.  If strings were immutable
> >> (like they are in D or Java), then memory usage should not go up by 100MB
> >> when simply assigning two variables to point to the same data.  It actually
> >> appears to me that perl has mutable strings but only allows one reference
> >> to the data at a time.  i.e. they are more like C++ std::strings (when not
> >> used with references).
> >>
> >> Perhaps you can explain how my example shows they are immutable?  Maybe I
> >> am not getting something.
> >
> > Perl has a mix of reference counting with duplication and many optimizations depending on the code. It is misguided to judge only by memory use. That is not relevant, just an implementation detail. Tomorrow Perl is better, yesterday it was bad. It is not relevant.
> >
> > What you need to look at is semantics. Strings in Perl never alias; they have strict value semantics. It is the same as saying they have immutable characters, because you cannot distinguish the two.
> 
> OK, I get what you are saying.  Immutability is not the important characteristic, it's value-semantics.  That still invalidates Walter's argument that immutability is essential to how perl strings 'just work'.

Walter had a good argument with the wrong words. Perl strings are good because they act like values. So are D strings. The argument is valid.

I work more with D strings now and I have never found a better idea for string implementation in all the languages I know. It is amazing how things hold together in a type system so thin.

> > You assign $y = $x. Two things could happen: a refcount is done or a full copy is done. You do not know. But you do not care!
> 
> I might care about whether perl decides to consume half my memory or not :) But technically I don't care since I don't use perl ;)

Me neither, since I use D ^_^. In general a scripting language gives less control of allocation. But D regex is veeeery sloooow... I wish somebody would optimize std.regex. Also the API of regex is very (do not know the word...) scrambled or disordered or inconsistent. Whenever I use regex I must look at the manual page. The API is terrible: you never know what function you must call, they are not orthogonal, and the names are weird. Andrei please fix ^_^. Thank you, Dee Girl

June 26, 2008
Dee Girl wrote:
> Bill Baxter Wrote:
> 
>> Dee Girl wrote:
>>> Steven Schveighoffer Wrote:
>>>
>>> You assign $y = $x. Two things could happen: a refcount is done or a full copy is done. You do not know. But you do not care! You care if changing one character in $x changes one character in $y. That never happens. Which means semantically Perl strings are as good as strings of invariant characters. They never alias mutable data. This is the important thing.
>> I certainly care if a 100MB string is getting duplicated.  It's most definitely going to change how I write the algorithm to manipulate that string.
> 
> Perl uses many strategies. Focusing on copy or no copy is missing my point and Walter's point.
> 
>> "Act like value type" and "are immutable" are two categories which have overlap, but they are not identical.  Walter keeps saying strings in perl are immutable, but Steven's test demonstrates that (at least for 100MB strings) they are not immutable, but they do act like value types.
>>
>> This difference is relevant since Walter has often argued that invariant strings are the way to go based on the "fact" that they have been such a success in languages like Perl.  And the implication is clear there (to me anyway) that by 'invariant' he means invariant in the sense that D2 strings are invariant.  If he wants to include Perl in his argument he should be saying "value type strings" rather than "invariant strings". Or he should just stick to using Java as his example.  :-)
> 
> I am sorry, but that is again missing the point. Walter's argument is very good. In Perl you may get a copy or just a reference. In D you always get a reference, and you can create a new copy yourself. So D even gives more control than Perl. In Perl you don't control when a copy is made.
> 
> Walter's argument was that value-like types are better because there is no mutable aliasing. Mutable aliasing makes things difficult because of non-local dependencies. This is why scripting languages make it so easy to play with strings. D does that in a different way from Perl, which is as powerful or even more so. I think his point is perfectly valid, even if the terminology is imprecise this time. And I am happy he does not fall again into his C++ comparison trap, where the argument becomes childish ^_^. Thank you, Dee Girl
> 

I think we're all on the same page here.
My point is that Walter is using the wrong words to make his argument. He means "value type" but he has on several occasions stated that Perl strings are good because they are "invariant".

--bb
June 26, 2008
Simen Kjaeraas wrote:

> Me Here <p9e883002@sneakemail.com> wrote:
> > I know the above proves it, because I can monitor the memory usage and
> > addresses.
> > I used a very large string and then mutated a character in the middle of
> > it. If the original string was mutated, the memory consumption of the
> > process would have to (briefly) double. It does not.
> 
> Could not the garbage collector theoretically be intelligent enough to see that there's only one reference to the string, and thus not do CoW?
> 
> -- Simen

Perhaps you will find this a more convincing demonstration:

    [0] Perl> $s = 'the quick brown fox';;
    [0] Perl> $r = \substr $s, 10, 5;;
    [0] Perl> $$r = 'green';;
    [0] Perl> print $s;;
    the quick green fox

Now the description.

1) Assign a string to the scalar $s
2) Take a reference $r to a portion of that scalar
3) Replace that portion in place by assigning through the reference.
4) Print the modified original string.
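
For comparison, a rough Python analogue of the four steps above, assuming a `bytearray` as the mutable string and a `memoryview` slice as the reference into it. Note that a `memoryview` slice can only be assigned data of the same length:

```python
# 1) Create a mutable string
buf = bytearray(b"the quick brown fox")

# 2) Take a view onto a portion of it (bytes 10..14, "brown")
view = memoryview(buf)[10:15]

# 3) Replace that portion in place by assigning through the view
view[:] = b"green"

# 4) Print the modified original string
print(buf.decode())   # the quick green fox
```

The view owns no data of its own, so the write lands directly in `buf`; nothing is copied along the way.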

Besides which, I don't think I know this. I know I know it.

As for the GC deciding not to do COW, you're way off base here, in Perl at least.

b.
