View mode: basic / threaded / horizontal-split · Log in · Help
March 16, 2005
Copy-on-Write using toupper() source example?
To start off this newsgroup:

Section: Strings (and Array) Copy-on-Write 
(http://www.digitalmars.com/d/memory.html)

Trying to fully understand the toupper() example. Most of the questions below
are more rhetorical, but it would be helpful to have my "understanding"
verified.


Copy-on-write version

<code>
  char[] toupper(char[] s)
  {
      int changed = 0;
      for (int i = 0; i < s.length; i++)
      {
          char c = s[i];
          if ('a' <= c && c <= 'z')
          {
              if (!changed)
              {   
                  char[] r = new char[s.length];
                  r[] = s;
                  s = r;
                  changed = 1;
              }
              s[i] = c - (cast(char)'a' - 'A');
          }
      }
      return s;
  }
</code> 


1. If I understand this corretly: Should a lower case letter be encountered (s
would need to be changed), a new array r is allocated in memory, the contents of
s it copied into it, and then s points to the copy, thus not changing the
original "char[] s" that was handed to the function? 

If so, ok, so you do not mess up your original values.


2. Does one really need to use the "new" command?

  char[] r = new char[s.length];  // non-local array
  r[] = s;

would not the following be equivalent?

  char[s.length] r;               // local array, would be freed aber leavin
function
  r = s.dup;

This is more retorical but makes me wonder. "new" is required since you need to
allocate non-local memory?


3. I am only beginning to learn about the handling of strings, but how come

  r[] = s;

is used, would not  

  r = s;

do the same? Or is this (r[] = s;) required since r is a "new" created array? Or
is "r[] =" a notation for data copy?


4. As I understand:

  char[] r = new char[s.length];
  r[] = s;
  s = r;

- Create a new "global" dynamic array r with the length of s. 
- Copy the contents (data) of s to the new r array. 
- Point s (reference?) to r? Is that even legal? 

s would then point to another part of memory?

Would r not circumvent garbage collection?

When would the memory for r be freed? Would one need to do that manually? 

I am just imagining, this function is called thousands of times, and all those
"r" arrays clogging up memory? In ANSI C I would have defined one global temp
(scratch) variable, that is used in all such cases.


Thanx.

AEon
March 16, 2005
Re: Copy-on-Write using toupper() source example?
"AEon" <AEon_member@pathlink.com> wrote in message 
news:d1a5cj$22pp$1@digitaldaemon.com...
> 1. If I understand this corretly: Should a lower case letter be 
> encountered (s
> would need to be changed), a new array r is allocated in memory, the 
> contents of
> s it copied into it, and then s points to the copy, thus not changing the
> original "char[] s" that was handed to the function?
> If so, ok, so you do not mess up your original values.

Yep.

> 2. Does one really need to use the "new" command?
>   char[] r = new char[s.length];  // non-local array
>   r[] = s;

Yep.

> would not the following be equivalent?
>   char[s.length] r;               // local array, would be freed aber 
> leavin
> function
>   r = s.dup;

Writing char[s.length] causes an error, for some reason.  Hence, the "new" 
expression in the original function.

However, you can write

char[] r;
r=s.dup;

As .dup duplicates the entire array, length data and all.  So this would be 
equivalent to the original.

> 3. I am only beginning to learn about the handling of strings, but how 
> come
>
>   r[] = s;
>
> is used, would not
>
>   r = s;
>
> do the same? Or is this (r[] = s;) required since r is a "new" created 
> array? Or
> is "r[] =" a notation for data copy?

You got it.  r=s would make r point to the same array as s.  r[]=s copies 
s's data into r's memory.

> 4. As I understand:
>
>   char[] r = new char[s.length];
>   r[] = s;
>   s = r;
>
> - Create a new "global" dynamic array r with the length of s.
> - Copy the contents (data) of s to the new r array.
> - Point s (reference?) to r? Is that even legal?

Oh of course.

> s would then point to another part of memory?

s points to the same place r points.  Arrays are pretty much just structs in 
D - they have a length and a pointer to their data.  When you write "s=r", 
you're only copying those two things - the length and the pointer - to s.

> Would r not circumvent garbage collection?
> When would the memory for r be freed? Would one need to do that manually?

Hopefully r's memory is not freed, as that is what is returned from the 
function.

> I am just imagining, this function is called thousands of times, and all 
> those
> "r" arrays clogging up memory? In ANSI C I would have defined one global 
> temp
> (scratch) variable, that is used in all such cases.

Actually it has to do with what you do with r once it's returned from the 
function.  If you do something like this:

char[][] strings;
// .. strings is filled somehow
for(int i=0; i<strings.length; i++)
{
   strings[i]=toUpper(strings[i]);
}

Here's what happens.

strings[i] points to some place in memory.  You pass it into toUpper, where 
it is now called s.  Say it does have to be copied.  What is returned is a 
copy of s, a.k.a. strings[i].  So strings[i] is so far unaffected.  However, 
the statement in the for() loop assigns the result of the function back into 
strings[i].  So strings[i] now points to the copy of strings[i], and what 
used to be pointed to by strings[i] is released by the GC.  So the new, 
uppercase string is created and the old string is destroyed - no memory 
loss.

I've found that it's a lot easier not to try to understand the GC, but to 
take advantage of it ;)  When you're not sure about some memory management 
issue, remember - because of the GC, it really IS that easy.
March 16, 2005
Re: Copy-on-Write using toupper() source example?
Jarrett Billingsley wrote...

Thanx for the detailed info. I am glad I understood most the things properly.


>> 2. Does one really need to use the "new" command?
>>   char[] r = new char[s.length];
>>   r[] = s;

>However, you can write
>    char[] r;
>    r=s.dup;

Should have occurred to me (i.e. dropping the explicit sizing 
"s.length"). But it does surprise me that the above 2 lines are 
equivalent to the "new" version. I would have sworn that the latter is 
just a local array declaration that is lost as soon as you leave the 
function.


Would a reduction to

   char[] r=s.dup;

also work?


>Actually it has to do with what you do with r once it's returned from the 
>function.  If you do something like this:
>
>char[][] strings;
>// .. strings is filled somehow
>for(int i=0; i<strings.length; i++)
>{
>    strings[i]=toUpper(strings[i]);
>}


>Here's what happens.
>
>strings[i] points to some place in memory.  You pass it into toUpper, where 
>it is now called s.  Say it does have to be copied.  What is returned is a 
>copy of s, a.k.a. strings[i].  So strings[i] is so far unaffected.  However, 
>the statement in the for() loop assigns the result of the function back into 
>strings[i].  

>So strings[i] now points to the copy of strings[i], and what 
>used to be pointed to by strings[i] is released by the GC.  So the new, 
>uppercase string is created and the old string is destroyed - no memory 
>loss.

Hopefully :)... but I see what you mean. Coming from ANSI C, I am very 
suspicious of any memory management, since if you did not do the right 
thing back then your tools would leak memory (or do worse things). If 
I am not mistaken, in ANSI C, the example above would indeed lead to a leak, 
since you have overwritten the original pointer to string[i], thus no 
longer are able to access that original piece of allocated memory?


>I've found that it's a lot easier not to try to understand the GC, 
>but to take advantage of it ;)  When you're not sure about some memory 
>management issue, remember - because of the GC, it really IS that 
>easy. 

:)... I guess I just need to start recoding my stats in D, and then 
after seeing my code *not* crash... then I will sit back and enjoy 
GC ;)

BTW, when I first looked at C++, hoping such things would be done for 
me, I was badly disappointed that all the things I missed in C where 
not present in C++, and many things actually got worse. I just can't 
believe how great D is... sigh... *happiness* :)

AEon
March 17, 2005
Re: Copy-on-Write using toupper() source example?
AEon wrote:
> Should have occurred to me (i.e. dropping the explicit sizing 
> "s.length"). But it does surprise me that the above 2 lines are 
> equivalent to the "new" version. I would have sworn that the latter is 
> just a local array declaration that is lost as soon as you leave the 
> function.

First rule about swearing: don't. ;>

type[length] array1; //fixed-length stack array
type[] array2: //dynamic array.

In the case of dynaic array, a structure of a pointer to the beginning 
of the array slice and a length of this slice is placed right where you 
declare it, i.e. here in a local variable space, on a stack.

Then you allocate a dynamic array and assign it to this local variable, 
this variable can be copied around, returned, etc.

This is in contrast to a static array. When passing and under other 
circumstances, such stuff becomes converted into a pointer to the memory 
region (+ optionally length), which is soon no longer valid because the 
memory region lies on the stack frame which will already be destroyed 
after you return from the function. Exactly like in C.

> Would a reduction to
> 
>     char[] r=s.dup;
> 
> also work?

If i'm not missing anything out, yes.

However, it was probably decided against .dup because it makes one extra 
step: namely it populates the new array with a copy of an old one. I 
wouldn't usually care about that though, this is among things an 
optimizer would handle someday.

>>Actually it has to do with what you do with r once it's returned from the 
>>function.  If you do something like this:
>>
>>char[][] strings;
>>// .. strings is filled somehow
>>for(int i=0; i<strings.length; i++)
>>{
>>   strings[i]=toUpper(strings[i]);
>>}
> 
> 
> 
>>Here's what happens.
>>
>>strings[i] points to some place in memory.  You pass it into toUpper, where 
>>it is now called s.  Say it does have to be copied.  What is returned is a 
>>copy of s, a.k.a. strings[i].  So strings[i] is so far unaffected.  However, 
>>the statement in the for() loop assigns the result of the function back into 
>>strings[i].  
> 
> 
>>So strings[i] now points to the copy of strings[i], and what 
>>used to be pointed to by strings[i] is released by the GC.  So the new, 
>>uppercase string is created and the old string is destroyed - no memory 
>>loss.
> 
> 
> Hopefully :)... but I see what you mean. Coming from ANSI C, I am very 
> suspicious of any memory management, since if you did not do the right 
> thing back then your tools would leak memory (or do worse things). If 
> I am not mistaken, in ANSI C, the example above would indeed lead to a leak, 
> since you have overwritten the original pointer to string[i], thus no 
> longer are able to access that original piece of allocated memory?

Generally, not, but usually and practically, yes. Function toUpper gets 
a string slice stucture as argument by value, i.e. a copy of in its 
local stack. It reassigns this internal copy to a new value. Then it 
returns it.

The calling function would then be responsible to keep an old pointer 
(actually slice structure in D), compare it with the new one, and if the 
old one differs and it doesn't need it any longer, free it. However, 
that's quite a cumbersome way of doing business, so usually and 
practically implementing a function this way is likely to make an error.

> :)... I guess I just need to start recoding my stats in D, and then 
> after seeing my code *not* crash... then I will sit back and enjoy 
> GC ;)

To sit back and enjoy means to nearly abandon delete. Only use explicit 
memory management on well-encapsulated resources.

> BTW, when I first looked at C++, hoping such things would be done for 
> me, I was badly disappointed that all the things I missed in C where 
> not present in C++, and many things actually got worse. I just can't 
> believe how great D is... sigh... *happiness* :)

C++ doesn't have language means for nearly anything, it is only sensible 
in conjunction with libraries. For example, std::string has value 
semantics - thus you hardly ever need to manage pointers to it. 
Firthermore, it is usually implemented as a reference-counted pointer 
with copy-on-write semantics: not particularly performance-gaining, but 
memory-efficient. For passing pointers up and down the stack with proper 
clean-up one uses auto_ptr<>, and for a general case of memory 
management shared_ptr<>, which is almost analogous to D's GC - though 
slower. All but shared_ptr are in the standard library, shared_prt is in 
boost, and will be probably integrated in the standard library later. 
Very nice are also containers, including an easily growing array 
replacement, vector<>. Or well, at least /me and many other people find 
prgramming in C++ with the right libraries mildly enjoyable, but not in C.

-eye
March 25, 2005
Re: Copy-on-Write using toupper() source example?
AEon wrote:
> :)... I guess I just need to start recoding my stats in D, and then 
> after seeing my code *not* crash... then I will sit back and enjoy 
> GC ;)

I guess that sums up something every GC-phobic ought to read the first 
day! (Even I.)

> BTW, when I first looked at C++, hoping such things would be done for 
> me, I was badly disappointed that all the things I missed in C where 
> not present in C++, and many things actually got worse. I just can't 
> believe how great D is... sigh... *happiness* :)

Man, amen to that!

Walter should really start collecting these statements! And put them on 
the D startup page!!!!

Hmm. Or maybe we should do it for him?  :-)
March 25, 2005
Re: Copy-on-Write using toupper() source example?
Georg Wrede wrote:
> AEon wrote:
> 
>> :)... I guess I just need to start recoding my stats in D, and then 
>> after seeing my code *not* crash... then I will sit back and enjoy GC ;)
> 
> I guess that sums up something every GC-phobic ought to read the first 
> day! (Even I.)

Well I would not call myself a GC-phobic. It's more like I never "found" 
any language that would have been fast enough for me to really 
appreciate GC.  This has probably to do with the fact (IIRC) that most 
of the time GC is used by interpreter languages like Basic, many Script 
languages (ARexx, Perl, PHP) and all those are just plain slow, compared 
to something like C.

>> BTW, when I first looked at C++, hoping such things would be done for 
>> me, I was badly disappointed that all the things I missed in C where 
>> not present in C++, and many things actually got worse. I just can't 
>> believe how great D is... sigh... *happiness* :)
> 
> Man, amen to that!
> 
> Walter should really start collecting these statements! And put them on 
> the D startup page!!!!
> 
> Hmm. Or maybe we should do it for him?  :-)

:)...

Presently I have the feeling the std.string lib is actually more 
powerful and more useful then the libs PHP uses for strings. Every time 
I was looking for a command to do something slightly out of the 
ordinary, a std.string command did exacty what I had hoped, this IIRC 
was not the case for PHP. If I am not mistaken, thsi would mean D is 
beating PHP on it's own turf :)

AEon
Top | Discussion index | About this forum | D home