Thread overview
array initialization problem
Jan 16, 2009
Qian Xu
Jan 17, 2009
Denis Koroskin
Jan 18, 2009
Qian Xu
Jan 19, 2009
Qian Xu
Jan 19, 2009
Denis Koroskin
Jan 19, 2009
Rainer Deyke
Jan 19, 2009
Christopher Wright
January 16, 2009
Hi All,

I have accidentally written a buggy class.

Briefly described as follows:
 1. The class contains a list of strings
 2. The list of strings is assigned from a constant in the constructor
 3. Try to change the values in the list
 4. Create another instance by repeating steps 1-3
 5. Add both of them to a LinkSeq object
 6. Print their values again
Now you will find that both lists have the same values.

Can someone explain why the values are different before they are inserted into a list?
And why does this.str have no problem?


The console output and the source are included below.

##################### console output begin ############################

list: [111,222,]
 str: hello
-----------------------------
list: [333,444,]
 str: world
-----------------------------
--- after insert ---
-----------------------------
list: [333,444,]
 str: hello
-----------------------------
list: [333,444,]
 str: world
-----------------------------

##################### console output end ############################

######################### code begin ############################

module test;

import tango.io.Console;
import tango.util.collection.LinkSeq;

const char[][] CLIST = [null, null];
const char[] CSTR = "hello";

class Entity
{
  char[][] list;
  char[] str;

  this()
  {
    this.list = CLIST;
    this.str = CSTR;
  }

  void print()
  {
    Cout.opCall("list: [");
    foreach (char[] s; list)
    {
      Cout.opCall(s ~ ",");
    }
    Cout.opCall("]\n");
    Cout.opCall(" str: "~this.str);
    Cout.opCall("\n-----------------------------\n");
  }
}

void main()
{
  Entity e = new Entity();
  e.list[0] = "111";
  e.list[1] = "222";
  e.str = "hello";
  e.print();

  Entity e2 = new Entity();
  e2.list[0] = "333";
  e2.list[1] = "444";
  e2.str = "world";
  e2.print();

  Cout.opCall("--- after insert ---\n-----------------------------\n");
  LinkSeq!(Entity) l = new LinkSeq!(Entity)();
  l.append(e);
  l.append(e2);

  foreach (Entity entity; l)
  {
    entity.print();
  }
}

######################### code end ############################


-- 
Xu, Qian (stanleyxu)
 http://stanleyxu2005.blogspot.com
January 16, 2009
On Fri, Jan 16, 2009 at 4:19 PM, Qian Xu <quian.xu@stud.tu-ilmenau.de> wrote:

> Can someone explain, why the values are different before they are inserted into a list?

The values are different _before you insert values into e2_.  Try printing out the contents of e _after_ you put strings in e2, and you'll notice it now has the same values as e2.

This is because arrays in D are by reference.  e and e2 point to the same array (CLIST).  When you modify the contents of e2.list, the modifications show up in e as well.

this.str is fine because assigning e.str = "hello" and e2.str = "world" makes each one point to a different string, instead of writing into the string they shared.
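
To make the sharing concrete, here is a minimal sketch. It assumes current D (D2) with Phobos rather than the D1/Tango setup used in this thread, and `sharedList` is a hypothetical stand-in for CLIST.

```d
import std.stdio;

// Hypothetical stand-in for CLIST: one mutable array at module level.
string[] sharedList;

static this()
{
    sharedList = new string[](2);
}

class Entity
{
    string[] list;

    this()
    {
        // Copies only the (pointer, length) pair; the elements stay shared.
        this.list = sharedList;
    }
}

void main()
{
    auto e = new Entity();
    auto e2 = new Entity();

    e2.list[0] = "333";

    // Both slices refer to the same memory, so e sees the write made via e2.
    assert(e.list.ptr is e2.list.ptr);
    assert(e.list[0] == "333");
    writeln(e.list[0]);
}
```

Assigning a whole new array to e.list (as happens with e.str in the original program) would break the link; only element writes go through to the shared memory.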

>    Cout.opCall("list: [");

Also, lol, opCall is an operator overload of ().  You aren't supposed
to call it directly, use:

Cout("list: [");

instead.
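
For illustration only, a small sketch of how opCall maps to call syntax. It assumes current D (D2) with Phobos, and `Printer` is a hypothetical type that merely imitates the way Cout is called; it is not Tango's implementation.

```d
import std.stdio;

// Hypothetical type that imitates the call syntax of Cout.
struct Printer
{
    int calls;

    // Defining opCall lets an instance be used like a function.
    void opCall(string text)
    {
        ++calls;
        write(text);
    }
}

void main()
{
    Printer p;
    p("list: [");     // the compiler rewrites this to p.opCall("list: [")
    p.opCall("]\n");  // legal, but the call syntax above is the intended form
    assert(p.calls == 2);
}
```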


> void main()
> {
>  Entity e = new Entity();
>  e.list[0] = "111";
>  e.list[1] = "222";
>  e.str = "hello";
>  e.print();
>
>  Entity e2 = new Entity();
>  e2.list[0] = "333";
>  e2.list[1] = "444";

See, e.list and e2.list are the same array here.
January 17, 2009
On Sat, 17 Jan 2009 00:19:46 +0300, Qian Xu <quian.xu@stud.tu-ilmenau.de> wrote:

> [...]
> Can someone explain why the values are different before they are inserted into a list?
> And why does this.str have no problem?
> [...]

You have two instances of class Entity. Both point to the same variables - CLIST and CSTR.
Thus, modifying the contents of CSTR and CLIST affects e.str, e.list, e2.str and e2.list, because they share the data (as opposed to owning it).

For example, let's modify CSTR and see what happens:
CSTR[0] = 'J'; // now it is "Jello"

printing e.str and e2.str gives us the following output:
Jello
Jello

i.e. both strings have been changed, too! Once again, this happens because they don't own the data but share it with CSTR. It happens because arrays are not copied upon assignment, i.e. the following line:

this.str = CSTR;

makes sure that there is only 1 instance of "hello" in memory, not three distinct copies (CSTR, e.str and e2.str). Therefore modifying the contents through any of CSTR, e.str or e2.str affects all 3 variables. Here is a picture for you:


                  CSTR:
                     length = 5
                     ptr = ----------------  hello
                                             /  |
                                            /   |
                                           /    |
                  e.str:                  /     |
                     length = 5          /      |
                     ptr = -------------*       |
                                                |
                                                |
                                                |
                  e2.str:                       |
                     length = 5                 |
                     ptr = ---------------------*


If you want to be able to modify without affecting others, make a copy!

this.str = CSTR.dup;

This way memory will contain three copies of "hello" - CSTR, e.str and e2.str.
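
The same fix applies to the list. A minimal sketch, assuming current D (D2) with Phobos instead of D1/Tango; `defaults` is a hypothetical stand-in for CLIST:

```d
import std.stdio;

// Hypothetical stand-in for CLIST.
string[] defaults;

static this()
{
    defaults = new string[](2);
}

class Entity
{
    string[] list;

    this()
    {
        // .dup allocates a fresh array, so each Entity owns its elements.
        this.list = defaults.dup;
    }
}

void main()
{
    auto e = new Entity();
    e.list[0] = "111";

    auto e2 = new Entity();
    e2.list[0] = "333";

    // Each instance keeps its own values and the template stays untouched.
    assert(e.list[0] == "111");
    assert(e2.list[0] == "333");
    assert(defaults[0] is null);
    writeln(e.list, " ", e2.list);
}
```

Note that .dup is a shallow copy: it duplicates the outer array only, which is enough here because the element strings are never modified in place.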

I hope this is clear, let's move on.

Just like modifying e.str contents, modifying e.list contents will have an effect on all variables - CLIST, e.list and e2.list. Here is what happens step by step:

0 - Program startup
State: CLIST : [null, null];
      e     : <doesn't exist>;
      e2    : <doesn't exist>

1 - Entity e = new Entity();
State: CLIST : [null, null];
      e     : list = [null, null]; str = "hello";
      e2    : <doesn't exist>

2 - e.list[0] = "111";
State: CLIST : ["111", null];   // note that CLIST has been changed, too!
      e     : list = ["111", null]; str = "hello";
      e2    : <doesn't exist>

3 - e.list[1] = "222";
State: CLIST : ["111", "222"];   // note that CLIST has been changed, too!
      e     : list = ["111", "222"]; str = "hello";
      e2    : <doesn't exist>

4 - Entity e2 = new Entity();
State: CLIST : ["111", "222"];
      e     : list = ["111", "222"]; str = "hello";
      e2    : list = ["111", "222"]; str = "hello"; // !!!

5 - e2.list[0] = "333";
State: CLIST : ["333", "222"];
      e     : list = ["333", "222"]; str = "hello";
      e2    : list = ["333", "222"]; str = "hello";

6 - e2.list[1] = "444";
State: CLIST : ["333", "444"];
      e     : list = ["333", "444"]; str = "hello";
      e2    : list = ["333", "444"]; str = "hello";

7 - e2.str = "world";
State: CLIST : ["333", "444"];
      e     : list = ["333", "444"]; str = "hello";
      e2    : list = ["333", "444"]; str = "world";


Hope it helps.
January 18, 2009
Denis Koroskin wrote:
> 7 - e2.str = "world";
> State: CLIST : ["333", "444"];
>       e     : list = ["333", "444"]; str = "hello";
>       e2    : list = ["333", "444"]; str = "world";
> 
> 
> Hope it helps.

Thanks for your nice answer. You made my day ;-)



-- 
Xu, Qian (stanleyxu)
 http://stanleyxu2005.blogspot.com
January 19, 2009
Denis Koroskin wrote:

> ...
> 
> For example, let's modify CSTR and see what happens:
> CSTR[0] = 'J'; // now it is "Jello"
> 
> printing e.str and e2.str gives us the following output:
> Jello
> Jello
> 
> ...

Hi again,

But there is one thing I do not understand.
CSTR is a constant. Yet with "CSTR[0] = 'J'" you can modify a const anyway,
can't you?

BTW: Do you know why D does not use copy-on-write semantics instead of referencing? IMO, copy-on-write would perform much better.

--Qian
January 19, 2009
On Mon, 19 Jan 2009 12:21:59 +0300, Qian Xu <quian.xu@stud.tu-ilmenau.de> wrote:

> Denis Koroskin wrote:
>
>> ...
>>
>> For example, let's modify CSTR and see what happens:
>> CSTR[0] = 'J'; // now it is "Jello"
>>
>> printing e.str and e2.str gives us the following output:
>> Jello
>> Jello
>>
>> ...
>
> Hi again,
>
> But there is one thing I do not understand.
> CSTR is a constant. Yet with "CSTR[0] = 'J'" you can modify a const anyway,
> can't you?
>

D1 has no const support for the data an array refers to, so the compiler does not stop you from writing through CSTR.

> BTW: Do you know why D does not use copy-on-write semantics instead of
> referencing? IMO, copy-on-write would perform much better.
>
> --Qian

It's not about performance (explicit memory management is faster, too), but semantics.
Arrays in D are reference types. Besides, it's best to avoid hidden allocations.

January 19, 2009
Qian Xu wrote:
> Denis Koroskin wrote:
> 
>> ...
>>
>> For example, let's modify CSTR and see what happens:
>> CSTR[0] = 'J'; // now it is "Jello"
>>
>> printing e.str and e2.str gives us the following output:
>> Jello
>> Jello
>>
>> ...
> 
> Hi again,
> 
> But there is one thing I do not understand.
> CSTR is a constant. Yet with "CSTR[0] = 'J'" you can modify a const anyway,
> can't you?

CSTR is a string constant. It's in a data segment of the binary that DMD creates. However, on Windows, string constants are in a read-write area of memory, so you can change them; but for efficiency, there is only one copy of each string constant in the binary.

On Linux, that code would produce a segmentation fault -- there, string constants are in a read-only text segment. (I believe I heard that the MinGW compiler on Windows makes string constants read-only, so this may be compiler specific.)

> BTW: Do you know why D does not use copy-on-write semantics instead of
> referencing? IMO, copy-on-write would perform much better.

It makes the compiler a fair bit more complicated. It requires syntax to create a copy-on-write array versus a by-reference array, or to refer to a COW array by reference (so if you modify it, aliases to the same array get modified). And copy-on-write does not give you better performance.

Most of all, nobody's made a compelling case to Walter about this. It's easy enough to .dup an array if you're about to modify it, though bugs from accidentally modifying an array in place are rather hard to track. On the other hand, if you have a reasonable const system, these bugs turn into compile errors.
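
As a rough sketch of what that looks like, take D2's `immutable` qualifier as one example of such a const system (an assumption on my part; it is not available in the D1 compiler discussed here). The accidental write and the accidental mutable alias are both rejected, which pushes the code towards an explicit .dup:

```d
import std.stdio;

immutable char[] CSTR = "hello";

void main()
{
    // Writing through the constant no longer compiles.
    static assert(!__traits(compiles, CSTR[0] = 'J'));

    // Neither does aliasing it through a mutable slice.
    static assert(!__traits(compiles, { char[] s = CSTR; }));

    // The only way to get something writable is an explicit copy.
    char[] s = CSTR.dup;
    s[0] = 'J';

    assert(s == "Jello");
    assert(CSTR == "hello");
    writeln(s, " ", CSTR);
}
```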

> --Qian
January 19, 2009
Denis Koroskin wrote:
> Arrays in D are reference types. Besides, it's best to avoid hidden allocations.

Arrays in D are reference types except when they're not.

int[] a = [5];
int[] b = a;
a[0] = 4;
assert(b[0] == 4);
a.length = 2;
assert(b.length == 1);
a[0] = 3;
// Is b[0] 3 or 4?


-- 
Rainer Deyke - rainerd@eldwood.com
January 21, 2009
Rainer Deyke wrote:
> Denis Koroskin wrote:
>> Arrays in D are reference types. Besides, it's best to avoid hidden
>> allocations.
> 
> Arrays in D are reference types except when they're not.
> 
> int[] a = [5];
> int[] b = a;
> a[0] = 4;
> assert(b[0] == 4);
> a.length = 2;
> assert(b.length == 1);
> a[0] = 3;
> // Is b[0] 3 or 4?
> 
> 

To be really pedantic about it, D's arrays aren't really reference types at all, but bear the *illusion* of reference semantics because of what they really are (a struct with a length field and a pointer field).  In the above example, the value of b[0] depends on whether a was resized in place or not.  Which is why slicing, albeit a fantastically useful feature, has to be handled with care.
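
A small sketch of that dependency, assuming current D (D2) with Phobos. It does not predict which case occurs; it only checks that the value of b[0] matches whether the two slices still share a pointer after the resize.

```d
import std.stdio;

void main()
{
    int[] a = [5];
    int[] b = a;      // b refers to the same element as a

    a.length = 2;     // may extend in place or move a to a new block
    a[0] = 3;

    if (a.ptr is b.ptr)
    {
        // Extended in place: the two slices still share element 0.
        assert(b[0] == 3);
    }
    else
    {
        // Reallocated: b still refers to the old block.
        assert(b[0] == 5);
    }

    // Either way, b's own length field was never touched.
    assert(b.length == 1);
    writeln("still shared after resize: ", a.ptr is b.ptr);
}
```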

-- Chris Nicholson-Sauls <ibisbasenji @ Google Mail>