Thread overview
[Issue 2418] New: Same-value string (char[]) literals get overwritten (unlike array literals)
Oct 14, 2008
d-bugmail
Oct 14, 2008
d-bugmail
Oct 14, 2008
d-bugmail
Oct 15, 2008
d-bugmail
Oct 15, 2008
d-bugmail
Oct 15, 2008
d-bugmail
Oct 15, 2008
d-bugmail
October 14, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2418

           Summary: Same-value string (char[]) literals get overwritten
                    (unlike array literals)
           Product: D
           Version: 1.035
          Platform: PC
        OS/Version: Windows
            Status: NEW
          Severity: normal
          Priority: P2
         Component: DMD
        AssignedTo: bugzilla@digitalmars.com
        ReportedBy: business3@twistedpairgaming.com


When a string is initialized with a string literal, and part of the string is then changed, all other string literals with the same value reflect the same change.

For the following code:

------------------
module test;
import tango.io.Stdout;

class FooChar
{
        char[] str;

        this(char[] _str)
        {
                this.str = _str;
        }
}

class FooInt
{
        int[] ints;

        this(int[] _ints)
        {
                this.ints = _ints;
        }
}

void main()
{
        FooChar a;
        FooChar b;
        FooChar c;
        FooChar d;

        a = new FooChar("AAA");
        Stdout.formatln("a.str: {}", a.str);
        a.str[0] = '!';
        Stdout.formatln("a.str: {}", a.str);
        a = new FooChar("AAA");
        Stdout.formatln("a.str: {}", a.str);
        Stdout.formatln("");

        b = new FooChar("AAA");
        Stdout.formatln("b.str: {}", b.str);
        b.str[0] = '!';
        Stdout.formatln("b.str: {}", b.str);
        b = new FooChar("AAA");
        Stdout.formatln("b.str: {}", b.str);
        Stdout.formatln("");

        c = new FooChar("AA1");
        Stdout.formatln("c.str: {}", c.str);
        c.str[0] = '!';
        Stdout.formatln("c.str: {}", c.str);
        c = new FooChar("AA1");
        Stdout.formatln("c.str: {}", c.str);
        Stdout.formatln("");

        const char[] dInit = "AA2";
        d = new FooChar(dInit);
        Stdout.formatln("d.str: {}", d.str);
        d.str[0] = '!';
        Stdout.formatln("d.str: {}", d.str);
        d = new FooChar(dInit);
        Stdout.formatln("d.str: {}", d.str);
        Stdout.formatln("");

        FooInt i;

        i = new FooInt([1, 2]);
        Stdout.formatln("i.ints: {}", i.ints);
        i.ints[0] = 77;
        Stdout.formatln("i.ints: {}", i.ints);
        i = new FooInt([1, 2]);
        Stdout.formatln("i.ints: {}", i.ints);
}
------------------

Expected output:
a.str: AAA
a.str: !AA
a.str: AAA

b.str: AAA
b.str: !AA
b.str: AAA

c.str: AA1
c.str: !A1
c.str: AA1

d.str: AA2
d.str: !A2
d.str: AA2

i.ints: [1, 2]
i.ints: [77, 2]
i.ints: [1, 2]

Actual output:
a.str: AAA
a.str: !AA
a.str: !AA

b.str: !AA
b.str: !AA
b.str: !AA

c.str: AA1
c.str: !A1
c.str: !A1

d.str: AA2
d.str: !A2
d.str: !A2

i.ints: [1, 2]
i.ints: [77, 2]
i.ints: [1, 2]

This might be a side-effect of the fix for #817: http://d.puremagic.com/issues/show_bug.cgi?id=817


-- 

October 14, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2418





------- Comment #1 from business3@twistedpairgaming.com  2008-10-14 17:59 -------
Sorry, I guess those classes are not needed:

---------------
module test;
import tango.io.Stdout;

void main()
{
        char[] a;
        char[] b;
        char[] c;
        char[] d;

        a = "AAA";
        Stdout.formatln("a: {}", a);
        a[0] = '!';
        Stdout.formatln("a: {}", a);
        a = "AAA";
        Stdout.formatln("a: {}", a);
        Stdout.formatln("");

        b = "AAA";
        Stdout.formatln("b: {}", b);
        b[0] = '!';
        Stdout.formatln("b: {}", b);
        b = "AAA";
        Stdout.formatln("b: {}", b);
        Stdout.formatln("");

        c = "AA1";
        Stdout.formatln("c: {}", c);
        c[0] = '!';
        Stdout.formatln("c: {}", c);
        c = "AA1";
        Stdout.formatln("c: {}", c);
        Stdout.formatln("");

        const char[] dInit = "AA2";
        d = dInit;
        Stdout.formatln("d: {}", d);
        d[0] = '!';
        Stdout.formatln("d: {}", d);
        d = dInit;
        Stdout.formatln("d: {}", d);
        Stdout.formatln("");

        int[] i;

        i = [1, 2];
        Stdout.formatln("i: {}", i);
        i[0] = 77;
        Stdout.formatln("i: {}", i);
        i = [1, 2];
        Stdout.formatln("i: {}", i);
}
---------------

Expected output:
a: AAA
a: !AA
a: AAA

b: AAA
b: !AA
b: AAA

c: AA1
c: !A1
c: AA1

d: AA2
d: !A2
d: AA2

i: [1, 2]
i: [77, 2]
i: [1, 2]

Actual output:
a: AAA
a: !AA
a: !AA

b: !AA
b: !AA
b: !AA

c: AA1
c: !A1
c: !A1

d: AA2
d: !A2
d: !A2

i: [1, 2]
i: [77, 2]
i: [1, 2]


-- 

October 14, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2418


2korden@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID




------- Comment #2 from 2korden@gmail.com  2008-10-14 18:01 -------
The bug is at your site. You don't make string copies and work on a single string.

char[] a = "AAA";
char[] b = a; // both point to the same location
a[0] = '!';
assert(b[0] == '!'); // should be true since they both point to the same
location

Same example in C (this may help understand better):
char* str = "AAA";
char* ptr = str;
str[0] = '!';
assert(ptr[0] == '!');

The actual bug is that the following line should not compile: char[] a = "AAA";

Problem is, D1 doesn't have invariant type modifier and the spec is frozen, so this will never be fixed. D2, however, doesn't allow you to do this.

Solution would be to make copies explicitly:
a = new FooChar("AAA".dup);


-- 

October 15, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2418


business3@twistedpairgaming.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|INVALID                     |




------- Comment #3 from business3@twistedpairgaming.com  2008-10-14 20:34 -------
(In reply to comment #2)
> The bug is at your site. You don't make string copies and work on a single string.
> 
> char[] a = "AAA";
> char[] b = a; // both point to the same location
> a[0] = '!';
> assert(b[0] == '!'); // should be true since they both point to the same
> location
> 
> Same example in C (this may help understand better):
> char* str = "AAA";
> char* ptr = str;
> str[0] = '!';
> assert(ptr[0] == '!');
> 

I don't think that's applicable (or at least, it shouldn't be). Here's a
shortened example:

char[] a = "AAA";
a[0] = '!';
char[] b = "AAA";

It's absurd that 'b' should become "!AA". It's clear that the compiler is "optimizing" both "AAA" string literals into the same reference:

char[] _tmp1 = "AAA";
char[] a = _tmp1;
a[0] = '!';
char[] b = _tmp1;

But I'm saying that transformation should not be occurring. The two "AAA" literals should not share the same reference. In D2, array literals are immutable, so yes, in D2 it works out fine. But in D1, array literals are *not* immutable, so the compiler shouldn't assume that array literals with a equal *value* can safely share the same *reference*.

> The actual bug is that the following line should not compile: char[] a = "AAA";

Again, this is for D1, so the idea of strings being immutable doesn't apply.

Besides, at the very least there's still an inconsistency:

char[] a = "AAA";
a[0] = '!';
char[] b = "AAA";  // b is "!AA"

char[] c = ['A', 'A', 'A'];
c[0] = '!';
char[] d = ['A', 'A', 'A'];  // d is "AAA"

The above shows that two occurrences of "AAA" share the same reference, but two occurrences of ['A', 'A', 'A'] have separate references. The former leads to odd side-effects, the latter does not.


-- 

October 15, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2418


2korden@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |INVALID




------- Comment #4 from 2korden@gmail.com  2008-10-15 00:02 -------
> ------- Comment #3 from business3@twistedpairgaming.com  2008-10-14  
>
> I don't think that's applicable (or at least, it shouldn't be). Here's a
> shortened example:
>
> char[] a = "AAA";
> a[0] = '!';
> char[] b = "AAA";
>
> It's absurd that 'b' should become "!AA".
Well, that's what I expect, at least :)

> It's clear that the compiler is
> "optimizing" both "AAA" string literals into the same reference:
>
> char[] _tmp1 = "AAA";
> char[] a = _tmp1;
> a[0] = '!';
> char[] b = _tmp1;

Of course it does! This string literal is put into executable. "AAA" is just a pointer to it.

Please, understand that char[] is nothing more than a pointer to a string and an associated length. When you change the string via pointer, everyone who has that pointer gets the change. Compare to object:

class A
{
    int i = 15;
}

A aaa = new A(); // aaa is an analog of "AAA"
A a = aaa;
writefln(a); // prints 15
a.i = 42;

A b = aaa;
writefln(b); // why the hell it prints 42????

This is because you don't make a copy of aaa.

Here is another example:

while (true) {
    char[] a = "AAA";
    a[0] = '!';   // in the next iteration, should a be "AAA" again?
}

If answer is yes, then there should be a memory allocation in line 2 under the hood. You can't get new string every time without a memory allocation. This, however, definitely should not occur. If you want to make an allocation - do it yourself explicitely.

Code with the behaviour you expect should be as follows:
while (true) {
    char[] a = "AAA".dup;
    writefln(a);
    a[0] = '!';
    writefln(a);
}

This is by design, it works as it should. It can't be 'fixed' because it is not broken.

> Besides, at the very least there's still an inconsistency:
>
> char[] a = "AAA";
> a[0] = '!';
> char[] b = "AAA";  // b is "!AA"
>
> char[] c = ['A', 'A', 'A'];
> c[0] = '!';
> char[] d = ['A', 'A', 'A'];  // d is "AAA"

I agree, this might be a bug, d really ought to be "!AAA"!
Do you want to create a new bugreport so that the latter case would be 'fixed'?
:)


-- 

October 15, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2418





------- Comment #5 from kamm-removethis@incasoftware.de  2008-10-15 02:36 -------
The D spec says "String literals are immutable (read only)." even in D1. So
code like this

char[] a = "ABC";
a[0] = '!';

is illegal (and will segfault on linux!).

Array literals are allocated on the heap (the D spec explicitly says so) and will thus behave fundamentally different from string literals.


-- 

October 15, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2418





------- Comment #6 from business3@twistedpairgaming.com  2008-10-15 05:13 -------
(In reply to comment #5)
> The D spec says "String literals are immutable (read only)." even in D1. So
> code like this
> 
> char[] a = "ABC";
> a[0] = '!';
> 
> is illegal (and will segfault on linux!).
> 
> Array literals are allocated on the heap (the D spec explicitly says so) and will thus behave fundamentally different from string literals.
> 

Alright, I see now. It does seem odd though that array literals would carry a type that lacks any sort of const/read-only/immutable/etc qualifier, but is still immutable anyway. Just another of D2's improvements I guess.

Out of curiosity, any idea if the lack of a run-time error on windows is due to windows itself or just something DMD/OPTLINK does differently between the platforms?


--