[Issue 2093] New: string concatenation modifies original
May 10, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2093

           Summary: string concatenation modifies original
           Product: D
           Version: 2.014
          Platform: PC
        OS/Version: Windows
            Status: NEW
          Severity: normal
          Priority: P2
         Component: DMD
        AssignedTo: bugzilla@digitalmars.com
        ReportedBy: bartosz@relisoft.com


I will attach source code for this example. It's an XML parser. It should
produce the following output:
c:\D\Work>xml
root
    child
     color=red

         Text=foo bar baz
Instead it produces this:
c:\D\Work>xml
root
    rootd
     rootd=red

         Text=rootdar baz
The problem is that copied strings are modified when the original is later
appended to. The problem goes away if I idup the strings:
  _name = name.idup;
  _value = value.idup;
or when I replace
  a ~= b;
with
  a = a ~ b;


-- 

May 10, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2093





------- Comment #1 from bartosz@relisoft.com  2008-05-10 14:31 -------
Created an attachment (id=256)
 --> (http://d.puremagic.com/issues/attachment.cgi?id=256&action=view)
Test case


-- 

May 10, 2008
<d-bugmail@puremagic.com> wrote in message news:bug-2093-3@http.d.puremagic.com/issues/...

> or when I replace
>  a ~= b;
> with
>  a = a ~ b;

~ always creates a copy, but ~= will attempt to expand the array in-place.

Now, if this is D2, and ~= is expanding an invariant(char)[] in-place, then _that_ is definitely an issue.
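
A minimal sketch of the difference between the two operators, written for a current D compiler rather than the DMD 2.014 discussed in this thread. The helper name `concatCopies` is made up for illustration. Whether `~=` actually extends in place depends on the runtime and the available capacity, so the example only reports what happened instead of assuming an outcome.

```d
import std.stdio;

// Hypothetical helper: true if `a ~ b` produced storage distinct from `a`.
bool concatCopies(int[] a, int[] b)
{
    int[] c = a ~ b;          // binary ~ allocates a new array
    return c.ptr !is a.ptr;
}

void main()
{
    int[] a = [1, 2, 3];
    int[] other = a;          // second reference to the same data

    int[] c = a ~ [4];
    c[0] = 99;                // writes to the copy only
    assert(a[0] == 1);
    assert(concatCopies(a, [4]));

    auto before = a.ptr;
    a ~= 4;                   // may extend in place or reallocate
    writeln(a.ptr is before ? "extended in place" : "reallocated");

    // Either way, the elements visible through `other` are unchanged here.
    assert(other == [1, 2, 3]);
}
```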


November 22, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2093


smjg@iname.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |smjg@iname.com




------- Comment #2 from smjg@iname.com  2008-11-21 19:09 -------
Welcome to the world of bug reporting.

The way to report a bug isn't to attach a 695-line program that contains some functionality somewhere that exhibits the problem.

The correct manner is to post a small example that illustrates the problem, typically either by writing a test program from scratch or by simplifying little by little the program in which you found it.

If done well, the result will be small enough to post straight into the bug report rather than attaching it.  DMD's code coverage analysis is a useful tool for identifying unused parts of a program in order to cut them out, among other things.


-- 

November 22, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2093


smjg@iname.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |wrong-code




------- Comment #3 from smjg@iname.com  2008-11-21 20:40 -------
I think I've finally managed to figure out what was going on.

----------
import std.stdio;

void main() {
    string s1, s2;

    s1 ~= "hello";
    s2 = s1;

    writefln(s1);
    writefln(s2);

    s1.length = 0;
    s1 ~= "Hi";

    writefln(s1);
    writefln(s2);
}
----------
hello
hello
Hi
Hillo
----------

This is the kind of testcase we like here.  Walter is more likely to fix a bug if you make life easier for him by supplying something on which the cause can easily be seen.


-- 

November 22, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2093





------- Comment #4 from 2korden@gmail.com  2008-11-22 00:39 -------
This is a known bug and a major array design flaw. Arrays have no determined owner (the only one who can grow without a reallocation if capacity permits):

import std.stdio;

void main()
{
    char[] s1, s2;
    s1.length = 100; // reserve the capacity
    s1.length = 0;

    s2 = s1; // both are pointing to an empty string with the capacity of 100

    s1 ~= "Hello"; // array is not reallocated, it is grown in-place
    writefln(s1);
    writefln(s2); // prints an empty string: s2 still points to the same data
                  // (which is now "Hello") but carries a length of 0

    s2 ~= "Hi"; // overwrites s1
    writefln(s2); // "Hi"
    writefln(s1); // "Hillo"
}

s1 is the array owner and s2 is a slice (even though it really points to the entire array), so s2 should reallocate and take ownership of the reallocated array on append, but that doesn't happen.

Currently an 'owner' is anyone who has a pointer to array's beginning:

char[] s = "hello".dup;
char[] s1 = s[0..4];
s1 ~= "!";
assert(s != s1); // fails, both are "hell!", s is overwritten

s = "_hello".dup;
char[] s2 = s[1..5];
s2 ~= "!";
assert(s != s2); // succeeds, s is not changed


-- 

November 22, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2093





------- Comment #5 from ddparnell@bigpond.com  2008-11-22 03:58 -------

I thought 'string' types were immutable and thus ...

  s1.length = 0;

should fail as it updates the string (truncates it to zero characters).


-- 

November 22, 2008
On 22.11.08 at 12:58, wrote in their message:

> http://d.puremagic.com/issues/show_bug.cgi?id=2093
>
>
>
>
>
> ------- Comment #5 from ddparnell@bigpond.com  2008-11-22 03:58 -------
>
> I thought 'string' types were immutable and thus ...
>
>   s1.length = 0;
>
> should fail as it updates the string (truncates it to zero characters).
>
>


No, string is a mutable array of immutable chars:
string == invariant(char)[]
November 22, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2093





------- Comment #6 from 2korden@gmail.com  2008-11-22 06:45 -------
No, string is aliased to invariant(char)[], i.e. an array of invariant
characters. You can change its length (usually by decreasing it) but not its contents.
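
The distinction between the slice and its contents can be sketched as follows. This assumes a current D compiler, where the qualifier is spelled `immutable` (`invariant` was the D2 spelling at the time of this thread). The helper `truncated` is a made-up name for illustration.

```d
import std.stdio;

// `string` is an alias for a mutable slice of immutable chars.
static assert(is(string == immutable(char)[]));

// Hypothetical helper: shortens the slice; the character data is not touched.
string truncated(string s, size_t n)
{
    assert(n <= s.length);
    s.length = n;             // allowed: only the slice's length changes
    return s;
}

void main()
{
    string s = "hello";

    // Writing to an element is rejected at compile time.
    static assert(!__traits(compiles, s[0] = 'j'));

    string t = truncated(s, 2);
    assert(t == "he");
    assert(s == "hello");     // the original slice is unaffected
    assert(t.ptr is s.ptr);   // both still refer to the same data
    writeln(t, " ", s);
}
```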


-- 

November 22, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2093





------- Comment #7 from smjg@iname.com  2008-11-22 08:43 -------
(In reply to comment #4)
> Currently an 'owner' is anyone who has a pointer to array's beginning:
> 
> char[] s = "hello".dup;
> char[] s1 = s[0..4];
> s1 ~= "!";
> assert(s != s1); // fails, both are "hell!", s is overwritten

A simple char[] is fully mutable, so that doesn't violate any established rule, but whether it's desirable is another matter.

With const(char)[] or invariant(char)[], obviously this isn't going to work, so
~= should always reallocate (unless the optimiser can be sure that no other
reference to the data can possibly exist).

Alternatively, the GC could maintain a note of the actual length of every heap-allocated array.  Ownership would be determined by matching in both start pointer and length.  When the length is increased, whether by .length or ~=, either update this actual length (if it's the owner that we're extending, in which case all other references to the same data lose ownership) or reallocate the array.
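
The ownership rule proposed above could be modelled roughly like this. It is only a toy sketch in user code, not how the GC or runtime is implemented: `Block`, `used`, and `append` are hypothetical names, and a copy made on the non-owner path is not tracked as a new block the way a real allocator would track it.

```d
import std.stdio;

// Hypothetical model of one heap block plus the "actual length" note.
struct Block
{
    char[] storage;   // full capacity of the block
    size_t used;      // actual length recorded for the block
}

// Hypothetical append: grow in place only if `slice` is the owner, i.e. it
// matches the block in both start pointer and recorded length.
char[] append(ref Block blk, char[] slice, const(char)[] extra)
{
    immutable owner = slice.ptr is blk.storage.ptr && slice.length == blk.used;
    immutable newLen = slice.length + extra.length;

    if (owner && newLen <= blk.storage.length)
    {
        blk.storage[blk.used .. newLen] = extra[];
        blk.used = newLen;    // other references stop matching, so lose ownership
        return blk.storage[0 .. newLen];
    }

    // Not the owner, or no room left: copy instead of overwriting.
    auto result = slice.dup;
    result ~= extra;
    return result;
}

void main()
{
    auto blk = Block(new char[100], 0);
    char[] s1 = blk.storage[0 .. 0];
    char[] s2 = s1;                  // same pointer, same length

    s1 = append(blk, s1, "Hello");   // owner: grows in place
    s2 = append(blk, s2, "Hi");      // length 0 != recorded 5: copies

    assert(s1 == "Hello");           // no longer overwritten to "Hillo"
    assert(s2 == "Hi");
    writeln(s1, " ", s2);
}
```

With this rule the example from comment #4 no longer overwrites s1, because s2 stops matching the recorded length as soon as s1 grows.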


-- 
