Thread overview: String Slice/Concatenate bug
Russ Lewis, Aug 18, 2005
Ben Hinkle, Aug 19, 2005
Russ Lewis, Aug 19, 2005
Ben Hinkle, Aug 19, 2005
Russ Lewis, Aug 19, 2005
Ben Hinkle, Aug 19, 2005
Derek Parnell, Aug 19, 2005

August 18, 2005
DMD 0.129, Linux (Fedora Core 3)

Notice that in the output, the 2nd line starts with a 4 instead of a 5.  As it says in the comment, it will work if you .dup the slice.

> [russ@russ dmd_bugs]$ cat slice.d
> import std.stdio;
> 
> void main() {
>   char[] foo = "1234567890".dup;
>   char[] bar;
>   bar = foo[0..1];  // works if you append .dup here
>   foo = foo[1..$];
>   writefln(foo," ",bar);
>   bar ~= ","~foo[0..3];
>   foo  = foo[3..$];
>   writefln(foo," ",bar);
> }
> 
> [russ@russ dmd_bugs]$ dmd slice.d
> gcc slice.o -o slice -lphobos -lpthread -lm
> [russ@russ dmd_bugs]$ ./slice
> 234567890 1
> 467890 1,234
> [russ@russ dmd_bugs]$
August 19, 2005
"Russ Lewis" <spamhole-2001-07-16@deming-os.org> wrote in message news:de376q$lmr$1@digitaldaemon.com...
> DMD 0.129, Linux (Fedora Core 3)
>
> Notice that in the output, the 2nd line starts with a 4 instead of a 5. As it says in the comment, it will work if you .dup the slice.
>

I don't think that's a bug. ~= is behaving as expected (though I don't know if it's actually documented when ~= dups and when it doesn't). Are you suggesting ~= always dup? I'm not sure what you are expecting.


August 19, 2005
Ben Hinkle wrote:
> "Russ Lewis" <spamhole-2001-07-16@deming-os.org> wrote in message news:de376q$lmr$1@digitaldaemon.com...
> 
>>DMD 0.129, Linux (Fedora Core 3)
>>
>>Notice that in the output, the 2nd line starts with a 4 instead of a 5. As it says in the comment, it will work if you .dup the slice.
>>
>>
> 
> 
> I don't think that's a bug. ~= is behaving as expected (though I don't know if it's actually documented when ~= dups and when it doesn't). Are you suggesting ~= always dup? I'm not sure what you are expecting. 

I don't remember if this is documented or not, either.  My assumption was that ~= would do something analogous to realloc(); if the memory immediately following the buffer is already unallocated, then just extend the buffer; otherwise, duplicate it to a new location where there is space.
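A minimal sketch of the realloc()-style rule described above, written in present-day D2 rather than the D of 2005. It is a hypothetical model, not runtime code: `Block`, its `used` field and `reallocAppend` are illustrative names, and the model assumes the allocator remembers how many bytes at the front of each block have been handed out. Under that assumption bar = foo[0..1] does not end where the used bytes end, so the append copies and foo keeps its 567890 tail.

```d
// Hypothetical model of the realloc()-like ~= Russ expected; an
// illustration only, not how any D runtime is written.
struct Block
{
    char[] buf;   // the whole allocation
    size_t used;  // bytes at the front of buf that have been handed out
}

// Grow `arr` in place only if it ends exactly where the used bytes end,
// so nothing else can live right after it; otherwise copy, the way
// realloc() moves data when it cannot extend.
char[] reallocAppend(ref Block b, char[] arr, const(char)[] extra)
{
    bool endsAtUsed = arr.ptr >= b.buf.ptr
        && arr.ptr + arr.length == b.buf.ptr + b.used;
    if (endsAtUsed && b.used + extra.length <= b.buf.length)
    {
        b.buf[b.used .. b.used + extra.length] = extra[];
        b.used += extra.length;
        return arr.ptr[0 .. arr.length + extra.length];
    }
    auto copy = new char[arr.length + extra.length];
    copy[0 .. arr.length] = arr[];
    copy[arr.length .. $] = extra[];
    return copy;
}

unittest
{
    auto b = Block("1234567890".dup, 10); // foo uses all ten bytes
    char[] foo = b.buf;
    char[] bar = foo[0 .. 1];
    foo = foo[1 .. $];
    bar = reallocAppend(b, bar, "," ~ foo[0 .. 3]);
    foo = foo[3 .. $];
    assert(bar == "1,234");
    assert(foo == "567890"); // foo's bytes were not overwritten
}
```

An array that does end at the used mark (for example a fresh, empty slice at the front of an unused block) would still be extended in place, which is the cheap case the realloc() analogy is meant to keep.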
August 19, 2005
"Russ Lewis" <spamhole-2001-07-16@deming-os.org> wrote in message news:de38os$nba$1@digitaldaemon.com...
> Ben Hinkle wrote:
>> "Russ Lewis" <spamhole-2001-07-16@deming-os.org> wrote in message news:de376q$lmr$1@digitaldaemon.com...
>>
>>>DMD 0.129, Linux (Fedora Core 3)
>>>
>>>Notice that in the output, the 2nd line starts with a 4 instead of a 5. As it says in the comment, it will work if you .dup the slice.
>>>
>>>
>>
>>
>> I don't think that's a bug. ~= is behaving as expected (though I don't know if it's actually documented when ~= dups and when it doesn't). Are you suggesting ~= always dup? I'm not sure what you are expecting.
>
> I don't remember if this is documented or not, either.  My assumption was that ~= would do something analogous to realloc(); if the memory immediately following the buffer is already unallocated, then just extend the buffer; otherwise, duplicate it to a new location where there is space.

I think the problem is it can't tell that foo is using the memory following
bar. All bar knows is that it's a pointer to the start of an allocation
block that can hold the requested addition.
Thinking about it some more, it seems like users of ~= must know if the
memory following the array is "in use". If it could be (or is) then the user
must dup explicitly. That means the +1 that Walter had to add to all memory
allocations can go away, because a slice taken off the end of an array shouldn't be
extended using ~=. That would mean your bug actually has a silver lining
since I've never liked that +1. Put the burden on the user to know if they
can extend safely - just like COW the rule is "don't ~= in memory you don't
own".
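A minimal model of the behaviour described above, in present-day D2. `naiveAppend` is a hypothetical stand-in for the 2005 runtime's ~=, and the ten-byte array stands in for the GC allocation (a real block would usually be rounded up to a larger size). Its only check is that bar starts at the beginning of a block with room for the new data, so the temporary ","~foo[0..3] (that is, ",234") is written over indices 1 to 4, which foo still points at. Running the `unittest` (for example with `dmd -unittest -main -run`) reproduces the 467890 line from the report.

```d
// Hypothetical model of the ~= behaviour described in this thread (not the
// real runtime code): it only checks that `arr` starts at the beginning of
// a big-enough block and never asks whether another slice still uses the
// bytes after arr.
char[] naiveAppend(char[] block, char[] arr, const(char)[] extra)
{
    if (arr.ptr is block.ptr && arr.length + extra.length <= block.length)
    {
        // Writes straight over whatever follows arr in the block.
        block[arr.length .. arr.length + extra.length] = extra[];
        return block[0 .. arr.length + extra.length];
    }
    auto copy = new char[arr.length + extra.length];
    copy[0 .. arr.length] = arr[];
    copy[arr.length .. $] = extra[];
    return copy;
}

unittest
{
    char[] block = "1234567890".dup; // stands in for the GC allocation
    char[] foo = block;
    char[] bar = foo[0 .. 1];        // no .dup, so bar aliases the block
    foo = foo[1 .. $];
    assert(foo == "234567890" && bar == "1");
    // ","~foo[0..3] is built first, then copied over block[1 .. 5]
    bar = naiveAppend(block, bar, "," ~ foo[0 .. 3]);
    foo = foo[3 .. $];
    assert(bar == "1,234");
    assert(foo == "467890");         // the second line of the bug report
}
```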


August 19, 2005
Ben Hinkle wrote:
> "Russ Lewis" <spamhole-2001-07-16@deming-os.org> wrote in message news:de38os$nba$1@digitaldaemon.com...
> 
>>Ben Hinkle wrote:
>>
>>>"Russ Lewis" <spamhole-2001-07-16@deming-os.org> wrote in message news:de376q$lmr$1@digitaldaemon.com...
>>>
>>>
>>>>DMD 0.129, Linux (Fedora Core 3)
>>>>
>>>>Notice that in the output, the 2nd line starts with a 4 instead of a 5. As it says in the comment, it will work if you .dup the slice.
>>>>
>>>>
>>>>
>>>
>>>
>>>I don't think that's a bug. ~= is behaving as expected (though I don't know if it's actually documented when ~= dups and when it doesn't). Are you suggesting ~= always dup? I'm not sure what you are expecting.
>>
>>I don't remember if this is documented or not, either.  My assumption was that ~= would do something analogous to realloc(); if the memory immediately following the buffer is already unallocated, then just extend the buffer; otherwise, duplicate it to a new location where there is space.
> 
> I think the problem is it can't tell that foo is using the memory following bar. All bar knows is that it's a pointer to the start of an allocation block that can hold the requested addition.
> Thinking about it some more, it seems like users of ~= must know if the memory following the array is "in use". If it could be (or is) then the user must dup explicitly. That means the +1 that Walter had to add to all memory allocations can go away, because a slice taken off the end of an array shouldn't be extended using ~=. That would mean your bug actually has a silver lining since I've never liked that +1. Put the burden on the user to know if they can extend safely - just like COW the rule is "don't ~= in memory you don't own".

I hear you, but it seems to me that if you are in the middle of an allocation, and you're only using part of it, then you should *assume* that the rest of the string is being used by somebody else.  It's not always true, but it often will be.

Just my opinion, though.  I'd love to hear what the official word is.
August 19, 2005
On Thu, 18 Aug 2005 16:56:09 -0700, Russ Lewis wrote:

> DMD 0.129, Linux (Fedora Core 3)
> 
> Notice that in the output, the 2nd line starts with a 4 instead of a 5.
>   As it says in the comment, it will work if you .dup the slice.
> 

This is a surprise. I was under the impression that *all* concatenations caused an automatic dup operation.

It will also work if you have ...

   bar = bar~","~foo[0..3];
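Both workarounds from this thread, checked as a sketch in present-day D2 (the function names are made up for illustration): binary ~ always builds a new array, and a .dup gives bar memory of its own, so in either case the bytes foo points at are never written. In the first function bar is declared as const(char)[] only so that the assignment from ~ type-checks whatever the qualifiers of its operands are.

```d
// Replays the bug report with Derek's form and returns the second line.
string replayWithBinaryCat()
{
    char[] foo = "1234567890".dup;
    const(char)[] bar = foo[0 .. 1]; // still aliases foo's block
    foo = foo[1 .. $];
    bar = bar ~ "," ~ foo[0 .. 3];   // instead of bar ~= ","~foo[0..3];
    foo = foo[3 .. $];
    return (foo ~ " " ~ bar).idup;
}

// Replays the bug report with the .dup from Russ's comment.
string replayWithDup()
{
    char[] foo = "1234567890".dup;
    char[] bar = foo[0 .. 1].dup;    // bar owns a private copy
    foo = foo[1 .. $];
    bar ~= "," ~ foo[0 .. 3];        // may grow in place, but only bar's copy
    foo = foo[3 .. $];
    return (foo ~ " " ~ bar).idup;
}

unittest
{
    assert(replayWithBinaryCat() == "567890 1,234");
    assert(replayWithDup() == "567890 1,234");
}
```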

Damn ... now I have to go back and check my existing code to make sure I didn't use this construct anywhere.

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
19/08/2005 1:48:42 PM
August 19, 2005
"Russ Lewis" <spamhole-2001-07-16@deming-os.org> wrote in message news:de3ino$vld$1@digitaldaemon.com...
> Ben Hinkle wrote:
>> "Russ Lewis" <spamhole-2001-07-16@deming-os.org> wrote in message news:de38os$nba$1@digitaldaemon.com...
>>
>>>Ben Hinkle wrote:
>>>
>>>>"Russ Lewis" <spamhole-2001-07-16@deming-os.org> wrote in message news:de376q$lmr$1@digitaldaemon.com...
>>>>
>>>>
>>>>>DMD 0.129, Linux (Fedora Core 3)
>>>>>
>>>>>Notice that in the output, the 2nd line starts with a 4 instead of a 5. As it says in the comment, it will work if you .dup the slice.
>>>>>
>>>>
>>>>I don't think that's a bug. ~= is behaving as expected (though I don't know if it's actually documented when ~= dups and when it doesn't). Are you suggesting ~= always dup? I'm not sure what you are expecting.
>>>
>>>I don't remember if this is documented or not, either.  My assumption was that ~= would do something analogous to realloc(); if the memory immediately following the buffer is already unallocated, then just extend the buffer; otherwise, duplicate it to a new location where there is space.
>>
>> I think the problem is it can't tell that foo is using the memory
>> following bar. All bar knows is that it's a pointer to the start of an
>> allocation block that can hold the requested addition.
>> Thinking about it some more, it seems like users of ~= must know if the
>> memory following the array is "in use". If it could be (or is) then the
>> user must dup explicitly. That means the +1 that Walter had to add to all
>> memory allocations can go away, because a slice taken off the end of an array
>> shouldn't be extended using ~=. That would mean your bug actually has a
>> silver lining since I've never liked that +1. Put the burden on the user
>> to know if they can extend safely - just like COW the rule is "don't ~=
>> in memory you don't own".
>
> I hear you, but it seems to me that if you are in the middle of an allocation, and you're only using part of it, then you should *assume* that the rest of the string is being used by somebody else.  It's not always true, but it often will be.
>
> Just my opinion, though.  I'd love to hear what the official word is.

I don't follow who "you" is. Are you saying the compiler should do something
different for your example than what it is doing now? Or by "you" do you
mean the programmer?
Just to be clear, what I'm saying is that
1) ~= should keep doing what it does today, but its duping behavior should be documented
2) functions that use ~= on inputs should document it so callers can take
action to avoid extending into live memory
3) the +1 that is added to gc allocations should be removed since it is
the user's responsibility to manage safe extensions.
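For comparison, present-day D2 (not DMD 0.129) handles this case in the runtime instead: each GC block records how much of it is in use, so a slice that stops short of that point reports a `capacity` of 0 and ~= on it reallocates. Code that knows the rest of the block is dead can opt back into in-place growth with `assumeSafeAppend`, which leaves that judgement to the user much as suggested here. A sketch of the original example under those rules:

```d
// Present-day D2 druntime behaviour (not DMD 0.129): ~= checks how much of
// the GC block is in use before growing a slice in place.
unittest
{
    char[] foo = "1234567890".dup;
    char[] bar = foo[0 .. 1];
    // bar stops short of the used end of the block, so it has no spare room.
    assert(bar.capacity == 0);
    foo = foo[1 .. $];
    bar ~= "," ~ foo[0 .. 3]; // reallocates instead of overwriting foo
    foo = foo[3 .. $];
    assert(foo == "567890");
    assert(bar == "1,234");
}
```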

The alternative of having ~= dup every time will mean building arrays in a
loop using ~= will waste lots of memory (and time spent duping).
The doc http://www.digitalmars.com/d/arrays.html#resize does say that you
should avoid setting length or cat'ing with slices.
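A rough count of the cost mentioned above, under stated assumptions: the two functions are hypothetical, they count only how many existing elements get moved while appending n elements one at a time, and the second assumes a full buffer grows by doubling (a common policy, not necessarily the one D's runtime uses).

```d
// If every ~= duplicated the array, append number k would copy the k
// elements already there: 0 + 1 + ... + (n-1) = n(n-1)/2 moves in total.
ulong copiesAlwaysDup(ulong n)
{
    ulong copies = 0;
    foreach (len; 0 .. n)
        copies += len;
    return copies;
}

// If ~= may extend in place and a full buffer doubles (assumed policy),
// old elements move only when the buffer is full: 1 + 2 + 4 + ... < 2n.
ulong copiesDoubling(ulong n)
{
    ulong copies = 0, cap = 0;
    foreach (len; 0 .. n)
    {
        if (len == cap)
        {
            copies += len;                // move the old contents
            cap = cap == 0 ? 1 : cap * 2; // grow the buffer
        }
    }
    return copies;
}

unittest
{
    assert(copiesAlwaysDup(1000) == 499_500);
    assert(copiesDoubling(1000) == 1023);
}
```

For n = 1000 that is 499,500 element moves when every append dups, versus 1,023 when the array is allowed to grow in place.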