[Issue 2934] New: "".dup does not return empty string
May 04, 2009
http://d.puremagic.com/issues/show_bug.cgi?id=2934

           Summary: "".dup does not return empty string
           Product: D
           Version: unspecified
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: DMD
        AssignedTo: bugzilla@digitalmars.com
        ReportedBy: qian.xu@funkwerk-itk.com


The following code will throw an exception:
  char[] s;
  assert( s.dup  is null); // OK
  assert("".dup !is null); // FAILED

"".dup is expected to be an empty string as well.

Confirmed with dmd v1 and gdc.


-- 

May 04, 2009
http://d.puremagic.com/issues/show_bug.cgi?id=2934





------- Comment #1 from qian.xu@funkwerk-itk.com  2009-05-04 09:25 -------
Sorry. I should have posted the following code:

  char[] s;
  assert(s     is null);
  assert(s.dup is null);

  assert(""     !is null); // OK
  assert("".dup !is null); // FAILED

The last two lines do not behave consistently. Either both should fail or both should pass.


-- 

May 04, 2009
http://d.puremagic.com/issues/show_bug.cgi?id=2934


schveiguy@yahoo.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID




------- Comment #2 from schveiguy@yahoo.com  2009-05-04 12:45 -------
From posts in the newsgroup, I've determined that this bug is invalid:

1. Duplicating an empty array should always return a null array.  Otherwise, you'd have to allocate space to store 0 data bytes in order for the result to be non-null.

2. String literals have a null character implicitly appended to them by the compiler.  This is done to ease calling C functions.  So a string literal's pointer cannot be null, since it has to point to a static zero byte.

The spec specifically addresses item 2 here: http://www.digitalmars.com/d/1.0/arrays.html#strings

See the section describing "C's printf and Strings".

I could not find a reference for item 1, but I remember reading something about it.  Regardless of whether it is identified specifically in the spec or not, it is not a bug, as the alternative would be to allocate blocks for 0-sized arrays.
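
To illustrate item 2, here is a minimal sketch, assuming a D2 compiler (where literals are typed `string`; in D1 they would be `char[]`). The variable names are only for illustration. It checks that an empty literal still has storage behind it, while a null array does not:

```d
void main()
{
    string lit = "";              // empty string literal
    char[] none = null;           // null array: null pointer, zero length

    assert(lit.length == 0);      // both are empty...
    assert(none.length == 0);

    assert(lit.ptr !is null);     // ...but the literal still has an address,
    assert(*lit.ptr == '\0');     // which holds the implicit terminator
    assert(none.ptr is null);

    assert(lit !is null);         // `is` compares pointer and length
}
```

Both arrays have length 0; only the pointer differs, and that is all that `is null` can observe.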


-- 

May 04, 2009
On Mon, 4 May 2009 17:44:56 +0000 (UTC), d-bugmail@puremagic.com wrote:

> http://d.puremagic.com/issues/show_bug.cgi?id=2934
> 
> schveiguy@yahoo.com changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|NEW                         |RESOLVED
>          Resolution|                            |INVALID
> 
> ------- Comment #2 from schveiguy@yahoo.com  2009-05-04 12:45 -------
> From posts in the newsgroup, I've determined that this bug is invalid:
> 
> 1. Duplicating an empty array should always return a null array.  Otherwise, you'd have to allocate space to store 0 data bytes in order for the result to be non-null.
> 
> 2. String literals have a null character implicitly appended to them by the compiler.  This is done to ease calling C functions.  So a string literal's pointer cannot be null, since it has to point to a static zero byte.
> 
> The spec specifically addresses item 2 here: http://www.digitalmars.com/d/1.0/arrays.html#strings
> 
> See the section describing "C's printf and Strings".
> 
> I could not find a reference for item 1, but I remember reading something about it.  Regardless of whether it is identified specifically in the spec or not, it is not a bug, as the alternative would be to allocate blocks for 0-sized arrays.

Huh??? Duplicating something should give one a duplicate.

I do not think that this is an invalid bug.

Ok, so duplicating an empty array causes memory to be allocated - so what! I asked for a duplicate so give me a duplicate, please.

To me, the "no surprise" path is simple. Duplicating an empty array should return an empty array. Duplicating a null array should return a null array.

Is that not intuitive?

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
May 04, 2009
On Mon, 04 May 2009 16:56:49 -0400, Derek Parnell <derek@psych.ward> wrote:

> On Mon, 4 May 2009 17:44:56 +0000 (UTC), d-bugmail@puremagic.com wrote:
>
>> http://d.puremagic.com/issues/show_bug.cgi?id=2934
>>
>> schveiguy@yahoo.com changed:
>>
>>            What    |Removed                     |Added
>> ----------------------------------------------------------------------------
>>              Status|NEW                         |RESOLVED
>>          Resolution|                            |INVALID
>>
>> ------- Comment #2 from schveiguy@yahoo.com  2009-05-04 12:45 -------
>> From posts in the newsgroup, I've determined that this bug is invalid:
>>
>> 1. Duplicating an empty array should always return a null array.  Otherwise,
>> you'd have to allocate space to store 0 data bytes in order for the result to
>> be non-null.
>>
>> 2. String literals have a null character implicitly appended to them by the
>> compiler.  This is done to ease calling C functions.  So a string literal's
>> pointer cannot be null, since it has to point to a static zero byte.
>>
>> The spec specifically addresses item 2 here:
>> http://www.digitalmars.com/d/1.0/arrays.html#strings
>>
>> See the section describing "C's printf and Strings".
>>
>> I could not find a reference for item 1, but I remember reading something about
>> it.  Regardless of whether it is identified specifically in the spec or not, it
>> is not a bug, as the alternative would be to allocate blocks for 0-sized arrays.
>
> Huh??? Duplicating something should give one a duplicate.
>
> I do not think that this is an invalid bug.
>
> Ok, so duplicating an empty array causes memory to be allocated - so what!
> I asked for a duplicate so give me a duplicate, please.
>
> To me, the "no surprise" path is simple. Duplicating an empty array should
> return an empty array. Duplicating a null array should return a null array.
>
> Is that not intuitive?
>

what's not intuitive is comparing an array (which is a struct) to null.

char[] arr1 = "";
char[] arr2 = null;

assert(arr1 == arr2); // OK
assert(arr1 == null); // FAIL

I'd say that comparing an array to null should always succeed if the array is empty, but I guess some people may use the fact that the pointer is not null in an empty array.  I definitely don't want the runtime to allocate blocks of data when requested to allocate 0 bytes.

In any case, this bug is not valid, because the compiler acts as specified by the spec.

As far as I can remember, I never compare arrays to null; I always check the length instead, which is consistent for both null and empty arrays.
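
A minimal sketch of that habit, assuming a D2 compiler; `isEmpty` is a hypothetical helper name, not something from the runtime:

```d
/// True when the array holds no elements, whether it is null or merely empty.
bool isEmpty(T)(const T[] arr)
{
    return arr.length == 0;
}

void main()
{
    char[] nullArr = null;
    char[] backing = ['a', 'b'];
    char[] emptySlice = backing[0 .. 0];  // zero length, non-null pointer

    assert(nullArr is null);
    assert(emptySlice !is null);          // identity tells the two apart
    assert(nullArr == emptySlice);        // == compares contents only

    // The length check gives the same answer for both.
    assert(isEmpty(nullArr) && isEmpty(emptySlice));
}
```

Code written against the length never needs to know whether an empty result came back with a null pointer or not.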

-Steve
May 05, 2009
On Mon, 04 May 2009 17:16:45 -0400, Steven Schveighoffer wrote:


> what's not intuitive is comparing an array (which is a struct) to null.

Hmmm ... interesting. I regard the array not as a struct but as a concept
implemented in D as a struct.

> char[] arr1 = "";
> char[] arr2 = null;
> 
> assert(arr1 == arr2); // OK
> assert(arr1 == null); // FAIL
> 
> I'd say that comparing an array to null should always succeed if the array is empty, but I guess some people may use the fact that the pointer is not null in an empty array.

Yes, some people rely on the distinction.

However, I think that this ought to be the case ...

 char[] arr1 = "";
 char[] arr2 = null;

 assert(arr1 == arr2); // FAIL
 assert(arr1 == null); // FAIL

 assert(arr2 == ""); // FAIL
 assert(arr2 == arr1); // FAIL

 assert(null == ""); // FAIL

This is simply because an empty array is one with an allocation and a null array is one without, so they are not the same thing. So the '==' equality test should tell the coder that there are two different beasties at play here.

I know that there is an "efficiency" aspect to this.

A "proper" test IMO is that an array is null if arr.ptr == null and arr.length == 0, but I suspect that will be evil to the speed aficionados.


>  I definitely don't want the runtime to allocate
> blocks of data when requested to allocate 0 bytes.

Then don't allocate zero bytes.

> In any case, this bug is not valid, because the compiler acts as specified by the spec.

I'm having trouble locating the specification for this.

> As far as I can remember, I never compare arrays to null; I always check the length instead, which is consistent for both null and empty arrays.

I do the same as you.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
May 05, 2009
http://d.puremagic.com/issues/show_bug.cgi?id=2934


qian.xu@funkwerk-itk.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|INVALID                     |




------- Comment #3 from qian.xu@funkwerk-itk.com  2009-05-05 02:46 -------
> 2. String literals have a null character implicitly appended to them by the compiler.  This is done to ease calling C functions.  So a string literal's pointer cannot be null, since it has to point to a static zero byte.

I fully agree with you. But before ".dup" is applied, a string variable has three possible states (null, empty, or non-empty). After ".dup" is applied to an empty string, these may be reduced to two. This will break existing code if defensive copies of strings are made.

An example is as follows:

  class test {
    private char[] val;
    char[] getVal() {
      return val.dup; // make a defensive copy to avoid unexpected change from outside
    }
    void setVal(char[] val) {
      this.val = val.dup;
    }
  }

  myTestObj.setVal("");
  char[] s = myTestObj.getVal;
  if (s is null) {
    // do task 1
  }
  else if (s == "") {
    // do task 2
  }
  else {
    // do task 3
  }

In this case, task 2 is expected to be performed. However, task 1 is performed instead.


> Regardless of whether it is identified specifically in the spec or not, it is not a bug, as the alternative would be to allocate blocks for 0-sized arrays.

Do you mean that this is a feature request? I would regard the inconsistent behavior of dup as a defect.


-- 

May 05, 2009
On Mon, 04 May 2009 20:02:01 -0400, Derek Parnell <derek@psych.ward> wrote:

> On Mon, 04 May 2009 17:16:45 -0400, Steven Schveighoffer wrote:
>
>
>> what's not intuitive is comparing an array (which is a struct) to null.
>
> Hmmm ... interesting. I regard the array not as a struct but as a concept
> implemented in D as a struct.

Yes, but null is a pointer.  Can I make just any struct with a pointer, and expect to be able to compare it to null (and have it direct that comparison to the pointer)?

The distinction that an array is a struct and not a pointer or reference is one of the frequent causes of newbie frustration, because they just don't get it at first.  I know of no other language that implements arrays like this (where the length is local, but the data is shared).

It's also one of the gems of D if you learn to use it correctly.

>
>> char[] arr1 = "";
>> char[] arr2 = null;
>>
>> assert(arr1 == arr2); // OK
>> assert(arr1 == null); // FAIL
>>
>> I'd say that comparing an array to null should always succeed if the array
>> is empty, but I guess some people may use the fact that the pointer is not
>> null in an empty array.
>
> Yes, some people rely on the distinction.
>
> However, I think that this ought to be the case ...
>
>  char[] arr1 = "";
>  char[] arr2 = null;
>  assert(arr1 == arr2); // FAIL
>  assert(arr1 == null); // FAIL
>
>  assert(arr2 == ""); // FAIL
>  assert(arr2 == arr1); // FAIL
>
>  assert(null == ""); // FAIL
>
> This is simply because an empty array is one with an allocation and a null
> array is one without, so they are not the same thing. So the '==' equality
> test should tell the coder that there are two different beasties at play here.

I would also be fine with this, as it would discourage comparing to null.  I'd also be fine with comparing an array to null being a syntax error.  You can always do arr.ptr == null.

> I know that there is an "efficiency" aspect to this.
>
> A "proper" test IMO is that an array is null if arr.ptr == null and
> arr.length == 0, but I suspect that will be evil to the speed aficionados.

Such an array is an anomaly, and shouldn't ever occur, unless someone forces it by setting the ptr specifically.  I don't think it's worth the extra code to cover this very rare possibility.

>>  I definitely don't want the runtime to allocate
>> blocks of data when requested to allocate 0 bytes.
>
> Then don't allocate zero bytes.

Sometimes, you don't know whether it's going to be zero bytes or not until runtime.  I don't want to have to check for zero-length arrays everywhere I dup, when the GC does it for me.

>
>> In any case, this bug is not valid, because the compiler acts as specified
>> by the spec.
>
> I'm having trouble locating the specification for this.

As far as the "" being not null, the spec does talk about it (although indirectly) as I cited in the original bug resolution.  As far as returning a null array when allocating zero bytes, there is nothing I could find in the spec, but this means it's up to the implementer.  So the implementation does not violate the spec, and it can be considered desired behavior, not an accident.

I'd be interested to know what Walter had in mind.

-Steve
May 05, 2009
http://d.puremagic.com/issues/show_bug.cgi?id=2934





------- Comment #4 from schveiguy@yahoo.com  2009-05-05 08:48 -------
From that point of view, your request makes a lot more sense.

But there are two counter arguments:

1. Comparing an array to null has limited utility.  I don't think it should be in widespread use, as most of the time you only care whether the array is empty or not.  There may be special cases, but in those cases, you can use arr.ptr == null.  It would have been much better if arr == null never compiled.

2. Duping an empty array has limited defensive utility.  You can just as easily return the array itself.  If it weren't for the horrendous append behavior, it would be a no-brainer:

T[] edup(T)(T[] arr)
{
   return arr.length == 0 ? arr : arr.dup;
}

usage:

return arr.edup();
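
For reference, a self-contained sketch of how edup behaves, assuming a D2 compiler and mutable element types. The test values are made up, and the append caveat mentioned above still applies to the returned slice:

```d
// Copy only when there is something to copy; empty arrays are returned as-is.
T[] edup(T)(T[] arr)
{
    return arr.length == 0 ? arr : arr.dup;
}

void main()
{
    int[] data = [1, 2, 3];
    int[] empty = data[0 .. 0];        // empty but non-null

    int[] copy = edup(data);
    assert(copy == data);              // same contents
    assert(copy.ptr !is data.ptr);     // separate storage

    int[] same = edup(empty);
    assert(same.length == 0);
    assert(same.ptr is empty.ptr);     // no allocation, pointer still non-null
}
```

Because the empty case hands back the caller's own slice, the null/empty distinction survives without asking the GC for a zero-byte block.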

Allocating data for duping an empty array is not an acceptable pessimization. However, I thought of another possible solution:  A dup of an empty, non-null array can return a pointer into the read-only data segment.  This would avoid an allocation when duping an empty array, would not return a null pointer, and would not accidentally overwrite the original array if appending is done.

So a fix can be done.


--