toUTFxx returns null references
February 10, 2005
I do not know if this is a bug or not.

The toUTF32(), toUTF16(), and toUTF8() routines return a null reference if the input parameter is an empty string. I would have thought that they should return an empty string instead. The only exception is when the parameter has the same type as the return value; in that case they return an empty string.

Example code...
<code>
import std.utf;
import std.stdio;

void main()
{
   char[] s = "";
   dchar[] d;

   if (s is null)
    writefln("s is null");
   else
    writefln("s length is %d", s.length);

   d = toUTF32(s);
   if (d is null)
    writefln("d is null");
   else
    writefln("d length is %d", d.length);
}
</code>

-- 
Derek
Melbourne, Australia
10/02/2005 7:28:31 PM
February 10, 2005
Derek Parnell wrote:

> I do not know if this is a bug or not.

Confusing, but I don't really think it's a bug...

(maybe the std routines need to be more similar to
each other, either all return null or all return "",
but both types of return values are OK to use, below:)

> The toUTF32(), toUTF16(), and toUTF8() routines return a null reference if
> the input parameter is an empty string. I would have thought that they
> should return an empty string instead. The only exception is when the
> parameter is the same type as the return value's type, in that case they
> return an empty string.

I believe that in D, the empty string is "equal" to null.

http://www.digitalmars.com/d/cppstrings.html:
>  In D, an empty string is just null:
> 
> 	char[] str;
> 	if (!str)
> 		// string is empty

That works the same with either null or "", and this too:

> import std.stdio;
> 
> void main()
> {
>    char[] s = "";
>    char[] d = null;
>    
>    writefln("s is %snull", s is null ? "" : "not ");
>    writefln("s length is %d", s.length);
> 
>    writefln("d is %snull", d is null ? "" : "not ");
>    writefln("d length is %d", d.length);
> } 

s is not null
s length is 0
d is null
d length is 0

Which means that whether it is "" or null, it'll compare
and work the same for the rest of the code? Unless C is involved,
since s.ptr will point to a '\0', but d.ptr points to null.

But that will work itself out in the toStringz process...
(since D strings have to be zero-terminated for C anyway)
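
A minimal sketch of the toStringz point, assuming a current D2 compiler (the thread itself is about D1, where literals were char[] rather than string). The helper name cLength is made up for illustration. It suggests that a null string and "" look the same to C once they have gone through std.string.toStringz:

```d
import core.stdc.string : strlen;
import std.string : toStringz;

/// Hypothetical helper: length of `s` as a C function would see it.
size_t cLength(const(char)[] s)
{
    // toStringz always hands back a pointer to zero-terminated data
    return strlen(toStringz(s));
}

void main()
{
    string emptyStr = "";
    string nullStr = null;

    assert(cLength(emptyStr) == 0);
    assert(cLength(nullStr) == 0);
    assert(cLength("abc") == 3);
}
```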

--anders
February 10, 2005
On Thu, 10 Feb 2005 09:59:39 +0100, Anders F Björklund wrote:

> Derek Parnell wrote:
> 
>> I do not know if this is a bug or not.
> 
> Confusing, but I don't really think it's a bug...
> 
> (maybe the std routines need to be more similar to
> each other, either all return null or all return "",
> but both types of return values are OK to use, below:)
> 
>> The toUTF32(), toUTF16(), and toUTF8() routines return a null reference if the input parameter is an empty string. I would have thought that they should return an empty string instead. The only exception is when the parameter is the same type as the return value's type, in that case they return an empty string.
> 
> I believe that in D, the empty string is "equal" to null.
> 
> http://www.digitalmars.com/d/cppstrings.html:
>>  In D, an empty string is just null:
>> 
>> 	char[] str;
>> 	if (!str)
>> 		// string is empty
> 
> That works the same with either null or "", and this too:
> 
>> import std.stdio;
>> 
>> void main()
>> {
>>    char[] s = "";
>>    char[] d = null;
>> 
>>    writefln("s is %snull", s is null ? "" : "not ");
>>    writefln("s length is %d", s.length);
>> 
>>    writefln("d is %snull", d is null ? "" : "not ");
>>    writefln("d length is %d", d.length);
>> }
> 
> s is not null
> s length is 0
> d is null
> d length is 0
> 
> Which means that whether it is "" or null, it'll compare
> and work the same for the rest of the code? Unless C is involved,
> since s.ptr will point to a '\0', but d.ptr points to null.
> 
> But that will work itself out in the toStringz process... (since D strings have to be zero-terminated for C anyway)

I discovered this behaviour when I used an 'in' contract in a function ...

  bool foo(dchar[] X, dchar[] Y)
  in {
    assert( ! (X is null) );
    assert( ! (Y is null) );
 }
 body { . . .  }


So what you seem to be saying is that I shouldn't bother checking whether a dynamic array reference is null or not. Instead I can just check the length. However, I was trying to trap the case in which the function was called with an uninitialized array. Calling it with an empty array is ok though.
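
For readers on a current D2 compiler (an assumption; this thread is about D1), the same contract might be sketched as below. The `body` keyword has since been renamed `do`, and the parameters are const(dchar)[] here only so that literals can be passed directly. The checks run only in builds that keep contracts enabled, i.e. not with -release.

```d
/// Sketch of the contract above in newer syntax: reject arrays that were
/// never assigned, but accept empty ones.
bool foo(const(dchar)[] x, const(dchar)[] y)
in (x !is null, "x was never assigned")
in (y !is null, "y was never assigned")
do
{
    return true;
}

void main()
{
    // An empty literal has a non-null ptr, so it passes the contract.
    assert(foo(""d, "123"d));

    // foo(null, "123"d) would trip the first contract in a non-release build.
}
```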

A fuller example in which it tripped me up ...

<code>
import std.utf;
import std.stdio;

bool foo(dchar[] X, dchar[] Y)
in {
    assert( ! (X is null) );
    assert( ! (Y is null) );
}
body {
    return true;
}

bool foo(char[] X, char[] Y)
{
   return foo( toUTF32(X), toUTF32(Y) );
}

bool foo(wchar[] X, wchar[] Y)
{
   return foo( toUTF32(X), toUTF32(Y) );
}

unittest {
   dchar[] a;
   dchar[] b;

   a = "";
   b = "123";
   debug(1) writefln("UT1");
   assert( foo(toUTF32(a), toUTF32(b) ) );

   debug(1) writefln("UT2");
   assert( foo(toUTF16(a), toUTF16(b) ) );

   debug(1) writefln("UT3");
   assert( foo(toUTF8(a),  toUTF8(b) ) );
}

</code>

Compiled with    dmd test -debug -unittest

It fails on Unit Test #2. This was totally unexpected.

-- 
Derek
Melbourne, Australia
February 10, 2005
Derek wrote:

> So what you seem to be saying is that I shouldn't bother checking whether a
> dynamic array reference is null or not. Instead I can just check the
> length. However, I was trying to trap the case in which the function was
> called with an uninitialized array. Calling it with an empty array is ok
> though.

No, I don't think you should bother to distinguish between null and .length == 0.

> bool foo(dchar[] X, dchar[] Y)
>   in {
>     assert( ! (X is null) );
>     assert( ! (Y is null) );
>  }
>  body { 
>      return true;  }

The "recommended" way to write that is:

	assert(X);
	assert(Y);

Since D doesn't have booleans, that is ?
(and since the long form is an eye-sore)


I'm not sure what you are trying to test, but:

int main()
{
  char[] nullstr = null;
  assert(nullstr == "");
  assert("" == nullstr);
  return 0;
}

This test does not fail, and does not segfault...
(like it would have done if nullstr was an Object:)

int main()
{
  Object nullobj = null;
  assert(nullobj == null); // <-- KABOOM
  assert(null == nullobj); // <-- KABOOM
  return 0;
}

This second program *must* be rewritten with "is".
(since using '==' with class objects calls opEquals)
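
A minimal sketch of the "is" rewrite, assuming a current D2 compiler. Widget and isUnset are made-up names for illustration. As far as I know, newer compilers refuse `nullobj == null` at compile time and ask for `is` instead, so the crash described above should no longer get as far as running.

```d
class Widget {}

/// Hypothetical helper: identity test against null, never calls opEquals.
bool isUnset(Object o)
{
    return o is null;
}

void main()
{
    Widget nullobj = null;

    assert(isUnset(nullobj));
    assert(!isUnset(new Widget));
}
```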

Pointers are OK too:

int main()
{
  void* nullptr = null;
  assert(nullptr == null);
  assert(null == nullptr);
  return 0;
}

To be on the safe side, one can use "is" always...
(i.e. with pointers/objects, but *not* with strings
since that only compares the references, like in Java)
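
To illustrate why "is" is the wrong tool for comparing string contents, here is a small sketch assuming a current D2 compiler. The helper names sameContents and sameSlice are invented for this example.

```d
/// Compares the characters, like == on arrays.
bool sameContents(const(char)[] a, const(char)[] b)
{
    return a == b;
}

/// Compares only ptr and length, like "is" on arrays.
bool sameSlice(const(char)[] a, const(char)[] b)
{
    return a is b;
}

void main()
{
    string original = "abc";
    char[] copy = original.dup;   // same characters, different memory

    assert(sameContents(original, copy));
    assert(!sameSlice(original, copy));
    assert(sameSlice(original, original));
}
```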

--anders
February 10, 2005
On Thu, 10 Feb 2005 14:10:47 +0100, Anders F Björklund wrote:

> 
> I'm not sure what you are trying to test, but:

I'm testing for this ...

 void main()
 {
    char[] nullstr;

    assert( ! (nullstr is null) );
 }

Namely, the attempted use of a string that has never had any assignment yet.

But as toUTFxx() returns something that looks like an unassigned string, I can't test for unassigned strings.

I still think that the toUTFxx() functions should return an empty string if an empty string was passed to them.
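
One possible workaround, sketched for a current D2 compiler (an assumption; I have not checked what each Phobos release returns for empty input): wrap the conversion so that a null input stays null and a non-null empty input always comes back non-null. The name toUTF32KeepEmpty is made up for this example and is not part of std.utf.

```d
import std.utf : toUTF32;

/// Hypothetical wrapper: keeps the null / empty distinction of the input.
dstring toUTF32KeepEmpty(string s)
{
    if (s is null)
        return null;              // unassigned in, unassigned out

    auto d = toUTF32(s);

    // If the library handed back null for an empty input, substitute an
    // empty literal, whose ptr is non-null.
    return d is null ? ""d : d;
}

void main()
{
    assert(toUTF32KeepEmpty(null) is null);
    assert(toUTF32KeepEmpty("") !is null);
    assert(toUTF32KeepEmpty("").length == 0);
    assert(toUTF32KeepEmpty("abc") == "abc"d);
}
```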

-- 
Derek
Melbourne, Australia
February 10, 2005
Derek wrote:

>>I'm not sure what you are trying to test, but:
> 
> I'm testing for this ...
>  
>  void main()
>  {
>     char[] nullstr;
> 
>     assert( ! (nullstr is null) );
>  }
> 
> Namely, the attempted use of a string that has never had any assignment
> yet.

There is nothing wrong with using an unassigned string,
since all arrays (including char[]) default to length 0...

You can pass "nullstr" to writefln and friends, just fine.

> But as toUTFxx() returns something that looks like an unassigned
> string, I can't test for unassigned strings. 

If you really, really, want to test for "unassigned" strings - use .ptr:

void main()
{
  char[] s = "";
  char[] d = null;

  assert(s.ptr != null);
  assert(d.ptr == null);
}

This is because the ptr of a string literal will point to a '\0' char.

> I still think that the toUTFxx() functions should return an empty string if
> an empty string was passed to them.

There is *no* difference in D, between null and the empty string.

They both have the length property set to 0, and they're equal.
(not identical, though, so using "is" between them will fail)

--anders
February 10, 2005
On Thu, 10 Feb 2005 15:21:21 +0100, Anders F Björklund wrote:

> Derek wrote:
> 
>>>I'm not sure what you are trying to test, but:
>> 
>> I'm testing for this ...
>> 
>>  void main()
>>  {
>>     char[] nullstr;
>> 
>>     assert( ! (nullstr is null) );
>>  }
>> 
>> Namely, the attempted use of a string that has never had any assignment yet.
> 
> There is nothing wrong with using an unassigned string, since all arrays (including char[]) default to length 0...
> 
> You can pass "nullstr" to writefln and friends, just fine.
> 
>> But as toUTFxx() returns something that looks like an unassigned string, I can't test for unassigned strings.
> 
> If you really, really, want to test for "unassigned" strings - use .ptr:
> 
> void main()
> {
>    char[] s = "";
>    char[] d = null;
> 
>    assert(s.ptr != null);
>    assert(d.ptr == null);
> }
> 
> This is because the ptr of a string literal will point to a '\0' char.
> 
>> I still think that the toUTFxx() functions should return an empty string if an empty string was passed to them.
> 
> There is *no* difference in D, between null and the empty string.
> 
> They both have the length property set to 0, and they're equal. (not identical, though, so using "is" between them will fail)

Yes, I understand the technical aspect of this. However, I was attempting to help the coder trap mistakes, namely the use of unassigned strings. The assumption is that if a coder declares a string and uses it before assigning anything to it, then there might be a logic error in the code. This is slightly different from the use of numbers, as most people expect that numbers are zero upon declaration. But still, it's just a philosophy question really. Walter has decided for us that using unassigned variables is an acceptable practice, whereas pedantic people such as myself think that it might indicate errors in coding.

I will, no doubt, have to adjust to the given situation as it ain't gonna change ;-)

-- 
Derek
Melbourne, Australia
February 10, 2005
On Thu, 10 Feb 2005 15:21:21 +0100, Anders F Björklund <afb@algonet.se> wrote:
> There is *no* difference in D, between null and the empty string.

There is a difference, internally, but D treats them the same. Which is probably what you meant, but I'm just being thorough. :)

A null string has ptr == null, an empty string has ptr == "".

In some instances it is crucial to be able to tell these cases apart:
 1- value does not exist (null)
 2- value is blank       (empty string)

To check for case 1, we can go "if (s is null)"
To check for case 2, we can go "if (s.length == 0)"

E.g. a simple example where it is important:

A user enters data into a text field (A) on a web page and leaves text field (B) blank. The code saves the values of these fields somewhere, e.g. in a database containing 3 settings A, B and C.

The presence of the empty field (B) on the page indicates that any previous value for that setting should be overwritten with the empty value.

The absence of the field (C) indicates that any previous value of the setting should not be overwritten but kept.
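
A rough sketch of that rule, assuming a current D2 compiler; applyUpdate and its parameters are hypothetical names, not from any library. Note that `s.length == 0` on its own is also true for a null string, so the sketch tests `is null` first and treats everything else, including "", as a submitted value.

```d
/// Hypothetical helper: decide the stored value for one setting.
string applyUpdate(string previous, string submitted)
{
    if (submitted is null)
        return previous;   // field absent from the page: keep the old value

    return submitted;      // field present, possibly blank: overwrite
}

void main()
{
    assert(applyUpdate("old", "new") == "new");  // field A: new data
    assert(applyUpdate("old", "") == "");        // field B: left blank
    assert(applyUpdate("old", null) == "old");   // field C: not on the page
}
```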

Regan
February 10, 2005
On Fri, 11 Feb 2005 10:05:06 +1300, Regan Heath wrote:

> On Thu, 10 Feb 2005 15:21:21 +0100, Anders F Björklund <afb@algonet.se> wrote:
>> There is *no* difference in D, between null and the empty string.
> 
> There is a difference, internally, but D treats them the same. Which is probably what you meant, but I'm just being thorough. :)
> 
> A null string has ptr == null, an empty string has ptr == "".
> 
> In some instances it is crucial to be able to tell these cases apart:
>   1- value does not exist (null)
>   2- value is blank       (empty string)

Exactly! Well said.

-- 
Derek
Melbourne, Australia
11/02/2005 9:49:04 AM
February 11, 2005
Derek Parnell wrote:

>>> There is *no* difference in D, between null and the empty string.
>>
>> There is a difference, internally, but D treats them the same. Which is probably what you meant, but I'm just being thorough. :)

More or less, yes. But that's more of an Implementation Quirk™.

The D specification explicitly says:

http://www.digitalmars.com/d/arrays.html
> Array Initialization
> 
>     * Dynamic arrays are initialized to having 0 elements.

http://www.digitalmars.com/d/cppstrings.html
> Checking For Empty Strings
>
>  In D, an empty string is just null:
> 
> 	char[] str;
> 	if (!str)
> 		// string is empty

But in practice, they do differ - in the ptr to the '\0' (for C).
(but both have a length property of 0, though, as mentioned earlier)

And when you copy the char[], this ptr setting follows as well...
This means that there is a way to trace if it has been set to "".


>>A null string has ptr == null, an empty string has ptr == "".
>>
>>In some instances it is crucial to be able to tell these cases apart:
>>  1- value does not exist (null)
>>  2- value is blank       (empty string)
> 
> Exactly! Well said.

But strings in D are not objects or pointers, they are arrays...
And arrays are initialized to have the length zero, in the spec.

Thus, that makes them similar to e.g. an integer that is initialized
with a zero ? You will have to check if they are modified in some
other way. Or just rely on the "string.ptr" value, since that will
work as long as D supports calling C functions with string literals...


But technically, there is no difference in D between "" and null.
Which is probably why the standard library mixes them freely ?

To recap:

""
    .length = 0
    .ptr = &'\0'

null
    .length = 0
    .ptr = null

> void main()
> {
>   char[] emptystr = "";
>   char[] nullstr = null;
> 
>   assert(emptystr == nullstr);
>   assert(!(emptystr is nullstr));
> 
>   assert(emptystr.length == nullstr.length);
>   assert(!(emptystr.ptr is nullstr.ptr));
> }

And the D standard library should probably be "fixed" to return
null for null and "" for "" anyway, even if it's not in the spec ?

Care to write a full unittest for it ? (at least for all of std.utf)
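
As a possible starting point for such a test, here is a sketch that only reports what the conversions do rather than asserting it, since the results may differ between library versions. It assumes a current D2 compiler, and sameNullness is a made-up helper name.

```d
import std.stdio : writefln;
import std.utf : toUTF8, toUTF16, toUTF32;

/// Hypothetical helper: true when input and output are both null
/// or both non-null.
bool sameNullness(A, B)(A input, B output)
{
    return (input is null) == (output is null);
}

void main()
{
    string emptyStr = "";
    string nullStr = null;

    // No expected output is listed here on purpose: it depends on the
    // version of the standard library being probed.
    writefln("toUTF8  keeps \"\":   %s", sameNullness(emptyStr, toUTF8(emptyStr)));
    writefln("toUTF16 keeps \"\":   %s", sameNullness(emptyStr, toUTF16(emptyStr)));
    writefln("toUTF32 keeps \"\":   %s", sameNullness(emptyStr, toUTF32(emptyStr)));
    writefln("toUTF8  keeps null: %s", sameNullness(nullStr, toUTF8(nullStr)));
    writefln("toUTF16 keeps null: %s", sameNullness(nullStr, toUTF16(nullStr)));
    writefln("toUTF32 keeps null: %s", sameNullness(nullStr, toUTF32(nullStr)));
}
```

Once the desired behaviour is agreed on, each writefln line could be turned into an assert inside a unittest block.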

--anders