toUTFxx returns null references (page 2)

February 13, 2005
Re: toUTFxx returns null references
Posted by Regan Heath
in reply to Anders F Björklund
On Fri, 11 Feb 2005 17:54:45 +0100, Anders F Björklund <afb@algonet.se> wrote:
> Derek Parnell wrote:
>
>> There is *no* difference in D, between null and the empty string.
>>>
>>> There is a difference, internally, but D treats them the same. Which is probably what you meant, but I'm just being thorough. :)
>
> More or less, yes. But that's more of an Implementation Quirk™.

Which worries me because I believe there is a real need to tell them apart.

So I ask that this behaviour be specified, or that another way to achieve the same thing be provided.

> The D specification explicitly says:
>
> http://www.digitalmars.com/d/arrays.html
>> Array Initialization
>>      * Dynamic arrays are initialized to having 0 elements.
>
> http://www.digitalmars.com/d/cppstrings.html
>> Checking For Empty Strings
>>
>>  In D, an empty string is just null:
>>  	char[] str;
>> 	if (!str)
>> 		// string is empty
>
> But in practice, they do differ - in the ptr to the '\0' (for C).
> (both have a length property of 0, though, as mentioned earlier)

Sure, exactly what I said.

> And when you copy the char[], this ptr setting follows as well...
> This means that there is a way to trace if it has been set to "".

Yep, I want this behaviour to be specified. (or some other method to achieve what I want)
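
A minimal sketch of the ptr following the copy. It assumes a compiler where string literals are typed string (with the char[] declarations used in this thread the idea is the same), and isNullString is a name made up for illustration, not a library function:

```d
// Hypothetical helper: true when the array's ptr field is null.
bool isNullString(string s)
{
    return s.ptr is null;
}

void main()
{
    string empty = "";
    string copy = empty;            // copies the (ptr, length) pair, not the characters
    assert(copy.ptr is empty.ptr);  // the non-null ptr follows the copy
    assert(!isNullString(copy));

    string none = null;
    string noneCopy = none;
    assert(isNullString(noneCopy)); // the null ptr follows the copy too
}
```

This only shows what the compiler does today; whether it is guaranteed is exactly the open question.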

>>> A null string has ptr == null, an empty string has ptr == "".
>>>
>>> In some instances it is crucial to be able to tell these cases apart:
>>>  1- value does not exist (null)
>>>  2- value is blank       (empty string)
>>  Exactly! Well said.
>
> But strings in D are not objects or pointers, they are arrays...

And arrays appear to be value types containing a 'reference'. As in, arrays themselves cannot be null, but the reference in them can be.

> And arrays are initialized to have the length zero, in the spec.
> Thus, that makes them similar to e.g. an integer that is initialized
> with a zero ?

I agree arrays are value types, as integers are.

For a null string, the length is initialised to 0.

For a "" string the length is initialised to the length of "", which happens to be 0.

For a "abc" string the length is initialised to the length of "abc", which happens to be 3.
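
Spelled out as code, the three cases look roughly like this. It is a sketch that assumes literals are typed string, and it checks only the length and ptr fields discussed in this thread:

```d
void main()
{
    string a;            // default-initialised: length 0, ptr null
    string b = "";       // length of "", which is 0; ptr is non-null
    string c = "abc";    // length of "abc", which is 3

    assert(a.length == 0 && a.ptr is null);
    assert(b.length == 0 && b.ptr !is null);
    assert(c.length == 3 && c.ptr !is null);

    assert(a == b);      // == compares contents, so null and "" are equal
    assert(a !is b);     // is compares ptr and length, so they are not identical
}
```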

> You will have to check if they are modified in some
> other way. Or just rely on the "string.ptr" value, since that will
> work as long as D supports calling C functions with string literals...

In C, strings are pointers, and a pointer can be null or point to a piece of memory which may contain just a \0, so in C there is a way to tell the two cases apart.

In D arrays are value types containing a pointer/reference and a length.

I firmly believe that losing this ability for char[] would be a weakness in D; it would force me and others to resort to other methods to achieve it.

I like the current behaviour; I just want a guarantee that it won't change.
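
Here is the kind of code that relies on it: a sketch of the "does not exist" versus "is blank" cases from earlier in the thread. lookup and the settings table are hypothetical names for illustration, and the code assumes literals are typed string:

```d
// Hypothetical helper, for illustration only.
// Returns null when the key is absent, and the stored value
// (possibly "") when the key is present.
string lookup(string[string] table, string key)
{
    if (auto p = key in table)
        return *p;
    return null;
}

void main()
{
    string[string] settings;
    settings["name"] = "";          // present, but blank

    auto name = lookup(settings, "name");
    auto missing = lookup(settings, "colour");

    assert(name == missing);        // indistinguishable by content
    assert(name !is null);          // 2- value is blank
    assert(missing is null);        // 1- value does not exist
}
```

If null and "" ever became truly identical, lookup would need a second return channel (a bool, an exception, a wrapper type) to carry the same information.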

> But technically, there is no difference in D between "" and null.
> Which is probably why the standard library mixes them freely ?
>
> To recap:
>
> ""
>      .length = 0
>      .ptr = &'\0'
>
> null
>      .length = 0
>      .ptr = null

Yep, like I said.

>> void main()
>> {
>>     char[] emptystr = "";
>>     char[] nullstr = null;
>>     assert(emptystr == nullstr);
>>     assert(!(emptystr is nullstr));
>>     assert(emptystr.length == nullstr.length);
>>     assert(!(emptystr.ptr is nullstr.ptr));
>> }
>
> And the D standard library should probably be "fixed" to return
> null for null and "" for "" anyway, even if it's not in the spec ?

Definitely. I've been saying null and "" can mean different things depending on the context, and you seem to be agreeing, so why are we arguing? :)

> Care to write a full unittest for it ? (at least for all of std.utf)

First we have to decide (on a per-function basis) whether returning null or "" makes sense, or if indeed both make sense (for different reasons, of course), i.e.

null == failed, cannot convert, malformed?
""   == success, result really is ""

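As a rough sketch of what such a contract and its test might look like: asciiOnly below is a made-up converter, not anything in std.utf, and mapping a null input to "" is just one possible choice that would still have to be decided per function. It assumes literals are typed string.

```d
// Hypothetical converter, NOT a std.utf function.
// Assumed contract: null means "could not convert",
// "" means "converted, and the result really is empty".
string asciiOnly(string s)
{
    foreach (char c; s)             // iterate over UTF-8 code units
        if (c >= 0x80)
            return null;            // failed: input has a non-ASCII byte
    return s is null ? "" : s;      // success: never hand back a null ptr
}

void main()
{
    assert(asciiOnly("abc") == "abc");
    assert(asciiOnly("") !is null);           // success, result really is ""
    assert(asciiOnly(null) !is null);         // assumed: null input counts as empty
    assert(asciiOnly("caf\u00E9") is null);   // failed, cannot convert
}
```
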
Regan