February 13, 2005
Re: toUTFxx returns null references
Posted in reply to Anders F Björklund

On Fri, 11 Feb 2005 17:54:45 +0100, Anders F Björklund <afb@algonet.se> wrote:
> Derek Parnell wrote:
>
>> There is *no* difference in D, between null and the empty string.
>>>
>>> There is a difference, internally, but D treats them the same. Which is probably what you meant, but I'm just being thorough. :)
>
> More or less, yes. But that's more of an Implementation Quirk™.

Which worries me because I believe there is a real need to tell them apart. So, I ask that this behaviour be specified, or another method to achieve the same thing be specified.

> The D specification explicitly says:
>
> http://www.digitalmars.com/d/arrays.html
>> Array Initialization
>> * Dynamic arrays are initialized to having 0 elements.
>
> http://www.digitalmars.com/d/cppstrings.html
>> Checking For Empty Strings
>>
>> In D, an empty string is just null:
>>     char[] str;
>>     if (!str)
>>         // string is empty
>
> But in practice, they do differ - in the ptr to the '\0' (for C).
> (but both have a length property of 0, though, as mentioned earlier)

Sure, exactly what I said.

> And when you copy the char[], this ptr setting follows as well...
> This means that there is a way to trace if it has been set to "".

Yep, I want this behaviour to be specified (or some other method to achieve what I want).

>>> A null string has ptr == null, an empty string has ptr == "".
>>>
>>> In some instances it is crucial to be able to tell these cases apart:
>>> 1- value does not exist (null)
>>> 2- value is blank (empty string)
>> Exactly! Well said.
>
> But strings in D are not objects or pointers, they are arrays...

And arrays appear to be value types containing a 'reference'. As in, arrays themselves cannot be null, but the reference in them can be.

> And arrays are initialized to have the length zero, in the spec.
> Thus, that makes them similar to e.g. an integer that is initialized
> with a zero ?

I agree arrays are value types, as integers are.
For a null string, the length is initialised to 0. For a "" string, the length is initialised to the length of "", which happens to be 0. For an "abc" string, the length is initialised to the length of "abc", which happens to be 3.

> You will have to check if they are modified in some
> other way. Or just rely on the "string.ptr" value, since that will
> work as long as D supports calling C functions with string literals...

In C, strings are pointers, and pointers can be null or point to a piece of memory which may contain a \0, so in C there is a way to tell the two cases apart. In D, arrays are value types containing a pointer/reference and a length.

I firmly believe that losing this ability for char[] would become a weakness in D; it would force me and others to resort to other methods to achieve it. I like the current behaviour, I just want to be sure it doesn't change.

> But technically, there is no difference in D between "" and null.
> Which is probably why the standard library mixes them freely ?
>
> To recap:
>
> ""
>   .length = 0
>   .ptr = &'\0'
>
> null
>   .length = 0
>   .ptr = null

Yep, like I said.

>> void main()
>> {
>>     char[] emptystr = "";
>>     char[] nullstr = null;
>>     assert(emptystr == nullstr);
>>     assert(!(emptystr is nullstr));
>>     assert(emptystr.length == nullstr.length);
>>     assert(!(emptystr.ptr is nullstr.ptr));
>> }
>
> And the D standard library should probably be "fixed" to return
> null for null and "" for "" anyway, even if it's not in the spec ?

Definitely. I've been saying null and "" can mean different things depending on the context, you seem to be agreeing, why are we arguing? :)

> Care to write a full unittest for it ? (at least for all of std.utf)

First we have to decide (on a per function basis) whether returning null or "" makes sense, or if indeed both make sense (for different reasons of course), i.e.

null == failed, cannot convert, malformed?
""   == success, result really is ""

Regan
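As an illustration of the distinction argued over in this thread, here is a minimal sketch for a current D2 compiler. Two things in it are assumptions and not part of the original posts: `string` replaces the `char[]` of the quoted 2005 snippet (literals are immutable in D2), and `classify` is a hypothetical helper, not anything from std.utf. It also relies on "" having a non-null ptr, which is exactly the behaviour the thread asks to have specified, so treat that as an assumption too.

```d
// Hypothetical helper: tells "no value" apart from "blank value" by using
// identity (`is`) instead of equality (`==`).
string classify(string s)
{
    if (s is null)
        return "null";       // value does not exist: ptr is null, length 0
    if (s.length == 0)
        return "empty";      // value is blank: ptr is set, length 0
    return "non-empty";
}

void main()
{
    string nullstr = null;
    string emptystr = "";

    // Equality compares contents, so the two look the same.
    assert(emptystr == nullstr);
    assert(emptystr.length == 0 && nullstr.length == 0);

    // Identity looks at ptr and length, so the difference shows up.
    assert(nullstr.ptr is null);
    assert(emptystr.ptr !is null);
    assert(emptystr !is nullstr);

    // Copying the array copies the ptr, so the distinction survives.
    string copy = emptystr;
    assert(copy !is null);

    assert(classify(nullstr) == "null");
    assert(classify(emptystr) == "empty");
    assert(classify("abc") == "non-empty");
}
```

A library function following the "null == failed, "" == success" convention proposed above would have to be checked with `is null` by its callers, since `== ""` and `.length == 0` both treat the two results as identical.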