Empty VS null array? (page 8)

October 28, 2013
Re: Empty VS null array?
Posted by Regan Heath
in reply to Kagamin
I find that I have repeated myself a lot in each section/reply below. I am not sure whether you'd prefer I just reply with those points once, or inline; I chose inline so as to make it clear I was not ignoring your points, and to make it clear which of my arguments apply to which point...

:)

On Fri, 25 Oct 2013 12:41:36 +0100, Kagamin <spam@here.lot> wrote:
> On Monday, 21 October 2013 at 10:33:01 UTC, Regan Heath wrote:
>> null strings are no different to null class references, they're not a special case.
>
> True. That's an implementation detail which has no meaning for business logic.

This argument applies both ways.  If D conflates null and empty, then this restricts business logic with an implementation detail.  We agree that D has no place in defining business logic, therefore it follows that the more flexible option is preferable as it is neutral in its effect on business logic.

However, this decision, like most, is a cost/benefit analysis, and for strings the case can be made that they should be a value type, and never null.  I can get behind such a decision, as it would mean D was taking a side, finally.  If strings cannot be null then we actually benefit from the current conflation of the two, by avoiding having to do null reference checking, and the associated exception/crash.  I would prefer to go the other way and allow a consistent null/empty distinction, but either option is better than the status quo where we have to check for null ("cost") but gain no benefit from this, because we cannot use the null state consistently.

> When implementation deviates from business logic, one ends up fixing the implementation details everywhere in order to implement business logic. That's why string.IsNullOrEmpty is used.

I almost never need to use string.IsNullOrEmpty.  The reason why is simple.  An empty string is just one value a string may hold, and my code does not "generally" treat it as special except in certain specific cases where I make that additional check (your blank username example, for one).  Null is the only "special" state a string reference can have, so I check for this and this alone.

>> People seem to have this odd idea that null is somehow an invalid state for a string /reference/ (c# strings are reference types), it's not.
>
> That's the very problem: null and empty are valid states and must be treated equally as "no data", but they can't for purely technical reasons.

I never treat null and empty "equally as "no data"" that is my whole point.  They are not the same thing conceptually, you should never treat them as the same thing.  null means "no data", empty is just one possible state of "data".

You might make the business logic decision of disallowing empty values, of treating an empty value as if no value was given.  The two would still be conceptually separate, but your code would be making the decision to treat them in the same way.  You encode this decision in the function which accesses the input, once, and your problems are all solved.

If you make the mistake of conflating null and empty in your input layer then you restrict your "business logic" and create the very problem you're complaining about here.  Stop conflating them and the problem simply vanishes.

If your input mechanism or a 3rd party library is conflating them, then you can add a business/conversion layer to convert empty to null and all your code can ignore the empty case and simply concentrate on checking for null, as it should already do - because this is unavoidable in any case.

This is KISS, collapse the 2 possible "error" states into 1 and check for that.
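
A minimal sketch of that conversion layer, assuming a hypothetical helper named `emptyToNull` (my name for it, not part of Phobos) sitting at the boundary with the input mechanism or 3rd party library:

```d
/// Hypothetical boundary helper: collapses "" into null so the rest of
/// the code only ever has to check for one "no data" state.
string emptyToNull(string s)
{
    return s.length == 0 ? null : s;
}

unittest
{
    assert(emptyToNull("") is null);       // empty collapses to null
    assert(emptyToNull(null) is null);     // null stays null
    assert(emptyToNull("regan") == "regan"); // real data passes through
}
```

Everything downstream of this call can then test `is null` and nothing else.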

>> People also seem to elevate empty strings to some sort of special status, that's like saying 0 has some special status for int - it doesn't it's just one of a number of possible values.
>>
>> In fact, int having no null like state is a "problem" causing solutions like boxing to elevate the value type to a reference in order to allow a null state for int.
>
> You want to check ints for null everywhere too?

No. (Strawman).  There are some cases where people wrap int in nullable however as there are some use cases where you do want to be able to indicate "no data" using a single variable.  This is the flexibility of a reference type, and the cost is the check for null.  If you do cost/benefit analysis for int with this in mind it is clearly not a type we want as a reference type - the performance penalty alone kills this.

>> Yet, in D we've decided to inconsistently remove that functionality from string for no gain.  If string could not actually be null then we'd gain something from the limitation, instead we lose functionality and gain nothing - you still have to check your strings for null in D.
>
> Huh? Null slices work just like empty ones - that's why this topic was started in the first place. One doesn't have to check slices for nulls, only for length.

Slices are not strings, as slices cannot be null.  However "if (slice is null)" can still be true - this is just plain wrong/inconsistent.  Let's pick a side and handle it consistently, above all else.  We can argue about which side, but can we at least agree the inconsistency is a bad thing?
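
A small sketch of the inconsistency, as I understand current D behaviour (the `Emptiness` struct and `probe` function are hypothetical names, used only for illustration):

```d
/// The three ways D lets you ask "is there nothing here?".
struct Emptiness
{
    bool isNull;      // s is null
    bool equalsEmpty; // s == ""
    bool zeroLength;  // s.length == 0
}

Emptiness probe(string s)
{
    return Emptiness(s is null, s == "", s.length == 0);
}

unittest
{
    string none = null;
    string blank = "";
    string data = "abc";

    // Equality and length cannot tell null and empty apart...
    assert(probe(none) == Emptiness(true, true, true));
    // ...but identity can: the "" literal carries a non-null pointer.
    assert(probe(blank) == Emptiness(false, true, true));
    // An empty slice of real data is non-null as well.
    assert(probe(data[0 .. 0]) == Emptiness(false, true, true));
}
```

So the distinction exists if you use `is`, and vanishes if you use `==` or `.length`, which is exactly the middle ground I am objecting to.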

> If you want clear nullable semantics, you have Nullable, it works for everything, including strings and ints. You would want this feature only in rare cases, so it doesn't make sense to make it default, or it will be a nuisance.

Strings can be null, not checking for null is fatal.  You cannot easily tell if you have a string or a slice so you currently have to check for null in most/all cases already.  We're paying that "cost" already and yet not getting the full benefit from it.  It's simply a bad investment.  D should pick a side and conform to it, either we have nullable strings or we don't.  The current middle ground is just worse.

>>> both of them are just "no data", so you end up typing if(string.IsNullOrEmpty(mystr)) every time everywhere.
>>
>> I only have to code like this when I use 3rd party code which has conflated empty and null.  In my code when it's null it means not specified, and empty is just one type of value - for which I do no special handling.
>
> Equivalence between null and empty is a business logic's requirement, that's why it's done.

Whose business logic?  This is perhaps my secondary point here.  D has no grounds to define business logic for all possible applications, this is something each application must have the flexibility to define for itself.  A library ought to provide the tools to do it - converting "" to null for you - but the language should not mandate it.

>>> And, yeah, only one small feature in this big mess ever needs to differentiate between null and empty.
>>
>> Untrue, null allows many alternate and IMO more direct/obvious designs.
>
> The need for those designs is rare and trivially implementable for all value types.

Rare; untrue, I use null all the time to good effect.  Trivially implementable, debatable - if you have to do more work you're paying a price, if you get no reward for that price then you're wasting resources.  The current situation in D has you paying the price for no reward.

>>> I found this one case trivially implementable, but nulls still plague all remaining code.
>>
>> Which one case?  The readline() one below?
>
> No, it was an authentication system in third-party code for one special case.

No-one is trying to say you cannot code around it, even trivially in some cases, but the null design would likely have been simpler still.  That would have meant less wasted effort; worse still, the workaround gained you nothing.

> I also had to specify this null value in app.config - guess how, explicitly specify, not substitute missing parameter with a default.

Seems to me that if you want a config to be null, you simply omit it from the configuration file.  Then have the code return null for its value, to indicate "no data".  If it's present, and set to "" then you would be able to differentiate these two cases, which is essential if your business logic requires that "" is a valid value for the config.  D should not place restrictions on your business logic - with an implementation detail.

> Another possibility for readline is to return a tuple
> {bool eof, string line(non-null)} - this way you have easy check for eof and don't have to check for null when you don't need it.

Yet another more complex design, for no gain.  The additional boolean buys us nothing over the string reference, it costs more in terms of memory and complexity and you still have to remember to check it, as you have to remember to check for null in the original design.

>>> you're screwed or your code becomes littered with null checks, but who accounts for all alternative scenarios from the start?
>>
>> Me, and IMO any competent programmer.  It is misguided to think you can ignore valid states, null is a valid state in C, C++, C#, and D.. You should be thinking about and handling it.
>
> Here null is a valid state for readline, not for the caller: if the caller parses a multiline data format, unexpected end of file is an invalid state.

If they pass a multi-line data format, and they have counted the number of lines prior to passing it (to verify that they can call readline() N times safely) then yes, calling readline and getting EOF would be unexpected and worthy of an exception.

But, why would you want to pay the cost of processing the lines twice (to count them and ensure no EOF)?  Why not just have readline do that for you, by returning null on EOF.  Simpler, more direct.
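
Roughly what I have in mind, sketched over an in-memory list of lines. `LineReader` and `readLine` here are hypothetical, not the `std.stdio` API, and the sketch relies on `is null` distinguishing null from "" as D currently does:

```d
/// Hypothetical reader: null means EOF, "" means a blank line.
struct LineReader
{
    string[] lines;

    string readLine()
    {
        if (lines.length == 0)
            return null; // end of input, "no data"

        string line = lines[0];
        lines = lines[1 .. $];
        // Make sure a blank line is never mistaken for EOF.
        return line is null ? "" : line;
    }
}

unittest
{
    auto r = LineReader(["first", "", "last"]);
    size_t count;

    // One loop, one check; the blank line in the middle does not end it.
    for (string line = r.readLine(); line !is null; line = r.readLine())
        ++count;

    assert(count == 3);
}
```

No pre-counting pass and no extra boolean; the caller that considers EOF an error simply throws when it sees null.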

> And what do you gain by littering your code with those null checks? Just making runtime happy and adding noise to the code? You could use that time to improve the code or add features or even relax. It's exactly nullable strings, which gain you only a time waste.

In D, you already have to "litter your code with null checks" so you're already paying the cost, you're just not getting any benefit.

>> You don't have to check for it on every access to the variable, but you do need to check for it once where the variable is assigned, or passed (in private functions you can skip this).  From that point onward you can assume non-null, valid, job done.
>
> You just said "never assume". The assumption may fail, because the string type is still nullable, compiler doesn't save you here, this sucks. And in order to check for everything everywhere on a level near that of the compiler, you must be not just competent, but perfect.

Play on words.  If you've filtered out null, you're not "assuming" you're "ensuring" it's non-null.  The only way to get null from that point is either "by design" or via memory corruption.  D does protect you from memory corruption by avoiding the need for raw pointers etc.  And, if you're setting string variables to null "by design" then you will need to check them again, of course.

Yes, if you want to write good code you need to develop good habits WRT using null, it's unavoidable.  Unless we remove null and the power/flexibility it affords - which is a valid option.  So, can we just pick an option for D and go with it, I don't really mind which way we go - tho my preference should be obvious :)

>>> I believe there's no problem domain, which would like to differentiate between null and empty string instead of treating them as "no data".
>>
>> null means not specified, non existent, was not there.
>> empty means, present but set to empty/blank.
>>
>> Databases have this distinction for a reason.
>
> Oracle makes no distinction between null and empty string. For a reason?

Looks like it was (ultimately) a mistake:
http://docs.oracle.com/cd/B19306_01/server.102/b14200/sql_elements005.htm

<quote>Note:
Oracle Database currently treats a character value with a length of zero as null. However, this may not continue to be true in future releases, and Oracle recommends that you do not treat empty strings the same as nulls.</quote>

To repeat the important part.. "Oracle recommends that you do not treat empty strings the same as nulls".

For. A. Reason.  The database has no right to define business logic - this restriction in Oracle Database has no doubt forced people to work around it, by using a specific "value" as null.

> A database is an implementation detail of a data storage, it doesn't implement business logic

Agree 100%.  Conflating null and empty string is a business logic decision, it has no place in a database or other base level - like a language or standard library.

>> If you get input from a user a field called "foo" may be:
>>  - not specified
>>  - specified
>>
>> and if specified, may be:
>>  - empty
>>  - not empty
>
> If the user doesn't fill a text box, it's both empty and not specified - there's just no difference.

There is a clear and important difference.  Let's say the text box represents the user's middle name, let's presume they have given a value for it at some stage, and let's assume they would like to remove it.  They load the page, erase the value and click submit.  Your business logic will ignore the empty value, and not update the user's middle name.  My business logic will detect the text box was present (not null) and apply the given value "" to the user's middle name (in the database for example).
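
As a sketch, assuming a hypothetical `applyUpdate` function where null means "the field was not submitted" and anything else, including "", is a value to store:

```d
/// Hypothetical update rule for a single stored field.
/// null      -> field absent from the request, keep what we have
/// otherwise -> overwrite, even with ""
string applyUpdate(string stored, string submitted)
{
    return submitted is null ? stored : submitted;
}

unittest
{
    // Field missing from the form: middle name is left alone.
    assert(applyUpdate("Lee", null) == "Lee");

    // Field present but erased: middle name is cleared.
    string cleared = applyUpdate("Lee", "");
    assert(cleared.length == 0);
    assert(cleared !is null);
}
```

If the input layer has already conflated the two, the second case can no longer be expressed.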

> And it doesn't matter how you store it in the database - as null or as empty string - both are presented in the same way.

They don't have to be, that is my point.  The decision of how to display them is a business logic decision and having a clear distinction between null and empty allows you to display them differently.  Not having the distinction, ties your hands.

> Heck, we use these optional text boxes everywhere - can you tell if their content is empty or not specified?

HTTP is one such input mechanism which conflates null and empty, and there are numerous ways to code around it.  D is making the same mistake, with the same consequences; this is my central point.

> And what if the value is required? Would you accept an empty value?

This is a business logic decision, which D, and the database have no right to make.  Yes, if the user could input an empty value and yes if my business logic wanted to detect and disallow it - I would.  If not, I would not.  The point is that null gives you the power to express both, rather than restricting you and forcing an indirect solution to code around the lack.

>> If we have null, lets use it, if we want to remove null the lets remove it, but can we get out of this horrid middle ground please.
>
> *sigh* people just don't buy the KISS principle...

No kidding.  From my perspective null /is/ KISS and having to code around the lack with a more complex design is not.  :P

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/