September 27, 2009
Steven Schveighoffer:

>    Build the non-null requirement into the function signature (note, the
> requirement is optional, it's still possible to use null references if you
> want).
> 
>    Pros: Easy to implement, Compile-time error, hard to "work around" by
> putting a dummy value, sometimes no performance hit, most times very
> little performance hit, allows solution 1 and 2 if you want, runtime
> errors occur AT THE POINT things went wrong not later.
>    Cons: Non-zero performance hit (you have to check for null sometimes
> before assignment!)

To implement it well (and I think it has to be implemented well) it's not so easy. You have to face the problem I've discussed about multiple object initializations inside various ifs.
Also see what downs and I have said regarding arrays of nonnullables.
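
To make the "initializations inside various ifs" problem concrete, here is a minimal sketch in current D, where every class reference is still nullable. Shape, Circle, Square and the two pick functions are made-up names for illustration only. The first function is the pattern that would need flow analysis under non-nullable references; the second sidesteps it with a single initializing expression.

```d
import std.stdio : writeln;

// Hypothetical classes, for illustration only.
class Shape { string name() { return "shape"; } }
class Circle : Shape { override string name() { return "circle"; } }
class Square : Shape { override string name() { return "square"; } }

// Pattern 1: declare first, assign in each branch.
// With non-nullable references the compiler would have to prove that
// `s` is assigned on every path before it is used.
Shape pickWithBranches(bool round)
{
    Shape s;                 // null at this point in current D
    if (round)
        s = new Circle;
    else
        s = new Square;
    return s;
}

// Pattern 2: one initializing expression, so `s` never exists
// without a value and no flow analysis is required.
Shape pickWithExpression(bool round)
{
    Shape s = round ? cast(Shape) new Circle : new Square;
    return s;
}

void main()
{
    writeln(pickWithBranches(true).name());
    writeln(pickWithExpression(false).name());
}
```

Pattern 2 stops being practical once several objects have to be initialized together in the same branch, which is the case I am worried about.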

Among the cons you also have to consider that there's a little more complexity in the language (two different kinds of references, and such things must also be explained in the docs and understood by novice D programmers. It's not a common feature, so they have to learn it).

Another thing to add to the cons is that every layer of compile-time constraints you add to a language also adds a little rigidity that has a cost (because you have to add ? and you may sometimes need casts to break such rigidity). Dynamic languages show that constraints have a cost.

Bye,
bearophile
September 27, 2009
Walter Bright wrote:
> Jeremie Pelletier wrote:
>> This may be a good time to ask about how these variables which can be declared anywhere in the function scope are implemented.
>>
>> void bar(bool foo) {
>>     if(foo) {
>>         int a = 1;
>>         ...
>>     }
>>     else {
>>         int a = 2;
>>         ...
>>     }
>>
>> }
>>
>> is the stack frame using two ints, or is the compiler seeing only one? I never bothered to check it out and just declared 'int a = void;' at the beginning of the routine to keep the stack frames as small as possible.
> 
> They are completely independent variables. One may get assigned to a register, and not the other.

Ok, that's what I thought, so the good old C way of declaring variables at the top is not a bad thing yet :)
September 27, 2009
downs wrote:
> Jeremie Pelletier wrote:
>> Christopher Wright wrote:
>>> Jeremie Pelletier wrote:
>>>> What if using 'Object obj;' raises a warning "uninitialized variable"
>>>> and makes everyone wanting non-null references happy, and 'Object obj
>>>> = null;' raises no warning and makes everyone wanting to keep the
>>>> current system (all two of us!) happy.
>>>>
>>>> I believe it's a fair compromise.
>>> It's a large improvement, but only for local variables. If your
>>> segfault has to do with a local variable, unless your function is
>>> monstrously large, it should be easy to fix, without changing the type
>>> system.
>>>
>>> The larger use case is when you have an aggregate member that cannot
>>> be null. This can be solved via contracts, but they are tedious to
>>> write and ubiquitous.
>> But how would you enforce a nonnull type over an aggregate in the first
>> place? If you can, you could also apply the same initializer semantics I
>> suggested earlier.
>>
>> Look at this for example:
>>
>> struct A {
>>     Object cannotBeNull;
>> }
>>
>> void main() {
>>     A* a = new A;
>> }
>>
>> Memory gets initialized to zero, and you have a broken non-null type.
>> You could have the compiler throw an error here, but the compiler cannot
>> possibly know about all data creation methods such as malloc, calloc or
>> any other external allocator.
>>
>> You could even do something like:
>>
>> Object* foo = calloc(Object.sizeof);
>>
>> and the compiler would let you dereference foo resulting in yet another
>> broken nonnull variable.
>>
>> Non-nulls are a cute idea when you have a type system that is much
>> stricter than D's, but there are just way too many workarounds to make
>> it crash in D.
> 
> "Here are some cases you haven't mentioned yet. This proves that the compiler can't possibly be smart enough. "
> 
> Yeeeeeah.

I allocate most structs on the gc, unless I need them only for the scope of a function (that includes RVO). All objects are on the gc already, so it's a pretty major case. The argument was to protect aggregate fields; I'm just pointing out that the way they are usually used prevents an easy implementation. I'm not saying it's impossible.

Besides, what I said was: if it's possible to enforce that these fields are null/non-null, you can also enforce that they are properly initialized in such cases, making nulls/non-nulls nearly useless.

> In the above case, why not implicitly put the cannotBeNull check into the struct invariant? That's where it belongs, imho.

Exactly, what's the need for null/non-null types then?
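
For reference, a minimal sketch of what such an invariant could look like in current D. The constructor and the describe method are my own additions for illustration; only the struct and its field come from the example above. Note that the check is a runtime one and, as far as I know, only fires when a public member function is called or a constructor finishes, and only in builds that keep contracts (not -release).

```d
import core.exception : AssertError;
import std.exception : assertThrown;

struct A
{
    Object cannotBeNull;

    // Hypothetical constructor, added for the example.
    this(Object o) { cannotBeNull = o; }

    // Runs at the end of the constructor and on entry to and exit from
    // public member functions, in builds where contracts are enabled.
    invariant()
    {
        assert(cannotBeNull !is null, "cannotBeNull is null");
    }

    // Hypothetical public method; calling it triggers the invariant.
    string describe() { return cannotBeNull.toString(); }
}

void main()
{
    auto ok = A(new Object);
    assert(ok.describe().length > 0);

    // Zero-initialized: the field is null, but nothing complains yet.
    A* broken = new A;

    // The problem only surfaces on the first member function call.
    assertThrown!AssertError(broken.describe());
}
```

So the invariant does catch the broken value, but only at run time and only at the first method call, not at the point where the struct was created.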

> Regarding your example, it's calloc(size_t.sizeof). And a) we probably can't catch that case except with in/out null checks on every method, but then again, how often have you done that? I don't think it's relevant enough to be relevant to this thread. :)

Actually, sizeof currently returns the size of the reference, so it's always going to be the same as size_t.sizeof.
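
A small sketch that checks this at compile time in current D. Payload is a made-up class, and the classInstanceSize trait is what I understand current compilers to offer for the size of the instance itself; I am not claiming it existed in this form at the time of the thread.

```d
// Hypothetical class with some fields, for illustration only.
class Payload
{
    long a;
    long b;
    long c;
}

// .sizeof on a class type is the size of the reference (a pointer),
// no matter how many fields the class has.
static assert(Payload.sizeof == (void*).sizeof);
static assert(Object.sizeof == size_t.sizeof);

// The size of the instance itself is available through a trait and
// must at least cover the three long fields.
static assert(__traits(classInstanceSize, Payload) >= 3 * long.sizeof);

void main() {}
```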
September 27, 2009
On 27/09/2009 17:51, bearophile wrote:
> Steven Schveighoffer:
>
>> Build the non-null requirement into the function signature (note,
>> the requirement is optional, it's still possible to use null
>> references if you want).
>>
>> Pros: Easy to implement, Compile-time error, hard to "work around"
>> by putting a dummy value, sometimes no performance hit, most times
>> very little performance hit, allows solution 1 and 2 if you want,
>> runtime errors occur AT THE POINT things went wrong not later.
>> Cons: Non-zero performance hit (you have to check for null
>> sometimes before assignment!)
>
> To implement it well (and I think it has to be implemented well) it's
> not so easy to implement. You have to face the problem I've discussed
> about multiple object initializations inside various ifs. Also
> see what downs and I have said regarding arrays of nonnullables.

I don't accept this argument about nested if statements. D has a procedural "if" statement. Of course it doesn't mesh with non-nullable references; you're trying to fit a square peg in a round hole.
The solution is to write code in a more functional style. If D ever implements true tuples, that would be a perfect use case for them.

(T1 t1, T2 t2) = init();
t1.foo;
t2.bar;
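
The syntax above is hypothetical. A rough approximation with the library Tuple from std.typecons might look like the sketch below; T1, T2 and makeBoth are stand-in names (I avoided calling the function init because .init already means something on every D type), and the unpacking has to be done by hand.

```d
import std.typecons : Tuple, tuple;

// Hypothetical stand-ins for T1 and T2.
class T1
{
    int n;
    this(int n) { this.n = n; }
    int foo() { return n; }
}

class T2
{
    string s;
    this(string s) { this.s = s; }
    string bar() { return s; }
}

// All branching happens inside the function; the caller only ever
// sees two fully constructed objects.
Tuple!(T1, T2) makeBoth(bool flag)
{
    if (flag)
        return tuple(new T1(1), new T2("one"));
    else
        return tuple(new T1(2), new T2("two"));
}

void main()
{
    auto both = makeBoth(true);
    T1 t1 = both[0];   // manual unpacking, no declaration syntax for it
    T2 t2 = both[1];
    assert(t1.foo() == 1);
    assert(t2.bar() == "one");
}
```

Neither t1 nor t2 is ever declared without a value here, so the nested-if problem does not arise at the call site.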

>
> Among the cons you also have to consider that there's a little more
> complexity in the language (two different kinds of references, and
> such things must also be explained in the docs and understood by
> novice D programmers. It's not a common feature, so they have to
> learn it).

That's true. Not only does this need to be taught and pointed out to newbies, it should also be encouraged as the D way so that it will be used by default.
>
> Another thing to add to the cons is that every layer of compile-time
> constraints you add to a language also adds a little amount of
> rigidity that has a cost (because you have to add ? and you sometimes
> may need casts to break such rigidity). Dynamic languages show that
> constraints have a cost.
>
> Bye, bearophile

September 27, 2009
language_fan wrote:
> Sun, 27 Sep 2009 00:08:50 -0400, Jeremie Pelletier thusly wrote:
> 
>> Ary Borenszweig wrote:
>>> Just out of curiosity: have you ever programmed in Java or C#?
>> Nope, never got interested in these to tell the truth. I only did C,
>> C++, D and x86 assembly in systems programming, I have quite a
>> background in PHP and JavaScript also.
> 
> So you only know imperative procedural programming + some features of hybrid OOP languages that are not even proper OOP languages.

This is what I know best, yeah. I did a lot of work in functional programming too, but not enough to add them to the above list.

What is proper OOP anyways? It's a feature offered by the language, not a critical design that must obey some strict standard rules. Be it class based or prototype based, supporting single or multiple inheritance, using abstract base classes or interfaces, having funny syntax for ctors and whatnot or using the class name or even 'this', it's still OOP. If you want to call me on not knowing 15 languages like you do, I have to call you on not knowing the differences in OOP models.

>> I played with a lot of languages, but those are the ones I use on a
>> daily basis. I would like to get into Python or Ruby someday, I only
>> hear good things about these two. I know Lua has less overhead than
>> Python
> 
> Oh, the only difference between Lua and Python is the overhead?! That's a... pretty performance oriented view on languages.

Yes, I have a performance oriented view, I write a lot of real time code, and I hate unresponsive code in general. Now I didn't say it was the only difference, what I said is that it's one that influences a lot of companies and people to pick Lua over Python for scripting.

>> I like extremes :)
> 
> If you like extremes, why have you not programmed in Haskell or Coq? Too scary? You are often arguing against languages and concepts you have never used. The other people here who make these suggestions are more experienced with various languages.

I meant extremes as in full machine control / no control whatsoever, not in language semantics :)

I just haven't found a use for Haskell or Coq for what I do yet.
September 27, 2009
Hello downs,

> PS: You can't convert segfaults into exceptions under Linux, as far as
> I know.
> 

Last I checked, throwing from a signal handler works on Linux.


September 27, 2009
On Sun, 27 Sep 2009 10:10:19 -0400, Nick Sabalausky wrote:

> "Walter Bright" <newshound1@digitalmars.com> wrote in message news:h9n3k5$2eu9$1@digitalmars.com...
>> Jason House wrote:
>>>> Also, by "safe" I presume you mean "memory safe" which means free of memory corruption. Null pointer exceptions are memory safe. A null pointer could be caused by memory corruption, but it cannot *cause* memory corruption.
>>>
>>> I reject this argument too :( To me, code isn't safe if it crashes.
>>
>> Well, we can't discuss this if we cannot agree on terms. The conventional definition of memory safe means no memory corruption.
> 
> He keeps saying "safe", and every time he does you turn it into "memory safe". If he meant "memory safe" he probably would have said something like "memory safe". He already made it perfectly clear he's talking about crashes, so continuing to put the words "memory safe" into his mouth doesn't help the discussion.

The thing is that memory safety is the only safety with code. In Walter's examples he very clearly showed that a crash is not unsafe, but operating with incorrect values is. He has pointed out that if initialization is enforced, whether with a default or by the coder, there is a good chance it will be initialized to the wrong value.

Now if you really want to throw some sticks into the spokes, you would say that if the program crashes due to a null pointer, it is still likely that the programmer will just initialize/set the value to a "default" that still isn't valid just to get the program to continue to run.
September 27, 2009
Sun, 27 Sep 2009 16:47:51 +0000, Jesse Phillips thusly wrote:

> The thing is that memory safety is the only safety with code. In Walter's examples he very clearly showed that a crash is not unsafe, but operating with incorrect values is. He has pointed out that if initialization is enforced, whether with a default or by the coder, there is a good chance it will be initialized to the wrong value.

Have you ever used functional languages? When you develop in Haskell or SML, how often do you feel there is a good chance something will be initialized to the wrong value? Can you show some statistics that show how unsafe this practice is?

When the non-nullability is made optional, you *only* use it when you really know the initialization has a sane value, ok? Otherwise you can use the good old nullable references, right?


> Now if you really want to throw some sticks into the spokes, you would say that if the program crashes due to a null pointer, it is still likely that the programmer will just initialize/set the value to a "default" that still isn't valid just to get the program to continue to run.

Why should it crash in the first place? I hate crashes. Do you like them? I can prove by structural induction that you do not like them when you can avoid crashes with static checking.
September 27, 2009
Hello Walter,

> The only reasonable thing a program can do if it discovers it is in an
> unknown state is to stop immediately.
> 

This whole thread is NOT about what to do on unknown states. It is about using the compiler to statically remove the possibility of one type of unknown state ever happening.

If D were to get non-null by default, with optional nullable, then without ASM/union hacks or the like, you can only get a seg-v when you use the non-default nullable type.

Given the above (and assuming memory safety), the only possible wrong-data-error left would be where the programmer explicitly places the wrong value in a variable. In my book, that is a non-starter because 1) it can happen now 2) it can happen anywhere, not just at initialization 3) it can't be detected and 4) (assuming a well done syntax) in the cases where the compiler can't validate the code, the lazy thing to do and the correct thing to do (use a nullable type) will be the same.


September 27, 2009
On Sun, 27 Sep 2009 11:51:27 -0400, bearophile <bearophileHUGS@lycos.com> wrote:

> Steven Schveighoffer:
>
>>    Build the non-null requirement into the function signature (note, the
>> requirement is optional, it's still possible to use null references if you
>> want).
>>
>>    Pros: Easy to implement, Compile-time error, hard to "work around" by
>> putting a dummy value, sometimes no performance hit, most times very
>> little performance hit, allows solution 1 and 2 if you want, runtime
>> errors occur AT THE POINT things went wrong not later.
>>    Cons: Non-zero performance hit (you have to check for null sometimes
>> before assignment!)
>
> To implement it well (and I think it has to be implemented well) it's not so easy to implement. You have to face the problem I've discussed about multiple object initializations inside various ifs.

I think you are referring to a combination of this solution and flow analysis?  I didn't mention that solution, but it is possible.  I agree it would be more complicated, but I did say that as a con for flow analysis.

> Also see what downs and I have said regarding arrays of nonnullables.

Yes, arrays of non-nullables will be more cumbersome, I should add that as a con.  Thanks.

>
> Among the cons you also have to consider that there's a little more complexity in the language (two different kinds of references, and such things must also be explained in the docs and understood by novice D programmers. It's not a common feature, so they have to learn it).

It's not a common feature, but in practice, one doesn't usually need nullable types for most cases, it's only certain cases where it's needed.

For example, no extra docs are needed for:

auto a = new A(); // works, non-nullable is fine

And maybe even for:

A a; // error, must assign non-null value

because that's a common feature of compilers.
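
As a rough illustration of how those two lines could behave, here is a library-level sketch in current D. It is not the language feature being proposed: NonNull and Widget are hypothetical names, and the wrapper relies on @disable this() to reject a declaration without an initializer at compile time, plus a runtime check in the constructor.

```d
// Hypothetical wrapper, for illustration only.
struct NonNull(T) if (is(T == class))
{
    private T payload;

    @disable this();              // `NonNull!T x;` no longer compiles

    this(T value)
    {
        // The "check for null before assignment" cost from my cons list.
        assert(value !is null, "null assigned to NonNull");
        payload = value;
    }

    T get() { return payload; }
    alias get this;               // lets the wrapper be used like a T
}

// Hypothetical class to wrap.
class Widget { int size() { return 42; } }

// Declaring without an initializer is rejected at compile time.
static assert(!__traits(compiles, { NonNull!Widget w; }));

void main()
{
    auto w = NonNull!Widget(new Widget);
    assert(w.size() == 42);       // forwarded through alias this
}
```

This is not airtight. As far as I know, NonNull!Widget.init still exists and holds a null payload, which is the kind of hole a real language-level feature would have to close.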

It's similar in my view to shared.  Shared adds a level of complexity that needs to be understood if you want to use shared variables, but most of the time, your variables are not shared, so no extra thought is required.

> Another thing to add to the cons is that every layer of compile-time constraints you add to a language also adds a little amount of rigidity that has a cost (because you have to add ? and you sometimes may need casts to break such rigidity). Dynamic languages show that constraints have a cost.

The cost needs to be weighed against the cost of the alternatives.  I think all the solutions have a cost.  Dynamic languages have a cost too.  I've been developing in PHP lately, and I don't know how many times I've had a bug where I slightly mistyped a variable name and it was still valid code, because the language thought I was just declaring a new variable :)  And to get the IDE to recognize types, I sometimes have to put in a line like this:

// uncomment for autocomplete
// $x = new ClassType(); printf("Error, please remove line %d\n", __LINE__); throw new Exception();

I keep it commented out when I'm running, and uncomment it to have the IDE recognize that $x is a ClassType (for autocomplete).

I think if there was a solution that cost nothing, it would be the clear winner.

-Steve