September 27, 2009
Andrei Alexandrescu wrote:
> downs wrote:
>> Walter Bright wrote:
>>> Nick Sabalausky wrote:
>>>
>>> I agree with you that if the compiler can detect null dereferences at
>>> compile time, it should.
>>>
>>>
>>>>> Also, by "safe" I presume you mean "memory safe" which means free of
>>>>> memory corruption. Null pointer exceptions are memory safe. A null
>>>>> pointer could be caused by memory corruption, but it cannot *cause*
>>>>> memory corruption.
>>>> No, he's using the real meaning of "safe", not the
>>>> misleadingly-limited "SafeD" version of "safe" (which I'm still
>>>> convinced is going to get some poor soul into serious trouble from
>>>> mistakenly thinking their SafeD program is much safer than it really
>>>> is). Out here in reality, "safe" also means a lack of ability to
>>>> crash, or at least some level of protection against it. 
>>> Memory safety is something that can be guaranteed (presuming the
>>> compiler is correctly implemented). There is no way to guarantee that a
>>> non-trivial program cannot crash. It's the old halting problem.
>>>
>>
>> Okay, I'm gonna have to call you out on this one because it's simply incorrect.
>>
>> The halting problem deals with a valid program state - halting.
>>
>> We cannot check if every program halts because halting is an instruction that must be allowed at almost any point in the program.
>>
>> Why do crashes have to be allowed? They're not an allowed instruction!
>>
>> A compiler can be turing complete and still not allow crashes. There is nothing wrong with this, and it has *nothing* to do with the halting problem.
>>
>>>> You seem to be under the impression that nothing can be made
>>>> uncrashable without introducing the possibility of corrupted state.
>>>> That's hogwash.
>>> I read that statement several times and I still don't understand what it
>>> means.
>>>
>>> BTW, hardware null pointer checking is a safety feature, just like array
>>> bounds checking is.
>>
>> PS: You can't convert segfaults into exceptions under Linux, as far as I know.
> 
> How did Jeremie do that?
> 
> Andrei

A signal handler with the undocumented kernel parameters attaches the signal context to the exception object, repairs the stack frame forged by the kernel to make us believe we called the handler ourselves, does a backtrace right away and attaches it to the exception object, and then throws it.

The error handling code will unwind down to the runtime's main(), where a catch clause is waiting for any Throwables and sends them back into the unhandled exception handler; a crash window appears with the backtrace, all finally blocks are executed, and the program shuts down gracefully.

All I need now is an ELF/DWARF reader to extract symbolic debug info under Linux; it's already working for PE/CodeView on Windows.
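
The basic wiring looks roughly like this; a minimal sketch using druntime's core.sys.posix.signal bindings, with the frame repair and backtrace capture described above left out:

    import core.sys.posix.signal;

    class SegfaultError : Error
    {
        void* faultAddress;

        this(void* addr)
        {
            super("Segmentation fault");
            faultAddress = addr;
        }
    }

    extern (C) void segfaultHandler(int sig, siginfo_t* info, void* context)
    {
        // The real handler repairs the frame forged by the kernel and
        // attaches a backtrace before throwing; that part is omitted here.
        throw new SegfaultError(info.si_addr);
    }

    void installSegfaultHandler()
    {
        sigaction_t action;
        action.sa_sigaction = &segfaultHandler;
        action.sa_flags = SA_SIGINFO;
        sigemptyset(&action.sa_mask);
        sigaction(SIGSEGV, &action, null);
    }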

Jeremie
September 27, 2009
Jesse Phillips:

>The thing is that memory safety is the only safety with code.<

Nope. For example, in Delphi and C# you can have runtime integer overflow errors. That's another kind of safety.
If you look at safety-critical code, the kind Walter was talking about, you see people test code very thoroughly (at run time and at compile time), looking for an enormous number of possible errors. Doing this increases code safety. That's how you can have ABS brakes, TAC machines in hospitals, automatic pilots and so on.
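
In D terms, the kind of check I mean is roughly this; checkedAdd is a hypothetical helper, since D's own int arithmetic just wraps around silently:

    // Hypothetical helper: detect 32-bit overflow by doing the math in a
    // wider type and raising an error instead of wrapping around.
    int checkedAdd(int a, int b)
    {
        long wide = cast(long)a + b;
        if (wide < int.min || wide > int.max)
            throw new Exception("integer overflow in checkedAdd");
        return cast(int)wide;
    }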

Bye,
bearophile
September 27, 2009
Sun, 27 Sep 2009 12:35:23 -0400, Jeremie Pelletier thusly wrote:

> language_fan wrote:
>> Sun, 27 Sep 2009 00:08:50 -0400, Jeremie Pelletier thusly wrote:
>> 
>>> Ary Borenszweig wrote:
>>>> Just out of curiosity: have you ever programmed in Java or C#?
>>> Nope, never got interested in these to tell the truth. I only did C, C++, D and x86 assembly in systems programming, I have quite a background in PHP and JavaScript also.
>> 
>> So you only know imperative procedural programming + some features of hybrid OOP languages that are not even proper OOP languages.
> 
> This is what I know best, yeah. I did a lot of work in functional programming too, but not enough to add them to the above list.
> 
> What is proper OOP anyways? It's a feature offered by the language, not a critical design that must obey some strict standard rules.  Be it class based or prototype based, supporting single or multiple inheritance, using abstract base classes or interfaces, having funny syntax for ctors and whatnot or using the class name or even 'this', it's still OOP. If you want to call me out on not knowing 15 languages like you do, I have to call you on not knowing the differences in OOP models.

I must say I have not studied languages that much, only the concepts and theory - starting from formal definitions like operational or denotational semantics, and some more informal ones. I can professionally write code in only about half a dozen languages, but learning new ones is trivial if the task requires it.

Generally the common thing among proper pure OOP languages is the 'everything is an object' mentality. Because of this property there is no strict distinction between primitive non-OOP types and OOP types in pure OOP languages. In some languages, e.g., number values are objects. In others there are no static members and even classes are objects, so-called meta-objects. In some ways you can see this purity even in UML. If we go into details, various OOP languages have major differences in their semantics.

What I meant above is that I know a lot of developers who have a similar background to yours. It is really easy to use all of those languages without actually using the OOP features in them, at least properly (for instance, PHP does not even have a real OOP system, it is a cheap rip-off of mainstream languages - just look at the scoping rules). I have seen Java code where the developer never constructs new objects and only uses static methods because he fears heap allocation is expensive. Discussing OOP and language concepts is really hard if you lack the theoretical underpinning. It is sad to say, but the best sources for this knowledge are academic CS books, though nowadays even Wikipedia is starting to have good articles on the subject.
September 27, 2009
language_fan wrote:
> Sun, 27 Sep 2009 12:35:23 -0400, Jeremie Pelletier thusly wrote:
> 
>> language_fan wrote:
>>> Sun, 27 Sep 2009 00:08:50 -0400, Jeremie Pelletier thusly wrote:
>>>
>>>> Ary Borenszweig wrote:
>>>>> Just out of curiosity: have you ever programmed in Java or C#?
>>>> Nope, never got interested in these to tell the truth. I only did C,
>>>> C++, D and x86 assembly in systems programming, I have quite a
>>>> background in PHP and JavaScript also.
>>> So you only know imperative procedural programming + some features of
>>> hybrid OOP languages that are not even proper OOP languages.
>> This is what I know best, yeah. I did a lot of work in functional
>> programming too, but not enough to add them to the above list.
>>
>> What is proper OOP anyways? It's a feature offered by the language, not
>> a critical design that must obey some strict standard rules.  Be it
>> class based or prototype based, supporting single or multiple
>> inheritance, using abstract base classes or interfaces, having funny
>> syntax for ctors and whatnot or using the class name or even 'this', it's
>> still OOP. If you want to call me out on not knowing 15 languages like you
>> do, I have to call you on not knowing the differences in OOP models.
> 
> I must say I have not studied languages that much, only the concepts and theory - starting from formal definitions like operational or denotational semantics, and some more informal ones. I can professionally write code in only about half a dozen languages, but learning new ones is trivial if the task requires it.
> 
> Generally the common thing among proper pure OOP languages is the 'everything is an object' mentality. Because of this property there is no strict distinction between primitive non-OOP types and OOP types in pure OOP languages. In some languages, e.g., number values are objects. In others there are no static members and even classes are objects, so-called meta-objects. In some ways you can see this purity even in UML. If we go into details, various OOP languages have major differences in their semantics.
> 
> What I meant above is that I know a lot of developers who have a similar background to yours. It is really easy to use all of those languages without actually using the OOP features in them, at least properly (for instance, PHP does not even have a real OOP system, it is a cheap rip-off of mainstream languages - just look at the scoping rules). I have seen Java code where the developer never constructs new objects and only uses static methods because he fears heap allocation is expensive. Discussing OOP and language concepts is really hard if you lack the theoretical underpinning. It is sad to say, but the best sources for this knowledge are academic CS books, though nowadays even Wikipedia is starting to have good articles on the subject.

I agree; Wikipedia is often the first source I check to learn about different concepts, then I search for online papers and documentation, dig into source code (Google's code search is a gem), and finally turn to books.

I'm not most programmers, and I'm sure you aren't either. I like to learn as much of the semantics and implementation details behind a language as I can; only then do I feel I know the language. I like to make the best of everything in the languages I use, not specialize in a subset of them.

I don't believe in a perfect programming model; I believe in many different models, each with its own pros and cons, that can live together in the same language and form an all-around solution. That's why I usually stay away from 'pure' languages: they impose a single point of view of the world. That doesn't mean it's a bad one, I just like to look at the world from different angles at the same time.
September 27, 2009
Michel Fortin wrote:
> On 2009-09-27 07:38:59 -0400, Christopher Wright <dhasenan@gmail.com> said:
> 
>> I dislike these forced checks.
>>
>> Let's say you're dealing with a compiler frontend. You have a semantic node that just went through some semantic pass and is guaranteed, by flow control and contracts, to have a certain property initialized that was not initialized prior to that point.
>>
>> The programmer knows the value isn't null. The compiler shouldn't force checks. At most, it should have automated checks that disappear with -release.
> 
> If the programmer knows a value isn't null, why not put the value in a nullable-reference in the first place?

It may not be nonnull for the entire lifetime of the reference.

>> Also, it introduces more nesting.
> 
> Yes and no. It introduces an "if" statement for null checking, but only for nullable references. If you know your reference can't be null it should be non-nullable, and then you don't need to check.

I much prefer explicit null checks to implicit ones I can't control.

>> Also, unless the compiler's flow analysis is great, it's a nuisance -- you can see that the error is bogus and have to insert extra checks.
> 
> First, you're right: if the feature is implemented it should be well implemented. Second, if in a few places you don't want an "if" clause, you can always cast your nullable reference to a non-nullable one, explicitly bypassing the safeties. If you write a cast, you are making a conscious decision of not checking for null, which is much better than the current situation where it's very easy to forget to check for null.

That's just adding useless verbosity to the language.

>> It should be fine to provide a requireNotNull template and leave it at that.
> 
> It's fine to have such a template. But it's not nearly as useful.

It definitely is; the whole point is about reference initialization, not about what references can or can't be initialized to.
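
For reference, such a requireNotNull template can be as small as the sketch below (the exact signature and names are illustrative); since it's built on assert, the check also goes away with -release:

    // Explicit, opt-in null check; compiles away in -release builds.
    T requireNotNull(T)(T value, string msg = "unexpected null reference")
        if (is(T == class) || is(T : U*, U))
    {
        assert(value !is null, msg);
        return value;
    }

    // usage: auto obj = requireNotNull(someReference);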

What about non-NaN floats? Or non-invalid characters? I fear nonnull references are a first step in the wrong direction. The focus should be on implementing variable initialization checks in the compiler, since this solves the issue for any variable, not just references. The flow analysis can also be reused for many other optimizations.

September 27, 2009
Rainer Deyke wrote:
> OT, but declaring the variable at the top of the function increases
> stack size.
> 
> Example with changed variable names:
> 
>   void bar(bool foo) {
>     if (foo) {
>       int a = 1;
>     } else {
>       int b = 2;
>     }
>     int c = 3;
>   }
> 
> In this example, there are clearly three different (and differently
> named) variables, but their lifetimes do not overlap.  Only one variable
> can exist at a time, therefore the compiler only needs to allocate space
> for one variable.  Now, if you move your declaration to the top:
> 
>   void bar(bool foo) {
>     int a = void;
>     if (foo) {
>       a = 1;
>     } else {
>       a = 2; // Reuse variable.
>     }
>     int c = 3;
>   }
> 
> You now only have two variables, but both of them coexist at the end of
> the function.  Unless the compiler applies a clever optimization, the
> compiler is now forced to allocate space for two variables on the stack.

Not necessarily. The optimizer uses a technique called "live range analysis" to determine if two variables have non-overlapping ranges. It uses this for register assignment, but it could just as well be used for minimizing stack usage.
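
Annotating the earlier bar() example with where each live range begins and ends (the comments are the only addition):

    void bar(bool foo) {
      int a = void;   // storage is named, but 'a' is not live until assigned
      if (foo) {
        a = 1;        // live range of 'a' starts here...
      } else {
        a = 2;        // ...or here
      }
      // 'a' is never read again, so its live range is already over; a
      // live-range-aware backend can hand its register or stack slot to
      // 'c' even though both names are still in scope.
      int c = 3;
    }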
September 27, 2009
On Sun, Sep 27, 2009 at 2:07 PM, Jeremie Pelletier <jeremiep@gmail.com> wrote:

>> Yes and no. It introduces an "if" statement for null checking, but only for nullable references. If you know your reference can't be null it should be non-nullable, and then you don't need to check.
>
> I much prefer explicit null checks to implicit ones I can't control.

Nonnull types do not create implicit null checks. Nonnull types DO NOT need to be checked. And nullable types WOULD force explicit null checks.

> What about non-NaN floats? Or non-invalid characters? I fear nonnull references are a first step in the wrong direction. The focus should be on implementing variable initialization checks in the compiler, since this solves the issue for any variable, not just references. The flow analysis can also be reused for many other optimizations.

hash_t foo(Object o) { return o.toHash(); }
foo(null); // bamf, I just killed your function.

Forcing initialization of locals does NOT solve all the problems that nonnull references would.
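
To make the contrast concrete, here's a rough sketch of a library-level non-null wrapper (NotNull is hypothetical, nothing like it ships with D) that rejects the call above at compile time:

    struct NotNull(T) if (is(T == class))
    {
        private T _value;

        this(T value)
        {
            // The only place a runtime check is ever needed: the boundary
            // where a nullable reference is converted to a non-null one.
            assert(value !is null, "NotNull constructed from null");
            _value = value;
        }

        alias _value this;   // usable wherever a T is expected
    }

    hash_t foo(NotNull!Object o) { return o.toHash(); }

    // foo(NotNull!Object(new Object));  // fine
    // foo(null);                        // does not compile anymore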
September 27, 2009
On 27/09/2009 19:29, Jeremie Pelletier wrote:
> Andrei Alexandrescu wrote:
>> downs wrote:
>>> Walter Bright wrote:
>>>> Nick Sabalausky wrote:
>>>>
>>>> I agree with you that if the compiler can detect null dereferences at
>>>> compile time, it should.
>>>>
>>>>
>>>>>> Also, by "safe" I presume you mean "memory safe" which means free of
>>>>>> memory corruption. Null pointer exceptions are memory safe. A null
>>>>>> pointer could be caused by memory corruption, but it cannot *cause*
>>>>>> memory corruption.
>>>>> No, he's using the real meaning of "safe", not the
>>>>> misleadingly-limited "SafeD" version of "safe" (which I'm still
>>>>> convinced is going to get some poor soul into serious trouble from
>>>>> mistakenly thinking their SafeD program is much safer than it really
>>>>> is). Out here in reality, "safe" also means a lack of ability to
>>>>> crash, or at least some level of protection against it.
>>>> Memory safety is something that can be guaranteed (presuming the
>>>> compiler is correctly implemented). There is no way to guarantee that a
>>>> non-trivial program cannot crash. It's the old halting problem.
>>>>
>>>
>>> Okay, I'm gonna have to call you out on this one because it's simply
>>> incorrect.
>>>
>>> The halting problem deals with a valid program state - halting.
>>>
>>> We cannot check if every program halts because halting is an
>>> instruction that must be allowed at almost any point in the program.
>>>
>>> Why do crashes have to be allowed? They're not an allowed instruction!
>>>
>>> A compiler can be turing complete and still not allow crashes. There
>>> is nothing wrong with this, and it has *nothing* to do with the
>>> halting problem.
>>>
>>>>> You seem to be under the impression that nothing can be made
>>>>> uncrashable without introducing the possibility of corrupted state.
>>>>> That's hogwash.
>>>> I read that statement several times and I still don't understand
>>>> what it
>>>> means.
>>>>
>>>> BTW, hardware null pointer checking is a safety feature, just like
>>>> array
>>>> bounds checking is.
>>>
>>> PS: You can't convert segfaults into exceptions under Linux, as far
>>> as I know.
>>
>> How did Jeremie do that?
>>
>> Andrei
>
> A signal handler with the undocumented kernel parameters attaches the
> signal context to the exception object, repairs the stack frame forged
> by the kernel to make us believe we called the handler ourselves, does a
> backtrace right away and attaches it to the exception object, and then
> throws it.
>
> The error handling code will unwind down to the runtime's main(), where
> a catch clause is waiting for any Throwables and sends them back into
> the unhandled exception handler; a crash window appears with the
> backtrace, all finally blocks are executed, and the program shuts down
> gracefully.
>
> All I need now is an ELF/DWARF reader to extract symbolic debug info
> under Linux; it's already working for PE/CodeView on Windows.
>
> Jeremie

Is this Linux-specific? What about other *nix systems, like BSD and Solaris?
September 27, 2009
Walter Bright wrote:
>>   void bar(bool foo) {
>>     int a = void;
>>     if (foo) {
>>       a = 1;
>>     } else {
>>       a = 2; // Reuse variable.
>>     }
>>     int c = 3;
>>   }
>>
>> You now only have two variables, but both of them coexist at the end of the function.  Unless the compiler applies a clever optimization, the compiler is now forced to allocate space for two variables on the stack.
> 
> Not necessarily. The optimizer uses a technique called "live range analysis" to determine if two variables have non-overlapping ranges. It uses this for register assignment, but it could just as well be used for minimizing stack usage.

That's the optimization I was referring to.  It works for ints, but not for RAII types.  It also doesn't (necessarily) work if you reorder the function:

   void bar(bool foo) {
     int a = void;
     int c = 3;
     if (foo) {
       a = 1;
     } else {
       a = 2; // Reuse variable.
     }
   }

Of course, a good optimizer can still reorder the declarations in this case, or even eliminate the whole function body (since it doesn't do anything).
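
To illustrate the RAII part (Resource is a made-up type): once the declaration is hoisted to the top, the object must survive until its destructor runs at the end of the function, so its slot genuinely overlaps with everything declared later:

    struct Resource
    {
        int handle;
        ~this() { /* release the handle here */ }
    }

    void bar(bool foo)
    {
        Resource a;          // destroyed only at the end of bar(), so 'a'
                             // stays live for the whole body regardless of
                             // where it is last touched
        if (foo) a.handle = 1;
        else     a.handle = 2;
        Resource c;          // 'a' and 'c' now truly coexist; their stack
                             // slots cannot be merged
    }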


-- 
Rainer Deyke - rainerd@eldwood.com
September 27, 2009
Jeremie Pelletier wrote:
> Walter Bright wrote:
>> They are completely independent variables. One may get assigned to a register, and not the other.
> 
> Ok, that's what I thought, so the good old C way of declaring variables at the top is not a bad thing yet :)

Strange how you can look at the evidence and arrive at exactly the wrong conclusion.  Declaring variables as close as possible to where they are used can reduce stack usage, and never increases it.

-- 
Rainer Deyke - rainerd@eldwood.com