November 05, 2009
Rainer Deyke wrote:
> Andrei Alexandrescu wrote:
>> Rainer Deyke wrote:
>>> '-safe' turns on runtime safety checks, which can be and should be
>>> mostly orthogonal to the module safety level.
>> Runtime vs. compile-time is immaterial.
> 
> The price of compile-time checks is that you are restricted to a subset
> of the language, which may or may not allow you to do what you need to do.
> 
> The price of runtime checks is runtime performance.
> 
> Safety is always good.  To me, the question is never if I want safety,
> but if I can afford it.  If I can't afford to pay the price of runtime
> checks, I may still want the compile-time checks.  If I can't afford to
> pay the price of compile-time checks, I may still want the runtime
> checks.  Thus, to me, the concepts of runtime and compile-time checks
> are orthogonal.

I hear what you're saying, but I am not enthusiastic at all about defining and advertising a half-pregnant state. Such a language is the worst of all worlds - it's frustrating to code in yet gives no guarantee to anyone. I don't see this going anywhere interesting. "Yeah, we have safety, and we also have, you know, half safety - it's like only a lap belt of sorts: inconvenient like crap and doesn't really help in an accident." I wouldn't want to code in such a language.

> A module either passes the compile-time checks or it does not.  It makes
> no sense to make the compile-time checks optional for some modules.  If the
> module is written to pass the compile-time checks (i.e. uses the safe
> subset of the language), then the compile-time checks should always be
> performed for that module.

I think that's the current intention indeed.

Andrei
November 05, 2009
Andrei Alexandrescu wrote:
> SafeD is, unfortunately, not finished at the moment. I want to leave in place a stub that won't lock our options. Here's what we currently have:
> 
> module(system) calvin;
> 
> This means calvin can do unsafe things.
> 
> module(safe) susie;
> 
> This means susie commits to extra checks and therefore to only a subset of D.
> 
> module hobbes;
> 
> This means hobbes abides by whatever the default safety setting is.
> 
> The default safety setting is up to the compiler. In dmd by default it is "system", and can be overridden with "-safe".
> 
> Sketch of the safe rules:
> 
> \begin{itemize*}
> \item No @cast@ from a pointer type to an integral type and vice versa
> \item No @cast@ between unrelated pointer types
> \item Bounds checks on all array accesses
> \item  No  unions  that  include  a reference  type  (array,  @class@,
>   pointer, or @struct@ including such a type)
> \item No pointer arithmetic
> \item No escape of a pointer  or reference to a local variable outside
>   its scope
> \item Cross-module function calls must only go to other @safe@ modules
> \end{itemize*}
> 
> So these are my thoughts so far. There is one problem though related to the last \item - there's no way for a module to specify "trusted", meaning: "Yeah, I do unsafe stuff inside, but safe modules can call me no problem". Many modules in std fit that mold.
> 
> How can we address that? Again, I'm looking for a simple, robust, extensible design that doesn't lock our options.
> 
> 
> Thanks,
> 
> Andrei

Not sure if this is the right topic to say this, but maybe D needs monads to allow more functions to be marked as pure. Then functional programming could be added to the list of paradigms D supports, which would also make it safer.
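
For concreteness, a rough sketch of the kind of code the quoted rules would reject or instrument - the module name and contents below are made up for illustration and written in today's unsafe-by-default D:

module susie;  // under the proposal: module(safe) susie;

void examples()
{
    int x = 42;
    int* p = &x;                  // taking the address is fine here; letting p escape x's scope is not
    size_t n = cast(size_t) p;    // cast from a pointer type to an integral type: rejected
    double* d = cast(double*) p;  // cast between unrelated pointer types: rejected
    int[] a = new int[4];
    int* q = a.ptr + 10;          // pointer arithmetic: rejected
    int v = a[7];                 // compiles, but the mandatory bounds check traps it at runtime
}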
November 05, 2009
Andrei Alexandrescu wrote:
> I hear what you're saying, but I am not enthusiastic at all about defining and advertising a half-pregnant state. Such a language is the worst of all worlds - it's frustrating to code in yet gives no guarantee to anyone. I don't see this going anywhere interesting. "Yeah, we have safety, and we also have, you know, half safety - it's like only a lap belt of sorts: inconvenient like crap and doesn't really help in an accident." I wouldn't want to code in such a language.

Basically you're saying that safety is an all or nothing deal.  Not only is this in direct contradiction to the attempts to allow both safe and unsafe modules to coexist in the same program, it is in contradiction with all existing programming languages, every single one of which offers some safety features but not absolute 100% safety.

If you have a formal definition of safety, please post it.  Without such a definition, I will use my own informal definition of safety for the rest of this post: "a safety feature is a language feature that reduces programming errors."

First, to demonstrate that all programming languages in existence offer some safety features.  With some esoteric exceptions (whitespace, hq9+), all programming languages have a syntax with some level of redundancy. This allows the language implementation to reject some inputs as syntactically incorrect.  A redundant syntax is a safety feature.

Another example relevant to D: D requires an explicit cast when converting an integer to a pointer.  This is another safety feature.

Now to demonstrate that no language offers 100% safety.  In the
abstract, no language can guarantee that a program matches the
programmer's intention.  However, let's look at a more specific form of
safety: safety from dereferencing dangling pointers.  To guarantee this,
you would need to guarantee that the compiler never generates faulty
code that causes a dangling pointer to be dereferenced.  If the
program makes any system calls at all, you would also need to guarantee
that no bugs in the OS cause a dangling pointer to be dereferenced.
Both of these are clearly impossible.  No language can offer 100% safety.

Moreover, the claim that safety necessarily reduces convenience is clearly false.
That cost applies /only/ to compile-time checks.  Runtime checks are purely
an implementation issue.  Even C and assembly can be implemented such
that all instances of undefined behavior are trapped at runtime.

Conversely, the performance penalty of safety applies mostly to runtime checks.  If extensive testing with these checks turned on fails to reveal any bugs, it is entirely reasonable to remove these checks for the final release.
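
As a rough illustration of that workflow (assuming I remember dmd's switches correctly), array accesses are bounds-checked in a default build and the check can be dropped for the release build:

// checks.d - a deliberate out-of-bounds write
void main()
{
    int[] a = new int[3];
    int i = 3;                 // one past the end
    a[i] = 1;                  // default build: trapped by the runtime bounds check
                               // with dmd -release: the check is removed and behavior is undefined
}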


-- 
Rainer Deyke - rainerd@eldwood.com
November 05, 2009
Rainer Deyke wrote:
> Andrei Alexandrescu wrote:
>> I hear what you're saying, but I am not enthusiastic at all about
>> defining and advertising a half-pregnant state. Such a language is the
>> worst of all worlds - it's frustrating to code in yet gives no guarantee
>> to anyone. I don't see this going anywhere interesting. "Yeah, we have
>> safety, and we also have, you know, half safety - it's like only a lap
>> belt of sorts: inconvenient like crap and doesn't really help in an
>> accident." I wouldn't want to code in such a language.
> 
> Basically you're saying that safety is an all or nothing deal.  Not only
> is this in direct contradiction to the attempts to allow both safe and
> unsafe modules to coexist in the same program, it is in contradiction
> with all existing programming languages, every single one of which
> offers some safety features but not absolute 100% safety.
> 
> If you have a formal definition of safety, please post it.  Without such
> a definition, I will use my own informal definition of safety for the
> rest of this post: "a safety feature is a language feature that reduces
> programming errors."
> 
> First, to demonstrate that all programming languages in existence offer
> some safety features.  With some esoteric exceptions (whitespace, hq9+),
> all programming languages have a syntax with some level of redundancy.
> This allows the language implementation to reject some inputs as
> syntactically incorrect.  A redundant syntax is a safety feature.
> 
> Another example relevant to D: D requires an explicit cast when
> converting an integer to a pointer.  This is another safety feature.
> 
> Now to demonstrate that no language offers 100% safety.  In the
> abstract, no language can guarantee that a program matches the
> programmer's intention.  However, let's look at a more specific form of
> safety: safety from dereferencing dangling pointers.  To guarantee this,
> you would need to guarantee that the compiler never generates faulty
> code that causes a dangling pointer to be dereferenced.  If the
> program makes any system calls at all, you would also need to guarantee
> that no bugs in the OS cause a dangling pointer to be dereferenced.
> Both of these are clearly impossible.  No language can offer 100% safety.
> 
> Moreover, the claim that safety necessarily reduces convenience is clearly false.
> That cost applies /only/ to compile-time checks.  Runtime checks are purely
> an implementation issue.  Even C and assembly can be implemented such
> that all instances of undefined behavior are trapped at runtime.
> 
> Conversely, the performance penalty of safety applies mostly to runtime
> checks.  If extensive testing with these checks turned on fails to
> reveal any bugs, it is entirely reasonable to remove these checks for
> the final release.

I'm in complete agreement with you, Rainer.
What I got from Bartosz' original post was that a large class of bugs could be eliminated fairly painlessly via some compile-time checks. It seemed to be based on pragmatic concerns. I applauded it. (I may have misread it, of course).
Now, things seem to have left pragmatism and got into ideology. Trying to eradicate _all_ possible memory corruption bugs is extremely difficult in a language like D. I'm not at all convinced that it is realistic (ends up too painful to use). It'd be far more reasonable if we had non-nullable pointers, for example.

The ideology really scares me, because 'memory safety' covers just one class of bug. What everyone wants is to drive the _total_ bug count down, and we can improve that dramatically with basic compile-time checks. But demanding 100% memory safety has a horrible cost-benefit tradeoff. It seems like a major undertaking.

And I doubt it would convince anyone, anyway. To really guarantee memory safety, you need a bug-free compiler...
November 05, 2009
Hello Don,

> Rainer Deyke wrote:
> 
>> Andrei Alexandrescu wrote:
>> 
>>> I hear what you're saying, but I am not enthusiastic at all about
>>> defining and advertising a half-pregnant state. Such a language is
>>> the worst of all worlds - it's frustrating to code in yet gives no
>>> guarantee to anyone. I don't see this going anywhere interesting.
>>> "Yeah, we have safety, and we also have, you know, half safety -
>>> it's like only a lap belt of sorts: inconvenient like crap and
>>> doesn't really help in an accident." I wouldn't want to code in such
>>> a language.
>>> 
>> Basically you're saying that safety is an all or nothing deal.  Not
>> only is this in direct contradiction to the attempts to allow both
>> safe and unsafe modules to coexist in the same program, it is in
>> contradiction with all existing programming languages, every single
>> one of which offers some safety features but not absolute 100%
>> safety.
>> 
>> If you have a formal definition of safety, please post it.  Without
>> such a definition, I will use my own informal definition of safety
>> for the rest of this post: "a safety feature is a language feature
>> that reduces programming errors."
>> 
>> First, to demonstrate that all programming languages in existence
>> offer some safety features.  With some esoteric exceptions
>> (whitespace, hq9+), all programming languages have a syntax with some
>> level of redundancy. This allows the language implementation to
>> reject some inputs as syntactically incorrect.  A redundant syntax is
>> a safety feature.
>> 
>> Another example relevant to D: D requires an explicit cast when
>> converting an integer to a pointer.  This is another safety feature.
>> 
>> Now to demonstrate that no language offers 100% safety.  In the
>> abstract, no language can guarantee that a program matches the
>> programmer's intention.  However, let's look at a more specific form
>> of
>> safety: safety from dereferencing dangling pointers.  To guarantee
>> this,
>> you would need to guarantee that the compiler never generates faulty
>> code that causes a dangling pointer to be dereferenced.  If the
>> program makes any system calls at all, you would also need to
>> guarantee
>> that no bugs in the OS cause a dangling pointer to be dereferenced.
>> Both of these are clearly impossible.  No language can offer 100%
>> safety.
>> Moreover, the claim that safety necessarily reduces convenience is clearly
>> false. That cost applies /only/ to compile-time checks.  Runtime checks
>> are purely an implementation issue.  Even C and assembly can be
>> implemented such that all instances of undefined behavior are trapped
>> at runtime.
>> 
>> Conversely, the performance penalty of safety applies mostly to
>> runtime checks.  If extensive testing with these checks turned on
>> fails to reveal any bugs, it is entirely reasonable to remove these
>> checks for the final release.
>> 
> I'm in complete agreement with you, Rainer.
> What I got from Bartosz' original post was that a large class of bugs
> could be eliminated fairly painlessly via some compile-time checks. It
> seemed to be based on pragmatic concerns. I applauded it. (I may have
> misread it, of course).
> Now, things seem to have left pragmatism and got into ideology. Trying
> to eradicate _all_ possible memory corruption bugs is extremely
> difficult in a language like D. I'm not at all convinced that it is
> realistic (ends up too painful to use). It'd be far more reasonable if
> we had non-nullable pointers, for example.
> The ideology really scares me, because 'memory safety' covers just one
> class of bug. What everyone wants is to drive the _total_ bug count
> down, and we can improve that dramatically with basic compile-time
> checks. But demanding 100% memory safety has a horrible cost-benefit
> tradeoff. It seems like a major undertaking.
> 
> And I doubt it would convince anyone, anyway. To really guarantee
> memory safety, you need a bug-free compiler...
> 

I don't know how this could have anything to do with ideology. Are Java and C# ideological languages? Certainly - if you see memory safety as ideology - you cannot escape from it in these languages.

Pure functions currently exist in D, but you are not obliged to use them. I think memory safety should be handled the same way: mark a function safe if you want or need to restrict yourself to this style of coding, and just don't use it if you don't need to, or can't - same as pure and nothrow.
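
A tiny sketch of what per-function marking could look like - the "safe" spelling below is purely hypothetical and appears only in a comment, while pure and nothrow are real:

// Hypothetical per-function marking, by analogy with pure and nothrow;
// "safe" is only a comment here because no such attribute exists yet.
int sumOfSquares(int n) pure nothrow /* safe */
{
    int total = 0;
    foreach (i; 0 .. n)
        total += i * i;
    return total;
}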

Notice that if you code your function safe, it would have only one negative impact on the caller - runtime bounds checking. I admit that is not ideal; there are good reasons to require speed. Since the standard libraries would use safe code, I'm not sure whether it would be necessary to distribute two versions of the .lib - one with bounds-checked safe code and one with bounds checking turned off in safe code.

I think what also concerns you is how safety would affect the use of D statements and expressions - that it would become too difficult or awkward to use. I don't know exactly, but I imagine it would be simpler - much like Java/C#(?)

If there is to be memory safety in D, I see no other possibility than to specify it per function and to provide a compiler switch that turns off bounds checking for safe code if needed. I see that as the most flexible option for "code writers" and the least interfering for "code users"; there is no need for a trade-off.

A compiler switch that would magically force safety onto code not written for it would accomplish nothing - the code simply wouldn't compile (and specifying safety per module is too coarse-grained, both for code users and for code writers).

Btw. I think non-nullable pointers are equally important, but I see no prospect of them being implemented :( 



November 05, 2009
Tim Matthews wrote:
> Andrei Alexandrescu wrote:
>> SafeD is, unfortunately, not finished at the moment. I want to leave in place a stub that won't lock our options. Here's what we currently have:
>>
>> module(system) calvin;
>>
>> This means calvin can do unsafe things.
>>
>> module(safe) susie;
>>
>> This means susie commits to extra checks and therefore to only a subset of D.
>>
>> module hobbes;
>>
>> This means hobbes abides by whatever the default safety setting is.
>>
>> The default safety setting is up to the compiler. In dmd by default it is "system", and can be overridden with "-safe".
>>
>> Sketch of the safe rules:
>>
>> \begin{itemize*}
>> \item No @cast@ from a pointer type to an integral type and vice versa
>> \item No @cast@ between unrelated pointer types
>> \item Bounds checks on all array accesses
>> \item  No  unions  that  include  a reference  type  (array,  @class@,
>>   pointer, or @struct@ including such a type)
>> \item No pointer arithmetic
>> \item No escape of a pointer  or reference to a local variable outside
>>   its scope
>> \item Cross-module function calls must only go to other @safe@ modules
>> \end{itemize*}
>>
>> So these are my thoughts so far. There is one problem though related to the last \item - there's no way for a module to specify "trusted", meaning: "Yeah, I do unsafe stuff inside, but safe modules can call me no problem". Many modules in std fit that mold.
>>
>> How can we address that? Again, I'm looking for a simple, robust, extensible design that doesn't lock our options.
>>
>>
>> Thanks,
>>
>> Andrei
> 
> Not sure if this is the right topic to say this, but maybe D needs monads to allow more functions to be marked as pure. Then functional programming could be added to the list of paradigms D supports, which would also make it safer.

Would be great if you found the time to write and discuss a DIP.

Andrei
November 05, 2009
Don wrote:
> Rainer Deyke wrote:
>> Andrei Alexandrescu wrote:
>>> I hear what you're saying, but I am not enthusiastic at all about
>>> defining and advertising a half-pregnant state. Such a language is the
>>> worst of all worlds - it's frustrating to code in yet gives no guarantee
>>> to anyone. I don't see this going anywhere interesting. "Yeah, we have
>>> safety, and we also have, you know, half safety - it's like only a lap
>>> belt of sorts: inconvenient like crap and doesn't really help in an
>>> accident." I wouldn't want to code in such a language.
>>
>> Basically you're saying that safety is an all or nothing deal.  Not only
>> is this in direct contradiction to the attempts to allow both safe and
>> unsafe modules to coexist in the same program, it is in contradiction
>> with all existing programming languages, every single one of which
>> offers some safety features but not absolute 100% safety.
>>
>> If you have a formal definition of safety, please post it.  Without such
>> a definition, I will use my own informal definition of safety for the
>> rest of this post: "a safety feature is a language feature that reduces
>> programming errors."
>>
>> First, to demonstrate that all programming languages in existence offer
>> some safety features.  With some esoteric exceptions (whitespace, hq9+),
>> all programming languages have a syntax with some level of redundancy.
>> This allows the language implementation to reject some inputs as
>> syntactically incorrect.  A redundant syntax is a safety feature.
>>
>> Another example relevant to D: D requires an explicit cast when
>> converting an integer to a pointer.  This is another safety feature.
>>
>> Now to demonstrate that no language offers 100% safety.  In the
>> abstract, no language can guarantee that a program matches the
>> programmer's intention.  However, let's look at a more specific form of
>> safety: safety from dereferencing dangling pointers.  To guarantee this,
>> you would need to guarantee that the compiler never generates faulty
>> code that causes a dangling pointer to be dereferenced.  If the
>> program makes any system calls at all, you would also need to guarantee
>> that no bugs in the OS cause a dangling pointer to be dereferenced.
>> Both of these are clearly impossible.  No language can offer 100% safety.
>>
>> Moreover, the claim that safety necessarily reduces convenience is clearly false.
>> That cost applies /only/ to compile-time checks.  Runtime checks are purely
>> an implementation issue.  Even C and assembly can be implemented such
>> that all instances of undefined behavior are trapped at runtime.
>>
>> Conversely, the performance penalty of safety applies mostly to runtime
>> checks.  If extensive testing with these checks turned on fails to
>> reveal any bugs, it is entirely reasonable to remove these checks for
>> the final release.
> 
> I'm in complete agreement with you, Rainer.
> What I got from Bartosz' original post was that a large class of bugs could be eliminated fairly painlessly via some compile-time checks. It seemed to be based on pragmatic concerns. I applauded it. (I may have misread it, of course).
> Now, things seem to have left pragmatism and got into ideology. Trying to eradicate _all_ possible memory corruption bugs is extremely difficult in a language like D. I'm not at all convinced that it is realistic (ends up too painful to use). It'd be far more reasonable if we had non-nullable pointers, for example.
> 
> The ideology really scares me, because 'memory safety' covers just one class of bug. What everyone wants is to drive the _total_ bug count down, and we can improve that dramatically with basic compile-time checks. But demanding 100% memory safety has a horrible cost-benefit tradeoff. It seems like a major undertaking.
> 
> And I doubt it would convince anyone, anyway. To really guarantee memory safety, you need a bug-free compiler...

I protest against using "ideology" when characterizing safety. It instantly lowers the level of the discussion. There is no ideology being pushed here, just a clear notion with equally clear benefits. I think it is a good time we all get informed a bit more.

First off: _all_ languages except C, C++, and assembler are or at least claim to be safe. All. I mean ALL. Did I mention all? If that was some ideology that is not realistic, is extremely difficult to achieve, and ends up too painful to use, then such theories would be difficult to corroborate with "ALL". Walter and I are in agreement that safety is not difficult to achieve in D and that it would allow a great many good programs to be written.

Second, there are not many definitions of what safe means and no ifs and buts about it. This whole wishy-washy notion of wanting just a little bit of pregnancy is just not worth pursuing. The definition is given in Pierce's book "Types and Programming Languages" but I was happy yesterday to find a free online book section by Luca Cardelli:

http://www.eecs.umich.edu/~bchandra/courses/papers/Cardelli_Types.pdf

The text is very approachable and informative, and I suggest that anyone interested read through at least page 5. I think it's a must for anyone participating in this discussion to read the whole thing. Cardelli distinguishes between programs with "trapped errors" and programs with "untrapped errors". Yesterday Walter and I had a long discussion, followed by an email exchange between Cardelli and myself, which confirmed that these three notions are equivalent:

a) "memory safety" (notion we used so far)
b) "no undefined behavior" (C++ definition, suggested by Walter)
c) "no untrapped errors" (suggested by Cardelli)

I suspect "memory safety" is the weakest marketing terms of the three. For example, there's this complaint above: "'memory safety' covers just one class of bug." But when you think of programs with undefined behavior vs. programs with entirely defined behavior, you realize what an important class of bugs that is. Non-nullable pointers are mightily useful, but "no undefined behavior" is quite a bit better to have.

The argument about memory safety requiring a bug-free compiler is correct. It was actually aired quite a bit in Java's first years. It can be confidently said that Java won that argument. Why? Because Java had a principled approach that slowly but surely sealed all the gaps. The fact that dmd has bugs now should be absolutely no excuse for us to give up on defining a safe subset of the language.


Andrei
November 05, 2009
Andrei Alexandrescu, on November 5 at 08:48 you wrote to me:
> First off: _all_ languages except C, C++, and assembler are or at least claim to be safe. All. I mean ALL. Did I mention all? If that was some ideology that is not realistic, is extremely difficult to achieve, and ends up too painful to use, then such theories would be difficult to corroborate with "ALL". Walter and I are in agreement that safety is not difficult to achieve in D and that it would allow a great many good programs to be written.

I think the problem is the cost: the cost for the programmer (the subset of language features they can use is reduced) and the cost for the compiler (to increase the subset of language features that can be used, the compiler has to be much smarter).

Most languages have a lot of developers, and can afford making the compiler smarter to allow safety with a low cost for the programmer (at least when writing code, that cost might be higher performance-wise).

A clear example of this is not being able to take the address of a local.
This is too restrictive to be useful, as you pointed out in your post about
having to write static methods because of this. If you can't find
a workaround for this, I guess safety in D can look a little unrealistic.
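
Just to spell out what the restriction guards against, a made-up sketch:

int* leaked;          // module-level pointer

void capture()
{
    int local = 42;
    leaked = &local;  // the address of a stack variable escapes its scope
}

void useAfterReturn()
{
    capture();
    *leaked = 1;      // dangling pointer: the frame that held 'local' is gone
}

I understand why SafeD wants to prevent that, but a blanket ban on taking the address of a local to get there is the part that looks too restrictive.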

I like the idea of having a safe subset in D, but I think that, D being
a programming language, *runtime* safety should *always* be a choice for
the user compiling the code.

As others said, you can never be 100% sure your program won't blow up for unknown reasons (it could do that because of a bug in the compiler/interpreter, or even because of a hardware problem); you can just try to make that as hard as possible, but 100% safety doesn't exist.

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
It has been said so often that appearances deceive
Of course they will deceive anyone vulgar enough to believe it
November 05, 2009
Leandro Lucarella wrote:
> Andrei Alexandrescu, on November 5 at 08:48 you wrote to me:
>> First off: _all_ languages except C, C++, and assembler are or at
>> least claim to be safe. All. I mean ALL. Did I mention all? If that
>> was some ideology that is not realistic, is extremely difficult to
>> achieve, and ends up too painful to use, then such theories would be
>> difficult to corroborate with "ALL". Walter and I are in agreement
>> that safety is not difficult to achieve in D and that it would allow
>> a great many good programs to be written.
> 
> I think the problem is the cost. The cost for the programmer (the subset
> of language features they can use is reduced) and the cost for the compiler
> (to increase the subset of language features that can be used, the
> compiler has to be much smarter).
> 
> Most languages have a lot of developers, and can afford making the
> compiler smarter to allow safety with a low cost for the programmer (at
> least when writing code, that cost might be higher performance-wise).

D is already a rich superset of Java. So the cost of making the language safe and useful was already absorbed.

> A clear example of this is not being able to take the address of a local.
> This is too restrictive to be useful, as you pointed out in your post about
> having to write static methods because of this. If you can't find
> a workaround for this, I guess safety in D can look a little unrealistic.

Most other languages do not allow taking addresses of locals. Why are they realistic and SafeD wouldn't? Just because we know we could do it in unsafe D?

> I like the idea of having a safe subset in D, but I think that, D being
> a programming language, *runtime* safety should *always* be a choice for
> the user compiling the code.

Well in that case we need to think again about the command-line options.

> As others said, you can never be 100% sure your program won't blow up for
> unknown reasons (it could do that because of a bug in the
> compiler/interpreter, or even because of a hardware problem); you can just
> try to make that as hard as possible, but 100% safety doesn't exist.

I understand that stance, but I don't find it useful.

Andrei
November 05, 2009
Leandro Lucarella wrote:
> A clear example of this is not being able to take the address of a local.
> This is too restrictive to be useful, as you pointed out in your post about
> having to write static methods because of this. If you can't find
> a workaround for this, I guess safety in D can look a little unrealistic.

Sorry, I forgot to mention one thing. My example of List in the thread "An interesting consequence of safety requirements" used struct, but it should be mentioned there's a completely safe alternative: just define List as a class and there is no safety problem at all. Java, C#, and others define lists as classes and it didn't seem to kill them. I agree that using a struct in D would be marginally more efficient, but that doesn't mean that if I want safety I'm dead in the water. In particular it's great that pointers are still usable in SafeD. I'm actually surprised that nobody sees how nicely safety fits D, particularly its handling of "ref".
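
A minimal sketch of that class-based alternative (names made up): nodes are garbage-collected references, so no address of a local is ever taken and the safe rules are satisfied.

class List(T)
{
    private static class Node
    {
        T value;
        Node next;
    }

    private Node head;

    void insertFront(T value)
    {
        auto n = new Node;
        n.value = value;
        n.next = head;
        head = n;
    }

    bool empty() const { return head is null; }
}

Usage is just "auto list = new List!int; list.insertFront(3);" - one extra allocation for the list object itself compared to the struct version, which is the marginal efficiency cost I mentioned.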

Andrei