October 12, 2017
On Thursday, 12 October 2017 at 20:27:03 UTC, Jonathan M Davis wrote:
> On Thursday, October 12, 2017 20:15:41 kdevel via
>> ---
>> void main ()
>> {
>>     assert (false);
>> }
>> ---
>>
>> qualifies as "invalid, and therefore has undefined behaviour." A statement, which makes no sense to me. Either it is a "debugging aid", that implies defined behavior, or it is undefined behavior, then assert (false) cannot aid debugging.
>
> assert(false) is a bit special in that it's never removed (it becomes a HLT instruction with -release),

Confirmed. I should have written something like this instead:

---
import std.stdio;
import std.string;
import std.conv;
void main ()
{
   int i;
   i = readln.chomp.to!int;
   assert (i != 3);
   writeln ("i = <", i, ">");
}
---

Is it defined that this program throws an AssertError in debug mode if 3 is fed to stdin? If not, assert (...) could not aid debugging.
October 12, 2017
On Thursday, October 12, 2017 21:22:29 kdevel via Digitalmars-d-learn wrote:
> On Thursday, 12 October 2017 at 20:27:03 UTC, Jonathan M Davis wrote:
> > On Thursday, October 12, 2017 20:15:41 kdevel via
> >
> >> ---
> >> void main ()
> >> {
> >>     assert (false);
> >> }
> >> ---
> >>
> >> qualifies as "invalid, and therefore has undefined behaviour." A statement, which makes no sense to me. Either it is a "debugging aid", that implies defined behavior, or it is undefined behavior, then assert (false) cannot aid debugging.
> >
> > assert(false) is a bit special in that it's never removed (it
> > becomes a HLT instruction with -release),
>
> Confirmed. I should have written something like this instead:
>
> ---
> import std.stdio;
> import std.string;
> import std.conv;
> void main ()
> {
>     int i;
>     i = readln.chomp.to!int;
>     assert (i != 3);
>     writeln ("i = <", i, ">");
> }
> ---
>
> Is it defined that this program throws an AssertError in debug mode if 3 is fed to stdin? If not, assert (...) could not aid debugging.

If assertions are compiled in (which they are if you're not compiling with -release), and i is ever 3, then an AssertError will be thrown. This is guaranteed. As such, the compiler is free to assume that i is never 3 when code execution arrives at the line after the assertion, and if it can do an optimization based on that fact, it is free to do so. You've told it that i should never be 3 at that point and that it's a bug if it is, and as such, it is free to assume that i is never 3 after the assertion even if the assertion is compiled out with -release - that is the only place that undefined behavior may enter into it. If the compiler does an optimization based on the fact that i isn't 3, and it is, and -release is used, then you could get some weird behavior when the code reaches the lines after the assertion - but by definition, you already have a bug if i is 3, and your program in general is assuming that i isn't 3 at that point, so you're going to get bad behavior either way. The fact that your assertion failed means that you have a logic error in your program, and it is therefore in an invalid state and will likely not behave correctly.

However, your example is an excellent example of when _not_ to use assertions. Assertions should never be used on user input or anything outside of the program's control. When you use an assertion, you are saying that it is a bug in the program if that assertion fails, and bad user input isn't a bug, though the fact that you're not validating user input arguably is (certainly it is if the assertion is there, since at that point, you're saying that it's a bug if i is ever 3). Assertions allow you to catch bugs in your logic during development and then don't slow your program down when compiling with -release for production. They are not for validating anything other than that the logic of your program is correct.
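
As a rough sketch of the alternative for user input: validate it with an exception instead of an assertion. The helper name `parseNotThree` and the sample inputs are made up for illustration; only `to!int`, `strip` and `enforce` come from Phobos.

```d
import std.conv : to;
import std.exception : enforce;
import std.string : strip;

/// Hypothetical helper: parses one line of user input and rejects the value 3.
/// Bad input is reported with an Exception, which is not removed by -release.
int parseNotThree(string line)
{
    // to!int throws a ConvException if the text is not a valid int.
    immutable i = line.strip.to!int;
    // enforce throws a plain Exception when the condition is false.
    enforce(i != 3, "3 is not an accepted value");
    return i;
}

void main()
{
    import std.stdio : writeln;

    // Sample inputs stand in for lines read from stdin.
    foreach (line; ["7\n", "3\n", "abc\n"])
    {
        try
        {
            immutable i = parseNotThree(line);
            writeln("i = <", i, ">");
        }
        catch (Exception e)
        {
            writeln("invalid input: ", e.msg);
        }
    }
}
```

Here the value 3 and non-numeric text are both treated as recoverable input errors, so nothing about the check depends on whether assertions are compiled in.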

And if for any reason, you're paranoid enough that you want those logic checks to still be there in production, then either don't use -release (even in production), or do something like

enforce!Error(cond, "msg");

instead of

assert(cond, "msg");

and then you'll get an Error thrown when the condition fails - even with -release.
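
A minimal sketch of that substitution, assuming a made-up function `half` whose precondition is a logic invariant rather than user input:

```d
import std.exception : enforce;

/// Hypothetical function with an internal invariant: n must be even.
int half(int n)
{
    // Plays the role of assert(n % 2 == 0, "..."), but enforce is an ordinary
    // library call, so the check remains when compiling with -release.
    enforce!Error(n % 2 == 0, "half() requires an even argument");
    return n / 2;
}

void main()
{
    import std.stdio : writeln;

    writeln(half(10));

    try
    {
        half(3);
    }
    catch (Error e)
    {
        // Caught here only to show that the Error is thrown; normally an
        // Error signals a bug and should be left to terminate the program.
        writeln("Error: ", e.msg);
    }
}
```

The trade-off is the one described above: the check costs something in production, in exchange for the logic error still being caught there.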


On a side note, I would point out that talking about "debug mode" with D gets annoyingly ambiguous, because that kind of implies the -debug flag, which has nothing to do with assertions and which actually can be used in conjunction with -release (all -debug does is enable debug{} blocks), which is why I try to avoid the term debug mode - though I assume that you meant when -release isn't used, since that's often what folks mean.
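
To make the independence of the two switches concrete, here is a small sketch; the function names are invented for the example, while `debug` and `version (assert)` are the language's own conditional-compilation constructs.

```d
import std.stdio : writeln;

/// Hypothetical helper: true only when debug{} blocks are compiled in (-debug).
bool debugBlocksEnabled()
{
    debug
        return true;
    else
        return false;
}

/// Hypothetical helper: true only when assertion checks are being emitted.
bool assertsEnabled()
{
    version (assert)
        return true;
    else
        return false;
}

void main()
{
    writeln("debug blocks compiled in: ", debugBlocksEnabled());
    writeln("assertions compiled in:   ", assertsEnabled());
}
```

Building this with dmd four times (no switches, -debug, -release, and -debug -release) should show the two values varying independently of each other.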

- Jonathan M Davis

October 13, 2017
On Friday, 13 October 2017 at 02:22:24 UTC, Jonathan M Davis wrote:
> You've told it that i should never be 3 at that point and that it's a bug if it is, and as such, it is free to assume that i is never 3 after the assertion even if the assertion is compiled out with -release - that is the only place that undefined behavior may enter into it.

Thanks for the clarification! This is a difference to C where assert has only a diagnostic purpose. Disabling assertions in C (by setting NDEBUG) does AFAICS neither introduce undefined behavior nor is the compiler entitled to optimize code away based on the assertion. This C program

--- test.c
#include <stdio.h>
#define NDEBUG 1
#include <assert.h>
int main ()
{
   int i = 3;
   assert (i != 3);
   if (i == 3)
      printf ("%d\n", i);
   return 0;
}
---

is IMHO conforming and it is defined to print 3 in a conforming environment. The 'corresponding' D program

--- assert4.d
import std.stdio;
int main ()
{
   int i = 3;
   assert (i != 3);
   if (i == 3)
      writef ("%d\n", i);
   return 0;
}
---

is 'conforming' (but buggy) under non-release-D and 'non-conforming' (because of the undefined behavior) otherwise. Is this judgement correct?

> If the compiler does an optimization based on the fact that i isn't 3, and it is, and -release is used, then you could get some weird behavior when the code reaches the lines after the assertion - but by definition, you already have a bug if i is 3, and your program in general is assuming that i isn't 3 at that point, so you're going to get bad behavior either way.

I would like to make a clear distinction between "bug" or "bad behavior" on the one hand and "undefined behavior" on the other. "Bug" and "bad behavior" address the outcome of a computation while "undefined behavior" is an (abstract, formal) property of a piece of code with respect to a certain language specification.

> The fact that your assertion failed means that you have a logic error in your program, and it is therefore in an invalid state and will likely not behave correctly.

Under non-release-D the program is perfectly valid and behaves exactly as expected. In release-D it makes no sense to discuss whether the state of the program is valid or whether the program behaves correctly, since it is non-conforming because of the undefined behavior.

(...)

> On a side note, I would point out that talking about "debug mode" with D gets annoyingly ambiguous, because that kind of implies the -debug flag, which has nothing to do with assertions and which actually can be used in conjunction with -release (all -debug does is enable debug{} blocks), which is why I try to avoid the term debug mode - though I assume that you meant when -release isn't used, since that's often what folks mean.

Agreed.
October 13, 2017
On Friday, October 13, 2017 11:26:54 kdevel via Digitalmars-d-learn wrote:
> On Friday, 13 October 2017 at 02:22:24 UTC, Jonathan M Davis wrote:
> > You've told it that i should never be 3 at that point and that it's a bug if it is, and as such, it is free to assume that i is never 3 after the assertion even if the assertion is compiled out with -release - that is the only place that undefined behavior may enter into it.
>
> Thanks for the clarification! This is a difference to C where assert has only a diagnostic purpose. Disabling assertions in C (by setting NDEBUG) does AFAICS neither introduce undefined behavior nor is the compiler entitled to optimize code away based on the assertion. This C program
>
> --- test.c
> #include <stdio.h>
> #define NDEBUG 1
> #include <assert.h>
> int main ()
> {
>     int i = 3;
>     assert (i != 3);
>     if (i == 3)
>        printf ("%d\n", i);
>     return 0;
> }
> ---
>
> is IMHO conforming and it is defined to print 3 in a conforming environment. The 'corresponding' D program
>
> --- assert4.d
> import std.stdio;
> int main ()
> {
>     int i = 3;
>     assert (i != 3);
>     if (i == 3)
>        writef ("%d\n", i);
>     return 0;
> }
> ---
>
> is 'conforming' (but buggy) under non-release-D and
> 'non-conforming' (because of the undefined behavior) otherwise.
> Is this judgement correct?

Essentially, though talking about conformance usually has to do with the spec.

In both C/C++ and D, if you use an assertion, you're saying that if the assertion fails, then the logic in your code is faulty, and there is a bug in your program. With C/C++, it may not be codified that the compiler understands that, but the meaning is the same. If the assertion is compiled out but would have failed, then your program is in an invalid state and will do who-knows-what. By definition, you're screwed. Your program is doing something that you have said should never happen. How screwed you actually are can vary considerably, and if all it does is print out the value and never use it again, then you're not very screwed, but that also means that it was a rather odd assertion (though you're obviously doing that here as an example and not something that someone would normally do).

Because D's compiler does understand what assert means, it is allowed to optimize based on that fact. So, it _can_ generate code based on the assumption that the assertion succeeded, which can increase how screwed you are if the assertion is compiled out but would have failed, but your program is in an invalid state either way, because you've asserted that something is true when it isn't and thus indicated that if it isn't true, there is a bug in your program, and its logic is wrong. And as soon as the logic in your program is wrong, then it's not going to behave correctly. It's just a question of how badly behaved it will be.

So, we can talk about the behavior being undefined if the assertion would have failed on the basis that the compiler could generate optimized code that assumes that the assertion succeeded and thus do weirder things than it would have done if the code hadn't been optimized that way, but as far as the language is concerned, it's undefined behavior due to the simple fact that you've asserted that something is true which isn't. You yourself have stated that something must be true for your program to be valid, and it isn't true.

As long as the assertions are compiled in, then the fact that the logic in your program was invalid is caught, whereas if they're not compiled in, then it's not caught. But if the asserted condition is false, then your program is wrong either way.

- Jonathan M Davis

October 14, 2017
On Thursday, 12 October 2017 at 15:37:23 UTC, John Burton wrote:
>
> This is an example of what I mean :-
>
> undefined what it is meant to do anyway, so the compiler can "optimize" out the if condition as it only affects the case where the language doesn't define what it's supposed to do anyway, and compiles the code as if it was :-
>
> void test(int[] data)
> {
>     control_nuclear_reactor();
> }
>

Yeah the C/C++ community/haters love to talk about all the code the compiler can inject because of undefined behavior. But that is not what it means.

The compiler does not know the value of data.length so it could not make such a transformation of the code. Now had the assert been written before the if, you're telling the compiler some properties of data.length before you check it and it could make such optimizations.

The point is assert tells the compiler something it can use to reason about its job, not that it can insert additional runtime checks to see if your code is invalid and then add new jumps to execute whatever the hell it wants.
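
A rough illustration of the "assert written before the if" case, using a made-up function `describe` in place of the original example:

```d
import std.stdio : writeln;

/// Hypothetical stand-in for the function discussed in this thread.
string describe(const int[] data)
{
    // States that an empty slice at this point would be a bug in the caller.
    assert(data.length != 0, "caller must pass at least one element");

    // Given the assertion above, an optimizer may treat this branch as dead
    // once assertions are compiled out; with assertions compiled in, the
    // assert fails before this line is reached for an empty slice.
    if (data.length == 0)
        return "Not enough data!";

    return "ok";
}

void main()
{
    writeln(describe([1, 2, 3]));
}
```

Whether a given compiler actually removes the branch is an implementation detail; the sketch only shows where the assertion gives it the information to do so.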

October 14, 2017
On Saturday, October 14, 2017 05:20:47 Jesse Phillips via Digitalmars-d-learn wrote:
> The point is assert tells the compiler something it can use to reason about its job, not that it can insert additional runtime checks to see if your code is invalid and then add new jumps to execute whatever the hell it wants.

+1

- Jonathan M Davis

October 14, 2017
On 14.10.2017 07:20, Jesse Phillips wrote:
> On Thursday, 12 October 2017 at 15:37:23 UTC, John Burton wrote:
>>
>> This is an example of what I mean :-
>>
>> undefined what it is meant to do anyway, so the compiler can "optimize" out the if condition as it only affects the case where the language doesn't define what it's supposed to do anyway, and compiles the code as if it was :-
>>
>> void test(int[] data)
>> {
>>     control_nuclear_reactor();
>> }
>>
> 
> Yeah the C/C++ community/haters love to talk about all the code the compiler can inject because of undefined behavior. But that is not what it means.
> ...

It can mean that, but that is not even what happened in the given example.

> The compiler does not know the value of data.length so it could not make such a transformation of the code.

The compiler can easily prove that the value of data.length does not change between the two points in the program. According to the specification, the behavior of the program is undefined in case the assertion fails, not just the behavior of the program after the assertion would have failed if it had not been removed.

> Now had the assert been written before the if, you're telling the compiler some properties of data.length before you check it and it could make such optimizations.
> 
> The point is assert tells the compiler something it can use to reason about its job, not that it can insert additional runtime checks to see if your code is invalid and then add new jumps to execute whatever the hell it wants.
> 

In the above example, a branch was removed, not added.

However, optimizers can add branches. (For example, it can check whether there is aliasing and use optimized code if it is not the case.)

Also, UB can and does sometimes mean that the program can execute arbitrary code. It's called "arbitrary code execution": https://en.wikipedia.org/wiki/Arbitrary_code_execution
October 14, 2017
On Saturday, 14 October 2017 at 09:32:32 UTC, Timon Gehr wrote:
> Also, UB can and does sometimes mean that the program can execute arbitrary code. It's called "arbitrary code execution": https://en.wikipedia.org/wiki/Arbitrary_code_execution

This confuses different levels of reasoning. In C/C++ "undefined behavior" is a property of the SOURCE code with respect to the specification. It states: The spec does not apply, it does not define the semantics.

This issue is totally different from the question of what a given program containing undefined behavior actually does after it is compiled and after the linker produces an executable. This is reasoning about generated MACHINE code.

A result of this confusion has been that some clever people tried to "detect" certain kinds of undefined behavior "after" they "happened". E.g. <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30475> This is the danger of undefined behavior: The MACHINE code may also work as the programmer expected. At least for some time.


October 15, 2017
On 14.10.2017 23:36, kdevel wrote:
> On Saturday, 14 October 2017 at 09:32:32 UTC, Timon Gehr wrote:
>> Also, UB can and does sometimes mean that the program can execute arbitrary code. It's called "arbitrary code execution": https://en.wikipedia.org/wiki/Arbitrary_code_execution
> 
> This confuses different levels of reasoning.

It's a correct statement about the semantics of programs produced from sources with UB by standard-compliant compilers.

> In C/C++ "undefined behavior" is a property of the SOURCE code with respect to the specification. It states: The spec does not apply, it does not define the semantics.
> ...

I.e., the semantics of a program produced by a conforming compiler can be arbitrary.

> This issue is totally different from the question of what a given program containing undefined behavior actually does after it is compiled and after the linker produces an executable. This is reasoning about generated MACHINE code.
> ...

Sure. This is very much intentional. The current subthread is about what kind of programs the compiler might produce (in practice) if the provided source code contains UB. The claim I was refuting was that the produced program cannot have branching and other behaviors not specified in the source.

> A result of this confusion has been that some clever people tried to "detect" certain kinds of undefined behavior "after" they "happened". E.g. <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30475> This is the danger of undefined behavior: The MACHINE code may also work as the programmer expected. At least for some time.
> 
> 

I'm not confused about this at all.
October 17, 2017
On Saturday, 14 October 2017 at 09:32:32 UTC, Timon Gehr wrote:
> The compiler can easily prove that the value of data.length does not change between the two points in the program. According to the specification, the behavior of the program is undefined in case the assertion fails, not just the behavior of the program after the assertion would have failed if it had not been removed.

You are right, in this example proving that there is no change between the condition and the assert is easy and possible. In fact, I think there was an example of this in C with an uninitialized function pointer, where the optimizer identified that only one valid function could have been assigned and lowered the indirect call to a direct one.

My statement was more about the case where the compiler/optimizer can't determine the value:

    void test(int[] data, bool goboom)
    {
        if (data.length == 0) {
            writeln("Not enough data!");
        } else {
            control_nuclear_reactor(data);
        }

        assert(goboom);
    }

The optimizer can generate code to match:


    void test(int[] data, bool goboom)
    {
        if(!goboom) {
            launch_nuclear_missile();
            return;
        }

        if (data.length == 0) {
            writeln("Not enough data!");
        } else {
            control_nuclear_reactor(data);
        }
    }


> Also, UB can and does sometimes mean that the program can execute arbitrary code. It's called "arbitrary code execution": https://en.wikipedia.org/wiki/Arbitrary_code_execution

That article is about attacks, not optimizers.
