Thread overview
[dmd-internals] Errors emitted by the optimizer and the as-if rule
Aug 18, 2012
David Nadlinger
Aug 18, 2012
kenji hara
Aug 18, 2012
David Nadlinger
Aug 19, 2012
Iain Buclaw
Aug 19, 2012
Marco Leise
Aug 20, 2012
Iain Buclaw
Aug 20, 2012
David Nadlinger
Aug 20, 2012
Jonathan M Davis
Aug 20, 2012
David Nadlinger
Aug 20, 2012
Jonathan M Davis
August 18, 2012
Kenji's fix for issue 8339 introduces code which accesses a
void-initialized variable without setting it first:
https://github.com/D-Programming-Language/phobos/commit/4a0546480078d51a0ad4dbf141c71bf818e034ec

I'm aware that it is contained inside an is(typeof()) check, and it seems like a clever solution to work around the underlying issue. However, DMD, with the optimizer enabled, would actually refuse to compile the code (even if a __traits(compiles, …) check returned true!), because access of uninitialized data is detected during data flow analysis somewhere in the optimizer, producing a "variable used before set"-type error.

My point, which is only tangentially related to the bug fix in question, is that the optimizer in DMD grossly violates the "as-if" rule by changing the definition of what is legal D code. For example, the spec [1] says that accessing an uninitialized value results in undefined behavior. To me, this implies that code like this is fine if it is never executed. Yet, DMD for example refuses to compile a function which contains such an access of uninitialized data (or a null pointer dereference, …), even if that function is never invoked.

In my opinion, this is clearly a bug. We can chose to either make accessing uninitialized data illegal – but the semantics of this are difficult to define in the general case due to the requirement of elaborate data flow analysis, somewhat defeating the point of having VoidInitializer in the first place –, _or_ to make it undefined behavior. Right now, we are doing a little of both, which is arguable the worst solution. It causes unexpected behavior of is(typeof()) and friends, and, much worse, leads to problems with cross-compiler compatibility: If a program includes piece of code which uses uninitialized data is never executed at runtime, it works fine when compiled GDC and LDC, in perfect agreement with the spec, yet DMD refuses to accept it.

Yes, emitting a warning if use of uninitialized data is detected is most likely a good idea, but changing the language semantics depending on whether the optimizer is enabled is not.

What do you think?

David


[1] http://dlang.org/declaration.html#VoidInitializer
_______________________________________________
dmd-internals mailing list
dmd-internals@puremagic.com
http://lists.puremagic.com/mailman/listinfo/dmd-internals
August 19, 2012
2012/8/19 David Nadlinger <code@klickverbot.at>:
> Kenji's fix for issue 8339 introduces code which accesses a
> void-initialized variable without setting it first:
> https://github.com/D-Programming-Language/phobos/commit/4a0546480078d51a0ad4dbf141c71bf818e034ec
>
> I'm aware that it is contained inside an is(typeof()) check, and it seems like a clever solution to work around the underlying issue. However, DMD, with the optimizer enabled, would actually refuse to compile the code (even if a __traits(compiles, …) check returned true!), because access of uninitialized data is detected during data flow analysis somewhere in the optimizer, producing a "variable used before set"-type error.
>
> My point, which is only tangentially related to the bug fix in question, is that the optimizer in DMD grossly violates the "as-if" rule by changing the definition of what is legal D code. For example, the spec [1] says that accessing an uninitialized value results in undefined behavior. To me, this implies that code like this is fine if it is never executed. Yet, DMD for example refuses to compile a function which contains such an access of uninitialized data (or a null pointer dereference, …), even if that function is never invoked.
>
> In my opinion, this is clearly a bug. We can chose to either make accessing uninitialized data illegal – but the semantics of this are difficult to define in the general case due to the requirement of elaborate data flow analysis, somewhat defeating the point of having VoidInitializer in the first place –, _or_ to make it undefined behavior. Right now, we are doing a little of both, which is arguable the worst solution. It causes unexpected behavior of is(typeof()) and friends, and, much worse, leads to problems with cross-compiler compatibility: If a program includes piece of code which uses uninitialized data is never executed at runtime, it works fine when compiled GDC and LDC, in perfect agreement with the spec, yet DMD refuses to accept it.
>
> Yes, emitting a warning if use of uninitialized data is detected is most likely a good idea, but changing the language semantics depending on whether the optimizer is enabled is not.
>
> What do you think?

My answer is: optimizer doesn't affect the result of semantic analysis. In other words, flow analysis by the optimizer also doesn't affect to the compilation.

With the current DMD implementation, there is two *flow analysis*, the former is checked by the front-end, the latter is checked by the back-end. See the example code.

void foo() nothrow
{
    throw new Exception("");
    // "function test.foo 'foo' is nothrow yet may throw"
}
int bar(int n)
{
    if (n != 0)
        n = 1;
    else
        assert(0);
    // "function test.bar no return exp; or assert(0); at end of function"
}
void main()
{
    int n = void;
    printf("%d\n", n);
    // "Error: variable n used before set" with -O switch
}

nothrow and no return analysis, which occurs with and without -o-
switch, are checked by the front-end, and that is a part of language
specification.
The "used before set"  error disappears if you don't specify -O.
Therefore it is backend optimizer's work, and is not a part of the
spec.

Today, the flow analysis for uninitialized variable is not needed in semantic analysis, so it doesn't become matter.

Kenji Hara
_______________________________________________
dmd-internals mailing list
dmd-internals@puremagic.com
http://lists.puremagic.com/mailman/listinfo/dmd-internals
August 18, 2012
On Sat, Aug 18, 2012 at 8:38 PM, kenji hara <k.hara.pg@gmail.com> wrote:
> My answer is: optimizer doesn't affect the result of semantic analysis. In other words, flow analysis by the optimizer also doesn't affect to the compilation.

I think you ares  completely missing the point missing the point. Flow analysis performed by the optimizer _does_ affect compilation, because it leads to code being treated as illegal which would be perfectly fine otherwise. How much bigger of an effect on language semantics can there be?

> The "used before set"  error disappears if you don't specify -O. Therefore it is backend optimizer's work, and is not a part of the spec.

Which is a big problem. Whether a program compiles or not should be clear from the spec, and not be dependent on the compiler writer's mood. If the optimizer detects undefined behavior, it can enable all sorts of crazy optimizations – for example, LLVM routinely optimizes down big chunks of code with just a trap if it detects undefined values –, but it can't change language semantics: if the spec says that something is undefined behavior, executing the code results in, well, undefined behavior, but the code is still perfectly legal, the compiler should _not_ be allowed to generate an error.

To illustrate my point, consider the following example. It contains code which exhibits undefined behavior, but since this code is never invoked, there should be no problem:

---
void foo() {
   int a = void;
   import std.stdio;
   writeln(a);
}
void main() {}
----

David
_______________________________________________
dmd-internals mailing list
dmd-internals@puremagic.com
http://lists.puremagic.com/mailman/listinfo/dmd-internals
August 19, 2012
On 18 August 2012 20:01, David Nadlinger <code@klickverbot.at> wrote:
> On Sat, Aug 18, 2012 at 8:38 PM, kenji hara <k.hara.pg@gmail.com> wrote:
>> My answer is: optimizer doesn't affect the result of semantic analysis. In other words, flow analysis by the optimizer also doesn't affect to the compilation.
>
> I think you ares  completely missing the point missing the point. Flow analysis performed by the optimizer _does_ affect compilation, because it leads to code being treated as illegal which would be perfectly fine otherwise. How much bigger of an effect on language semantics can there be?
>
>> The "used before set"  error disappears if you don't specify -O. Therefore it is backend optimizer's work, and is not a part of the spec.
>
> Which is a big problem. Whether a program compiles or not should be clear from the spec, and not be dependent on the compiler writer's mood. If the optimizer detects undefined behavior, it can enable all sorts of crazy optimizations – for example, LLVM routinely optimizes down big chunks of code with just a trap if it detects undefined values –, but it can't change language semantics: if the spec says that something is undefined behavior, executing the code results in, well, undefined behavior, but the code is still perfectly legal, the compiler should _not_ be allowed to generate an error.
>
> To illustrate my point, consider the following example. It contains code which exhibits undefined behavior, but since this code is never invoked, there should be no problem:
>
> ---
> void foo() {
>    int a = void;
>    import std.stdio;
>    writeln(a);
> }
> void main() {}
> ----
>
> David

You can only be certain that a function is never invoked unless it is marked 'static' IMO.  My opinion is that it should warn you anyway, as it is potentially buggy code, even if unused.




-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
_______________________________________________
dmd-internals mailing list
dmd-internals@puremagic.com
http://lists.puremagic.com/mailman/listinfo/dmd-internals

August 19, 2012
Am 19.08.2012, 10:25 Uhr, schrieb Iain Buclaw <ibuclaw@ubuntu.com>:

> You can only be certain that a function is never invoked unless it is
> marked 'static' IMO.  My opinion is that it should warn you anyway, as
> it is potentially buggy code, even if unused.

What does static add to the mix?

---- b.d ----
static foo() {
    ...
}
---- a.d ----
import b;
void main() {
    foo();
}

Compiled as separate objects, b.foo can still be linked against.
_______________________________________________
dmd-internals mailing list
dmd-internals@puremagic.com
http://lists.puremagic.com/mailman/listinfo/dmd-internals

August 20, 2012
On 19 August 2012 10:34, Marco Leise <Marco.Leise@gmx.de> wrote:
> Am 19.08.2012, 10:25 Uhr, schrieb Iain Buclaw <ibuclaw@ubuntu.com>:
>
>
>> You can only be certain that a function is never invoked unless it is marked 'static' IMO.  My opinion is that it should warn you anyway, as it is potentially buggy code, even if unused.
>
>
> What does static add to the mix?
>
> ---- b.d ----
> static foo() {
>     ...
> }
> ---- a.d ----
> import b;
> void main() {
>     foo();
> }
>
> Compiled as separate objects, b.foo can still be linked against.
>

I consider static functions local only to the current compilation unit.  It is not a global/public decl that can have it's reference pulled in from an external object / library.


-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
_______________________________________________
dmd-internals mailing list
dmd-internals@puremagic.com
http://lists.puremagic.com/mailman/listinfo/dmd-internals

August 20, 2012
On 19 Aug 2012, at 10:25, Iain Buclaw wrote:
> You can only be certain that a function is never invoked unless it is marked 'static' IMO.  My opinion is that it should warn you anyway, as it is potentially buggy code, even if unused.

Yes, sure, emitting a warning is fine, even more so if the code is known to be live.

My point about foo being unused is that DMD emits a hard error for a piece of code which is perfectly to spec (the example snipped for VoidInitializer even shows a similar case) and doesn't exhibit undefined behavior.

Either cases like this should be made illegal, implying that alternate compiler are allowed to perform the check in the frontend (checking for this during codegen can only be described as a hack, even if it is more or less zero-cost that way), or DMD should stop to emit an error.

Things like this actually do hurt compatibility across compilers (as nobody is going to implement the check like DMD does) - I once ran into a similar case with Thrift, where DMD complained about a null pointer dereference in dead code.

David
_______________________________________________
dmd-internals mailing list
dmd-internals@puremagic.com
http://lists.puremagic.com/mailman/listinfo/dmd-internals

August 20, 2012
On Monday, August 20, 2012 20:29:05 David Nadlinger wrote:
> On 19 Aug 2012, at 10:25, Iain Buclaw wrote:
> > You can only be certain that a function is never invoked unless it is marked 'static' IMO. My opinion is that it should warn you anyway, as it is potentially buggy code, even if unused.
> 
> Yes, sure, emitting a warning is fine, even more so if the code is known to be live.

Because of -w, there is essentially zero difference between a warning and an error except whether everyone sees it or not. They need to be fixed in either case. So, if what Kenji did is okay, then it should be neither a warning nor an error. If it's not okay, then a warning may be the correct approach, but the code will need to be changed regardless.

- Jonathan M Davis
_______________________________________________
dmd-internals mailing list
dmd-internals@puremagic.com
http://lists.puremagic.com/mailman/listinfo/dmd-internals

August 20, 2012
On Mon, Aug 20, 2012 at 8:33 PM, Jonathan M Davis <jmdavisProg@gmx.com> wrote:
> Because of -w, there is essentially zero difference between a warning and an error except whether everyone sees it or not.

This is not true, see your favorite C compiler's "warnings-as-errors" switch. Warnings are precisely intended for cases where the code is legal, but likely to be wrong or error-prone.

David
_______________________________________________
dmd-internals mailing list
dmd-internals@puremagic.com
http://lists.puremagic.com/mailman/listinfo/dmd-internals

August 20, 2012
On Monday, August 20, 2012 20:50:10 David Nadlinger wrote:
> On Mon, Aug 20, 2012 at 8:33 PM, Jonathan M Davis <jmdavisProg@gmx.com>
wrote:
> > Because of -w, there is essentially zero difference between a warning and an error except whether everyone sees it or not.
> 
> This is not true, see your favorite C compiler's "warnings-as-errors" switch. Warnings are precisely intended for cases where the code is legal, but likely to be wrong or error-prone.

It doesn't really matter what C does. Because dmd has -w, which makes warnings errors, and I believe that Phobos is compiled with it, _nothing_ can be a warning if it's ever the case that it's truly reasonable to leave it as-is. If it's a warning, it needs to be fixed. It might not be as critical as an error, but since -w makes it an error, there's ultimately not much difference.

- Jonathan M Davis
_______________________________________________
dmd-internals mailing list
dmd-internals@puremagic.com
http://lists.puremagic.com/mailman/listinfo/dmd-internals