July 31, 2014
On Thursday, 31 July 2014 at 18:58:11 UTC, Walter Bright wrote:
> On 7/31/2014 4:28 AM, David Bregman wrote:
>> Sigh. Of course you can assume the condition after a runtime check has been
>> inserted. You just showed that
>>
>> assert(x); assume(x);
>>
>> is semantically equivalent to
>> assert(x);
>>
>> as long as the runtime check is not elided. (no -release)
>
> No. I showed that you cannot have an assert without the assume. That makes them equivalent in that direction.

That is only true if assert always generates a runtime check, i.e. it is not true for C/C++ assert (and so far, D assert) in release mode.
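
To make the release-mode case concrete, here is a minimal C sketch. The names `is_positive`, `checked_use`, `release_use` and `RELEASE_ASSERT` are made up for illustration; `RELEASE_ASSERT` only mimics the `((void)0)` expansion that the C standard specifies for assert when NDEBUG is defined.

```c
#undef NDEBUG            /* keep the real assert() live for the checked path */
#include <assert.h>

static int evaluations = 0;   /* counts how often the condition is really tested */

static int is_positive(int x) { evaluations++; return x > 0; }

/* Checked build: the condition is evaluated at run time. */
int checked_use(int x) { assert(is_positive(x)); return x; }

/* Hypothetical stand-in for what assert(c) expands to under NDEBUG. */
#define RELEASE_ASSERT(c) ((void)0)

/* Release build: no check is emitted, so a false condition goes unnoticed. */
int release_use(int x) { RELEASE_ASSERT(is_positive(x)); return x; }
```

Calling release_use(-1) simply returns -1 without ever testing the condition, so nothing has established x > 0 at that point, and an optimizer that assumed it anyway would be working from an unchecked fact.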

> For the other direction, adding in a runtime check for an assume is going to be expected of an implementation.

No. It is expected that assume does /not/ have a runtime check. Assume is used to help the compiler optimize based on trusted facts; a runtime check could easily defeat the purpose of such micro-optimizations.

> And, in fact, since the runtime check won't change the semantics if the assume is correct, they are equivalent.

Right, only "if the assume is correct". So they aren't equivalent if it isn't correct.

Q.E.D. ?

>> But you still want assert to become assume in release mode? How
>> will you handle the safety issue?
>
> I don't know yet.

I would think the easiest way is to just not inject the assumption when inside @safe code, but I don't know anything about the compiler internals.

Even for @system code, I'm on the fence about whether asserts should affect codegen in release. It doesn't seem like a clear tradeoff to make: safety vs. some dubious optimization gains. Do we really want to go down the same road as C with undefined behavior?

I would need to think about it more, but if D adopted that route, I would at least feel like I need to be much more careful with asserts, so I'm not accidentally making my code more buggy instead of less. I think it warrants discussion, anyways.
July 31, 2014
On 7/31/2014 9:02 AM, Artur Skawina via Digitalmars-d wrote:
>>> The solution is to tell the compiler that you really need that newly
>>> (over-)written data. Eg
>>>
>>>      asm {"" : : "m" (*cast(typeof(password[0])[9999999]*)password.ptr); }
>>
>> inline asm is not portable
>
> That's why a portable compiler barrier interface is needed.
> But this was just an example showing a zero-cost solution. A portable
> fallback is always possible (the bug report was about C code -- there,
> a loop that reads the data and stores a copy into a volatile location
> would work).

This is not a "barrier" operation. There you are thinking of atomic operations. This is a case of a "volatile" operation, and this supports it for D:

  https://github.com/D-Programming-Language/druntime/pull/892

Of course, someone has to actually pull it!

July 31, 2014
On 07/31/2014 11:29 PM, Sean Kelly wrote:
> On Thursday, 31 July 2014 at 21:11:17 UTC, Walter Bright wrote:
>> On 7/31/2014 1:52 PM, Sean Kelly wrote:
>>> Could you expand on what you consider input?
>>
>> All state processed by the program that comes from outside the
>> program. That would include:
>>
>> 1. user input
>> 2. the file system
>> 3. uninitialized memory
>> 4. interprocess shared memory
>> 5. anything received from system APIs, device drivers, and DLLs that
>> are not part of the program
>> 6. resource availability and exhaustion
>
> So effectively, any factor occurring at runtime.  If I create a
> library, it is acceptable to validate function parameters using
> assert() because the user of that library knows what the library
> expects and should write their code accordingly.  That's fair.

It is most fair inside the 'in' contract.
July 31, 2014
On Thursday, 31 July 2014 at 21:25:25 UTC, Walter Bright wrote:
> On 7/31/2014 1:33 PM, David Nadlinger wrote:
>> I've had the questionable pleasure of tracking down a couple of related issues
>> in LLVM and the LDC codegen, so please take my word for it: Requiring any
>> particular behavior such as halting in a case that can be assumed to be
>> unreachable is at odds with how the term "unreachable" is used in the wild – at
>> least in projects like GCC and LLVM.
>
> For example:
>
>  int foo() {
>    while (...) {
>        ...
>    }
>    assert(0);
>  }
>
> the compiler needn't issue an error at the end "no return value for foo()" because it can assume it never got there.
>
> I'll rewrite that bit in the spec as it is clearly causing confusion.

Don't rewrite it because you merely concede that it might be confusing. Rewrite it because you admit it's contradictory. If you just try to reword the spec without understanding how your use of the terminology differs from the established meaning, you'll probably come up with something that is confusing to the rest of the world just as well.

Perhaps looking at the situation in terms of basic blocks and the associated control flow graph will help:

As per your above post, assert(0) has nothing to do with making any assumptions on the compiler side. It merely servers as a terminator instruction of a BB, making it a leaf in the CFG. This seems to be the definition you intend for the spec. Maybe add something along the lines of "behaves like a function call that never returns" as an explanation to make it easier to understand.

This is not what "unreachable" means. If assert(0) were unreachable, then the compiler would be free to assume that no CFG edges *into* the BB holding the instruction are ever taken (and as a corollary, it could also decide not to emit any code for it). Thus, the term certainly shouldn't appear anywhere near assert(0) in the spec, except to point out the difference.
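
The difference can be sketched in C, assuming GCC or Clang (`__builtin_unreachable` is their extension, not standard C). The function names are made up for illustration.

```c
#include <stdlib.h>

/* "Leaf in the CFG": control may legitimately arrive at the last line,
   and when it does the program halts. abort() never returns, so the
   compiler needs no return statement after it. */
int sign_or_halt(int x)
{
    if (x > 0) return 1;
    if (x < 0) return -1;
    abort();
}

/* "Unreachable": a promise to the compiler that control never arrives.
   It may delete the x == 0 path entirely; calling this with 0 is
   undefined behavior, not a guaranteed halt. */
int sign_assume_nonzero(int x)
{
    if (x > 0) return 1;
    if (x < 0) return -1;
    __builtin_unreachable();
}
```

Both functions behave identically for nonzero input. Only the first one says anything about what happens for x == 0, which is the behavior the spec text for assert(0) seems to be after.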

Cheers,
David
July 31, 2014
On 31.07.2014 23:59, Walter Bright wrote:
> On 7/31/2014 10:40 AM, Daniel Gibson wrote:
>> It's a major PITA to debug problems that only happen in release builds.
>
> Debugging optimized code was a well known problem even back in the 70's.
> Nobody has solved it, and nobody wants unoptimized code.
>

Yeah, and because of this I'd like optimizations not to cause different behavior if at all possible, to keep these kinds of bugs as rare as possible.

And I agree with your stance on those fine-grained optimization switches from your other post. GCC currently has 191 flags that influence optimization[1] (+ a version that negates them for most), and I don't understand what most of them do, so it would be hard for me to decide which optimizations I want and which I don't want.

However, what about an extra flag for "unsafe" optimizations?
I'd like the compiler to do inlining, replace int multiplications by powers of two with shifts, and perform other "safe" optimizations that don't change the semantics of my program (see the examples in the post you quoted), but I *don't* want it to e.g. remove writes to memory that isn't read afterwards or make assumptions based on assertions (that are disabled in the current compile mode).

A warning mode would also be helpful: one that tells me about "dead"/"superfluous" code that would be eliminated in an optimized build, so I can check whether that would break anything for me without trying to understand the asm output.

Cheers,
Daniel


[1] according to $ gcc-4.8 --help=optimizer | grep "^  -" | wc -l
July 31, 2014
On Thursday, 31 July 2014 at 22:17:56 UTC, David Nadlinger wrote:
> servers

Gah, "serves". Also, I hope the post didn't come across as condescending, as it certainly wasn't intended that way. I just figured it would be a good idea to define the terms we are using, as we seemed to be continuously talking past each other.

David
July 31, 2014
On 08/01/14 00:08, Walter Bright via Digitalmars-d wrote:
> On 7/31/2014 9:02 AM, Artur Skawina via Digitalmars-d wrote:
>>>> The solution is to tell the compiler that you really need that newly
>>>> (over-)written data. Eg
>>>>
>>>>      asm {"" : : "m" (*cast(typeof(password[0])[9999999]*)password.ptr); }
>>>
>>> inline asm is not portable
>>
>> That's why a portable compiler barrier interface is needed.
>> But this was just an example showing a zero-cost solution. A portable
>> fallback is always possible (the bug report was about C code -- there,
>> a loop that reads the data and stores a copy into a volatile location
>> would work).
> 
> This is not a "barrier" operation. There you are thinking of atomic operations. This is a case of a "volatile" operation, and this supports it for D:
> 
>   https://github.com/D-Programming-Language/druntime/pull/892

It's a _compiler_ barrier and has nothing to do with atomic ops or
volatile. It simply tells the compiler (in this case) 'I'm going
to read the data in the memory locations pointed to by the password.ptr'.
That means that the compiler has to make sure that the data is there,
before the `asm` executes; it cannot assume that the stores are dead
and can not optimize them away. The actual (emitted) asm does nothing.
It's just a way to communicate to the compiler that the data is needed.
Since in this case the point was just to overwrite /other/ security
sensitive data present at this location, nothing else is necessary. We
don't actually care about the new content, we only pretend we do, so
that the compiler isn't able to optimize across this barrier.

Exposing compiler barriers in a portable way in D would definitely be a good idea. Reasonably decent implementations are easy to write. For example, the above functionality can be achieved via a pure function that takes a reference to a static array. The function would do nothing and just immediately return to the caller; it only needs to be opaque from the optimizer's POV. This version wouldn't be zero-cost like the example above, but it would still be very cheap (usually just a call+ret sequence), correct, and enough for many non-perf-sensitive use cases like the one described in that bug report.
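
A rough C analogue of that do-nothing function, as a sketch only: C has no portable way to mark a function opaque, so this version calls it through a volatile function pointer, which GCC and Clang in practice do not see through. The names `touch_nothing`, `opaque_touch` and `wipe` are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Does nothing; it exists only so that there is something to call. */
static void touch_nothing(const void *p, size_t n) { (void)p; (void)n; }

/* The pointer is volatile, so it has to be loaded at each call and the
   compiler cannot rely on knowing which function it points to. */
static void (*const volatile opaque_touch)(const void *, size_t) = touch_nothing;

/* Overwrite buf, then pretend to read it. The unknown callee might look
   at the bytes, so the memset stores cannot be treated as dead. */
void wipe(void *buf, size_t len)
{
    memset(buf, 0, len);
    opaque_touch(buf, len);
}
```

The cost is one indirect call and a return per wipe, which matches the "very cheap but not zero-cost" tradeoff described above.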

artur

August 01, 2014
On 7/31/2014 3:21 PM, Daniel Gibson wrote:
> And I agree with your stance on those fine-grained optimization switches from
> your other post. GCC currently has 191 flags that influence optimization[1]

Just consider this from a testing standpoint. As I mentioned previously, optimizations interact with each other to produce emergent behavior. GCC has 191 factorial different optimizers. Google's calculator puts 191! at infinity, which it might as well be.


> However, what about an extra flag for "unsafe" optimizations?

There's been quite a creep of adding more and more flags. Each one of them is, in a way, a failure of design, and we are all too quick to reach for that.


> I *don't*
> want it to e.g. remove writes to memory that isn't read afterwards

That's what volatileStore() is for.


> or make assumptions based on assertions (that are disabled in the current compile mode).

This is inexorably coming. If you cannot live with it, I suggest writing your own version of assert, using the Phobos 'enforce' implementation as a model. It'll do what you want.


> And maybe a warning mode that tells me about "dead"/"superfluous" code that
> would be eliminated in an optimized build so I can check if that would break
> anything for me in that respect without trying to understand the asm output
> would be helpful.

If you compile DMD with -D, and then run it with -O --c, it will present you with a list of all the data flow optimizations performed on the code. It's very useful for debugging the optimizer. Although I think you'll find it illuminating, you won't find it very useful - for one thing, a blizzard of info is generated.

August 01, 2014
On 7/31/2014 3:07 PM, David Bregman wrote:
> On Thursday, 31 July 2014 at 18:58:11 UTC, Walter Bright wrote:
>> On 7/31/2014 4:28 AM, David Bregman wrote:
>>> Sigh. Of course you can assume the condition after a runtime check has been
>>> inserted. You just showed that
>>>
>>> assert(x); assume(x);
>>>
>>> is semantically equivalent to
>>> assert(x);
>>>
>>> as long as the runtime check is not elided. (no -release)
>>
>> No. I showed that you cannot have an assert without the assume. That makes
>> them equivalent in that direction.
>
> That is only true if assert always generates a runtime check. i.e. it is not
> true for C/C++ assert (and so far, D assert) in release mode.
>
>> For the other direction, adding in a runtime check for an assume is going to
>> be expected of an implementation.
>
> No. It is expected that assume does /not/ have a runtime check. Assume is used
> to help the compiler optimize based on trusted facts, doing a runtime check
> could easily defeat the purpose of such micro optimizations.

I'm rather astonished you'd take that position. It opens the door wide to undefined behavior, and offers no obvious way of verifying that the assume() is correct.

I'm confident that if D introduced such behavior, the very first comment would be "I need it to insert a runtime check on demand."


>> And, in fact, since the runtime check won't change the semantics if the assume
>> is correct, they are equivalent.
>
> Right, only "if the assume is correct". So they aren't equivalent if it isn't
> correct.
>
> Q.E.D. ?

I'm not buying those uncheckable semantics as being workable and practical.


>>> But you still want assert to become assume in release mode? How
>>> will you handle the safety issue?
>>
>> I don't know yet.
>
> I would think the easiest way is to just not inject the assumption when inside
> @safe code, but I don't know anything about the compiler internals.
>
> Even for @system code, I'm on the fence about whether asserts should affect
> codegen in release, it doesn't seem like a clear tradeoff to make: safety vs
> some dubious optimization gains.

So why do you want assume() with no checking whatsoever? Does anybody want that? Why are we even discussing such a misfeature?

> Do we really want to go down the same road as C
> with undefined behavior?

So you don't want assume()? Who does?

August 01, 2014
On 7/31/2014 2:01 PM, ponce wrote:
> This also puzzles me. There is the point where the two types of errors blend to
> the point of being uncomfortable.
>
> Eg: a program generates files in X format and can also read them with a X
> parser. Its X parser will only ever read output generated by itself. Should
> input errors in X parser be checked with assert or exceptions?

Exceptions. Although there are grey areas, this is not one. Filesystems are subject to all kinds of failures, exhaustions, modification by other processes, etc., which are not logic bugs in your program.


If you're brave and want to have some fun, fill up your hard disk so it is nearly full. Now run your favorite programs that read and write files. Sit back and watch the crazy results (far too many programs assume that writes succeed). Operating systems also behave erratically in this scenario, hence the 'brave' suggestion.
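
For what it's worth, checking the writes is not much code. A minimal C sketch (the function name `save_text` is made up for illustration) that reports failure instead of assuming success:

```c
#include <stdio.h>
#include <string.h>

/* Returns 0 on success, -1 if opening, writing, or closing failed. */
int save_text(const char *path, const char *text)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    size_t len = strlen(text);
    int ok = (fwrite(text, 1, len, f) == len);

    /* Buffered data may only reach the disk during fclose, so a full
       disk can show up here rather than in fwrite. */
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

The caller gets an error code to handle, which is the C counterpart of throwing an exception: an environmental failure is reported and dealt with, not asserted away.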
