August 04, 2014
On Monday, 4 August 2014 at 02:56:35 UTC, David Bregman wrote:
> On Monday, 4 August 2014 at 02:40:49 UTC, deadalnix wrote:
>> Allow me to chime in. I don't have much time to follow the whole thing, but I have had this in mind for quite a while.
>>
>> First things first: the proposed behavior is what I had in mind for SDC since pretty much day 1. It already uses hints to tell the optimizer the branch won't be taken, but I definitely want to go further.
>
> Not everyone had that definition in mind when writing their asserts.
>
>> By definition, when an assert has been removed in release that would have failed in debug, you are in undefined behavior land already. So there is no reason not to optimize.
>
> By the new definition, yes. But is it reasonable to change the definition, and then retroactively declare previous code broken? Maybe the ends justify the means in this case but it certainly isn't obvious that they do. I don't understand why breaking code is sacrilege one time, and the next time can be done without any justifications.

The fact that the compiler can optimize based on assert is not new in the D world. Maybe it wasn't advertised properly, but it always was an option.

If one wants to make sure a check is done, one can use expect.
August 04, 2014
On Monday, 4 August 2014 at 03:22:51 UTC, Andrei Alexandrescu wrote:
> On 8/3/14, 6:59 PM, David Bregman wrote:
>> w.r.t the one question about performance justification: I'm not
>> necessarily asking for research papers and measurements, but based on
>> these threads I'm not aware that there is any justification at all. For
>> all I know this is all based on a wild guess that it will help
>> performance "a lot", like someone who optimizes without profiling first.
>> That certainly isn't enough to justify code breakage and massive UB
>> injection, is it? I hope we can agree on that much at least!
>
> I think at this point (without more data) a bit of trust in one's experience would be needed. I've worked on performance on and off for years, and so has Walter. We have plenty of war stories that inform our expertise in the matter, including weird stuff like "swap these two enum values and you'll get a massive performance regression although the code is correct either way".
>
> I draw from numerous concrete cases that the right/wrong optimization at the right/wrong place may as well be the difference between winning and losing. Consider the recent PHP engine that gets within 20% of hhvm; heck, I know where to go to make hhvm 20% slower with 50 lines of code (compare at 2M+). Conversely, gaining those 20% took months multiplied by Facebook's best engineers.
>
> Efficiency is hard to come by and easy to waste. I consider Walter's take on "assert" a modern, refreshing take on an old pattern that nicely preserves its spirit, and a good opportunity and differential advantage for D. If anything, these long threads have strengthened that belief. It has also clarified to me that:
>
> (a) We must make sure we don't transform @safe code into unsafe code; in the first approximation that may simply mean assert() has no special meaning in release mode. Also, bounds checking would probably need to not be elided by assert. I consider these challenging but in good, gainful ways.
>
> (b) Deployment of optimizations must be carefully staggered and documented.
>
>
> Andrei

First of all, thank you for the reply.

I agree with nearly everything you say. I also have significant experience with code optimization. I greatly enjoyed the talk you gave on C++ optimization, partly because it validated what I've spent so much of my own efforts doing.

I think we reach different conclusions from our experience, though. My feeling is that typical asserts are unlikely to contain much information that can give a speedup.

This is not to say that the compiler can't be helped by extra information, on the contrary I wholeheartedly believe it can. However I would guess this will usually require the asserts to be specifically written for that purpose, using inside knowledge about the kinds of information the optimizer is capable of using.

In the end there is no substitute for measurement, so if we rely on experience alone, we're both just guessing. Is it really justified to break code on a hunch that it will help performance? Considering the downsides of reusing existing asserts, what if you're wrong about the performance?

If new, specialized asserts need to be written anyways, we might as well use a new keyword and avoid all the downsides, essentially giving the best of both worlds.

Also, I'm still curious about how you are evaluating the performance tradeoff in the first place, or do you even see it as a tradeoff? Is your estimation of the downside so small that any performance increase at all is sufficient to justify the semantic change, UB injection and code breakage? If so then I see why you treat it as a foregone conclusion; certainly in a large enough codebase there will be some asserts here and there that allow you to shave off some instructions.
August 04, 2014
This. 1000x this.

Atila

On Monday, 4 August 2014 at 01:17:23 UTC, John Carter wrote:
> On Sunday, 3 August 2014 at 19:47:27 UTC, David Bregman wrote:
>
>> 2. Semantic change.
>> The proposal changes the meaning of assert(), which will result in breaking existing code. Regardless of philosophizing about whether or not the code was "already broken" according to some definition of assert, the fact is that shipping programs that worked perfectly well before may no longer work after this change.
>
> Subject to the caveat suggesting having two assert's with different names and different meanings, I am in the position to comment on this one from experience.
>
> So assuming we do have a "hard assert" that is used within the standard libraries and a "soft assert" in user code (unless they explicitly choose to use the "hard assert"....)
>
> What happens?
>
> Well, I'm the dogsbody who has the job of upgrading the toolchain and handling the fallout of doing so.
>
> So I have been walking multimegaline code bases through every gcc version in the last 15 years.
>
> This is relevant because on every new version they have added stricter warnings, and more importantly, deeper optimizations.
>
> It's especially the deeper optimizations that are interesting here.
>
> They are often better data flow analysis which result in more "insightful" warnings.
>
> So given I'm taking megalines of C/C++ code from a warnings free state on gcc version N to warnings free on version N+1, I'll make some empirical observations.
>
> * They have _always_ highlighted dodgy / non-portable / non-standard compliant code.
> * They have quite often highlighted existing defects in the code.
> * They have quite often highlighted error handling code as "unreachable", because it is... and the only sane thing to do is delete it.
> * They have often highlighted the error handling code of "defensive programmers" as opposed to DbC programmers.
>
> Why? Because around 30% of the code of a defensive programmer is error handling crud that has never been executed, not even in development and hence is untested and unmaintained.
>
> The clean up effort was often fairly largish, maybe a week or two, but always resulted in better code.
>
> Customer impacting defects introduced by the new optimizations have been....
>
> a) Very very rare.
> b) Invariably from really bad code that was blatantly defective, non-standard compliant and non-portable.
>
> So what do I expect, from experience from Walter's proposed change?
>
>
> Another guy in this thread complained about the compiler suddenly relying on thousands of global axioms from the core and standard libraries.
>
> Yup.
>
> Exactly what is going to happen.
>
> As you get...
>
> * more and more optimization passes that rely on asserts,
> * in particular pre and post condition asserts within the standard libraries,
> * you are going to have flocks of user code that used to compile without warning
> * and ran without any known defect...
>
> ...suddenly spewing error messages and warnings.
>
> But that's OK.
>
> Because I bet 99.999% of those warnings will be pointing straight at bona fide defects.
>
> And yes, this will be a regular feature of life.
>
> New version of compiler, new optimization passes, new warnings... That's OK, clean 'em up, and a bunch of latent defects won't come back as customer complaints.

August 04, 2014
On Sunday, 3 August 2014 at 19:47:27 UTC, David Bregman wrote:
> 4. Performance.
> Q4a. What level of performance increases are expected of this proposal, for a representative sample of D programs?
> Q4b. Is there any threshold level of expected performance required to justify this proposal? For example, if a study determined that the average program could expect a speedup of 0.01% or less, would that still be considered a good tradeoff against the negatives?
> Q4c. Have any works or studies, empirical or otherwise, been done to estimate the expected performance benefit? Is there any evidence at all for a speedup sufficient to justify this proposal?
> Q4d. When evaluating the potential negative effects of the proposal on their codebase, D users may decide it is now too risky to compile with -release. (Even if their own code has been constructed with the new assert semantics in mind, the libraries they use might not). Thus the effect of the proposal would actually be to decrease the performance of their program instead of increase it. Has this been considered in the evaluation of tradeoffs?

I'd like to add:

Q4e: Have other alternatives been taken into consideration that could achieve the same performance gains, but in a safer way? I'm particularly thinking about whole program optimization. I suspect that WPO can prove most of what can only be assumed with asserts.
August 04, 2014
Should this semantics extend to array bounds checking, i.e. after the statement

foo[5] = 0;

can the optimizer assume that foo.length >= 6 ?
August 04, 2014
On 8/4/14, 7:27 AM, Matthias Bentrup wrote:
> Should this semantics extend to array bounds checking, i.e. after the
> statement
>
> foo[5] = 0;
>
> can the optimizer assume that foo.length >= 6 ?

Yes, definitely. -- Andrei

August 04, 2014
On Monday, 4 August 2014 at 09:38:26 UTC, Atila Neves wrote:
> This. 1000x this.
>
> Atila

Yeah, I don't think anyone disagrees with getting better warning and error messages. Static analysis rocks.

Anyways I just want to point out that this isn't what's being proposed, so it's kind of off topic. It's not an argument either for or against the proposal, just in case that wasn't clear.
August 04, 2014
On Sunday, 3 August 2014 at 23:54:46 UTC, John Carter wrote:
> I know that program proving is impossibly hard, so my asserts are a kind of short cut on it.

Yes, but the dual is that writing a correct program is impossibly hard. A correct program works as specified for all improbable input and configurations. No shipped programs are correct.

However, if you turn asserts into assume, you let the compiler use any defect in the program or the specification to prove "true==false". And after that all bets are off.

With asserts on, you can tell where the flaw is.

With asserts off and logging on you can figure out where the flaw is not.

With asserts turned to assumes, no amount of logging can help you. You just know there is a flaw.

Worse, an improbably occurring bug can now become a probable one.


> When I assert, I'm stating "In my architecture, as I designed it, this will always be true, and everything in the code downstream of here can AND DOES rely on this.

But it does not matter if it holds. The deduction engine in the compiler is not required to limit itself to the location of the "assert turned into axiom". It can move it upstream and downstream.

It is also not only a problem of mismatch between two axioms, but between any derivable theorems.

> My code explicitly relies on these simplifying assumptions, and will go hideously wrong if those assumptions are false.... So why can't the compiler rely on them too?

Because the compiler can move them around and will assume all improbable configurations and input sequences.

> Of course it can, as every single line I write after the assert is absolutely relying on the assert being true."

Yes, but the axioms can move anywhere. And any contradiction derivable from any set of axioms can lead to boolean expressions turned to random values anywhere in your program. Not only near the flawed assert turned into an axiom.

> My asserts are never "I believe this is true".
>
> They are _always_ "In this design, the following must be true, as I'm about to code absolutely relying on this fact."

Yes, but if you state it differently elsewhere, indirectly (through a series of axioms), you may have a contradiction from which you can deduce "true==false"

Please note that any potentially reachable code will be included in the axiom database: not only the code that will execute, but also branches that will never execute in a running program.

Those can now propagate upwards since they are true.

Almost no shipped programs are correct. They are all wrong, but we take them as "working" because we don't push them to extremes very often.

Let me quote from the CompCert webpage:

http://compcert.inria.fr/motivations.html

<<More recently, Yang et al generalized their testing of C compilers and, again, found many instances of miscompilation:

We created a tool that generates random C programs, and then spent two and a half years using it to find compiler bugs. So far, we have reported more than 325 previously unknown bugs to compiler developers. Moreover, every compiler that we tested has been found to crash and also to silently generate wrong code when presented with valid inputs. (PLDI 2011)

For non-critical, "everyday" software, miscompilation is an annoyance but not a major issue: bugs introduced by the compiler are negligible compared to those already present in the source program. >>


August 04, 2014
On Sunday, 3 August 2014 at 23:05:23 UTC, Timon Gehr wrote:
> On 08/04/2014 12:51 AM, John Carter wrote:
>>> But go ahead. This will lead to a fork.
>>
>> What should fork is the two opposing intentions for assert.
>>
>> They should have two different names and different consequences.
>
> Yes. :)

If "assert" ends up having assume semantics, then it basically means that you will have to rewrite all libraries.

Switching the names of "assert" and "assume" is comparable to asking me to drive a car where the accelerator and brake pedals have switched positions. Adjusting the compiler is less work…
August 05, 2014
On Monday, 4 August 2014 at 03:22:51 UTC, Andrei Alexandrescu wrote:
> Efficiency is hard to come by and easy to waste.

It's perfectly understandable why one would want unsafe optimizations, and D already has a way to provide unsafe features: safe is the default, and unsafe is possible when requested explicitly. I'd say an -Ounsafe switch would be good; it would enable various unsafe optimizations. Other examples are fast math and the no-signed-overflow assumption. They just shouldn't be forced on everyone; then you will get the upside you want and others won't get the downside they didn't want.

Efficiency is not very hard to find elsewhere: `assert` is currently not the most promising direction for optimization. I'd recommend improving inlining, including deep inlining of ranges, so that they become as fast as hand-written loops or even faster. That would be much more beneficial. Or ask bearophile; he may have more ideas for very beneficial optimizations. Or ask everyone.