November 08, 2019
On Friday, 8 November 2019 at 20:28:32 UTC, Sebastiaan Koppe wrote:
> I for one have had many occasions where I had a rogue task or a leaking websocket.
>
> With structured concurrency writing that code is easier, but more importantly, reviewing is easier as well.

Right, code with multiple futures can look a bit messy, as the management of the different futures and the main flow often end up being interleaved in the source code.

> You still have futures with a nursery. It is just that you can't leave the nursery block without all concurrent tasks completing, timing out or being canceled.

So one can easily build something nursery-like on top of a futures library that provides a wait_for_all(…) function, or on top of OS X dispatch queues with dispatch_group_wait(…).
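For instance, here is a minimal sketch of that idea on top of std.parallelism (the task bodies are placeholders; yieldForce plays the role of wait_for_all):

import std.parallelism : task, taskPool;

int parent()
{
    auto a = task!(() => 1 + 1)();   // placeholder work
    auto b = task!(() => 2 + 2)();
    taskPool.put(a);
    taskPool.put(b);
    // the "nursery exit": yieldForce blocks until each task has
    // completed, so neither task can outlive this scope
    return a.yieldForce + b.yieldForce;
}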

I can see how that could be useful sometimes, for sure.

November 08, 2019
On 11/8/2019 8:10 AM, Doc Andrew wrote:
> I am very bullish on formal verification in languages going forward. The one big advantage that D has in this arena is DbC. I was disappointed in your NWCPP talk that you listed it as a "miss." :) The potential here is huge! I think it's entirely possible that we can do _compile-time_ proofs that D contracts can be satisfied.

assert()s are the contracts that matter, not the in and out contracts, which can be replaced with assert()s. In no way did I mean to imply that assert()s were a mistake!

I've thought for 35 years that assert()s can be the input to a program prover, i.e. it's nothing new, but I do not have the expertise to figure out how such proofs might work, beyond simplistic cases.

November 09, 2019
On 09/11/2019 11:41 AM, Walter Bright wrote:
> 
> I've thought for 35 years that assert()s can be the input to a program prover, i.e. it's nothing new, but I do not have the expertise to figure out how such proofs might work, beyond simplistic cases.

Sounds like some universities need to be contacted; perhaps a future PhD student would be interested in doing the theory.
November 09, 2019
On Friday, 8 November 2019 at 22:41:23 UTC, Walter Bright wrote:
>
> assert()s are the contracts that matter, not the in and out contracts, which can be replaced with assert()s. In no way did I mean to imply that assert()s were a mistake!
>
> I've thought for 35 years that assert()s can be the input to a program prover, i.e. it's nothing new, but I do not have the expertise to figure out how such proofs might work, beyond simplistic cases.

Now you've done it. I've seen this before, right before templates were added to D. This is where you disappear for a couple of weeks and then come back to announce that D can formally verify user code, right? :)

The in/out syntax doesn't add a lot for typical function body code, since everything is in sequential order anyhow, but in interfaces these contracts would be essential. They may be under-utilized, but I think they might just be one of those features that turn out to be late bloomers. I wouldn't consider taking them out of the language just yet, even for regular functions.
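For instance, a minimal sketch (the Stack interface here is hypothetical) of how contracts on an interface bind every implementation:

interface Stack
{
    bool empty();

    int pop()
    in { assert(!empty); }    // every implementation inherits this check

    void push(int x)
    out { assert(!empty); }   // and must leave the stack non-empty
}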

Proving properties of D code outright is probably tough, but doing a mechanical conversion (of at least some subset of D, maybe betterC) to an intermediate language like WhyML and then feeding it into a solver is probably possible. Here's an interesting project I found that can generate Why code from LLVM IR: https://github.com/AnnotationsForAll/WhyR. It might be interesting to throw some LDC IR at it and just see what happens!

Given that D already has contracts in place, using them to generate Why's "ensures" and "requires" clauses would be much easier than in a language like C, where you'd need specially formatted comments or a language extension.

Even without specific contracts, a list of things that you _know_ are run-time errors (divide-by-zero, null-pointer dereference, etc.) could be proven absent from code. The nice thing about an approach like this is that you don't have to change the language or mess with the compiler; instead, the prover is just another tool, like a linter, that you can use if you need it.
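For example (a trivial sketch), a prover could discharge the implicit division check here from the control flow alone, with no annotations:

int average(const int[] a)
{
    if (a.length == 0)
        return 0;
    int sum = 0;
    foreach (x; a)
        sum += x;
    // a.length != 0 on this path, so divide-by-zero is provably absent
    return sum / cast(int) a.length;
}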

Not everybody is as optimistic as I am, but with the emphasis on safety these days, I suspect FV will become a requirement in the future.

-Doc
November 09, 2019
On Saturday, 9 November 2019 at 02:18:41 UTC, rikki cattermole wrote:
>
> Sounds like some universities need to be contacted; perhaps a future PhD student would be interested in doing the theory.

It could be done in baby steps, too - maybe start with a tool that just shows no bad behavior occurs in a simple expression. Then show the same for entire @pure functions. From there someone could work up to proving that pre/post-conditions in @pure functions are satisfied. Then maybe go on to do invariants, then work up to @safe @nogc code, and so on.

-Doc
November 09, 2019
On Saturday, 9 November 2019 at 02:18:41 UTC, rikki cattermole wrote:
> Sounds like some universities need to be contacted; perhaps a future PhD student would be interested in doing the theory.

Not sure what theory that would be. The basic idea is well known and pretty straightforward:

1. Try to prove that the assert holds by feeding the function into an external prover.
2. If it fails, retain the assert in the code or omit it.
3. If it succeeds, replace the assert with an «assume(…)» statement.
4. Compile.

You can do this today if you want to.
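In D terms the transformation would look something like this (assume(…) is hypothetical; today it could be an empty function the optimizer is taught about):

int clamp0(int x)
{
    const r = x < 0 ? 0 : x;
    // step 1: ask the external prover to discharge this assert
    assert(r >= 0);
    // step 3: on success, the tool rewrites it to a zero-cost fact:
    // assume(r >= 0);
    return r;
}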

The challenge is to make sure that the optimizer will make good use of it.



November 09, 2019
On Saturday, 9 November 2019 at 02:55:37 UTC, Doc Andrew wrote:
> It could be done in baby steps, too - maybe start with a tool that just shows no bad behavior occurs in a simple expression.

Exactly right.

PREREQUISITE:

Add something like «assume» to the language and make the optimizer add those to its fact database. For LLVM this should be trivial.

Then add «assume» clauses to the standard library that are manually proven correct and also add an option to convert assume to assert for additional unit-testing. This allows you to get started working on the optimizer and figure out how to make good use of the extra facts.
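A library-level sketch of that option (the CheckAssumes version identifier is made up):

/// Treated as a proven fact normally; building with -version=CheckAssumes
/// turns every assume back into a runtime assert for extra unit-testing.
void assume(bool cond, string msg = "assumption violated")
{
    version (CheckAssumes)
        assert(cond, msg);
    // else: no runtime effect; a compiler could lower this to LLVM's
    // llvm.assume intrinsic to feed the optimizer's fact database
}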

Also, optimizing the standard library affects all compiled programs... so even small gains might be worthwhile.


STAGE 2:

Add ensure-statements to the language. Those will be converted to assume-statements if you throw a switch, but display a warning that all ensure-statements have to be proven correct by an external verifier before compilation. Thus the entire burden has been moved to an external tool.

This would encourage third parties to work on tools to translate D code into verifiable code to be consumed by standard verification tooling.

Extension: create an online certification service that allows library code to carry a certificate that the code has been verified, so that all its ensure statements can be turned into assumes without throwing a switch. Certificates can be revoked (in case you find a bug in the verification process).


STAGE 3:

Then break all dynamic checks out as asserts by transforming the code, so that all bounds checks turn up as asserts. Break composite asserts into individual asserts. (A sketch of the bounds-check case follows the steps below.)

Verifying individual functions:
step 1.1: use data-flow to convert as many asserts as possible into assumes
step 1.2: use an external verifier to prove others if possible

If those fail, try to do the verification at the call site:
step 2.1: create a unique copy of the function for the call site
step 2.2: use dataflow to bring additional assumed preconditions into the called function
step 2.3: go to step 1.1
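The bounds-check sketch mentioned above:

int sum(const int[] a)
{
    int s = 0;
    foreach (i; 0 .. a.length)
    {
        assert(i < a.length); // the bounds check, broken out as an assert;
                              // dataflow proves it from the loop bounds, so
                              // it becomes an assume and the check is elided
        s += a[i];
    }
    return s;
}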

> Then show the same for entire @pure functions. From there someone could work up to proving that pre/post-conditions in @pure functions are satisfied. Then maybe go on to do invariants, then work up to @safe @nogc code, and so on.

Yup, start with the easy ones, like pure functions provided as parameters to functions that work on arrays, which happens often. One could even limit it to the ones used in the standard library.


November 09, 2019
On 08.11.19 23:41, Walter Bright wrote:
> On 11/8/2019 8:10 AM, Doc Andrew wrote:
>> I am very bullish on formal verification in languages going forward. The one big advantage that D has in this arena is DbC. I was disappointed in your NWCPP talk that you listed it as a "miss." :) The potential here is huge! I think it's entirely possible that we can do _compile-time_ proofs that D contracts can be satisfied.
> 
> assert()s are the contracts that matter, not the in and out contracts, which can be replaced with assert()s. In no way did I mean to imply that assert()s were a mistake!
> 
> I've thought for 35 years that assert()s can be the input to a program prover, i.e. it's nothing new, but I do not have the expertise to figure out how such proofs might work, beyond simplistic cases.
> 

You need in and out contracts for modular verification.
November 09, 2019
On Saturday, 9 November 2019 at 11:19:08 UTC, Ola Fosheim Grøstad wrote:
>
> PREREQUISITE:
>
> Add something like «assume» to the language and make the optimizer add those to its fact database. For LLVM this should be trivial.

As a stop-gap we might be able to use an @assume UDA, but I do think "assume" could be added as a reserved keyword in the near future in anticipation...
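A minimal sketch of such a stop-gap (entirely hypothetical; a tool would read the UDA, the compiler would ignore it):

/// A fact an external prover has verified, carried as a plain UDA.
struct assume { string expr; }

@assume("x != 0")
int scaled(int x)
{
    return 1_000_000 / x; // a tool that trusts the UDA can discharge
                          // the divide-by-zero check here
}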

>
> Then add «assume» clauses to the standard library that are manually proven correct and also add an option to convert assume to assert for additional unit-testing.

I like the idea of having a compiler flag to treat assumes as asserts, maybe even in release mode?

> This allows you to get started working on the optimizer and figure out how to make good use of the extra facts.

That's probably a much longer-term goal, but just being able to move normal error-checking into contracts and depend on them in release code is a big win IMO.

> Also, optimizing the standard library affects all compiled programs... so even small gains might be worthwhile.
>
>
> STAGE 2:
>
> Add ensure-statements to the language. Those will be converted to assume-statements if you throw a switch, but display a warning that all ensure-statements have to be proven correct by an external verifier before compilation. Thus the entire burden has been moved to an external tool.
>

This might be more difficult to enforce without some tight compiler integration. An attribute like @ensured or @certified could be the "gold standard" of safe D code, which would require a proof of no run-time errors and all contracts. Without the proof, the compiler could throw an error for functions with this attribute.

Or just treat it as a UDA, let the proof be a separate thing, and let the user sort it all out.

SPARK does the latter, but I like the idea of better compiler integration and support. We'd probably need a DIP to hash this out.

> This would encourage third parties to work on tools to translate D code into verifiable code to be consumed by standard verification tooling.
>
> Extension: create an online certification service that allows library code to carry a certificate that the library code has been verified. So that all their ensure statements can be turned into assume without throwing a switch. Certificates can be revoked (in case you find a bug in the verification process).
>

This might tie in with some of the work going on with reproducible builds, but I love the idea of certs for known-good object files. Could you use a Merkle tree built up by the compiler? So the source file hashes are baked into the object file (maybe just have a special symbol), and then executables would contain a hash of all the object hashes. This is sort of an ancillary project to FV, but if we are talking high-integrity software, it might be an easy win and a cool feather in D's cap.
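A rough sketch of the hashing side using std.digest.sha (the two-level scheme here is just an assumption for illustration):

import std.algorithm : sort;
import std.digest.sha : sha256Of;
import std.file : read;

// One leaf hash per source file, then a root hash over the sorted leaves;
// the root would be baked into the executable for later auditing.
ubyte[32] buildRoot(string[] sourceFiles)
{
    sort(sourceFiles); // canonical order
    ubyte[] leaves;
    foreach (f; sourceFiles)
    {
        const ubyte[32] leaf = sha256Of(cast(const(ubyte)[]) read(f));
        leaves ~= leaf[];
    }
    return sha256Of(leaves);
}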

>
> STAGE 3:
>
> Then break all dynamic checks out as asserts by transforming the code. So that all bounds checks turn up as asserts. Break composite asserts into individual asserts.

Using something like the @certified (or @ensured, or @verified, or whatever) attribute, we could just turn off run-time checks in these functions if a proof exists; it could be as simple as a source.d.proof file with the function name (and hash?) in it.

>
> Verifying individual functions:
> step 1.1: use data-flow to convert as many asserts as possible into assumes

I'm not sure about this one. Usually assumes are big red flags; they're the FV version of unsafe - "trust me, I know what I'm doing", that sort of thing. I might have misunderstood your intent here, though.

> step 1.2: use an external verifier to prove others if possible
>
> If those fail, try to do the verification at the call site:
> step 2.1: create a unique copy of the function for the call site
> step 2.2: use dataflow to bring additional assumed preconditions into the called function
> step 2.3: go to step 1.1

This is how SPARK works: even if you can't prove a (normal Ada) function body, because of inline asm, or pointer arithmetic, or something, you can still specify pre- and post-contracts on the interface. I think there's a little bit of a @trusted feel to it, since the post-conditions on a non-SPARK function become sort of an assumption that the function does what it claims, given that the preconditions hold. I think SPARK may leave these as run-time checks (normal Ada behavior, normal D _Debug_ behavior), which you'd have to do to have any sort of confidence in your proof.

This is where things get a little messy. Verified functions calling non-verified functions with post-condition checks would need these checks enabled even in release mode, I would think.

The cool thing is that D _today_ has all the syntax it needs to start working on tooling for this with no language changes.

Short-term, it may be of some benefit to reserve a couple keywords like assume or ensure, maybe set aside some attributes like @verified or @certified or @ensured.

Medium term, allowing invariant blocks to be associated with specific loops would probably be needed, plus maybe some extra compiler support for ensuring the presence of proofs.
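Today the closest thing is an assert at the top of the loop body; a per-loop invariant block (hypothetical syntax in the comment) would hand the prover the same fact directly:

int sumTo(int n)
in (n >= 0)
{
    int s = 0;
    foreach (i; 0 .. n + 1)
    {
        // invariant (s == i * (i - 1) / 2)  // hypothetical loop syntax
        assert(s == i * (i - 1) / 2);        // what we can write today
        s += i;
    }
    return s;
}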

Longer term, I really like the idea of a "trusted language base" with certificates and object hashes that a maintainer can use to ensure complete accounting of everything that went into a particular build. I'm not as concerned as some are with proven builds and linking, but there may be high-integrity requirements in the future where this sort of thing is required. I'm not sure if it's a thing yet; maybe NASA does it, but I'd be surprised if anybody else is doing it yet.

All it takes is a high-profile case of somebody's cloud CI pipeline being compromised to use a malicious library, and everybody is going to be jumping on this sort of thing.

It may be worth a DIP or a DConf session or something to hash out some of the long-term vision, but I think it's possible to start working on tools now.

I'm not totally familiar with the landscape, but generating WhyML from D seems like it would be one good way to start. You get access to a lot of solvers for free, and it seems well-documented and fairly easy to use.

-Doc
November 09, 2019
On Saturday, 9 November 2019 at 14:49:32 UTC, Doc Andrew wrote:
> As a stop-gap we might be able to use an @assume UDA, but I do think "assume" could be added as a reserved keyword in the near future in anticipation...

True, I meant to refer to the concept, not the syntax. Maybe the syntax for assume should be ugly enough to really stand out, so that it is caught in reviews: "@ASSUME" or "@__assume"... Dunno. That is a detail, though.


> I like the idea of having a compiler flag to treat assumes as asserts, maybe even in release mode?

It could at least be useful in cases where you do exhaustive unit testing (testing all combinations); then assume would be safe.


>> This allows you to get started working on the optimizer and figure out how to make good use of the extra facts.
>
> That's probably a much longer-term goal, but just being able to move normal error-checking into contracts and depend on them in release code is a big win IMO.

Well, not necessarily; some C++ compilers support it. The problem is that you probably don't want to introduce too many auxiliary facts, as that would just slow down compilation with no gain (and perhaps even confuse a naive optimizer). Anyway, you should be able to add facts to LLVM by compiling a boolean expression and then assuming it to hold.

But maybe the language syntax should allow you to select which propositions you want to propagate to the optimizer. Not sure if LLVM can express quantifiers though (forall/exists).


> This might be more difficult to enforce without some tight compiler integration. An attribute like @ensured or @certified could be the "gold standard" of safe D code, which would require a proof of no run-time errors and all contracts. Without the proof, the compiler could throw an error for functions with this attribute.

Well, that might be an issue: if the ensure-statement is meant to help optimization, then it could simply be omitted; if it is meant to help correctness, then it could be translated into an assert when it isn't verified.

But I guess one will have to distinguish the two cases somehow, which is basically why I think trying to improve basic libraries with assume-statements and manual proofs is the best starting point. Then you get an idea of how it can affect optimization.

You probably also want to bake trust-building meta-information into a certificate, such as which verification method was used, and who did it and where, so that developers can specify constraints like "must be verified by service X using verifier Y or Z", or open up for hand-verified code by developer Q or a specific organization (could be a URL to a public key).


> We'd probably need a DIP to hash this out.

I will certainly read and comment on it, if you want to write one. You could start by speccing out a list of requirements.


> This might tie in with some of the work going on with reproducible builds, but I love the idea of certs for known-good object files.

I actually meant source code only. You distribute the source code, but with certified "requires"/"assume"/"ensures" statements that are bound to the source code using cryptographic hashing and an online index (so that they can be checked or revoked).

I guess one issue is that it has to be a standalone library (e.g. only depend on a specific version of the standard library) for this to work. Otherwise you would have to tie it to a specific certified version of all the libraries you depend on.

But... I don't actually think that is a big deal; you would typically pre-verify basic libraries. You could use a simple concatenation for the hash, with sorted filenames and some sentinel character sequence separating the files, e.g.: filename1, sep, file1, sep, filename2, sep, file2, sep, sep.


> Using something like the @certified (or @ensured, or @verified, or whatever) attribute, we could just turn off run-time checks in these functions if a proof exists; it could be as simple as a source.d.proof file with the function name (and hash?) in it.

I am actually not sure how this could be done well, because you could have array accesses within a conditional expression.

Maybe you'd actually have to annotate the indexing with some new syntax to tell the compiler that the access does not need a bounds check, as it has already been proven to hold... So I guess this is an area that needs to be fleshed out with many examples.

Anyway, it seems reasonable that a tool would take source code, verify it and emit an annotated version of the source code. Then less capable compilers could use the verified results.

Most users are not going to write verifiable code themselves.


> I'm not sure about this one. Usually assumes are big red flags; they're the FV version of unsafe - "trust me, I know what I'm doing", that sort of thing. I might have misunderstood your intent here, though.

I think so. If they are proven to hold by a tool/compiler, then they hold. Dataflow analysis is sufficient when it succeeds, but it will fail in many cases, as it is a weak approach. However, it is also much faster than a proper solver; it's basically just an optimization for performance reasons.

But you also want to use assume manually sometimes, when describing hardware registers (e.g. describing that bits 8-32 are always zero), or when something has been proven by hand and has a significant performance impact.
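For example (a sketch; readStatus is a hypothetical memory-mapped read, and assume is a placeholder helper):

void assume(bool cond) { if (!cond) assert(0); } // placeholder

uint readStatus(); // hypothetical volatile register read

uint lowByte()
{
    const status = readStatus();
    // the datasheet guarantees the upper bits are zero; this is not
    // provable from the code, so it is a manual, trusted assumption
    assume((status & 0xFFFF_FF00) == 0);
    return status & 0xFF; // the optimizer may now drop the mask
}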


> This is how SPARK works: even if you can't prove a (normal Ada) function body, because of inline asm, or pointer arithmetic, or something, you can still specify pre- and post-contracts on the interface. I think there's a little bit of a @trusted feel to it, since the post-conditions on a non-SPARK function become sort of an assumption that the function does what it claims, given that the preconditions hold. I think SPARK may leave these as run-time checks (normal Ada behavior, normal D _Debug_ behavior), which you'd have to do to have any sort of confidence in your proof.

Yes, for typical SPARK use cases that seems reasonable.


> This is where things get a little messy. Verified functions calling non-verified functions with post-condition checks would need these checks enabled even in release mode, I would think.

Yes, this is where different goals may lead to different designs... so the challenge is to come up with something that can both support correctness and optimization without being extremely complicated in terms of syntax.


> The cool thing is that D _today_ has all the syntax it needs to start working on tooling for this with no language changes.

Sure, you could also just define dummy "ensures", "requires", whatever… functions that take a boolean as a parameter, with an empty body or an assert.
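Something like this (a sketch):

// Dummy contract markers a tool can recognize; the assert bodies keep
// them meaningful in debug builds, and they compile away in release.
void requires(bool cond) { assert(cond); }
void ensures(bool cond) { assert(cond); }

int absolute(int x)
{
    requires(x != int.min); // precondition: the result is representable
    const r = x < 0 ? -x : x;
    ensures(r >= 0);
    return r;
}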


> a particular build. I'm not as concerned as some are with proven builds and linking, but there may be high-integrity requirements in the future where this sort of thing is required. I'm not sure if it's a thing yet, maybe NASA does it, but I'd be surprised if anybody else is yet.

There are papers on proof-carrying assembly code...

Although the basic idea behind certification would be that you get to use the postconditions and assumes without having to run costly verification on your own machine.

You might even see a commercial service with very strong provers and effective hardware.


> I'm not totally familiar with the landscape, but generating WhyML from D seems like it would be one good way to start. You get access to a lot of solvers for free, and it seems well-documented and fairly easy to use.

Perhaps WhyML is a good choice, but I would've started with Boogie.