January 16, 2020
On Thu, Jan 16, 2020 at 10:30:45AM -0500, Steven Schveighoffer via Digitalmars-d wrote: [...]
> There are two things to look at for safety. One is that a function is safe or not safe (that is, it has a safe implementation, even if there are calls to system functions, so therefore is callable from mechanically checked safe code). This is the part where the compiler uses function attributes to determine what is callable and what is not.
> 
> The second is how much manual review is needed for the code. This is a signal to the reviewer/reader. In the current regime, the two reasons for marking are muddled -- we don't have a good way to say "this needs manual checking, but I also want the benefits of mechanical checking". This is why I proposed a change to trusted code STILL being mechanically checked, unless you want an escape. This would allow you to mark all code that needs manual review trusted, even if it's mechanically checked (it still needs review if the system-calling parts can muck with the data).
[...]

This is why I proposed that @trusted functions should *still* be subject to @safe checks, only with the exception that they're allowed to have embedded @system blocks where such checks are relaxed (and these @system blocks are only allowed inside @trusted functions).  So the @trusted is a visual marker that it needs to be manually verified, but you still have the benefit of the compiler automatically verifying most of its body except for the (hopefully small) @system block where such checks are temporarily suspended.
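To sketch the idea in code (the `@system` block syntax below is hypothetical -- it is the proposal, not current D):

```d
// Hypothetical syntax: a @trusted function whose body is checked
// like @safe, except inside an explicit @system block.
@trusted void zeroBuffer(int[] dst)
{
    if (dst.length == 0)   // still subject to @safe checks
        return;

    @system
    {
        // Checks relaxed only here; this (hopefully small) block
        // is what reviewers must verify by hand.
        import core.stdc.string : memset;
        memset(dst.ptr, 0, dst.length * int.sizeof);
    }
}
```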


T

-- 
It is impossible to make anything foolproof because fools are so ingenious. -- Sammy
January 16, 2020
On Thursday, 16 January 2020 at 15:45:46 UTC, Steven Schveighoffer wrote:
> In fact, because of how the system works, @safe code is LESS likely to mean what you think.

I'm not quite sure I follow what you mean here, can you clarify/explain?

> If you see a @safe function, it just means "some of this is mechanically checked". It doesn't mean that the function is so much more solid than a @trusted function that you can skip the review. It can have trusted escapes that force the whole function into the realm of needing review.

Yes, agreed.  And hence why your proposal for how to improve @trusted really makes sense.

> If we moved to a scheme more like I was writing about in the post you quoted, then they actually do start to take on a more solid meaning. It's still not fool-proof -- @safe functions can call @trusted functions, which can call @system functions. BUT if everyone does the job they should be doing, then you shouldn't be able to call @trusted functions and corrupt memory, and you should not have to necessarily review @safe functions.

Yes, this was exactly how I interpreted your proposal.

> There are still cases where you have to review functions that are @safe which do not have inner functions that are trusted. These are cases where data that is usually accessible to safe functions can cause memory problems in conjunction with trusted functions. When you need to break the rules, it's very hard to contain where the rule breaking stops.

For example, in a class or struct implementation where a private variable can be accessed by both @safe and @trusted methods ... ?
January 16, 2020
On Thursday, 16 January 2020 at 18:05:25 UTC, H. S. Teoh wrote:
> This is why I proposed that @trusted functions should *still* be subject to @safe checks, only with the exception that they're allowed to have embedded @system blocks where such checks are relaxed (and these @system blocks are only allowed inside @trusted functions).  So the @trusted is a visual marker that it needs to be manually verified, but you still have the benefit of the compiler automatically verifying most of its body except for the (hopefully small) @system block where such checks are temporarily suspended.

Hang on, have 3 of us all made the same proposal?  (OK, I just reiterated what I understood to be Steven's proposal, but ...:-)

I'll leave it to others to decide if we're great minds or fools or anything in between ;-)
January 16, 2020
On Thu, Jan 16, 2020 at 10:45:46AM -0500, Steven Schveighoffer via Digitalmars-d wrote: [...]
> In fact, because of how the system works, @safe code is LESS likely to mean what you think.
> 
> If you see a @safe function, it just means "some of this is mechanically checked". It doesn't mean that the function is so much more solid than a @trusted function that you can skip the review. It can have trusted escapes that force the whole function into the realm of needing review.
[...]

Yeah, that's the part that makes me uncomfortable every time I see a @trusted lambda inside a function that *clearly* does not sport a @safe interface, as in, its safety is dependent on the surrounding code.

I think it would be better to completely outlaw @trusted blocks inside a @safe function, and to require calling an external @trusted function. And inside a @trusted function, most of the body will still be subject to @safe checks, except for explicitly marked @system scopes.

This way, the meaning of @safe becomes "this function has been thoroughly mechanically checked, and it will not corrupt memory provided all @trusted functions that it calls operate correctly". And @trusted would mean "this function has been mechanically checked except for those blocks explicitly marked @system, which must be reviewed manually together with the rest of the function body".

The latter is useful as a preventative measure: if you allow unrestricted use of @system code inside a @trusted function, then every single code change made to that function requires the manual re-evaluation of the entire function, because you don't know if you've inadvertently introduced a safety hole.  Not allowing @system code by default means if you accidentally slip up outside of the isolated @system blocks, the compiler will complain and you will fix it. This way, you minimize the surface area of potential problems to a smaller scope inside the @trusted function, and leverage the compiler's automatic checks to catch your mistakes, as opposed to having zero safeguards as soon as you slap @trusted on your function.


T

-- 
Try to keep an open mind, but not so open your brain falls out. -- theboz
January 16, 2020
On Thu, Jan 16, 2020 at 06:10:16PM +0000, Joseph Rushton Wakeling via Digitalmars-d wrote:
> On Thursday, 16 January 2020 at 18:05:25 UTC, H. S. Teoh wrote:
> > This is why I proposed that @trusted functions should *still* be subject to @safe checks, only with the exception that they're allowed to have embedded @system blocks where such checks are relaxed (and these @system blocks are only allowed inside @trusted functions).  So the @trusted is a visual marker that it needs to be manually verified, but you still have the benefit of the compiler automatically verifying most of its body except for the (hopefully small) @system block where such checks are temporarily suspended.
> 
> Hang on, have 3 of us all made the same proposal?  (OK, I just reiterated what I understood to be Steven's proposal, but ...:-)
> 
> I'll leave it to others to decide if we're great minds or fools or anything in between ;-)

Fools or not, the important thing is whether we can convince Walter to agree with this...

This is far from the first time such an idea came up. I remember back when Mihail Strashuns was actively contributing to Phobos, we had this discussion on Github where we agreed that we'd like to reduce the scope of @trusted as much as possible, meaning that the unsafe parts of @trusted should be as small as possible in order to minimize the surface area of potential problems.  This was before people came up with the idea of a nested @trusted lambda.  We both felt very uncomfortable that there were some functions in Phobos that were marked @trusted, but were so large that it was impractical to review the entire function body for correctness. Furthermore, since Phobos at the time was undergoing a rapid rate of change, we were uncomfortable with the idea that any random PR might touch some seemingly-innocuous part of a @trusted function and break its safety, yet there would be no warning whatsoever from the autotester because the compiler simply turned off all checks inside a @trusted function.

IIRC it was that discussion, and the further discussions it led to, that eventually resulted in the idea of using nested @trusted lambdas inside functions. Of course, in the interim, we also learned from Walter what his stance was: a @trusted function should sport a safe API, i.e., even though by necessity it has to do uncheckable things inside, its outward-facing API should be such that it's impossible to break its safety without also breaking your own @safety. I.e., taking `int[]` is OK because, presumably, @safe code will not allow you to construct an `int[]` that has an illegal pointer or wrong length; but taking `int*, size_t` is not OK, because the caller can just pass the wrong length and you're screwed.  Eventually, this restriction was relaxed for nested @trusted lambdas, due to the API restriction being too onerous and impractical in some cases.
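A minimal sketch of that "safe API" criterion (the function names here are invented for illustration):

```d
import core.stdc.string : memset;

// Acceptable as @trusted: a @safe caller cannot forge an int[] with a
// mismatched pointer and length, so the API cannot be misused to
// corrupt memory.
@trusted void zero(int[] a)
{
    memset(a.ptr, 0, a.length * int.sizeof);
}

// Not acceptable as @trusted: a @safe caller can pass any length at
// all, so safety now depends entirely on every call site being right.
@trusted void zeroRaw(int* p, size_t n)
{
    memset(p, 0, n * int.sizeof);
}
```

Both bodies are identical; only the signature decides whether @safe code can break them.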

Regardless of what the story was, the basic idea is the same: to shrink the scope of unchecked code as much as possible, and to leverage the compiler's @safe-ty checks as much as possible.  Ideally, most of a @trusted function's body should actually be @safe, and only a small part @system -- where the compiler is unable to mechanically verify its correctness.  That way, if you make a mistake while editing a @trusted function, most of the time the compiler will catch it. Only inside the @system block (or whatever we decide to call the unchecked block) are the checks suspended, and there you have to be extra careful when making changes.

Basically, we want all the help we can get from the compiler to minimize human error, and we want to reduce the scope of human error to as narrow a scope as possible (while acknowledging that we can never fully eliminate it -- which is why we need @trusted in the first place).


T

-- 
Once bitten, twice cry...
January 16, 2020
On 1/16/20 1:08 PM, Joseph Rushton Wakeling wrote:
> On Thursday, 16 January 2020 at 15:45:46 UTC, Steven Schveighoffer wrote:
>> In fact, because of how the system works, @safe code is LESS likely to mean what you think.
> 
> I'm not quite sure I follow what you mean here, can you clarify/explain?

For example, I want a safe function that uses malloc to allocate, and free to deallocate. Perhaps that is just scratch space and it's just an implementation detail.

I want everything *else* in the function to be safe. So I have to mark the function @safe, not @trusted. Otherwise I don't get the compiler checks.
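Under today's rules that ends up looking something like this (a sketch; `sumSquares` is a made-up example of the pattern, with the malloc/free detail isolated in small @trusted lambdas):

```d
import core.stdc.stdlib : free, malloc;

// A mostly-@safe function whose only unsafe detail is a malloc'd
// scratch buffer.
@safe int sumSquares(int n)
{
    // @trusted escapes for the allocation detail only.
    auto buf = (() @trusted => (cast(int*) malloc(n * int.sizeof))[0 .. n])();
    scope(exit) () @trusted { free(buf.ptr); }();

    // Everything below is still mechanically checked as @safe.
    int total = 0;
    foreach (i; 0 .. n)
    {
        buf[i] = i * i;
        total += buf[i];
    }
    return total;
}
```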

In this way, @safe cannot really be used as a marker of "don't need to manually verify" because it's the only way to turn on the mechanical checking.

So there is more incentive to mark code @safe than @trusted at the function level. I guess I should have worded it as: you are probably going to see @safe prototypes that, more often than not, still need checking.

Same goes for template functions. How do you even know whether it can be safe or not? You can try it, but that doesn't mean there are no @trusted blocks inside.

I just don't see the practicality or validity of worrying about @trusted functions more than @safe ones from a user perspective.

That being said, code out there is almost always too trusting when marking functions @trusted. They should be small and easily reviewable. The longer the function, the more chances for assumptions to sneak in.

>> There are still cases where you have to review functions that are @safe which do not have inner functions that are trusted. These are cases where data that is usually accessible to safe functions can cause memory problems in conjunction with trusted functions. When you need to break the rules, it's very hard to contain where the rule breaking stops.
> 
> For example, in a class or struct implementation where a private variable can be accessed by both @safe and @trusted methods ... ?

A recent example was a tagged union [1]. The tag is just an integer or boolean indicating which member of the union is valid. As long as the tag matches which actual element of the union is valid, you can use trusted functions to access the union member.

However, safe code is able to twiddle the tag without the compiler complaining. The trusted code is expecting the link between the union member that is valid and the tag. In other words, you can muck with the tag all day long in @safe land, even in a completely @safe function. But it may violate the assumptions that the @trusted functions make, making the other parts unsafe.

Therefore, you have to review the whole type, even the safe calls, to make sure none of them violates the invariant.
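A miniature of the hazard (the names are invented; the real case is the Phobos tagged-union PR under discussion):

```d
// A tagged union: the bool says which union member is currently valid.
struct Tagged
{
    private bool hasPointer;       // the tag
    union { long n; long* p; }

    // @trusted: writing a union member that overlaps a pointer
    // is not allowed in @safe code.
    @trusted void setNumber(long v) { hasPointer = false; n = v; }

    @trusted long get()
    {
        // Trusts that the tag is truthful -- that is the invariant
        // manual review must establish.
        return hasPointer ? *p : n;
    }

    // Compiles as @safe without complaint, yet it silently breaks the
    // invariant get() relies on: a later get() dereferences garbage.
    @safe void twiddleTag() { hasPointer = !hasPointer; }
}
```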

And this is why some folks (ag0aep6g) disagree that trusted functions can be valid in this situation -- they have to be valid for ALL inputs in ALL contexts, because the alternative is that you have to manually check @safe code. I can live with the idea that @safe code needs checking within context, as long as it helps me ensure that *most* of the stuff is right.

The other option is to somehow use the compiler to enforce the semantic, like marking the *data* @system. In other words you are telling the compiler "I know that it's normally safe to change this tag, but in this case, you can't, because it will mess things up elsewhere".
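A sketch of what that could look like (hypothetical syntax; this is not valid in today's compiler):

```d
struct Tagged
{
    // Proposed: even though bool is an ordinary type, marking the
    // variable @system means @safe code can neither read nor write it.
    @system private bool hasPointer;

    // @trusted accessors then remain the only way to touch the tag,
    // so the invariant can only be broken in code that gets reviewed.
}
```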

-Steve

[1] https://github.com/dlang/phobos/pull/7347
January 16, 2020
On Thu, Jan 16, 2020 at 02:18:09PM -0500, Steven Schveighoffer via Digitalmars-d wrote:
> On 1/16/20 1:08 PM, Joseph Rushton Wakeling wrote:
[...]
> > For example, in a class or struct implementation where a private variable can be accessed by both @safe and @trusted methods ... ?
> 
> A recent example was a tagged union [1]. The tag is just an integer or boolean indicating which member of the union is valid. As long as the tag matches which actual element of the union is valid, you can use trusted functions to access the union member.
> 
> However, safe code is able to twiddle the tag without the compiler complaining. The trusted code is expecting the link between the union member that is valid and the tag. In other words, you can muck with the tag all day long in @safe land, even in a completely @safe function. But it may violate the assumptions that the @trusted functions make, making the other parts unsafe.

Good example!  So in this case, the trust really is between the tag and the union, not so much in the @trusted function itself. The @trusted function is really just *assuming* the validity of the correspondence between the tag and the union.  Without encoding this context somehow, the compiler cannot guarantee that some outside code (@safe code, no less!) won't break the invariant and thereby invalidate the @trusted function.


> Therefore, you have to review the whole type, even the safe calls, to make sure none of them violates the invariant.

:-(  And I guess this extends to any type that has @trusted methods that make assumptions about the data stored in the type.  Which logically leads to the idea that the data itself should be tagged somehow, and therefore your idea of tagging the *data*.


[...]
> The other option is to somehow use the compiler to enforce the semantic, like marking the *data* @system. In other words you are telling the compiler "I know that it's normally safe to change this tag, but in this case, you can't, because it will mess things up elsewhere".
[...]

So it's basically a way of tainting any code that touches the data, such that you're not allowed to touch the data unless you are @system or @trusted.

This actually makes a lot of sense, the more I think about it. Take a pointer T*, for example. Why is it illegal to modify the pointer (i.e. do pointer arithmetic with T*) in @safe code? The act of changing the pointer doesn't in itself corrupt memory.  What corrupts memory is when the pointer is changed in a way that *breaks assumptions* laid upon it by @safe code, such that when we subsequently dereference it, we may end up in UB land.  We may say that pointer dereference is @trusted, in the same sense as the tagged union access you described -- it's assuming that the pointer points to something valid -- and our pointer arithmetic has just broken that assumption.

Similarly, it's illegal to manipulate the .ptr field of an int[] in @safe code: not because that in itself corrupts memory, but because that breaks the assumption that an expression like arr[i] will access valid data (provided i is within the bounds of .length).  Again, the manipulation of .ptr is @system, and array dereference with [i] is @trusted in the same sense as tagged union access: arr[i] *assumes* that there's a valid correspondence between .ptr, .length, and whatever .ptr points to.
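Today's compiler enforces exactly this split (a small illustration; the commented-out lines are the ones it rejects in @safe code):

```d
@safe int readFirst(int* p, int[] arr)
{
    int x = *p;            // OK: plain dereference is @safe
    // ++p;                // error: pointer arithmetic is @system
    // auto q = arr.ptr;   // error: .ptr would expose an unbounded pointer
    return x + arr[0];     // OK: indexing is bounds-checked
}
```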

If therefore we prohibit manipulating .ptr in @safe code but allow arr[i] (which makes assumptions about .ptr), then it makes sense to prohibit manipulation of the tagged union's tag field and allow the @trusted member to look up union fields.  It could even be argued that union field lookup ought to be @safe in the same way arr[i] is @safe: it won't corrupt memory or read out-of-bounds, contingent upon the assumptions laid on an int[]'s .ptr and .length fields not having been broken.

IOW, we're talking about "sensitive data" here, i.e., data that must not be modified in the wrong ways because it will break assumptions that other code have laid upon it. Manipulating pointers is @system because pointers are sensitive data. Manipulating ints is @safe because ints are not sensitive data. In the same vein, the tag field of a tagged union is sensitive data, and therefore manipulating it must be @system, i.e., only a @trusted function ought to be allowed to do that.

By default, @safe comes with its own set of what constitutes sensitive data, and operations on such data are rightfully restricted.  Allowing the user to tag data as sensitive seems to be a logical extension of @safe.


T

-- 
It said to install Windows 2000 or better, so I installed Linux instead.
January 16, 2020
On Thursday, 16 January 2020 at 20:47:54 UTC, H. S. Teoh wrote:
> On Thu, Jan 16, 2020 at 02:18:09PM -0500, Steven Schveighoffer
[...]
>> The other option is to somehow use the compiler to enforce the semantic, like marking the *data* @system. In other words you are telling the compiler "I know that it's normally safe to change this tag, but in this case, you can't, because it will mess things up elsewhere".
> [...]
>
> So it's basically a way of tainting any code that touches the data, such that you're not allowed to touch the data unless you are @system or @trusted.
[...]
> By default, @safe comes with its own set of what constitutes sensitive data, and operations on such data are rightfully restricted.  Allowing the user to tag data as sensitive seems to be a logical extension of @safe.

For reference, here's the upcoming DIP:

https://github.com/dlang/DIPs/pull/179
January 17, 2020
On 16.01.20 12:50, Joseph Rushton Wakeling wrote:
> On Thursday, 16 January 2020 at 03:34:26 UTC, Timon Gehr wrote:
>> ...
> 
>> @safe does not fully eliminate risk of memory corruption in practice, but that does not mean there is anything non-absolute about the specifications of the attributes.
> 
> Would we be able to agree that the absolute part of the spec of both amounts to, "The emergence of a memory safety problem inside this function points to a bug either in the function itself or in the initialization of the data that is passed to it" ... ?
> ...

More or less. Two points:

- The _only_ precondition a @trusted/@safe function can assume for guaranteeing no memory corruption is that there is no preexisting memory corruption.

- For callers that treat the library as a black box, this definition is essentially sufficient. (This is why there is not really a reason to treat the signatures differently, to the point where changing from one to the other is a breaking API change.) White-box callers get the additional language guarantee that if the function corrupts memory, it does so while executing some bad @trusted code; this is the motivation behind having both @safe and @trusted. @system exists because in low-level code you sometimes want to write or use functions that have highly non-trivial preconditions for ensuring that no memory corruption happens.

> (In the latter case I'm thinking that e.g. one can have a perfectly, provably correct @safe function taking a slice as input, and its behaviour can still get messed up because the user initializes a slice in some crazy unsafe way and passes that in.)
> ...

That is preexisting memory corruption. If you use @trusted/@system code to destroy an invariant that the @safe part of the language assumes to hold for a given type, you have corrupted memory.

>> As I am sure you understand, if you see a @safe function signature, you don't know that its implementation is not a single @trusted function call
> 
> Yes, on this we agree.  (I even mentioned this case in one of my posts above.)
> 
>> so the difference in signature is meaningless unless you adhere to specific conventions
> 
> Here's where I think we start having a disagreement.  I think it is meaningful to be able to distinguish between "The compiler will attempt to validate the memory safety of this function to the extent possible given the @trusted assumptions injected by the developer" (which _might_ be the entirety of the function), versus "The safety of this function will definitely not be validated in any way by the compiler".
> 
> Obviously that's _more_ helpful to the library authors than users, but it's still informative to the user: it's saying that while the _worst case_ assumptions are the same (100% unvalidated), the best case are not.
> ...

It is possible to write a @trusted function that consists of a single call to a @safe function, so you are assuming a convention where people do not call @safe code from @trusted code in certain ways. Anyway, my central point was that it is an implementation detail. That does not mean it is necessarily useless to a user in all circumstances, but that someone who writes a library will likely choose to hide it.

>> (which the library you will be considering to use as a dependency most likely will not do).
> 
> Obviously in general one should not assume virtue on the part of library developers.  But OTOH in a day-to-day practical working scenario, where one has to prioritize how often one wants to deep-dive into implementation details -- versus just taking a function's signature and docs at face value and only enquiring more deeply if something breaks -- it's always useful to have a small hint about the best vs. worst case scenarios.
> ...

Right now, the library developer has a valid incentive to actively avoid @trusted functions in their API. This is because avoiding them is always possible: @trusted is an implementation detail, and changing this detail can in principle break dependent code. (E.g., a template instantiated with a @safe delegate will give you a different instantiation from the same template instantiated with a @trusted delegate, and if e.g., you have some static cache in your template function, a change from @safe to @trusted in some API can silently slow down the downstream application by a factor of two, change iteration orders through hash tables, etc.)

> It's not that @safe provides a stronger guarantee than @trusted, it's that @trusted makes clear that you are definitely in worst-case territory.  It's not a magic bullet, it's just another data point that helps inform the question of whether one might want to deep-dive up front or not (a decision which might be influenced by plenty of other factors besides memory safety concerns).
> 
> The distinction only becomes meaningless if one is unable to deep-dive and explore the library code.
> ...

I just think that if you are willing to do that, you should use e.g. grep, not the function signature, since a competent library author will likely choose to hide @trusted as an implementation detail.

January 17, 2020
Am Thu, 16 Jan 2020 04:34:26 +0100 schrieb Timon Gehr:

> On 16.01.20 03:06, Joseph Rushton Wakeling wrote:
>> On Thursday, 16 January 2020 at 01:53:18 UTC, Timon Gehr wrote:
>>> It's an implementation detail. If you care about the distinction, you should check out the function's implementation, not its signature.
>> 
>> Sure. But on a practical day-to-day basis, @safe vs @trusted signatures
>> help to prioritize one's allocation of care somewhat.
>> ...
> 
> You have to be careful when writing a @trusted function, not when calling it. If you do not trust a given library, there is no reason to be more careful around a @trusted API than around a @safe API, as they do not mean different things.
> 
> @safe does not fully eliminate risk of memory corruption in practice, but that does not mean there is anything non-absolute about the specifications of the attributes. As I am sure you understand, if you see a @safe function signature, you don't know that its implementation is not a single @trusted function call, so the difference in signature is meaningless unless you adhere to specific conventions (which the library you will be considering to use as a dependency most likely will not do).
> 
> 
>> I'm coming to the conclusion that much of the differences of opinion in this thread are between folks who want to see things as absolutes, and folks who recognize that these features are tools for mitigating risk, not eliminating it.
>> 
>> 
> I was not able to figure out a response to this sentence that is both polite and honest.


I'm curious, what do you think would be the ideal scheme if we could redesign it from scratch? Only @safe/@system as function attributes and @trusted (or @system) blocks which can be used in @safe functions?



-- 
Johannes