May 05, 2013
On 2013-05-04 18:33:10 +0000, Walter Bright <newshound2@digitalmars.com> said:

> Runtime Detection
> 
> There are still a few cases that the compiler cannot statically detect. For these a runtime check is inserted, which compares the returned ref pointer to see if it lies within the stack frame of the exiting function, and if it does, halts the program. The cost will be a couple of CMP instructions and an LEA. These checks would be omitted if the -noboundscheck compiler switch was provided.

I just want to note that this has the effect of making any kind of heap allocation not done by the GC unsafe. For instance, if you have a container struct that allocates using malloc/realloc and that container gives access to its elements by reference then you're screwed (it can't be detected).

The obvious answer is not to mark as @trusted any function that returns a reference or a slice to malloced memory. But I remember Andrei wanting to make standard containers of this sort at one point, so I think it's important to note this limitation.
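
To make the limitation concrete, here is a minimal sketch of such a container. The names `MallocArray`, `append` and `release` are made up for illustration and do not come from any proposed standard container. The reference handed out by `opIndex` points into malloc'ed memory, so a check that only looks at the exiting function's stack frame has nothing to flag once that memory is freed.

```d
import core.stdc.stdlib : realloc, free;

// Hypothetical container for illustration only: storage comes from the C heap,
// not from the GC.
struct MallocArray
{
    private int* ptr;
    private size_t len;

    void append(int value)
    {
        // realloc(null, n) behaves like malloc(n); the block may move on growth.
        ptr = cast(int*) realloc(ptr, (len + 1) * int.sizeof);
        assert(ptr !is null, "out of memory");
        ptr[len++] = value;
    }

    // Gives access to an element by reference, as described above.
    ref int opIndex(size_t i)
    {
        assert(i < len);
        return ptr[i];
    }

    // Manual cleanup; after this call every reference handed out is dangling.
    void release()
    {
        free(ptr);
        ptr = null;
        len = 0;
    }
}

void main()
{
    MallocArray a;
    a.append(10);

    int* elem = &a[0];    // address obtained through the ref return
    assert(*elem == 10);  // fine while the allocation is alive

    a.release();
    // `elem` now points into freed heap memory. It is not inside any stack
    // frame, so a stack-range check would not flag it. It is deliberately not
    // dereferenced here, since doing so would be undefined behaviour.
}
```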

-- 
Michel Fortin
michel.fortin@michelf.ca
http://michelf.ca/

May 05, 2013
On Sunday, 5 May 2013 at 00:47:00 UTC, Walter Bright wrote:
> If the compiler accepts that code, it will crash at runtime. If it doesn't accept that code, then it will also disallow legitimate code like:
>
>      ref T foob(ref U u) { static T t; return t; }
>
>      ref T bar() { U u; return foob(u); }
>

It doesn't accept it, with or without any combination of annotations. Now, the example with a static does indeed require an annotation.

>> And it illustrates
>> wonderfully what I'm saying: most people in the discussion (and it has been
>> shown now that this includes you) were unaware of how Rust solves the problem.
>>
>> I don't think excluding a solution that isn't understood is the smartest thing
>> to do.
>
> I suggest you enumerate the cases with a Rust-like system and show us how it solves the problem without annotations. Note that Rust has pretty much zero real world usage - it's one thing to say needing to use annotations is 'rare' and another to know it based on typical usage patterns of the language.
>

Rust assumes, when no annotation is present, that the returned ref's lifetime is the union of the ref parameters' lifetimes. I'm sure we can find an example of D code somewhere that doesn't fit this, but real-world D code would almost never require an annotation (that holds for every D codebase I've played with so far, and I don't actually see a use case for examples like the static one mentioned above).

> For example, if the default is "assume the ref return refers to the ref parameter", then some containers would require the annotation and some would not. This is not very viable when doing generic coding, unless you are willing to provide two copies of each such function - one with the annotations and the other without.
>

The default can't be that, since several parameters can be passed by ref. The default is that the returned ref's lifetime is the union of the ref parameters' lifetimes. I don't see any container that requires the annotation.

> Note also that if you have A calls B calls C, the annotation on C doesn't propagate up to B, again leading to a situation where you're forced to make two versions of the functions.
>
> (I say doesn't propagate because in a language that supports separate compilation, all the compiler knows about a function is its signature.)

It doesn't require code duplication. Named lifetimes make sense for the caller, not the callee (inside the callee they are only identifiers that describe the relations between lifetimes explicitly for the caller).
May 05, 2013
On 5/5/2013 4:43 AM, Michel Fortin wrote:
> On 2013-05-04 18:33:10 +0000, Walter Bright <newshound2@digitalmars.com> said:
>
>> Runtime Detection
>>
>> There are still a few cases that the compiler cannot statically detect. For
>> these a runtime check is inserted, which compares the returned ref pointer to
>> see if it lies within the stack frame of the exiting function, and if it does,
>> halts the program. The cost will be a couple of CMP instructions and an LEA.
>> These checks would be omitted if the -noboundscheck compiler switch was provided.
>
> I just want to note that this has the effect of making any kind of heap
> allocation not done by the GC unsafe. For instance, if you have a container
> struct that allocates using malloc/realloc and that container gives access to
> its elements by reference then you're screwed (it can't be detected).
>
> The obvious answer is to not make @trusted the function returning a reference or
> a slice to malloced memory. But I remember Andrei wanting to make standard
> containers of this sort at one point, so I think it's important to note this
> limitation.

I know Andrei has thought about this, but I don't know what the solution is.


May 05, 2013
On 2013-05-05 18:19:26 +0000, Walter Bright <newshound2@digitalmars.com> said:

> On 5/5/2013 4:43 AM, Michel Fortin wrote:
>> On 2013-05-04 18:33:10 +0000, Walter Bright <newshound2@digitalmars.com> said:
>> 
>>> Runtime Detection
>>> 
>>> There are still a few cases that the compiler cannot statically detect. For
>>> these a runtime check is inserted, which compares the returned ref pointer to
>>> see if it lies within the stack frame of the exiting function, and if it does,
>>> halts the program. The cost will be a couple of CMP instructions and an LEA.
>>> These checks would be omitted if the -noboundscheck compiler switch was provided.
>> 
>> I just want to note that this has the effect of making any kind of heap
>> allocation not done by the GC unsafe. For instance, if you have a container
>> struct that allocates using malloc/realloc and that container gives access to
>> its elements by reference then you're screwed (it can't be detected).
>> 
>> The obvious answer is to not make @trusted the function returning a reference or
>> a slice to malloced memory. But I remember Andrei wanting to make standard
>> containers of this sort at one point, so I think it's important to note this
>> limitation.
> 
> I know Andrei has thought about this, but I don't know what the solution is.

Just rethrowing an idea that was already thrown here: support annotated lifetimes *in addition* to this runtime detection system. Those who use manual memory management will need it to make their code @safe. Those who stick to the GC won't have to. Anyway, you don't have to implement both right away, it can always be decided later.

-- 
Michel Fortin
michel.fortin@michelf.ca
http://michelf.ca/

May 06, 2013
On Sunday, 5 May 2013 at 23:45:21 UTC, Michel Fortin wrote:
> Just rethrowing an idea that was already thrown here: support annotated lifetimes *in addition* to this runtime detection system. Those who use manual memory management will need it to make their code @safe. Those who stick to the GC won't have to. Anyway, you don't have to implement both right away, it can always be decided later.

Yes, that is also my point of view. We don't even need to support annotations now; we simply need to ensure that we don't close the door on them.
May 06, 2013
On Saturday, 4 May 2013 at 18:33:04 UTC, Walter Bright wrote:
> Static Compiler Detection (in @safe mode):
>
> 1. Do not allow taking the address of a local variable, unless doing a safe type 'paint' operation.
>
> 2. In some cases, such as nested, private, and template functions, the source is always available so the compiler can error on those. Because of the .di file problem, doing this with auto return functions is problematic.
>
> 3. Issue error on return statements where the expression may contain a ref to a local that is going out of scope, taking into account the observations.
>
> Runtime Detection
>
> There are still a few cases that the compiler cannot statically detect. For these a runtime check is inserted, which compares the returned ref pointer to see if it lies within the stack frame of the exiting function, and if it does, halts the program. The cost will be a couple of CMP instructions and an LEA. These checks would be omitted if the -noboundscheck compiler switch was provided.

This is a brilliant solution. I'm glad my DIP seems to have helped steer the design process toward this superior conclusion, which relies on something I simply didn't think of: runtime checking. I guess I didn't realize that the stack has "bounds", so to speak.
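
As a rough illustration of what such a "bounds" test amounts to, here is a sketch under stated assumptions. The helper `pointsInto` is hypothetical, and a plain buffer stands in for the exiting function's stack frame. A real compiler would emit the comparison inline against the actual frame limits, which this sketch does not attempt to obtain.

```d
// Hypothetical helper: true if `p` lies in the half-open range [lo, hi).
// This is the pair of comparisons the quoted proposal describes, nothing more.
bool pointsInto(const(void)* p, const(void)* lo, const(void)* hi) pure nothrow @nogc
{
    return p >= lo && p < hi;
}

void main()
{
    ubyte[64] frame;         // stands in for the exiting function's stack frame
    int* heapInt = new int;  // GC heap allocation, outside `frame`

    const(void)* lo = frame.ptr;
    const(void)* hi = frame.ptr + frame.length;

    assert(pointsInto(&frame[8], lo, hi));  // a ref into the frame would trip the check
    assert(!pointsInto(heapInt, lo, hi));   // a ref to heap memory would pass
    assert(!pointsInto(hi, lo, hi));        // the upper bound itself is excluded
}
```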

I suppose that under the hood the compiler will still track the state of the return value using something like a 'scope' bit. It's just that user code doesn't need to see this bit, which is probably how it should be. And it's great to realize that an existing safety mechanism, -noboundscheck, is already a suitable home for the checking.

I think the main data still to be researched is the slowdown in both compile and run times with this checking implemented. Not that I see how to avoid it, but it's better to know than not to know, right?
May 06, 2013
On Sunday, 5 May 2013 at 02:36:45 UTC, Jonathan M Davis wrote:
> As it is, we arguably didn't choose the best defaults with the attributes that
> we have (e.g. @system is the default instead of @safe, and impure is the
> default instead of pure). The result is that we have to use a lot of
> annotations if we want to properly take advantage of the various language
> features, whereas ideally, having to use annotations for stuff like @safety or
> purity would be the exception. Don was complaining that one reason that moving
> to D2 at Sociomantic looks unappealing in spite of the benefits is the fact
> that they're going to have to add so many extra annotations to their code.

In the thread that appeared on GitHub, someone suggested '@infer', which I altered to '@auto': it infers all the attributes automatically and generates the '.di' file with the full set of attributes (which might actually be problematic if they change too often and force recompilation too many times). I'm starting to think it might actually be quite valuable to have this annotation available to the programmer. What do you think?
May 06, 2013
On Sat, 04 May 2013 19:30:21 -0700, Jonathan M Davis <jmdavisProg@gmx.com> wrote:

> However, if we had an attribute which explicitly designated that a function
> accepted both rvalues and lvalues (which is what auto ref was originally
> supposed to do as Andrei proposed it), then if you saw
>
> auto foo(ref int i);
> auto bar(auto ref int i);
>
> then you could be reasonably certain that foo was intending to alter its
> arguments and bar was not.

The counter argument:

foo(makeRvalue()); // error:  cannot pass rvalues to ref

// programmer: WTF?  This is stupid, but ok:

auto x = makeRvalue();
foo(x);

In other words, explicit nops aren't any better than implicit nops.  Even if we *require* the user to be explicit (and it's not at all clear from a code-review perspective that the auto x line is to circumvent the requirements), the fact that this is trivially circumvented makes it a useless feature.  It's like having const you can cast away.

I think the larger issue with binding rvalues to refs is this:

int foo(int i);
int foo(ref int i);

What does foo(1) bind to?  It MUST bind to the non-ref overload, or there is no point in having it.

If this can be solved, binding rvalues to refs is fine.
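
For reference, a small sketch of how this overload pair resolves today, as far as I know, with rvalues not binding to ref at all. The return values 1 and 2 are only markers to show which overload ran; they are not part of the original example.

```d
int foo(int i)     { return 1; }  // by-value overload
int foo(ref int i) { return 2; }  // by-ref overload

void main()
{
    int x = 5;
    assert(foo(1) == 1);  // rvalue argument: only the by-value overload applies
    assert(foo(x) == 2);  // lvalue argument: the ref overload is preferred
}
```

Any scheme that lets rvalues bind to ref would have to keep the first call resolving to the by-value overload.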

-Steve
May 06, 2013
On 5/6/13 12:10 PM, Steven Schveighoffer wrote:
> The counter argument:
>
> foo(makeRvalue()); // error: cannot pass rvalues to ref
>
> // programmer: WTF? This is stupid, but ok:
>
> auto x = makeRvalue();
> foo(x);
>
> In other words, explicit nops aren't any better than implicit nops. Even
> if we *require* the user to be explicit (and it's not at all clear from
> a code-review perspective that the auto x line is to circumvent the
> requirements), the fact that this is trivially circumvented makes it a
> useless feature. It's like having const you can cast away.
>
> I think the larger issue with binding rvalues to refs is this:
>
> int foo(int i);
> int foo(ref int i);
>
> what does foo(1) bind to? It MUST bind to the non-ref, or there is no
> point for it.
>
> If this can be solved, binding rvalues to refs is fine.

I think we can technically make the overloading work while also allowing binding rvalues to ref. But that wouldn't help any. Consider:

ref int min(ref int a, ref int b) { return b < a ? b : a; }
...
int x;
fun(min(x, 100));

Here the result of min may be bound to an lvalue or an rvalue depending on a condition. In the latter case, combined with D's propensity to destroy temporaries too early (immediately after function calls), the behavior is silently undefined; the code may pass unittests.

This is a known issue in C++. Allowing loose binding of rvalues to ref not only inherits C++'s mistake, but also adds a fresh one.


Andrei
May 06, 2013
On Mon, 06 May 2013 06:43:38 -0700, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> I think we can technically make the overloading work while also allowing binding rvalues to ref. But that wouldn't help any. Consider:
>
> ref int min(ref int a, ref int b) { return b < a ? b : a; }
> ...
> int x;
> fun(min(x, 100));
>
> Here the result of min may be bound to an lvalue or an rvalue depending on a condition. In the latter case, combined with D's propensity to destroy temporaries too early (immediately after function calls), the behavior is silently undefined; the code may pass unittests.

Wouldn't the new runtime check fix this?

> This is a known issue in C++. Allowing loose binding of rvalues to ref not only inherits C++'s mistake, but also adds a fresh one.

I thought C++ would handle this kind of code.  I remember being able to use references to rvalues in ways that were unintuitive, but not undefined.

-Steve