January 09, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=9238


Andrei Alexandrescu <andrei@erdani.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrei@erdani.com


--- Comment #10 from Andrei Alexandrescu <andrei@erdani.com> 2013-01-09 15:03:08 PST ---
Desiderata
==========

Design choices may sometimes invalidate important use cases, so let's start with what we'd like to have:

1. Safety

We'd like most or all uses of ref to be safe. If not all are safe, we should have easy means to distinguish safe from unsafe cases statically. If that's not possible, we should be able to enforce safety with simple runtime checks in @safe code.

2. Efficient passing of values

The canonical use case of ref parameters is to allow the callee to modify a value in the caller. However, a significant secondary use case is as an optimization for passing arguments into a function. In such cases, the caller is not concerned with mutation and may actually want to prevent it. The remaining problem is that ref traditionally assumes the caller holds an actual lvalue, whereas in such cases the caller may want to pass an rvalue.
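
To make the lvalue/rvalue gap concrete, here is a minimal sketch. Matrix, trace, and makeIdentity are hypothetical names invented for illustration, and the commented-out call reflects the language rules as described in this thread (compilers with different defaults or preview switches may behave differently).

```d
import std.stdio : writeln;

// Hypothetical payload type, large enough that copying is worth avoiding.
struct Matrix
{
    double[16] cells = 0;
}

// Takes its argument by ref purely to avoid a copy; const rules out mutation.
double trace(ref const Matrix m)
{
    return m.cells[0] + m.cells[5] + m.cells[10] + m.cells[15];
}

// Returns an rvalue when called.
Matrix makeIdentity()
{
    Matrix m;
    foreach (i; 0 .. 4)
        m.cells[i * 5] = 1;
    return m;
}

void main()
{
    Matrix m = makeIdentity();
    writeln(trace(m));           // fine: m is an lvalue

    // writeln(trace(makeIdentity()));
    // The call above passes an rvalue to a ref parameter, which is the
    // case desideratum 2 would like to allow.

    auto tmp = makeIdentity();   // workaround: give the temporary a name
    writeln(trace(tmp));
}
```

The workaround works but forces the caller to name a value it never needs again, which is the inconvenience this desideratum is about.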

3. Transparently returning references to ref parameters

One important use case is functions that return one of their reference parameters, the simplest being:

ref T identity(T)(ref T obj) { return obj; }

We'd like to allow identity and to make it safe by design. If we don't, we
disallow a family of use cases such as min() and max() that return by
reference, call chaining idioms etc.
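
As a rough illustration of that family of use cases, the sketch below adds a by-reference max next to identity. maxRef is a hypothetical name, not a library function, and both functions are templates so the compiler is free to infer whatever attributes it needs.

```d
// identity as given above.
ref T identity(T)(ref T obj) { return obj; }

// Hypothetical by-reference max: returns whichever argument is larger,
// as a reference into the caller's storage rather than as a copy.
ref T maxRef(T)(ref T a, ref T b)
{
    return a < b ? b : a;
}

void main()
{
    int x = 3, y = 7;

    maxRef(x, y) = 0;            // assigns through the reference to y
    assert(x == 3 && y == 0);

    identity(x) += 1;            // mutates x through the returned reference
    assert(x == 4);

    identity(maxRef(x, y)) = 9;  // chaining: the larger of x and y is x
    assert(x == 9 && y == 0);
}
```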

4. Sealed containers

This important use case is motivated by efficient and safe allocators. We want to support scoped and region-based allocation, and at the same time we want to combine such allocators with containers that return references to their data.

Consider as a simple example a scoped container:

struct ScopedContainer(T)
{
    private T[] payload;
    this(size_t n) { payload = new T[n]; }
    this(this) { payload = payload.dup; }
    ~this() { delete payload; }
    void opAssign(ref ScopedContainer rhs) {
      payload = rhs.payload.dup;
    }
    ref T opIndex(size_t n) { return payload[n]; }
}

The container eagerly allocates its state and deallocates it when it leaves scope. We'd like to allow opIndex to typecheck and guarantee safety.
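
To show what "guarantee safety" has to rule out, here is a cut-down, non-generic sketch of the same idea. ScopedBuffer, useWhileAlive, and escape are hypothetical names. The sketch uses the C allocator and disables copying to stay short, so it is not a drop-in replacement for ScopedContainer.

```d
import core.stdc.stdlib : calloc, free;

// Hypothetical scoped buffer of ints: allocates eagerly, frees on scope exit.
struct ScopedBuffer
{
    private int* data;
    private size_t length;

    this(size_t n)
    {
        data = cast(int*) calloc(n, int.sizeof);
        assert(data !is null);
        length = n;
    }

    @disable this(this);   // no copies, to keep the sketch short

    ~this() { free(data); }

    ref int opIndex(size_t n)
    {
        assert(n < length);
        return data[n];
    }
}

// Safe use: the reference returned by c[2] never outlives c.
int useWhileAlive()
{
    auto c = ScopedBuffer(4);
    c[2] = 42;       // writes through the returned reference
    c[2] += 1;
    return c[2];     // copies the value out before c is destroyed
}

// The case a safe design must reject: the reference would outlive c,
// whose destructor has already freed the memory.
// ref int escape() { auto c = ScopedBuffer(1); return c[0]; }

void main()
{
    assert(useWhileAlive() == 43);
}
```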

5. Simplicity

We wish to get the design right with maximum economy in language design. One thing easily forgotten when focusing on minutiae while carrying significant context in mind is that whatever language additions we make come on top of an already large machinery.

There have been ideas based on defining "scope ref", "in ref", or "@attribute ref". We'd like to avoid such and instead make sure plain "ref" is useful, safe, and easy to understand.

------------

These desiderata and the interactions among them impose constraints on the design space. In the following post I'll sketch some possible designs dictated by prioritizing desiderata, and analyze the emerging tradeoffs.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
January 09, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=9238



--- Comment #11 from Jonathan M Davis <jmdavisProg@gmx.com> 2013-01-09 15:11:59 PST ---
> There have been ideas based on defining "scope ref", "in ref", or "@attribute ref". We'd like to avoid such and instead make sure plain "ref" is useful, safe, and easy to understand.

I would argue that it's vital that ref which requires an lvalue and ref which doesn't care whether it's given an lvalue or rvalue be distinguished. You're just begging for bugs otherwise. It should be clear in a function's signature whether it's intending to take an argument by ref and mutate it or whether it's simply trying to avoid unnecessary copying.

January 10, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=9238



--- Comment #12 from Andrei Alexandrescu <andrei@erdani.com> 2013-01-09 16:07:04 PST ---
Design #1: statically sealed ref
================================

One possible design is to give desideratum "4. Sealed containers" priority and start from there.

Continuing the ScopedContainer example, we notice that to make it work we need the lifetime of c[n] to be bounded by the lifetime of c. We set out to enforce that statically. The simplest and most conservative rule would be:

----------
For functions returning ref, the lifetime of the returned object spans at least through the scope of the caller.
----------

Impact on desiderata:

To enforce safety we'd need to disallow any ref-returning function from returning a value with too short a scope. Examples:

ref int fun(int a) { return a; }
// Error: escapes address of by-value parameter

ref int gun() { int a; return a; }
// Error: escapes address of local

ref int hun() { return *(new int); }
// fine

ref int iun(int* p) { return *p; }
// fine

ref int identity(ref int a) { return a; }
// Should work

This last function typechecks if and only if the argument is guaranteed to have a lifetime that extends through the end of the scope of the caller. In turn, if we want to observe (2) and allow rvalues to bind to ref, any rvalue created in the caller must exist through the end of the scope in which it was created. This is a larger extent than what D currently allows (rvalues are destroyed immediately after the call) and also larger than what C++ allows (rvalues are destroyed at the end of the full expression). It is unclear whether this has bad consequences; probably not.

One interesting consequence is that ref returns are intransitive, i.e. cannot be passed "up". Consider:

ref int identityImpl(ref int a) { return a; }
ref int identity(ref int a) { return identityImpl(a); }

Under the rule above this code won't compile although it is safe. This is because from the viewpoint of identity(), identityImpl returns an int that can only last through the scope of identity(). Attempting to return that is tantamount to returning a local as far as identity() is concerned, so it won't typecheck.

This limitation is rather severe. One obvious issue is that creating wrappers around objects will be seriously limited. For example, a range can't forward the front of a member:

struct Range {
  private AnotherRange _source;
  // ... inside some Range implementation ...
  ref T front() { return _source.front; } // error
}

Summary
=======

1. Design is safe
2. Rvalues can be bound to ref (subject to unrelated limitations) ONLY if the
lifetime of rvalues is prolonged through the end of the scope they're created
in. (Assessment: fine)
3. Implementing identity(): possible but intransitive, i.e. references can't be
passed up call chains. (Assessment: limitation is problematic.)
4. Sealed containers: possible and safe, but present wrapping problems due to
(3).
5. Simplicity: good

I'll next present a refinement of this design that improves on its disadvantages without losing the advantages.

January 10, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=9238



--- Comment #13 from Andrei Alexandrescu <andrei@erdani.com> 2013-01-09 18:32:23 PST ---
Design #2: ref return is sealed by arguments
============================================

So design #1 has the obvious issue that call chains can't propagate ref returns upwards even when it's safe to do so. To improve on that, let's devise a refined rule:

----------
For functions returning ref, the lifetime of the returned object spans at least the lifetime of its shortest-lived argument.
----------

Impact on desiderata:

Reconsidering the troublesome example:

ref int identityImpl(ref int a) { return a; }
ref int identity(ref int a) { return identityImpl(a); }

When compiling identity(), the compiler (without seeing the body of
identityImpl) figures that the lifetime of the value returned by
identityImpl(a) is at least as long as the lifetime of a itself. Therefore
identity() typechecks because it is allowed to return a proper.

Safety is still guaranteed however. This is because a function can never escape a reference to an object of shorter lifetime than the lifetime of the reference. Reconsidering the front() example:

struct Range {
  private AnotherRange _source;
  // ... inside some Range implementation ...
  ref T front() { return _source.front; } // fine
}

front() compiles because front is really a regular function taking a "ref Range this". Then _source is scoped inside "this" so from a lifetime standpoint "this", _source, and the result are in good order.

ref int fun() {
   Range r;
   return r.front; // error
}

fun() does not compile because the call r.front returns a value with the lifetime of r, so returning a ref is tantamount to escaping the address of a local.

ref int gun(Range r) {
   return r.front; // error
}

This also doesn't compile because the result of r.front has the lifetime of r, which is passed by value into gun.

ref int gun(ref Range r) {
   return r.front; // fine
}

This does work because the result has the same lifetime as r.

The question remains of how to handle rvalues bound to ref parameters. The previous design required that rvalues live as long as the scope, and this design would allow that too. But this design also allows the C++-style destruction of rvalues: in the call foo(bar()), if foo returns a ref, it must be used immediately because the temporary returned by bar() will be destroyed at the end of the full expression.

If we want to keep the current D rule of destroying rvalue parameters right after the call to the function, that effectively disallows any use of the ref result. This may actually be a meaningful choice.

The largest problem of this design is lifetime pollution. Consider the ScopedContainer example:

ref T opIndex(size_t n) { return payload[n]; }

In the call c[42], the shortest lifetime is actually that of n, which binds to the rvalue 42. So the compiler is forced to a shorter guarantee of the result lifetime than the actual lifetime, because of an unrelated parameter.
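
The pollution effect can be mimicked with a toy model. This is a sketch for intuition only, not a compiler algorithm: the Lifetime scale and resultBound are hypothetical, and the model simply takes the minimum over all arguments, as the Design #2 rule states.

```d
// Toy lifetime scale (an assumption of this sketch): larger lives longer.
enum Lifetime
{
    expression,    // a temporary, such as the rvalue 42
    callerScope,   // a local declared in the caller, such as the container c
    global
}

// Design #2 rule: the result is only guaranteed to live as long as the
// shortest-lived argument of the call.
Lifetime resultBound(const Lifetime[] args...)
{
    Lifetime bound = Lifetime.global;
    foreach (a; args)
        if (a < bound)
            bound = a;
    return bound;
}

void main()
{
    // c[42] is opIndex(this = c, n = 42): n drags the bound down to the
    // current expression although the payload lives as long as c.
    assert(resultBound(Lifetime.callerScope, Lifetime.expression)
           == Lifetime.expression);

    // identity(a) with a local a: the bound is the lifetime of a itself.
    assert(resultBound(Lifetime.callerScope) == Lifetime.callerScope);
}
```

In this model the only way to recover the longer bound for c[42] is to leave n out of the computation.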

Summary
=======

1. Design is safe
2. Design allows binding rvalues to ref parameters. For usability, temporaries
must last at least as long as the current expression (C++ style).
3. Returning ref parameters works with fewer restrictions than the previous
design.
4. Sealed containers are implementable.
5. Difficulty is moderate on the implementation side and moderate on the user
side.

Next iteration of the design will attempt to refine the lifetime of results so as to avoid pollution.

April 23, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=9238



--- Comment #14 from Andrei Alexandrescu <andrei@erdani.com> 2013-04-23 12:03:18 PDT ---
Adding an example by Steve that should work: http://forum.dlang.org/thread/ylebrhjnrrcajnvtthtt@forum.dlang.org?page=11

struct S
{
    int x;
    ref S opOpAssign(string op : "+")(ref S other) { x += other.x; return this; }
}

ref S add5(ref S s)
{
    auto o = S(5);
    return s += o;
}

void main()
{
    auto s = S(5);
    S s2 = add5(s);
}
