August 25, 2014
On Monday, 25 August 2014 at 15:09:32 UTC, bearophile wrote:
> Marc Schütz:
>
>> http://wiki.dlang.org/User:Schuetzm/scope
>
> It looks nice. But perhaps it needs some kind of proof of correctness.

Hmm... First, there are the assignment rules. They make sure that nothing with a shorter lifetime ends up in a variable with a longer lifetime designation. The other part is to prove that the type deduction and argument matching rules work.

Neither part is difficult to reason about, but I don't know exactly what a formal proof needs to look like. (How formal do we need to go?)
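To illustrate the assignment rules, here is a minimal sketch of what they would reject, written in the proposal's hypothetical syntax (this is not valid in current D; `scope` here carries the proposed semantics):

```d
int* f() {
    int local = 42;
    scope int* p = &local;  // ok: p's lifetime is bounded by this scope
    static int* global;
    // global = p;          // error: p's lifetime is shorter than global's
    // return p;            // error: p must not outlive the function
    return null;
}
```

The point is that both rejected lines are caught by a purely local check on the declared lifetimes, without flow analysis.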

>
> Have you read the old blog posts (written before the creation of Rust) by Bartosz Milewski regarding the borrowing in D?

No, can you point me to them? I couldn't find them on his blog under http://bartoszmilewski.com/category/d-programming-language/
There are some posts about ownership and regions, but only in the context of multi-threading. I'm afraid this wouldn't easily fit into a hierarchical system like I have in mind.

>
>
>>Implementation of this feature is possible without doing flow control or interprocedural analysis.<
>
> I remember that Walter has recently said that he's willing to add some kind of flow analysis to the D front-end.

Interesting. The question is: is it worth it? Maybe we can already cover 99% of the use cases with a simpler construct. The concept needs to be understandable for the users of the language, too. And maybe "some kind of flow analysis" just isn't enough to get a significant improvement, maybe it would need whole-program analysis...
August 25, 2014
On Monday, 25 August 2014 at 15:38:09 UTC, Ola Fosheim Grøstad wrote:
> On Monday, 25 August 2014 at 11:09:48 UTC, Marc Schütz wrote:
>> I believe this proposal is close to the best we can achieve without resorting to data flow analysis.
>
> I agree with bearophile, perhaps data flow analysis would be desirable. So I think it would be a good idea to hold the door open on that.

I definitely don't want to exclude anything. But we need to find out whether the additional complexity of full-blown DFA is really necessary, see my reply to bearophile.

>
>> I'm unfortunately not familiar with the theoretical foundations of type systems, and formal logic. So I'm not sure what exactly
>
> I don't know all that much about linear type systems, but this is an opportunity to learn more for all of us! :-)
>
>> you mean here. It seems to me the problems with @safe are with the way it was implemented (allow everything _except_ certain things that are known to be bad, instead of the opposite way: allow nothing at first, but lift restrictions where it's safe to do so), they are not conceptual.
>
> Because real programming languages are hard to reason about, it is difficult to say where things break down. So one usually maps the constructs of the language onto something simpler and more homogeneous that is easier to reason about.
>
> When it comes to @safe I think the main problem is that D makes decisions on constructs and not on the "final semantics" of the program. If you have a dedicated high level IR you can accept any program segment that can be proven to be memory safe in terms of the IR. The point is to have an IR that is less complicated than the full language, but that retains needed information that is lost in a low level IR and which you need to prove memory safety.
>
> memset() is not unsafe per se, it is unsafe with the wrong parameters. So you have to prove that the parameters are in the safe region. Same thing with freeing memory and borrowed pointers etc. You need a correctness proof even if you give up on generality.
>
> You may reject many safe programs but at least verify as many simple safe programs as you can.
>
> A bonus of having a high level IR is that you more easily can combine languages with fewer interfacing problems. That would be an advantage if you want a DSL to cooperate with D.

But this would require knowledge about the inner workings of memset() to be part of the IR, or memset() to be implemented in it. The same IR (or an equivalent one) would then need to be part of language specification, otherwise different compilers would allow different operations.
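As a concrete illustration of the memset() point above: in today's D, one carries the proof obligation manually by wrapping the unsafe call behind an interface whose parameters can't be wrong. (A sketch only, not part of the proposal; `zeroFill` is a hypothetical name.)

```d
import core.stdc.string : memset;

// @trusted: safe for all inputs, because pointer and length
// come from the same slice and cannot disagree.
void zeroFill(ubyte[] buf) @trusted {
    if (buf.length == 0)
        return;
    memset(buf.ptr, 0, buf.length);
}
```

A verifying IR would have to preserve exactly this slice/length relationship to prove the call safe automatically; a low-level IR that sees only a raw pointer and a count has already lost it.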

IMO the general idea of the current design is okay, because it's easy to implement and easy to specify (a whitelist or, as currently, a blacklist approach). If something has been overlooked, it can be added incrementally; at the same time, there's always @trusted to let the programmer specify what the compiler can't prove.
August 25, 2014
On Monday, 25 August 2014 at 18:14:03 UTC, Marc Schütz wrote:
> But this would require knowledge about the inner workings of memset() to be part of the IR, or memset() to be implemented in it.

The latter. You might have "zerofill(ptr,length)" and turn it into "clear(data)" if it is @safe.

> The same IR (or an equivalent one) would then need to be part of language specification, otherwise different compilers would allow different operations.

I think you could have a low-performance, minimum-requirement reference implementation for this between the front-end and the back-end.

As in, not written for performance, but as a baseline for testing the real implementation.

> currently, blacklist approach). If something has been overlooked, it can be added incrementally; at the same time there's always @trusted to let the programmer specify what the compiler can't prove.

I think borrowing, @safe, and other correctness-oriented analyses should be seen as aspects of the same thing. So it is desirable to design them as a whole, IMO.

Especially if assert() is going to turn into assume(). I believe that will only work reliably for preconditions on a code scope up to the postconditions, and that you need some heavy-duty analysis of the call chain to get anywhere.

I presume you can do the same for allocations between preconditions and postconditions and that could let you establish @safe for a wider range of constructs this way.

I.e. the work on assert->assume for preconditions could lead to a wider range of @safe and also perhaps automatic memory allocation optimizations.

It sounds reasonable that constraining the input would sometimes give information that could help establish @safe on code that would otherwise have to be assumed unsafe.

Ola.


August 25, 2014
Marc Schütz:

>(How formal do we need to go?)

It's not just a matter of how formal to go, but also of how much of D to formalize to avoid unwanted interactions (surprises) later.

But before going formal, you need to wait for comments from Walter & Andrei; and possibly also comments from a good Rust core developer, because they have discussed such topics in detail for a long time :-)

Bye,
bearophile
August 26, 2014
On 24 August 2014 23:14, via Digitalmars-d <digitalmars-d@puremagic.com> wrote:

> In the "Opportunities for D" thread, Walter again mentioned the topics ref counting, GC, uniqueness, and borrowing, from which a lively discussion developed [1]. I took this thread as an opportunity to write down some ideas about these topics. The result is a rather extensive proposal for the implementation of borrowing, and its implementations:
>
> http://wiki.dlang.org/User:Schuetzm/scope
>
> This is not a real DIP, but before I put more work into formalizing it, I'd like to hear some thoughts from the languages gurus here:
>
> * Is this the general direction we want to go? Is it acceptable in general?
> * Is the proposal internally consistent?
> * How big would the effort to implement it be? (I suspect it's a large
> amount of work, but relatively straightforward.)
>
> [1] http://forum.dlang.org/thread/lphnen$1ml7$1@digitalmars.com
>

This is the initiative I've been dreaming of for years!
I'll give it a critical review when I have some time. But in the mean time
at first glance, I just wanted to say, very nice work! :)
This is largely in line with my thoughts since I first came to D, except you
flesh out a few problem areas (return scope, which we discussed at length
at dconf2013), and the ideas appear to be quite sound.

This direction really opens up the language to support manual and
user-defined memory management at a much deeper and more useful level.
In my experience, manual/custom memory management depends extensively on
templates, and also implies that practically every api in any library
anywhere must also receive template args, such that it can receive objects
templated with custom allocation strategies.
D is a language that is acutely susceptible to over-templatisation and
extreme template bloat. This work on scope is critically important to
mitigate that tendency of D in library code.


August 26, 2014
On Tuesday, 26 August 2014 at 11:47:39 UTC, Manu via Digitalmars-d wrote:
> This is the initiative I've been dreaming of for years!
> I'll give it a critical review when I have some time. But in the mean time
> at first glance, I just wanted to say, very nice work! :)

I had a suspicion you would like it ;-)

> This is largely in line with my thoughts since I first came to D, except you
> flesh out a few problem areas (return scope, which we discussed at length
> at dconf2013), and the ideas appear to be quite sound.

I've heard that mentioned a few times. I guess it was an informal discussion only, not an official talk, right? I only know about a few discussions afterwards on the news group.

>
> This direction really opens up the language to support manual and
> user-defined memory management at a much deeper and more useful level.
> In my experience, manual/custom memory management depends extensively on
> templates, and also implies that practically every api in any library
> anywhere must also receive template args, such that it can receive objects
> templated with custom allocation strategies.
> D is a language that is acutely susceptible to over-templatisation and
> extreme template bloat. This work on scope is critically important to
> mitigate that tendency of D in library code.

I'm looking forward to hearing your thoughts about it, because you're one of the people involved in large, complex projects where this is relevant. For me it has so far been only a theoretical exercise, so it's important to hear from someone who can assess how it would work out in practice.

I'm also curious about Walter's and Andrei's opinions (I believe they are quite busy with other things at the moment, so we'll have to wait), and about Kenji and others who can give an estimate of the implementation effort.
August 26, 2014
Am Sun, 24 Aug 2014 13:14:43 +0000
schrieb "Marc Schütz" <schuetzm@gmx.net>:

> In the "Opportunities for D" thread, Walter again mentioned the topics ref counting, GC, uniqueness, and borrowing, from which a lively discussion developed [1]. I took this thread as an opportunity to write down some ideas about these topics. The result is a rather extensive proposal for the implementation of borrowing, and its implementations:
> 
> http://wiki.dlang.org/User:Schuetzm/scope

The number of possible use cases you listed for this extension is
staggering. It surely carries its own weight.
scope!(ident1, ident2, ...) was quite clever. inout could
borrow from this.

> This is not a real DIP, but before I put more work into formalizing it, I'd like to hear some thoughts from the languages gurus here:
> 
> * Is this the general direction we want to go? Is it acceptable
> in general?
> * Is the proposal internally consistent?

Can anyone tell without actually implementing it? :)
You could try to formalize some error messages and how the
compiler's reasoning would go. What happens when I pass
identifiers to scope!(…) return types?
- Can the compiler look up the identifiers' types and scopes
  in all cases?
- Will the compiler deduce the return type from these
  identifiers? E.g. "scope!(someString) ref getString();" will
  work like in your example? I don't think so, because the
  "lifetime identifiers" could be structs containing the
  returned type.
- What if we used incompatible types like
  "scope!(someString, someIntPtr)" there?
- What about variables?

  static int someInt = 32;
  string someString;
  scope!(someString, someInt) int* x;
  x = &someInt;

  Is the declaration of x in error? Strings don't contain
  integers unless unsafe casts are used, so why would they
  narrow the lifetime of an integer reference?

- Is it necessary to keep around all declared lifetime
  identifiers? In the snippet above (assuming it is valid), it
  looks like the shorter lived `someString' is enough to
  establish the semantics.

> * How big would the effort to implement it be? (I suspect it's a large amount of work, but relatively straightforward.)
> 
> [1] http://forum.dlang.org/thread/lphnen$1ml7$1@digitalmars.com

-- 
Marco

August 27, 2014
On Tuesday, 26 August 2014 at 22:53:30 UTC, Marco Leise wrote:
> Can anyone tell without actually implementing it? :)

I think this is the biggest problem with all scope proposals. While the concept is very natural, figuring out all the small details is incredibly tricky, and one can never be sure it works without trying it in practice on a largish project. Of course, implementing it is no small feat either, which is pretty much why any progress in this direction is so slow.
August 27, 2014
Marco Leise:

> The amount of possible use-cases you listed for this extension
> is staggering.

With scope management in code like this:


import std.stdio, std.algorithm;
int foo(in int[] a) {
    return sum([0, 1] ~ a[1 .. $ - 2] ~ 0 ~
               [a[$ - 1] + 1, a[$ - 1] + 2]);
}
void main() {
    int[5] b = [10, 20, 30, 40, 50];
    b.foo.writeln;
}


The compiler could in theory lower the code to something like this, removing all heap allocations for the array literals and the concatenations, allowing 'foo' to be annotated with @nogc (I don't know if Rust is able to do this, I presume it can):


import std.stdio, std.algorithm, core.stdc.stdlib;
T foo(T)(in T[] a) @nogc {
    immutable size_t L = 2 + a.length - 3 + 1 + 2;
    auto ptr = alloca(T.sizeof * L);
    if (ptr == null)
        exit(1); // Or throw a stack overflow error.
    T[] b = (cast(T*)ptr)[0 .. L];
    b[0] = 0;
    b[1] = 1;
    b[2 .. $ - 3] = a[1 .. $ - 2];
    b[$ - 3] = 0;
    b[$ - 2] = a[$ - 1] + 1;
    b[$ - 1] = a[$ - 1] + 2;
    return sum(b);
}
void main() {
    int[5] c = [10, 20, 30, 40, 50];
    c.foo.writeln;
}


But in some cases you don't want those arrays to be allocated on the stack because you know they are very large. So the [...]s array literal syntax becomes useful again:

import std.stdio, std.algorithm;
int foo(const scope int[] a) {
    return sum([0, 1]s ~ a[1 .. $ - 2] ~ 0 ~
               [a[$ - 1] + 1, a[$ - 1] + 2]s);
}


But this is still not enough, because even if those parts are all safely stack-allocated, the result of the concatenation is still not stack-allocated.


This could be enough:

import std.stdio, std.algorithm;
int foo(const scope int[] a) @nogc {
    auto[$] a2 = [0, 1]s ~ a[1 .. $ - 2] ~ 0 ~
                 [a[$ - 1] + 1, a[$ - 1] + 2]s;
    return sum(a2[]);
}

Bye,
bearophile
August 27, 2014
On Tuesday, 26 August 2014 at 22:53:30 UTC, Marco Leise wrote:
> You could try to formalize some error messages and how the
> compiler's reasoning would go. What happens when I pass
> identifiers to scope!(…) return types?
> - Can the compiler look up the identifiers' types and scopes
>   in all cases?

It has to be able to. If a non-visible identifier is specified, it is an error.

> - Will the compiler deduce the return type from these
>   identifiers? E.g. "scope!(someString) ref getString();" will
>   work like in your example? I don't think so, because the
>   "lifetime identifiers" could be structs containing the
>   returned type.

I see, this was a stupid mistake. Of course this function needs to specify the full type, as no type deduction can happen when the body isn't available. Fixed on the Wiki.

> - What if we used incompatible types like
>   "scope!(someString, someIntPtr)" there?
> - What about variables?
>

Just to be sure: You can only specify variables as owners, not types. I've clarified this on the Wiki.
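For instance, under this rule (hypothetical syntax, assuming the proposal's semantics):

```d
string someString;
scope!someString int* p;   // ok: `someString` is a visible variable
// scope!string int* q;    // error: `string` is a type, not an owner
```

Only the first declaration names an owner the compiler can track; the second has no concrete lifetime to attach to.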

>   static int someInt = 32;
>   string someString;
>   scope!(someString, someInt) int* x;
>   x = &someInt;
>
>   Is the declaration of x in error? Strings don't contain
>   integers unless unsafe casts are used, so why would they
>   narrow the lifetime of an integer reference?

This is an interesting thought... But I would still allow this, for a different reason. At the beginning, I only thought about `scope` as a tool for memory management, but I believe it can be applied to lifetime management of arbitrary resources. Let's slightly modify your example, and use different types:

    Task someProcess;
    scope!someProcess HANDLE my_handle;

`someProcess` could represent an external process that is managed by this program, and `my_handle` could refer to some kind of object in this external process. This handle is only valid as long as the process exists, even though it is not a memory reference, and `Task` might not contain any members of type `HANDLE`. (This is not an ideal example, of course, because the process could terminate for reasons outside of our control.) A similar example would be a connection to the X server, and a handle to an object allocated from it.

I already wrote this idea into the section "scope for non-reference types", with a simpler example. In fact, I believe that the entire proposal will be a bit simpler if references/pointers aren't treated specially.

>
> - Is it necessary to keep around all declared lifetime
>   identifiers? In the snippet above (assuming it is valid), it
>   looks like the shorter lived `someString' is enough to
>   establish the semantics.

Yes, it's possible to some degree, see the section "Considerations for the implementation". Unfortunately, this doesn't work with function declarations, and is incompatible with the suggested `scope!(const ...)` extension.