DIP1000 scope inference (page 3)

Posted by Walter Bright
in reply to Steven Schveighoffer

On 10/26/2022 7:38 AM, Steven Schveighoffer wrote:
> On 10/26/22 4:03 AM, Walter Bright wrote:
>> On 10/24/2022 6:35 PM, Steven Schveighoffer wrote:
>>> In a `@trusted` function today, without dip1000, the above is perfectly reasonable and not invalid. Will dip1000 make it corrupt memory?
>>
>> A very good question. Clearly, having code work when it is @safe, but cause memory corruption when it is marked @trusted, is the wrong solution. This should never happen. I'm not sure what the solution should be here.
>>
> 
> I should be clear here

I understood the issue <g>.

> The last thing we want dip1000 to do is *cause* memory corruption.

We're in full agreement here.

October 26, 2022

Posted by Walter Bright
in reply to Walter Bright

On 10/26/2022 1:03 AM, Walter Bright wrote:
> On 10/24/2022 6:35 PM, Steven Schveighoffer wrote:
>> In a `@trusted` function today, without dip1000, the above is perfectly reasonable and not invalid. Will dip1000 make it corrupt memory?
> 
> A very good question. Clearly, having code work when it is @safe, but cause memory corruption when it is marked @trusted, is the wrong solution. This should never happen. I'm not sure what the solution should be here.
> 

[Some more thinking about the problem]

The question is when is [1,2,3] allocated on the stack, and when is it allocated on the GC heap?

Some points:

1. in C it is allocated on the stack. D's behavior to allocate it on the heap is kinda surprising in that light, even though D had such literals before C did

2. allocating on the heap means it is unusable in @nogc code

3. when writing expressions, the only way to get it on the stack is to assign it to a scope variable, which is inconvenient and inefficient

4. it runs against the idea that the simpler code should be more efficient than the complex code

Therefore, I suggest the following:

    [1,2,3] is always allocated on the stack

    [1,2,3].dup is always allocated on the heap

and thus, its behavior is not dependent on inference.

How we transition to this, we'll have to figure out.
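
To make the two cases concrete, here is a minimal sketch of how array literals behave in D today, before any change along the lines proposed above. The function names `makeSlice` and `makeStatic` are invented for illustration and do not come from the thread.

```d
// Under current D rules, a literal assigned to a slice gets GC-heap storage,
// so returning the slice is fine.
int[] makeSlice()
{
    int[] a = [1, 2, 3];
    return a;
}

// A literal that initializes a fixed-size array fills that array in place;
// the array is returned by value, so nothing escapes.
int[3] makeStatic()
{
    int[3] b = [1, 2, 3];
    return b;
}

void main()
{
    int[] a = makeSlice();
    int[3] b = makeStatic();

    // .dup always produces an independent GC-heap copy.
    int[] c = b[].dup;
    c[0] = 42;

    assert(a == [1, 2, 3]);
    assert(b[0] == 1); // the original is unaffected by writes to the copy
    assert(c == [42, 2, 3]);
}
```

Under the proposal, only the `.dup` form would be guaranteed to allocate on the heap, which is why `makeSlice` is the kind of code the transition would have to deal with.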

October 26, 2022

Posted by Steven Schveighoffer
in reply to Walter Bright

On 10/26/22 8:57 PM, Walter Bright wrote:
> On 10/26/2022 1:03 AM, Walter Bright wrote:
>> On 10/24/2022 6:35 PM, Steven Schveighoffer wrote:
>>> In a `@trusted` function today, without dip1000, the above is perfectly reasonable and not invalid. Will dip1000 make it corrupt memory?
>>
>> A very good question. Clearly, having code work when it is @safe, but cause memory corruption when it is marked @trusted, is the wrong solution. This should never happen. I'm not sure what the solution should be here.
>
> [Some more thinking about the problem]
>
> The question is when is [1,2,3] allocated on the stack, and when is it allocated on the GC heap?
>
> Some points:
>
> 1. in C it is allocated on the stack. D's behavior to allocate it on the heap is kinda surprising in that light, even though D had such literals before C did
>
> 2. allocating on the heap means it is unusable in @nogc code
>
> 3. when writing expressions, the only way to get it on the stack is to assign it to a scope variable, which is inconvenient and inefficient
>
> 4. it runs against the idea that the simpler code should be more efficient than the complex code
>
> Therefore, I suggest the following:
>
>     [1,2,3] is always allocated on the stack
>
>     [1,2,3].dup is always allocated on the heap
>
> and thus, its behavior is not dependent on inference.
>
> How we transition to this, we'll have to figure out.

Please no! We can allocate on the stack by explicitly requesting it:

    int[3] arr = [1, 2, 3];

The issue is the DRYness of it. This has been proposed before, just:

    int[$] arr = [1, 2, 3];

If we are going to fix something, let's fix this! It's backwards compatible too.

If anything, the compiler can just punt and say all array literals that aren't immediately assigned to static arrays are allocated on the heap. Then it's consistent.

Allocating array literals on the heap is awesome, please don't change that! D is one of the best learning languages for high-performance code because you don't have to worry at all about memory management out of the box. I'm actually OK with backends using stack allocations because it can prove they aren't escaping, why can't we just rely on that?

-Steve
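
As an illustration of the explicit route described in this post, the following sketch shows a literal initializing a fixed-size array inside a `@nogc` function. The name `sumFixed` is hypothetical; the point is only that current compilers accept this form without a GC allocation.

```d
// Compiles under @nogc: the literal initializes the fixed-size array in place.
int sumFixed() @nogc nothrow @safe
{
    int[3] arr = [1, 2, 3];
    int total = 0;
    foreach (x; arr)
        total += x;
    return total;
}

// By contrast, the slice form is rejected in a @nogc function today,
// because the literal may be allocated on the GC heap:
//
//     int[] arr = [1, 2, 3];

void main()
{
    assert(sumFixed() == 6);
}
```

The cost is the repeated length in `int[3]`, which is the DRY complaint that the `int[$]` proposal is meant to address.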

October 27, 2022

Posted by Dukc
in reply to Steven Schveighoffer

On Thursday, 27 October 2022 at 00:57:47 UTC, Walter Bright wrote:
> Therefore, I suggest the following:
>
>     [1,2,3] is always allocated on the stack

Please no. Far too much breakage for the value (even without going into the question of whether it'd be added value in the first place).

> allocating on the heap means it is unusable in @nogc code

The compiler will error, and the programmer can manually fix it. No silent errors. @nogc code is still a bit of a special case; GC-using code is the norm we want to optimise the language for.

> when writing expressions, the only way to get it on the stack is to assign it to a scope variable, which is inconvenient and inefficient

The compiler is still free to optimise those as a stack allocation, if it can prove there's no escaping of the data. scope is just used to enforce that being the case in @safe, or giving the compiler the permission to assume that being the case in @trusted and @system.
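
A small sketch of what a `scope` parameter expresses, assuming a hypothetical helper named `sum`. With `-preview=dip1000` the compiler checks in `@safe` code that `data` is not stored anywhere that outlives the call; without that switch the code below still compiles, but the promise is not verified.

```d
// `scope` states that this function does not keep a reference to `data`
// after it returns.
int sum(scope const(int)[] data) @safe pure nothrow @nogc
{
    int total = 0;
    foreach (x; data)
        total += x;
    return total;
}

// main is @system by default, so slicing a local array is allowed here
// regardless of compiler switches.
void main()
{
    int[3] local = [1, 2, 3]; // stack storage
    assert(sum(local[]) == 6); // safe to pass: the callee cannot retain the slice
}
```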

October 27, 2022

Posted by German Diago
in reply to Walter Bright

On Thursday, 27 October 2022 at 00:57:47 UTC, Walter Bright wrote:
> On 10/26/2022 1:03 AM, Walter Bright wrote:
>> On 10/24/2022 6:35 PM, Steven Schveighoffer wrote:
>>> In a `@trusted` function today, without dip1000, the above is perfectly reasonable and not invalid. Will dip1000 make it corrupt memory?
>> 
>> A very good question. Clearly, having code work when it is @safe, but cause memory corruption when it is marked @trusted, is the wrong solution. This should never happen. I'm not sure what the solution should be here.
>> 
>
> [Some more thinking about the problem]
>
> The question is when is [1,2,3] allocated on the stack, and when is it allocated on the GC heap?
>
> Some points:
>
> 1. in C it is allocated on the stack. D's behavior to allocate it on the heap is kinda surprising in that light, even though D had such literals before C did
>
> 2. allocating on the heap means it is unusable in @nogc code
>
> 3. when writing expressions, the only way to get it on the stack is to assign it to a scope variable, which is inconvenient and inefficient
>
> 4. it runs against the idea that the simpler code should be more efficient than the complex code
>
> Therefore, I suggest the following:
>
>     [1,2,3] is always allocated on the stack
>
>     [1,2,3].dup is always allocated on the heap
>

As a person who has used D, but not extensively, I was surprised by the type[] vs type[N] behavior all the time. I agree that [1, 2, 3] should be allocated on the stack, but I am not sure how much code that could break. For example, if it was on the heap before, what happens with this now?

    int[] func() {
      // Allocated on the stack, which I presume is not safe. Should I add .dup?
      int[] v = [1, 2, 3];
      return v;
    }

How should it work?

October 27, 2022

Posted by Walter Bright
in reply to Steven Schveighoffer

On 10/26/2022 6:26 PM, Steven Schveighoffer wrote:
> Please no! We can allocate on the stack by explicitly requesting it:
> 
> ```d
> int[3] arr = [1, 2, 3];
> ```
> 
> The issue is the DRYness of it. This has been proposed before, just:
> 
> ```d
> int[$] arr = [1, 2, 3];
> ```

How would this be done:

    foo([1,2,3] + a)

i.e. using an array literal in places other than an initialization?


> If we are going to fix something, let's fix this! It's backwards compatible too.
> 
> If anything, the compiler can just punt and say all array literals that aren't immediately assigned to static arrays are allocated on the heap. Then it's consistent.

And inefficient.


> Allocating array literals on the heap is *awesome*, please don't change that! D is one of the best learning languages for high-performance code because you don't have to worry at all about memory management out of the box. I'm actually OK with backends using stack allocations because it can prove they aren't escaping, why can't we just rely on that?

I thought your test case showed the problem with that :-/

October 27, 2022

Posted by Steven Schveighoffer
in reply to Walter Bright

On 10/27/22 9:44 AM, Walter Bright wrote:
> On 10/26/2022 6:26 PM, Steven Schveighoffer wrote:
>> Please no! We can allocate on the stack by explicitly requesting it:
>>
>>     int[3] arr = [1, 2, 3];
>>
>> The issue is the DRYness of it. This has been proposed before, just:
>>
>>     int[$] arr = [1, 2, 3];
>
> How would this be done:
>
>     foo([1,2,3] + a)

Already works today, except I don't know what the `+ a` means:

    foo([1, 2, 3].staticArray);

>> If we are going to fix something, let's fix this! It's backwards compatible too.
>>
>> If anything, the compiler can just punt and say all array literals that aren't immediately assigned to static arrays are allocated on the heap. Then it's consistent.
>
> And inefficient.

Inefficiencies that are taken care of by modern backends, such as LLVM and GCC.

> I thought your test case showed the problem with that :-/

Backends that put it on the stack are not using language constructs such as scope to make assumptions; they are using actual analysis of the control flow to prove that it doesn't escape.

-Steve
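
For readers unfamiliar with it, `staticArray` lives in Phobos' `std.array`. The sketch below assumes a hypothetical callee `foo` that takes a fixed-size array by value (the thread never says what `foo` accepts) and shows the call compiling under `@nogc`.

```d
import std.array : staticArray;

// Hypothetical callee: takes three ints by value, so no slice is involved.
int foo(int[3] values) @nogc nothrow @safe
{
    int total = 0;
    foreach (v; values)
        total += v;
    return total;
}

// The literal is converted to an int[3] without a GC allocation,
// and the length is inferred rather than spelled out.
int callFoo() @nogc nothrow @safe
{
    return foo([1, 2, 3].staticArray);
}

void main()
{
    assert(callFoo() == 6);
}
```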

October 27, 2022

Posted by Quirin Schroll
in reply to tsbockman

On Wednesday, 26 October 2022 at 20:24:38 UTC, tsbockman wrote:
> On Wednesday, 26 October 2022 at 10:43:11 UTC, German Diago wrote:
>> Is not trusted code (note my little D experience so sorry if I am asking something relatively stupid) unsafe? I mean, @safe is safe, @trusted is ??, @system is you go your own.
>>
>> So what are the guarantees of @trusted compared to @system?
>
> A @safe function is guaranteed by the compiler to be memory safe to call from other @safe code with (almost) any possible arguments and under (almost) any circumstances.
>
> A @trusted function is guaranteed by its author to be memory safe to call from other @safe code with (almost) any possible arguments and under (almost) any circumstances.

The “(almost)” should be absent. If you mean something other than compiler bugs, please tell us.

> A @system function may require the caller to follow additional rules beyond those enforced by the compiler, even in @safe code, to maintain memory safety. Since the compiler does not know what these additional rules are and cannot enforce them automatically, calling @system functions directly from @safe code is forbidden.

| Attribute  | Must check definition | Must check each caller |
|------------|-----------------------|------------------------|
| `@safe`    | compiler              | compiler               |
| `@trusted` | programmer            | compiler               |
| `@system`  | programmer            | programmer             |

> Assume the function is implemented correctly, then try to figure out how to call the function from @safe code in a way that violates memory safety. If there is a way to do so, the function should be @system.
>
> Otherwise, it should be @safe if that compiles, or @trusted if not.

I agree with the characterization of @safe and @system. For @trusted functions, there’s something more to say:

- Widely accessible ones (e.g. public, package, protected, even private in a big module) should have a @safe interface, i.e. you can use them like @safe functions in all regards; they just aren’t @safe because of some implementation details.
- Narrowly accessible ones (e.g. private (in a small module), local functions, immediately executed lambdas) can have a @system interface, but their surroundings can be trusted to use the function correctly.
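
As a sketch of the first case, a `@trusted` function with a `@safe` interface might look like the following. The helper `firstOr` is hypothetical; what matters is that no argument a `@safe` caller can supply leads to memory corruption, even though the body uses an operation the compiler rejects in `@safe` code.

```d
// Hypothetical helper: @trusted implementation behind a @safe interface.
int firstOr(const(int)[] data, int fallback) @trusted pure nothrow @nogc
{
    if (data.length == 0)
        return fallback;
    // Raw .ptr access is not allowed in @safe code. The length check above
    // is what makes the dereference valid for every possible argument.
    return *data.ptr;
}

void main() @safe
{
    assert(firstOr([7, 8, 9], 0) == 7);
    assert(firstOr(null, -1) == -1);
}
```

If the length check were left to the caller instead, the function would have a `@system` interface and, by the rule above, should only be marked `@trusted` when it is narrowly accessible.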

October 27, 2022

Posted by Quirin Schroll
in reply to German Diago

On Thursday, 27 October 2022 at 09:36:25 UTC, German Diago wrote:
> On Thursday, 27 October 2022 at 00:57:47 UTC, Walter Bright wrote:
>> On 10/26/2022 1:03 AM, Walter Bright wrote:
>>> On 10/24/2022 6:35 PM, Steven Schveighoffer wrote:
>>>> In a `@trusted` function today, without dip1000, the above is perfectly reasonable and not invalid. Will dip1000 make it corrupt memory?
>>
>> [Some more thinking about the problem]
>>
>> The question is when is [1,2,3] allocated on the stack, and when is it allocated on the GC heap?
>>
>> Some points:
>>
>> 1. in C it is allocated on the stack. D's behavior to allocate it on the heap is kinda surprising in that light, even though D had such literals before C did
>>
>> 2. allocating on the heap means it is unusable in @nogc code
>>
>> 3. when writing expressions, the only way to get it on the stack is to assign it to a scope variable, which is inconvenient and inefficient
>>
>> 4. it runs against the idea that the simpler code should be more efficient than the complex code
>>
>> Therefore, I suggest the following:
>>
>>     [1,2,3] // is always allocated on the stack
>>
>>     [1,2,3].dup // is always allocated on the heap
>
> As a person who has used D, but not extensively, I was surprised by the type[] vs type[N] behavior all the time. I agree that [1, 2, 3] should be allocated on the stack, but I am not sure how much code that could break. For example, if it was on the heap before, what happens with this now?
>
>     int[] func() {
>       // Allocated on the stack, which I presume is not safe. Should I add .dup?
>       int[] v = [1, 2, 3];
>       return v;
>     }
>
> How should it work?

If [1, 2, 3] is stack allocated, it should not compile (at least not in @safe code, probably not in @system code either). The problem is not the assignment to v (that is of the same kind as a pointer to a local variable), but that its value is returned and thus leaking the address of a local.
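
The same situation can be reproduced today with an explicitly stack-allocated array, which gives a rough idea of what `func` would run into under the proposal. This is a sketch with a made-up function name, `makeCopy`; current compilers reject the commented-out line, and `.dup` is the fix German Diago guessed at.

```d
// Returning a slice of a local fixed-size array would hand the caller
// a pointer into a stack frame that no longer exists.
int[] makeCopy()
{
    int[3] local = [1, 2, 3];

    // return local[]; // rejected: escapes a reference to the local variable

    return local[].dup; // the GC-heap copy outlives this stack frame
}

void main()
{
    int[] v = makeCopy();
    assert(v == [1, 2, 3]);
}
```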

October 27, 2022

Posted by ag0aep6g
in reply to Quirin Schroll
