delegate confusion - D Programming Language Discussion Forum


August 04, 2017

delegate confusion

Posted by bitwise

I'm confused about how D's lambda capture actually works, and can't find any clear specification on the issue. I've read the comments on the bug about what's described below, but I'm still confused. The conversation there dropped off in 2016, and the issue hasn't been fixed, despite high bug priority and plenty of votes.

Consider this code:

import std.stdio;

void foo() {
    void delegate()[] funs;

    foreach(i; 0..5)
        funs ~= (){ writeln(i); };

    foreach(fun; funs)
        fun();
}

void bar() {
    void delegate()[] funs;

    foreach(i; 0..5)
    {
        int j = i;
        funs ~= (){ writeln(j); };
    }
    foreach(fun; funs)
        fun();
}


void delegate() baz() {
    int i = 1234;
    return (){ writeln(i); };
}

void overwrite() {
    int i = 5;
    writeln(i);
}

int main(string[] argv)
{
    foo();
    bar();

    auto fn = baz();
    overwrite();
    fn();

    return 0;
}

First, I run `foo`. The output is "4 4 4 4 4".
So I guess `i` is captured by reference, and the second loop in `foo` works because the stack hasn't unwound, and `i` hasn't been overwritten, and `i` contains the last value that was assigned to it.

Next I run `bar`. I get the same output of "4 4 4 4 4". While this hack works in C#, I suppose it's reasonable to assume the D compiler would just reuse stack space for `j`, and that the C# compiler has some special logic built in to handle this.

Now, I test my conclusions above, and run `baz`, `overwrite` and `fn`. The result? Total confusion.
The output is "5" then "1234". So if the lambdas are referencing the stack, why wasn't 1234 overwritten?

Take a simple C++ program for example:

#include <cstdio>

int* foo() {
    int i = 1234;
    return &i;
}

void overwrite() {
    int i = 5;
    printf("%d\n", i);
}

int main()
{
    auto a = foo();
    overwrite();
    printf("%d\n", *a);
    return 0;
}

This outputs "5" and "5" which is exactly what I expect, because I'm overwriting the stack space where the first `i` was stored with "5".

So now I'm thinking D must be storing these captures on the heap then, right? So why would I get "4 4 4 4 4" instead of "0 1 2 3 4" for `foo` and `bar`?

This makes absolutely no sense at all.

It seems like there are two straightforward approaches available here:

1) capture everything by reference, in which case the `overwrite` example would work just like the C++ version. Then, it would be up to the programmer to heap allocate anything living beyond the current scope.

2) heap allocate a chunk of space for each lambda's captures, and copy everything captured into that space when the lambda is constructed. This of course, would mean that `foo` and `bar` would both output "0 1 2 3 4".

When I look at the output I get from the code above though, it seems like neither of these things were done, and that someone has gone way out of their way to implement some very strange behavior.

What I would prefer, would be a mixture of reference and value capture like C++, where I could explicitly state whether I wanted (1) or (2). I would settle for (2) though.

While I'm sure there is _some_ reason that things currently work the way they do, the current behavior is very unintuitive, and gives no control over how things are captured.

August 04, 2017

Re: delegate confusion

Posted by bitwise
in reply to bitwise

*lambda confusion

August 04, 2017

Re: delegate confusion

Posted by Steven Schveighoffer
in reply to bitwise

On 8/4/17 12:57 PM, bitwise wrote:
> I'm confused about how D's lambda capture actually works, and can't find any clear specification on the issue. I've read the comments on the bug about what's described below, but I'm still confused. The conversation there dropped off in 2016, and the issue hasn't been fixed, despite high bug priority and plenty of votes.
> 
> Consider this code:
> 
> void foo() {
>      void delegate()[] funs;
> 
>      foreach(i; 0..5)
>          funs ~= (){ writeln(i); };
> 
>      foreach(fun; funs)
>          fun();
> }
> 
> void bar() {
>      void delegate()[] funs;
> 
>      foreach(i; 0..5)
>      {
>          int j = i;
>          funs ~= (){ writeln(j); };
>      }
>      foreach(fun; funs)
>          fun();
> }
> 
> 
> void delegate() baz() {
>      int i = 1234;
>      return (){ writeln(i); };
> }
> 
> void overwrite() {
>      int i = 5;
>      writeln(i);
> }
> 
> int main(string[] argv)
> {
>      foo();
>      bar();
> 
>      auto fn = baz();
>      overwrite();
>      fn();
> 
>      return 0;
> }
> 
> First, I run `foo`. The output is "4 4 4 4 4".
> So I guess `i` is captured by reference, and the second loop in `foo` works because the stack hasn't unwound, and `i` hasn't been overwritten, and `i` contains the last value that was assigned to it.
> 
> Next I run `bar`. I get the same output of "4 4 4 4 4". While this hack works in C#, I suppose it's reasonable to assume the D compiler would just reuse stack space for `j`, and that the C# compiler has some special logic built in to handle this.
> 
> Now, I test my conclusions above, and run `baz`, `overwrite` and `fn`. The result? total confusion.
> The output is "5" then "1234". So if the lambdas are referencing the stack, why wasn't 1234 overwritten?
> 
> Take a simple C++ program for example:
> 
> int* foo() {
>      int i = 1234;
>      return &i;
> }
> 
> void overwrite() {
>      int i = 5;
>      printf("%d\n", i);
> }
> 
> int main()
> {
>      auto a = foo();
>      overwrite();
>      printf("%d\n", *a);
>      return 0;
> }
> 
> This outputs "5" and "5" which is exactly what I expect, because I'm overwriting the stack space where the first `i` was stored with "5".
> 
> So now, I'm thinking.... D must be storing these captures on the heap then..right? So why would I get "4 4 4 4 4" instead of "0 1 2 3 4" for `foo` and `bar`?
> 
> This makes absolutely no sense at all.

Because the stack frame of foo or bar or baz is stored on the heap BEFORE the function is entered. The compiler determines that the stack frame will need to be captured, so it captures it on function entry, not when the delegate is taken. Then the variable location is reused for the loop, and all delegates point at the same stack frame.

This is necessary for cases where the delegate may affect the frame data during the function call. For instance:

void foo()
{
   int i;
   auto dg = { ++i;};
   dg();
   dg();
   assert(i == 2);
}

What is needed is to allocate one frame per scope, and have the delegate point at the right ones.

Note that the C++ behavior relies on dangling stack pointers, which is not something we want to support in D.

-Steve

August 04, 2017

Re: delegate confusion

Posted by Timon Gehr
in reply to bitwise

On 04.08.2017 18:57, bitwise wrote:
> I'm confused about how D's lambda capture actually works, and can't find any clear specification on the issue. I've read the comments on the bug about what's described below, but I'm still confused. The conversation there dropped off in 2016, and the issue hasn't been fixed, despite high bug priority and plenty of votes.
> 
> Consider this code:
> 
> void foo() {
>      void delegate()[] funs;
> 
>      foreach(i; 0..5)
>          funs ~= (){ writeln(i); };
> 
>      foreach(fun; funs)
>          fun();
> }
> 
> void bar() {
>      void delegate()[] funs;
> 
>      foreach(i; 0..5)
>      {
>          int j = i;
>          funs ~= (){ writeln(j); };
>      }
>      foreach(fun; funs)
>          fun();
> }
> 
> 
> void delegate() baz() {
>      int i = 1234;
>      return (){ writeln(i); };
> }
> 
> void overwrite() {
>      int i = 5;
>      writeln(i);
> }
> 
> int main(string[] argv)
> {
>      foo();
>      bar();
> 
>      auto fn = baz();
>      overwrite();
>      fn();
> 
>      return 0;
> }
> 
> First, I run `foo`. The output is "4 4 4 4 4".
> So I guess `i` is captured by reference, and the second loop in `foo` works because the stack hasn't unwound, and `i` hasn't been overwritten, and `i` contains the last value that was assigned to it.
> 
> Next I run `bar`. I get the same output of "4 4 4 4 4". While this hack works in C#,

It's very important to understand that the C# case is different, even though it looks similar. In D, the foreach loop variable is a distinct declaration for each loop iteration, while in C#, the same loop variable is repeatedly reassigned. In C#, the issue is bad language design, while in D, the issue is a buggy compiler implementation leading to memory corruption.

> I suppose it's reasonable to assume the D compiler would just reuse stack space for `j`

It's reasonable to assume that the D compiler uses the same memory location for all of the distinct variables. This is a dangling pointer bug, if you wish. Both of your examples should print "0 1 2 3 4".

> and that the C# compiler has some special logic built in to handle this.
> ...

The C# compiler just uses the correct rules for creating closures. (It is hard for the compiler to screw this up, because the underlying platform aims to prevent memory corruption.)

> Now, I test my conclusions above, and run `baz`, `overwrite` and `fn`. The result? total confusion.
> The output is "5" then "1234". So if the lambdas are referencing the stack, why wasn't 1234 overwritten?
> ...

The lambdas are referencing the heap, but all of them reference identical heap locations. This should not happen. Distinct variables shouldn't share the same memory.

> Take a simple C++ program for example:
> 
> int* foo() {
>      int i = 1234;
>      return &i;
> }
> 
> void overwrite() {
>      int i = 5;
>      printf("%d\n", i);
> }
> 
> int main()
> {
>      auto a = foo();
>      overwrite();
>      printf("%d\n", *a);
>      return 0;
> }
> 
> This outputs "5" and "5" which is exactly what I expect, because I'm overwriting the stack space where the first `i` was stored with "5".
> 
> So now, I'm thinking.... D must be storing these captures on the heap then..right? So why would I get "4 4 4 4 4" instead of "0 1 2 3 4" for `foo` and `bar`?
> 
> This makes absolutely no sense at all.
> 
> It seems like there are two straightforward approaches available here:
> 
> 1) capture everything by reference, in which case the `overwrite` example would work just like the C++ version. Then, it would be up to the programmer to heap allocate anything living beyond the current scope.
> ...

Capturing by reference is not the same as creating stack references. The language semantics don't even need to be implemented using a stack.

> 2) heap allocate a chunk of space for each lambda's captures, and copy everything captured into that space when the lambda is constructed. This of course, would mean that `foo` and `bar` would both output "0 1 2 3 4".
> ...

3) heap allocate a chunk of space for each captured scope (as in lisp and C#).

The way to go is 3). 1) is bad, because it completely prevents closures from being escaped, 2) is bad because it does not allow sharing of closure memory.
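
To make "sharing of closure memory" concrete, here is a minimal sketch. The names `Counter` and `makeCounter` are hypothetical and not from this thread. Two delegates created in the same function read and write the same captured variable, and that variable stays valid after the function returns:

```d
// Hypothetical illustration: two delegates that share one captured variable.
struct Counter
{
    void delegate() increment; // writes the captured variable
    int delegate() current;    // reads the same captured variable
}

Counter makeCounter()
{
    int count = 0; // captured by both delegates below

    Counter c;
    c.increment = () { ++count; };
    c.current = () { return count; };
    return c; // both delegates escape and keep referring to the same `count`
}
```

Calling `increment` twice and then `current` on the same `Counter` yields 2, while a second call to `makeCounter` starts again from 0. With copy-per-lambda capture (approach 2), `current` would never see the increments.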

> When I look at the output I get from the code above though, it seems like neither of these things were done, and that someone has gone way out of their way to implement some very strange behavior.
> ...

Absolutely not. The current behavior was quite straightforward to implement, but it is wrong. Bugs often lead to strange behavior. This does not imply that such bugs are intentional.

> What I would prefer, would be a mixture of reference and value capture like C++, where I could explicitly state whether I wanted (1) or (2). I would settle for (2) though.
> ...

"Like C++" does not work: in C++, each lambda has its own unique type.

> While I'm sure there is _some_ reason that things currently work the way they do, the current behavior is very unintuitive, and gives no control over how things are captured.
> 

You can work around the bug like this:

foreach(i; 0..5) (){
    int j = i;
    funs ~= (){ writeln(j); };
}();
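
A named helper function should have the same effect as the immediately invoked lambda above, because every call gets its own frame. The following is a sketch under that assumption; `captureValue` and `runAll` are hypothetical names chosen for illustration:

```d
// Hypothetical helper: each call has its own frame holding its own `value`,
// so every returned delegate refers to a different variable.
int delegate() captureValue(int value)
{
    return () { return value; };
}

int[] runAll()
{
    int delegate()[] funs;
    foreach (i; 0 .. 5)
        funs ~= captureValue(i); // no delegate literal inside the loop body

    // Invoke the delegates only after the loop has finished.
    int[] results;
    foreach (fun; funs)
        results ~= fun();
    return results;
}
```

Here `runAll()` returns `[0, 1, 2, 3, 4]`. The cost is one closure allocation per call to `captureValue`.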

August 04, 2017

Re: delegate confusion

Posted by Moritz Maxeiner
in reply to bitwise

On Friday, 4 August 2017 at 16:57:37 UTC, bitwise wrote:
> I'm confused about how D's lambda capture actually works, and can't find any clear specification on the issue. I've read the comments on the bug about what's described below, but I'm still confused. The conversation there dropped off in 2016, and the issue hasn't been fixed, despite high bug priority and plenty of votes.

How it works is described here [1] (and the GC involvement also listed here [2]), with the key sentences being

>> Delegates to non-static nested functions contain two pieces of data: the pointer to the stack frame of the lexically enclosing function (called the frame pointer) and the address of the function.

i.e. delegates point to the enclosing function's *stack frame* and access its variables through that single pointer.

and

>> The stack variables referenced by a nested function are still valid even after the function exits (this is different from D 1.0). This is called a closure.

i.e. when you return a delegate to somewhere where the enclosing function's stack frame will have become invalid, D creates a (delegate) closure, copying the necessary frame pointed to by the delegate's frame pointer to the GC managed heap.
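
The two pieces of data can be inspected through the delegate properties `.ptr` (the context pointer) and `.funcptr` (the function address). The sketch below uses the hypothetical name `shareFrame` and assumes that two delegates nested directly in the same function receive the same context pointer, which is how the implementations I am aware of behave:

```d
// Hypothetical illustration: both delegates carry a pointer to the same frame.
bool shareFrame()
{
    int x = 1;
    void delegate() bump = () { ++x; };
    int delegate() read = () { return x; };

    bump(); // modifies `x` through the context pointer

    // Same context pointer, and `read` observes the change made by `bump`.
    return bump.ptr is read.ptr && read() == 2;
}
```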

>
> Consider this code:
>
> void foo() {
>     void delegate()[] funs;
>
>     foreach(i; 0..5)
>         funs ~= (){ writeln(i); };
>
>     foreach(fun; funs)
>         fun();
> }
>
> void bar() {
>     void delegate()[] funs;
>
>     foreach(i; 0..5)
>     {
>         int j = i;
>         funs ~= (){ writeln(j); };
>     }
>     foreach(fun; funs)
>         fun();
> }
>
>
> void delegate() baz() {
>     int i = 1234;
>     return (){ writeln(i); };
> }
>
> void overwrite() {
>     int i = 5;
>     writeln(i);
> }
>
> int main(string[] argv)
> {
>     foo();
>     bar();
>
>     auto fn = baz();
>     overwrite();
>     fn();
>
>     return 0;
> }
>
> First, I run `foo`. The output is "4 4 4 4 4".
> So I guess `i` is captured by reference, and the second loop in `foo` works because the stack hasn't unwound, and `i` hasn't been overwritten, and `i` contains the last value that was assigned to it.

`i` is accessed by each of the five delegates through their respective frame pointer, which (for all of them) points to foo's stack frame, where the value of `i` is 4 after the loop terminates.

>
> Next I run `bar`. I get the same output of "4 4 4 4 4". While this hack works in C#, I suppose it's reasonable to assume the D compiler would just reuse stack space for `j`, and that the C# compiler has some special logic built in to handle this.

Yes, `j` exists once in bar's stack frame, so the same thing happens as above, because `j`'s value after the loop terminates is also 4.

>
> Now, I test my conclusions above, and run `baz`, `overwrite` and `fn`. The result? total confusion.
> The output is "5" then "1234". So if the lambdas are referencing the stack, why wasn't 1234 overwritten?

This works as per spec:
Invoking baz() creates a delegate pointing to baz's stack frame and when you return it, that frame is copied to the GC managed heap by the runtime (because the delegate would have an invalid frame pointer otherwise).
overwrite is a normal function with its own stack frame, which is used in its call to writeln.
It does not interact with baz, or the delegate returned by baz, in any way.

> [...]

[1] https://dlang.org/spec/function.html#closures
[2] https://dlang.org/spec/garbage.html#op_involving_gc

August 04, 2017

Re: delegate confusion

Posted by Timon Gehr
in reply to Moritz Maxeiner

On 04.08.2017 19:36, Moritz Maxeiner wrote:
>>
>> Next I run `bar`. I get the same output of "4 4 4 4 4". While this hack works in C#, I suppose it's reasonable to assume the D compiler would just reuse stack space for `j`, and that the C# compiler has some special logic built in to handle this.
> 
> Yes, `j` exists once in bar's stack frame, so the same thing happens as above, because `j`'s value after the loop terminates is also 4.

Make `j` 'immutable' to appreciate why this behavior is unsound (this is a form of memory corruption).

August 04, 2017

Re: delegate confusion

Posted by Stefan Koch
in reply to Timon Gehr

On Friday, 4 August 2017 at 17:27:52 UTC, Timon Gehr wrote:
> In D, the foreach loop variable is a distinct declaration for each loop iteration, while in C#, the same loop variable is repeatedly reassigned. In C#, the issue is bad language design, while in D, the issue is a buggy compiler implementation leading to memory corruption.
> [ ... ]
> It's reasonable to assume that the D compiler uses the same memory location for all of the distinct variables. This is a dangling pointer bug, if you wish. Both of your examples should print "0 1 2 3 4".
> [ ... ]
>
> 3) heap allocate a chunk of space for each captured scope (as in lisp and C#).
>
> The way to go is 3). 1) is bad, because it completely prevents closures from being escaped, 2) is bad because it does not allow sharing of closure memory.

Thanks for your insight, Timon.
Would you mind writing an ER (enhancement request) for that?
And a small spec-like proto-DIP?

I'd love to adopt that behavior for newCTFE, where it is actually the more straightforward way (in light of the constraints newCTFE's architecture has).

August 04, 2017

Re: delegate confusion

Posted by Moritz Maxeiner
in reply to Timon Gehr

On Friday, 4 August 2017 at 17:44:23 UTC, Timon Gehr wrote:
> On 04.08.2017 19:36, Moritz Maxeiner wrote:
>>>
>>> Next I run `bar`. I get the same output of "4 4 4 4 4". While this hack works in C#, I suppose it's reasonable to assume the D compiler would just reuse stack space for `j`, and that the C# compiler has some special logic built in to handle this.
>> 
>> Yes, `j` exists once in bar's stack frame, so the same thing happens as above, because `j`'s value after the loop terminates is also 4.
>
> Make `j` 'immutable' to appreciate why this behavior is unsound (this is a form of memory corruption).

I was (explicitly) arguing that it's in keeping with the current spec.
That the spec is unsound and should be updated is another matter (on which I agree with you).

August 04, 2017

Re: delegate confusion

Posted by Moritz Maxeiner
in reply to Moritz Maxeiner

On Friday, 4 August 2017 at 17:47:01 UTC, Moritz Maxeiner wrote:
> On Friday, 4 August 2017 at 17:44:23 UTC, Timon Gehr wrote:
>> On 04.08.2017 19:36, Moritz Maxeiner wrote:
>>>>
>>>> [...]
>
> I was (explicitly) arguing that it's in keeping with the current spec.
> That the spec is unsound and should be updated is another matter (on which I agree with you).

s/arguing/explaining/

August 04, 2017

Re: delegate confusion

Posted by bitwise
in reply to Steven Schveighoffer

On Friday, 4 August 2017 at 17:18:41 UTC, Steven Schveighoffer wrote:
> On 8/4/17 12:57 PM, bitwise wrote:
>> [...]
>
> Because the stack frame of foo or bar or baz is stored on the heap BEFORE the function is entered. The compiler determines that the stack frame will need to be captured, so it captures it on function entry, not when the delegate is taken. Then the variable location is reused for the loop, and all delegates point at the same stack frame.
>
> This is necessary for cases where the delegate may affect the frame data during the function call. For instance:
>
> void foo()
> {
>    int i;
>    auto dg = { ++i;};
>    dg();
>    dg();
>    assert(i == 2);
> }
>
> What is needed is to allocate one frame per scope, and have the delegate point at the right ones.
>
> Note, the C++ behavior uses dangling stack pointers, and not something we want to support in D.
>
> -Steve

Thanks for clearing this up. Looking over my examples again, this makes sense now. I suppose while this behavior is not ideal, it does mean that I can safely throw lambdas that capture things into a queue to be executed later, which was my main concern.

I wish this forum was a little more advanced so I could change the post title I fudged and make this information more visible =/

Copyright © 1999-2021 by the D Language Foundation