Strange closure behaviour

Jun 15, 2019

Emmanuelle

Jun 15, 2019

Adam D. Ruppe

Jun 16, 2019

Take a look at this code:

---
import std.stdio;

void main()
{
    alias Func = void delegate(int);

    int[][] nums = new int[][5];
    Func[] funcs;

    foreach (x; 0 .. 5) {
        funcs ~= (int i) { nums[x] ~= i; };
    }

    foreach (i, func; funcs) {
        func(cast(int) i);
    }

    writeln(nums);
}
---
(https://run.dlang.io/is/oMjNRL)

The output is:

---
[[], [], [], [], [0, 1, 2, 3, 4]]
---

Personally, this makes no sense to me. This is the result I was expecting:

---
[[0], [1], [2], [3], [4]]
---

Why is it "locking" the bound `x` to the last element? It seems like the compiler is overwriting the closure for `x`, somehow.

So, I'm wondering why D is doing that. Is it a compiler bug? Or is this the expected behaviour?

On Saturday, 15 June 2019 at 00:24:52 UTC, Emmanuelle wrote:
> Is it a compiler bug?

Yup, a very longstanding bug.

You can work around it by wrapping it all in another layer of function which you immediately call (which is fairly common in javascript):

        funcs ~= ((x) => (int i) { nums[x] ~= i; })(x);

Or maybe less confusingly written long form:

        funcs ~= (delegate(x) {
            return (int i) { nums[x] ~= i; };
        })(x);

You write a function that returns your actual function, and immediately calls it with the loop variable, which will explicitly make a copy of it.
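
A complete program built on this workaround might look like the sketch below. It is only an illustration: instead of the inline literal it uses a named helper, `makeAppender`, and a `collect` function, both hypothetical names that do not appear in the original code. Each call to the helper has its own parameter `x`, so every returned delegate captures a separate copy.

```d
import std.stdio;

alias Func = void delegate(int);

// Hypothetical helper (not from the thread): every call has its own
// parameter `x`, so each returned delegate captures a distinct copy.
// `nums` is a slice, so the copy still refers to the caller's elements.
Func makeAppender(int[][] nums, int x)
{
    return (int i) { nums[x] ~= i; };
}

// Builds the five delegates through the helper, runs them, and
// returns the resulting arrays.
int[][] collect()
{
    int[][] nums = new int[][5];
    Func[] funcs;

    foreach (x; 0 .. 5)
        funcs ~= makeAppender(nums, x);

    foreach (i, func; funcs)
        func(cast(int) i);

    return nums;
}

void main()
{
    writeln(collect()); // [[0], [1], [2], [3], [4]]
}
```

A named helper is arguably easier to read than the immediately-called literal, at the cost of one extra function.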

On Saturday, 15 June 2019 at 00:30:43 UTC, Adam D. Ruppe wrote:
> On Saturday, 15 June 2019 at 00:24:52 UTC, Emmanuelle wrote:
>> Is it a compiler bug?
>
> Yup, a very longstanding bug.
>
> You can work around it by wrapping it all in another layer of function which you immediately call (which is fairly common in javascript):
>
>         funcs ~= ((x) => (int i) { nums[x] ~= i; })(x);
>
> Or maybe less confusingly written long form:
>
>         funcs ~= (delegate(x) {
>             return (int i) { nums[x] ~= i; };
>         })(x);
>
> You write a function that returns your actual function, and immediately calls it with the loop variable, which will explicitly make a copy of it.

Oh, I see. Unfortunate that it's a longstanding compiler bug, but at least the rather awkward workaround will do. Thank you!

On Saturday, 15 June 2019 at 01:21:46 UTC, Emmanuelle wrote:
> On Saturday, 15 June 2019 at 00:30:43 UTC, Adam D. Ruppe wrote:
>> On Saturday, 15 June 2019 at 00:24:52 UTC, Emmanuelle wrote:
>>> Is it a compiler bug?
>>
>> Yup, a very longstanding bug.
>>
>> You can work around it by wrapping it all in another layer of function which you immediately call (which is fairly common in javascript):
>>
>>         funcs ~= ((x) => (int i) { nums[x] ~= i; })(x);
>>
>> Or maybe less confusingly written long form:
>>
>>         funcs ~= (delegate(x) {
>>             return (int i) { nums[x] ~= i; };
>>         })(x);
>>
>> You write a function that returns your actual function, and immediately calls it with the loop variable, which will explicitly make a copy of it.
>
> Oh, I see. Unfortunate that it's a longstanding compiler bug, but at least the rather awkward workaround will do. Thank you!

I don't know if we can tell this is a compiler bug. The same behavior happens in Python.

The logic being variable `x` is captured by the closure. That closure's context will contain a pointer/reference to x. Whenever x is updated outside of the closure, the context still points to the modified x. Hence the seemingly strange behavior.

Adam's workaround ensures that the closure captures a temporary `x` variable on the stack: a copy will be made instead of taking a reference, since a pointer to `x` would be dangling once the `delegate(x){...}` returns.

Most of the time, we want a pointer/reference to the enclosed variables in our closures. Note that C++ 17 allows one to select the capture mode: the following link lists 8 of them: https://en.cppreference.com/w/cpp/language/lambda#Lambda_capture.

D offers a convenient default that works most of the time. The trade-off is having to deal with the creation of several closures referencing a variable being modified in a single scope, like the incremented `x` of the for loop.

That said, I wouldn't mind having the compiler deal with that case: detecting that `x` is within a for loop and making copies of it in the closures' contexts.
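
The by-reference capture described in this post can be seen in isolation with a single variable declared outside any loop. The following is a minimal sketch; `demo`, `counter`, `read` and `seen` are illustrative names, not taken from the thread.

```d
import std.stdio;

// Reads one captured variable through a delegate at two points in
// time and returns what the delegate saw each time.
int[] demo()
{
    int counter = 0;
    auto read = () => counter; // the delegate refers to `counter`, it does not copy it

    int[] seen;
    counter = 7;
    seen ~= read(); // sees the update made after the delegate was created
    counter = 42;
    seen ~= read();
    return seen;
}

void main()
{
    writeln(demo()); // [7, 42]
}
```

Here there is only one `counter`, so the delegate observing later writes is the intended behaviour.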

On Saturday, 15 June 2019 at 16:29:29 UTC, Rémy Mouëza wrote:
> I don't know if we can tell this is a compiler bug.

I can't remember where the key fact was, but I used to agree with you (several languages work this same way, and it makes a lot of sense for ease of the implementation), but someone convinced me otherwise by pointing to the language of the D spec. I just can't find that reference right now...

It is worth noting too that the current behavior also opens up a hole in the immutable promises; the loop variable can be passed as immutable to the outside via a delegate, but then modified afterward, which is unambiguously a bug.

Regardless of bug vs spec, it isn't implemented and I wouldn't expect that to change any time soon, so it is good to just learn the wrapper function technique :) (and it is useful in those other languages too)

On Saturday, 15 June 2019 at 16:29:29 UTC, Rémy Mouëza wrote:
> I don't know if we can tell this is a compiler bug. The same behavior happens in Python.
>
> The logic being variable `x` is captured by the closure. That closure's context will contain a pointer/reference to x. Whenever x is updated outside of the closure, the context still points to the modified x. Hence the seemingly strange behavior.

I come from Ruby, where it works as I expected, so I assumed all languages would work like that; but then, D surprised me, and now, Python too, and apparently a whole bunch of other languages (which is honestly kinda disheartening since I like throwing lambdas everywhere.)

June 16, 2019

Re: Strange closure behaviour

Posted by Timon Gehr
in reply to Rémy Mouëza

On 15.06.19 18:29, Rémy Mouëza wrote:
> On Saturday, 15 June 2019 at 01:21:46 UTC, Emmanuelle wrote:
>> On Saturday, 15 June 2019 at 00:30:43 UTC, Adam D. Ruppe wrote:
>>> On Saturday, 15 June 2019 at 00:24:52 UTC, Emmanuelle wrote:
>>>> Is it a compiler bug?
>>>
>>> Yup, a very longstanding bug.
>>>
>>> You can work around it by wrapping it all in another layer of function which you immediately call (which is fairly common in javascript):
>>>
>>>         funcs ~= ((x) => (int i) { nums[x] ~= i; })(x);
>>>
>>> Or maybe less confusingly written long form:
>>>
>>>         funcs ~= (delegate(x) {
>>>             return (int i) { nums[x] ~= i; };
>>>         })(x);
>>>
>>> You write a function that returns your actual function, and immediately calls it with the loop variable, which will explicitly make a copy of it.
>>
>> Oh, I see. Unfortunate that it's a longstanding compiler bug, but at least the rather awkward workaround will do. Thank you!
> 
> I don't know if we can tell this is a compiler bug.

It's a bug. It's memory corruption. Different objects with overlapping lifetimes use the same memory location.

> The same behavior happens in Python.

No, it's not the same. Python has no sensible notion of variable scope.

>>> for i in range(3): pass
...
>>> print(i)
2

Yuck.

> The logic being variable `x` is captured by the closure. That closure's context will contain a pointer/reference to x. Whenever x is updated outside of the closure, the context still points to the modified x. Hence the seemingly strange behavior.
> ...

It's not the same instance of the variable. Foreach loop variables are local to the loop body. They may both be called `x`, but they are not the same. It's most obvious with `immutable` variables.

> Adam's workaround ensures that the closure captures a temporary `x` variable on the stack: a copy will be made instead of taking a reference, since a pointer to `x` would be dangling once the `delegate(x){...}` returns.
> 
> Most of the time, we want a pointer/reference to the enclosed variables in our closures. Note that C++ 17 allows one to select the capture mode: the following link lists 8 of them: https://en.cppreference.com/w/cpp/language/lambda#Lambda_capture.
> ...

No, this is not an issue of by value vs by reference. All captures in D are by reference, yet the behavior is wrong.

> D offers a convenient default that works most of the time. The trade-off is having to deal with the creation of several closures referencing a variable being modified in a single scope, like the incremented `x` of the for loop.
> ...

Capturing by reference may be a convenient default, but even with by-reference capture the behavior is wrong.

On Sunday, 16 June 2019 at 01:36:38 UTC, Timon Gehr wrote:
> It's a bug. It's memory corruption. Different objects with overlapping
> lifetimes use the same memory location.

Okay. Seen that way, it is clear to me why it's a bug.

> ...
> No, it's not the same. Python has no sensible notion of variable scope.
>
> >>> for i in range(3): pass
> ...
> >>> print(i)
> 2
>
> Yuck.

I got confused by this Python behavior:

    ls = []
    for i in range(0, 5):
        ls.append(lambda x: x + i)
    for fun in ls:
        print(fun(0))

This prints:

    4
    4
    4
    4
    4
