Thread overview | |||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
September 02, 2013 Re: Pitfalls of delegates inside ranges | ||||
---|---|---|---|---|
| ||||
On Mon, Sep 02, 2013 at 12:27:44AM +0200, Joseph Rushton Wakeling wrote: > Hello all, > > I came across an interesting little pitfall of delegates recently which I thought I'd share. > > TL;DR: having reference types as struct members can be dangerous. ;-) Yes. > Suppose we have a fairly simple range which is supposed to count from 0 to some maximum. The trick is that the count is implemented by a function that's accessed by a delegate. > > struct Foo > { > private size_t _count = 0, _max; > private void delegate() jump = null; > > @disable this(); > > this(size_t max) > { > _max = max; This: > jump = &jump10; > } And this: [...] > void jump10() > { > _count += 10; > writeln("At end of jump, count = ", _count); > } > } are a recipé for disaster. :) Well, Adam Ruppe & myself found out the hard way that delegates closing over struct variables are dangerous, because structs can be moved about / copied by value etc.. Consider this innocent-looking code: struct S { void delegate()[] cleanups; void* data; this(int) { data = malloc(1024); cleanups ~= { free(data); }; } ~this() { foreach_reverse (f; cleanups) f(); } } S makeS() { return S(123); /* create new instance of S */ } void main() { auto s = makeS(); ... // do some other stuff here } // segfault Spot the bug. :) It's very non-obvious, of course. It's ultimately caused by the delegate that closes over S.data. When the ctor creates the delegate, the delegate's context pointer is set to the instance of S being initialized in makeS(). This instance then returns S to main() -- but since structs are passed *by value*, it gets *copied* into a new location reserved for the local variable s in main(). The original instance of S, that the delegate's context pointer is pointing to, is now invalid. If main() exits right away, it *appears* to be fine -- because the original instance of S is still sitting on the stack (though technically it is no longer valid since makeS()'s scope has ended). But as soon as you make another function call, or allocate new variables on the stack, etc., they will begin to overwrite that original copy of S, and so when the dtor of s is invoked, it calls the delegate which tries to access the *original* copy of S that has now been overwritten with garbage. Result: segfault. The compiler really should reject attempts to close over struct instances in this way, as it's really dangerous, but looks innocent. Worst of all, this bug is extremely hard to trace, because by the time the dtor is invoked, the original cause of the problem may already be very far away, and code reviews generally fail to catch it because it's so subtle. T -- Let's eat some disquits while we format the biskettes. |
September 02, 2013 Re: Pitfalls of delegates inside ranges | ||||
---|---|---|---|---|
| ||||
On 02/09/13 06:41, H. S. Teoh wrote:
> Spot the bug. :)
Ouch!! :-)
The first time I ever implemented random sampling was in C, and it was a version of the same algorithm as in Phobos, but written for the GNU Scientific Library instead. Most of the complex types in GSL are implemented as pointers to "objects" which consist of some kind of data payload plus function pointers that make up the object's API.
So, in that context, it was trivial to use function pointers to manage the sampling algorithm's switch mid-sample from one skip() function to another.
But in D -- risky at least for now, until things like RandomSample are re-implemented as reference types (which they should be).
I suppose one way round this might be to re-implement the to-be-delegated-to function as a free function whose argument is a reference to the calling object, but that's finnicky and not very nice.
|
September 02, 2013 Re: Pitfalls of delegates inside ranges | ||||
---|---|---|---|---|
| ||||
On 09/02/13 00:27, Joseph Rushton Wakeling wrote: > Hello all, > > I came across an interesting little pitfall of delegates recently which I thought I'd share. > > TL;DR: having reference types as struct members can be dangerous. ;-) > > Suppose we have a fairly simple range which is supposed to count from 0 to some maximum. The trick is that the count is implemented by a function that's accessed by a delegate. > > struct Foo > { > private size_t _count = 0, _max; > private void delegate() jump = null; > > @disable this(); > > this(size_t max) > { > _max = max; > jump = &jump10; > } > > bool empty() @property @safe pure const nothrow > { > return (_count > _max); > } > > size_t front() > { > return _count; > } > > void popFront() > { > if (empty) > { > return; > } > > writeln("At start of popFront(), count = ", _count); > writeln("Jumping ..."); > jump(); > writeln("At end of popFront(), count = ", _count); > } > > void jump10() > { > _count += 10; > writeln("At end of jump, count = ", _count); > } > } > > Now let's put this struct to use: > > auto foo = Foo(50); > > foreach (f; foo) > { > writeln(f); > } > > What we expect is that the program will print out 0, 10, 20, 30, 40, 50 (on new lines) and exit. > > In fact, the foreach () loop never exits and the logging writeln's we put in tell us a strange story: > ... and so on. It seems like the value of _count is never being incremented. The logging function inside jump10 keeps telling us: "At end of jump, count = 70" (and 80, and 90, and ... 30470 ... and ...), but front() and popFront() keep telling us the count is zero. > > Of course, it's because of the delegate. The foreach () loop has taken a (value-type) copy of the Foo struct, but the delegate jump() is a reference -- which is pointing to the function jump10 in the _original_ Foo, not the copy that foreach () is operating on. > > Hence, it's the original's _count that's being incremented, and not that in foreach () loop's copy -- and so the foreach loop never exits. this(this) { jump.ptr = &this; } While this may not seem ideal, sometimes you do *not* want to do it (when you need to keep the original context), so there are no obvious alternatives that would handle all cases automagically. artur |
September 02, 2013 Re: Pitfalls of delegates inside ranges | ||||
---|---|---|---|---|
| ||||
On 02/09/13 15:39, Artur Skawina wrote:
> While this may not seem ideal, sometimes you do *not* want to do it
> (when you need to keep the original context), so there are no obvious
> alternatives that would handle all cases automagically.
I'm sure there are valid use cases where this behaviour is desirable. In fact, all approaches to creating reference-type structs rest on this, although in those cases the reference is usually a pointer to a payload rather than a delegate to a function.
The point is, it's a risk if you don't appreciate how value and reference types intermingle -- and even when you do in principle, it's easy to accidentally overlook in practice. The Phobos unittests, which for RandomSample are very extensive, didn't catch this problem.
|
September 02, 2013 Re: Pitfalls of delegates inside ranges | ||||
---|---|---|---|---|
| ||||
On 09/02/13 15:49, Joseph Rushton Wakeling wrote:
> On 02/09/13 15:39, Artur Skawina wrote:
>> While this may not seem ideal, sometimes you do *not* want to do it (when you need to keep the original context), so there are no obvious alternatives that would handle all cases automagically.
>
> I'm sure there are valid use cases where this behaviour is desirable. In fact, all approaches to creating reference-type structs rest on this, although in those cases the reference is usually a pointer to a payload rather than a delegate to a function.
>
> The point is, it's a risk if you don't appreciate how value and reference types intermingle -- and even when you do in principle, it's easy to accidentally overlook in practice. The Phobos unittests, which for RandomSample are very extensive, didn't catch this problem.
Requiring captures to be explicit would reduce the chance of such bugs happening, but also have a significant cost and be a rather drastic change to the language...
In this case, there's no need for a delegate, as you do not want to operate on the original object. So you can simply do:
//...
private void function(ref typeof(this)) _jump;
private void jump() { _jump(this); }
this(size_t max)
{
_max = max;
_jump = &jump10;
}
//...
static jump10(ref typeof(this)this_)
{
this_._count += 10;
writeln("At end of jump, count = ", this_._count);
}
It's cheaper than the other alternative (updating ctx in cpctor), but slightly more verbose. More importantly, AFAICT, this is a better fit for the actual problem.
artur
|
September 02, 2013 Re: Pitfalls of delegates inside ranges | ||||
---|---|---|---|---|
| ||||
On 02/09/13 16:18, Artur Skawina wrote: > Requiring captures to be explicit would reduce the chance of such > bugs happening, but also have a significant cost and be a rather > drastic change to the language... For what it's worth, I'm not advocating for any change in the language. I'm simply highlighting a place where it's possible to shoot oneself in the foot :-) > In this case, there's no need for a delegate, as you do not want > to operate on the original object. So you can simply do: > > //... > private void function(ref typeof(this)) _jump; > private void jump() { _jump(this); } > > this(size_t max) > { > _max = max; > _jump = &jump10; > } > //... > static jump10(ref typeof(this)this_) > { > this_._count += 10; > writeln("At end of jump, count = ", this_._count); > } > > It's cheaper than the other alternative (updating ctx in cpctor), > but slightly more verbose. More importantly, AFAICT, this is a > better fit for the actual problem. Oh, nice thought. Thank you! :-) |
September 02, 2013 Re: Pitfalls of delegates inside ranges | ||||
---|---|---|---|---|
| ||||
On 09/02/13 16:43, Joseph Rushton Wakeling wrote:
>> In this case, there's no need for a delegate, as you do not want to operate on the original object. So you can simply do:
>>
>> //...
>> private void function(ref typeof(this)) _jump;
>> private void jump() { _jump(this); }
>>
>> this(size_t max)
>> {
>> _max = max;
>> _jump = &jump10;
>> }
>> //...
>> static jump10(ref typeof(this)this_)
>> {
>> this_._count += 10;
>> writeln("At end of jump, count = ", this_._count);
>> }
>>
>> It's cheaper than the other alternative (updating ctx in cpctor), but slightly more verbose. More importantly, AFAICT, this is a better fit for the actual problem.
>
> Oh, nice thought. Thank you! :-)
I'm not sure where the delegates are supposed to be defined, the above allows defining then externally. If that is not required and you only need to select them from a set of predefined internal ones, then you can use your original code with something like:
private void function() _jump;
private void jump() { void delegate() dg; dg.ptr=&this; dg.funcptr=_jump; dg(); }
this(size_t max)
{
_max = max;
_jump = (&jump10).funcptr;
}
That way `jump10` (and any other such functions) can still use the implicit-
'this'.
/I/ would probably just wrap the function body in 'with (this_) {...}'.
artur
|
September 02, 2013 Re: Pitfalls of delegates inside ranges | ||||
---|---|---|---|---|
| ||||
On 02/09/13 17:02, Artur Skawina wrote:
> I'm not sure where the delegates are supposed to be defined, the above
> allows defining then externally. If that is not required and you only
> need to select them from a set of predefined internal ones, then you can
> use your original code with something like:
>
> private void function() _jump;
> private void jump() { void delegate() dg; dg.ptr=&this; dg.funcptr=_jump; dg(); }
> this(size_t max)
> {
> _max = max;
> _jump = (&jump10).funcptr;
> }
>
> That way `jump10` (and any other such functions) can still use the implicit-
> 'this'.
This second solution may actually rescue my revision to RandomSample -- I'm going to try it out. :-)
|
September 02, 2013 Re: Pitfalls of delegates inside ranges | ||||
---|---|---|---|---|
| ||||
On 09/02/13 17:06, Joseph Rushton Wakeling wrote:
> On 02/09/13 17:02, Artur Skawina wrote:
>> I'm not sure where the delegates are supposed to be defined, the above allows defining then externally. If that is not required and you only need to select them from a set of predefined internal ones, then you can use your original code with something like:
>>
>> private void function() _jump;
>> private void jump() { void delegate() dg; dg.ptr=&this; dg.funcptr=_jump; dg(); }
>> this(size_t max)
>> {
>> _max = max;
>> _jump = (&jump10).funcptr;
>> }
>>
>> That way `jump10` (and any other such functions) can still use the implicit-
>> 'this'.
>
> This second solution may actually rescue my revision to RandomSample -- I'm going to try it out. :-)
The nasty part of that is that the type of .funcptr, hence also of the '_jump' pointer, is bogus. So calling via '_jump' directly can succeed, but do the wrong thing. It's a language issue; one reasonable workaround would be to define a MemberPtr type, which disallows direct calls.
artur
|
September 02, 2013 Re: Pitfalls of delegates inside ranges | ||||
---|---|---|---|---|
| ||||
On 02/09/13 17:44, Artur Skawina wrote:
> The nasty part of that is that the type of .funcptr, hence also of the
> '_jump' pointer, is bogus. So calling via '_jump' directly can succeed,
> but do the wrong thing. It's a language issue; one reasonable workaround
> would be to define a MemberPtr type, which disallows direct calls.
SafeDelegate, perhaps? Might be a worthwhile library addition.
|
Copyright © 1999-2021 by the D Language Foundation