August 02, 2018
On Monday, 30 July 2018 at 21:02:56 UTC, Steven Schveighoffer wrote:
> Would it be a valid optimization to have D remove the requirement for allocation when it can determine that the entire data structure of the item in question is an rvalue, and would fit into the data pointer part of the delegate?
>
> Here's what I'm looking at:
>
> auto foo(int x)
> {
>    return { return x + 10; };
> }
>
> In this case, D allocates a closure on the heap to hold "x", and then returns a delegate which uses the context pointer to read x and return that plus 10.
>
> However, we could store x itself in the storage of the pointer of the delegate. This removes an indirection, and also saves the heap allocation.
>
> Think of it like "automatic functors".
>
> Does it make sense? Would it be feasible for the language to do this? The type system already casts the delegate pointer to a void *, so it can't make any assumptions, but this is a slight break of the type system.
>
> The two requirements I can think of are:
> 1. The data in question must fit into a word
> 2. It must be guaranteed that the data is not going to be mutated (either via the function or any other function). Maybe it's best to require the state to be const/immutable.
>
> I've had several cases where I was tempted to not use delegates because of the allocation cost, and simply return a specialized struct, but it's so annoying to do this compared to making a delegate. Plus something like this would be seamless with normal delegates as well (in case you do need a real delegate).
>
> -Steve

I think the number of cases where you could optimize this is very small.  And the complexity of getting the compiler to analyze cases to determine when this is possible would be very large.

In addition, a developer can already do this explicitly if they want, i.e.

auto foo(int x)
{
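    // A struct method takes a hidden context pointer, so its address has the delegate ABI.
    static struct DummyStructToMakeFunctionWithDelegateAbi
    {
        // `this` is never dereferenced; its address is reinterpreted as the stored int.
        int passthru() const { return cast(int)&this; }
    }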
    DummyStructToMakeFunctionWithDelegateAbi dummyStruct;
    auto dg = &dummyStruct.passthru;
    dg.ptr = cast(void*)(x + 10); // treat the void* pointer as an int value
    return dg;
}

void main(string[] args)
{
    auto dg = foo(32);
    import std.stdio;
    writefln("dg() = %s", dg());
}

It's definitely ugly but it works.  This will print the number "42" as expected.
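
The trick relies on a D delegate being a pair of a context pointer (`.ptr`) and a function pointer (`.funcptr`). A minimal sketch, using the made-up names `Box`, `get` and `callBoth` purely for illustration, shows that retargeting `.ptr` changes what the method sees as `this`:

```d
// Illustrative only: Box, get and callBoth are hypothetical names, not from the thread.
struct Box
{
    int v;
    int get() const { return v; }
}

// Calls `get` on `a` through a delegate, then points the same delegate at `b`.
int[2] callBoth(ref Box a, ref Box b)
{
    auto dg = &a.get;   // context pointer is &a, function pointer is Box.get
    int first = dg();
    dg.ptr = &b;        // swap the context; the function pointer is unchanged
    int second = dg();
    return [first, second];
}

void main()
{
    Box a = Box(5);
    Box b = Box(7);
    assert(callBoth(a, b) == [5, 7]);
}
```

The code in the post goes one step further and stores a plain integer in `.ptr`, which is why the method must never dereference `this`.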

This would be a case where DIP1011 extern(delegate) would come in handy :) i.e.

extern(delegate) int passthru(void* ptr) { return cast(int)ptr; }
int delegate() foo2(int x)
{
    return &(cast(void*)(x + 10)).passthru;
}

August 02, 2018
On Thursday, 2 August 2018 at 16:21:58 UTC, Jonathan Marler wrote:
> On Monday, 30 July 2018 at 21:02:56 UTC, Steven Schveighoffer wrote:
>> Would it be a valid optimization to have D remove the requirement for allocation when it can determine that the entire data structure of the item in question is an rvalue, and would fit into the data pointer part of the delegate?
>>
>> Here's what I'm looking at:
>>
>> auto foo(int x)
>> {
>>    return { return x + 10; };
>> }
>>
>> In this case, D allocates a closure on the heap to hold "x", and then returns a delegate which uses the context pointer to read x and return that plus 10.
>>
>> However, we could store x itself in the storage of the pointer of the delegate. This removes an indirection, and also saves the heap allocation.
>>
>> Think of it like "automatic functors".
>>
>> Does it make sense? Would it be feasible for the language to do this? The type system already casts the delegate pointer to a void *, so it can't make any assumptions, but this is a slight break of the type system.
>>
>> The two requirements I can think of are:
>> 1. The data in question must fit into a word
>> 2. It must be guaranteed that the data is not going to be mutated (either via the function or any other function). Maybe it's best to require the state to be const/immutable.
>>
>> I've had several cases where I was tempted to not use delegates because of the allocation cost, and simply return a specialized struct, but it's so annoying to do this compared to making a delegate. Plus something like this would be seamless with normal delegates as well (in case you do need a real delegate).
>>
>> -Steve
>
> I think the number of cases where you could optimize this is very small.  And the complexity of getting the compiler to analyze cases to determine when this is possible would be very large.
>
> In addition, a developer can already do this explicitly if they want, i.e.
>
> auto foo(int x)
> {
>     static struct DummyStructToMakeFunctionWithDelegateAbi
>     {
>         int passthru() const { return cast(int)&this; }
>     }
>     DummyStructToMakeFunctionWithDelegateAbi dummyStruct;
>     auto dg = &dummyStruct.passthru;
>     dg.ptr = cast(void*)(x + 10); // treat the void* pointer as an int value
>     return dg;
> }
>
> void main(string[] args)
> {
>     auto dg = foo(32);
>     import std.stdio;
>     writefln("dg() = %s", dg());
> }
>
> It's definitely ugly but it works.  This will print the number "42" as expected.
>
> This would be a case where DIP1011 extern(delegate) would come in handy :) i.e.
>
> extern(delegate) int passthru(void* ptr) { return cast(int)ptr; }
> int delegate() foo2(int x)
> {
>     return &(cast(void*)(x + 10)).passthru;
> }

Actually, I'll do you one better.  Here's a potential library function for it.  I'm calling these types of delegates "value pointer delegates".

// Assume this is in a library somewhere
auto makeValuePtrDelegate(string valueName, string funcBody, T)(T value)
{
    static struct DummyStruct
    {
        auto method() const
        {
            mixin("auto " ~ valueName ~ " = cast(T)&this;");
            mixin (funcBody);
        }
    }
    DummyStruct dummy;
    auto dg = &dummy.method;
    dg.ptr = cast(void*)value;
    return dg;
}

auto foo(int x)
{
    return makeValuePtrDelegate!("val", q{ return val + 10; })(x);
}

void main(string[] args)
{
    auto dg = foo(32);
    import std.stdio;
    writefln("dg() = %s", dg());
}

August 02, 2018
On 8/2/18 12:21 PM, Jonathan Marler wrote:
> On Monday, 30 July 2018 at 21:02:56 UTC, Steven Schveighoffer wrote:
>> Would it be a valid optimization to have D remove the requirement for allocation when it can determine that the entire data structure of the item in question is an rvalue, and would fit into the data pointer part of the delegate?
>>
>> Here's what I'm looking at:
>>
>> auto foo(int x)
>> {
>>    return { return x + 10; };
>> }
>>
>> In this case, D allocates a closure on the heap to hold "x", and then returns a delegate which uses the context pointer to read x and return that plus 10.
>>
>> However, we could store x itself in the storage of the pointer of the delegate. This removes an indirection, and also saves the heap allocation.
>>
>> Think of it like "automatic functors".
>>
>> Does it make sense? Would it be feasible for the language to do this? The type system already casts the delegate pointer to a void *, so it can't make any assumptions, but this is a slight break of the type system.
>>
>> The two requirements I can think of are:
>> 1. The data in question must fit into a word
>> 2. It must be guaranteed that the data is not going to be mutated (either via the function or any other function). Maybe it's best to require the state to be const/immutable.
>>
>> I've had several cases where I was tempted to not use delegates because of the allocation cost, and simply return a specialized struct, but it's so annoying to do this compared to making a delegate. Plus something like this would be seamless with normal delegates as well (in case you do need a real delegate).
>>
> 
> I think the number of cases where you could optimize this is very small.  And the complexity of getting the compiler to analyze cases to determine when this is possible would be very large.

It's not that complicated, you just have to analyze how much data is needed from the context inside the delegate. First iteration, all of the data has to be immutable, so it should be relatively straightforward.

> In addition, a developer can already do this explicitly if they want, i.e.
> 
> auto foo(int x)
> {
>      static struct DummyStructToMakeFunctionWithDelegateAbi
>      {
>          int passthru() const { return cast(int)&this; }
>      }
>      DummyStructToMakeFunctionWithDelegateAbi dummyStruct;
>      auto dg = &dummyStruct.passthru;
>      dg.ptr = cast(void*)(x + 10); // treat the void* pointer as an int value
>      return dg;
> }

Yep, just make that dummyStruct static or else it will allocate, and it should work. The concern I have with doing it this way is all the breakage of the type system.

-Steve
August 02, 2018
On Thursday, 2 August 2018 at 15:12:10 UTC, Steven Schveighoffer wrote:
> On 8/2/18 11:00 AM, Kagamin wrote:
>> I suppose it's mostly for mutability, so if it's const, it can be optimized based on type information only:
>> 
>> auto foo(in int x)
>> {
>>     return { return x + 10; };
>> }
>
> I'm not sure what you mean here.

I think he's saying that the check for immutability could simply consist in checking that all captured variables (well, not too much room for a lot of them ;)) have a const type.

It's definitely an interesting idea, and the obvious benefit over a library solution is that you wouldn't have to think about this optimization when writing a delegate; if the captured stuff happens to be const and fit into a pointer, the GC won't be bothered, nice.
August 02, 2018
On 8/2/18 3:57 PM, kinke wrote:
> On Thursday, 2 August 2018 at 15:12:10 UTC, Steven Schveighoffer wrote:
>> On 8/2/18 11:00 AM, Kagamin wrote:
>>> I suppose it's mostly for mutability, so if it's const, it can be optimized based on type information only:
>>>
>>> auto foo(in int x)
>>> {
>>>     return { return x + 10; };
>>> }
>>
>> I'm not sure what you mean here.
> 
> I think he's saying that the check for immutability could simply consist in checking that all captured variables (well, not too much room for a lot of them ;)) have a const type.

OK, yes, that's what I was thinking as well. On a 64-bit system, you could stuff 2 ints in there, which is a common theme for my code :)
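
As a rough illustration of the two-int case, assuming a 64-bit target, the pair can be packed into one word and unpacked again. The helper names `packPair`, `firstOf` and `secondOf` are hypothetical and only show the bit layout such an optimization would need:

```d
// Hypothetical helpers: pack two 32-bit ints into one 64-bit word and back.
ulong packPair(int a, int b) pure nothrow @nogc @safe
{
    // Go through uint so that negative values are not sign-extended.
    return ((cast(ulong) cast(uint) a) << 32) | cast(uint) b;
}

int firstOf(ulong w) pure nothrow @nogc @safe
{
    return cast(int)(w >> 32);
}

int secondOf(ulong w) pure nothrow @nogc @safe
{
    return cast(int)(w & 0xFFFF_FFFF);
}

// True only where a pointer is wide enough to hold the packed pair.
enum fitsInPointer = ulong.sizeof <= (void*).sizeof;

void main()
{
    auto w = packPair(3, -4);
    assert(firstOf(w) == 3);
    assert(secondOf(w) == -4);

    static if (fitsInPointer)
    {
        // On a 64-bit target the packed word survives a trip through void*.
        void* ctx = cast(void*) w;
        assert(cast(ulong) ctx == w);
    }
}
```

On a 32-bit target `fitsInPointer` is false, so the same pair would still need a heap-allocated context.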

> It's definitely an interesting idea, and the obvious benefit over a library solution is that you wouldn't have to think about this optimization when writing a delegate; if the captured stuff happens to be const and fit into a pointer, the GC won't be bothered, nice.

Yeah, a library solution is opt-in, whereas if the compiler does it as an optimization, it's seamless (mostly invisible). And works in @nogc when possible.
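
For comparison, the "specialized struct" alternative mentioned at the start of the thread might look like the sketch below. `AddTen` is a hypothetical name. The struct carries `x` by value, so no closure is allocated and `foo` can be marked `@nogc`:

```d
// Hypothetical hand-written functor replacing `{ return x + 10; }`.
struct AddTen
{
    int x;

    // Declaring opCall disables struct-literal syntax, so provide a constructor.
    this(int x) pure nothrow @nogc @safe { this.x = x; }

    int opCall() const pure nothrow @nogc @safe { return x + 10; }
}

AddTen foo(int x) pure nothrow @nogc @safe
{
    return AddTen(x);
}

void main()
{
    auto f = foo(32);
    assert(f() == 42);
}
```

The cost is the one described earlier: `AddTen` is its own type rather than an `int delegate()`, and every such closure needs its own struct.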

-Steve
August 02, 2018
Leaking may be an issue. This currently works:

```
static const(int)* global;

auto foo(const int param)
{
    return { global = &param; return param + 10; };
}

void main()
{
    {
        int arg = 42;
        auto dg = foo(arg);
        auto r = dg();
        assert(r == 52);
    }
    assert(*global == 42);
}
```

With the proposed optimization, `global` would be dangling as soon as the delegate `dg` goes out of scope, because `param` would no longer live in a GC-allocated closure.
August 02, 2018
On Thursday, 2 August 2018 at 21:28:27 UTC, kinke wrote:
> Leaking may be an issue.

Ah, I guess that's why you mentioned the use-as-rvalue requirement.
August 03, 2018
On Monday, 30 July 2018 at 21:02:56 UTC, Steven Schveighoffer wrote:
> Would it be a valid optimization to have D remove the requirement for allocation when it can determine that the entire data structure of the item in question is an rvalue, and would fit into the data pointer part of the delegate?
>
> Here's what I'm looking at:
>
> auto foo(int x)
> {
>    return { return x + 10; };
> }

This is something I've been wanting for a long time as well. It was also implemented in Ocean ( https://github.com/sociomantic-tsunami/ocean/blob/e53ac93fbf3bfa9b2dceec1a2b6dc4a0ec7f78b2/src/ocean/core/TypeConvert.d#L249-L311 ).
AFAIK it should be possible, although not trivial to do.
August 03, 2018
On Thursday, 2 August 2018 at 17:21:47 UTC, Steven Schveighoffer wrote:
> On 8/2/18 12:21 PM, Jonathan Marler wrote:
>> On Monday, 30 July 2018 at 21:02:56 UTC, Steven Schveighoffer wrote:
>>> Would it be a valid optimization to have D remove the requirement for allocation when it can determine that the entire data structure of the item in question is an rvalue, and would fit into the data pointer part of the delegate?
>>>
>>> Here's what I'm looking at:
>>>
>>> auto foo(int x)
>>> {
>>>    return { return x + 10; };
>>> }
>>>
>>> In this case, D allocates a closure on the heap to hold "x", and then returns a delegate which uses the context pointer to read x and return that plus 10.
>>>
>>> However, we could store x itself in the storage of the pointer of the delegate. This removes an indirection, and also saves the heap allocation.
>>>
>>> Think of it like "automatic functors".
>>>
>>> Does it make sense? Would it be feasible for the language to do this? The type system already casts the delegate pointer to a void *, so it can't make any assumptions, but this is a slight break of the type system.
>>>
>>> The two requirements I can think of are:
>>> 1. The data in question must fit into a word
>>> 2. It must be guaranteed that the data is not going to be mutated (either via the function or any other function). Maybe it's best to require the state to be const/immutable.
>>>
>>> I've had several cases where I was tempted to not use delegates because of the allocation cost, and simply return a specialized struct, but it's so annoying to do this compared to making a delegate. Plus something like this would be seamless with normal delegates as well (in case you do need a real delegate).
>>>
>> 
>> I think the number of cases where you could optimize this is very small.  And the complexity of getting the compiler to analyze cases to determine when this is possible would be very large.
>
> It's not that complicated, you just have to analyze how much data is needed from the context inside the delegate. First iteration, all of the data has to be immutable, so it should be relatively straightforward.

After thinking about it more I suppose it wouldn't be that complicated to implement.  For delegate literals, you already need to gather a list of all the data you need to put on the heap, and if it can all fit inside a pointer, then you can just put it there instead.

On that note, I think if a developer wants to be sure that this optimization occurs in their code, they should explicitly use a library solution like the one in Ocean or the one I gave. If a developer relies on the compiler optimization, then when it doesn't apply they won't get any information as to why (e.g. some data was mutable or was not an rvalue). Depending on the code, this failure will either go unnoticed or break something that depends on the optimization, like @nogc.  With a library solution, the data is explicitly copied into the pointer, so you'll get an explicit error message if it doesn't fit or has some other issue.
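
A sketch of how such a library helper could reject unsuitable types at compile time. This is a hypothetical variant of the helper posted earlier in the thread, not Ocean's implementation. `valueDelegate`, `Thunk` and `addTen` are made-up names, and like the original it deliberately bypasses the type system by storing an integer in the context pointer:

```d
// Hypothetical helper: stores a small integral value directly in the
// delegate's context pointer instead of allocating a closure.
auto valueDelegate(alias fn, T)(T value)
{
    static assert(__traits(isIntegral, T),
        "valueDelegate: only integral types are handled in this sketch");
    static assert(T.sizeof <= (void*).sizeof,
        "valueDelegate: " ~ T.stringof ~ " does not fit into a pointer");

    static struct Thunk
    {
        auto call() const
        {
            // `this` is never dereferenced; its address *is* the stored value.
            auto raw = cast(size_t) cast(const(void)*) &this;
            return fn(cast(T) raw);
        }
    }

    Thunk dummy;                              // only needed to form the delegate
    auto dg = &dummy.call;
    dg.ptr = cast(void*) cast(size_t) value;  // overwrite the context with the value
    return dg;
}

// Module-level function used as the delegate body in this example.
int addTen(int v) { return v + 10; }

void main()
{
    auto dg = valueDelegate!addTen(32);
    assert(dg() == 42);
    // valueDelegate!addTen(1.5) would be rejected at compile time.
}
```

Because the checks are `static assert`s, a value that is too wide for the target's pointer fails the build with a message instead of silently falling back to a GC closure.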

Something else to consider is that this would cause some discrepancy with the @nogc attribute based on the platform's pointer width. Because this is an optimization that you don't have to opt in to, the developer may be unaware that their code depends on an optimization that won't work on other platforms.  Their code could become platform-dependent without them knowing. However, I suppose the counter-argument is that developers who use delegate literals with @nogc would probably be aware of this, but it's still something to consider.

In the end, I think that most if not all use cases would be better off using the library solution if they want this optimization.  This allows the developer to opt in to or out of this optimization and enables the compiler to provide error messages when they opt in with incompatible usage.

August 03, 2018
On Friday, 3 August 2018 at 14:46:59 UTC, Jonathan Marler wrote:
> After thinking about it more I suppose it wouldn't be that complicated to implement.  For delegate literals, you already need to gather a list of all the data you need to put on the heap, and if it can all fit inside a pointer, then you can just put it there instead.

Nope, immutability (and no escaping) are additional requirements, as each delegate copy has its own context then, as opposed to a single shared GC closure.
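
The shared-closure behaviour referred to here can be seen in a small sketch (`makeCounter` is a hypothetical name). Copies of a delegate share one GC-allocated context, so a mutation made through one copy is visible through the other. That would stop being true if the captured value lived inside each delegate copy:

```d
// Hypothetical example: the closure captures `n` by reference on the GC heap.
int delegate() makeCounter()
{
    int n = 0;
    return { return ++n; };
}

void main()
{
    auto a = makeCounter();
    auto b = a;               // copies the delegate, not the closure
    assert(a.ptr is b.ptr);   // both copies point at the same context
    assert(a() == 1);
    assert(b() == 2);         // b sees the increment made through a
}
```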

> In the end, I think that most if not all use cases would be better off using the library solution if they want this optimization.

I disagree.