Thread overview
RFC: generic safety, specialized implementation?
Jan 19, 2018
Luís Marques
Jan 19, 2018
H. S. Teoh
Jan 19, 2018
Adam D. Ruppe
Jan 20, 2018
Timothee Cour
January 19, 2018
Consider some C code like this:

    struct Foo
    {
        struct Foo *next;
        int bar;
    };

    struct Bar
    {
        struct Bar *next;
        char* name;
    };

    struct Unrelated
    {
        int x;
        int y;
    };

    void combulate(void *item)
    {
        struct ItemWithNext
        {
            struct ItemWithNext *next;
        };

        struct ItemWithNext *inext = item;

        // ... lots of code here
        inext->next = ...;
    }

    int main()
    {
        struct Foo foo;
        struct Bar bar;
        struct Unrelated unrelated;
        combulate(&foo);
        combulate(&bar);
        combulate(&unrelated); // bug
    }

The function `combulate` must use a `void*` parameter to accept any argument that structurally conforms to the interface it expects -- in this case, having a `next` field as its first member. That has the disadvantage that the type system doesn't catch the mistake of passing it an Unrelated value, which doesn't conform to the expected binary interface. In D we might instead do something like this:

    struct Foo
    {
        Foo* next;
        int bar;
    }

    struct Bar
    {
        Bar* next;
        char* name;
    }

    struct Unrelated
    {
        int x;
        int y;
    };

    void combulate(T)(T* item)
    {
        // ... lots of code here
        item.next = ...;
    }

    void main()
    {
        Foo foo;
        Bar bar;
        Unrelated unrelated;
        combulate(&foo);
        combulate(&bar);
        combulate(&unrelated); // compilation error
    }

This catches the bug, but will have the disadvantage of generating code for the various types to which combulate is specialized, even though the body of the function template doesn't rely on anything that varies between the specializations. That is problematic in the context where I'm using this (embedded systems). So instead I've started using a mixed approach, with generic code that checks for the appropriate type, but delegates the actual work to a non-templated function (a fully specialized function template in my actual case), except in the cases where the actual code is small enough or the specialization significantly improves the performance. Something like this:

    private struct ItemWithNext
    {
        ItemWithNext *next;
    }

    private void combulate(ItemWithNext* item)
    {
        // ... lots of code here
        item.next = ...;
    }

    pragma(inline, true)
    void combulate(T)(T* item)
        if(someKindOfCheck!item)
    {
        combulate(cast(ItemWithNext*) item);
    }

So far this seems to be working well for me. Do you have experience writing this kind of code? Do you have any advice that might be relevant to this situation?

PS: I know I don't have to define and use a structure to access the next field, but I feel like that generalizes better to other scenarios and is clearer.
January 19, 2018
On Fri, Jan 19, 2018 at 07:18:22PM +0000, Luís Marques via Digitalmars-d wrote: [...]
>     void combulate(T)(T* item)
>     {
>         // ... lots of code here
>         item.next = ...;
>     }
[...]
[...]
> This catches the bug, but will have the disadvantage of generating code for the various types to which combulate is specialized, even though the body of the function template doesn't rely on anything that varies between the specializations.

Yeah, I think this is one area where the way the compiler implements templates could be improved.  Quite often, in a complex template (whether it's a template function or a template aggregate like a struct or class) only a subset of the code actually depends on the template arguments. Nevertheless, the compiler duplicates the entirety of the code in the generated template. This leads to a lot of template bloat.

I'm not sure how possible it is in the current compiler implementation, but it would be nice if the compiler were a bit smarter about instantiating templates.  If an instantiated function (could be any subset of the code, but granularity at the function level is probably easiest to implement) would not result in code that differs from other instantiations in the generated IR, only emit the code if it hasn't been emitted yet; otherwise just alias that particular instantiation to the previous instantiation.

Perhaps an enhancement request could be filed for this.


[...]
> That is problematic in the context where I'm using this (embedded systems). So instead I've started using a mixed approach, with generic code that checks for the appropriate type, but delegates the actual work to a non-templated function (a fully specialized function template in my actual case), except in the cases where the actual code is small enough or the specialization significantly improves the performance. Something like this:
[...]
> So far this seems to be working well for me. Do you have experience writing this kind of code? Do you have any advice that might be relevant to this situation?
[...]

Yeah, basically, this is just doing what I described above by hand. I've done similar refactorings before, to reduce template bloat. It certainly seems to work well.  However, it would be nice if the compiler automated such rote work for us.


T

-- 
Help a man when he is in trouble and he will remember you when he is in trouble again.
January 19, 2018
On Friday, 19 January 2018 at 19:18:22 UTC, Luís Marques wrote:
> So far this seems to be working well for me. Do you have experience writing this kind of code? Do you have any advice that might be relevant to this situation?

This is actually the way a lot of stuff is implemented in d runtime code, like the array helper functions (the interface done via a bit of magic in the compiler rather than templates mostly tho). I find it works in some cases with a bit smaller binary, little runtime speed change, and it can sometimes help compile speeds.

Of course, you can also sometimes use interfaces and classes to achieve this same goal.
January 19, 2018
this is very related to ICF (identical code folding), which some linkers do.

in summary: gold and lld have options to do that, and it can be unsafe to do if code relies on each function having distinct address;
> gold's --icf=safe option only enables ICF for functions that can be proven not to have their address taken, so code that relies on distinct addresses will still work.


links:
https://stackoverflow.com/questions/38662972/does-the-linker-usually-optimize-away-duplicated-code-from-different-c-templat
https://stackoverflow.com/questions/15168924/gcc-clang-merging-functions-with-identical-instructions-comdat-folding
https://releases.llvm.org/3.9.0/tools/lld/docs/ReleaseNotes.html


On Fri, Jan 19, 2018 at 11:29 AM, H. S. Teoh via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On Fri, Jan 19, 2018 at 07:18:22PM +0000, Luís Marques via Digitalmars-d wrote: [...]
>>     void combulate(T)(T* item)
>>     {
>>         // ... lots of code here
>>         item.next = ...;
>>     }
> [...]
> [...]
>> This catches the bug, but will have the disadvantage of generating code for the various types to which combulate is specialized, even though the body of the function template doesn't rely on anything that varies between the specializations.
>
> Yeah, I think this is one area where the way the compiler implements templates could be improved.  Quite often, in a complex template (whether it's a template function or a template aggregate like a struct or class) only a subset of the code actually depends on the template arguments. Nevertheless, the compiler duplicates the entirety of the code in the generated template. This leads to a lot of template bloat.
>
> I'm not sure how possible it is in the current compiler implementation, but it would be nice if the compiler were a bit smarter about instantiating templates.  If an instantiated function (could be any subset of the code, but granularity at the function level is probably easiest to implement) would not result in code that differs from other instantiations in the generated IR, only emit the code if it hasn't been emitted yet; otherwise just alias that particular instantiation to the previous instantiation.
>
> Perhaps an enhancement request could be filed for this.
>
>
> [...]
>> That is problematic in the context where I'm using this (embedded systems). So instead I've started using a mixed approach, with generic code that checks for the appropriate type, but delegates the actual work to a non-templated function (a fully specialized function template in my actual case), except in the cases where the actual code is small enough or the specialization significantly improves the performance. Something like this:
> [...]
>> So far this seems to be working well for me. Do you have experience writing this kind of code? Do you have any advice that might be relevant to this situation?
> [...]
>
> Yeah, basically, this is just doing what I described above by hand. I've done similar refactorings before, to reduce template bloat. It certainly seems to work well.  However, it would be nice if the compiler automated such rote work for us.
>
>
> T
>
> --
> Help a man when he is in trouble and he will remember you when he is in trouble again.