Suboptimal array copy in druntime?
April 16, 2017
I was looking at the _d_arrayassign family functions in druntime:
https://github.com/dlang/druntime/blob/master/src/rt/arrayassign.d#L47
https://github.com/dlang/druntime/blob/master/src/rt/arrayassign.d#L139

The code seems suboptimal for several reasons:

1. memcpy is more efficient on large arrays than copying element by element, because it can use MMX/SSE/AVX instructions. I would memcpy the whole array in one call and run postblit/destroy on the individual elements in separate passes.

2. ti.destroy and ti.postblit are always called, even though they might do nothing; since the code is not templated, the compiler can't eliminate the calls. How about caching in TypeInfo whether the type has a non-empty destructor / postblit, and doing:

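  immutable size = ti.tsize;
  if (ti.hasDestroy)                      // proposed cached flag
    foreach (i; 0 .. dst_array.length)
      ti.destroy(dst_array.ptr + i * size);
  memcpy(dst_array.ptr, src_array.ptr, dst_array.length * size);
  if (ti.hasPostBlit)                     // proposed cached flag
    foreach (i; 0 .. dst_array.length)
      ti.postblit(dst_array.ptr + i * size);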

Granted, in the worst case we iterate over the array several times; we could fall back to the current implementation if both are set.
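
For readers who want to try the idea outside druntime, here is a minimal C++ sketch of the three-pass scheme. Everything in it is hypothetical: `TypeInfoSketch` only imitates a type-erased TypeInfo, `hasDestroy` / `hasPostblit` stand for the proposed cached flags, and the arrays are assumed to have equal length and not to overlap. It ignores what should happen if a hook throws.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a type-erased TypeInfo. The two bool fields are
// the cached flags proposed in the post; they are not an existing API.
struct TypeInfoSketch {
    std::size_t size;                // element size in bytes
    bool hasDestroy;                 // true if destroy() does real work
    bool hasPostblit;                // true if postblit() does real work
    void (*destroy)(void* element);
    void (*postblit)(void* element);
};

// Call counters so the effect of the flags can be observed.
static int destroyCalls = 0;
static int postblitCalls = 0;
void countDestroy(void*) { ++destroyCalls; }
void countPostblit(void*) { ++postblitCalls; }

// Assumes dst and src each hold `count` elements and do not overlap.
void arrayAssign(const TypeInfoSketch& ti, void* dst, const void* src,
                 std::size_t count) {
    unsigned char* d = static_cast<unsigned char*>(dst);
    if (ti.hasDestroy)                       // pass 1: destroy old elements
        for (std::size_t i = 0; i < count; ++i)
            ti.destroy(d + i * ti.size);
    std::memcpy(d, src, count * ti.size);    // pass 2: one bulk copy
    if (ti.hasPostblit)                      // pass 3: fix up the copies
        for (std::size_t i = 0; i < count; ++i)
            ti.postblit(d + i * ti.size);
}
```

With both flags false, the call reduces to a single memcpy and no indirect calls are made; with both set, the array is walked three times, which is the worst case mentioned above.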

Did I miss something?
April 16, 2017
On Sunday, 16 April 2017 at 10:08:22 UTC, Guillaume Chatelet wrote:
> I was looking at the _d_arrayassign family functions in druntime:
> https://github.com/dlang/druntime/blob/master/src/rt/arrayassign.d#L47
> https://github.com/dlang/druntime/blob/master/src/rt/arrayassign.d#L139
>
> [...]

Nope.
Those are valid points.

Templatizing the code is the way to go.
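
As a rough illustration of what templatizing buys, here is a C++ sketch (not druntime code; `assignElements` and `assignImpl` are made-up names). Because the element type is known at compile time, the choice between a bulk memcpy and an element-wise copy is made by overload resolution, and no per-element hook is called through a pointer at run time. It assumes both ranges hold `count` live elements and do not overlap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// Trivially copyable elements: a single bulk copy is enough.
template <typename T>
void assignImpl(T* dst, const T* src, std::size_t count, std::true_type) {
    std::memcpy(dst, src, count * sizeof(T));
}

// Other elements: let each element's own assignment operator run.
template <typename T>
void assignImpl(T* dst, const T* src, std::size_t count, std::false_type) {
    for (std::size_t i = 0; i < count; ++i)
        dst[i] = src[i];
}

// Picks the implementation at compile time from the element type.
template <typename T>
void assignElements(T* dst, const T* src, std::size_t count) {
    assignImpl(dst, src, count, std::is_trivially_copyable<T>{});
}
```

For `int` the instantiation contains only the memcpy; for `std::string` it contains only the assignment loop.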
April 16, 2017
On Sunday, 16 April 2017 at 10:33:01 UTC, Stefan Koch wrote:
> Nope.
> Those are valid points.
>
> Templatizing the code is the way to go.

Indeed. See also http://dconf.org/2017/talks/cojocaru.html
April 16, 2017
On Sunday, 16 April 2017 at 11:25:15 UTC, Nicholas Wilson wrote:
> Indeed. See also http://dconf.org/2017/talks/cojocaru.html

Sweet! Glad to see this is being worked on :)
April 16, 2017
On Sunday, 16 April 2017 at 11:58:11 UTC, Guillaume Chatelet wrote:
> Sweet! Glad to see this is being worked on :)

Specifically, see these pull requests as an example of how the rest of druntime can be turned into templates:

https://github.com/dlang/dmd/pull/6597
https://github.com/dlang/druntime/pull/1781
https://github.com/dlang/dmd/pull/6634
https://github.com/dlang/druntime/pull/1792

I tested the array comparison lowering a couple of weeks ago and the results looked promising with regard to reducing link-time dependencies:

main.cpp:

#include <array>
#include <cstdio>

int compareArrays(const int *p1, size_t len1, const int *p2, size_t len2);

int main()
{
    std::array<int, 3> arr1 = {1, 2, 3};
    std::array<int, 3> arr2 = {1, 2, 4};

    // data() is guaranteed to return a pointer; begin() returns an iterator
    int res = compareArrays(arr1.data(), arr1.size(), arr2.data(), arr2.size());

    printf("%d\n", res);
}

compare.d:

extern(C++) pure nothrow @nogc
int compareArrays(scope const(int)* p1, size_t len1, scope const(int)* p2, size_t len2)
{
    return p1[0 .. len1] < p2[0 .. len2];
}

extern(C) void _d_dso_registry() {}

$ ~/dlang/install.sh install dmd-nightly
Downloading and unpacking http://nightlies.dlang.org/dmd-master-2017-03-28/dmd.master.linux.tar.xz
######################################################################## 100.0%
dub-1.2.1 already installed

Run `source ~/dlang/dmd-master-2017-03-28/activate` in your shell to use dmd-master-2017-03-28.
This will setup PATH, LIBRARY_PATH, LD_LIBRARY_PATH, DMD, DC, and PS1.
Run `deactivate` later on to restore your environment.

$ source ~/dlang/dmd-master-2017-03-28/activate

$ g++ -std=c++11 -c main.cpp && \
  dmd -O -betterC -c compare.d && \
  g++ main.o compare.o -o d_array_compare

$ ./d_array_compare
1

$ nm compare.o
0000000000000000 t
0000000000000000 W _D6object12__T5__cmpTiZ5__cmpFNaNbNiNexAixAiZi
0000000000000000 T _d_dso_registry
                 U _d_dso_registry
                 U _GLOBAL_OFFSET_TABLE_
                 U __start_minfo
                 U __stop_minfo
0000000000000000 T _Z13compareArraysPKimS0_m
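
The weak (W) symbol in the listing appears to be the mangled name of the template instance __cmp!int from module object, emitted into compare.o itself, so the object file does not reference a comparison routine that would have to be linked in from the runtime library. As a rough idea of what such a templated comparison does, here is a C++ sketch; `compareSlices` is a made-up name and this is not the druntime implementation.

```cpp
#include <cassert>
#include <cstddef>

// Lexicographic three-way comparison of two slices.
// Returns a negative value, zero, or a positive value.
template <typename T>
int compareSlices(const T* p1, std::size_t len1,
                  const T* p2, std::size_t len2) {
    const std::size_t common = len1 < len2 ? len1 : len2;
    for (std::size_t i = 0; i < common; ++i) {
        if (p1[i] < p2[i]) return -1;   // first difference decides
        if (p2[i] < p1[i]) return 1;
    }
    // All shared elements are equal: the shorter slice orders first.
    if (len1 < len2) return -1;
    if (len2 < len1) return 1;
    return 0;
}
```

With the arrays from main.cpp, {1, 2, 3} against {1, 2, 4}, the result is negative, which corresponds to the `<` expression in compare.d being true and the program printing 1.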

So choose your favorite under-performing runtime hook (see https://wiki.dlang.org/Runtime_Hooks, or https://github.com/dlang/dmd/blob/v2.074.0/src/ddmd/backend/rtlsym.h#L42 for the definitive list) and turn it into a template :P