Jump to page: 1 2
Thread overview
Choosing an Approach for the Template Lowering of _d_arrayctor
4 days ago
Teodor Dutu
4 days ago
Teodor Dutu
4 days ago
Paul Backus
4 days ago
russhy
4 days ago
Teodor Dutu
4 days ago
kinke
3 days ago
kinke
4 days ago

Hi,

As part of SAoC 2021, I changed the lowering of _d_arrayctor from using the runtime hook to a template, via theses PRs:

We'll call the former approach the hook approach and the latter, the template approach.

This template approach required me to declare the returned array inside a union. This was needed because by declaring this array directly, the compiler was inserting a call to __ArrayDtor before exiting the function. This call was in contradiction with the error handling already performed by _d_arrayctor. I gave a few more details about this issue in this post. However, this came with the drawback of not using NRVO, because of the cast at end of the function. This additional copy causes a performance decrease, compared to the hook approach, as I'm going to show later in this post.

A third approach, which I call the hack approach, is very close to the one using the template. The difference is that the hack adds a third, unused, pointer-type parameter to _d_arrayctor, in order to change the function's strong purity to weak purity. The reasons for this unorthodox approach are better explained in this post, and the PR for it is here. Due to this hack, there is no need to declare the returned array inside the scope of _d_arrayctor, which also removes the need for the union trick. As a result, no additional copying is performed and this approach has increased performance when compared to the hook approach.

In order to measure the performance of each lowering, I designed this benchmark, which I compiled with dmd and ran, using all 3 approaches. It measures the time taken by the call to _d_arrayctor for 3 struct sizes: small (empty), medium (64 bytes) and large (256 bytes), as well as 3 array lengths: small (1 element), medium (64 elements) and large (256 elements). 256 elements is a large enough array, because the lowering to _d_arrayctor is only performed for static arrays, which are unlikely to be much larger. The numerical results can be seen below:

Running benchmark on branch master (_d_arrayctor as a hook):
_0B_Struct_1Elem @ 1000000 runs: average time = 13ms; std dev = 9.53674e-06
_0B_Struct_64Elems @ 1000000 runs: average time = 280.4ms; std dev = 0.489898
_0B_Struct_256Elems @ 1000000 runs: average time = 1062.18ms; std dev = 0.433128
_64B_Struct_1Elem @ 1000000 runs: average time = 13.01ms; std dev = 0.0994988
_64B_Struct_64Elems @ 1000000 runs: average time = 309.99ms; std dev = 0.0994992
_64B_Struct_256Elems @ 1000000 runs: average time = 1194.11ms; std dev = 0.31289
_256B_Struct_1Elem @ 1000000 runs: average time = 15ms; std dev = 1.43051e-05
_256B_Struct_64Elems @ 1000000 runs: average time = 482.23ms; std dev = 1.73698
_256B_Struct_256Elems @ 1000000 runs: average time = 2960.72ms; std dev = 1.20067

Running benchmark with _d_arrayctor as a template:
0B_Struct_1Elem @ 1000000 runs: average time = 9.6934ms; std dev = 0.019658
0B_Struct_64Elems @ 1000000 runs: average time = 219.086ms; std dev = 0.467551
0B_Struct_256Elems @ 1000000 runs: average time = 820.951ms; std dev = 1.88878
64B_Struct_1Elem @ 1000000 runs: average time = 15.6967ms; std dev = 0.046066
64B_Struct_64Elems @ 1000000 runs: average time = 287.54ms; std dev = 0.979795
64B_Struct_256Elems @ 1000000 runs: average time = 1177.63ms; std dev = 1.06447
256B_Struct_1Elem @ 1000000 runs: average time = 20.8763ms; std dev = 0.259867
256B_Struct_64Elems @ 1000000 runs: average time = 981.405ms; std dev = 0.67177
256B_Struct_256Elems @ 1000000 runs: average time = 4029.59ms; std dev = 0.511771

Running benchmark with _d_arrayctor as a template (with the hack-y 3rd parameter):
_0B_Struct_1Elem @ 1000000 runs: average time = 9.00001ms; std dev = 6.67572e-06
_0B_Struct_64Elems @ 1000000 runs: average time = 183.94ms; std dev = 0.525738
_0B_Struct_256Elems @ 1000000 runs: average time = 689.091ms; std dev = 0.286183
_64B_Struct_1Elem @ 1000000 runs: average time = 10.01ms; std dev = 0.0994987
_64B_Struct_64Elems @ 1000000 runs: average time = 227.23ms; std dev = 0.645833
_64B_Struct_256Elems @ 1000000 runs: average time = 925.25ms; std dev = 0.766486
_256B_Struct_1Elem @ 1000000 runs: average time = 11.07ms; std dev = 0.255147
_256B_Struct_64Elems @ 1000000 runs: average time = 417.92ms; std dev = 0.271293
_256B_Struct_256Elems @ 1000000 runs: average time = 2576.1ms; std dev = 8.36003

In order to better visualise the numbers above, I also plotted the bar charts below:

_d_arrayctor Performance Comparison

As you can see, the hack is faster than the hook. Its running times are about 15-25% lower than those of the hook for larger structs (64 and 256 bytes) and as much as 35% lower for the empty struct. On the other hand, the template approach, while being 20-25% faster than the hook when called on the empty struct, is as much as 35% slower than the hook, on larger (256-byte) structs.

Due to this discrepancy in performance, I think the best approach is the hack, despite adding an unused parameter to _d_arrayctor. What do you think?

Thanks,
Teodor

4 days ago

On Thursday, 25 November 2021 at 13:28:18 UTC, Teodor Dutu wrote:

>

A third approach, which I call the hack approach, is very close to the one using the template. The difference is that the hack adds a third, unused, pointer-type parameter...

Hm... if you're going to have a pointer parameter, why not instead simply keep the two parameter variant but pass the first parameter by reference?

4 days ago

On Thursday, 25 November 2021 at 14:24:59 UTC, Stanislav Blinov wrote:

>

On Thursday, 25 November 2021 at 13:28:18 UTC, Teodor Dutu wrote:

>

A third approach, which I call the hack approach, is very close to the one using the template. The difference is that the hack adds a third, unused, pointer-type parameter...

Hm... if you're going to have a pointer parameter, why not instead simply keep the two parameter variant but pass the first parameter by reference?

The first iteration of _d_arrayctor was like this, as implemented by this PR. The problem with it was that it made the compiler issue some warnings when testing phobos, because the lowering of something like S[3] dst = src; looked like this: _d_arrayctor(dst[], src[]). _d_arrayctor was strongly pure and, since the return value of the function was ignored, the compiler deemed the whole function call useless.

I went more in-depth about this problem in this post. Following this, I tried to change _d_arrayctor to declare the returned array inside its own scope, hoping to make use of NRVO, but it wasn't the case, because I had to resort to the union trick.

4 days ago

On Thursday, 25 November 2021 at 14:35:56 UTC, Teodor Dutu wrote:

>

On Thursday, 25 November 2021 at 14:24:59 UTC, Stanislav Blinov wrote:

>

On Thursday, 25 November 2021 at 13:28:18 UTC, Teodor Dutu wrote:

>

A third approach, which I call the hack approach, is very close to the one using the template. The difference is that the hack adds a third, unused, pointer-type parameter...

Hm... if you're going to have a pointer parameter, why not instead simply keep the two parameter variant but pass the first parameter by reference?

The first iteration of _d_arrayctor was like this, as implemented by this PR. The problem with it was that it made the compiler issue some warnings when testing phobos, because the lowering of something like S[3] dst = src; looked like this: _d_arrayctor(dst[], src[]). _d_arrayctor was strongly pure and, since the return value of the function was ignored, the compiler deemed the whole function call useless.

If you take the first parameter by reference, that shouldn't happen. I.e. in

void _d_arrayctor(A,B)(return ref A to, scope B from)
if (is(A : X[]) && is (B : Y[]) && is(immutable(typeof(A.init[0])) == immutable(typeof(B.init[0])))) { /* ... */ }

A would be a static array, taken by reference.

4 days ago

On Thursday, 25 November 2021 at 14:59:54 UTC, Stanislav Blinov wrote:

>

If you take the first parameter by reference, that shouldn't happen. I.e. in

void _d_arrayctor(A,B)(return ref A to, scope B from)
if (is(A : X[]) && is (B : Y[]) && is(immutable(typeof(A.init[0])) == immutable(typeof(B.init[0])))) { /* ... */ }

A would be a static array, taken by reference.

Unfortunately it is impossible to implement _d_arrayctor with this signature without invoking undefined behavior. See the discussion in this thread for details: https://forum.dlang.org/thread/simesvkancmscrtsciwq@forum.dlang.org

4 days ago
I would look at placing an out or ref on one of the two parameters and rewrite the call slightly, rather than a new parameter.

F[] from = [...], to;
to._d_arrayctor(form);

To something more like that.

But the benchmarks will say weather this works or not.
4 days ago

On Thursday, 25 November 2021 at 15:10:10 UTC, Paul Backus wrote:

>

On Thursday, 25 November 2021 at 14:59:54 UTC, Stanislav Blinov wrote:

>

If you take the first parameter by reference, that shouldn't happen. I.e. in

void _d_arrayctor(A,B)(return ref A to, scope B from)
if (is(A : X[]) && is (B : Y[]) && is(immutable(typeof(A.init[0])) == immutable(typeof(B.init[0])))) { /* ... */ }

A would be a static array, taken by reference.

Unfortunately it is impossible to implement _d_arrayctor with this signature without invoking undefined behavior. See the discussion in this thread for details: https://forum.dlang.org/thread/simesvkancmscrtsciwq@forum.dlang.org

Yikes. But then this goes the same for the "hack" version of the OP, does it not?..

What can be done then? If I understand correctly, the union approach was arrived at due to void initialization, as in case of exception array of garbage data is getting destructed. But then, couldn't one just write a T.init into the whole array if an exception is thrown, and not rely on a union at all?..

A _d_arrayctor(A,B)(B from)
if (appropriateContraintGoesHere)
{
    Unqual!(typeof(A.init[0]))[A.length] result = void;
    version (_D_BetterC) {}
    else scope(failure)
        foreach (ref it; result)
            emplaceInitializer(it); // this is nothrow
    copyEmplaceRange(from, result[]);
    return result;
}

though there's the question of calling appropriate copy ctors, "thanks" to all the qualifier combinations.

4 days ago

On Thursday, 25 November 2021 at 16:16:11 UTC, Stanislav Blinov wrote:

>
    foreach (ref it; result)
        emplaceInitializer(it); // this is nothrow

...pff, yeah, of course, and then we run into a segfault if it's an array of nested structs, which was the problem "solved" with void initialization in the first place. Not fun >:-E

4 days ago

Very nice post, with benchmark and graph to visualize! thanks for the effort!

4 days ago

On Thursday, 25 November 2021 at 16:16:11 UTC, Stanislav Blinov wrote:

>

What can be done then? If I understand correctly, the union approach was arrived at due to void initialization, as in case of exception array of garbage data is getting destructed. But then, couldn't one just write a T.init into the whole array if an exception is thrown, and not rely on a union at all?..

Actually, I used void initialisation because T.init couldn't be used when T is a nested struct, because of the context pointer, which is unavailable in the scope of _d_arrayctor.

On Thursday, 25 November 2021 at 16:16:11 UTC, Stanislav Blinov wrote:

> >
   foreach (ref it; result)
       emplaceInitializer(it); // this is nothrow

...pff, yeah, of course, and then we run into a segfault if it's an array of nested
structs, which was the problem "solved" with void initialization in the first place.
Not fun >:-E

I think you've already touched on what I said above, here. It's just that there wouldn't be a seg fault, but a compiler error, because of the aforementioned context pointer.

« First   ‹ Prev
1 2