Thread overview
Red-Black Gauss-seidel with mir
Sep 13
Christoph
Sep 13
9il
Sep 14
Christoph
Sep 14
9il
Sep 15
Christoph
September 13
Hi all,

I am trying to implement a sweep method for a 2D Red-black Gauss-Seidel Solver with the help of mir and its slices.
The fastest Version I discovered so far looks like this:
```
void sweep(T, size_t Dim : 2, Color color)(in Slice!(T*, 2) F, Slice!(T*, 2) U, T h2)
{
    const auto m = F.shape[0];
    const auto n = F.shape[1];
    auto UF = U.field;
    auto FF = F.field;

    foreach (i; 1 .. m - 1)
    {
        for (size_t j = 1 + (i + 1 + color) % 2; j < n - 1; j += 2)
        {
            auto flatindex = i * m + j;
            UF[flatindex] = (
                    UF[flatindex - m] +
                    UF[flatindex + m] +
                    UF[flatindex - 1] +
                    UF[flatindex + 1] - h2 * FF[flatindex]) / 4.0;
        }
    }
}
```
Accessing the field of the mirslice directly seems to be incredibly fast and I assume this might be the fastest version for this purpose?

I have already implemented this in Python with numpy and there it looks like this:

```
def sweep_2D(color, F, U, h2):

    m, n = F.shape

    if color == 1:
        # red
        U[1:m - 1:2, 2:n - 1:2] = (U[0:m - 2:2, 2:n - 1:2] +
                                   U[2::2, 2:n - 1:2] +
                                   U[1:m - 1:2, 1:n - 2:2] +
                                   U[1:m - 1:2, 3::2] -
                                   F[1:m - 1:2, 2:n - 1:2] * h2) / (4.0)
        U[2:m - 1:2, 1:n - 1:2] = (U[1:m - 2:2, 1:n - 1:2] +
                                   U[3::2, 1:n - 1:2] +
                                   U[2:m - 1:2, 0:n - 2:2] +
                                   U[2:m - 1:2, 2::2] -
                                   F[2:m - 1:2, 1:n - 1:2] * h2) / (4.0)
...

```
My Question now is: Is there a possibility too use the mir slices in a way that "looks" similar to the numpy version and is similar fast?
Below there is fastest version I found up to now, but it is still much slower then the version with the two for loops. It seems to me, that accessing and modifying only parts of a mirslice with this "indexed" syntax in combination with strided is really slow. Is this true or am I using the slices wrong?

```
static if (color == Color.red)
{
        U[1 .. m - 1, 1 .. n - 1].strided!(0, 1)(2, 2)[] = U[0 .. m - 2, 1 .. n - 1].strided!(0, 1)(2, 2) / 4.0;
        U[1 .. m - 1, 1 .. n - 1].strided!(0, 1)(2, 2)[] += U[2 .. m, 1 .. n - 1].strided!(0, 1)(2, 2) / 4.0;
        U[1 .. m - 1, 1 .. n - 1].strided!(0, 1)(2, 2)[] += U[1 .. m - 1, 0 .. n - 2].strided!(0, 1)(2, 2) / 4.0;
        U[1 .. m - 1, 1 .. n - 1].strided!(0, 1)(2, 2)[] += U[1 .. m - 1, 2 .. n].strided!(0, 1)(2, 2) / 4.0;
        U[1 .. m - 1, 1 .. n - 1].strided!(0, 1)(2, 2)[] -= F[1 .. m - 1, 1 .. n - 1].strided!(0, 1)(2, 2) * h2 / 4.0;

        U[2 .. m - 1, 2 .. n - 1].strided!(0, 1)(2, 2)[] = U[1 .. m - 2, 2 .. n - 1].strided!(0, 1)(2, 2) / 4.0;
        U[2 .. m - 1, 2 .. n - 1].strided!(0, 1)(2, 2)[] += U[3 .. m, 2 .. n - 1].strided!(0, 1)(2, 2) / 4.0;
        U[2 .. m - 1, 2 .. n - 1].strided!(0, 1)(2, 2)[] += U[2 .. m - 1, 1 .. n - 2].strided!(0, 1)(2, 2) / 4.0;
        U[2 .. m - 1, 2 .. n - 1].strided!(0, 1)(2, 2)[] += U[2 .. m - 1, 3 .. n].strided!(0, 1)(2, 2) / 4.0;
        U[2 .. m - 1, 2 .. n - 1].strided!(0, 1)(2, 2)[] -= F[2 .. m - 1, 2 .. n - 1].strided!(0, 1)(2, 2) * h2 / 4.0;

}
...

```

Thanks for your help in Advance!

Christoph

September 13
On Sunday, 13 September 2020 at 14:48:30 UTC, Christoph wrote:
> Hi all,
>
> I am trying to implement a sweep method for a 2D Red-black Gauss-Seidel Solver with the help of mir and its slices.
> The fastest Version I discovered so far looks like this:
> ```
> void sweep(T, size_t Dim : 2, Color color)(in Slice!(T*, 2) F, Slice!(T*, 2) U, T h2)
> {
>     const auto m = F.shape[0];
>     const auto n = F.shape[1];
>     auto UF = U.field;
>     auto FF = F.field;
>
> [...]

Hi Christoph,

More details are required. What compiler and command line has been used?
The full source of the benchmark would be helpful.

Kind regards,
Ilya
September 14
Hi Ilya,

On Sunday, 13 September 2020 at 19:29:31 UTC, 9il wrote:
> More details are required. What compiler and command line has been used?

I have tested it with dmd and ldc and called them just with
$ dub build --compiler=ldc(dmd)
with no more configurations in the dub.json file.

I have compared them with a 100 x 100 example problem and the version with the for-loops from above takes around 1.8s compiled with ldc and 0.8s compiled with dmd.
The slow version from below takes with the same problem around 18s(dmd) and 12s (ldc) on my maschine.

The driver function for my Gaussseidel method looks like this:
```
/++
This is a Gauss Seidel Red Black implementation
it solves AU = F, with A being a poisson matrix like this
        1 1 1 1 .. 1
        1 4 1 0 .. 1
        1 1 4 1 .. 1
        .          .
        . 0..1 4 1 .
        1 .. 1 1 1 1
so the borders of U remain unchanged
Params:
    U = slice of dimension Dim
    F = slice of dimension Dim
    h = the distance between the grid points
Returns: U
+/

Slice!(T*, Dim) GS_RB(T, size_t Dim, size_t max_iter = 10_000_000,
        size_t norm_iter = 1_000, double eps = 1e-8)
       (in Slice!(T*, Dim) F, Slice!(T*, Dim) U, in T h)
        if (1 <= Dim && Dim <= 3 && isFloatingPoint!T)
{

    const T h2 = h * h;

    foreach (it; 1 .. max_iter + 1)
    {
        if (it % norm_iter == 0)
        {
            const auto norm = compute_residual!(T, Dim)(F, U, h).nrmL2;
            if (norm <= eps)
            {
                break;
            }

        }
        // rote Halbiteration
        sweep!(T, Dim, Color.red)(F, U, h2);
        // schwarze Halbiteration
        sweep!(T, Dim, Color.black)(F, U, h2);
    }
    return U;
}

```
And the full slow version looks like this:

```
void sweep(T, size_t Dim : 2, Color color)
     (in Slice!(T*, 2) F, Slice!(T*, 2) U, T h2)
{
    const auto m = F.shape[0];
    const auto n = F.shape[1];
    static if (color == Color.red)
    {
        U[1 .. m - 1, 1 .. n - 1].strided!(0, 1)(2, 2)[] = U[0 .. m - 2, 1 .. n - 1].strided!(0, 1)(2, 2);
        U[1 .. m - 1, 1 .. n - 1].strided!(0, 1)(2, 2)[] += U[2 .. m, 1 .. n - 1].strided!(0, 1)(2, 2);
        U[1 .. m - 1, 1 .. n - 1].strided!(0, 1)(2, 2)[] += U[1 .. m - 1, 0 .. n - 2].strided!(0, 1)(2, 2);
        U[1 .. m - 1, 1 .. n - 1].strided!(0, 1)(2, 2)[] += U[1 .. m - 1, 2 .. n].strided!(0, 1)(2, 2);
        U[1 .. m - 1, 1 .. n - 1].strided!(0, 1)(2, 2)[] -= F[1 .. m - 1, 1 .. n - 1].strided!(0, 1)(2, 2) * h2;
        U[1 .. m - 1, 1 .. n - 1].strided!(0, 1)(2, 2)[] /= cast(T) 4.0;

        U[2 .. m - 1, 2 .. n - 1].strided!(0, 1)(2, 2)[] = U[1 .. m - 2, 2 .. n - 1].strided!(0, 1)(2, 2);
        U[2 .. m - 1, 2 .. n - 1].strided!(0, 1)(2, 2)[] += U[3 .. m, 2 .. n - 1].strided!(0, 1)(2, 2);
        U[2 .. m - 1, 2 .. n - 1].strided!(0, 1)(2, 2)[] += U[2 .. m - 1, 1 .. n - 2].strided!(0, 1)(2, 2);
        U[2 .. m - 1, 2 .. n - 1].strided!(0, 1)(2, 2)[] += U[2 .. m - 1, 3 .. n].strided!(0, 1)(2, 2);
        U[2 .. m - 1, 2 .. n - 1].strided!(0, 1)(2, 2)[] -= F[2 .. m - 1, 2 .. n - 1].strided!(0, 1)(2, 2) * h2;
        U[2 .. m - 1, 2 .. n - 1].strided!(0, 1)(2, 2)[] /= cast(T) 4.0;
    }
    else static if (color == Color.black)
    {
        U[1 .. m - 1, 2 .. n - 1].strided!(0, 1)(2, 2)[] = U[0 .. m - 2, 2 .. n - 1].strided!(0, 1)(2, 2) / 4.0;
        U[1 .. m - 1, 2 .. n - 1].strided!(0, 1)(2, 2)[] += U[2 .. m, 2 .. n - 1].strided!(0, 1)(2, 2) / 4.0;
        U[1 .. m - 1, 2 .. n - 1].strided!(0, 1)(2, 2)[] += U[1 .. m - 1, 1 .. n - 2].strided!(0, 1)(2, 2) / 4.0;
        U[1 .. m - 1, 2 .. n - 1].strided!(0, 1)(2, 2)[] += U[1 .. m - 1, 3 .. n].strided!(0, 1)(2, 2) / 4.0;
        U[1 .. m - 1, 2 .. n - 1].strided!(0, 1)(2, 2)[] -= F[1 .. m - 1, 2 .. n - 1].strided!(0, 1)(2, 2) * h2 / 4.0;

        U[2 .. m - 1, 1 .. n - 1].strided!(0, 1)(2, 2)[] = U[1 .. m - 2, 1 .. n - 1].strided!(0, 1)(2, 2) / 4.0;
        U[2 .. m - 1, 1 .. n - 1].strided!(0, 1)(2, 2)[] += U[3 .. m, 1 .. n - 1].strided!(0, 1)(2, 2) / 4.0;
        U[2 .. m - 1, 1 .. n - 1].strided!(0, 1)(2, 2)[] += U[2 .. m - 1, 0 .. n - 2].strided!(0, 1)(2, 2) / 4.0;
        U[2 .. m - 1, 1 .. n - 1].strided!(0, 1)(2, 2)[] += U[2 .. m - 1, 2 .. n].strided!(0, 1)(2, 2) / 4.0;
        U[2 .. m - 1, 1 .. n - 1].strided!(0, 1)(2, 2)[] -= F[2 .. m - 1, 1 .. n - 1].strided!(0, 1)(2, 2) * h2 / 4.0;
    }
    else
    {
        static assert(false, color.stringof ~ "invalid color");
    }
}
```
Thank you very much for your time.

Christoph
September 14
On Monday, 14 September 2020 at 09:50:16 UTC, Christoph wrote:
> Hi Ilya,
>
> On Sunday, 13 September 2020 at 19:29:31 UTC, 9il wrote:
>> [...]
>
> I have tested it with dmd and ldc and called them just with
> $ dub build --compiler=ldc(dmd)
> with no more configurations in the dub.json file.
>
> [...]

On Monday, 14 September 2020 at 09:50:16 UTC, Christoph wrote:

For a release performance, it should be run in release mode
```
dub build --build=release --compiler=ldc2
```
I expect it will speed up the slow version a few times.

Also, the slow version has a few times more memory access then the fast version and Python. The improvement would look more like the C code and require inner loops.

Your fast version looks good too me. If it is correct, it is very good.
September 15
On Monday, 14 September 2020 at 14:32:08 UTC, 9il wrote:
> For a release performance, it should be run in release mode
> ```
> dub build --build=release --compiler=ldc2
> ```
> I expect it will speed up the slow version a few times.
Oh yes, that made both versions much faster!

Thank you very much!

Christoph