Thread overview: bounds checking and optimization

Jun 27, 2021  Bruce Carneal
Jun 29, 2021  Johan
Jun 30, 2021  Bruce Carneal
Jun 30, 2021  Johan
Jun 30, 2021  Bruce Carneal
Jul 01, 2021  Elronnd
Jul 01, 2021  Bruce Carneal
Jun 30, 2021  David Nadlinger

June 27, 2021

The following code auto vectorizes nicely, as you'd hope:

void abc(const(int)[] a, const(int)[] b, int[] c) @nogc nothrow pure @trusted {
    const bound = c.length;
    (a.length == bound) || assert(0);
    (b.length == bound) || assert(0);
    foreach(i; 0..bound)
        c.ptr[i] = a.ptr[i] + b.ptr[i];
}

If you drop the .ptr suffixes, the bounds checks are correctly elided, but the code is not auto vectorized.

Similarly, if you drop the .ptr suffixes and compile @safe, the bounds checks are elided but the code is not auto vectorized.
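
For concreteness, here is a sketch of the @safe variant just described: the same body with the .ptr suffixes dropped (the name abcSafe is mine, for illustration only):

void abcSafe(const(int)[] a, const(int)[] b, int[] c) @nogc nothrow pure @safe {
    const bound = c.length;
    (a.length == bound) || assert(0);
    (b.length == bound) || assert(0);
    foreach(i; 0..bound)
        c[i] = a[i] + b[i];  // checks elided thanks to the length asserts, but not auto vectorized
}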

It would be great if known-within-bound indexing could yield fast code without requiring either .ptr or @trusted. IOW, I'm hoping for faster safe code. Is that easily achieved in cases like the above or is it quite hard?

June 29, 2021

On Sunday, 27 June 2021 at 17:49:22 UTC, Bruce Carneal wrote:

> The following code auto vectorizes nicely, as you'd hope:
>
> void abc(const(int)[] a, const(int)[] b, int[] c) @nogc nothrow pure @trusted {
>     const bound = c.length;
>     (a.length == bound) || assert(0);
>     (b.length == bound) || assert(0);
>     foreach(i; 0..bound)
>         c.ptr[i] = a.ptr[i] + b.ptr[i];
> }
>
> If you drop the .ptr suffixes, the bounds checks are correctly elided, but the code is not auto vectorized.
>
> Similarly, if you drop the .ptr suffixes and compile @safe, the bounds checks are elided but the code is not auto vectorized.
>
> It would be great if known-within-bound indexing could yield fast code without requiring either .ptr or @trusted. IOW, I'm hoping for faster safe code. Is that easily achieved in cases like the above or is it quite hard?

Vectorization does happen when the generated IR is run through the optimization pipeline again:
D --> LLVM IR: https://d.godbolt.org/
IR --> opt --> IR: https://opt.godbolt.org/

This means that we can solve it by reordering/optimizing our optimization pipeline in LDC. It is non-trivial work, and I would suggest not undertaking it before we have switched to LLVM's new PassManager (because I think that also tweaks the order of the optimization passes).

-Johan

June 30, 2021

On Tuesday, 29 June 2021 at 11:20:19 UTC, Johan wrote:

> On Sunday, 27 June 2021 at 17:49:22 UTC, Bruce Carneal wrote:
>> [snip]
>
> [snip]
>
> This means that we can solve it by reordering/optimizing our optimization pipeline in LDC. It is non-trivial work, and I would suggest not undertaking it before we have switched to LLVM's new PassManager (because I think that also tweaks the order of the optimization passes).
>
> -Johan

Thanks for the update, Johan. I'll keep working with the "manual" version, @trusted and .ptr, and watch for any updates. You guys seem pretty busy already, so I hope it fits in with your other work.

A related question: is there anything preventing the compiler from lifting range checks out of loops and throwing early? For example, on something like:

foreach(i, ref dst; c[])
    dst = a[i] + b[i];

could c.length be extracted and asserted to be <= a.length and b.length before we drop into the loop?

Finally, it seems like bounds-checking optimizations of this sort belong in the front end, or am I mistaken?

June 30, 2021

On Wednesday, 30 June 2021 at 01:18:53 UTC, Bruce Carneal wrote:

> A related question: is there anything preventing the compiler from lifting range checks out of loops and throwing early? For example, on something like:
>
> foreach(i, ref dst; c[])
>     dst = a[i] + b[i];
>
> could c.length be extracted and asserted to be <= a.length and b.length before we drop into the loop?

That would be an observable behavior change, so it is not possible, unless the behavior upon out-of-bounds access is defined as UB.
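
To make the observable difference concrete, here is a small sketch (mine, not part of the original reply; the array contents are arbitrary, and catching the RangeError is done only to inspect the partial result):

import core.exception : RangeError;

void addLoop(const(int)[] a, const(int)[] b, int[] c) {
    foreach(i, ref dst; c[])
        dst = a[i] + b[i];   // a[i] and b[i] are bounds checked on every iteration
}

void main() {
    auto a = [1, 2];
    auto b = [10, 20];
    auto c = new int[4];
    try {
        addLoop(a, b, c);    // writes c[0] and c[1], then throws at i == 2
    } catch (RangeError) {
        assert(c == [11, 22, 0, 0]);   // the partial writes are visible
    }
    // A check hoisted above the loop would throw before touching c at all,
    // which is the observable difference.
}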

> Finally, it seems like bounds-checking optimizations of this sort belong in the front end, or am I mistaken?

I am wary of any optimization actually being performed in the frontend (rather than the frontend just checking for validity).

-Johan

June 30, 2021

On Wednesday, 30 June 2021 at 11:43:08 UTC, Johan wrote:

> On Wednesday, 30 June 2021 at 01:18:53 UTC, Bruce Carneal wrote:
>> A related question: is there anything preventing the compiler from lifting range checks out of loops and throwing early? For example, on something like:
>>
>> foreach(i, ref dst; c[])
>>     dst = a[i] + b[i];
>>
>> could c.length be extracted and asserted to be <= a.length and b.length before we drop into the loop?
>
> That would be an observable behavior change, so it is not possible, unless the behavior upon out-of-bounds access is defined as UB.

So a compiler that could pull bounds checking out of a loop would need to generate code that would run a potentially shortened loop and then throw if OOB. Something like:

const __bound = min(a.length, b.length, c.length);
const __throwAtEnd = a.length < __bound || b.length < __bound;
foreach(i, ref dst; c[0..__bound])
    dst = a[i] + b[i];
if (__throwAtEnd) { /* throw */ }

The above is ugly enough that I imagine it puts enabling speedy @safe loops of this type pretty far down on the TODO list.

I now suspect that there may be an easier path to safe data-parallel code: improve code gen for sequences of operations on slices (a single bounds check at the top of the basic block, plus operation fusing to reduce the memory touches).
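
For reference, D's built-in array operations already express that single-check, element-wise style; a minimal sketch (the name abcSlices is mine, and this is only an illustration of the style, not a claim about how it currently vectorizes):

void abcSlices(const(int)[] a, const(int)[] b, int[] c) @safe {
    c[] = a[] + b[];   // element-wise add; the slice lengths are checked up front rather than per index
}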

Thanks for the info.

July 01, 2021

On 30 Jun 2021, at 12:43, Johan via digitalmars-d-ldc wrote:
> On Wednesday, 30 June 2021 at 01:18:53 UTC, Bruce Carneal wrote:
>> could c.length be extracted and asserted to be <= a.length and b.length before we drop into the loop?
>
> That would be an observable behavior change, so it is not possible, unless the behavior upon out-of-bounds access is defined as UB.

There might be LLVM optimisation passes to recognise the range checks and coalesce them into a single one before the loop header, though. I vaguely remember seeing something like this in the tree at some point, developed with a view towards Java, and not enabled by default in the PassManagerBuilder. We might be able to reuse/adapt that logic.

 — David

July 01, 2021

On Wednesday, 30 June 2021 at 15:51:58 UTC, Bruce Carneal wrote:
> So a compiler that could pull bounds checking out of a loop would need to generate code that would run a potentially shortened loop and then throw if OOB.

This is effectively how loop unrolling works in HotSpot.

July 01, 2021

On Wednesday, 30 June 2021 at 15:51:58 UTC, Bruce Carneal wrote:

> ...
>
> const __bound = min(a.length, b.length, c.length);
> const __throwAtEnd = a.length < __bound || b.length < __bound;

The compiler-emitted throw flag in the pseudo-code should be:

const __throwAtEnd = a.length < c.length || b.length < c.length;
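
Putting the earlier pseudo-code together with this correction, a runnable sketch of the transformation under discussion could look like the following (the function name and the use of RangeError to report the failure are my choices for illustration; bound and throwAtEnd correspond to __bound and __throwAtEnd above):

import std.algorithm.comparison : min;
import core.exception : RangeError;

void abcHoisted(const(int)[] a, const(int)[] b, int[] c) {
    const bound = min(a.length, b.length, c.length);
    const throwAtEnd = a.length < c.length || b.length < c.length;
    foreach(i, ref dst; c[0..bound])
        dst = a[i] + b[i];       // i < bound, so a[i] and b[i] stay in bounds
    if (throwAtEnd)
        throw new RangeError();  // report the out-of-bounds access only after the shortened loop
}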

Sorry about the noise. I appreciate that the later contributors read through to the intent anyway.

Glad to hear that there may be some paths open to faster @safe code in the future.