By ref and by pointer kills performance. (page 2)

February 13

Re: By ref and by pointer kills performance.

Posted by deadalnix
in reply to Patrick Schluter

On Tuesday, 13 February 2024 at 15:11:58 UTC, Patrick Schluter wrote:

Yes, that's normal. The compiler cannot know from the declaration alone whether your pointers overlap. In C you can declare the pointers with restrict, which tells the compiler that the pointers don't overlap. I don't know why D doesn't support restrict.

It's a major footgun without much benefit. In this case, one could load the integers into a local variable and use the value from there, and you'd get the same effect.
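A minimal sketch of that suggestion, assuming a four-element fill like the one discussed in the thread; `fillLocal` is a made-up name for illustration:

```d
// Hypothetical helper: read through `value` once, then store the local copy.
void fillLocal(const(uint)* value, uint* dest)
{
    // After this load, stores through `dest` cannot change `v`,
    // so the compiler has no reason to reload from memory.
    const uint v = *value;
    dest[0] = v;
    dest[1] = v;
    dest[2] = v;
    dest[3] = v;
}

unittest
{
    uint x = 7;
    uint[4] buf;
    fillLocal(&x, buf.ptr);
    assert(buf == [7u, 7u, 7u, 7u]);
}
```

Note that this changes the behaviour when `value` points into `dest`: the local copy keeps the value as it was on entry.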

February 13

Re: By ref and by pointer kills performance.

Posted by Bruce Carneal
in reply to jmh530

On Tuesday, 13 February 2024 at 15:40:23 UTC, jmh530 wrote:

On Tuesday, 13 February 2024 at 06:02:47 UTC, Bruce Carneal wrote:

[snip]

To reuse the value the compiler would have to prove that the memory locations do not overlap. FORTRAN does not have this problem, neither does ldc once you take responsibility for non-overlap with the @restrict attribute as seen here:

https://d.godbolt.org/z/z9vYndWqP

When loops are involved between potentially overlapping indexed arrays I've seen ldc go through the proof and do two versions of the code with a branch.

As a heads up, the LDC wiki page doesn't have restrict on it.

https://wiki.dlang.org/LDC-specific_language_changes

Does LDC's @restrict only work with pointers directly and not slices? fillRestricted2 doesn't compile (it only fails because value is a slice, not because dest is one), but fillRestricted3 compiles just fine.

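```d
import ldc.attributes : restrict; // LDC-only module

// Does not compile: @restrict is applied to a slice parameter.
void fillRestricted2(@restrict uint[] value, uint[] dest)
{
    dest[0] = value[0];
    dest[1] = value[1];
    dest[2] = value[2];
    dest[3] = value[3];
}

// Compiles: @restrict is applied to a raw pointer parameter.
void fillRestricted3(@restrict uint* value, uint[] dest)
{
    dest[0] = value[0];
    dest[1] = value[1];
    dest[2] = value[2];
    dest[3] = value[3];
}
```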

There was a little discussion a while back about @restrict being generalized to slices but, as you note, it's not there yet.

I typically use @restrict on slice .ptr fields passed into nested functions within an @trusted function. Yes, that's a little clumsy but it lets you do bounds checks and the like before dropping into the hot loops.
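A minimal sketch of that pattern, assuming LDC's `ldc.attributes` module provides `restrict`. The names `scale` and `kernel` are made up for illustration, and the `else` branch is only a placeholder attribute so the snippet still compiles with other compilers, where it has no effect:

```d
import std.exception : enforce;

version (LDC)
{
    import ldc.attributes : restrict;
}
else
{
    struct restrict {} // placeholder UDA (assumption): no optimization effect
}

// Hypothetical example: dest[i] = src[i] * k
void scale(uint[] dest, const(uint)[] src, uint k) @trusted
{
    // Do the checks while we still have slices.
    enforce(dest.length == src.length, "length mismatch");
    const disjoint = dest.ptr + dest.length <= src.ptr
                  || src.ptr + src.length <= dest.ptr;
    enforce(disjoint, "dest and src must not overlap");

    // The hot loop works on raw pointers; @restrict is our promise
    // that nothing else in this function aliases `d`.
    static void kernel(@restrict uint* d, const(uint)* s, size_t n, uint k)
    {
        foreach (i; 0 .. n)
            d[i] = s[i] * k;
    }

    kernel(dest.ptr, src.ptr, dest.length, k);
}

unittest
{
    auto d = new uint[3];
    scale(d, [1u, 2u, 3u], 2);
    assert(d == [2u, 4u, 6u]);
}
```

Checking the generated assembly (for example on godbolt) is the only way to confirm the attribute actually changed anything for a given loop.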

If you're willing to tolerate the code bloat, and your code is simple, ldc will do the proof and auto-vec things for you. @restrict also seems to keep the auto-vec optimizer engaged in not-strictly-vanilla formulations.
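For intuition, here is a hand-written analogue of that "two versions with a branch" idea. It is only a sketch of the concept, not what LDC actually emits; `overlaps` and `addOne` are hypothetical names:

```d
// True if the two slices share at least one element.
bool overlaps(const(uint)[] a, const(uint)[] b) @trusted
{
    return a.ptr < b.ptr + b.length && b.ptr < a.ptr + a.length;
}

// Hypothetical example: dest[i] = src[i] + 1
void addOne(uint[] dest, const(uint)[] src)
{
    assert(dest.length == src.length);
    if (!overlaps(dest, src))
    {
        // Fast version: the slices are disjoint, so a vectorizable
        // array operation is allowed.
        dest[] = src[] + 1u;
    }
    else
    {
        // Conservative version: strictly in order, one element at a time,
        // so earlier stores are visible to later loads.
        foreach (i; 0 .. dest.length)
            dest[i] = src[i] + 1u;
    }
}

unittest
{
    auto d = new uint[3];
    addOne(d, [1u, 2u, 3u]);
    assert(d == [2u, 3u, 4u]);

    // Overlapping case: dest is src shifted by one element.
    uint[5] buf = [10, 0, 0, 0, 0];
    addOne(buf[1 .. 5], buf[0 .. 4]);
    assert(buf == [10u, 11u, 12u, 13u, 14u]);
}
```

The duplicated loop body is the code bloat mentioned above; the branch is the price of not having a non-overlap guarantee at compile time.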

If you have multiple sources and a single destination array note that you only need @restrict on the destination. You're just trying to give the back end a clear view of dependencies.

As always, godbolt is your friend when optimizing known bottlenecks of this sort.

February 13

Re: By ref and by pointer kills performance.

Posted by jmh530
in reply to Bruce Carneal

On Tuesday, 13 February 2024 at 15:56:59 UTC, Bruce Carneal wrote:

[snip]

There was a little discussion a while back about @restrict being generalized to slices but, as you note, it's not there yet.

Thanks for the detailed response!

[snip]

As always, godbolt is your friend when optimizing known bottlenecks of this sort.

godbolt is definitely your friend when you can use it.

February 13

Re: By ref and by pointer kills performance.

Posted by claptrap
in reply to Bruce Carneal

On Tuesday, 13 February 2024 at 06:02:47 UTC, Bruce Carneal wrote:

On Tuesday, 13 February 2024 at 02:11:45 UTC, claptrap wrote:

https://d.godbolt.org/z/z9vYndWqP

When loops are involved between potentially overlapping indexed arrays I've seen ldc go through the proof and do two versions of the code with a branch.

Ah OK makes sense. The restrict attribute will help with what I'm doing.

thanks.

February 14

Re: By ref and by pointer kills performance.

Posted by Richard (Rikki) Andrew Cattermole
in reply to Patrick Schluter

On 14/02/2024 4:14 AM, Patrick Schluter wrote:
> It's not a thread issue. The memory the pointers point to only needs to overlap, and then the loads are required to get the "right" result.

It can be a thread issue, but yes, overlapping is another potential situation where it can get overwritten. I prefer talking about threads, because they bring an entirely unknown element to the discussion and shut down any localized assumptions.

In the case of immutable it should not be possible for immutable memory to be changed unless someone did something bad.

Immutable is a very strong guarantee at the process level, that there are no mutable pointers pointing at immutable memory.

Therefore the compiler is free to make that optimization without concern. If it breaks, it's the user's fault for misusing the language.

"The second way is to cast data to immutable. When doing so, it is up to the programmer to ensure that any mutable references to the same data are not used to modify the data after the cast."

https://dlang.org/spec/const3.html#creating_immutable_data
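A small sketch of what honouring that rule can look like, using Phobos' `std.exception.assumeUnique`; `makeTable` is a made-up name for illustration:

```d
import std.exception : assumeUnique;

// Hypothetical example: build data through a mutable slice, then freeze it.
immutable(uint)[] makeTable()
{
    uint[] tmp = new uint[4];
    foreach (i, ref e; tmp)
        e = cast(uint)(i * i);

    // assumeUnique casts to immutable and sets `tmp` to null, so this
    // function keeps no mutable reference to the data after the cast.
    return assumeUnique(tmp);
}

unittest
{
    immutable(uint)[] t = makeTable();
    assert(t == [0u, 1u, 4u, 9u]);
}
```

The guarantee is still the programmer's to uphold: if another mutable reference to the same array had escaped before the cast, the compiler would not notice.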

February 14

Re: By ref and by pointer kills performance.

Posted by Richard (Rikki) Andrew Cattermole
in reply to Johan

On 14/02/2024 2:30 AM, Johan wrote:
> On Tuesday, 13 February 2024 at 03:31:31 UTC, Richard (Rikki) Andrew Cattermole wrote:
>> dmd having bad codegen here isn't a surprise, that is to be expected.
>>
>> Now for ldc:
>>
>> ```d
>> void fillBP(immutable(uint*) value, uint* dest) {
>>      dest[0] = *value;
>>      dest[1] = *value;
>>      dest[2] = *value;
>>      dest[3] = *value;
>> }
>> ```
>>
>> I expected that to not do the extra loads, but it did.
> 
> I hope someone can find the link to some DConf talk (me or Andrei) or forum post where I talk about why LDC assumes that `immutable(uint*)` points to _mutable_ (nota bene) data. The reason is the mutable thread synchronization field in immutable class variable storage (`__monitor`), combined with casting an immutable class to an array of immutable bytes.
> 
> Side-effects in-between immutable(uint*) lookup could run into a synchronization event on the immutable data (i.e. mutating it).
> In the case of `fillBP` there are no side-effects possible between the reads, so it appears that indeed the optimization could be done. But a different thread might write to the data. I don't know how that data-race is then defined...
> For the general case, side-effects are possible (e.g. a function call) so it is not possible to simply assume that `immutable` reference arguments never alias to other reference arguments; this complicates implementing the desired optimization.
> I'm not saying it is impossible, it's just extra effort (and proof is needed).
> 
> -Johan

If you don't like the problem, change the problem.

After thinking about it, I agree that for D classes it can't be known whether immutable truly applies. A lock could still be taken in const code.

So the problem is:

For a type that is not a D class (COM, extern(C++) are not included in that definition), mark as immutable to LLVM IR so optimizations can be used wrt. immutable.

After all, to call a method on an immutable this, the method must also be marked as const or immutable. The this pointer won't be allowed to be mutated unless someone did something bad (again, not the compiler's fault that the user went against it).
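To illustrate the method-qualifier point with a struct (a sketch with a made-up `Counter` type, not code from the thread):

```d
struct Counter
{
    uint n;

    // const: callable on mutable, const and immutable instances.
    uint get() const { return n; }

    // Unqualified: needs a mutable `this`.
    void bump() { ++n; }
}

unittest
{
    immutable Counter c = Counter(3);
    assert(c.get() == 3);

    // The type system rejects mutation through an immutable `this`.
    static assert(!__traits(compiles, c.bump()));
}
```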

I don't think much thought needs to go into this.

February 14

Re: By ref and by pointer kills performance.

Posted by Richard (Rikki) Andrew Cattermole
in reply to Richard (Rikki) Andrew Cattermole

After even more thinking, I'm reminded of an operator overload that I want for reference counting.

``opGoingIntoROM``.

The reason for it in reference counting is obvious: to turn off reference counting.

But it also applies here.

Having this knowledge on the monitor could turn off the actual lock/unlock mutation.

Okay this is compelling enough to report it.

https://issues.dlang.org/show_bug.cgi?id=24393

This is no longer an "I want". It's a genuine type guarantee issue of something we currently support.

February 14

Re: By ref and by pointer kills performance.

Posted by Richard (Rikki) Andrew Cattermole
in reply to ryuukk_

On 14/02/2024 4:20 AM, ryuukk_ wrote:
> Wow, OOP people ruining performance for everybody, who would have thought..
> 
> That makes me worried now.. -betterC doesn't seem to change anything about optimization, or rather, lack of optimization..

No, it isn't limited to classes, the same issue appears with reference counting.

Immutable simply isn't providing the guarantee of immutable.

February 14

Re: By ref and by pointer kills performance.

Posted by Basile B.
in reply to Patrick Schluter

On Tuesday, 13 February 2024 at 15:14:06 UTC, Patrick Schluter wrote:
> On Tuesday, 13 February 2024 at 03:31:31 UTC, Richard (Rikki) Andrew Cattermole wrote:
>> dmd having bad codegen here isn't a surprise, that is to be expected.
>>
>> Now for ldc:
>>
>> ```d
>> void fillBP(immutable(uint*) value, uint* dest) {
>>      dest[0] = *value;
>>      dest[1] = *value;
>>      dest[2] = *value;
>>      dest[3] = *value;
>> }
>> ```
>>
>> I expected that to not do the extra loads, but it did.
>>
>> ```d
>> void fillBP(immutable(uint*) value, uint* dest) {
>> 	dest[0 .. 4][] = *value;
>> }
>> ```
>>
>> And that certainly should not be doing it either.
>> Even if it wasn't immutable.
>>
>> For your code, because it is not immutable and therefore can be changed externally on another thread, the fact that the compiler has to do the loads is correct. This isn't a bug.
>
> It's not a thread issue. The memory the pointers point to only needs to overlap, and then the loads are required to get the "right" result.

It's been shown that the main problem, i.e. what defeated CSE (the common subexpression elimination pass), is the possible overlap between the two parameters.

However, there is still a risk on the value, i.e. it can be modified by another thread, as Johan mentioned IIRC. The data that the pointer you pass as "value" points to might change between any two of the four assignments.

To be frank I thought this was the problem.

February 13

Re: By ref and by pointer kills performance.

Posted by Walter Bright
in reply to Richard (Rikki) Andrew Cattermole

On 2/12/2024 7:31 PM, Richard (Rikki) Andrew Cattermole wrote:
> dmd having bad codegen here isn't a surprise, that is to be expected.

I'm used to people saying that DMD doesn't do data flow analysis. It does. In fact, it is based on my OptimumC, which was the first C compiler on DOS to do DFA back in the 1980s.

This isn't a case of buggy DFA. It's a case of doing DFA correctly.

The issue is pointer aliasing. A pointer can point to anything, including const data. Therefore, storing through a pointer can alter any value that is reachable via a pointer. Therefore, storing through a pointer invalidates any cached value already read.

This is what you're seeing.
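A small demonstration of why the reload matters, as a sketch with made-up function names: when `value` points into `dest`, reloading and caching give different results, so the compiler cannot swap one for the other on its own.

```d
// Reloads *value before every store, as the compiler must assume is needed.
void addReload(const(uint)* value, uint* dest)
{
    dest[0] = *value + 1;
    dest[1] = *value + 1;
    dest[2] = *value + 1;
    dest[3] = *value + 1;
}

// Reads *value once; only equivalent if value does not point into dest.
void addCached(const(uint)* value, uint* dest)
{
    const uint v = *value + 1;
    dest[0] = v;
    dest[1] = v;
    dest[2] = v;
    dest[3] = v;
}

unittest
{
    // value aliases dest[1]: the second store changes what *value reads.
    uint[4] a = [1, 2, 3, 4];
    addReload(&a[1], a.ptr);
    assert(a == [3u, 3u, 4u, 4u]);

    uint[4] b = [1, 2, 3, 4];
    addCached(&b[1], b.ptr);
    assert(b == [3u, 3u, 3u, 3u]);
}
```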

C99 tried to address this with restrict, but few people use it or understand it. D didn't bother with it because people will inevitably misuse restrict and get their data mysteriously corrupted.
