overlapping copy semantics question - D Programming Language Discussion Forum


May 20, 2024

overlapping copy semantics question

Posted by Bruce Carneal

Given the possibility of overlap, is it correct that the two copy functions in the godbolt example compile to identical code?

https://godbolt.org/z/jPhT5vhee

I think the @restrict function is compiled correctly, since the compiler is free to assume no overlap, but the code generated for the unattributed variant appears to be off. Perhaps I'm misunderstanding something with respect to copy semantics?

As a point of reference, when compiled with gdc the code generated for the two functions is not identical.

May 20, 2024

Re: overlapping copy semantics question

Posted by kinke
in reply to Bruce Carneal


I think the problem here is that you don't get the expected optimization to a memcpy (with -O3) when using the @restrict UDA, with the variant taking a D slice. So no correctness issue.

This boils down to the expected memcpy, apparently needing unpacking of D slices:

void cpr2(size_t srcLength, ubyte* src, @restrict ubyte* dst)
{
    foreach (i; 0 .. srcLength)
        dst[i] = src[i];
}

May 20, 2024

Re: overlapping copy semantics question

Posted by Bruce Carneal
in reply to kinke


On Monday, 20 May 2024 at 18:14:50 UTC, kinke wrote:

> I think the problem here is that you don't get the expected optimization to a memcpy (with -O3) when using the @restrict UDA, with the variant taking a D slice. So no correctness issue.
>
> This boils down to the expected memcpy, apparently needing unpacking of D slices:
>
> void cpr2(size_t srcLength, ubyte* src, @restrict ubyte* dst)
> {
>     foreach (i; 0 .. srcLength)
>         dst[i] = src[i];
> }

I don't view that missed optimization as much of a problem, although I will note that gdc decided to issue a call to memmove for the @restrict slice code under -O3. The LDC cpr2 call out to memcpy for the lowered/non-slice variant seems entirely justified given @restrict.

What seems like a problem is emitting SIMD code for the vanilla (no attributes) cp() variant that doesn't produce the same result as a simple scalar loop would. Consider:

values at location x: 0, 1, 2, 3, 4, ...
src at location x
dst at location x + 1

Shouldn't the vanilla scalar copy loop for the above just result in a bunch of zeros? This is what I'd expect if a dead simple loop body were generated. If, on the other hand, you emit SIMD code for the loads and stores, as LDC is wont to do, you get something different.

What am I missing?
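To make the expectation concrete, here is a minimal sketch of the overlapping case described above. The function `cp` and the helper `overlapDemo` are hypothetical stand-ins, not the code behind the godbolt link; the only assumption is a plain forward byte-by-byte loop over slices.

```d
import std.stdio : writeln;

// Hypothetical stand-in for the unattributed copy: a plain forward loop
// with no aliasing assumptions.
void cp(const(ubyte)[] src, ubyte[] dst)
{
    foreach (i; 0 .. src.length)
        dst[i] = src[i];
}

// Builds the layout from the post: src at x, dst at x + 1.
ubyte[] overlapDemo()
{
    ubyte[] buf = [0, 1, 2, 3, 4, 5, 6, 7];
    cp(buf[0 .. $ - 1], buf[1 .. $]);
    return buf;
}

void main()
{
    // Each store is read back by the next iteration, so the leading 0
    // propagates through the whole buffer.
    writeln(overlapDemo()); // prints [0, 0, 0, 0, 0, 0, 0, 0]
}
```

Any correct compilation of the unattributed loop has to preserve this result, whatever instructions it uses internally.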

May 20, 2024

Re: overlapping copy semantics question

Posted by Bruce Carneal
in reply to Bruce Carneal


On Monday, 20 May 2024 at 23:09:35 UTC, Bruce Carneal wrote:

> On Monday, 20 May 2024 at 18:14:50 UTC, kinke wrote:
> > ...
>
> What am I missing?

What I "missed" was the overlap check on entry to the vanilla code block: it branches around the SIMD version when the ranges may overlap, so the SIMD path only runs when it is safe.

Sorry for the noise. Glad to learn about the (slight) optimization, though.
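For anyone landing here later, the shape of that guard can be sketched roughly in source form. This is an illustration only, not LDC's actual output: `cpGuarded` is a hypothetical name, `memcpy` merely stands in for the vectorized path, and the real generated check may use a different condition (for example, one based on pointer distance versus vector width).

```d
import core.stdc.string : memcpy;
import std.stdio : writeln;

// Hypothetical sketch of "check for overlap, then pick a path".
void cpGuarded(const(ubyte)[] src, ubyte[] dst)
{
    const n = src.length;
    assert(dst.length >= n, "dst must be at least as long as src");
    if (n == 0)
        return;

    const s = cast(size_t) src.ptr;
    const d = cast(size_t) dst.ptr;

    // [s, s+n) and [d, d+n) are disjoint when one ends before the other begins.
    const disjoint = (s + n <= d) || (d + n <= s);

    if (disjoint)
    {
        // No overlap: copying in wide chunks cannot change the result.
        memcpy(dst.ptr, src.ptr, n);
    }
    else
    {
        // Possible overlap: keep the strictly ordered scalar loop.
        foreach (i; 0 .. n)
            dst[i] = src[i];
    }
}

void main()
{
    ubyte[] buf = [0, 1, 2, 3, 4, 5, 6, 7];
    cpGuarded(buf[0 .. $ - 1], buf[1 .. $]); // overlapping: scalar path
    writeln(buf); // prints [0, 0, 0, 0, 0, 0, 0, 0]

    ubyte[] a = [1, 2, 3, 4];
    auto b = new ubyte[4];
    cpGuarded(a, b); // disjoint: bulk path
    writeln(b); // prints [1, 2, 3, 4]
}
```

Both paths give the same answer as the simple loop, which is why the unattributed function can contain SIMD code and still be correct.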
