June 02, 2018 Work around conservative optimization
```
uint load32_le(in ref ubyte[4] s)
{
    return s[0] | (s[1]<<8) | (s[2]<<16) | (s[3]<<24);
}

void store32_le(ref ubyte[4] dest, uint val)
{
    dest[0]=cast(byte)val;
    dest[1]=cast(byte)(val>>8);
    dest[2]=cast(byte)(val>>16);
    dest[3]=cast(byte)(val>>24);
}
```
The first function is optimized to one load, but the second remains as 4 stores. Is there a code pattern that gets around this?
June 02, 2018 Re: Work around conservative optimization

Posted in reply to Kagamin

On Saturday, 2 June 2018 at 10:40:43 UTC, Kagamin wrote:
> uint load32_le(in ref ubyte[4] s)
> {
> return s[0] | (s[1]<<8) | (s[2]<<16) | (s[3]<<24);
> }
>
> void store32_le(ref ubyte[4] dest, uint val)
> {
> dest[0]=cast(byte)val;
> dest[1]=cast(byte)(val>>8);
> dest[2]=cast(byte)(val>>16);
> dest[3]=cast(byte)(val>>24);
> }
>
> The first function is optimized to one load, but the second remains as 4 stores. Is there a code pattern that gets around this?
```
void store32_le_optim(ref ubyte[4] dest, uint val)
{
import core.stdc.string;
memcpy(&dest, &val, val.sizeof);
}
```
LLVM is not yet smart enough to optimize adjacent stores, but it does assume it is valid to use knowledge of standard memcpy semantics.
-Johan
June 02, 2018 Re: Work around conservative optimization

Posted in reply to Johan Engelen

On Saturday, 2 June 2018 at 11:44:40 UTC, Johan Engelen wrote:
> ```
> void store32_le_optim(ref ubyte[4] dest, uint val)
> {
>     import core.stdc.string;
>     memcpy(&dest, &val, val.sizeof);
> }
> ```
This only works on little endian machines of course, but the proper version is easy:
```
void store32_le_optim(ref ubyte[4] dest, uint val)
{
    import core.stdc.string;
    ubyte[4] temp;
    temp[0]=cast(ubyte)val;
    temp[1]=cast(ubyte)(val>>8);
    temp[2]=cast(ubyte)(val>>16);
    temp[3]=cast(ubyte)(val>>24);
    memcpy(&dest, &temp, temp.sizeof);
}
```
See it in action for Little Endian and Big Endian: https://godbolt.org/g/QqcCpi

-Johan
June 03, 2018 Re: Work around conservative optimization

Posted in reply to Johan Engelen

On Saturday, 2 June 2018 at 18:32:37 UTC, Johan Engelen wrote:
> ```
> void store32_le_optim(ref ubyte[4] dest, uint val)
> {
> import core.stdc.string;
> ubyte[4] temp;
> temp[0]=cast(ubyte)val;
> temp[1]=cast(ubyte)(val>>8);
> temp[2]=cast(ubyte)(val>>16);
> temp[3]=cast(ubyte)(val>>24);
> memcpy(&dest, &temp, temp.sizeof);
> }
> ```
[For endian-ness conversion of integers, we have ldc.intrinsics.llvm_bswap().]
June 03, 2018 Re: Work around conservative optimization

Posted in reply to kinke

On Sunday, 3 June 2018 at 11:54:29 UTC, kinke wrote:
> [For endian-ness conversion of integers, we have ldc.intrinsics.llvm_bswap().]
And, more portably, their core.bitop.bswap() aliases (restricted to uint and ulong though).
June 03, 2018 Re: Work around conservative optimization

Posted in reply to kinke

On 3 Jun 2018, at 12:54, kinke via digitalmars-d-ldc wrote:
> [For endian-ness conversion of integers, we have ldc.intrinsics.llvm_bswap().]
At the risk of pointing out the obvious, I usually find it vastly preferable to just write the code in a way that's independent of the target platform's endianness – like in Johan's example – and let the optimizer deal with eliding the explicit handling if possible. Even DMD's optimizer can recognize those patterns just fine.
— David
June 03, 2018 Re: Work around conservative optimization

Posted in reply to David Nadlinger

On Sunday, 3 June 2018 at 16:51:13 UTC, David Nadlinger wrote:
> On 3 Jun 2018, at 12:54, kinke via digitalmars-d-ldc wrote:
>> [For endian-ness conversion of integers, we have ldc.intrinsics.llvm_bswap().]
>
> At the risk of pointing out the obvious, I usually find it vastly preferable to just write the code in a way that's independent of the target platform's endianness – like in Johan's example – and let the optimizer deal with eliding the explicit handling if possible. Even DMD's optimizer can recognize those patterns just fine.
No need to reinvent the wheel, the Phobos solution is trivial enough:
```
void store32_le(ref ubyte[4] dest, uint val)
{
    import std.bitmanip;
    dest = nativeToLittleEndian(val);
}
```
There's no overhead for little-endian machines with `-O`, but a suboptimal non-inlined druntime call instead of the LLVM intrinsic directly in the other case.
June 04, 2018 Re: Work around conservative optimization

Posted in reply to Johan Engelen

On Saturday, 2 June 2018 at 18:32:37 UTC, Johan Engelen wrote:
> ```
> void store32_le_optim(ref ubyte[4] dest, uint val)
> {
> import core.stdc.string;
> ubyte[4] temp;
> temp[0]=cast(ubyte)val;
> temp[1]=cast(ubyte)(val>>8);
> temp[2]=cast(ubyte)(val>>16);
> temp[3]=cast(ubyte)(val>>24);
> memcpy(&dest, &temp, temp.sizeof);
> }
> ```
```
void store32_le_optim(ref ubyte[4] dest, uint val)
{
    ubyte[4] temp;
    temp[0]=cast(ubyte)val;
    temp[1]=cast(ubyte)(val>>8);
    temp[2]=cast(ubyte)(val>>16);
    temp[3]=cast(ubyte)(val>>24);
    dest=temp;
}
```
This works too, and unlike the memcpy version it can also be used in CTFE, since CTFE doesn't support memcpy.
June 04, 2018 Re: Work around conservative optimization

Posted in reply to Johan Engelen

On Saturday, 2 June 2018 at 11:44:40 UTC, Johan Engelen wrote:
> LLVM is not yet smart enough to optimize adjacent stores
I thought it was deliberately accounting for memory access errors: depending on how the processor checks memory access, a single merged wide store could fault where the separate byte stores would have written the first bytes before faulting.
June 04, 2018 Re: Work around conservative optimization

Posted in reply to kinke

On Sunday, 3 June 2018 at 11:54:29 UTC, kinke wrote:
> [For endian-ness conversion of integers, we have ldc.intrinsics.llvm_bswap().]
You can't just store the integer directly on alignment-sensitive platforms (even little-endian ones).
Copyright © 1999-2021 by the D Language Foundation