Thread overview | ||||||||
---|---|---|---|---|---|---|---|---|
|
January 26, 2014 Struct copies | ||||
---|---|---|---|---|
| ||||
The following code is compiled with the ldc2 compiler based on LLVM 3.3.1. This swaps two values in-place: void swap(T)(ref T x, ref T y) pure nothrow { immutable aux = x; x = y; y = aux; } If I swap uint values I get the asm and IR: __D5test611__T4swapTkZ4swapFNaNbNfKkKkZv: pushl %esi movl 8(%esp), %ecx movl (%ecx), %edx movl (%eax), %esi movl %esi, (%ecx) movl %edx, (%eax) popl %esi ret $4 ; Function Attrs: nounwind define x86_stdcallcc void @"\01__D5test65swap1FNaNbKkKkZv"(i32* inreg nocapture %y_arg, i32* nocapture %x_arg) #0 { entry: %tmp = load i32* %x_arg, align 4 %tmp2 = load i32* %y_arg, align 4 store i32 %tmp2, i32* %x_arg, align 4 store i32 %tmp, i32* %y_arg, align 4 ret void } Often I have a simple struct like this, with a sizeof equal to a size_t or two size_t (a size_t is a 32 bit unsigned on this system): struct Foo { ushort a; char b, c; } If I instantiate the swap function template on values of type Foo I get the asm and IR: __D5test621__T4swapTS5test63FooZ4swapFNaNbNfKS5test63FooKS5test63FooZv: pushl %edi pushl %esi movl 12(%esp), %ecx movw (%ecx), %dx movw 2(%ecx), %si movl (%eax), %edi movl %edi, (%ecx) movw %dx, (%eax) movw %si, 2(%eax) popl %esi popl %edi ret $4 ; Function Attrs: nounwind define x86_stdcallcc void @"\01__D5test65swap2FNaNbKS5test63FooKS5test63FooZv"(%test6.Foo* inreg nocapture %y_arg, %test6.Foo* nocapture %x_arg) #0 { entry: %0 = getelementptr inbounds %test6.Foo* %x_arg, i32 0, i32 0 %1 = load i16* %0, align 1 %2 = getelementptr inbounds %test6.Foo* %x_arg, i32 0, i32 1 %3 = load i8* %2, align 1 %4 = getelementptr inbounds %test6.Foo* %x_arg, i32 0, i32 2 %5 = load i8* %4, align 1 %6 = bitcast %test6.Foo* %y_arg to i32* %7 = bitcast %test6.Foo* %x_arg to i32* %8 = load i32* %6, align 1 store i32 %8, i32* %7, align 1 %9 = getelementptr inbounds %test6.Foo* %y_arg, i32 0, i32 0 store i16 %1, i16* %9, align 1 %10 = getelementptr inbounds %test6.Foo* %y_arg, i32 0, i32 1 store i8 %3, i8* %10, align 1 %11 = getelementptr inbounds %test6.Foo* %y_arg, i32 0, i32 2 store i8 %5, i8* %11, align 1 ret void } If I create a new union Bar that contains a 32 bit integer that comprises all three Foo fields: union Bar { uint all; struct { ushort a; char b, c; } } Now I can define a new swap function that works on values of type Bar: void swap2(ref Bar x, ref Bar y) pure nothrow { immutable Bar aux = x; x.all = y.all; y.all = aux.all; } Its asm and IR are shorter: __D5test65swap2FNaNbKS5test63BarKS5test63BarZv: pushl %esi movl 8(%esp), %ecx movl (%ecx), %edx movl (%eax), %esi movl %esi, (%ecx) movl %edx, (%eax) popl %esi ret $4 ; Function Attrs: nounwind define x86_stdcallcc void @"\01__D5test65swap3FNaNbKS5test63BarKS5test63BarZv"(%test6.Bar* inreg nocapture %y_arg, %test6.Bar* nocapture %x_arg) #0 { entry: %0 = getelementptr inbounds %test6.Bar* %x_arg, i32 0, i32 0 %1 = load i32* %0, align 1 %tmp4 = getelementptr %test6.Bar* %y_arg, i32 0, i32 0 %tmp5 = load i32* %tmp4, align 4 store i32 %tmp5, i32* %0, align 4 store i32 %1, i32* %tmp4, align 4 ret void } In the case of swapping Foos why isn't LLVM optimizing the swap function to a shorter asm like swap2? I have asked this on the LLVM IRC channel, and aKor has told me that similar C code Clang on swaps two Foo using a memcpy so uses a single 32 bit copy. So perhaps ldc2 can do the same for this common case. Bye, bearophile |
January 27, 2014 Re: Struct copies | ||||
---|---|---|---|---|
| ||||
Posted in reply to bearophile | On Sunday, 26 January 2014 at 13:02:50 UTC, bearophile wrote: > > In the case of swapping Foos why isn't LLVM optimizing the swap function to a shorter asm like swap2? I have asked this on the LLVM IRC channel, and aKor has told me that similar C code Clang on swaps two Foo using a memcpy so uses a single 32 bit copy. So perhaps ldc2 can do the same for this common case. > Hi bearophile! In fact, ldc uses llvm.memcpy in the swap function. This is what I get with ldc 0.13.0-alpha1 using LLVM 3.4 on mingw32 with no optimization: define weak_odr x86_stdcallcc void @"\01__D4swap20__T4swapTS4swap3FooZ4swapFNaNbNfKS4swap3FooKS4swap3FooZv"(%swap.Foo* inreg %y_arg, %swap.Foo* %x_arg) { entry: %aux = alloca %swap.Foo, align 2 %tmp = bitcast %swap.Foo* %aux to i8* %tmp1 = bitcast %swap.Foo* %x_arg to i8* call void @llvm.memcpy.p0i8.p0i8.i32(i8* %tmp, i8* %tmp1, i32 4, i32 1, i1 false) %tmp2 = load %swap.Foo* %aux %tmp3 = bitcast %swap.Foo* %x_arg to i8* %tmp4 = bitcast %swap.Foo* %y_arg to i8* call void @llvm.memcpy.p0i8.p0i8.i32(i8* %tmp3, i8* %tmp4, i32 4, i32 1, i1 false) %tmp5 = load %swap.Foo* %x_arg %tmp6 = bitcast %swap.Foo* %y_arg to i8* %tmp7 = bitcast %swap.Foo* %aux to i8* call void @llvm.memcpy.p0i8.p0i8.i32(i8* %tmp6, i8* %tmp7, i32 4, i32 1, i1 false) %tmp8 = load %swap.Foo* %y_arg ret void } Using -O2 or -O3, I get IR and ASM similar to the one you posted. I do not understand this. I'll check what clang is doing here. Regards, Kai |
January 27, 2014 Re: Struct copies | ||||
---|---|---|---|---|
| ||||
Posted in reply to Kai Nacke | On Monday, 27 January 2014 at 07:00:18 UTC, Kai Nacke wrote:
>
> Using -O2 or -O3, I get IR and ASM similar to the one you posted. I do not understand this. I'll check what clang is doing here.
>
The obvious difference between ldc and clang is that clang generates better alignment information. Otherwise, the IR is almost identical.
Regards,
Kai
|
January 28, 2014 Re: Struct copies | ||||
---|---|---|---|---|
| ||||
Posted in reply to bearophile | It would seem that ldc is performing a memberwise assignment. It could probably be optimized away since it's known at compile time whether the fields have their own assignment overloaded or not. With unions it's straight: just a memcopy on the largest size (sadly dmd doesn't do that yet, but it also does all sorts of nasty things with unions). With structs it's a little more involving. Generally though, pure code generation issues aside, that is one very strange swap function, bearophile :) |
January 28, 2014 Re: Struct copies | ||||
---|---|---|---|---|
| ||||
Posted in reply to Stanislav Blinov | Stanislav Blinov:
> Generally though, pure code generation issues aside, that is one very strange swap function, bearophile :)
What's strange on this?
void swap(T)(ref T x, ref T y) pure nothrow {
immutable aux = x;
x = y;
y = aux;
}
Bye,
bearophile
|
January 28, 2014 Re: Struct copies | ||||
---|---|---|---|---|
| ||||
Posted in reply to bearophile | On Tuesday, 28 January 2014 at 01:39:47 UTC, bearophile wrote:
> Stanislav Blinov:
>
>> Generally though, pure code generation issues aside, that is one very strange swap function, bearophile :)
>
> What's strange on this?
>
>
> void swap(T)(ref T x, ref T y) pure nothrow {
> immutable aux = x;
> x = y;
> y = aux;
> }
Won't swap references or pointers (due to immutable) or structs with disabled postblit (due to assignment). Solution to first is simple: immutable -> auto. Second would basically require you to perform memcpy manually anyway.
|
Copyright © 1999-2021 by the D Language Foundation