Thread overview
Struct copies
Jan 26, 2014
bearophile
Jan 27, 2014
Kai Nacke
Jan 27, 2014
Kai Nacke
Jan 28, 2014
Stanislav Blinov
Jan 28, 2014
bearophile
Jan 28, 2014
Stanislav Blinov
January 26, 2014
The following code is compiled with the ldc2 compiler based on LLVM 3.3.1.

This swaps two values in-place:

void swap(T)(ref T x, ref T y) pure nothrow {
    immutable aux = x;
    x = y;
    y = aux;
}


If I swap uint values I get the asm and IR:

__D5test611__T4swapTkZ4swapFNaNbNfKkKkZv:
	pushl	%esi
	movl	8(%esp), %ecx
	movl	(%ecx), %edx
	movl	(%eax), %esi
	movl	%esi, (%ecx)
	movl	%edx, (%eax)
	popl	%esi
	ret	$4


; Function Attrs: nounwind
define x86_stdcallcc void @"\01__D5test65swap1FNaNbKkKkZv"(i32* inreg nocapture %y_arg, i32* nocapture %x_arg) #0 {
entry:
  %tmp = load i32* %x_arg, align 4
  %tmp2 = load i32* %y_arg, align 4
  store i32 %tmp2, i32* %x_arg, align 4
  store i32 %tmp, i32* %y_arg, align 4
  ret void
}


Often I have a simple struct like this, whose sizeof equals one or two size_t (a size_t is a 32-bit unsigned integer on this system):

struct Foo {
    ushort a;
    char b, c;
}


If I instantiate the swap function template on values of type Foo I get the asm and IR:

__D5test621__T4swapTS5test63FooZ4swapFNaNbNfKS5test63FooKS5test63FooZv:
	pushl	%edi
	pushl	%esi
	movl	12(%esp), %ecx
	movw	(%ecx), %dx
	movw	2(%ecx), %si
	movl	(%eax), %edi
	movl	%edi, (%ecx)
	movw	%dx, (%eax)
	movw	%si, 2(%eax)
	popl	%esi
	popl	%edi
	ret	$4


; Function Attrs: nounwind
define x86_stdcallcc void @"\01__D5test65swap2FNaNbKS5test63FooKS5test63FooZv"(%test6.Foo* inreg nocapture %y_arg, %test6.Foo* nocapture %x_arg) #0 {
entry:
  %0 = getelementptr inbounds %test6.Foo* %x_arg, i32 0, i32 0
  %1 = load i16* %0, align 1
  %2 = getelementptr inbounds %test6.Foo* %x_arg, i32 0, i32 1
  %3 = load i8* %2, align 1
  %4 = getelementptr inbounds %test6.Foo* %x_arg, i32 0, i32 2
  %5 = load i8* %4, align 1
  %6 = bitcast %test6.Foo* %y_arg to i32*
  %7 = bitcast %test6.Foo* %x_arg to i32*
  %8 = load i32* %6, align 1
  store i32 %8, i32* %7, align 1
  %9 = getelementptr inbounds %test6.Foo* %y_arg, i32 0, i32 0
  store i16 %1, i16* %9, align 1
  %10 = getelementptr inbounds %test6.Foo* %y_arg, i32 0, i32 1
  store i8 %3, i8* %10, align 1
  %11 = getelementptr inbounds %test6.Foo* %y_arg, i32 0, i32 2
  store i8 %5, i8* %11, align 1
  ret void
}


Suppose I create a new union Bar that overlays a 32-bit integer on all three Foo fields:

union Bar {
    uint all;
    struct {
        ushort a;
        char b, c;
    }
}


Now I can define a new swap function that works on values of type Bar:


void swap2(ref Bar x, ref Bar y) pure nothrow {
    immutable Bar aux = x;
    x.all = y.all;
    y.all = aux.all;
}


Its asm and IR are shorter:

__D5test65swap2FNaNbKS5test63BarKS5test63BarZv:
    pushl   %esi
    movl    8(%esp), %ecx
    movl    (%ecx), %edx
    movl    (%eax), %esi
    movl    %esi, (%ecx)
    movl    %edx, (%eax)
    popl    %esi
    ret $4


; Function Attrs: nounwind
define x86_stdcallcc void @"\01__D5test65swap3FNaNbKS5test63BarKS5test63BarZv"(%test6.Bar* inreg nocapture %y_arg, %test6.Bar* nocapture %x_arg) #0 {
entry:
  %0 = getelementptr inbounds %test6.Bar* %x_arg, i32 0, i32 0
  %1 = load i32* %0, align 1
  %tmp4 = getelementptr %test6.Bar* %y_arg, i32 0, i32 0
  %tmp5 = load i32* %tmp4, align 4
  store i32 %tmp5, i32* %0, align 4
  store i32 %1, i32* %tmp4, align 4
  ret void
}


In the case of swapping Foos, why isn't LLVM optimizing the swap function to shorter asm like that of swap2? I have asked this on the LLVM IRC channel, and aKor has told me that for similar C code Clang swaps two Foo using a memcpy, so it uses a single 32-bit copy. So perhaps ldc2 can do the same for this common case.

Bye,
bearophile
January 27, 2014
On Sunday, 26 January 2014 at 13:02:50 UTC, bearophile wrote:
>
> In the case of swapping Foos, why isn't LLVM optimizing the swap function to shorter asm like that of swap2? I have asked this on the LLVM IRC channel, and aKor has told me that for similar C code Clang swaps two Foo using a memcpy, so it uses a single 32-bit copy. So perhaps ldc2 can do the same for this common case.
>

Hi bearophile!

In fact, ldc uses llvm.memcpy in the swap function. This is what I get with ldc 0.13.0-alpha1 using LLVM 3.4 on mingw32 with no optimization:

define weak_odr x86_stdcallcc void @"\01__D4swap20__T4swapTS4swap3FooZ4swapFNaNbNfKS4swap3FooKS4swap3FooZv"(%swap.Foo* inreg %y_arg, %swap.Foo* %x_arg) {
entry:
  %aux = alloca %swap.Foo, align 2
  %tmp = bitcast %swap.Foo* %aux to i8*
  %tmp1 = bitcast %swap.Foo* %x_arg to i8*
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %tmp, i8* %tmp1, i32 4, i32 1, i1 false)
  %tmp2 = load %swap.Foo* %aux
  %tmp3 = bitcast %swap.Foo* %x_arg to i8*
  %tmp4 = bitcast %swap.Foo* %y_arg to i8*
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %tmp3, i8* %tmp4, i32 4, i32 1, i1 false)
  %tmp5 = load %swap.Foo* %x_arg
  %tmp6 = bitcast %swap.Foo* %y_arg to i8*
  %tmp7 = bitcast %swap.Foo* %aux to i8*
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %tmp6, i8* %tmp7, i32 4, i32 1, i1 false)
  %tmp8 = load %swap.Foo* %y_arg
  ret void
}

Using -O2 or -O3, I get IR and asm similar to what you posted. I do not understand this. I'll check what clang is doing here.

Regards,
Kai
January 27, 2014
On Monday, 27 January 2014 at 07:00:18 UTC, Kai Nacke wrote:
>
> Using -O2 or -O3, I get IR and asm similar to what you posted. I do not understand this. I'll check what clang is doing here.
>

The obvious difference between ldc and clang is that clang generates better alignment information. Otherwise, the IR is almost identical.

Regards,
Kai
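
To make the alignment point concrete, here is a minimal sketch of how the declared alignment of Foo can be inspected and raised in D. AlignedFoo is a hypothetical name introduced only for illustration, and whether the stronger alignment actually changes the code ldc2 emits for the swap is an untested assumption.

```d
// Foo as declared in the thread: its alignment follows its widest field (ushort).
struct Foo {
    ushort a;
    char b, c;
}
static assert(Foo.sizeof == 4);
static assert(Foo.alignof == 2);

// Hypothetical variant: same fields, but the aggregate itself is declared
// 4-byte aligned, matching the alignment of a uint.
align(4) struct AlignedFoo {
    ushort a;
    char b, c;
}
static assert(AlignedFoo.sizeof == 4);
static assert(AlignedFoo.alignof == 4);
```

With 2-byte alignment the backend cannot assume that a single aligned 32-bit load covers the whole struct, which may be part of why the optimized code differs from the uint case.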

January 28, 2014
It would seem that ldc is performing a memberwise assignment. It could probably be optimized away, since it's known at compile time whether the fields have their own overloaded assignment or not. With unions it's straightforward: just a memcpy of the largest member's size (sadly dmd doesn't do that yet, but it also does all sorts of nasty things with unions). With structs it's a little more involved.

Generally though, pure code generation issues aside, that is one very strange swap function, bearophile :)
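
The compile-time check mentioned above can be sketched in library code with std.traits.hasElaborateAssign. This is only an illustration of the idea, not what ldc does internally; Counted and swapDispatch are hypothetical names invented for the example.

```d
import core.stdc.string : memcpy;
import std.traits : hasElaborateAssign;

struct Foo {
    ushort a;
    char b, c;
}

// Hypothetical type with a user-defined assignment, for contrast.
struct Counted {
    int value;
    int assignments;
    void opAssign(Counted rhs) { value = rhs.value; ++assignments; }
}

// Known at compile time: Foo has no custom assignment, Counted does.
static assert(!hasElaborateAssign!Foo);
static assert(hasElaborateAssign!Counted);

// Hypothetical helper: block copy for plain types, ordinary assignment otherwise.
void swapDispatch(T)(ref T x, ref T y) {
    static if (hasElaborateAssign!T) {
        // The user-defined opAssign must run, so assign normally.
        auto aux = x;
        x = y;
        y = aux;
    } else {
        if (&x is &y) return;            // memcpy must not see overlapping ranges
        ubyte[T.sizeof] tmp = void;      // uninitialized scratch buffer
        memcpy(tmp.ptr, &x, T.sizeof);
        memcpy(&x, &y, T.sizeof);
        memcpy(&y, tmp.ptr, T.sizeof);
    }
}
```

Calling swapDispatch on two Foo values takes the memcpy branch, while two Counted values go through opAssign once each.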
January 28, 2014
Stanislav Blinov:

> Generally though, pure code generation issues aside, that is one very strange swap function, bearophile :)

What's strange about this?


void swap(T)(ref T x, ref T y) pure nothrow {
    immutable aux = x;
    x = y;
    y = aux;
}

Bye,
bearophile
January 28, 2014
On Tuesday, 28 January 2014 at 01:39:47 UTC, bearophile wrote:
> Stanislav Blinov:
>
>> Generally though, pure code generation issues aside, that is one very strange swap function, bearophile :)
>
> What's strange about this?
>
>
> void swap(T)(ref T x, ref T y) pure nothrow {
>     immutable aux = x;
>     x = y;
>     y = aux;
> }

It won't swap references or pointers (because of the immutable temporary), nor structs with a disabled postblit (because of the assignment). The solution to the first is simple: replace immutable with auto. The second would basically require you to perform the memcpy manually anyway.
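
Both suggestions can be sketched as follows. The names swapAuto, swapBytes and NoCopy are hypothetical and exist only for this example; the byte-wise version is one possible way to copy manually, using slice copies rather than a literal memcpy call.

```d
// Same shape as the swap in the thread, but with `auto` instead of `immutable`,
// so the temporary can be assigned back when T is a pointer or class reference.
void swapAuto(T)(ref T x, ref T y) {
    auto aux = x;
    x = y;
    y = aux;
}

// Hypothetical non-copyable type: the postblit is disabled.
struct NoCopy {
    int value;
    @disable this(this);
}

// Swaps the raw bytes of x and y, so no copy or assignment of T takes place.
void swapBytes(T)(ref T x, ref T y) {
    if (&x is &y) return;                // slice copies must not overlap
    ubyte[T.sizeof] tmp = void;          // uninitialized scratch buffer
    auto xs = (cast(ubyte*) &x)[0 .. T.sizeof];
    auto ys = (cast(ubyte*) &y)[0 .. T.sizeof];
    tmp[] = xs[];
    xs[] = ys[];
    ys[] = tmp[];
}
```

Phobos also provides std.algorithm.swap, which is usually a better choice than a hand-written version.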