June 19, 2013
Hi bearophile,

On Saturday, 8 June 2013 at 00:53:48 UTC, bearophile wrote:
>> I have found another bug that is less easy to reduce...
>
> I have localized it, it took some time:

I just added the issues to the GitHub tracker, thanks again for the excellent reports.

Unfortunately, I pretty much won't be able to work on LDC at all during the next few weeks, but I am definitely going to look into the issues as soon as possible – if nobody else beats me to it, which I hope *will* happen.

David
June 22, 2013
David Nadlinger:

> I just added the issues to the GitHub tracker, thanks again for the excellent reports.

You are welcome. A few minutes ago I added some SIMD-related bug reports to the D Bugzilla. I have also found two SIMD-related problems that may be specific to LDC2.

---------------------------------

This seems to be an LDC2 bug:


import core.simd: double2;
void main() {
    double2 x = [1.0, 2.0];
    double2 r1 = x + [1.0, 2.0];
    double2 r2 = [1.0, 2.0] + x;
}


LDC2 v.0.11.0 gives:

test.d(5): Error: cannot implicitly convert expression (cast(__vector(double[2u]))[1, 2] + x) of type double[] to __vector(double[2u])



import core.simd: double2;
void main() {
    double x = 1.0, y = 2.0;
    double2 a = [x, y];
    double2 sum = [0.0, 0.0];
    sum += [x, y] / a;
}


LDC2 v.0.11.0 gives:

test.d(6): Error: incompatible types for ((sum) += (cast(__vector(double[2u]))[x, y] / a)): '__vector(double[2u])' and 'double[]'
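
For comparison, here is a rough C analogue of what the two D test cases above are expected to compute, written with the GCC/Clang vector extension. It is only a sketch of the intended semantics, not of what LDC does internally: the type name v2d and the three helper functions are made up for illustration, with v2d standing in for double2.

```c
#include <assert.h>

/* Two-lane double vector via the GCC/Clang vector extension.
   Hypothetical stand-in for D's double2; not part of the original report. */
typedef double v2d __attribute__((vector_size(16)));

/* Mirrors the first test case: vector literal on the right-hand side. */
static v2d add_literal_right(v2d x) {
    return x + (v2d){1.0, 2.0};
}

/* Mirrors the first test case: vector literal on the left-hand side. */
static v2d add_literal_left(v2d x) {
    return (v2d){1.0, 2.0} + x;
}

/* Mirrors the second test case: sum += [x, y] / a (lane-wise division). */
static v2d accumulate_quotient(v2d sum, double x, double y, v2d a) {
    sum += (v2d){x, y} / a;
    return sum;
}
```

With x = {1.0, 2.0}, both add functions give {2.0, 4.0}, and accumulate_quotient starting from {0.0, 0.0} with x = 1.0, y = 2.0 and a = {1.0, 2.0} gives {1.0, 1.0}. The operand order should make no difference, which is what the D code expects too.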

---------------------------------

And this seems to be an ldc2 performance issue worth fixing:


import core.simd: double2;
double foo(in double2 x) pure nothrow {
    return x.array[0] + x.array[1];
}
int main() {
    double2 x = [1.0, 2.0];
    return cast(int)foo(x);
}


LDC2 compiles "foo" to:

__D4temp3fooFNaNbxNhG2dZd:
    pushl   %ebp
    movl    %esp, %ebp
    andl    $-8, %esp
    subl    $8, %esp
    movapd  %xmm0, %xmm1
    unpckhpd    %xmm1, %xmm1
    addsd   %xmm0, %xmm1
    movsd   %xmm1, (%esp)
    fldl    (%esp)
    movl    %ebp, %esp
    popl    %ebp
    ret


But gcc compiles similar code (using __m128d instead of double2) to the "haddpd" instruction, which I think is shorter and more efficient here.

(I don't know how dmd and gdc compile that program).

Bye,
bearophile
June 22, 2013
> But gcc compiles similar code (using __m128d instead of double2) to the "haddpd" instruction, which I think is shorter and more efficient here.

C code:


#include <emmintrin.h>
double foo(const __m128d x) {
    return x[0] + x[1];
}
int main() {
    __m128d x = _mm_set_pd(1.0, 2.0);
    return (int)foo(x);
}


Compiled with:

gcc -S -Ofast -fomit-frame-pointer -march=native -mfpmath=sse -msse2 test.c -o test.s

GCC version 4.8.0

The asm of "foo":

_foo:
    subl    $20, %esp
    haddpd  %xmm0, %xmm0
    movsd   %xmm0, (%esp)
    fldl    (%esp)
    addl    $20, %esp
    ret


Bye,
bearophile
June 22, 2013
> _foo:
>     subl    $20, %esp
>     haddpd  %xmm0, %xmm0
>     movsd   %xmm0, (%esp)
>     fldl    (%esp)
>     addl    $20, %esp
>     ret

A discussion on the #llvm IRC channel suggests that the missing "haddpd" is an LLVM issue, so this is not a bug report for ldc2. Only the other report (the type errors with array literals and double2) is valid for ldc2.
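
If the aim is simply to get a horizontal add without relying on the optimizer to recognize the scalar pattern, one option on the C side is to request it explicitly through the SSE3 intrinsic. The following is a minimal sketch under that assumption, not something tested against LDC: the names v2d and hsum are made up, and the intrinsic path is used only when the compiler defines __SSE3__ (for example with -msse3 or a suitable -march), otherwise it falls back to the plain lane sum.

```c
#include <assert.h>

#if defined(__SSE3__)
#include <pmmintrin.h>  /* SSE3: _mm_hadd_pd (also pulls in the SSE2 header) */
#endif

/* Two-lane double vector via the GCC/Clang vector extension.
   Hypothetical helper type, used so the fallback path needs no x86 headers. */
typedef double v2d __attribute__((vector_size(16)));

/* Returns x[0] + x[1]. */
static double hsum(v2d x) {
#if defined(__SSE3__)
    __m128d v = (__m128d)x;                  /* same-size vector cast */
    return _mm_cvtsd_f64(_mm_hadd_pd(v, v)); /* low lane holds v[0] + v[1] */
#else
    return x[0] + x[1];                      /* portable fallback */
#endif
}
```

For x = {1.0, 2.0} this returns 3.0 on either path. Whether the explicit "haddpd" is actually faster than the unpckhpd/addsd pair depends on the CPU, so it is worth measuring before relying on it.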

Bye,
bearophile
June 27, 2013
Another test case:


import core.simd: double2;
struct Foo {
    double2 x;
    this(uint) {
        x = [0.0, 0.0];
    }
}
void main() {
    Foo y = Foo();
}


ldmd2 gives:

fpext source and destination must both be a vector or neither
  %tmp1 = fpext double 0x7FFC000000000000 to <2 x double>
Broken module found, compilation aborted!

Bye,
bearophile
June 30, 2013
int main() {
    import core.simd;
    float[16] a = 1.0;
    float4 t = 0, k = 2;
    auto b = cast(float4[])a;
    for (size_t i = 0; i < b.length; i++)
        t += b[i] * k;
    return cast(int)t.array[2];
}



Compiling it with "ldmd2 -O":

Error: Instruction does not dominate all uses!
  %tmp33.Elt = fmul float %tmp33.Elt.lhs, 2.000000e+00
  %1 = fadd float %0, %tmp33.Elt
Instruction does not dominate all uses!
  %tmp31 = load <4 x float>* %tmp30, align 16
  %tmp33.Elt.lhs = extractelement <4 x float> %tmp31, i32 2
Broken module found, compilation terminated.
Broken module found, compilation terminated.
Broken module found, compilation terminated.
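
For reference, this is roughly what the program is supposed to compute, as a C sketch using the GCC/Clang vector extension. The names v4f and scaled_lane_sum are made up for illustration, and memcpy is used for the loads so that the sketch does not depend on the alignment of the float array (the D code reinterprets the array via a cast instead).

```c
#include <assert.h>
#include <string.h>

/* Four-lane float vector via the GCC/Clang vector extension.
   Hypothetical stand-in for D's float4. */
typedef float v4f __attribute__((vector_size(16)));

/* Walks a[0..n) four floats at a time, accumulates t += b * k with every
   lane of k equal to scale, and returns lane 2 truncated to int,
   like cast(int)t.array[2] in the D program. Leftover elements
   (n not a multiple of 4) are ignored. */
static int scaled_lane_sum(const float *a, size_t n, float scale) {
    v4f t = {0.0f, 0.0f, 0.0f, 0.0f};
    v4f k = {scale, scale, scale, scale};
    for (size_t i = 0; i + 4 <= n; i += 4) {
        v4f b;
        memcpy(&b, a + i, sizeof b);  /* alignment-safe load of 4 floats */
        t += b * k;
    }
    return (int)t[2];
}
```

With 16 floats all set to 1.0 and scale 2.0, the loop runs four times and each lane ends up as 8.0, so the expected exit code of the D program is 8.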

Bye,
bearophile
June 30, 2013
On Sunday, 30 June 2013 at 01:52:46 UTC, bearophile wrote:
> int main() {
>     import core.simd;
>     float[16] a = 1.0;
>     float4 t = 0, k = 2;
>     auto b = cast(float4[])a;
>     for (size_t i = 0; i < b.length; i++)
>         t += b[i] * k;
>     return cast(int)t.array[2];
> }
>
>
>
> Compiling it with "ldmd2 -O":
>
> Error: Instruction does not dominate all uses!
>   %tmp33.Elt = fmul float %tmp33.Elt.lhs, 2.000000e+00
>   %1 = fadd float %0, %tmp33.Elt
> Instruction does not dominate all uses!
>   %tmp31 = load <4 x float>* %tmp30, align 16
>   %tmp33.Elt.lhs = extractelement <4 x float> %tmp31, i32 2
> Broken module found, compilation terminated.
> Broken module found, compilation terminated.
> Broken module found, compilation terminated.
>
> Bye,
> bearophile

Thanks for the nice test case.

It looks like an LLVM 3.3 problem. I can reproduce it with LDC head and LLVM 3.3, but not with LLVM trunk or LLVM 3.2. I will investigate it further.

Kai

June 30, 2013
On Thursday, 27 June 2013 at 20:00:47 UTC, bearophile wrote:
> Another test case:
>
>
> import core.simd: double2;
> struct Foo {
>     double2 x;
>     this(uint) {
>         x = [0.0, 0.0];
>     }
> }
> void main() {
>     Foo y = Foo();
> }
>
>
> ldmd2 gives:
>
> fpext source and destination must both be a vector or neither
>   %tmp1 = fpext double 0x7FFC000000000000 to <2 x double>
> Broken module found, compilation aborted!
>
> Bye,
> bearophile

Thanks for the nice test case. I can reproduce it with LDC head and LLVM trunk. I created issue #420 for it.

Kai
June 30, 2013
Kai:

> Thanks for the nice test case. I can reproduce it with LDC head and LLVM trunk. I created issue #420 for it.

You are welcome. Have you also seen the two cases above?

import core.simd: double2;
void main() {
    double2 x = [1.0, 2.0];
    double2 r1 = x + [1.0, 2.0];
    double2 r2 = [1.0, 2.0] + x;
}


import core.simd: double2;
void main() {
    double x = 1.0, y = 2.0;
    double2 a = [x, y];
    double2 sum = [0.0, 0.0];
    sum += [x, y] / a;
}


Bye,
bearophile
June 30, 2013
On Sunday, 30 June 2013 at 17:10:38 UTC, Kai Nacke wrote:
> On Sunday, 30 June 2013 at 01:52:46 UTC, bearophile wrote:
>> int main() {
>>    import core.simd;
>>    float[16] a = 1.0;
>>    float4 t = 0, k = 2;
>>    auto b = cast(float4[])a;
>>    for (size_t i = 0; i < b.length; i++)
>>        t += b[i] * k;
>>    return cast(int)t.array[2];
>> }
>>
>>
>>
>> Compiling it with "ldmd2 -O":
>>
>> Error: Instruction does not dominate all uses!
>>  %tmp33.Elt = fmul float %tmp33.Elt.lhs, 2.000000e+00
>>  %1 = fadd float %0, %tmp33.Elt
>> Instruction does not dominate all uses!
>>  %tmp31 = load <4 x float>* %tmp30, align 16
>>  %tmp33.Elt.lhs = extractelement <4 x float> %tmp31, i32 2
>> Broken module found, compilation terminated.
>> Broken module found, compilation terminated.
>> Broken module found, compilation terminated.
>>
>> Bye,
>> bearophile
>
> Thanks for the nice test case.
>
> It looks like an LLVM 3.3 problem. I can reproduce it with LDC head and LLVM 3.3, but not with LLVM trunk or LLVM 3.2. I will investigate it further.
>
> Kai

That's now issue #421.