July 12, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=6829



--- Comment #20 from hsteoh@quickfur.ath.cx 2013-07-11 21:46:05 PDT ---
Could it be because amd64 doesn't support this optimization? Seems odd, though. I'd expect it to work at least up to the 16-bit argument form, since that's common to both x86 and amd64.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
July 12, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=6829



--- Comment #21 from Iain Buclaw <ibuclaw@ubuntu.com> 2013-07-12 00:58:49 PDT ---
1. Not my problem. :)
2. When comparing gdc and dmd, make sure you're actually looking at object files
generated by gdc, and not dmd. :)
3. That's highly unlikely as I've tested on x86_64. :)

July 12, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=6829



--- Comment #22 from Iain Buclaw <ibuclaw@ubuntu.com> 2013-07-12 01:00:57 PDT ---
(In reply to comment #21)
> 1. Not my problem. :)
> 2. When comparing gdc and dmd, make sure you're actually looking at object files
> generated by gdc, and not dmd. :)
> 3. That's highly unlikely as I've tested on x86_64. :)

and 4. Under simple test conditions where all parameter values are const/known, templated function calls tend to be inlined / optimised away.

July 12, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=6829



--- Comment #23 from Iain Buclaw <ibuclaw@ubuntu.com> 2013-07-12 01:02:52 PDT ---
(In reply to comment #22)
> (In reply to comment #21)
> > 1. Not my problem. :)
> > 2. When comparing gdc and dmd, make sure you're actually looking at object files
> > generated by gdc, and not dmd. :)
> > 3. That's highly unlikely as I've tested on x86_64. :)
> 
> and 4. Under simple test conditions where all parameter values are const/known, templated function calls tend to have a habit of being inlined / optimised away.

and 5. Make sure that you use bearophile's latest implementation example. ;)

http://d.puremagic.com/issues/show_bug.cgi?id=6829#c12

July 12, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=6829



--- Comment #24 from bearophile_hugs@eml.cc 2013-07-12 01:59:52 PDT ---
(In reply to comment #23)
> and 5. Make sure that you use bearophile's latest implementation example. ;)

Updated code, lacks unittests:


import std.traits: isIntegral, isUnsigned;

/// Rotate x left by nBits bits (bits shifted out at the top re-enter at the bottom).
T rol(T)(in T x, in uint nBits) @safe pure nothrow
if (isIntegral!T && isUnsigned!T)
in {
    assert(nBits < (T.sizeof * 8));
} body {
    return cast(T)((x << nBits) | (x >> ((T.sizeof * 8) - nBits)));
}

/// Rotate x right by nBits bits (bits shifted out at the bottom re-enter at the top).
T ror(T)(in T x, in uint nBits) @safe pure nothrow
if (isIntegral!T && isUnsigned!T)
in {
    assert(nBits < (T.sizeof * 8));
} body {
    return cast(T)((x >> nBits) | (x << ((T.sizeof * 8) - nBits)));
}

void main() {
    // Tests to check for assembly output.
    {
        __gshared static ubyte xb;
        __gshared static ushort xs;
        __gshared static uint xi;
        __gshared static ulong xl;
        __gshared static uint yi;

        rol(xb, yi);   // rolb
        ror(xb, yi);   // rorb

        rol(xs, yi);   // rolw
        ror(xs, yi);   // rorw

        rol(xi, yi);   // roll
        ror(xi, yi);   // rorl

        rol(xl, yi);   // version(X86_64) rolq
        ror(xl, yi);   // version(X86_64) rorq
    }
}
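
Since the code above lacks unittests, here is a minimal sketch of what they might look like. It repeats rol and ror so that it compiles on its own, with one assumed change: the bounds check is a plain assert in the function body rather than an in { } body { } contract, because newer compilers spell the contract keyword do instead of body. The test values are illustrative, and nBits == 0 is deliberately avoided because the second shift would then be by the full width of the type.

```d
import std.traits : isIntegral, isUnsigned;

// Same rotate expressions as above; the contract is folded into the body
// (an assumption made here for portability across compiler versions).
T rol(T)(in T x, in uint nBits) @safe pure nothrow
if (isIntegral!T && isUnsigned!T)
{
    assert(nBits < (T.sizeof * 8));
    return cast(T)((x << nBits) | (x >> ((T.sizeof * 8) - nBits)));
}

T ror(T)(in T x, in uint nBits) @safe pure nothrow
if (isIntegral!T && isUnsigned!T)
{
    assert(nBits < (T.sizeof * 8));
    return cast(T)((x >> nBits) | (x << ((T.sizeof * 8) - nBits)));
}

unittest
{
    // 8-bit: the top bit wraps around to the bottom and vice versa.
    assert(rol(cast(ubyte) 0x81, 1) == 0x03);
    assert(ror(cast(ubyte) 0x81, 1) == 0xC0);

    // 16-bit, rotating by a nibble.
    assert(rol(cast(ushort) 0x8001, 4) == 0x0018);
    assert(ror(cast(ushort) 0x8001, 4) == 0x1800);

    // 32-bit and 64-bit.
    assert(rol(0x8000_0001u, 1) == 0x0000_0003u);
    assert(ror(0x8000_0001u, 1) == 0xC000_0000u);
    assert(rol(0x8000_0000_0000_0001UL, 1) == 0x3UL);
    assert(ror(0x8000_0000_0000_0001UL, 1) == 0xC000_0000_0000_0000UL);

    // Rotating left and then right by the same amount is the identity.
    foreach (uint n; 1 .. 32)
        assert(ror(rol(0xDEAD_BEEFu, n), n) == 0xDEAD_BEEFu);
}

void main() {}
```

The unittest block only runs when unittests are enabled at compile time (-unittest for dmd, -funittest for gdc).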

July 12, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=6829



--- Comment #25 from hsteoh@quickfur.ath.cx 2013-07-12 08:13:16 PDT ---
Nope, it's still not working. I copied and pasted exactly the code posted above, compiled it with gdc -frelease -O3 test.d, and here is the disassembly output:

00000000004042d0 <_D4test10__T3rolThZ3rolFNaNbNfxhxkZh>:
  4042d0:    40 0f b6 ff              movzbl %dil,%edi
  4042d4:    b9 08 00 00 00           mov    $0x8,%ecx
  4042d9:    29 f1                    sub    %esi,%ecx
  4042db:    89 f8                    mov    %edi,%eax
  4042dd:    d3 f8                    sar    %cl,%eax
  4042df:    89 f1                    mov    %esi,%ecx
  4042e1:    d3 e7                    shl    %cl,%edi
  4042e3:    09 f8                    or     %edi,%eax
  4042e5:    c3                       retq
  4042e6:    66 2e 0f 1f 84 00 00     nopw   %cs:0x0(%rax,%rax,1)
  4042ed:    00 00 00

00000000004042f0 <_D4test10__T3rorThZ3rorFNaNbNfxhxkZh>:
  4042f0:    40 0f b6 ff              movzbl %dil,%edi
  4042f4:    b9 08 00 00 00           mov    $0x8,%ecx
  4042f9:    29 f1                    sub    %esi,%ecx
  4042fb:    89 f8                    mov    %edi,%eax
  4042fd:    d3 e0                    shl    %cl,%eax
  4042ff:    89 f1                    mov    %esi,%ecx
  404301:    d3 ff                    sar    %cl,%edi
  404303:    09 f8                    or     %edi,%eax
  404305:    c3                       retq
  404306:    66 2e 0f 1f 84 00 00     nopw   %cs:0x0(%rax,%rax,1)
  40430d:    00 00 00

0000000000404310 <_D4test10__T3rolTtZ3rolFNaNbNfxtxkZt>:
  404310:    0f b7 ff                 movzwl %di,%edi
  404313:    b9 10 00 00 00           mov    $0x10,%ecx
  404318:    29 f1                    sub    %esi,%ecx
  40431a:    89 f8                    mov    %edi,%eax
  40431c:    d3 f8                    sar    %cl,%eax
  40431e:    89 f1                    mov    %esi,%ecx
  404320:    d3 e7                    shl    %cl,%edi
  404322:    09 f8                    or     %edi,%eax
  404324:    c3                       retq
  404325:    66 2e 0f 1f 84 00 00     nopw   %cs:0x0(%rax,%rax,1)
  40432c:    00 00 00
  40432f:    90                       nop

0000000000404330 <_D4test10__T3rorTtZ3rorFNaNbNfxtxkZt>:
  404330:    0f b7 ff                 movzwl %di,%edi
  404333:    b9 10 00 00 00           mov    $0x10,%ecx
  404338:    29 f1                    sub    %esi,%ecx
  40433a:    89 f8                    mov    %edi,%eax
  40433c:    d3 e0                    shl    %cl,%eax
  40433e:    89 f1                    mov    %esi,%ecx
  404340:    d3 ff                    sar    %cl,%edi
  404342:    09 f8                    or     %edi,%eax
  404344:    c3                       retq
  404345:    66 2e 0f 1f 84 00 00     nopw   %cs:0x0(%rax,%rax,1)
  40434c:    00 00 00
  40434f:    90                       nop

0000000000404350 <_D4test10__T3rolTkZ3rolFNaNbNfxkxkZk>:
  404350:    b9 20 00 00 00           mov    $0x20,%ecx
  404355:    89 f8                    mov    %edi,%eax
  404357:    29 f1                    sub    %esi,%ecx
  404359:    d3 e8                    shr    %cl,%eax
  40435b:    89 f1                    mov    %esi,%ecx
  40435d:    d3 e7                    shl    %cl,%edi
  40435f:    09 f8                    or     %edi,%eax
  404361:    c3                       retq
  404362:    66 2e 0f 1f 84 00 00     nopw   %cs:0x0(%rax,%rax,1)
  404369:    00 00 00
  40436c:    0f 1f 40 00              nopl   0x0(%rax)

0000000000404370 <_D4test10__T3rorTkZ3rorFNaNbNfxkxkZk>:
  404370:    b9 20 00 00 00           mov    $0x20,%ecx
  404375:    89 f8                    mov    %edi,%eax
  404377:    29 f1                    sub    %esi,%ecx
  404379:    d3 e0                    shl    %cl,%eax
  40437b:    89 f1                    mov    %esi,%ecx
  40437d:    d3 ef                    shr    %cl,%edi
  40437f:    09 f8                    or     %edi,%eax
  404381:    c3                       retq
  404382:    66 2e 0f 1f 84 00 00     nopw   %cs:0x0(%rax,%rax,1)
  404389:    00 00 00
  40438c:    0f 1f 40 00              nopl   0x0(%rax)

0000000000404390 <_D4test10__T3rolTmZ3rolFNaNbNfxmxkZm>:
  404390:    b9 40 00 00 00           mov    $0x40,%ecx
  404395:    48 89 f8                 mov    %rdi,%rax
  404398:    29 f1                    sub    %esi,%ecx
  40439a:    48 d3 e8                 shr    %cl,%rax
  40439d:    89 f1                    mov    %esi,%ecx
  40439f:    48 d3 e7                 shl    %cl,%rdi
  4043a2:    48 09 f8                 or     %rdi,%rax
  4043a5:    c3                       retq
  4043a6:    66 2e 0f 1f 84 00 00     nopw   %cs:0x0(%rax,%rax,1)
  4043ad:    00 00 00

00000000004043b0 <_D4test10__T3rorTmZ3rorFNaNbNfxmxkZm>:
  4043b0:    b9 40 00 00 00           mov    $0x40,%ecx
  4043b5:    48 89 f8                 mov    %rdi,%rax
  4043b8:    29 f1                    sub    %esi,%ecx
  4043ba:    48 d3 e0                 shl    %cl,%rax
  4043bd:    89 f1                    mov    %esi,%ecx
  4043bf:    48 d3 ef                 shr    %cl,%rdi
  4043c2:    48 09 f8                 or     %rdi,%rax
  4043c5:    c3                       retq
  4043c6:    66 2e 0f 1f 84 00 00     nopw   %cs:0x0(%rax,%rax,1)
  4043cd:    00 00 00


It's still using shift + bitwise OR instead of substituting a rotate instruction. I'm also on Linux x86_64 (according to uname -m), so I have no idea what I'm doing wrong. gdc --version reports gdc (GCC) 4.8.1. Did something go wrong, or is something missing, in my gdc build?
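
One detail visible in the listing is that the 8- and 16-bit versions zero-extend the argument (movzbl / movzwl) and then use sar on a 32-bit register, while the 32- and 64-bit versions use shr. That is consistent with D's integer promotion of shift operands: a ubyte or ushort is shifted as an int and only narrowed again by the final cast. The sketch below merely illustrates that typing rule. Whether it has any bearing on the missing rotate instruction is not established here, since the 32- and 64-bit versions are not turned into rotates either.

```d
// Shift operands undergo integer promotion, and the result has the
// promoted type of the left operand.
static assert(is(typeof(ubyte.init  << 1u) == int));
static assert(is(typeof(ushort.init >> 1u) == int));
static assert(is(typeof(uint.init   << 1u) == uint));
static assert(is(typeof(ulong.init  >> 1u) == ulong));

void main()
{
    ubyte x = 0x81;
    uint n = 1;

    // Computed in int: bit 8 is still set before narrowing.
    auto wide = (x << n) | (x >> (8 - n));
    static assert(is(typeof(wide) == int));
    assert(wide == 0x103);

    // The cast back to ubyte drops the extra bits, giving the rotated value.
    assert(cast(ubyte) wide == 0x03);
}
```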

July 12, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=6829



--- Comment #26 from Iain Buclaw <ibuclaw@ubuntu.com> 2013-07-12 08:21:21 PDT ---
Don't currently have a gdc 4.8 compiler at hand to test...

July 12, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=6829



--- Comment #27 from hsteoh@quickfur.ath.cx 2013-07-12 08:40:43 PDT ---
Huh? So which gdc have you been using to test this earlier?

July 13, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=6829


Walter Bright <bugzilla@digitalmars.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bugzilla@digitalmars.com


--- Comment #28 from Walter Bright <bugzilla@digitalmars.com> 2013-07-13 00:32:38 PDT ---
(In reply to comment #2)
> (In reply to comment #1)
> 
> > What's the pattern that DMD recognizes for rotate instructions?
> 
> Walter offers this example of recognizable rotation:

Ack, the example is bad. This one generates rol/ror:

-----------
void test(int shift)
{
    uint a = 7;
    uint r;
    r = (a >> shift) | (a << (int.sizeof * 8 - shift));
    assert(r == 0x8000_0003);
    r = (r << shift) | (r >> (int.sizeof * 8 - shift));
    assert(r == 7);
}
-----------
compiling with -O:

_D3foo4testFiZv comdat
        assume  CS:_D3foo4testFiZv
L0:             push    EAX
                mov     EDX,7
                mov     ECX,EAX
                push    EAX
                ror     EDX,CL
                cmp     EDX,080000003h
                push    EBX
                mov     4[ESP],EDX
                je      L22
                mov     EAX,6
                call    near ptr _D3foo8__assertFiZv
L22:            mov     EBX,4[ESP]
                mov     ECX,8[ESP]
                rol     EBX,CL
                cmp     EBX,7
                je      L3B
                mov     EAX,8
                call    near ptr _D3foo8__assertFiZv
L3B:            pop     EBX
                add     ESP,8
                ret
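
For anyone who wants to run the example rather than only read the assembly, here is a minimal sketch with a driver added. The main function is an assumption and not part of the original test case. Note that both asserts hold only for shift == 1: rotating 7 right by one bit gives 0x8000_0003, and rotating that left by one bit gives 7 again.

```d
void test(int shift)
{
    uint a = 7;
    uint r;
    // Rotate right: the low bits of a wrap around to the top.
    r = (a >> shift) | (a << (int.sizeof * 8 - shift));
    assert(r == 0x8000_0003);
    // Rotate left: undoes the rotation above.
    r = (r << shift) | (r >> (int.sizeof * 8 - shift));
    assert(r == 7);
}

// Hypothetical driver, added here for illustration only.
void main()
{
    test(1);
}
```

Running this only confirms that the expressions compute a rotation. Whether rol/ror is actually emitted still has to be checked by disassembling the optimised object file for the compiler in question.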

July 13, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=6829



--- Comment #29 from Walter Bright <bugzilla@digitalmars.com> 2013-07-13 00:35:45 PDT ---
Fix test case:

https://github.com/D-Programming-Language/dmd/pull/2341
