Thread overview
A little benchmark in D and C
Jul 04, 2014
bearophile
Jul 04, 2014
Jyxent
Jul 04, 2014
David Nadlinger
Jul 04, 2014
bearophile
Jul 04, 2014
bearophile
Jul 06, 2014
safety0ff
Jul 06, 2014
safety0ff
July 04, 2014
Through Reddit I've found a little benchmark. Here there is a D version, followed by a C99 version.

I compile with:
ldmd2 -O -release -inline -noboundscheck test1.d
gcc -O3 -std=c99 test2.c -o test2

I use gcc 4.8.0.

On an old PC the run-time of C version is about 5.2 seconds, while the D version compiled with the latest ldc2 runs in about 26 seconds (almost 5 times slower).

Bye,
bearophile

---------------------------------

// D version.
import core.stdc.stdio;

enum int t = 20;

bool isEvenlyDivisible(in int i, in int a, in int b)
pure nothrow @safe {
    if (i > b)
        return true;
    else
        return (a % i == 0) && isEvenlyDivisible(i + 1, a, b);
}

void run() nothrow {
    int i = 10;
    while (!isEvenlyDivisible(2, i, t))
        i += 2;
    printf("%d\n", i);
}

void main() {
    for (int i = 0; i < 5; i++)
      run;
}

---------------------------------

// C version.
#include <stdio.h>
#include <stdbool.h>

const int t = 20;

bool isEvenlyDivisible(const int i, const int a, const int b) {
    if (i > b)
        return true;
    else
        return (a % i == 0) && isEvenlyDivisible(i + 1, a, b);
}

void run() {
    int i = 10;
    while (!isEvenlyDivisible(2, i, t))
        i += 2;
    printf("%d\n", i);
}

int main() {
    for (int i = 0; i < 5; i++)
      run();
    return 0;
}

---------------------------------
July 04, 2014
On Friday, 4 July 2014 at 15:40:26 UTC, bearophile wrote:
> I compile with:
> ldmd2 -O -release -inline -noboundscheck test1.d
> gcc -O3 -std=c99 test2.c -o test2
>
> I use gcc 4.8.0.
>
> On an old PC the run-time of C version is about 5.2 seconds, while the D version compiled with the latest ldc2 runs in about 26 seconds (almost 5 times slower).

I repeated your test with ldc2, clang, and gcc.  The D and C programs both run in about 16 seconds when compiled with ldc2 and clang on my machine.  The gcc compiled C program ran in about 1.5 seconds.  So probably something LLVM specific?
July 04, 2014
On Friday, 4 July 2014 at 15:40:26 UTC, bearophile wrote:
> Through Reddit I've found a little benchmark. Here there is a D version, followed by a C99 version.
>
> I compile with:
> ldmd2 -O -release -inline -noboundscheck test1.d
> gcc -O3 -std=c99 test2.c -o test2
>
> […]

A quick test suggests that this seems to be a matter of GCC's optimizer picking up on something that LLVM's doesn't, as Clang 3.4.2 produces an executable just as slow/fast than that by LDC built against LLVM 3.4.2.

Would be interesting to look at the asm to see what is going on here.

David
July 04, 2014
David Nadlinger:

> A quick test suggests that this seems to be a matter of GCC's optimizer picking up on something that LLVM's doesn't, as Clang 3.4.2 produces an executable just as slow/fast than that by LDC built against LLVM 3.4.2.

So it looks like a ER/bug report for LLVM instead of LDC :-)

Bye,
bearophile
July 04, 2014
> David Nadlinger:
>
>> A quick test suggests that this seems to be a matter of GCC's optimizer picking up on something that LLVM's doesn't, as Clang 3.4.2 produces an executable just as slow/fast than that by LDC built against LLVM 3.4.2.

Discussed a bit in the llvm irc channel and then I have reported it:
http://llvm.org/bugs/show_bug.cgi?id=20205

Bye,
bearophile
July 06, 2014
On Friday, 4 July 2014 at 15:55:50 UTC, David Nadlinger wrote:
>
> Would be interesting to look at the asm to see what is going on here.

I reversed the gcc assembly output, gcc compiles the run() function as follows:

void run() {
    unsigned i = 10;
    for (;; i += 2)
    {
//    movl    $0x55555555, %ebp // magic contant for division via multiplication
//    mull    %ebp // edx = mulhi (magic, i)
//    shrl    %edx // edx >>= 1 (part of magic division)
//    leal    (%rdx,%rdx,2), %eax // eax = 3*rdx
//    if (i - eax != 0) // i - 3 * rdx = remainder
//        continue
        if ((i & 3) != 0) // above comments show how this line is implemented
            continue;     // see ch.10 section 3 of hacker's delight 2nd ed.
        if (i % 5 != 0)
            continue;
        if (i % 6 != 0)
            continue;
        if (i % 7 != 0)
            continue;
        if (isEvenlyDivisible(8, i, t))
            break;
    }

    printf("%d\n", i);
}

Without the division by 3 optimization the code runs slower with that check in place. Basically there's some epic wins in unrolling that function.
July 06, 2014
On Sunday, 6 July 2014 at 08:30:03 UTC, safety0ff wrote:
>
>         if ((i & 3) != 0) // above comments show how this line is implemented
>             continue;     // see ch.10 section 3 of hacker's delight 2nd ed.
>         if (i % 5 != 0)
>             continue;

Oops, this should be:
          if (i % 3 != 0) // above comments show how this line is implemented
             continue;     // see ch.10 section 3 of hacker's delight 2nd ed.
          if ((i & 3) != 0)
             continue;
          if (i % 5 != 0)
              continue;

Also, I switched the signed integers to unsigned to help simplify the assembly.