Thread overview
Taming the optimizer
Jun 14, 2018
Mike Franklin
Jun 14, 2018
Johan Engelen
Jun 15, 2018
David Nadlinger
June 14, 2018
I'm trying to run benchmarks on my memcpy implementation (https://forum.dlang.org/post/trenuawrekkbewjudmsy@forum.dlang.org) using LDC with optimizations enabled (e.g. `ldc2 -O3 memcpyd.d`). In my first implementation, the optimizer stripped out most of the code I was trying to measure.

Using the information at https://stackoverflow.com/questions/40122141/preventing-compiler-optimizations-while-benchmarking, I've created this:

void use(void* p)
{
    version(LDC)
    {
        import ldc.llvmasm;
        __asm("", "r", p);
    }
}

void clobber()
{
    version(LDC)
    {
        import ldc.llvmasm;
        __asm("","~{memory}");
    }
}

import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;

// `f` is the function I wish to benchmark. It's an
// implementation of memcpy in D.
Duration benchmark(T, alias f)(const T* src, T* dst)
{
    enum iterations = 10_000_000;
    Duration result;
    auto sw = StopWatch(AutoStart.yes);

    sw.reset();
    foreach (_; 0 .. iterations)
    {
        f(src, dst);
        use(dst);
        clobber();
    }
    result = sw.peek();

    return result;
}

This seems to work, but I don't know that I've implemented it properly; especially the `use` function.  How would you write this to achieve a real-world optimized measurement?  What's the equivalent of...

static void escape(void *p) {
  asm volatile("" : : "g"(p) : "memory");
}

... in LDC inline assembly?

Thanks,
Mike
June 14, 2018
On Thursday, 14 June 2018 at 03:39:39 UTC, Mike Franklin wrote:
>
> Using the information at https://stackoverflow.com/questions/40122141/preventing-compiler-optimizations-while-benchmarking, I've created this:

Have you read this too?
https://llvm.org/docs/LangRef.html#inline-assembler-expressions

> This seems to work, but I don't know that I've implemented it properly; especially the `use` function.  How would you write this to achieve a real-world optimized measurement?  What's the equivalent of...
>
> static void escape(void *p) {
>   asm volatile("" : : "g"(p) : "memory");
> }

Your `use` function may be correct; I'm not 100% sure. The `escape` function you ask about clobbers _all_ memory (not only the memory accessible through `p`), so the LDC equivalent becomes:

void escape(void* p)
{
    import ldc.llvmasm;
    __asm("", "r,~{memory}", p); // added the memory clobber here
}
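
As a rough sketch of how `escape` might slot into the benchmark loop from the original post, consider the small program below. It is illustrative only: `naiveCopy` is a hypothetical stand-in for the memcpy under test, the iteration count is arbitrary, and on compilers other than LDC `escape` is an empty function that does nothing to restrain the optimizer.

```d
import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writeln;

// Tell the optimizer that `p` is read here and that any memory may have changed.
void escape(void* p)
{
    version (LDC)
    {
        import ldc.llvmasm : __asm;
        __asm("", "r,~{memory}", p);
    }
    // Assumption: no fallback is provided; on other compilers this is a no-op.
}

// Hypothetical stand-in for the function being measured.
void naiveCopy(T)(const T* src, T* dst)
{
    *dst = *src;
}

Duration benchmark(T, alias f)(const T* src, T* dst)
{
    enum iterations = 1_000_000; // arbitrary, chosen for illustration
    auto sw = StopWatch(AutoStart.yes);
    foreach (_; 0 .. iterations)
    {
        f(src, dst);
        escape(dst); // one call covers both `use(dst)` and `clobber()`
    }
    return sw.peek();
}

void main()
{
    long[16] a = 42; // every element initialized to 42
    long[16] b;
    auto elapsed = benchmark!(long[16], naiveCopy!(long[16]))(&a, &b);
    assert(b == a); // the copy really happened
    writeln(elapsed);
}
```

Because the memory clobber is folded into `escape`, the separate `clobber()` call from the original loop is no longer needed in this sketch. Whether that matches what a real workload sees is a separate question.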

-Johan



June 15, 2018
Hi Mike,

On 14 Jun 2018, at 4:39, Mike Franklin via digitalmars-d-ldc wrote:
> What's the equivalent of...
>
> static void escape(void *p) {
>   asm volatile("" : : "g"(p) : "memory");
> }
>
> ... in LDC inline assembly?

As you have probably found out already, LLVM-flavour inline assembly is somewhat sparsely documented. However, Clang supports GCC-style inline assembly, so if you have a piece of (GC)C code that does what you want, you can use its LLVM IR output as a guide.

For example, this is the relevant code generated for `escape`, obtained using `clang -emit-llvm -S` (on Apple clang-900.0.39.2):

  call void asm sideeffect "", "imr,~{memory},~{dirflag},~{fpsr},~{flags}"(i8* %0)

Because the constraint string is generated programmatically, though, it might not always be as concise as possible, and sometimes there is more than one supported syntax for the same concept (like `g` and `imr`).
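
Carried over to LDC, that constraint string might be used as in the sketch below. This rests on assumptions: that LDC passes the string to LLVM unchanged, and that the target is x86-64, since the `~{dirflag},~{fpsr},~{flags}` clobbers are x86-specific. The name `escapeLikeClang` is made up for this example, and the shorter string in the `else` branch is an assumed equivalent, not something Clang produced.

```d
import std.stdio : writeln;

// Hypothetical name; mirrors the constraint string from the Clang IR above.
void escapeLikeClang(void* p)
{
    version (LDC)
    {
        import ldc.llvmasm : __asm;
        version (X86_64)
        {
            // Copied verbatim from the Clang output.
            __asm("", "imr,~{memory},~{dirflag},~{fpsr},~{flags}", p);
        }
        else
        {
            // Assumed shorter spelling for other targets.
            __asm("", "r,~{memory}", p);
        }
    }
}

void main()
{
    int x = 1;
    escapeLikeClang(&x);
    assert(x == 1); // the asm body is empty, so `x` is not modified
    writeln("ok");
}
```

Comparing the IR that LDC emits for this function (for example via `ldc2 -output-ll`) with the Clang line above is one way to check that the two really match.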

If you wanted to come up with a well-researched proposal for druntime intrinsics to do these things (e.g. how would escape/use interact with CSE on strongly pure functions), this would be a very valuable contribution to the state of benchmarking in D.

 — David