gdc or ldc for faster programs? (page 4)


January 27, 2022

Re: gdc or ldc for faster programs?

Posted by Johan Engelen
in reply to Ali Çehreli


On Thursday, 27 January 2022 at 16:46:59 UTC, Ali Çehreli wrote:
>
> What I know is that weak symbols can be overridden by strong symbols during linking. Which means, if a function body is inlined which also has a weak symbol, some part of the program may be using the inlined definition and some other parts may be using the overridden definition. Thanks to separate compilation, they need not match hence the violation of the one-definition rule (ODR).

But the language requires ODR, so we can emit templates as weak_odr, telling the optimizer and linker that the symbols should be merged _and_ that ODR can be assumed to hold (i.e. inlining is OK).
The onus of honouring ODR is on the user - not the compiler - because we allow the user to do separate compilation. Some more detailed explanation and example:
https://stackoverflow.com/questions/44335046/how-does-the-linker-handle-identical-template-instantiations-across-translation/44346057

-Johan

January 27, 2022

Re: gdc or ldc for faster programs?

Posted by Siarhei Siamashka
in reply to Johan Engelen


On Thursday, 27 January 2022 at 18:12:18 UTC, Johan Engelen wrote:
> But the language requires ODR, so we can emit templates as weak_odr, telling the optimizer and linker that the symbols should be merged _and_ that ODR can be assumed to hold (i.e. inlining is OK).

Thanks! This was also my impression. But the problem is that Iain Buclaw seems to disagree with us. He claims that template functions must be overridable by global functions and this is supposed to inhibit template functions inlining. Is there any independent source to back up your or Iain's claim?

> The onus of honouring ODR is on the user - not the compiler - because we allow the user to do separate compilation.

My own limited experiments with various code snippets convinced me that D compilers actually try their best to prevent ODR violation, so it isn't like users can easily hurt themselves: https://forum.dlang.org/thread/cstjhjvmmibonbajwbbl@forum.dlang.org

Also module names are added as a part of function names mangling. Having an accidental clash of symbol names shouldn't be very likely in a valid D project. Though I'm not absolutely sure whether this provides a sufficient safety net.
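A minimal sketch of the mangling point, assuming a hypothetical module `demo` and a hypothetical template `twice` (neither is from the thread). It only shows that the module name is part of the mangled symbol, so identical instantiations in different object files get the same name while same-named templates in other modules do not:

```d
module demo;

import std.algorithm.searching : startsWith;
import std.stdio : writeln;

// Hypothetical template function, used purely for illustration.
T twice(T)(T x)
{
    return x + x;
}

void main()
{
    // The mangled name encodes the module, the template name and its arguments.
    // Every object file that instantiates demo.twice!int emits this same symbol,
    // which is what lets the linker merge the copies.
    enum mangled = twice!int.mangleof;
    writeln(mangled);

    // D mangling prefixes the qualified name with "_D" followed by
    // length-prefixed identifiers, so module `demo` appears as "4demo".
    assert(mangled.startsWith("_D4demo"));
}
```

A template with the same name in another module would mangle differently, so an accidental clash needs either `pragma(mangle)` or `extern(C)`.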

January 28, 2022

Re: gdc or ldc for faster programs?

Posted by Iain Buclaw
in reply to Siarhei Siamashka


On Thursday, 27 January 2022 at 20:28:40 UTC, Siarhei Siamashka wrote:
> On Thursday, 27 January 2022 at 18:12:18 UTC, Johan Engelen wrote:
>> But the language requires ODR, so we can emit templates as weak_odr, telling the optimizer and linker that the symbols should be merged _and_ that ODR can be assumed to hold (i.e. inlining is OK).
>
> Thanks! This was also my impression. But the problem is that Iain Buclaw seems to disagree with us. He claims that template functions must be overridable by global functions and this is supposed to inhibit template functions inlining. Is there any independent source to back up your or Iain's claim?
>

For example, druntime depends on this behaviour.

Template: https://github.com/dlang/druntime/blob/a0ad8c42c15942faeeafb016e81a360113ae1b6b/src/rt/config.d#L46-L58

Regular symbol: https://github.com/dlang/druntime/blob/a17bb23b418405e1ce8e4a317651039758013f39/test/config/src/test19433.d#L1

If we can rely on instantiated symbols to not violate ODR, then you would be able to put symbols in the .link-once section.  However all duplicates must also be in the .link-once section, else you'll get duplicate definition errors.

January 28, 2022

Re: gdc or ldc for faster programs?

Posted by Siarhei Siamashka
in reply to Iain Buclaw


On Friday, 28 January 2022 at 18:02:27 UTC, Iain Buclaw wrote:

> For example, druntime depends on this behaviour.
>
> Template: https://github.com/dlang/druntime/blob/a0ad8c42c15942faeeafb016e81a360113ae1b6b/src/rt/config.d#L46-L58

Ouch. From where I stand, this looks like some really ugly hack abusing both the template keyword and mangle pragma. Presumably intended to implement this part of the spec: https://dlang.org/library/rt/config.html

Moreover, these are even global variables rather than functions. Wouldn't it make more sense to use a special "weak" attribute for this particular use case? I see that there was a related discussion here: https://forum.dlang.org/post/rgmp5d$198g$1@digitalmars.com
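For reference, a minimal sketch of the user-side override that the rt.config documentation describes. The option value `gcopt=initReserve:8` is only an illustrative choice, and whether this links depends on the compiler (dmd and ldc are assumed here):

```d
// Regular (non-template) extern(C) symbols in the user program. At link time
// they take precedence over the defaults that druntime emits from rt/config.d.
extern(C) __gshared bool rt_cmdline_enabled = false;   // ignore --DRT-... arguments
extern(C) __gshared string[] rt_options = ["gcopt=initReserve:8"];

void main()
{
    // Nothing to do here: druntime reads the symbols above during startup.
}
```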

> Regular symbol: https://github.com/dlang/druntime/blob/a17bb23b418405e1ce8e4a317651039758013f39/test/config/src/test19433.d#L1
>
> If we can rely on instantiated symbols to not violate ODR, then you would be able to put symbols in the .link-once section. However all duplicates must also be in the .link-once section, else you'll get duplicate definition errors.

Duplicate definition errors are surely better than something fishy silently happening under the hood. They can be solved when/if we encounter them. That said, I can confirm that GDC 10 indeed fails with a "multiple definition of 'rt_cmdline_enabled'" linker error when trying to compile:

extern(C) __gshared bool rt_cmdline_enabled = false;
void main() { }

But can't GDC just use something like this in rt/config.d to solve the problem?

version(GNU) {
    import gcc.attribute;
    pragma(mangle, "rt_envvars_enabled") @attribute("weak") __gshared bool rt_envvars_enabled_ = false;
    pragma(mangle, "rt_cmdline_enabled") @attribute("weak") __gshared bool rt_cmdline_enabled_ = true;
    pragma(mangle, "rt_options") @attribute("weak") __gshared string[] rt_options_ = [];
    bool rt_envvars_enabled()() { return rt_envvars_enabled_; }
    bool rt_cmdline_enabled()() { return rt_cmdline_enabled_; }
    string[] rt_options()() { return rt_options_; }
} else {
    // put each variable in its own COMDAT by making them template instances
    template rt_envvars_enabled()
    {
        pragma(mangle, "rt_envvars_enabled") __gshared bool rt_envvars_enabled = false;
    }
    template rt_cmdline_enabled()
    {
        pragma(mangle, "rt_cmdline_enabled") __gshared bool rt_cmdline_enabled = true;
    }
    template rt_options()
    {
        pragma(mangle, "rt_options") __gshared string[] rt_options = [];
    }
}

January 29, 2022

Re: gdc or ldc for faster programs?

Posted by Salih Dincer
in reply to Ali Çehreli


On Wednesday, 26 January 2022 at 18:00:41 UTC, Ali Çehreli wrote:

> For completeness (and noise :/) here is the final version of the program:

Could you also try the following code with the same configurations?

struct LongScale {
  struct ShortStack {
    short[] stack;
    size_t index;

    @property back() {
      return this.stack[0];
    }

    @property push(short data) {
      this.stack ~= data;
      this.index++;
    }

    @property pop() {
     return this.stack[--this.index];
    }
  }

  ShortStack stack;

  this(long i) {
    long s, t = i;
    for(long e = 3; e <= 18; e += 3) {
      s = 10^^e;
      stack.push = cast(short)((t % s) / (s/1000L));
      t -= t % s;
    }
    stack.push = cast(short)(t / s);
  }

  string toString() {
    string[] scale = [" zero", "thousand", "million",
    "billion", "trillion", "quadrillion", "quintillion"];
    string r;
    for(long e = 6; e > 0; e--) {
      auto t = stack.pop;
      r ~= t > 1 ? " " ~to!string(t) : t ? " one" : "";
      r ~= t ? " " ~scale[e] : "";
    }
    r ~= stack.back ? " " ~to!string(stack.back) : "";
    return r.length ? r : scale[0];
  }
}

import std.conv, std.stdio;
void main()
{
  long[] inputs = [ 741, 1_500, 2_001,
  5_005, 1_250_000, 3_000_042, 10_000_000,
  1_000_000, 2_000_000, 100_000, 200_000,
  10_000, 20_000, 1_000, 2_000, 74, 7, 0,
  1_999_999_999_999];

  foreach(long i; inputs) {
    auto OUT = LongScale(i);
    auto STR = OUT.toString[1..$];
    writefln!"%s"(STR);
  }
}

January 29, 2022

Re: gdc or ldc for faster programs?

Posted by Ali Çehreli
in reply to Salih Dincer


On 1/29/22 10:04, Salih Dincer wrote:

> Could you also try the following code with the same configurations?

The program you posted with 2 million random values:

ldc 1.9 seconds
gdc 2.3 seconds
dmd 2.8 seconds

I understand such short tests are not definitive but to have a rough idea between two programs, the last version of my program that used sprintf with 2 million numbers takes less time:

ldc 0.4 seconds
gdc 0.5 seconds
dmd 0.5 seconds

(And now we know gdc can go about 7% faster with additional command line switches.)

Ali
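A rough sketch of how such a timing run could be set up, for anyone who wants to repeat it. This is not the harness used for the numbers above: the helper names `makeInputs` and `convertAll`, the fixed seed, and the use of `to!string` as the workload are all assumptions made for illustration.

```d
import std.conv : to;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.random : Random, uniform;
import std.stdio : writefln;

// Hypothetical helper: `count` pseudo-random values from a fixed seed,
// so that every compiler is measured on identical input.
long[] makeInputs(size_t count, uint seed)
{
    auto rng = Random(seed);
    auto inputs = new long[](count);
    foreach (ref v; inputs)
        v = uniform(0L, 1_000_000_000_000L, rng);
    return inputs;
}

// Hypothetical workload: convert every value and sum the lengths, so the
// optimizer cannot discard the conversions as unused.
size_t convertAll(const long[] inputs)
{
    size_t total;
    foreach (v; inputs)
        total += v.to!string.length;
    return total;
}

void main()
{
    auto inputs = makeInputs(2_000_000, 42);

    auto sw = StopWatch(AutoStart.yes);
    immutable total = convertAll(inputs);
    sw.stop();

    writefln!"%s values in %s ms (checksum %s)"(
        inputs.length, sw.peek.total!"msecs", total);
}
```

Building the same file with each compiler at its own optimization settings and running it a few times gives numbers comparable in spirit to the ones above.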

January 29, 2022

Re: gdc or ldc for faster programs?

Posted by max haughton
in reply to Ali Çehreli


On Saturday, 29 January 2022 at 18:28:06 UTC, Ali Çehreli wrote:
> On 1/29/22 10:04, Salih Dincer wrote:
>
> > Could you also try the following code with the same
> configurations?
>
> The program you posted with 2 million random values:
>
> ldc 1.9 seconds
> gdc 2.3 seconds
> dmd 2.8 seconds
>
> I understand such short tests are not definitive but to have a rough idea between two programs, the last version of my program that used sprintf with 2 million numbers takes less time:
>
> ldc 0.4 seconds
> gdc 0.5 seconds
> dmd 0.5 seconds
>
> (And now we know gdc can go about 7% faster with additional command line switches.)
>
> Ali

You need to be compiling with PGO to test the compiler's optimizer to the maximum. Without PGO, compilers have to assume a fairly conservative flow through the code, which means things like inlining and register allocation are effectively flying blind.

January 29, 2022

Re: gdc or ldc for faster programs?

Posted by Siarhei Siamashka
in reply to Ali Çehreli


On Saturday, 29 January 2022 at 18:28:06 UTC, Ali Çehreli wrote:

> (And now we know gdc can go about 7% faster with additional command line switches.)

No, we don't know this yet ;-) That's just what I said and I may be bullshitting. Or the configuration of my computer is significantly different from yours and the exact speedup/slowdown number may be different. So please verify it yourself. You can edit your dub.json file to add the following line to it:

"dflags-gdc": ["-fno-weak-templates"],

Then rebuild your spellout test program with gdc (just like you did before), run benchmarks and report results. The '-fno-weak-templates' option should show up in the gdc invocation command line.

January 30, 2022

Re: gdc or ldc for faster programs?

Posted by Salih Dincer
in reply to Ali Çehreli


On Saturday, 29 January 2022 at 18:28:06 UTC, Ali Çehreli wrote:
> On 1/29/22 10:04, Salih Dincer wrote:
>
> > Could you also try the following
> > code with the same configurations?
>
> The program you posted with 2 million random values:
>
> ldc 1.9 seconds
> gdc 2.3 seconds
> dmd 2.8 seconds
>
> I understand such short tests are not definitive but to have a rough idea between two programs, the last version of my program that used sprintf with 2 million numbers takes less time...
>

sprintf() might be really fast, but your algorithm is definitely 2.5x faster than mine! (with LDC) I couldn't compile with GDC. Theoretically, I might have lost the challenge :)

With love and respect...

January 31, 2022

Re: gdc or ldc for faster programs?

Posted by Patrick Schluter
in reply to Elronnd


On Tuesday, 25 January 2022 at 22:41:35 UTC, Elronnd wrote:
> On Tuesday, 25 January 2022 at 22:33:37 UTC, H. S. Teoh wrote:
>> interesting because idivl is known to be one of the slower instructions, but gdc nevertheless considered it not worthwhile to replace it, whereas ldc seems obsessed about avoid idivl at all costs.
>
> Interesting indeed.  Two remarks:
>
> 1. Actual performance cost of div depends a lot on hardware.  IIRC on my old intel laptop it's like 40-60 cycles; on my newer amd chip it's more like 20; on my mac it's ~10.  GCC may be assuming newer hardware than llvm.  Could be worth popping on a -march=native -mtune=native.  Also could depend on how many ports can do divs; i.e. how many of them you can have running at a time.
>
> 2. LLVM is more aggressive wrt certain optimizations than gcc, by default.  Though I don't know how relevant that is at -O3.

-O3 often chooses longer code and unrolls more aggressively, inducing higher miss rates in the instruction caches.
-O2 can beat -O3 in some cases when code size is important.
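As background for the idivl remark quoted above, here is a sketch of one common way compilers avoid a divide instruction when the divisor is a compile-time constant: multiply by a precomputed reciprocal and shift. Whether this is exactly what ldc emitted in the program discussed is an assumption, and the helper `div1000` is hypothetical.

```d
import std.stdio : writeln;

// Computes x / 1000 for any 32-bit unsigned x without a divide instruction.
uint div1000(uint x)
{
    // 274_877_907 is 2^^38 / 1000 rounded up. The operand is widened to
    // 64 bits so the product cannot overflow before the shift.
    return cast(uint)((cast(ulong) x * 274_877_907UL) >> 38);
}

void main()
{
    foreach (uint x; [0u, 999u, 1_000u, 123_456_789u, uint.max])
    {
        // Check the multiply-and-shift form against the plain division.
        assert(div1000(x) == x / 1000);
        writeln(x, " / 1000 = ", div1000(x));
    }
}
```

The trade-off is a few extra instructions in exchange for avoiding the divider, which matches the point that the pay-off depends on how slow division is on the target hardware.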
