gdc or ldc for faster programs? (page 3)

January 26, 2022

Re: gdc or ldc for faster programs?

Posted by Johan
in reply to Iain Buclaw

On Wednesday, 26 January 2022 at 11:25:47 UTC, Iain Buclaw wrote:
> On Wednesday, 26 January 2022 at 04:28:25 UTC, Ali Çehreli wrote:
>> On 1/25/22 16:15, Johan wrote:
>> > On Tuesday, 25 January 2022 at 19:52:17 UTC, Ali Çehreli
>> wrote:
>> >>
>> >> I am using compilers installed by Manjaro Linux's package
>> system:
>> >>
>> >> ldc: LDC - the LLVM D compiler (1.28.0):
>> >>   based on DMD v2.098.0 and LLVM 13.0.0
>> >>
>> >> gdc: dc (GCC) 11.1.0
>> >>
>> >> dmd: DMD64 D Compiler v2.098.1
>> >
>> > What phobos version is gdc using?
>>
>> Oh! Good question. Unfortunately, I don't think Phobos modules contain that information. The following line outputs 2076L:
>>
>> pragma(msg, __VERSION__);
>>
>> So, I guess I've been comparing apples to oranges but in this case an older gdc is doing pretty well.
>>
>
> Doubt it.  Functions such as to(), map(), etc. have pretty much remained unchanged for the last 6-7 years.

The stdlib makes a huge difference in performance.
Ali's program uses string manipulation, GC, ... much more than to() and map().

Quick test on my M1 macbook:
LDC1.27, arm64 binary (native): ~0.83s
LDC1.21, x86_64 binary (rosetta, not native to CPU instruction set): ~0.75s
Couldn't test with LDC 1.6 (dlang2.076), because it is too old and not running on M1/Monterey (?).

-Johan

January 26, 2022

Re: gdc or ldc for faster programs?

Posted by Ali Çehreli
in reply to Johan

On 1/26/22 04:06, Johan wrote:

> The stdlib makes a huge difference in performance.
> Ali's program uses string manipulation,

Yes, on the surface, I thought my inner loop had just / and % but of course there is that formattedWrite. I will change the code to use sprintf into a static buffer (instead of the current Appender).

> GC

That shouldn't affect it because there are just about 8 allocations to be shared in the Appender.

> , ... much more than to()

Not in the 2 million loop.

> and
> map().

Only in the initialization.

> Quick test on my M1 macbook:
> LDC1.27, arm64 binary (native): ~0.83s
> LDC1.21, x86_64 binary (rosetta, not native to CPU instruction set): ~0.75s

I think std.format gained abilities over the years. I will report back.
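The change described above (formatting with C's stdio into a fixed buffer instead of going through std.format and an Appender) can be sketched as follows. This is a minimal illustration rather than the code from the benchmark: `viaAppender` and `viaSnprintf` are hypothetical names, and `snprintf` is used instead of `sprintf` so that the buffer size is checked.

```d
import core.stdc.stdio : snprintf;
import std.array : appender;
import std.format : formattedWrite;

// std.format version: the result is built in a GC-backed Appender.
string viaAppender(int quotient, string word) {
    auto app = appender!string();
    app.formattedWrite!" %s %s"(quotient, word);
    return app.data;
}

// C stdio version: formats into a caller-provided buffer.
// 'word' must be zero-terminated (D string literals are).
const(char)[] viaSnprintf(char[] buffer, int quotient, const(char)* word) {
    const n = snprintf(buffer.ptr, buffer.length, " %d %s", quotient, word);
    assert(n >= 0 && n < buffer.length, "buffer too small");
    return buffer[0 .. n];
}

void main() {
    char[64] buffer;
    assert(viaAppender(42, "thousand") == " 42 thousand");
    assert(viaSnprintf(buffer[], 42, "thousand".ptr) == " 42 thousand");
}
```

The slice returned by `viaSnprintf` points into the caller's buffer, so it has to be copied if it must outlive the next call.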

Ali

January 26, 2022

Re: gdc or ldc for faster programs?

Posted by Steven Schveighoffer
in reply to Johan

On 1/26/22 7:06 AM, Johan wrote:

> Couldn't test with LDC 1.6 (dlang2.076), because it is too old and not running on M1/Monterey (?).

There was a range of macOS dmd binaries that stopped working after a certain macOS release. I think it had to do with the TLS hack: Apple changed something, so it no longer worked.

-Steve

January 26, 2022

Re: gdc or ldc for faster programs?

Posted by Ali Çehreli
in reply to Ali Çehreli

ldc shines with sprintf. And dmd surprises by being a little bit faster than gdc! (?)

ldc (2.098.0): ~6.2 seconds
dmd (2.098.1): ~7.4 seconds
gdc (2.076.?): ~7.5 seconds

Again, here are the versions of the compilers that are readily available on my system:

> ldc: LDC - the LLVM D compiler (1.28.0):
>    based on DMD v2.098.0 and LLVM 13.0.0
>
> gdc: dc (GCC) 11.1.0 (Uses dmd 2.076 front end)
>
> dmd: DMD64 D Compiler v2.098.1

They were compiled with

  dub run --compiler=<COMPILER> --build=release-nobounds --verbose

where <COMPILER> was ldc, dmd, or gdc.

I replaced formattedWrite in the code with sprintf. For example, the inner loop became

  foreach (divider; dividers!T.retro) {
    const quotient = number / divider.value;

    if (quotient) {
      output += sprintf(output, fmt!T.ptr, quotient, divider.word.ptr);
    }

    number %= divider.value;
  }

For completeness (and noise :/) here is the final version of the program:

module spellout.spellout;

// This program was written as a programming kata to spell out
// certain parts of integers as in "1 million 2 thousand
// 42". Note that this way of spelling-out numbers is not
// grammatically correct in English.

// Returns a string that contains the partly spelled-out version
// of the parameter.
//
// You must copy the returned string when needed as this function
// uses the same internal buffer for all invocations of the same
// template instance.
auto spellOut(T)(in T number_) {
  import std.string : strip;
  import std.traits : Unqual;
  import std.meta : AliasSeq;
  import core.stdc.stdio : sprintf;

  enum longestString =
    "negative 9 quintillion 223 quadrillion 372 trillion" ~
    " 36 billion 854 million 775 thousand 808";

  static char[longestString.length + 1] buffer;
  auto output = buffer.ptr;

  // We treat these specially because the algorithm below does
  // 'number = -number' and calls the same implementation
  // function. The trouble is, for example, -int.min is still a
  // negative number.
  alias problematics = AliasSeq!(
    byte, "negative 128",
    short, "negative 32 thousand 768",
    int, "negative 2 billion 147 million 483 thousand 648",
    long, longestString);

  static assert((problematics.length % 2) == 0);

  static foreach (i, P; problematics) {
    static if (i % 2) {
      // This is a string; skip

    } else {
      // This is a problematic type
      static if (is (T == P)) {
        // Our T happens to be this problematic type
        if (number_ == T.min) {
          // and we are dealing with a problematic value
          output += sprintf(output, problematics[i + 1].ptr);
          return buffer[0 .. (output - buffer.ptr)];
        }
      }
    }
  }

  auto number = cast(Unqual!T)number_; // Thanks 'in'! :p

  if (number == 0) {
    output += sprintf(output, "zero");

  } else {
    if (number < 0) {
      output += sprintf(output, "negative");
      static if (T.sizeof < int.sizeof) {
        // Being careful with implicit conversions. (See the dmd
        // command line switch -preview=intpromote)
        number = cast(T)(-cast(int)number);

      } else {
        number = -number;
      }
    }

    spellOutImpl(number, output);
  }

  return buffer[0 .. (output - buffer.ptr)].strip;
}

unittest {
  assert(1_001_500.spellOut == "1 million 1 thousand 500");
  assert((-1_001_500).spellOut ==
         "negative 1 million 1 thousand 500");
  assert(1_002_500.spellOut == "1 million 2 thousand 500");
}

template fmt(T) {
  static if (is (T == long)||
             is (T == ulong)) {
    static fmt = " %lld %s";

  } else {
    static fmt = " %u %s";
  }
}

import std.format : format;

void spellOutImpl(T)(T number, ref char * output)
in (number > 0, format!"Invalid number: %s"(number)) {
  import std.range : retro;
  import core.stdc.stdio : sprintf;

  foreach (divider; dividers!T.retro) {
    const quotient = number / divider.value;

    if (quotient) {
      output += sprintf(output, fmt!T.ptr, quotient, divider.word.ptr);
    }

    number %= divider.value;
  }
}

struct Divider(T) {
  T value;        // 1_000, 1_000_000, etc.
  string word;    // "thousand", etc
}

// Returns the words related with the provided size of an
// integral type. The parameter is number of bytes
// e.g. int.sizeof
auto words(size_t typeSize) {
  // This need not be recursive at all but it was fun using
  // recursion.
  final switch (typeSize) {
  case 1: return [ "" ];
  case 2: return words(1) ~ [ "thousand" ];
  case 4: return words(2) ~ [ "million", "billion" ];
  case 8: return words(4) ~ [ "trillion", "quadrillion", "quintillion" ];
  }
}

unittest {
  // These are relevant words for 'int' and 'uint' values:
  assert(words(4) == [ "", "thousand", "million", "billion" ]);
}

// Returns a Divider!T array associated with T
auto dividers(T)() {
  import std.range : array, enumerate;
  import std.algorithm : map;

  static const(Divider!T[]) result =
    words(T.sizeof)
    .enumerate!T
    .map!(t => Divider!T(cast(T)(10^^(t.index * 3)), t.value))
    .array;

  return result;
}

unittest {
  // Test a few entries
  assert(dividers!int[1] == Divider!int(1_000, "thousand"));
  assert(dividers!ulong[3] == Divider!ulong(1_000_000_000, "billion"));
}

void main() {
  version (test) {
    return;
  }

  import std.meta : AliasSeq;
  import std.stdio : writefln;
  import std.random : Random, uniform;
  import std.conv : to;

  static foreach (T; AliasSeq!(byte, ubyte, short, ushort,
                               int, uint, long, ulong)) {{
      // A few numbers for each type
      report(T.min);
      report((T.max / 4).to!T);  // Overcome int promotion for
                                 // shorter types because I want
                                 // to test with the exact type
                                 // e.g. for byte.
      report(T.max);
    }}

  enum count = 20_000_000;
  writefln!"Testing with %,s random numbers"(spellOut(count));

  // Use the same seed to be fair between compilations
  enum seed = 0;
  auto rnd = Random(seed);

  ulong totalLength;
  foreach (i; 0 .. count) {
    const number = uniform(int.min, int.max, rnd);
    const result = spellOut(number);
    totalLength += result.length;
  }

  writefln!("A meaningless number to prevent the compiler from" ~
            " removing the entire loop: %,s")(totalLength);
}

void report(T)(T number) {
  import std.stdio : writefln;
  writefln!"  %6s % ,s: %s"(T.stringof, number, spellOut(number));
}

Ali

January 26, 2022

Re: gdc or ldc for faster programs?

Posted by Iain Buclaw
in reply to forkit

On Wednesday, 26 January 2022 at 11:43:39 UTC, forkit wrote:
> On Wednesday, 26 January 2022 at 11:25:47 UTC, Iain Buclaw wrote:
>>
>> Whenever I've watched talks/demos where benchmarks were the central topic, GDC has always blown LDC out the water when it comes to matters of math.
>> ..
>
> https://dlang.org/blog/2020/05/14/lomutos-comeback/

Andrei forgot to do a follow-up where one weird trick makes the gdc-compiled Lomuto the same speed as C++ (and faster than ldc).

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96429

January 26, 2022

Re: gdc or ldc for faster programs?

Posted by Siarhei Siamashka
in reply to Ali Çehreli

On Wednesday, 26 January 2022 at 18:00:41 UTC, Ali Çehreli wrote:
> ldc shines with sprintf. And dmd surprises by being a little bit faster than gdc! (?)
>
> ldc (2.098.0): ~6.2 seconds
> dmd (2.098.1): ~7.4 seconds
> gdc (2.076.?): ~7.5 seconds
>
> Again, here are the versions of the compilers that are readily available on my system:
>
> > ldc: LDC - the LLVM D compiler (1.28.0):
> >    based on DMD v2.098.0 and LLVM 13.0.0
> >
> > gdc: dc (GCC) 11.1.0 (Uses dmd 2.076 front end)

It's not DMD doing a good job here, but GDC11 shooting itself in the foot by requiring additional esoteric command line options if you really want to produce optimized binaries. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102765 for more details.

You can try re-running your benchmark after adding '-flto' or '-fno-weak-templates' to the GDC command line. I see a ~7% speedup for your code on my computer.

January 26, 2022

Re: gdc or ldc for faster programs?

Posted by Iain Buclaw
in reply to Siarhei Siamashka

On Wednesday, 26 January 2022 at 18:39:07 UTC, Siarhei Siamashka wrote:
>
> It's not DMD doing a good job here, but GDC11 shooting itself in the foot by requiring additional esoteric command line options if you really want to produce optimized binaries.

The D language shot itself in the foot by requiring templates to have weak semantics.

If DMD and LDC inline weak functions, that's their bug.

January 26, 2022

Re: gdc or ldc for faster programs?

Posted by Siarhei Siamashka
in reply to Iain Buclaw

On Wednesday, 26 January 2022 at 18:41:51 UTC, Iain Buclaw wrote:
> The D language shot itself in the foot by requiring templates to have weak semantics.
>
> If DMD and LDC inline weak functions, that's their bug.

As I already mentioned in the bugzilla, it would be really useful to see a practical example of DMD and LDC running into trouble because of mishandling weak templates. I was never able to find anything about "requiring templates to have weak semantics" anywhere in the Dlang documentation or on the Internet. Asking for clarification in this forum yielded no results either. Maybe I'm missing something obvious when reading the https://dlang.org/spec/template.html page?

I have no doubt that you have your own opinion about how this stuff is supposed to work, but I have no crystal ball and don't know what's happening in your head.

January 27, 2022

Re: gdc or ldc for faster programs?

Posted by Ali Çehreli
in reply to Siarhei Siamashka

On 1/26/22 11:07, Siarhei Siamashka wrote:
> On Wednesday, 26 January 2022 at 18:41:51 UTC, Iain Buclaw wrote:
>> The D language shot itself in the foot by requiring templates to have
>> weak semantics.
>>
>> If DMD and LDC inline weak functions, that's their bug.
>
> As I already mentioned in the bugzilla, it would be really useful to see
> a practical example of DMD and LDC running into troubles because of
> mishandling weak templates.

I am not experienced enough to answer but the way I understand weak symbols, it is possible to run into trouble but it will probably never happen. When it happens, I suspect people can find workarounds like disabling inlining.

> I was never able to find anything about
> "requiring templates to have weak semantics" anywhere in the Dlang
> documentation or on the Internet.

The truth is some part of D's spec is the implementation. When I compile the following program (with dmd)

void foo(T)() {}

void main() {
  foo!int();
}

I see that template instantiations are linked through weak symbols:

$ nm deneme | grep foo
[...]
0000000000021380 W _D6deneme__T3fooTiZQhFNaNbNiNfZv

What I know is that weak symbols can be overridden by strong symbols during linking. This means that if a function with a weak symbol also has its body inlined, some parts of the program may use the inlined definition while other parts use the overriding definition. Thanks to separate compilation, the two need not match, hence the violation of the one-definition rule (ODR).
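
To find the symbol name to look for in the nm output, the compiler can be asked for it directly. The following is a minimal sketch that assumes the module is named deneme, as in the nm listing above. It only prints the mangled name and checks its general shape; it does not show how any particular compiler links the symbol.

```d
module deneme;

// Same template as in the example above.
void foo(T)() {}

void main() {
    import std.algorithm.searching : canFind, startsWith;
    import std.stdio : writeln;

    // Calling the instance makes sure it is emitted into the object file.
    foo!int();

    // .mangleof yields the linker-level name of this template instance.
    enum name = foo!int.mangleof;
    writeln(name);

    // extern(D) symbols are mangled with a "_D" prefix, and the
    // length-prefixed identifier "3foo" is part of the name.
    assert(name.startsWith("_D"));
    assert(name.canFind("3foo"));
}
```

The printed name can then be passed to grep on the nm output to see whether the symbol is marked W (weak) by the compiler in use.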

Ali

January 27, 2022

Re: gdc or ldc for faster programs?

Posted by H. S. Teoh
in reply to Ali Çehreli

On Thu, Jan 27, 2022 at 08:46:59AM -0800, Ali Çehreli via Digitalmars-d-learn wrote: [...]
> I see that template instantiations are linked through weak symbols:
> 
> $ nm deneme | grep foo
> [...]
> 0000000000021380 W _D6deneme__T3fooTiZQhFNaNbNiNfZv
> 
> What I know is that weak symbols can be overridden by strong symbols during linking.
[...]

Yes, and it also means that only one copy of the symbol will make it into the executable. This is one of the ways we leverage the linker to eliminate (merge) duplicate template instantiations.


T

-- 
Claiming that your operating system is the best in the world because more people use it is like saying McDonalds makes the best food in the world. -- Carl B. Constantine
