gdc or ldc for faster programs?
January 25, 2022
Sorry for being vague and not giving the code here, but a program I wrote that spells out parts of a number (in Turkish), as in "1 milyon 42", runs much faster with gdc.

The program integer-divides the number in a loop to find quotients and appends the corresponding word next to it. One obvious optimization might be to use POSIX div() and friends to get the quotient and the remainder in one shot, but I made myself believe that the compilers already do that. (But still not sure. :o) )
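
For illustration, here is a rough sketch of that combined quotient/remainder computation; the function name is made up for this example. When both operations use the same operands, optimizing compilers generally emit a single division instruction for the pair, which is why an explicit div() call rarely buys anything:

// Illustrative sketch only: quotient and remainder of the same
// division. Optimizers typically compute both with one division.
void splitByDivider(long number, long divider,
                    out long quotient, out long remainder)
{
    quotient  = number / divider;
    remainder = number % divider;
}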

I am not experienced with dub but I used --build=release-nobounds and verified that -O3 is used for both compilers. (I also tried building manually with GNU 'make' with e.g. -O5 and the results were similar.)

For a test run for 2 million numbers:

ldc: ~0.95 seconds
gdc: ~0.79 seconds
dmd: ~1.77 seconds

I am using compilers installed by Manjaro Linux's package system:

ldc: LDC - the LLVM D compiler (1.28.0):
  based on DMD v2.098.0 and LLVM 13.0.0

gdc: gdc (GCC) 11.1.0

dmd: DMD64 D Compiler v2.098.1

I've been mainly a dmd person for various reasons and was under the impression that ldc was the clear winner among the three. What is your experience? Does gdc produce faster programs in general? Would ldc win if I took advantage of e.g. link-time optimizations?

Ali
January 25, 2022
On Tuesday, 25 January 2022 at 19:52:17 UTC, Ali Çehreli wrote:
>
> I am not experienced with dub but I used --build=release-nobounds and verified that -O3 is used for both compilers. (I also tried building manually with GNU 'make' with e.g. -O5 and the results were similar.)

`-O5` does not do anything different than `-O3` for LDC.

> For a test run for 2 million numbers:
>
> ldc: ~0.95 seconds
> gdc: ~0.79 seconds
> dmd: ~1.77 seconds
>
> I am using compilers installed by Manjaro Linux's package system:
>
> ldc: LDC - the LLVM D compiler (1.28.0):
>   based on DMD v2.098.0 and LLVM 13.0.0
>
> gdc: gdc (GCC) 11.1.0
>
> dmd: DMD64 D Compiler v2.098.1
>
> I've been mainly a dmd person for various reasons and was under the impression that ldc was the clear winner among the three. What is your experience? Does gdc produce faster programs in general? Would ldc win if I took advantage of e.g. link-time optimizations?

Tough to say. Of course DMD is not a serious contender, but I believe the difference between GDC and LDC is very small and really in the details, i.e. you'll have to look at assembly to find out the delta.
Have you tried `--enable-cross-module-inlining` with LDC?

-Johan


January 25, 2022
On Tuesday, 25 January 2022 at 19:52:17 UTC, Ali Çehreli wrote:
> ldc: ~0.95 seconds
> gdc: ~0.79 seconds
> dmd: ~1.77 seconds

Not surprising at all: gdc is excellent and underrated in the community.
January 25, 2022
On Tue, Jan 25, 2022 at 11:52:17AM -0800, Ali Çehreli via Digitalmars-d-learn wrote:
> Sorry for being vague and not giving the code here, but a program I wrote that spells out parts of a number (in Turkish), as in "1 milyon 42", runs much faster with gdc.
> 
> The program integer-divides the number in a loop to find quotients and appends the corresponding word next to it. One obvious optimization might be to use POSIX div() and friends to get the quotient and the remainder in one shot, but I made myself believe that the compilers already do that. (But still not sure. :o))

Don't guess at what the compilers are doing; disassemble the binary and see for yourself exactly what the difference is. Use run.dlang.io for a convenient interface that shows you exactly how the compilers translated your code. Or if you're macho, use `objdump -d` and search for _Dmain (or the specific function if you know how it's mangled).
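
(As a rough sketch of that last step: D can print a symbol's mangled name at compile time, which you can then search for in the `objdump -d` output. The function below is only an illustrative stand-in, not code from the program being discussed.)

// Illustrative only: emit the mangled name of a function during
// compilation so it can be located in disassembly output.
ulong spellOutExample(ulong number)
{
    return number / 1_000;
}

pragma(msg, spellOutExample.mangleof);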


> I am not experienced with dub but I used --build=release-nobounds and verified that -O3 is used for both compilers. (I also tried building manually with GNU 'make' with e.g. -O5 and the results were similar.)
> 
> For a test run for 2 million numbers:
> 
> ldc: ~0.95 seconds
> gdc: ~0.79 seconds
> dmd: ~1.77 seconds

For measurements under 1 second, I'm skeptical of the accuracy, because there could be all kinds of background noise, CPU interrupts and stuff that could be skewing the numbers.  What about doing a best-of-3 run with 20 million numbers (expected <20 seconds per run) to see how the numbers look?
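
(For instance, a minimal in-process sketch that keeps the best of three timed runs; work() here is just a made-up stand-in for the benchmark loop:)

// Illustrative sketch: best-of-three timing inside the process to
// reduce the influence of background noise on short measurements.
import std.algorithm : min;
import std.datetime.stopwatch : StopWatch, AutoStart;
import std.stdio : writefln;

void work() { /* the loop being measured */ }

void main()
{
    long best = long.max;
    foreach (run; 0 .. 3)
    {
        auto sw = StopWatch(AutoStart.yes);
        work();
        best = min(best, sw.peek.total!"msecs");
    }
    writefln!"best of 3 runs: %s ms"(best);
}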

Though having said all that, I can say at least that dmd's relatively poor performance seems in line with my previous observations. :-P The difference between ldc and gdc is harder to pinpoint; they each have different optimizers that could work better or worse than the other depending on the specifics of what the program is doing.


[...]
> I've been mainly a dmd person for various reasons and was under the impression that ldc was the clear winner among the three. What is your experience? Does gdc produce faster programs in general? Would ldc win if I took advantage of e.g. link-time optimizations?
[...]

I'm not sure LDC is the clear winner.  I only prefer LDC because LDC's architecture makes it easier for cross-compilation (with GCC/GDC you need to jump through a lot more hoops to get a working cross compiler). GDC is also tied to the GCC release cycle, and tends to be several language versions behind LDC.  Both compilers have excellent optimizers, but they are definitely different, so for some things GDC will beat LDC and for other things LDC will beat GDC. It may depend on the specific optimization flags you use as well.

But these sorts of statements are just generalizations. The best way to find out for sure is to disassemble the executable and see for yourself what the assembly looks like. :-)


T

-- 
Public parking: euphemism for paid parking. -- Flora
January 25, 2022
On Tuesday, 25 January 2022 at 20:04:04 UTC, Adam D Ruppe wrote:
> On Tuesday, 25 January 2022 at 19:52:17 UTC, Ali Çehreli wrote:
>> ldc: ~0.95 seconds
>> gdc: ~0.79 seconds
>> dmd: ~1.77 seconds
>

Maybe you can try --ffast-math on ldc.
January 25, 2022
On 1/25/22 11:52, Ali Çehreli wrote:

> a program I wrote about spelling-out parts of a number

Here is the program as a single module:

module spellout.spellout;

// This program was written as a code kata to spell out
// certain parts of integers as in "1 million 2 thousand
// 42". Note that this way of spelling-out numbers is not
// grammatically correct in English.

// Returns a string that contains the partly spelled-out version
// of the parameter.
//
// You must copy the returned string when needed as this function
// uses the same internal buffer for all invocations of the same
// template instance.
auto spellOut(T)(in T number_) {
  import std.array : Appender;
  import std.string : strip;
  import std.traits : Unqual;
  import std.meta : AliasSeq;

  static Appender!(char[]) result;
  result.clear;

  // We treat these specially because the algorithm below does
  // 'number = -number' and calls the same implementation
  // function. The trouble is, for example, -int.min is still a
  // negative number.
  alias problematics = AliasSeq!(
    byte, "negative 128",
    short, "negative 32 thousand 768",
    int, "negative 2 billion 147 million 483 thousand 648",
    long, "negative 9 quintillion 223 quadrillion 372 trillion" ~
          " 36 billion 854 million 775 thousand 808");

  static assert((problematics.length % 2) == 0);

  static foreach (i, P; problematics) {
    static if (i % 2) {
      // This is a string; skip

    } else {
      // This is a problematic type
      static if (is (T == P)) {
        // Our T happens to be this problematic type
        if (number_ == T.min) {
          // and we are dealing with a problematic value
          result ~= problematics[i + 1];
          return result.data;
        }
      }
    }
  }

  auto number = cast(Unqual!T)number_; // Thanks 'in'! :p

  if (number == 0) {
    result ~= "zero";

  } else {
    if (number < 0) {
      result ~= "negative";
      static if (T.sizeof < int.sizeof) {
        // Being careful with implicit conversions. (See the dmd
        // command line switch -preview=intpromote)
        number = cast(T)(-cast(int)number);

      } else {
        number = -number;
      }
    }

    spellOutImpl(number, result);
  }

  return result.data.strip;
}

unittest {
  assert(1_001_500.spellOut == "1 million 1 thousand 500");
  assert((-1_001_500).spellOut ==
         "negative 1 million 1 thousand 500");
  assert(1_002_500.spellOut == "1 million 2 thousand 500");
}

import std.format : format;
import std.range : isOutputRange;

void spellOutImpl(T, O)(T number, ref O output)
if (isOutputRange!(O, char))
in (number > 0, format!"Invalid number: %s"(number)) {
  import std.range : retro;
  import std.format : formattedWrite;

  foreach (divider; dividers!T.retro) {
    const quotient = number / divider.value;

    if (quotient) {
      output.formattedWrite!" %s %s"(quotient, divider.word);
    }

    number %= divider.value;
  }
}

struct Divider(T) {
  T value;        // 1_000, 1_000_000, etc.
  string word;    // "thousand", etc
}

// Returns the words related with the provided size of an
// integral type. The parameter is number of bytes
// e.g. int.sizeof
auto words(size_t typeSize) {
  // This need not be recursive at all but it was fun using
  // recursion.
  final switch (typeSize) {
  case 1: return [ "" ];
  case 2: return words(1) ~ [ "thousand" ];
  case 4: return words(2) ~ [ "million", "billion" ];
  case 8: return words(4) ~ [ "trillion", "quadrillion", "quintillion" ];
  }
}

unittest {
  // These are relevant words for 'int' and 'uint' values:
  assert(words(4) == [ "", "thousand", "million", "billion" ]);
}

// Returns a Divider!T array associated with T
auto dividers(T)() {
  import std.range : array, enumerate;
  import std.algorithm : map;

  static const(Divider!T[]) result =
    words(T.sizeof)
    .enumerate!T
    .map!(t => Divider!T(cast(T)(10^^(t.index * 3)), t.value))
    .array;

  return result;
}

unittest {
  // Test a few entries
  assert(dividers!int[1] == Divider!int(1_000, "thousand"));
  assert(dividers!ulong[3] == Divider!ulong(1_000_000_000, "billion"));
}

void main() {
  version (test) {
    return;
  }

  import std.meta : AliasSeq;
  import std.stdio : writefln;
  import std.random : Random, uniform;
  import std.conv : to;

  static foreach (T; AliasSeq!(byte, ubyte, short, ushort,
                               int, uint, long, ulong)) {{
      // A few numbers for each type
      report(T.min);
      report((T.max / 4).to!T);  // Overcome int promotion for
                                 // shorter types because I want
                                 // to test with the exact type
                                 // e.g. for byte.
      report(T.max);
    }}

  enum count = 2_000_000;
  writefln!"Testing with %,s random numbers"(spellOut(count));

  // Use the same seed to be fair between compilations
  enum seed = 0;
  auto rnd = Random(seed);

  ulong totalLength;
  foreach (i; 0 .. count) {
    const number = uniform(int.min, int.max, rnd);
    const result = spellOut(number);
    totalLength += result.length;
  }

  writefln!("A meaningless number to prevent the compiler from" ~
            " removing the entire loop: %,s")(totalLength);
}

void report(T)(T number) {
  import std.stdio : writefln;
  writefln!"  %6s % ,s: %s"(T.stringof, number, spellOut(number));
}

Ali

January 25, 2022
On Tue, Jan 25, 2022 at 08:04:04PM +0000, Adam D Ruppe via Digitalmars-d-learn wrote:
> On Tuesday, 25 January 2022 at 19:52:17 UTC, Ali Çehreli wrote:
> > ldc: ~0.95 seconds
> > gdc: ~0.79 seconds
> > dmd: ~1.77 seconds
> 
> Not surprising at all: gdc is excellent and underrated in the community.

The GCC optimizer is actually pretty darned good, comparable to LDC's. I only prefer LDC because of easier cross-compilation and a more up-to-date language version (due to GDC being tied to GCC's release cycle). But I wouldn't hesitate to use gdc if I didn't need to cross-compile or use features from the latest language version.

DMD's optimizer is miles behind LDC/GDC, sad to say. About the only thing that keeps me using dmd is its lightning-fast compilation times, ideal for iterative development. For anything performance-related, DMD isn't even on my radar.


T

-- 
Doubtless it is a good thing to have an open mind, but a truly open mind should be open at both ends, like the food-pipe, with the capacity for excretion as well as absorption. -- Northrop Frye
January 25, 2022
On 1/25/22 12:01, Johan wrote:

> Have you tried `--enable-cross-module-inlining` with LDC?

Tried now. Makes no difference that I can sense, likely because there is only one module anyway. :) (But I guess it works over Phobos modules too.)

Ali
January 25, 2022
On 1/25/22 12:59, Daniel N wrote:

> Maybe you can try --ffast-math on ldc.

Did not make a difference.

Ali

January 25, 2022
On 1/25/22 12:42, H. S. Teoh wrote:

>> For a test run for 2 million numbers:
>>
>> ldc: ~0.95 seconds
>> gdc: ~0.79 seconds
>> dmd: ~1.77 seconds
>
> For measurements under 1 second, I'm skeptical of the accuracy, because
> there could be all kinds of background noise, CPU interrupts and stuff
> that could be skewing the numbers.  What about doing a best-of-3 run with
> 20 million numbers (expected <20 seconds per run) to see how the
> numbers look?

Makes sense. The results are similar to the 2 million run.

> But these sorts of statements are just generalizations. The best way to
> find out for sure is to disassemble the executable and see for yourself
> what the assembly looks like. :-)

I posted the program to have more eyes on the assembly. ;)

Ali
