Thread overview
DMD and GDC are unnecessarily using heap allocations for closures
May 30, 2022
Siarhei Siamashka
May 30, 2022
user1234
May 30, 2022
Siarhei Siamashka
May 30, 2022
Iain Buclaw
May 30, 2022
deadalnix
May 30, 2022
Nicholas Wilson
May 30, 2022
Johan
May 30, 2022
max haughton
Jun 01, 2022
Siarhei Siamashka
May 30, 2022
max haughton
May 30, 2022
Iain Buclaw
May 31, 2022
Siarhei Siamashka
May 30, 2022

Consider the following example:

import std.algorithm, std.range, std.stdio;

long binary_search(long n, long val, long i) {
  return iota(1, i + 1).map!(x => n / x == val).assumeSorted.upperBound(false).length;
}

// A solution for https://atcoder.jp/contests/abc230/tasks/abc230_e
long solve(long n) {
  long ans = 0, i = n;
  while (i != 0) {
    long val = n / i;
    auto cnt = binary_search(n, val, i);
    ans += cnt * val;
    i -= cnt;
  }
  return ans;
}

void main() {
  long ans = 0;
  foreach (n ; 1 .. 100000)
    ans += solve(n);
  writeln(ans);
}

Benchmarks with GDC 11.2.0, LDC 1.27.1 (LLVM 12.0.0) and DMD 2.091.1:

$ dmd -O -g -release -inline test.d && time ./test
55836809328

real	0m10.654s
user	0m11.001s
sys	0m0.052s

$ gdc-11.2.0 -O3 -g -frelease -flto test.d && time ./a.out
55836809328

real	0m6.520s
user	0m6.519s
sys	0m0.000s


$ ldc2 -O -g -release test.d && time ./test
55836809328

real	0m1.904s
user	0m1.903s
sys	0m0.000s

LDC produces significantly faster code here, and one of the major contributing factors is that LDC is able to avoid heap allocations (as can be confirmed by running a profiler). It is possible to force LDC to also use heap allocations by adding the '--disable-gc2stack' option, and performance drops:

$ ldc2 -O -g -release --disable-gc2stack test.d && time ./test
55836809328

real	0m3.621s
user	0m3.620s
sys	0m0.000s

So only LDC does a proper job here and lives up to its reputation as a state-of-the-art optimizing compiler. But not everything is perfect even with LDC. In another thread https://forum.dlang.org/thread/t6rijv$nlb$1@digitalmars.com Steven Schveighoffer mentioned @nogc annotations. Now if I add the @nogc attribute to the 'binary_search' function, then LDC refuses to compile the source code:

$ ldc2 -O -g -release test.d && time ./test
test.d(3): Error: function `test.binary_search` is `@nogc` yet allocates closures with the GC
test.d(4):        test.binary_search.__lambda4 closes over variable n at test.d(3)
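For reference, the closure in question comes from the lambda x => n / x == val capturing the locals n and val. A minimal sketch of my own (not from the compiler diagnostics) showing the same mechanism in its clearest form, where the delegate really does outlive the enclosing frame:

```d
import std.stdio;

// Hypothetical minimal example: the returned delegate outlives
// makeDivider's stack frame, so the frontend moves the frame holding n
// to the GC heap -- a closure allocation. In binary_search the lambda
// arguably never escapes, but the frontend allocates conservatively.
auto makeDivider(long n)
{
    return (long x) => n / x; // captures n -> closure allocation
}

void main()
{
    auto div10 = makeDivider(10);
    writeln(div10(3)); // prints 3
}
```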

What do you think about all of this?

May 30, 2022

On Monday, 30 May 2022 at 06:47:24 UTC, Siarhei Siamashka wrote:

> Consider the following example:
>
> import std.algorithm, std.range, std.stdio;
>
> [...]

Maybe marking it as private long binary_search(long n, long val, long i){...} will make other compilers think the function is only used locally?

May 30, 2022

On Monday, 30 May 2022 at 07:42:29 UTC, user1234 wrote:

> Maybe marking it as private long binary_search(long n, long val, long i){...} will make other compilers think the function is only used locally?

The code can be massaged in various ways to reduce the allocation overhead. For example, the whole 'binary_search' function is not necessary and the code from it can be manually inlined (this reduces the allocation penalty).
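As an illustration of such massaging (my own sketch, not code from the thread), the range pipeline can also be replaced by an explicit binary search loop with no lambdas at all, which trivially compiles under @nogc:

```d
// Hypothetical allocation-free rewrite: counts x in [1, i] with
// n / x == val. On this interval the predicate is monotone (false for
// small x, true up to x = i when val == n / i), which matches what the
// original iota/map/assumeSorted/upperBound pipeline computes.
long binary_search_nogc(long n, long val, long i) @nogc nothrow
{
    long lo = 1, hi = i, first = i + 1;
    while (lo <= hi)
    {
        long mid = (lo + hi) / 2;
        if (n / mid == val)
        {
            first = mid;  // candidate for the leftmost match
            hi = mid - 1;
        }
        else
        {
            lo = mid + 1; // n / mid > val here, search to the right
        }
    }
    return i - first + 1; // number of matching x values (0 if none)
}
```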

So far I'm quite happy about the quality of the code generated by LDC and it has never failed me. But I wonder if the @nogc attribute can play a bit nicer with the GC2Stack IR optimization pass? For example, somehow postpone the error reporting in the frontend and only fail if GC2Stack is actually unable to eliminate all GC allocations in the @nogc code. Maybe I can report this to LDC developers.

Not sure if anything can be done about GDC. Iain does not seem to appreciate reports of this kind in GCC bugzilla and I don't want to annoy him unnecessarily.

May 30, 2022

On Monday, 30 May 2022 at 10:24:12 UTC, Siarhei Siamashka wrote:

> Not sure if anything can be done about GDC. Iain does not seem to appreciate reports of this kind in GCC bugzilla and I don't want to annoy him unnecessarily.

Since when have I said that? Maybe I'm missing a context. :-)

IMO the DMD front-end should be fixed to not tell GDC and LDC to allocate a closure in the first place; then LDC wouldn't need a custom pass to revert the GC allocations.

May 30, 2022

On Monday, 30 May 2022 at 10:24:12 UTC, Siarhei Siamashka wrote:

> So far I'm quite happy about the quality of the code generated by LDC and it has never failed me. But I wonder if the @nogc attribute can play a bit nicer with the GC2Stack IR optimization pass? For example, somehow postpone the error reporting in the frontend and only fail if GC2Stack is actually unable to eliminate all GC allocations in the @nogc code. Maybe I can report this to LDC developers.

(LDC dev) I doubt this can be fixed. @nogc is a frontend construct, and it would be difficult, if not impossible, to make it depend on backend optimisation passes that may or may not be able to remove all calls to the GC (if they are enabled at all).

LDC does have some compiler-specific semantic analysis passes, but all of them run after DMD's. GC2stack is an IR-level optimisation, not a frontend one, and IR is only generated after all frontend semantic analysis has completed without error.

May 30, 2022

On Monday, 30 May 2022 at 10:24:12 UTC, Siarhei Siamashka wrote:

> On Monday, 30 May 2022 at 07:42:29 UTC, user1234 wrote:
>
>> Maybe marking it as private long binary_search(long n, long val, long i){...} will make other compilers think the function is only used locally?
>
> The code can be massaged in various ways to reduce the allocation overhead. For example, the whole 'binary_search' function is not necessary and the code from it can be manually inlined (this reduces the allocation penalty).
>
> So far I'm quite happy about the quality of the code generated by LDC and it has never failed me. But I wonder if the @nogc attribute can play a bit nicer with the GC2Stack IR optimization pass? For example, somehow postpone the error reporting in the frontend and only fail if GC2Stack is actually unable to eliminate all GC allocations in the @nogc code. Maybe I can report this to LDC developers.

The @nogc attribute is a language-level guarantee that no GC allocations happen in that function. This guarantee cannot depend on an optimization, unless every compiler implementation is required to perform that optimization even for unoptimized code, which effectively makes it no longer an "optimization" but just normal language behavior. (Example: C++17 requires copy elision when a function returns a temporary (unnamed) object, but does not require it when a function returns a named object.)

In short: what you propose for LDC should not be done.

-Johan

May 30, 2022

On Monday, 30 May 2022 at 10:41:10 UTC, Johan wrote:

> On Monday, 30 May 2022 at 10:24:12 UTC, Siarhei Siamashka wrote:
>
>> On Monday, 30 May 2022 at 07:42:29 UTC, user1234 wrote:
>>
>>> Maybe marking it as private long binary_search(long n, long val, long i){...} will make other compilers think the function is only used locally?
>>
>> The code can be massaged in various ways to reduce the allocation overhead. For example, the whole 'binary_search' function is not necessary and the code from it can be manually inlined (this reduces the allocation penalty).
>>
>> So far I'm quite happy about the quality of the code generated by LDC and it has never failed me. But I wonder if the @nogc attribute can play a bit nicer with the GC2Stack IR optimization pass? For example, somehow postpone the error reporting in the frontend and only fail if GC2Stack is actually unable to eliminate all GC allocations in the @nogc code. Maybe I can report this to LDC developers.
>
> The @nogc attribute is a language-level guarantee that no GC allocations happen in that function. This guarantee cannot depend on an optimization, unless every compiler implementation is required to perform that optimization even for unoptimized code, which effectively makes it no longer an "optimization" but just normal language behavior. (Example: C++17 requires copy elision when a function returns a temporary (unnamed) object, but does not require it when a function returns a named object.)
>
> In short: what you propose for LDC should not be done.
>
> -Johan

Ideally the DMD frontend would have a nice, reliable IR to do this analysis properly, and a more aggressive notion of @nogc could then be built on top of it, but unfortunately it doesn't have one.

SDC does have an IR, but it's a long way from being able to worry about this kind of issue. Hypothetically, it would be much more feasible to drop into an IR where you can do dataflow analysis properly and then go back to an AST.

Also, the semantics could be specified as a set of test cases that any D compiler must be able to compile without closures. That way you can rely on the behavior without forcing implementations to use a particular technique (other than doing it at all).
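A sketch of what one such test case might look like (hypothetical, not an actual spec proposal): the lambda closes over n, but the range pipeline is fully consumed inside the function, so its context never escapes. A compiler meeting the proposed spec would have to keep the context on the stack and accept a @nogc annotation here; today the frontend rejects the annotation with "allocates closures with the GC":

```d
import std.algorithm : map, sum;
import std.range : iota;

// Hypothetical conformance test case: the lambda captures n, but the
// pipeline never escapes this function, so no heap closure is truly
// needed. Adding @nogc to the signature is rejected by current
// frontends, even though the proposed spec would require accepting it.
long sumOfQuotients(long n)
{
    return iota(1, 4).map!(x => n / x).sum; // n/1 + n/2 + n/3
}
```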

I did fiddle around with getting GCC to do this but I'm not sure it can be done without breaking a sweat.

May 30, 2022

On Monday, 30 May 2022 at 06:47:24 UTC, Siarhei Siamashka wrote:

> Consider the following example:
>
> import std.algorithm, std.range, std.stdio;
>
> [...]

Also, it's worth noting that you can actually make some of these range patterns @nogc by using zip(repeat(closureVar), blah).map.
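Presumably the trick applies to the counting step of the first post roughly like this (my own untested sketch, using a linear count for simplicity; the binary-search upperBound variant follows the same idea). The lambda receives n and val as part of each range element instead of capturing them, so no closure context is needed:

```d
import std.algorithm : count, map;
import std.range : iota, repeat, zip;

// Sketch of the zip + repeat trick: thread n and val through the range
// itself so the map lambda captures nothing. With no captured state
// there is no closure, and the function can pass @nogc.
long count_equal(long n, long val, long i) @nogc
{
    return zip(repeat(n), repeat(val), iota(1, i + 1))
        .map!(t => t[0] / t[2] == t[1])
        .count(true);
}
```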

May 30, 2022

On Monday, 30 May 2022 at 10:34:18 UTC, Iain Buclaw wrote:

> On Monday, 30 May 2022 at 10:24:12 UTC, Siarhei Siamashka wrote:
>
>> Not sure if anything can be done about GDC. Iain does not seem to appreciate reports of this kind in GCC bugzilla and I don't want to annoy him unnecessarily.
>
> Since when have I said that? Maybe I'm missing a context. :-)
>
> IMO the DMD front-end should be fixed to not tell GDC and LDC to allocate a closure in the first place; then LDC wouldn't need a custom pass to revert the GC allocations.

It would still benefit from it. All the other transformations that LLVM applies to the code are likely to uncover opportunities to remove GC allocations. For instance:

auto foo() {
  return new Object();
}

void bar() {
  foo();
}

Nothing in the code as-is allows you to remove the allocation, but once foo is inlined into bar, the object can be allocated on the stack (and then optimized away :P).

Maybe this case is trivial enough that it could realistically be handled in the front end, but I think it makes the point well enough: optimizations often uncover other optimization opportunities.

May 30, 2022

On Monday, 30 May 2022 at 06:47:24 UTC, Siarhei Siamashka wrote:

> $ gdc-11.2.0 -O3 -g -frelease -flto test.d && time ./a.out
> 55836809328
>
> real	0m6.520s
> user	0m6.519s
> sys	0m0.000s
>
> What do you think about all of this?

Out of curiosity, are you linking Phobos statically or dynamically? You can force either with -static-libphobos or -shared-libphobos.
