Missed optimization?
October 21, 2016
I was experimenting with pure functions, exceptions and inlining and I noticed that LDC seems to generate inferior code to GDC in the cases I was considering.

For example (Compiler Explorer, compiler: ldc, options: -release -O3 -boundscheck=off):

Compared to:

Even the simplest case (function foo) contains a duplicated memory load and comparison from the add1 function.

After changing to the throwless version, LDC generates similar code for all cases.
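A minimal sketch of what such a test case might look like. The original source is not reproduced in this thread, so this is a guess based only on the mangled names in the IR quoted in the reply (a pure add1 taking a long, and foo taking an immutable long[]). The function bodies, the exception message, and the add1NoThrow/fooNoThrow names are assumptions for illustration.

```d
// Hypothetical reconstruction, not the original test case.
long add1(long x) pure
{
    // Throwing variant: refuse to overflow.
    if (x == long.max)
        throw new Exception("overflow");
    return x + 1;
}

long foo(immutable long[] a)
{
    // a[0] is immutable and add1 is pure, so in principle a single
    // load and a single comparison would be enough for both calls.
    return add1(a[0]) + add1(a[0]);
}

// Hypothetical "throwless" variant: saturate instead of throwing.
long add1NoThrow(long x) pure nothrow
{
    return x == long.max ? x : x + 1;
}

long fooNoThrow(immutable long[] a) nothrow
{
    return add1NoThrow(a[0]) + add1NoThrow(a[0]);
}

unittest
{
    immutable long[] a = [1, 2];
    assert(foo(a) == 4);
    assert(fooNoThrow(a) == 4);
}
```

Comparing the assembly for foo and fooNoThrow at -O3 should show whether the duplicated load and comparison only appear in the throwing version.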


I was thinking about autodecoding and wondering whether it was possible for the following example:
auto decode(immutable char[] s) pure
{
  struct Result { dchar codepoint; uint advance; }
  // throw Exception if s is invalid / erroneous
  // return Result (decoded codepoint, number of chars to advance);
}
dchar front(string s) { return s.decode.codepoint; }
void popFront(ref string s) { s = s[s.decode.advance .. $]; }

ulong loop(string s)
{
  ulong checksum;
  while (s.length)
  {
    checksum += s.front;
    s.popFront;
  }
  return checksum;
}

to optimize the loop function (using inlining and purity) into:

ulong loop(string s)
{
  ulong checksum;
  while (s.length)
  {
    auto tmp = s.decode;
    checksum += tmp.codepoint;
    s = s[tmp.advance .. $];
  }
  return checksum;
}
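For anyone who wants to experiment with this, here is a self-contained sketch of both loops. The decode body is an assumption: a simplified UTF-8 decoder written for illustration (it does not reject overlong encodings or surrogates), not the Phobos implementation. Result is moved to module scope so the return type can be named, and loopFused is a hypothetical name for the hand-optimized version.

```d
struct Result { dchar codepoint; uint advance; }

// Simplified, illustrative UTF-8 decoder (assumption, not std.utf.decode).
Result decode(immutable char[] s) pure
{
    if (s.length == 0)
        throw new Exception("empty input");
    immutable uint b0 = s[0];
    if (b0 < 0x80)
        return Result(s[0], 1); // ASCII fast path
    // Sequence length from the lead byte; 0 marks an invalid lead byte.
    immutable uint n = b0 >= 0xF8 ? 0 : b0 >= 0xF0 ? 4
                     : b0 >= 0xE0 ? 3 : b0 >= 0xC0 ? 2 : 0;
    if (n == 0 || s.length < n)
        throw new Exception("invalid UTF-8");
    dchar c = cast(dchar)(b0 & (0xFF >> (n + 1)));
    foreach (i; 1 .. n)
    {
        if ((s[i] & 0xC0) != 0x80)
            throw new Exception("invalid UTF-8");
        c = cast(dchar)((c << 6) | (s[i] & 0x3F));
    }
    return Result(c, n);
}

dchar front(string s) { return s.decode.codepoint; }
void popFront(ref string s) { s = s[s.decode.advance .. $]; }

// Decodes each code point twice: once in front, once in popFront.
ulong loop(string s)
{
    ulong checksum;
    while (s.length)
    {
        checksum += s.front;
        s.popFront();
    }
    return checksum;
}

// Hand-fused version: one decode call per code point.
ulong loopFused(string s)
{
    ulong checksum;
    while (s.length)
    {
        auto tmp = s.decode;
        checksum += tmp.codepoint;
        s = s[tmp.advance .. $];
    }
    return checksum;
}

unittest
{
    assert(loop("abc") == 294); // 97 + 98 + 99
    assert(loop("h\u00E9llo") == loopFused("h\u00E9llo"));
}
```

The question is then whether the optimizer turns loop into something equivalent to loopFused on its own, given that decode is pure and s does not change between the two calls.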
October 21, 2016
On 21 Oct 2016, at 22:14, safety0ff via digitalmars-d-ldc wrote:
> I was experimenting with pure functions, exceptions and inlining and I noticed that LDC seems to generate inferior code to GDC in the cases I was considering.
> […]

That's indeed a bit surprising. The generated IR looks like this:

define i64 @_D4test3fooFyAlZl({ i64, i64* } %a_arg) local_unnamed_addr #0 comdat {
  %.getAddressOf_dump.i2 = alloca { i64, i8* }, align 8 ; [#uses = 4, size/byte = 16]
  %.getAddressOf_dump.i = alloca { i64, i8* }, align 8 ; [#uses = 4, size/byte = 16]
  %a_arg.fca.1.extract = extractvalue { i64, i64* } %a_arg, 1 ; [#uses = 2]
  %1 = load i64, i64* %a_arg.fca.1.extract, align 8 ; [#uses = 2]
  %2 = bitcast { i64, i8* }* %.getAddressOf_dump.i to i8* ; [#uses = 2]
  call void @llvm.lifetime.start(i64 16, i8* %2)
  %3 = icmp eq i64 %1, 9223372036854775807        ; [#uses = 1]
  br i1 %3, label %if.i, label %_D4test4add1FNalZl.exit


_D4test4add1FNalZl.exit:                          ; preds = %0
  call void @llvm.lifetime.end(i64 16, i8* %2)
  %9 = load i64, i64* %a_arg.fca.1.extract, align 8 ; [#uses = 2]
  %10 = bitcast { i64, i8* }* %.getAddressOf_dump.i2 to i8* ; [#uses = 2]
  call void @llvm.lifetime.start(i64 16, i8* %10)
  %11 = icmp eq i64 %9, 9223372036854775807       ; [#uses = 1]
  br i1 %11, label %if.i8, label %_D4test4add1FNalZl.exit9

There don't seem to be any stores that could prevent %9 from being folded with %1. Tracking this down might be an interesting trip down the LLVM pass pipeline.
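At the source level, folding %9 into %1 would roughly correspond to the hand-written version below. This is only a sketch: add1 and foo are guessed from the mangled names in the IR above, and fooFolded is a hypothetical name for what the optimizer could in principle produce.

```d
// Hypothetical reconstruction of the inlined callee.
long add1(long x) pure
{
    if (x == long.max)
        throw new Exception("overflow");
    return x + 1;
}

// As compiled today: two loads of a[0] and two comparisons.
long foo(immutable long[] a)
{
    return add1(a[0]) + add1(a[0]);
}

// With the second load folded into the first.
long fooFolded(immutable long[] a)
{
    immutable x = a[0];    // single load, corresponds to %1
    immutable r = add1(x); // single comparison against long.max
    return r + r;
}

unittest
{
    immutable long[] a = [41];
    assert(foo(a) == fooFolded(a));
}
```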

 — David
October 22, 2016
On Friday, 21 October 2016 at 22:19:26 UTC, David Nadlinger wrote:
> Tracking this down might be an interesting trip down the LLVM pass pipeline.

I forgot to mention: Could you please create an entry on for this?

 — David
October 22, 2016
On Saturday, 22 October 2016 at 01:07:02 UTC, David Nadlinger wrote:
> I forgot to mention: Could you please create an entry on for this?
>  — David

November 01, 2016
On Friday, 21 October 2016 at 21:14:56 UTC, safety0ff wrote:
> Hello,
> I was experimenting with pure functions, exceptions and inlining and I noticed that LDC seems to generate inferior code to GDC in the cases I was considering.
> [...]

what about?