Profile-guided optimization (PGO)
December 08, 2015
Hi all,
  I have been working on getting rudimentary PGO going in LDC. It's pretty much ready! [1]
(does not work on Windows yet... I have to fix LLVM's compiler-rt code)

I've implemented something very similar to Clang: LDC uses profile information (generated by an instrumented executable built by LDC) to tag each branch in the code with branch weights. The actual optimizations are done by LLVM; at the moment LDC only adds metadata to the IR.
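To make "branch weights" concrete, here is a minimal sketch of how raw execution counts from a profiling run could be turned into weights for a two-way branch. It is an illustration only, not LDC's or Clang's actual code: the function name `branch_weights` is hypothetical, and the sketch assumes that weights have to fit in an unsigned 32-bit value and that a never-taken branch should still get a non-zero weight.

```python
UINT32_MAX = 2**32 - 1  # assumption: weights are stored as unsigned 32-bit values


def branch_weights(counts):
    """Scale raw execution counts so that every weight fits in 32 bits.

    Hypothetical helper for illustration; not taken from LDC or Clang.
    """
    largest = max(counts)
    # Pick a divisor large enough that the biggest count fits after scaling.
    scale = 1 if largest < UINT32_MAX else largest // UINT32_MAX + 1
    # Add 1 so that a branch that was never taken keeps a non-zero weight.
    return [count // scale + 1 for count in counts]


# A branch whose "true" side ran 9000 times and "false" side 1000 times.
weights = branch_weights([9_000, 1_000])
probability_true = weights[0] / sum(weights)

# Counts too large for 32 bits are scaled down; the ratio is roughly kept.
big_weights = branch_weights([2**40, 2**20])
```

The optimizer only needs the ratio between the weights, so scaling all counts by the same divisor loses little information.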

At this point, I would like your input on: command-line option naming; ease of use (llvm-profdata is needed...); whether you get substantial performance boosts; whether the code that writes the profile data file should be included in the runtime library or live in a separate lib; bugs; uninstrumented branches/switches; etc.
All comments are welcome (please be kind ;-).

Before I announce it in the "Announce" forum, I want to hear your thoughts first.

Thanks!
  Johan


[1]
http://wiki.dlang.org/LDC_LLVM_profiling_instrumentation#Profile-Guided_Optimization_.28PGO.29_status_in_LDC
December 08, 2015
Hi Johan,

On 8 Dec 2015, at 20:13, Johan Engelen via digitalmars-d-ldc wrote:
> I've implemented something very similar to Clang: LDC uses profile information (generated by an instrumented executable built by LDC)

Did you also try using it with sample profiles acquired by an external profiler yet, as described in the Clang page on PGO?

 — David
December 08, 2015
On Tuesday, 8 December 2015 at 19:13:41 UTC, Johan Engelen wrote:
> (does not work on Windows yet... I have to fix LLVM's compiler-rt code)

I fixed a nasty [*] bug in compiler-rt's profile writing code, and now it also works on Windows. (The IR tests still fail on Windows, because running a compiled executable from LIT fails there for some reason.)

[*] https://stackoverflow.com/questions/5537066/strange-0x0d-being-added-to-my-binary-file
Now I know what to look for first if I see 0x0D's in my files...
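For readers who have not run into this class of bug: the linked question is about a file that is opened in text mode on Windows, where every 0x0A byte that gets written is silently expanded to 0x0D 0x0A. The sketch below reproduces the effect on any platform by forcing "\r\n" newline translation. It is an illustration in Python, not the compiler-rt code, and the file names and helper functions are made up for the example.

```python
import os
import tempfile

# Binary payload that happens to contain a 0x0A byte.
payload = bytes([0x01, 0x0A, 0x02])


def write_text_mode(path, data):
    # Simulates Windows text mode: each "\n" is written out as "\r\n".
    with open(path, "w", encoding="latin-1", newline="\r\n") as f:
        f.write(data.decode("latin-1"))


def write_binary_mode(path, data):
    # Binary mode writes the bytes exactly as given.
    with open(path, "wb") as f:
        f.write(data)


def read_bytes(path):
    with open(path, "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "text_mode.bin")
    binary_path = os.path.join(tmp, "binary_mode.bin")
    write_text_mode(text_path, payload)
    write_binary_mode(binary_path, payload)
    corrupted = read_bytes(text_path)
    intact = read_bytes(binary_path)

print(corrupted.hex(" "))
print(intact.hex(" "))
```

The text-mode file ends up one byte longer than the payload, which is enough to break any reader that expects a fixed binary layout.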
December 08, 2015
On Tuesday, 8 December 2015 at 20:08:15 UTC, David Nadlinger wrote:
> 
> Did you also try using it with sample profiles acquired by an external profiler yet, as described in the Clang page on PGO?

Hi David,
  No, I have not looked at that yet.
Thanks a lot for the testcase you posted on Github. I will sink my teeth into fixing that first.
December 08, 2015
On 8 Dec 2015, at 23:35, Johan Engelen via digitalmars-d-ldc wrote:
> Thanks a lot for the testcase you posted on Github. Will sink my teeth in fixing that first.

You're welcome – I hope it's enough information to reproduce it, but I don't have a debug build of LLVM on this machine right now.

 — David
December 10, 2015
On Tuesday, 8 December 2015 at 22:35:11 UTC, Johan Engelen wrote:
> Thanks a lot for the testcase you posted on Github. Will sink my teeth in fixing that first.

Speaking of test cases: This might be an obvious and/or stupid suggestion, but did you try building the Phobos unit tests (and maybe also dmd-testsuite/runnable) with PGO? I'd suspect it would give you quite a broad coverage of basic language constructs.

 - David
December 10, 2015
On Tuesday, 8 December 2015 at 20:08:15 UTC, David Nadlinger wrote:
> Did you also try using it with sample profiles acquired by an external profiler yet, as described in the Clang page on PGO?

Let me add that this would probably be something nice to have for the initial release, as users could fall back to using perf, etc. if the instrumentation part is still buggy or incomplete for their code.

 - David
December 10, 2015
Also, for use cases like ours, where the system runs for extended periods of time and optimizing the init time (which may be minutes) is not interesting at all, just being able to run perf while the system is doing something worth optimizing is a big plus.

Liran

> On Dec 10, 2015, at 16:30, David Nadlinger via digitalmars-d-ldc <digitalmars-d-ldc@puremagic.com> wrote:
> 
> On Tuesday, 8 December 2015 at 20:08:15 UTC, David Nadlinger wrote:
>> Did you also try using it with sample profiles acquired by an external profiler yet, as described in the Clang page on PGO?
> 
> Let me add that this would probably be something nice to have for the initial release, as users could fall back to using perf, etc. if the instrumentation part is still buggy or incomplete for their code.
> 
> - David


December 10, 2015
On Thursday, 10 December 2015 at 14:27:59 UTC, David Nadlinger wrote:
>
> Speaking of test cases: This might be an obvious and/or stupid suggestion, but did you try building the Phobos unit tests (and maybe also dmd-testsuite/runnable) with PGO? I'd suspect it would give you quite a broad coverage of basic language constructs.

Nope, I didn't do that yet :S :S   It looks like that is needed to iron out some remaining bugs.

I underestimated the complexity of D's AST (some objects are placed in multiple locations in the AST?), which gave rise to an assertion failure in your testcase; plus I forgot to add throw statements to the AST tree walker, leading to another assertion failure. Those issues have been fixed now, and now it breaks with the same error you found. It is confusing because I did not (mean to) change any of the codegen, other than adding counter increment instructions and branch instruction metadata (both trivial additions).  But I did have to add extra basic blocks for switch statements... perhaps I can search there first.
Hope to have a resolution for your test case quickly.

I also have not tested at all how this works with multiple object files linked together, or with other possibly more complicated setups. I thought a fun testcase would be to compile DDMD with PGO enabled, have it compile itself as a profiling run, rebuild it with PGO, and test whether compiling, say, Phobos is quicker or slower.

I am very curious to see what constructs will see a significant performance boost, if any at all.
December 10, 2015
On Tuesday, 8 December 2015 at 19:13:41 UTC, Johan Engelen wrote:
>
> Before I announce it in the "Announce" forum, I want to hear your thoughts first.

Clearly I was too optimistic about the quality of my work so far, hehe.