Profile-guided optimization (PGO) (page 2)

On 10 Dec 2015, at 19:43, Johan Engelen via digitalmars-d-ldc wrote:
> Clearly I was too optimistic about the quality of my work so far, hehe.

Quite the contrary – you chose to start with the hard part (instrumentation-based instead of sampling-based), and DMD's AST is notoriously, uh, fluid in meaning and under-documented.

— David

On Tuesday, 8 December 2015 at 22:41:22 UTC, David Nadlinger wrote:
> On 8 Dec 2015, at 23:35, Johan Engelen via digitalmars-d-ldc wrote:
>> Thanks a lot for the testcase you posted on Github. Will sink my teeth in fixing that first.
>
> You're welcome – I hope it's enough information to reproduce it, but I don't have a debug build of LLVM on this machine right now.

After two more bug fixes: the regexp microbench now works.
Results with the regexp bench (bench.d):

  time ldc2 bench.d -O3 -of=bench_normal   --> 52s
  time ./bench_normal   --> 2.55s 98%cpu

  time ldc2 bench.d -fprofile-instr-generate -of=bench_instr   --> 11s
  time ./bench_instr   --> 6.72s 99%cpu
  llvm-profdata merge default.profraw -o bench.profdata
  time ldc2 bench.d -O3 -fprofile-instr-use=bench.profdata -of=bench_pgo   --> 48.35s
  time ./bench_pgo   --> 2.48s 98%cpu

(timing numbers for bench_normal and bench_pgo are +- 0.01)

So PGO brings it from 2.55 to 2.48 sec, ~3% improvement.
Disappointing, but well... it works!
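The build, run, merge, rebuild cycle from the post above can be sketched as a small script. This is only an illustration: the `run` and `pgo_cycle` helpers and the `DRY_RUN` variable are made up for this sketch, while the flags and file names are the ones quoted in the post. By default it only prints the commands, so it can be read without an LDC build that has PGO support.

```shell
#!/bin/sh
# Sketch of the instrumentation-based PGO cycle described in the post.
# DRY_RUN, run and pgo_cycle are hypothetical names; flags are from the post.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

pgo_cycle() {
    src="$1"            # e.g. bench.d
    name="${src%.d}"    # e.g. bench

    # 1. Build an instrumented binary.
    run ldc2 "$src" -fprofile-instr-generate -of="${name}_instr" &&
    # 2. Run it on a representative workload; the post merges default.profraw afterwards.
    run "./${name}_instr" &&
    # 3. Convert the raw profile into the file passed to the compiler.
    run llvm-profdata merge default.profraw -o "${name}.profdata" &&
    # 4. Rebuild with optimization guided by the profile.
    run ldc2 "$src" -O3 -fprofile-instr-use="${name}.profdata" -of="${name}_pgo"
}

pgo_cycle bench.d
```

Setting `DRY_RUN=0` executes each step and stops at the first failure, since the steps are chained with `&&`.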

On 11 Dec 2015, at 1:26, Johan Engelen via digitalmars-d-ldc wrote:
> After two more bug fixes: the regexp microbench now works.
> […]
> Disappointing, but well... it works!

Don't forget that this was just a random program I pulled from the Rosettacode compilation, though – I didn't have a benchmark ready where I know that branch prediction or inlining improvements would make a difference.

— David

On Friday, 11 December 2015 at 00:26:22 UTC, Johan Engelen wrote:
> So PGO brings it from 2.55 to 2.48 sec, ~3% improvement.
> Disappointing, but well... it works!

PGO can also reduce physical memory consumption due to less code loaded into memory.

On Friday, 11 December 2015 at 14:07:56 UTC, Kagamin wrote:
> On Friday, 11 December 2015 at 00:26:22 UTC, Johan Engelen wrote:
>> So PGO brings it from 2.55 to 2.48 sec, ~3% improvement.
>> Disappointing, but well... it works!
>
> PGO can also reduce physical memory consumption due to less code loaded into memory.

Yes, indeed. I'd like to find a good example of code where this shows up in practice, so that PGO really does improve performance significantly (say, >10%).
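For comparing a PGO build against the baseline, the relative gain is (baseline - pgo) / baseline. A rough sketch with two hypothetical helper functions (the names are invented here; `awk` is used because plain `sh` has no floating-point arithmetic), fed with the timings reported for the regexp bench:

```shell
#!/bin/sh
# improvement_pct BASELINE_SECONDS PGO_SECONDS
# Prints the relative run-time reduction in percent, one decimal place.
# (Hypothetical helper, not part of LDC or LLVM.)
improvement_pct() {
    awk -v base="$1" -v pgo="$2" \
        'BEGIN { printf "%.1f\n", (base - pgo) / base * 100 }'
}

# meets_target BASELINE_SECONDS PGO_SECONDS TARGET_PCT
# Exit status 0 if the reduction reaches TARGET_PCT, non-zero otherwise.
meets_target() {
    awk -v base="$1" -v pgo="$2" -v target="$3" \
        'BEGIN { exit !((base - pgo) / base * 100 >= target) }'
}

# Timings from the regexp bench: 2.55 s without PGO, 2.48 s with PGO.
improvement_pct 2.55 2.48
if meets_target 2.55 2.48 10; then
    echo "reaches the 10% target"
else
    echo "below the 10% target"
fi
```

With those two timings the reduction works out to roughly 2.7%, in line with the "~3%" quoted above. Given the stated ±0.01 s measurement spread, the last digit should not be taken too seriously.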

On 11 Dec 2015, at 1:38, Johan Engelen via digitalmars-d-ldc wrote:
> What do you think about "llvm-profdata"? Should we ship that with LDC?

Yes, we should probably ship it with the binary packages. For distro packages, we are of course dependent on the LLVM packages to include the tools, but at least the Homebrew package actually does.

What is left to do before we can merge a first version into the main repository? A partial list:

 - Deal with the remaining FIXME comments (at least open separate GitHub issues for them), as well as with commented-out fragments from the Clang implementation.
 - Find some way to avoid ICE-type regressions on real-world D code, for example by building the druntime/Phobos unit tests with instrumentation on.
 - Decide on a name for the command line switches. The GCC-style "-f" prefix isn't currently used for most of the options, but that's not necessarily much of an argument.

On a rather unrelated note, did you try whether the profile data also gives sensible results with llvm-cov? If yes, that might be something nice to mention in that upcoming announcement, even though we also have DMD-style -cov support, of course.

— David

December 13, 2015

Re: Profile-guided optimization (PGO)

Posted by Johan Engelen
in reply to David Nadlinger

On Sunday, 13 December 2015 at 13:21:33 UTC, David Nadlinger wrote:
>
> What is left to do before we can merge a first version into the main repository? A partial list:
>
>  - Deal with the remaining FIXME comments (at least open separate GitHub issues for them), as well as with commented-out fragments from the Clang implementation.

Yep :-)  That's what the FIXME comments are there for: so I don't forget :-)
The plan is to deal with all FIXMEs, and remove the unused commented-out Clang fragments.

>  - Find some way to avoid ICE-type regressions on real-world D code, for example by building the druntime/Phobos unit tests with instrumentation on.

I just fixed two more ICEs, and now the dmd-testsuite succeeds with -fprofile-instr-generate. Running the druntime/Phobos unittests now with -fprofile-instr-generate.

Also, I added a pragma(LDC_profile_instr, true|false) to enable/disable instrumentation codegen for specific functions. My main reason for this is to help people speed up instrumented binaries, and it also helps to circumvent ICEs.
See tests/ir/profile/pragma.d.
(perhaps you can think of a better name for the pragma)
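As a rough illustration of how the pragma might be used, the script below writes a small D file that opts one function out of instrumentation and builds it if `ldc2` is available. The file name, function names and the placement of the pragma directly in front of the function are assumptions made for this sketch; tests/ir/profile/pragma.d remains the authoritative reference.

```shell
#!/bin/sh
# Write a small D program that disables instrumentation for one function.
# pragma_demo.d and hotHelper are hypothetical; the pragma name is from the post.
cat > pragma_demo.d <<'EOF'
import std.stdio;

// Assumed placement: the pragma precedes the function it applies to.
pragma(LDC_profile_instr, false)
int hotHelper(int n)
{
    int sum = 0;
    foreach (i; 0 .. n)
        sum += i;
    return sum;
}

// Instrumented as usual when built with -fprofile-instr-generate.
void main()
{
    writeln(hotHelper(1000));
}
EOF

# Build with instrumentation only if an LDC with PGO support is installed.
if command -v ldc2 >/dev/null 2>&1; then
    ldc2 pragma_demo.d -fprofile-instr-generate -of=pragma_demo
else
    echo "ldc2 not found; wrote pragma_demo.d only"
fi
```

Excluding a small, very hot function like this is the use case mentioned in the post: the instrumented binary runs faster, at the cost of having no counters for that function.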

>  - Decide on a name for the command line switches. The GCC-style "-f" prefix isn't currently used for most of the options, but that's not necessarily much of an argument.

I have absolutely no preference here. I think we should do what the world is already familiar with. Iirc, -fprofile-instr-generate is a Clang option, and Clang is moving towards / will support GCC's option naming (-fprofile-generate, -fprofile-use).
DMD has a -profile option, but I have not read up on what that will do.
I guess we will not add any option for PGO to ldmd2?

> On a rather unrelated note, did you try whether the profile data also gives sensible results with llvm-cov? If yes, that might be something nice to mention in that upcoming announcement, even though we also have DMD-style -cov support, of course.

Did not look into this at all yet. Clang's PGOGen code has some extra functions for gcov support and more. It's all commented out for now, but it looks like we can support more tools relatively easily with the current implementation. I also see hints of sampling-based PGO in the code, for example.

Another important TODO item: remove the profiling runtime from druntime, and instead add a separate runtime-profiling lib (suggestions for a name?  ldc-profile.lib?).

On 13 Dec 2015, at 14:59, Johan Engelen via digitalmars-d-ldc wrote:
> I just fixed two more ICEs, and now the dmd-testsuite succeeds with -fprofile-instr-generate. Running the druntime/Phobos unittests now with -fprofile-instr-generate.

Nice!

> I have absolutely no preference here. I think we should do what the world is already familiar with. Iirc, -fprofile-instr-generate is a Clang option, and Clang is moving towards / will support GCC's option naming (-fprofile-generate, -fprofile-use).

Yeah, me neither.

> DMD has a -profile option, but I have not read up on what that will do.

It makes DMD's druntime emit some profiling info as text files (trace.def/trace.log) at program exit. This is for manual analysis only, no PGO-type functionality in sight.

> I guess we will not add any option for PGO to ldmd2?

Yeah, as DMD does not have any PGO functionality. Of course it will still pass the ldc2 options through.

> Another important TODO item: remove the profiling runtime from druntime, and instead add a separate runtime-profiling lib (suggestions for a name? ldc-profile.lib?).

Maybe add a "rt" suffix to make clear that this is the actual program runtime part? Ultimately does not really matter, though.

— David

On Sunday, 13 December 2015 at 14:16:53 UTC, David Nadlinger wrote:
> On 13 Dec 2015, at 14:59, Johan Engelen via digitalmars-d-ldc wrote:
>> I guess we will not add any option for PGO to ldmd2?
>
> Yeah, as DMD does not have any PGO functionality. Of course it will still pass the ldc2 options through.

Lol, I should try to read up on these simple things first... To do the tests, I had modified ldmd to recognize -fprofile-instr-generate...
