Thread overview
Compiler flags when profiling a build
Oct 25, 2020
Per Nordlöw
Oct 25, 2020
Guillaume Piolat
Oct 25, 2020
Per Nordlöw
Oct 25, 2020
Gregory II
Oct 25, 2020
Guillaume Piolat
Oct 25, 2020
Guillaume Piolat
Oct 25, 2020
Johan Engelen
Oct 26, 2020
Jacob Carlborg
Oct 26, 2020
Per Nordlöw
October 25, 2020
What flags do you feed to dmd/ldc when you profile a build?

Do you initially

- compile in debug or release mode?
- activate inlining or not?
- use dmd or ldc?
- use any other alternatives not mentioned above?
October 25, 2020
On Sunday, 25 October 2020 at 14:08:21 UTC, Per Nordlöw wrote:
> What flags do you feed to dmd/ldc when you profile a build?
>
> Do you initially
>


> - compile in debug or release mode?

dub -b release-debug   # optimizations AND debug information

If you want no bounds check you can make a custom build type in dub.
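
A minimal sketch of such a custom build type, assuming a `dub.sdl` recipe. The package name `myapp` and the build type name `release-debug-nobounds` are made up for illustration; the build options are the ones `release-debug` uses, plus `noBoundsCheck`:

```sdl
// Hypothetical package name, for illustration only.
name "myapp"

// Hypothetical build type name: release-debug plus disabled bounds checks.
buildType "release-debug-nobounds" {
    buildOptions "releaseMode" "optimize" "inline" "debugInfo" "noBoundsCheck"
}
```

It would then be selected with `dub build -b release-debug-nobounds`.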


> - activate inlining or not?

Inlining on.

> - use dmd or ldc?

LDC


> - use any other alternatives not mentioned above?

The AMD profiler is a very nice alternative to Intel Amplifier.
If you don't provide debug info then the line information will be wrong.
Automate your comparisons to improve statistical significance etc.
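
One way to automate such comparisons is sketched below: run each build several times and compare mean and standard deviation instead of single timings. This is only an illustration, not a replacement for a real profiler. The function name `bench` and the binaries in the usage comments are hypothetical, and it assumes GNU `date` (for `%N`) and `awk`, as found on a typical Linux system:

```shell
# bench RUNS CMD [ARGS...]
# Runs CMD RUNS times and prints the mean and sample standard deviation
# of the wall-clock time in milliseconds.
# The body is in parentheses so its variables stay in a subshell.
bench() (
    runs=$1; shift
    i=0
    while [ "$i" -lt "$runs" ]; do
        start=$(date +%s%N)          # nanoseconds since epoch (GNU date)
        "$@" > /dev/null 2>&1
        end=$(date +%s%N)
        echo $(( (end - start) / 1000000 ))
        i=$((i + 1))
    done | awk '
        { sum += $1; sumsq += $1 * $1 }
        END {
            if (NR == 0) exit 1      # nothing was measured
            mean = sum / NR
            var = (NR > 1) ? (sumsq - NR * mean * mean) / (NR - 1) : 0
            if (var < 0) var = 0     # guard against rounding error
            printf "runs=%d mean_ms=%.1f stddev_ms=%.1f\n", NR, mean, sqrt(var)
        }'
)

# Usage (hypothetical binaries and input file):
#   bench 20 ./app-before input.txt
#   bench 20 ./app-after  input.txt
```

If the difference between the two means is small compared to the standard deviations, more runs (or a quieter machine) are needed before drawing conclusions.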
October 25, 2020
On Sunday, 25 October 2020 at 14:34:54 UTC, Guillaume Piolat wrote:
> dub -b release-debug   # optimizations AND debug information

Why not use dmd's own `-profile` flag? Is it too intrusive on performance? I've noticed a massive slowdown of about an order of magnitude.

> The AMD profiler is a very nice alternative to Intel Amplifier.

I'm on Linux. What is the preferred open alternative there? What are the pros and cons of the choices oprofile, gprof, sysprof, ...? Does anybody have a good comparison chart?

I want to profile an application that parses files into ASTs and generates text from those ASTs. The current bottleneck is AST-node allocations.
October 25, 2020
On Sunday, 25 October 2020 at 14:47:50 UTC, Per Nordlöw wrote:
> On Sunday, 25 October 2020 at 14:34:54 UTC, Guillaume Piolat wrote:
>> dub -b release-debug   # optimizations AND debug information
>
> Why not use dmd's own `-profile` flag? Is it too intrusive on performance? I've noticed a massive slowdown of about an order of magnitude.

I've never used the flag, but I don't imagine it is going to be any good anyway. Anything that adds to your own program's computation is going to skew the results.

If you aren't profiling the final build, e.g. LDC with all optimizations and inlining on, then your optimizations are kind of pointless.

>> The AMD profiler is a very nice alternative to Intel Amplifier.
>
> I'm on Linux. What is the preferred open alternative there? What are the pros and cons of the choices oprofile, gprof, sysprof, ...? Does anybody have a good comparison chart?
>
> I want to profile an application that parses files into ASTs and generates text from those ASTs. The current bottleneck is AST-node allocations.

Both AMD's and Intel's profilers support Linux. They give you access to hardware counters; at the very least, you won't find profilers with more information than these. Pick whichever matches the processor you have.

https://developer.amd.com/amd-uprof/
https://software.intel.com/content/www/us/en/develop/tools/vtune-profiler.html

I also use tracy, which integrates with your code to give you a more accurate picture. You have to write your own D wrapper though, and place additional code such as mixins in the functions you want to profile. Unlike D's -profile, you can choose what to enable and narrow it down to the specific issue you are trying to fix, so that it doesn't slow your run times too much.

https://github.com/wolfpld/tracy


October 25, 2020
On Sunday, 25 October 2020 at 14:47:50 UTC, Per Nordlöw wrote:
>
> Why not use dmd's own `-profile` flag? Is it too intrusive on performance? I've noticed a massive slowdown of about an order of magnitude.

1. Because LDC doesn't have -profile, and changing backends is an important and easy optimization.

2. Quoting https://en.wikipedia.org/wiki/Profiling_(computer_programming)#Data_granularity_in_profiler_types :

> In practice, sampling profilers can often provide a more accurate picture of the target program's execution than other approaches, as they are not as intrusive to the target program, and thus don't have as many side effects (such as on memory caches or instruction decoding pipelines). Also since they don't affect the execution speed as much, they can detect issues that would otherwise be hidden.
October 25, 2020
On Sunday, 25 October 2020 at 19:58:29 UTC, Guillaume Piolat wrote:
>
> 1. Because LDC doesn't have -profile, and changing backends is an important and easy optimization.

Erratum: LDC does have -profile; at least it works in ldmd2, so it should map to an ldc2 flag.
October 25, 2020
On Sunday, 25 October 2020 at 20:08:52 UTC, Guillaume Piolat wrote:
> On Sunday, 25 October 2020 at 19:58:29 UTC, Guillaume Piolat wrote:
>>
>> 1. Because LDC doesn't have -profile, and changing backends is an important and easy optimization.
>
> Erratum: LDC does have -profile; at least it works in ldmd2, so it should map to an ldc2 flag.

LDC has --fdmd-trace-functions for DMD-style profiling and --finstrument-functions for basic GCC-style profiling.

cheers,
  Johan

October 26, 2020
On Sunday, 25 October 2020 at 14:08:21 UTC, Per Nordlöw wrote:
> What flags do you feed to dmd/ldc when you profile a build?
>
> Do you initially
>
> - compile in debug or release mode?
> - activate inlining or not?
> - use dmd or ldc?
> - use any other alternatives not mentioned above?

For maximum performance you should compile with LDC and do the following:

* Enable optimizations: -O3
* Enable Link Time Optimizations (LTO): --flto=full
* Link against druntime and Phobos compiled for LTO: --defaultlib=druntime-ldc-lto,phobos2-ldc-lto
* Target your specific CPU instead of some generic CPU. This will enable SSE and other features that otherwise are disabled: -mcpu=native

* Depending on your preferences, you might want to disable asserts, contracts, invariants, and bounds checks in non-@safe functions: --release
* You can also control the bounds checks in more detail using this flag: --boundscheck

* For profiling I assume you want debug symbols as well: -g
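
Put together, the flags above might be combined as in the following sketch. The source file name `app.d` and the variable name `LDC_FLAGS` are placeholders, and the script only invokes the compiler if `ldc2` and the file are actually present; otherwise it just prints the command it would have run:

```shell
# Flags from the list above, collected in one place.
# -O3: optimizations; --flto=full plus the LTO default libs: link-time optimization;
# -mcpu=native: target this machine's CPU; --release: drop asserts/contracts/invariants;
# -g: debug symbols so the profiler can map samples back to source lines.
LDC_FLAGS="-O3 --flto=full --defaultlib=druntime-ldc-lto,phobos2-ldc-lto -mcpu=native --release -g"

# app.d is a hypothetical source file name.
# $LDC_FLAGS is deliberately unquoted so it splits into separate arguments.
if command -v ldc2 > /dev/null 2>&1 && [ -f app.d ]; then
    ldc2 $LDC_FLAGS app.d
else
    echo "ldc2 $LDC_FLAGS app.d"
fi
```

Whether to keep `--release` in a profiling build is a judgment call: it should match whatever the final shipped build uses.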

--
/Jacob Carlborg
October 26, 2020
On Monday, 26 October 2020 at 10:18:43 UTC, Jacob Carlborg wrote:
> For maximum performance you should compile with LDC and do the following:

Thanks