An easy-way to profile apps visually
Dec 15, 2022
Guillaume Piolat
December 15, 2022

tl;dr Simply generates a JSON that follows the Trace Event Format.

Trace Event Format is a simple JSON format that is then read by web apps like:

You can very easily get images of your instrumented program like this one:
https://imgur.com/a/q4Uwosz
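
To give an idea of what such a trace looks like, here is a minimal sketch in Python (not the dplug code, which is written in D). It writes one begin/end pair in the Trace Event Format. The helper names `make_event` and `trace_to_json`, the category "app", and the timestamps are made up for illustration; the field names (`name`, `cat`, `ph`, `ts`, `pid`, `tid`) and the `traceEvents` wrapper come from the format itself.

```python
import json

def make_event(name, phase, ts_us, tid, pid=1):
    """Build one trace record. `ph` is "B" for begin and "E" for end;
    `ts` is a timestamp in microseconds."""
    return {"name": name, "cat": "app", "ph": phase,
            "ts": ts_us, "pid": pid, "tid": tid}

def trace_to_json(events):
    """Wrap the events in the object form of the format."""
    return json.dumps({"traceEvents": events})

# One zone called "draw" that lasts 1500 microseconds on thread 1
# (hypothetical numbers, chosen only for the example).
events = [
    make_event("draw", "B", 0, tid=1),
    make_event("draw", "E", 1500, tid=1),
]

with open("trace.json", "w") as f:
    f.write(trace_to_json(events))
```

The resulting `trace.json` can then be opened in a viewer that understands the format.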

Surprisingly, TLS (thread-local storage) really shines there, since you can collect the JSON trace in a thread-local manner and concatenate the output at the end. However, the reallocs get more and more expensive as time goes by, and the profile size balloons easily.
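
The thread-local idea can be sketched roughly as follows, again in Python and not as a copy of the dplug implementation. Each thread appends to its own buffer, and the buffers are only merged when the trace is dumped. All names here (`zone`, `dump_trace`, `worker`, and the private helpers) are hypothetical, and the sketch stores dictionaries where a real implementation might build the JSON text directly.

```python
import json
import threading
import time
from contextlib import contextmanager

_tls = threading.local()          # per-thread storage for the event buffer
_buffers = []                     # one list per thread, merged at the end
_buffers_lock = threading.Lock()  # only taken on a thread's first event and at dump time

def _thread_buffer():
    buf = getattr(_tls, "events", None)
    if buf is None:
        buf = _tls.events = []
        with _buffers_lock:
            _buffers.append(buf)
    return buf

def _now_us():
    return time.perf_counter_ns() / 1000  # timestamps are in microseconds

@contextmanager
def zone(name):
    """Record a begin ("B") and an end ("E") event around the enclosed code."""
    buf = _thread_buffer()
    tid = threading.get_ident()
    buf.append({"name": name, "ph": "B", "ts": _now_us(), "pid": 1, "tid": tid})
    try:
        yield
    finally:
        buf.append({"name": name, "ph": "E", "ts": _now_us(), "pid": 1, "tid": tid})

def dump_trace():
    """Concatenate every thread's events into one trace document."""
    with _buffers_lock:
        events = [event for buf in _buffers for event in buf]
    return json.dumps({"traceEvents": events})

def worker(label):
    with zone(label):
        time.sleep(0.01)  # stand-in for real work

threads = [threading.Thread(target=worker, args=(f"load {i}",)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
trace_json = dump_trace()
```

Recording an event never waits on another thread, which is presumably why the synchronization cost stays low; the price is that every buffer keeps growing until the dump.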

All in all, I think explicit frame profiling like that is a valuable alternative to either a sampling or an instrumentation profiler. At least you can finally visualize parallelism and how much of it is synchronization.

Profiler implementation in dplug:gui => https://github.com/AuburnSounds/Dplug/blob/master/gui/dplug/gui/profiler.d (I haven't tested it outside Windows for now, and I was surprised that the synchronization stuff was relatively lightweight). It would take only a little work to extract it from its library.

December 17, 2022

On Thursday, 15 December 2022 at 14:14:01 UTC, Guillaume Piolat wrote:

> tl;dr Simply generates a JSON that follows the Trace Event Format.
>
> [...]

What are your thoughts about that? Do you think it is worth it? Or is the proposal totally different? I have been using AMD uProf and I have been getting good results with it.

December 18, 2022

On Saturday, 17 December 2022 at 10:24:29 UTC, Hipreme wrote:

> What are your thoughts about that? Do you think it is worth it? Or is the proposal totally different? I have been using AMD uProf and I have been getting good results with it.

I think sampling profilers are good for finding places where you spend CPU time, and "frame" profiling is good for finding parallelization opportunities and latency improvements.

December 18, 2022

On Sunday, 18 December 2022 at 14:25:45 UTC, Guillaume Piolat wrote:

> On Saturday, 17 December 2022 at 10:24:29 UTC, Hipreme wrote:
>> What are your thoughts about that? Do you think it is worth it? Or is the proposal totally different? I have been using AMD uProf and I have been getting good results with it.
>
> I think sampling profilers are good for finding places where you spend CPU time, and "frame" profiling is good for finding parallelization opportunities and latency improvements.

When could "frame profiling" show you a parallelization opportunity? I'm thinking about how I could apply that in my context.

December 18, 2022

On Sunday, 18 December 2022 at 14:59:12 UTC, Hipreme wrote:

> When could "frame profiling" show you a parallelization opportunity? I'm thinking about how I could apply that in my context.

For example, in the image I posted, the trace made me realize something I had never thought of before: the first draw of the background widget could load two images at the same time, which would shorten the time to first open.
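
As a rough illustration of that kind of change (a Python sketch, not the actual widget code), two loads that used to run back to back can be handed to a small thread pool. `load_image` is a hypothetical stand-in that just sleeps, and the file names are invented.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def load_image(path):
    """Hypothetical stand-in for a real decoder: waits, then returns fake data."""
    time.sleep(0.05)
    return f"pixels of {path}"

def load_sequential(paths):
    # What the trace would show as two zones one after the other.
    return [load_image(p) for p in paths]

def load_parallel(paths):
    # Same work, but the two zones can now overlap on separate threads.
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        return list(pool.map(load_image, paths))

images = load_parallel(["background.png", "knob.png"])
```

Whether this saves time in practice depends on the loads really being able to run concurrently, which is exactly what the trace lets you check afterwards.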

When you program in CUDA, it's very similar with the NVIDIA profiler, and it becomes much easier to optimize for the bottleneck.