tl;dr: the profiler simply generates a JSON file that follows the Trace Event Format.
The Trace Event Format is a simple JSON format that web-based trace viewers can read.
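For readers who haven't seen the format, here is a minimal sketch in Python (not the dplug code) of what such a trace can look like. It uses "complete" events (`"ph": "X"`) with timestamps in microseconds; the helper `complete_event` and the span names are made up for illustration.

```python
import json

def complete_event(name, start_us, duration_us, pid=1, tid=1):
    """Build one 'complete' event (ph = "X"): a named span with a start and a duration."""
    return {
        "name": name,         # label shown in the viewer
        "ph": "X",            # phase: "X" means a complete event (start + duration)
        "ts": start_us,       # start timestamp, in microseconds
        "dur": duration_us,   # duration, in microseconds
        "pid": pid,           # process id (arbitrary here)
        "tid": tid,           # thread id: each tid gets its own row in the viewer
    }

# Hypothetical spans for one frame: two on thread 1, one on thread 2.
events = [
    complete_event("frame", 0, 16000),
    complete_event("draw widgets", 1000, 9000),
    complete_event("wait for workers", 10000, 5000, tid=2),
]

# The "JSON object" flavor of the format wraps the events in "traceEvents".
trace_json = json.dumps({"traceEvents": events}, indent=2)
```

Writing `trace_json` to a file should give something a trace viewer can load, with one row per `tid`.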
That makes it very easy to get a visual timeline of your instrumented program.
Surprisingly, TLS (thread-local storage) really shines here: each thread collects its part of the JSON trace in a thread-local buffer, and the buffers are concatenated at the end. The reallocs do get more and more expensive as the buffers grow, though, and the profile size balloons easily.
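To make the thread-local idea concrete, here is a rough sketch, again in Python rather than D and not taken from the dplug implementation. The `Profiler` class and its method names are hypothetical. Each thread appends JSON fragments to its own buffer without locking; the lock is only taken once per thread, when its buffer is registered, and once more when the output is concatenated.

```python
import json
import threading
import time

class Profiler:
    """Hypothetical explicit-frame profiler with one event buffer per thread."""

    def __init__(self):
        self._local = threading.local()   # holds the calling thread's buffer
        self._buffers = []                # every thread's buffer, for the final merge
        self._lock = threading.Lock()     # guards _buffers only, never the hot path
        self._origin = time.perf_counter()

    def _buffer(self):
        buf = getattr(self._local, "buf", None)
        if buf is None:
            # First event on this thread: create and register its buffer.
            buf = []
            self._local.buf = buf
            with self._lock:
                self._buffers.append(buf)
        return buf

    def _emit(self, name, phase):
        event = {
            "name": name,
            "ph": phase,  # "B" = begin, "E" = end
            "ts": int((time.perf_counter() - self._origin) * 1_000_000),
            "pid": 1,
            "tid": threading.get_ident(),
        }
        # Serialize right away and append to the thread's own list: no locking.
        self._buffer().append(json.dumps(event))

    def begin(self, name):
        self._emit(name, "B")

    def end(self, name):
        self._emit(name, "E")

    def to_json(self):
        """Concatenate all per-thread fragments into one JSON array.
        Call this once the profiled threads have finished."""
        with self._lock:
            fragments = [frag for buf in self._buffers for frag in buf]
        return "[" + ",\n".join(fragments) + "]"
```

Usage is just `profiler.begin("draw")` / `profiler.end("draw")` around the code of interest, then `profiler.to_json()` at the end. Python lists grow with amortized cost, so this sketch hides the realloc problem mentioned above; with a manually managed growing buffer you would want to reserve capacity up front or chain fixed-size chunks.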
All in all, I think explicit frame profiling like this is a valuable alternative to both sampling and instrumenting profilers. At the very least, you can finally visualize parallelism and see how much of the time goes to synchronization.
Profiler implementation in dplug:gui => https://github.com/AuburnSounds/Dplug/blob/master/gui/dplug/gui/profiler.d (I haven't tested it outside Windows so far... I was surprised that the synchronization stuff was relatively lightweight). It would only take a small amount of work to extract it from the library.