Optimization fun (page 3)

On Fri, Nov 07, 2014 at 05:31:44PM -0800, Walter Bright via Digitalmars-d wrote:
> On 11/7/2014 4:41 PM, H. S. Teoh via Digitalmars-d wrote:
[...]
> >But speaking of which, I found dmd -profile's output in trace.log a
> >little difficult to understand because of the lack of
> >self-documenting headers in the first section. gprof, for example,
> >produces nicely-labelled output that describes what the data means.
> >Would you accept a PR along these lines?
>
> I don't really know what you have in mind, so I'll have to say "it
> depends".  Keep in mind that subsequent runs of the profiler parse the
> previous output file and add it to the results.

Just add a simple header that describes what each column means.

> >Also, in the second section, I'm getting some negative numbers for
> >the call times, which seem to indicate integer overflow? This
> >basically destroys the usefulness of dmd -profile for my test cases,
> >since most of the interesting cases for me are those with long
> >running times, which for sure will run into integer overflow at the
> >sample rate that dmd -profile is using, causing the second section to
> >be essentially randomly sorted.
>
> Yeah, a negative number is likely an overflow. Reduce the test case!

Unfortunately, I can't. The behaviour I'm trying to profile is the
long-term running case, because there's a whole bunch of setup at the
beginning that initially takes a while, but eventually it's the
long-running part that dominates the runtime behaviour. Reducing the
test case will only increase the initial noise, which I'm not interested
in, and reduce the running time of the main loop of interest. Besides,
what I'm trying to optimize for is the large, complex case; the simple
cases already run fast enough that the performance characteristics are
not that interesting to me. It's the long-term part that's interesting
because that's when the GC starts kicking in and pressure on system RAM
starts to increase.

I'm surprised that dmd's profiler can't even handle something that only
runs for 7-8 seconds or so! gprof certainly has no such limitation.

Is it relatively simple to make dmd -profile use larger integer widths
for profiling? If not, I'm afraid I'm just gonna have to stick with
gdc/gprof instead.

T

-- 
Your inconsistency is the only consistent thing about you! -- KD

On 11/7/2014 5:51 PM, H. S. Teoh via Digitalmars-d wrote:
> I'm surprised that dmd's profiler can't even handle something that only
> runs for 7-8 seconds or so!

It's based on a design I wrote decades ago, when machines were a lot slower.

> Is it relatively simple to make dmd -profile use larger integer widths
> for profiling?

I don't know. Haven't looked at it for a while.
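
Editorial aside: the overflow discussed in this exchange can be illustrated with a little arithmetic. The thread does not say which counter width or tick rate dmd -profile actually uses, so the 32-bit signed counter and the 1 GHz tick source below are assumptions chosen purely for illustration, and the helper names are hypothetical. The sketch only shows why a narrow tick counter can turn negative within a few seconds and why a wider one would not.

```python
def wrap_signed(value: int, bits: int = 32) -> int:
    """Interpret value as a two's-complement signed integer of the given width."""
    mask = (1 << bits) - 1
    value &= mask                      # keep only the low `bits` bits
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


def seconds_until_overflow(ticks_per_second: int, bits: int = 32) -> float:
    """Time until a signed counter of the given width first wraps negative."""
    return (1 << (bits - 1)) / ticks_per_second


# Assumption: a 1 GHz tick source; not taken from dmd's profiler.
TICKS_PER_SECOND = 1_000_000_000
elapsed_ticks = 8 * TICKS_PER_SECOND   # an 8-second run, as in the thread

print(seconds_until_overflow(TICKS_PER_SECOND, 32))  # a little over 2 seconds
print(wrap_signed(elapsed_ticks, 32))                # wrapped, negative value
print(wrap_signed(elapsed_ticks, 64))                # 8000000000, no wrap
```

Under these assumed numbers an 8-second run wraps a 32-bit counter more than once, which would also explain a seemingly random sort order, while a 64-bit counter at the same rate would take centuries to wrap.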

November 08, 2014

Re: Optimization fun

Posted by Kiith-Sa
in reply to H. S. Teoh

On Saturday, 8 November 2014 at 01:53:33 UTC, H. S. Teoh via Digitalmars-d wrote:
> On Fri, Nov 07, 2014 at 05:31:44PM -0800, Walter Bright via Digitalmars-d wrote:
>> On 11/7/2014 4:41 PM, H. S. Teoh via Digitalmars-d wrote:
> [...]
>> >But speaking of which, I found dmd -profile's output in trace.log a
>> >little difficult to understand because of the lack of
>> >self-documenting headers in the first section. gprof, for example,
>> >produces nicely-labelled output that describes what the data means.
>> >Would you accept a PR along these lines?
>> 
>> I don't really know what you have in mind, so I'll have to say "it
>> depends".  Keep in mind that subsequent runs of the profiler parse the
>> previous output file and add it to the results.
>
> Just add a simple header that describes what each column means.
>
>
>> >Also, in the second section, I'm getting some negative numbers for
>> >the call times, which seem to indicate integer overflow? This
>> >basically destroys the usefulness of dmd -profile for my test cases,
>> >since most of the interesting cases for me are those with long
>> >running times, which for sure will run into integer overflow at the
>> >sample rate that dmd -profile is using, causing the second section to
>> >be essentially randomly sorted.
>> 
>> Yeah, a negative number is likely an overflow. Reduce the test case!
>
> Unfortunately, I can't. The behaviour I'm trying to profile is the
> long-term running case, because there's a whole bunch of setup at the
> beginning that initially takes a while, but eventually it's the
> long-running part that dominates the runtime behaviour. Reducing the
> test case will only increase the initial noise, which I'm not interested
> in, and reduce the running time of the main loop of interest. Besides,
> what I'm trying to optimize for is the large, complex case; the simple
> cases already run fast enough that the performance characteristics are
> not that interesting to me. It's the long-term part that's interesting
> because that's when the GC starts kicking in and pressure on system RAM
> starts to increase.
>
> I'm surprised that dmd's profiler can't even handle something that only
> runs for 7-8 seconds or so! gprof certainly has no such limitation.
>
> Is it relatively simple to make dmd -profile use larger integer widths
> for profiling? If not, I'm afraid I'm just gonna have to stick with
> gdc/gprof instead.
>
>
> T

Except for very specific cases, neither gprof nor DMD's profiler is good for profiling. If you're on Linux, you have perf, which works well with D and is way ahead of anything the DMD profiler will be able to do *after* man-years of further development.

See (shameless advertisement):

http://defenestrate.eu/_static/profiling-slides/index.html#22

(especially perf record/perf report)

08-Nov-2014 03:22, Walter Bright wrote:
> On 11/7/2014 2:58 PM, Dmitry Olshansky wrote:
>> That's the problem with profilers:
>> they say what takes time but not why :)
>>
>> Often I find myself looking at memcpy at the top of the list, so
>> obvious the "textbook" answer is to optimize memcpy ;) In contrast it
>> should be read as "you seem to do excessive copying of data".
>
> dmd's profiler will give you a tree showing who called each function how
> many times. This helps a lot with the "why" question.
>

This information is very limited, especially in the case of recursive functions; it's still more "what" than "why". "Why" always takes a bit of research.

I'd also suggest using many different profilers.

-- 
Dmitry Olshansky
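
Editorial aside: the "who called each function how many times" idea from this exchange can be sketched in a few lines. This is not dmd's trace.log format or its implementation; the call events and function names (`parseLine`, `cloneBoard`, `solve`) are made up for illustration. The point is only that a per-caller breakdown turns "memcpy is at the top of the list" into "this particular caller copies too much".

```python
from collections import Counter, defaultdict


def fan_in(call_events):
    """Map each callee to a Counter of how often each caller invoked it."""
    callers = defaultdict(Counter)
    for caller, callee in call_events:
        callers[callee][caller] += 1
    return callers


# Hypothetical (caller, callee) events; not real profiler output.
events = (
    [("parseLine", "memcpy")] * 3
    + [("cloneBoard", "memcpy")] * 40
    + [("main", "parseLine")] * 3
    + [("solve", "cloneBoard")] * 40
)

callers = fan_in(events)

# A flat profile would only report 43 memcpy calls in total;
# the per-caller view shows where they come from.
for caller, count in callers["memcpy"].most_common():
    print(f"{caller} -> memcpy: {count} calls")
```

As Dmitry notes, this is still closer to "what" than "why": with recursion the caller and callee can be the same function, and the counts alone say nothing about whether the copying is avoidable.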

On Sat, Nov 08, 2014 at 02:29:25AM +0000, Kiith-Sa via Digitalmars-d wrote:
[...]
> Except for very specific cases, neither gprof nor DMD's profiler is
> good for profiling. If you're on Linux, you have perf, which works well
> with D and is way ahead of anything the DMD profiler will be able to do
> *after* man-years of further development.
>
> See (shameless advertisement):
>
> http://defenestrate.eu/_static/profiling-slides/index.html#22
>
> (especially perf record/perf report)

The slides mention not to use gprof, but never explain why?

Also, the recommendations are kinda not what I'm looking for; I'm not looking so much for interactive performance (trying to optimize branch predictions, etc.), because I'm anticipating that in the future, most of the problems I'll be running will be I/O bound because the size of the problems will *far* exceed available RAM. Mispredicted branches or L1/L2 cache misses are kinda irrelevant in this scenario. Or at least, they're not as important as they're made out to be, since what will matter more is how to minimize disk seeks and how to keep frequently-accessed cached disk pages in the CPU cache. So it's still somewhat relevant, just not as much as other applications like interactive games.

T

-- 
Windows: the ultimate triumph of marketing over technology. -- Adrian von Bidder
