March 22, 2013 Improve performance of -profile by factor of 10

http://d.puremagic.com/issues/show_bug.cgi?id=9787

This can be a fun little project, with a nice payoff. Any takers?

March 25, 2013 Re: Improve performance of -profile by factor of 10

Posted in reply to Walter Bright

On Mar 22, 2013, at 3:39 PM, Walter Bright <newshound2@digitalmars.com> wrote:
> http://d.puremagic.com/issues/show_bug.cgi?id=9787
>
> This can be a fun little project, with a nice payoff. Any takers?
Bonus points if the code is made multithread-capable. When I was thinking about this before, the correct approach seemed to be to track profile data on a per-thread basis and then merge the results into the final totals on thread termination.
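A minimal sketch of the per-thread scheme, written in Python rather than as D runtime code, so treat it as an illustration of the data flow only. All names here (`profiled`, `merge_thread_results`, `start_profiled_thread`) are hypothetical. The hot path touches only thread-local tables, and the shared lock is taken once per thread, when that thread finishes.

```python
import functools
import threading
import time
from collections import defaultdict

# Shared results, written only when a profiled thread finishes.
final_calls = defaultdict(int)
final_seconds = defaultdict(float)
_final_lock = threading.Lock()

# Per-thread tables: recording a call never takes a lock.
_tls = threading.local()


def _thread_tables():
    if not hasattr(_tls, "calls"):
        _tls.calls = defaultdict(int)
        _tls.seconds = defaultdict(float)
    return _tls.calls, _tls.seconds


def profiled(fn):
    """Count calls and inclusive wall-clock time in the calling thread's table."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            calls, seconds = _thread_tables()
            calls[fn.__name__] += 1
            seconds[fn.__name__] += time.perf_counter() - start
    return wrapper


def merge_thread_results():
    """Fold this thread's tables into the shared totals (once, at thread exit)."""
    calls, seconds = _thread_tables()
    with _final_lock:
        for name, n in calls.items():
            final_calls[name] += n
            final_seconds[name] += seconds[name]
    calls.clear()
    seconds.clear()


def start_profiled_thread(target, *args):
    """Hypothetical helper: run target in a thread, merging its data when it ends."""
    def body():
        try:
            target(*args)
        finally:
            merge_thread_results()
    t = threading.Thread(target=body)
    t.start()
    return t


# Usage: four threads each call square() 1000 times.
@profiled
def square(x):
    return x * x


def worker(n):
    for i in range(n):
        square(i)


threads = [start_profiled_thread(worker, 1000) for _ in range(4)]
for t in threads:
    t.join()
print(final_calls["square"])  # 4000
```

Because merging happens only at thread exit, the shared totals are complete only after every profiled thread has been joined.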

March 25, 2013 Re: Improve performance of -profile by factor of 10

Posted in reply to Walter Bright

On Friday, 22 March 2013 at 22:39:56 UTC, Walter Bright wrote:
> http://d.puremagic.com/issues/show_bug.cgi?id=9787
>
> This can be a fun little project, with a nice payoff. Any takers?

This is a bit off-topic, but: a polling profiler would be more precise and efficient than an instrumenting profiler.

A polling profiler simply pauses the program thread periodically, records its state, and resumes it. The advantage is that the execution times of small functions are not skewed by the overhead added by instrumentation. A polling profiler runs mostly in its own thread, so it has a smaller impact on the main program thread. A polling profiler is also capable of measuring performance down to the level of individual CPU instructions.

The disadvantages of a polling profiler are:

1. The program must run for a considerable amount of time, so that the profiler gathers enough samples to build a good picture of the program's performance;
2. As a consequence of the above, functions that execute quickly or are called relatively rarely may not appear in the profiler's output at all;
3. Stack frames must be enabled, to be able to collect call stacks.

On Windows, I've had good success with compiling D programs with -g, converting their debug information using cv2pdb[1], then profiling them using Very Sleepy (I have a fork of it with some enhancements[2]).

[1]: http://dsource.org/projects/cv2pdb
[2]: http://blog.thecybershadow.net/2013/01/11/very-sleepy-fork/
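To make the polling idea concrete, here is a small sampling profiler in Python. It assumes CPython: `sys._current_frames()` is a CPython-specific (though documented) function that returns each thread's current frame, and with the GIL the sampled thread cannot run while the sampler holds it, which stands in for "pause, record, resume". It samples Python functions, not machine instructions, so it does not reach instruction-level resolution. `heavy`, `light` and `program` are made-up workloads.

```python
import sys
import threading
import time
from collections import Counter


def _sampler(target_ident, interval, stop, counts):
    """Periodically look at another thread's innermost frame and tally it."""
    while not stop.is_set():
        frame = sys._current_frames().get(target_ident)
        if frame is not None:
            counts[frame.f_code.co_name] += 1
        time.sleep(interval)


def sample_profile(fn, interval=0.001):
    """Run fn() in the calling thread while a second thread samples it."""
    counts = Counter()
    stop = threading.Event()
    sampler = threading.Thread(
        target=_sampler,
        args=(threading.get_ident(), interval, stop, counts),
        daemon=True,
    )
    sampler.start()
    try:
        fn()
    finally:
        stop.set()
        sampler.join()
    return counts


def heavy():
    end = time.perf_counter() + 0.3   # busy for about 300 ms
    while time.perf_counter() < end:
        pass


def light():
    end = time.perf_counter() + 0.05  # busy for about 50 ms
    while time.perf_counter() < end:
        pass


def program():
    heavy()
    light()


counts = sample_profile(program)
print(counts.most_common(2))
```

The counts are proportional to time spent, not to the number of calls, and `light` would drop out of the output entirely if it ran for less than a few sampling intervals, which is disadvantage 2 in practice.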

March 25, 2013 Re: Improve performance of -profile by factor of 10

Posted in reply to Vladimir Panteleev

On 3/25/2013 4:22 PM, Vladimir Panteleev wrote:
> The disadvantages of a polling profiler are:
4. not getting the fan in / fan out data.
5. requires non-trivial effort in getting it to work on each platform.

March 26, 2013 Re: Improve performance of -profile by factor of 10

Posted in reply to Walter Bright

On Monday, 25 March 2013 at 23:52:26 UTC, Walter Bright wrote:
> On 3/25/2013 4:22 PM, Vladimir Panteleev wrote:
>> The disadvantages of a polling profiler are:
>
> 4. not getting the fan in / fan out data.
It is assembled from collected stack frames (assuming I understood the term correctly).
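A sketch of assembling fan in / fan out from sampled stacks, assuming each sample has already been turned into a tuple of function names ordered outermost to innermost (the sample data is invented). Note what the numbers mean: they count samples along each caller-to-callee edge, i.e. a share of time, not a number of calls.

```python
from collections import Counter


def edges_from_samples(stacks):
    """Count caller -> callee pairs across sampled stacks (outermost first)."""
    edges = Counter()
    for stack in stacks:
        for caller, callee in zip(stack, stack[1:]):
            edges[(caller, callee)] += 1
    return edges


def fan_in(edges, name):
    """Which functions were observed calling `name`, weighted by samples."""
    return {caller: n for (caller, callee), n in edges.items() if callee == name}


def fan_out(edges, name):
    """Which functions `name` was observed calling, weighted by samples."""
    return {callee: n for (caller, callee), n in edges.items() if caller == name}


samples = [
    ("main", "render", "expensive"),
    ("main", "render", "expensive"),
    ("main", "render", "expensive"),
    ("main", "layout", "expensive"),
    ("main", "layout"),
]
edges = edges_from_samples(samples)
print(fan_in(edges, "expensive"))  # {'render': 3, 'layout': 1}
print(fan_out(edges, "main"))      # {'render': 3, 'layout': 2}
```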

March 26, 2013 Re: Improve performance of -profile by factor of 10

Posted in reply to Vladimir Panteleev

On 3/25/2013 7:26 PM, Vladimir Panteleev wrote:
> On Monday, 25 March 2013 at 23:52:26 UTC, Walter Bright wrote:
>> On 3/25/2013 4:22 PM, Vladimir Panteleev wrote:
>>> The disadvantages of a polling profiler are:
>>
>> 4. not getting the fan in / fan out data.
>
> It is assembled from collected stack frames (assuming I understood the term
> correctly).
While you can get the caller (after all, debuggers do it), it can be arbitrarily costly (in terms of execution speed) to do so, which can negate many of the advantages of a probing profiler. The ones I've seen didn't bother to do it.
Fan in/out is very useful because the most effective optimization is to not call the time consuming functions, and this path information enables you to figure out where you don't really need to call it.

March 26, 2013 Re: Improve performance of -profile by factor of 10

Posted in reply to Walter Bright

On Tuesday, 26 March 2013 at 02:57:07 UTC, Walter Bright wrote:
> While you can get the caller (after all, debuggers do it), it can be arbitrarily costly (in terms of execution speed) to do so, which can negate many of the advantages of a probing profiler.

What? You just read the value EBP is pointing at, or something like that. Walking the call stack is basically walking a linked list.

> The ones I've seen didn't bother to do it.

Maybe they just weren't very good profilers ;) I've tried a few before I found Very Sleepy.

> Fan in/out is very useful because the most effective optimization is to not call the time consuming functions, and this path information enables you to figure out where you don't really need to call it.

Who's arguing that?
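A sketch of the linked-list walk described above, run against a simulated stack rather than a real one (reading another thread's live stack is platform-specific). It assumes the classic x86-64 frame layout created by `push rbp; mov rbp, rsp`: the caller's saved frame pointer sits at the frame base, and the return address in the next 8-byte word. The addresses are invented offsets into a buffer.

```python
import struct


def walk_frame_pointers(stack, fp, max_depth=64):
    """Follow a simulated chain of saved frame pointers.

    Assumed layout at each frame base fp, two little-endian 8-byte words:
      stack[fp]     -> caller's saved frame pointer (0 marks the outermost frame)
      stack[fp + 8] -> return address into the caller
    Addresses here are just offsets into the `stack` buffer.
    """
    return_addresses = []
    while fp != 0 and len(return_addresses) < max_depth:
        saved_fp, ret = struct.unpack_from("<QQ", stack, fp)
        return_addresses.append(ret)
        # The stack grows down, so callers live at higher addresses. A saved
        # pointer that does not move upward means the chain is broken
        # (e.g. a function compiled without a frame pointer): stop here.
        if saved_fp != 0 and saved_fp <= fp:
            break
        fp = saved_fp
    return return_addresses


# A fake stack with three frames: 0x10 -> 0x20 -> 0x30 -> end (0).
stack = bytearray(0x40)
struct.pack_into("<QQ", stack, 0x10, 0x20, 0x401000)
struct.pack_into("<QQ", stack, 0x20, 0x30, 0x402000)
struct.pack_into("<QQ", stack, 0x30, 0x00, 0x403000)

print([hex(a) for a in walk_frame_pointers(stack, 0x10)])
# ['0x401000', '0x402000', '0x403000']
```

The `saved_fp <= fp` check is where the first objection in the reply shows up in practice: a function compiled without a frame pointer leaves no usable link, and the walk must stop or fall back to another unwinding method, such as the table-driven unwinding used on Win64.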

March 26, 2013 Re: Improve performance of -profile by factor of 10

Posted in reply to Vladimir Panteleev

On 3/25/2013 8:01 PM, Vladimir Panteleev wrote:
> On Tuesday, 26 March 2013 at 02:57:07 UTC, Walter Bright wrote:
>> While you can get the caller (after all, debuggers do it), it can be
>> arbitrarily costly (in terms of execution speed) to do so, which can negate
>> many of the advantages of a probing profiler.
>
> What? You just read the value EBP is pointing at, or something like that.
> Walking the call stack is basically walking a linked list.

If only it were that simple.

1. many stack frames do not have an EBP

2. the stack frames on Win64 require doing a bunch of table searches to figure out - they don't use EBP

3. even when you find the return address, then it's a costly process to figure out what function that address belongs in

>> The ones I've seen didn't bother to do it.
>
> Maybe they just weren't very good profilers ;) I've tried a few before I found
> Very Sleepy.
>
>> Fan in/out is very useful because the most effective optimization is to not
>> call the time consuming functions, and this path information enables you to
>> figure out where you don't really need to call it.
>
> Who's arguing that?

Just wanted to point out how useful it is!
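Point 3 is usually handled with a table of function start addresses sorted once (taken from the symbol table or debug information), so that each return address costs one binary search. The sketch below shows only the lookup; the addresses and names are invented, and loading the table from real debug information, often the expensive part, is not shown. Many sampling profilers record raw addresses while sampling and resolve them afterwards, off the hot path.

```python
import bisect


class SymbolTable:
    """Map code addresses to function names by binary search over start addresses."""

    def __init__(self, symbols):
        # symbols: iterable of (start_address, size_in_bytes, name), non-overlapping
        self._symbols = sorted(symbols)
        self._starts = [start for start, _, _ in self._symbols]

    def lookup(self, address):
        """Return the name of the function containing address, or None."""
        i = bisect.bisect_right(self._starts, address) - 1
        if i < 0:
            return None
        start, size, name = self._symbols[i]
        return name if address < start + size else None


table = SymbolTable([
    (0x401000, 0x80, "main"),
    (0x4010C0, 0x100, "expensive"),
    (0x401080, 0x40, "render"),
])
print(table.lookup(0x4010D4))  # expensive
print(table.lookup(0x401090))  # render
print(table.lookup(0x500000))  # None
```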

March 26, 2013 Re: Improve performance of -profile by factor of 10

Posted in reply to Walter Bright

On Tuesday, 26 March 2013 at 05:01:03 UTC, Walter Bright wrote:
> On 3/25/2013 8:01 PM, Vladimir Panteleev wrote:
>> On Tuesday, 26 March 2013 at 02:57:07 UTC, Walter Bright wrote:
>>> While you can get the caller (after all, debuggers do it), it can be
>>> arbitrarily costly (in terms of execution speed) to do so, which can negate
>>> many of the advantages of a probing profiler.
>>
>> What? You just read the value EBP is pointing at, or something like that.
>> Walking the call stack is basically walking a linked list.
>
> If only it were that simple.
>
> 1. many stack frames do not have an EBP
>
> 2. the stack frames on Win64 require doing a bunch of table searches to figure out - they don't use EBP
>
> 3. even when you find the return address, then it's a costly process to figure out what function that address belongs in
>
You can still stop the thread, gather the data you are interested in, and do the rest of the processing after the application has resumed, which leverages concurrency.
The obvious advantage is that you don't measure the profiler's performance in addition to your app's.
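A minimal sketch of that split, assuming the platform-specific suspend/capture/resume step has already produced raw samples (tuples of return addresses, invented here). The profiled side pays only for a `put()`; tallying, and in a real profiler symbol lookup, happens on a separate consumer thread.

```python
import queue
import threading
from collections import Counter


def aggregate(q, totals):
    """Consumer thread: does the slow part (here just tallying stacks) off the profiled thread."""
    while True:
        stack = q.get()
        if stack is None:  # sentinel: no more samples
            return
        totals[stack] += 1


def run_pipeline(raw_samples):
    """Producer side: each sample costs one put(); processing happens concurrently."""
    q = queue.Queue()  # unbounded, so memory grows if the consumer falls behind
    totals = Counter()
    consumer = threading.Thread(target=aggregate, args=(q, totals))
    consumer.start()
    for stack in raw_samples:
        q.put(stack)
    q.put(None)
    consumer.join()
    return totals


# Raw samples: tuples of return addresses, as if captured while the target was paused.
raw = [(0x401000, 0x402000)] * 600 + [(0x401000, 0x403000)] * 400
totals = run_pipeline(raw)
print(totals[(0x401000, 0x402000)], totals[(0x401000, 0x403000)])  # 600 400
```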
BTW, I want to raise an issue with fibers. We should report 2 stacks: the stack of function calls and the stack of fiber calls.

March 26, 2013 Re: Improve performance of -profile by factor of 10

Posted in reply to deadalnix

On 3/25/2013 10:16 PM, deadalnix wrote:
> You can still stop the thread, gather the data you are interested in, and do
> the rest of the processing after the application has resumed, which leverages concurrency.
The obvious difficulty with that is when the app is posting data to the profiling thread faster than the latter can process it.
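One common answer to that difficulty is a bounded buffer between the two sides. When it is full, the profiler either blocks the sampled thread, which brings the overhead back, or drops samples and reports how many were lost. The sketch below (hypothetical class name, single-threaded for clarity) takes the second option.

```python
class SampleBuffer:
    """Fixed-capacity FIFO ring buffer that drops (and counts) samples when full.

    Single-threaded sketch: a real profiler sharing this between the sampling
    side and the processing thread would need a lock or a lock-free
    single-producer/single-consumer design.
    """

    def __init__(self, capacity):
        self._slots = [None] * capacity
        self._head = 0    # index of the oldest unread sample
        self._size = 0    # number of unread samples
        self.dropped = 0  # samples discarded because the consumer fell behind

    def push(self, sample):
        """Append a sample; return False (and count it) if the buffer is full."""
        if self._size == len(self._slots):
            self.dropped += 1
            return False
        self._slots[(self._head + self._size) % len(self._slots)] = sample
        self._size += 1
        return True

    def pop(self):
        """Remove and return the oldest sample, or None if the buffer is empty."""
        if self._size == 0:
            return None
        sample = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % len(self._slots)
        self._size -= 1
        return sample


buf = SampleBuffer(4)
accepted = [buf.push(i) for i in range(6)]
print(accepted)              # [True, True, True, True, False, False]
print(buf.dropped)           # 2
print(buf.pop(), buf.pop())  # 0 1
```

Counting drops keeps the final report honest: if `dropped` is large relative to the samples kept, the profile is biased and the sampling rate should come down.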
At some point, I'm going to say: feel free to write a better profiler! I only suggest that it be as easy to use as the existing one.
Copyright © 1999-2021 by the D Language Foundation