On 20 September 2012 15:36, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
On 9/20/12 2:42 AM, Manu wrote:
On 19 September 2012 12:38, Peter Alexander
<peter.alexander.au@gmail.com <mailto:peter.alexander.au@gmail.com>> wrote:

        The fastest execution time is rarely useful to me; I'm almost
        always much more interested in the slowest execution time.
        In realtime software, the slowest time is often the only
        important factor; everything must be designed to tolerate
        that possibility.
        I can also imagine other situations where multiple workloads
        are competing for time; the average time may be more useful
        in that case.


    The problem with the slowest time is that you end up with the
    occasional OS hiccup or GC collection which throws the entire
    benchmark off. I see your point, but unless you can prevent the
    OS from interrupting, the time would be meaningless.


So then we need to start getting tricky, and choose the slowest one that
is not beyond an order of magnitude or so outside the average?

The "best way", according to some of the people who've advised my implementation of the framework at Facebook, is to take the mode of the measurement distribution, i.e. the time at maximum density.

I implemented that (and it's not easy). It yielded numbers close to the minimum, but they were less stable and needed more iterations to converge (when they do indeed settle close to the minimum).
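Andrei's actual implementation isn't shown here, but the idea of "the time at maximum density" can be sketched with a simple fixed-width histogram over the timing samples (a minimal illustration in Python; the function name, bin count, and synthetic data are my own assumptions, not the Facebook framework's code):

```python
import random

def mode_estimate(samples, bins=50):
    """Estimate the mode (time at maximum density) of a list of
    timing samples using a simple fixed-width histogram.
    A real implementation would need smarter bin-width selection
    (or kernel density estimation) to be stable."""
    lo, hi = min(samples), max(samples)
    if lo == hi:
        return lo
    width = (hi - lo) / bins
    counts = [0] * bins
    for s in samples:
        i = min(int((s - lo) / width), bins - 1)
        counts[i] += 1
    peak = counts.index(max(counts))
    # report the centre of the densest bin
    return lo + (peak + 0.5) * width

# Synthetic timings: a tight cluster near 1.0 ms plus a few slow
# outliers standing in for OS hiccups / GC pauses.
random.seed(0)
timings = [1.0 + abs(random.gauss(0, 0.02)) for _ in range(1000)]
timings += [1.5 + random.random() for _ in range(20)]
print(mode_estimate(timings))  # lands close to the minimum, ~1.0
```

Note how the estimate sits near the minimum, as Andrei observed, while the outliers barely influence it; the instability he mentions shows up when the bin width interacts badly with the sample spread.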

Let's use the minimum. It is understood it's not what you'll see in production, but it is an excellent proxy for indicative and reproducible performance numbers.
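The minimum-of-N-runs approach Andrei is advocating can be sketched as follows (a hedged illustration; `bench_min` and the run count are my own naming, not std.benchmark's API):

```python
import time

def bench_min(fn, runs=100):
    """Time fn() several times and report the minimum. The rationale:
    scheduler preemption, interrupts, and GC can only ADD time, so the
    smallest observed sample is the most reproducible estimate of the
    code's intrinsic cost."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

print(bench_min(lambda: sum(range(10_000))))
```

The design choice is that noise is strictly additive, which makes the minimum a stable lower bound, though, as the reply below argues, it also measures the warmest-cache case.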

If you do more than a single iteration, the minimum will virtually always be influenced by ideal cache pre-population, which is unrealistic. Memory locality is often the biggest performance hazard in an algorithm, and usually the most unpredictable. I want my measurements to capture that.
Reproducibility is not as important to me as accuracy. And I'd rather be conservative (/pessimistic) with the error.
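The cache-warming concern can be made visible by keeping every per-run timing instead of only the minimum (a rough sketch under assumed conditions: Python, and a workload whose data is large enough to matter for caches):

```python
import time

def measure_each(fn, runs):
    """Return the timing of every run individually. The first
    iteration typically pays cold-cache and warm-up costs that
    reporting only the minimum of later runs hides."""
    out = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        out.append(time.perf_counter() - start)
    return out

data = list(range(1_000_000))
times = measure_each(lambda: sum(data), 5)
# times[0] is often the slowest sample: caches are cold on the
# first pass, which min-of-N deliberately discards.
print(times)
```

Whether the first-run penalty is "noise" to discard or "signal" to keep is exactly the disagreement in this thread.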

What guideline would you apply to estimate 'real-world' time spent when always working with hyper-optimistic measurements?