September 19, 2012
On 2012-09-19 09:58, Jonathan M Davis wrote:

> util is one of the worst package names ever, because it means basically
> nothing. Any function could go in there.

Well, the "util" package in Phobos is called "std".

> As for a time/date package, we already have std.datetime (which will hopefully
> be split into the package std.datetime at some point, but we need something
> like DIP 15 or 16 before we can do that), and we're moving the benchmarking
> _out_ of there. If std.datetime were already a package, then maybe putting it
> in there would make some sense, but benchmarking is arguably fundamentally
> different from what the rest of std.datetime does. I really see no problem with
> benchmarking being its own thing, and std.benchmark works just fine for that.

I just think we have too many top-level modules.


-- 
/Jacob Carlborg
September 19, 2012
On 2012-09-19 11:38, Peter Alexander wrote:

> The problem with slowest is that you end up with the occasional OS
> hiccup or GC collection which throws the entire benchmark off. I see
> your point, but unless you can prevent the OS from interrupting, the
> time would be meaningless.

That's why the average is good to have as well.

-- 
/Jacob Carlborg
September 19, 2012
On Wednesday, 19 September 2012 at 08:28:36 UTC, Manu wrote:
> On 19 September 2012 01:02, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
>
>> On 9/18/12 5:07 PM, "Øivind" wrote:
>>
>>> * For all tests, the best run is selected, but would it not be
>>> reasonable in some cases to get the average value? Maybe excluding the
>>> runs that are more than a couple std. deviations away from the mean
>>> value..
>>>
>>
>> After extensive tests with a variety of aggregate functions, I can say
>> firmly that taking the minimum time is by far the best when it comes to
>> assessing the speed of a function.
>
>
> The fastest execution time is rarely useful to me, I'm almost always much
> more interested in the slowest execution time.
> In realtime software, the slowest time is often the only important factor,
> everything must be designed to tolerate this possibility.
> I can also imagine other situations where multiple workloads are competing
> for time, the average time may be more useful in that case.


For comparison's sake, the Criterion benchmarking package for Haskell is worth a look:

http://www.serpentine.com/blog/2009/09/29/criterion-a-new-benchmarking-library-for-haskell/

Criterion accounts for clock-call costs, displays various central tendencies, and reports outliers along with their significance (i.e. whether the variance is significantly affected by them), among other things. It's a very well conceived benchmarking system, and might well be worth stealing from.
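To make the outlier idea concrete, here is a small D sketch, nothing to do with the actual std.benchmark code (the function name and the cutoff are invented): it averages the runs after dropping samples more than a couple of standard deviations from the mean.

import std.algorithm : filter, map, sum;
import std.array : array;
import std.math : abs, sqrt;

// Average of the runs whose distance from the mean is within k standard
// deviations; everything farther out is treated as an outlier and dropped.
double robustMean(double[] samples, double k = 2.0)
{
    immutable mean = samples.sum / samples.length;
    immutable dev  = sqrt(samples.map!(s => (s - mean) ^^ 2).sum / samples.length);
    auto kept = samples.filter!(s => abs(s - mean) <= k * dev).array;
    return kept.length ? kept.sum / kept.length : mean;
}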

Best,
Graham

September 19, 2012
New question for you :)

To register benchmarks, the 'scheduleForBenchmarking' mixin inserts a shared static initializer into the module. If I have a module A and a module B that both depend on each other, then this will probably not work, will it? The runtime will detect the initialization cycle and fail with the following error:

"Cycle detected between modules with ctors/dtors"

Or am I wrong now?
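To illustrate what I mean (the module names are made up), something like this, where the registration mixin would leave a module constructor in both modules:

// a.d
module a;
import b;                // A depends on B
shared static this() {}  // e.g. what the registration mixin inserts

// b.d
module b;
import a;                // B depends on A
shared static this() {}

// Linking both into one program aborts at startup with:
//   "Cycle detected between modules with ctors/dtors"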
September 20, 2012
On 19 September 2012 12:38, Peter Alexander <peter.alexander.au@gmail.com> wrote:

> The fastest execution time is rarely useful to me, I'm almost always much
>> more interested in the slowest execution time.
>> In realtime software, the slowest time is often the only important factor,
>> everything must be designed to tolerate this possibility.
>> I can also imagine other situations where multiple workloads are competing
>> for time, the average time may be more useful in that case.
>>
>
> The problem with slowest is that you end up with the occasional OS hiccup or GC collection which throws the entire benchmark off. I see your point, but unless you can prevent the OS from interrupting, the time would be meaningless.
>

So then we need to start getting tricky, and choose the slowest one that is not beyond an order of magnitude or so outside the average?
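Something like this, say; just a throwaway D sketch (the name and the 10x cutoff are placeholders, not a proposal for the std.benchmark interface):

import std.algorithm : filter, maxElement, sum;

// Worst sample that is still within `cap` times the mean; a lone OS hiccup
// far outside that range is ignored instead of dominating the result.
double filteredWorst(double[] samples, double cap = 10.0)
{
    immutable mean = samples.sum / samples.length;
    auto plausible = samples.filter!(s => s <= cap * mean);
    return plausible.empty ? mean : plausible.maxElement;
}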


September 20, 2012
On 9/20/12 2:42 AM, Manu wrote:
> On 19 September 2012 12:38, Peter Alexander
> <peter.alexander.au@gmail.com <mailto:peter.alexander.au@gmail.com>> wrote:
>
>         The fastest execution time is rarely useful to me, I'm almost
>         always much
>         more interested in the slowest execution time.
>         In realtime software, the slowest time is often the only
>         important factor,
>         everything must be designed to tolerate this possibility.
>         I can also imagine other situations where multiple workloads are
>         competing
>         for time, the average time may be more useful in that case.
>
>
>     The problem with slowest is that you end up with the occasional OS
>     hiccup or GC collection which throws the entire benchmark off. I see
>     your point, but unless you can prevent the OS from interrupting, the
>     time would be meaningless.
>
>
> So then we need to start getting tricky, and choose the slowest one that
> is not beyond an order of magnitude or so outside the average?

The "best way" according to some of the people who've advised my implementation of the framework at Facebook is to take the mode of the measurements distribution, i.e. the time at the maximum density.

I implemented that (and it's not easy). It yielded numbers close to the minimum, but they were less stable and needed more iterations to settle (at which point they do indeed come close to the minimum).

Let's use the minimum. It is understood it's not what you'll see in production, but it is an excellent proxy for indicative and reproducible performance numbers.
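For reference, a much-simplified sketch of the mode idea (the binning here is hypothetical and far cruder than what the Facebook framework does): bucket the samples into equal-width bins and report the midpoint of the densest bin.

import std.algorithm : maxElement, minElement;

// Crude estimate of the mode: histogram the samples and return the
// midpoint of the fullest bin.
double modeEstimate(double[] samples, size_t bins = 64)
{
    immutable lo = samples.minElement;
    immutable hi = samples.maxElement;
    if (lo == hi)
        return lo;
    immutable width = (hi - lo) / bins;
    auto counts = new size_t[bins];
    foreach (s; samples)
    {
        auto idx = cast(size_t)((s - lo) / width);
        if (idx >= bins)
            idx = bins - 1;   // clamp the largest sample into the last bin
        ++counts[idx];
    }
    size_t best = 0;
    foreach (i, c; counts)
        if (c > counts[best])
            best = i;
    return lo + (best + 0.5) * width;
}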


Andrei
September 20, 2012
On 20 September 2012 15:36, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> On 9/20/12 2:42 AM, Manu wrote:
>
>> On 19 September 2012 12:38, Peter Alexander
>> <peter.alexander.au@gmail.com <mailto:peter.alexander.au@gmail.com>>
>> wrote:
>>
>>         The fastest execution time is rarely useful to me, I'm almost
>>         always much
>>         more interested in the slowest execution time.
>>         In realtime software, the slowest time is often the only
>>         important factor,
>>         everything must be designed to tolerate this possibility.
>>         I can also imagine other situations where multiple workloads are
>>         competing
>>         for time, the average time may be more useful in that case.
>>
>>
>>     The problem with slowest is that you end up with the occasional OS
>>     hiccup or GC collection which throws the entire benchmark off. I see
>>     your point, but unless you can prevent the OS from interrupting, the
>>     time would be meaningless.
>>
>>
>> So then we need to start getting tricky, and choose the slowest one that is not beyond an order of magnitude or so outside the average?
>>
>
> The "best way" according to some of the people who've advised my implementation of the framework at Facebook is to take the mode of the measurements distribution, i.e. the time at the maximum density.
>
> I implemented that (and it's not easy). It yielded numbers close to the
> minimum, but less stable and needing more iterations to become stable (when
> they do get indeed close to the minimum).
>
> Let's use the minimum. It is understood it's not what you'll see in production, but it is an excellent proxy for indicative and reproducible performance numbers.


If you do more than a single iteration, the minimum will virtually always
be influenced by ideal cache pre-population, which is unrealistic. Memory
locality is often the biggest contributing performance hazard in many
algorithms, and usually the most unpredictable. I want to know about that
in my measurements.
Reproducibility is not as important to me as accuracy. And I'd rather be
conservative (pessimistic) with the error.

What guideline would you apply to estimate 'real-world' time spent when always working with hyper-optimistic measurements?
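One candidate guideline, purely as an illustration and not anything std.benchmark currently offers: report a high percentile, say the 90th, which shrugs off the rare OS/GC spike yet still reflects the cold-cache and otherwise unlucky runs that the minimum hides.

import std.algorithm : sort;

// Sorted copy, then pick the sample at the requested quantile
// (p = 0.9 gives the 90th percentile).
double percentile(double[] samples, double p = 0.9)
{
    auto sorted = samples.dup;
    sorted.sort();
    return sorted[cast(size_t)(p * (sorted.length - 1))];
}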


September 20, 2012
On 2012-09-20 14:36, Andrei Alexandrescu wrote:

> Let's use the minimum. It is understood it's not what you'll see in
> production, but it is an excellent proxy for indicative and reproducible
> performance numbers.

Why not min, max and average?

-- 
/Jacob Carlborg
September 20, 2012
On Thursday, 20 September 2012 at 12:35:15 UTC, Andrei Alexandrescu wrote:
> On 9/20/12 2:42 AM, Manu wrote:
>> On 19 September 2012 12:38, Peter Alexander
>> <peter.alexander.au@gmail.com <mailto:peter.alexander.au@gmail.com>> wrote:
>>
>>        The fastest execution time is rarely useful to me, I'm almost
>>        always much
>>        more interested in the slowest execution time.
>>        In realtime software, the slowest time is often the only
>>        important factor,
>>        everything must be designed to tolerate this possibility.
>>        I can also imagine other situations where multiple workloads are
>>        competing
>>        for time, the average time may be more useful in that case.
>>
>>
>>    The problem with slowest is that you end up with the occasional OS
>>    hiccup or GC collection which throws the entire benchmark off. I see
>>    your point, but unless you can prevent the OS from interrupting, the
>>    time would be meaningless.
>>
>>
>> So then we need to start getting tricky, and choose the slowest one that
>> is not beyond an order of magnitude or so outside the average?
>
> The "best way" according to some of the people who've advised my implementation of the framework at Facebook is to take the mode of the measurements distribution, i.e. the time at the maximum density.
>
> I implemented that (and it's not easy). It yielded numbers close to the minimum, but less stable and needing more iterations to become stable (when they do get indeed close to the minimum).
>
> Let's use the minimum. It is understood it's not what you'll see in production, but it is an excellent proxy for indicative and reproducible performance numbers.
>
>
> Andrei

From the responses in the thread, there clearly isn't a single "best way".
There are different use cases with different trade-offs, so why not allow the user to choose the policy best suited to their use case?
I'd suggest providing a few reasonable common choices, as well as a way to supply a user-defined calculation (function pointer/delegate?)
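Roughly what I have in mind, as a hand-wavy D sketch (all names here are invented and this is not the existing std.benchmark interface; timing uses today's std.datetime.stopwatch): the runner collects the raw samples and hands them to whatever reducer the caller picked.

import std.algorithm : minElement, sum;
import std.datetime.stopwatch : AutoStart, StopWatch;

// Run `fun` `trials` times, collect per-run times, and let the caller's
// reducer decide how to boil them down to a single number.
double runBenchmark(Fun, Reducer)(Fun fun, size_t trials, Reducer reduce)
{
    auto samples = new double[trials];
    foreach (i; 0 .. trials)
    {
        auto sw = StopWatch(AutoStart.yes);
        fun();
        samples[i] = sw.peek.total!"hnsecs";
    }
    return reduce(samples);
}

// A couple of stock policies the library could ship with:
double takeMin(double[] s)  { return s.minElement; }
double takeMean(double[] s) { return s.sum / s.length; }

unittest
{
    int dummy;
    auto time = runBenchmark({ dummy += 1; }, 100, &takeMin);
}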
September 20, 2012
On 9/20/12 1:37 PM, Jacob Carlborg wrote:
> On 2012-09-20 14:36, Andrei Alexandrescu wrote:
>
>> Let's use the minimum. It is understood it's not what you'll see in
>> production, but it is an excellent proxy for indicative and reproducible
>> performance numbers.
>
> Why not min, max and average?

Because max and average are misleading and uninformative, as I explained.

Andrei