Review of Andrei's std.benchmark
September 17, 2012
Hi,

it's my pleasure to announce the beginning of the formal review of Andrei's std.benchmark. The review starts today and ends in two weeks, on the 1st of October. It is followed by a week of voting, which ends on the 8th of October.

Quoting Andrei from his request for formal review:
"I reworked the benchmarking framework for backward compatibility,
flexibility, and convenience.

There are a few enhancement possibilities (such as tracking system/user time separately etc), but there is value in keeping things simple and convenient. Right now it really takes only one line of code and observing a simple naming convention to hook a module into the benchmarking framework."

Code: https://github.com/D-Programming-Language/phobos/pull/794
Docs: http://dlang.org/phobos-prerelease/std_benchmark.html
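(For readers skimming the thread, here is a rough sketch of the hookup described above. The benchmark_ name prefix and the uint iteration-count parameter are assumptions drawn from the linked docs and the discussion below, not a verified API.)

module mymodule;

import std.benchmark;

// Assumed convention: functions prefixed with "benchmark_" are picked up
// automatically; the parameter is the number of iterations to run.
void benchmark_append(uint n)
{
    int[] a;
    foreach (i; 0 .. n)
        a ~= 1;
}

// The single line that hooks the module into the benchmarking framework.
mixin(scheduleForBenchmarking);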

If std.benchmark is accepted it will likely lead to a deprecation of std.datetime's benchmark facilities.

The code is provided as a pull request and is (as usual) being run through the
auto tester for Mac OS X, FreeBSD, Linux, and Windows (see
http://d.puremagic.com/test-results/pull-history.ghtml?repoid=3&pullid=794).

In your comments you can/should address the following aspects of the library:
* design
* implementation
* documentation
* usefulness

Provide information regarding the depth (ranging from very brief to in-depth) of your review and conclude explicitly whether std.benchmark should or shouldn't be included in Phobos.

Post all feedback to this thread. Constructive feedback is very much appreciated.

To conclude in more Andrei like words: Happy destruction!

Jens
September 17, 2012
On 9/17/12 5:13 PM, Jens Mueller wrote:
> If std.benchmark is accepted it will likely lead to a deprecation of
> std.datetime's benchmark facilities.

One note - I moved the benchmark-related stuff from std.datetime unmodified into std.benchmark and left public aliases in place, so no code breakage is imminent. We may deprecate the aliases themselves later.

> To conclude in more Andrei like words: Happy destruction!

Sounds about right! :o)


Andrei
September 18, 2012
I think std.benchmark is definitely a useful library addition, but in my mind it is currently a bit too limited.

* All tests are run 1000 times. Depending on the length of the test to benchmark, this can be too much. In some cases it would be good to be able to trade the number of runs against accuracy.

* For all tests, the best run is selected, but would it not be reasonable in some cases to get the average value? Maybe excluding the runs that are more than a couple of std. deviations away from the mean value (a sketch of this is at the end of this post).

* Is there a way of specifying a test name other than the function-name when using the 'mixin(scheduleForBenchmarking)' approach to register benchmarks?

* I would also like to be able (if possible) to register the two things mentioned above (number of runs and result strategy) with the mixin approach (or similar).

* It seems like the baseline for subtraction from subsequent test runs is taken from a call to the test function, passing 1 to it. Shouldn't 0 be passed for this value?

If these can be addressed, I would like it added to the library!
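(A sketch of the outlier-trimmed average suggested above; trimmedMean and the 2-sigma cutoff are made up for illustration and are not part of the proposed module.)

import std.algorithm : filter, map, sum;
import std.array : array;
import std.math : abs, sqrt;

// Average of the samples that lie within k standard deviations of the mean.
double trimmedMean(double[] samples, double k = 2.0)
{
    immutable mean = sum(samples) / samples.length;
    immutable var  = sum(samples.map!(s => (s - mean) * (s - mean))) / samples.length;
    immutable dev  = sqrt(var);
    auto kept = samples.filter!(s => abs(s - mean) <= k * dev).array;
    return sum(kept) / kept.length;
}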

September 18, 2012
On 9/18/12 5:07 PM, "Øivind" wrote:
> I think the std.benchmark is definitely a useful library addition, but
> in my mind it currently a bit too limited.
>
> * All tests are run 1000 times. Depending on the length of the test to
> benchmark, this can be too much. In some cases it would be good to be
> able to trade the number of runs against accuracy.

It would be a good idea to make that a configurable parameter.

> * For all tests, the best run is selected, but would it not be
> reasonable in some cases to get the average value? Maybe excluding the
> runs that are more than a couple std. deviations away from the mean value..

After extensive tests with a variety of aggregate functions, I can say firmly that taking the minimum time is by far the best when it comes to assessing the speed of a function.

> * Is there a way of specifying a test name other than the function-name
> when using the 'mixin(scheduleForBenchmarking)' approach to register
> benchmarks?

Not currently. Probably a manual registration of an individual benchmark would make sense.

> * I would also like to be able (if possible) to register two mentioned
> things (number of runs and result strategy) with the mixin approach (or
> similar).

Makes sense.

> * It seems like the baseline for subtraction from subsequent test runs
> is taken from a call to the test function, passing 1 to it. Shouldn't 0
> be passed for this value?

I'll look into that.


Thanks,

Andrei
September 19, 2012
On 2012-09-17 23:13, Jens Mueller wrote:

> Post all feedback to this thread. Constructive feedback is very much
> appreciated.
>
> To conclude in more Andrei like words: Happy destruction!

* Why is "scheduleForBenchmarking" a string? Can't it be a template mixin?

* What's the most appropriate way of just timing a block of code? Something like this:

auto time = benchmark!({ /* some code */ })(1);

If that's the case then I suggest setting a default value of "1" for the "n" parameter.

* If I want to format the printed result differently, say in HTML, how would I do that? Should I use the "benchmark" function and iterate the BenchmarkResult array?

* BTW why doesn't benchmark return the BenchmarkResult array?

* Is this module so important to keep it as a top level module? I'm thinking something like a utility package or a time/date package. How about std.util.benchmark?

-- 
/Jacob Carlborg
September 19, 2012
On Wednesday, September 19, 2012 09:13:40 Jacob Carlborg wrote:
> * Is this module so important to keep it as a top level module? I'm thinking something like a utility package or a time/date package. How about std.util.benchmark?

util is one of the worst package names ever, because it means basically nothing. Any function could go in there.

As for a time/date package, we already have std.datetime (which will hopefully be split into the package std.datetime at some point, but we need something like DIP 15 or 16 before we can do that), and we're moving the benchmarking _out_ of there. If std.datetime were already a package, then maybe putting it in there would make some sense, but benchmarking is arguably fundamentally different from what the rest of std.datetime does. I really see no problem with benchmarking being its own thing, and std.benchmark works just fine for that.

- Jonathan M Davis
September 19, 2012
I don't see why `benchmark` takes (almost) all of its parameters as template parameters. It looks quite odd, seems unnecessary, and (if I'm not mistaken) makes certain use cases quite difficult.

For example, suppose I want to benchmark a function several times with different parameters and names, how would I do that?

foreach (i; 0..10)
{
    printBenchmark!( format("Test %d", i), { someFunc(i); } )();
}

This won't work because i isn't known at compile time, and for some use cases it can't be known at compile time.

I wouldn't mind if there was some real benefit to taking these as template arguments, but there doesn't seem to be any value at all -- it just limits usage.
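(For what it's worth, the workaround today for run-time names seems to be rolling the timing by hand, e.g. with std.datetime's StopWatch as it exists at the time of this thread. A sketch only; someFunc and the output format are placeholders.)

import std.datetime : StopWatch;
import std.stdio : writefln;

void someFunc(int i) { /* code under test */ }

void main()
{
    foreach (i; 0 .. 10)
    {
        StopWatch sw;
        sw.start();
        someFunc(i);
        sw.stop();
        writefln("Test %d: %s usecs", i, sw.peek().usecs);
    }
}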
September 19, 2012
On Tuesday, 18 September 2012 at 22:01:30 UTC, Andrei Alexandrescu wrote:
> After extensive tests with a variety of aggregate functions, I can say firmly that taking the minimum time is by far the best when it comes to assessing the speed of a function.

What if one tries to benchmark a nondeterministic function? In such a case one might well be interested in the best run, worst run, and the average.
September 19, 2012
On 19 September 2012 01:02, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> On 9/18/12 5:07 PM, "Øivind" wrote:
>
>> * For all tests, the best run is selected, but would it not be
>> reasonable in some cases to get the average value? Maybe excluding the
>> runs that are more than a couple std. deviations away from the mean value..
>
> After extensive tests with a variety of aggregate functions, I can say firmly that taking the minimum time is by far the best when it comes to assessing the speed of a function.


The fastest execution time is rarely useful to me, I'm almost always much
more interested in the slowest execution time.
In realtime software, the slowest time is often the only important factor,
everything must be designed to tolerate this possibility.
I can also imagine other situations where multiple workloads are competing
for time, the average time may be more useful in that case.

Side question:
Running a test over and over pre-populates the cache with all associated
data after the first cycle... The cache needs to be randomised between each
cycle to get realistic results.
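(One crude way to approximate that, sketched purely as an illustration; the buffer size and the idea of calling this between cycles are assumptions, not something std.benchmark provides.)

// Walk a buffer assumed to be larger than the last-level cache so that the
// next measured cycle starts with a cold data cache.
enum size_t evictBytes = 32 * 1024 * 1024;
ubyte[] evictBuffer;

void evictDataCache()
{
    if (evictBuffer.length == 0)
        evictBuffer = new ubyte[evictBytes];
    foreach (ref b; evictBuffer)
        ++b; // touch every cache line in the buffer
}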


September 19, 2012
> The fastest execution time is rarely useful to me, I'm almost always much
> more interested in the slowest execution time.
> In realtime software, the slowest time is often the only important factor,
> everything must be designed to tolerate this possibility.
> I can also imagine other situations where multiple workloads are competing
> for time, the average time may be more useful in that case.

The problem with slowest is that you end up with the occasional OS hiccup or GC collection which throws the entire benchmark off. I see your point, but unless you can prevent the OS from interrupting, the time would be meaningless.