September 21, 2012
Re: Review of Andrei's std.benchmark
On 2012-09-21 18:21, Andrei Alexandrescu wrote:

> That's a good angle. Profiling is currently done by the -profile switch,
> and there are a couple of library functions associated with it. To my
> surprise, that documentation page has not been ported to the dlang.org
> style: http://digitalmars.com/ctg/trace.html
>
> I haven't yet thought whether std.benchmark should add more
> profiling-related primitives. I'd opine for releasing it without such
> for the time being.

If the API is fairly open and provides more of the raw 
results, then one can build a more profiling-like solution on top of 
it. That could later be used to create a dedicated profiling module if we 
choose to do so.
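
Something along these lines, purely as a sketch (the names are made up, 
not part of the proposed std.benchmark):

import core.time : Duration;
import std.stdio : writefln;
import std.algorithm : minElement, maxElement;

// Hypothetical: expose the raw per-run timings so callers can
// compute whatever aggregate they need.
struct BenchmarkSample
{
    string name;      // name of the benchmarked function
    Duration[] runs;  // timing of every individual run
}

void report(BenchmarkSample[] samples)
{
    foreach (s; samples)
        writefln("%s: %s runs, min=%s, max=%s",
                 s.name, s.runs.length, s.runs.minElement, s.runs.maxElement);
}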

-- 
/Jacob Carlborg
September 21, 2012
Re: Review of Andrei's std.benchmark
On Fri, 21 Sep 2012 00:45:44 -0400,
Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> 
> The issue here is automating the benchmark of a module, which would 
> require some naming convention anyway.

A perfect use case for user defined attributes ;-)

@benchmark void foo(){}
@benchmark("File read test") void foo(){}
September 21, 2012
Re: Review of Andrei's std.benchmark
On 2012-09-21 19:45, Johannes Pfau wrote:

> A perfect use case for user defined attributes ;-)
>
> @benchmark void foo(){}
> @benchmark("File read test") void foo(){}

Yes, we need user defined attributes and AST macros ASAP :)

-- 
/Jacob Carlborg
September 21, 2012
Re: Review of Andrei's std.benchmark
> After extensive tests with a variety of aggregate functions, I 
> can say firmly that taking the minimum time is by far the best 
> when it comes to assessing the speed of a function.

Like others, I must also disagree in principle. The minimum sounds 
like a useful metric for functions that (1) do the same amount of 
work in every test and (2) are microbenchmarks, i.e. they measure 
a small and simple task. If the benchmark being measured either 
(1) varies the amount of work each time (e.g. according to some 
approximation of real-world input, which obviously may vary)* or 
(2) measures a large system, then the average, standard 
deviation, and even a histogram may be useful (or perhaps some 
indicator of whether the runtimes are consistent with a normal 
distribution or not). If the running time is long, then the max 
might be useful (because things like task-switching overhead 
probably do not contribute that much to the total).

* I anticipate that you might respond "so, only test a single 
input per benchmark", but if I've got 1000 inputs that I want to 
try, I really don't want to write 1000 functions nor do I want 
1000 lines of output from the benchmark. An average, standard 
deviation, min and max may be all I need, and if I need more 
detail, then I might break it up into 10 groups of 100 inputs. In 
any case, the minimum runtime is not the desired output when the 
input varies.
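
For what it's worth, those aggregates are cheap to compute from raw 
timings; a rough sketch, independent of std.benchmark's actual API 
(times here are plain doubles in seconds):

import std.algorithm : map, minElement, maxElement, sum;
import std.math : sqrt;

// The aggregates discussed above, computed from raw per-run timings.
struct Stats { double min, max, mean, stdDev; }

Stats summarize(const double[] runs)
{
    assert(runs.length > 0);
    Stats s;
    s.min  = runs.minElement;
    s.max  = runs.maxElement;
    s.mean = runs.sum / runs.length;
    auto variance = runs.map!(t => (t - s.mean) ^^ 2).sum / runs.length;
    s.stdDev = sqrt(variance);
    return s;
}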

It's a little surprising to hear "The purpose of std.benchmark is 
not to estimate real-world time. (That is the purpose of 
profiling)"... Firstly, of COURSE I would want to estimate 
real-world time with some of my benchmarks. For some benchmarks I 
just want to know which of two or three approaches is faster, or 
to get a coarse ball-park sense of performance, but for others I 
really want to know the wall-clock time used for realistic inputs.

Secondly, what D profiler actually helps you answer the question 
"where does the time go in the real-world?"? The D -profile 
switch creates an instrumented executable, which in my experience 
(admittedly not experience with DMD) severely distorts running 
times. I usually prefer sampling-based profiling, where the 
executable is left unchanged and a sampling program interrupts 
the program at random and grabs the call stack, to avoid the 
distortion effect of instrumentation. Of course, instrumentation 
is useful to find out what functions are called the most and 
whether call frequencies are in line with expectations, but I 
wouldn't trust the time measurements that much.

As far as I know, D doesn't offer a sampling profiler, so one 
might indeed use a benchmarking library as a (poor) substitute. 
I'd want to be able to set up some benchmarks that operate on 
realistic data, with perhaps different data in different runs, in 
order to learn how the speed varies with different inputs 
(if it varies a lot, then I might create more benchmarks to 
investigate which inputs are processed quickly, and which slowly).

Some random comments about std.benchmark based on its 
documentation:

- It is very strange that the documentation of printBenchmarks 
uses neither of the words "average" nor "minimum", and doesn't say 
how many trials are done... I suppose the obvious interpretation 
is that it only does one trial, but then we wouldn't be having 
this discussion about averages and minimums, right? Øivind says 
tests are run 1000 times... but it needs to be configurable 
per test (my idea: support an _x1000 suffix in function names, or 
_for1000ms to run the test for at least 1000 milliseconds; and 
allow a multiplier when running a group of benchmarks, e.g. 
a multiplier argument of 0.5 means to only run half as many 
trials as usual). Also, it is not clear from the documentation 
what the single parameter to each benchmark is (define 
"iterations count"). See the first sketch after these comments for 
the kind of suffix handling I mean.

- The "benchmark_relative_" feature looks quite useful. I'm also 
happy to see benchmarkSuspend() and benchmarkResume(), though 
benchmarkSuspend() seems redundant in most cases: I'd like to 
just call one function, say, benchmarkStart() to indicate "setup 
complete, please start measuring time now."

- I'm glad that StopWatch can auto-start; but the documentation 
should be clearer: does reset() stop the timer or just reset the 
time to zero? Does stop() followed by start() start from zero, or 
does it keep the time on the clock? I also think there should be 
a method that returns the value of peek() and restarts the timer 
at the same time (perhaps stop() and reset() should just return 
peek()?). See the second sketch after these comments.

- After reading the documentation of comparingBenchmark and 
measureTime, I have almost no idea what they do.
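
To make the suffix idea concrete, here is a rough sketch of the kind 
of name parsing I mean (the convention and names are my own invention, 
not part of std.benchmark):

import std.algorithm : findSplit;
import std.conv : to;

// Hypothetical convention:
//   "benchmark_foo_x500"      -> run foo 500 times
//   "benchmark_bar_for1000ms" -> run bar repeatedly for at least 1000 ms
struct TrialSpec { size_t count; long minMillis; }

TrialSpec parseSuffix(string name)
{
    if (auto hit = name.findSplit("_x"))
        return TrialSpec(hit[2].to!size_t, 0);
    if (auto hit = name.findSplit("_for"))
        return TrialSpec(0, hit[2][0 .. $ - 2].to!long); // strip the "ms"
    return TrialSpec(1000, 0); // default number of trials
}

And the "return peek() and restart" helper I have in mind, sketched on 
top of std.datetime.stopwatch.StopWatch (assuming reset() on a running 
watch zeroes the elapsed time and keeps it running):

import std.datetime.stopwatch : StopWatch, AutoStart;
import core.time : Duration;

// Returns the elapsed time and starts timing again from zero.
Duration lap(ref StopWatch sw)
{
    auto elapsed = sw.peek();
    sw.reset();
    return elapsed;
}

// usage:
// auto sw = StopWatch(AutoStart.yes);
// ... setup ...
// sw.reset();          // "setup complete, start measuring now"
// ... work ...
// auto t = sw.lap();   // elapsed time, and the clock starts over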
September 21, 2012
Re: Review of Andrei's std.benchmark
On 21 September 2012 07:30, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> I don't quite agree. This is a domain in which intuition is having a hard
> time, and at least some of the responses come from an intuitive standpoint,
> as opposed from hard data.
>
> For example, there's this opinion that taking the min, max, and average is
> the "fair" thing to do and the most informative.


I don't think that's a 'fair' claim; the situation is that different
people are looking for different statistical information, and you can
distinguish it with whatever terminology you prefer. You are only
addressing a single use case: 'benchmarking', by your definition. I'm more
frequently interested in profiling than 'benchmark'ing, and I think both
are useful to have.

The thing is, the distinction between 'benchmarking' and 'profiling' is
effectively implemented via nothing more than the sampling algorithm (min
vs. avg), so is it sensible to expose the distinction in the API in this way?
September 21, 2012
Re: Review of Andrei's std.benchmark
On 21 September 2012 07:45, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

>> As such, you're going to need a far more
>> convincing argument than "It worked well for me."
>>
>
> Sure. I have just detailed the choices made by std.benchmark in a couple
> of posts.
>
> At Facebook we measure using the minimum, and it's working for us.


Facebook isn't exactly 'realtime' software. Obviously, faster is always
better, but it's not in a situation where if you slip a sync point by 1ms
in an off case, it's all over. You can lose 1ms here, and make it up at a
later time, and the result is the same. But again, this feeds back to your
distinction between benchmarking and profiling.

>> Otherwise, I think we'll need richer results. At the very least there
>> should be an easy way to get at the raw results programmatically
>> so we can run whatever stats/plots/visualizations/output-formats we
>> want. I didn't see anything like that browsing through the docs, but
>> it's possible I may have missed it.
>>
>
> Currently std.benchmark does not expose raw results for the sake of
> simplicity. It's easy to expose such, but I'd need a bit more convincing
> about their utility.


Custom visualisation, realtime charting/plotting, user supplied reduce
function?
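
For instance, something like this, purely as a sketch of the kind of 
hook I mean (none of these names exist in std.benchmark):

import core.time : Duration;

// If the raw per-run durations were exposed, a user-supplied reduce
// function could fold them into whatever figure a given project cares about.
Duration aggregate(alias fold)(Duration[] runs)
{
    assert(runs.length > 0);
    auto acc = runs[0];
    foreach (r; runs[1 .. $])
        acc = fold(acc, r);
    return acc;
}

// usage, choosing the statistic at the call site:
// import std.algorithm : min, max;
// auto worst = aggregate!max(samples);
// auto best  = aggregate!min(samples);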
September 21, 2012
Re: Review of Andrei's std.benchmark
> As far as I know, D doesn't offer a sampling profiler,

It is possible to use a sampling profiler on D executables 
though. I usually use perf on Linux and AMD CodeAnalyst on 
Windows.
September 21, 2012
Re: Review of Andrei's std.benchmark
On 21 September 2012 07:23, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> For a very simple reason: unless the algorithm under benchmark is very
> long-running, max is completely useless, and it ruins average as well.
>

This is only true for systems with a comprehensive pre-emptive OS running
on the same core. Most embedded systems will only be affected by cache
misses and bus contention; in that situation, max is perfectly acceptable.
September 21, 2012
Re: Review of Andrei's std.benchmark
On Friday, September 21, 2012 17:58:05 Manu wrote:
> Okay, I can buy this distinction in terminology.
> What I'm typically more interested in is profiling. I do occasionally need
> to do some benchmarking by your definition, so I'll find this useful, but
> should there then be another module to provide a 'profiling' API? Or should
> that be worked into this API?

dmd has the -profile flag.

- Jonathan M Davis
September 21, 2012
Re: Review of Andrei's std.benchmark
On 9/19/12 4:06 AM, Peter Alexander wrote:
> I don't see why `benchmark` takes (almost) all of its parameters as
> template parameters. It looks quite odd, seems unnecessary, and (if I'm
> not mistaken) makes certain use cases quite difficult.

That is intentional - indirect calls would add undue overhead to the 
measurements.
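
To illustrate the difference (a simplified sketch, not std.benchmark's 
actual implementation):

// An alias parameter yields a direct, potentially inlined call per
// instantiation, while a delegate parameter forces an indirect call
// inside the timing loop.
void benchDirect(alias fun)(uint n)
{
    foreach (i; 0 .. n)
        fun();          // direct call, can be inlined
}

void benchIndirect(void delegate() fun, uint n)
{
    foreach (i; 0 .. n)
        fun();          // indirect call through the delegate
}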

Andrei