April 08, 2012
Re: std.benchmark ready for review. Manager sought after
On 08/04/2012 17:48, Dmitry Olshansky wrote:
> On 08.04.2012 12:16, Somedude wrote:
> [snip]
>>
>> Like it. Would it be a good idea to add a column with an average memory
>> used ?
> 
> In general it's next to impossible and/or entirely OS-specific.
> What can be done, I think, is adding a query function to the GC interface
> that returns the amount of RAM currently allocated on its heap.
> 
> So adding GC heap usage can be done, albeit it's not really a stable
> metric. This way one gets a nice hint on why something is slow ;)
> 

That's what I thought, thank you.
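Dmitry's idea of querying heap usage around a run can be illustrated in Python with the standard tracemalloc module (an analogy only; the GC query function discussed above is hypothetical, and std.benchmark defines no such API):

```python
import tracemalloc

def run_with_heap_peak(fn):
    """Run fn and report its result together with the peak traced
    heap usage observed during the call, in bytes."""
    tracemalloc.start()
    try:
        result = fn()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak

result, peak = run_with_heap_peak(lambda: [0] * 100_000)
print(f"peak heap during run: ~{peak} bytes")
```

As noted above, such a number is a hint rather than a stable metric: allocator and GC behavior make it vary from run to run.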
April 08, 2012
Re: std.benchmark ready for review. Manager sought after
On 4/8/12 3:03 PM, Manfred Nowak wrote:
> Andrei Alexandrescu wrote:
>
>> Clearly there is noise during normal use as well, but
>> incorporating it in benchmarks as a matter of course reduces the
>> usefulness of benchmarks
>
> On the contrary:
> 1) The "noise during normal use" has to be measured in order to detect
> the sensitivity of the benchmarked program to that noise.

That sounds quite tenuous to me. How do you measure it, and what 
conclusions do you draw other than that there's more or less other stuff 
going on on the machine, and the machine itself has complex interactions?

Far as I can tell a time measurement result is:

T = A + Q + N

where:

A > 0 is actual benchmark time

Q > 0 quantization noise (uniform distribution)

N > 0 various other noises (interrupts, task switching, networking, the CPU 
dynamically changing frequency, etc.). Many people jump on a Gaussian as an 
approximation, but my tests suggest it's hardly so, because the noise has a 
lot of jerky outliers.

How do we estimate A given T?
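Since Q and N are strictly positive, every measurement T overestimates A, and the minimum over many runs approaches A from above. A small simulation of this model (all numbers invented for illustration; this is not std.benchmark code):

```python
import random

random.seed(1)
A = 100.0  # true benchmark time, unknown in practice

def measure():
    # T = A + Q + N: uniform quantization noise plus skewed other
    # noise with occasional large outliers (interrupts, task switches)
    Q = random.uniform(0.0, 1.0)
    N = random.expovariate(1.0) + (50.0 if random.random() < 0.05 else 0.0)
    return A + Q + N

samples = [measure() for _ in range(1000)]
mean_est = sum(samples) / len(samples)
min_est = min(samples)
print(f"mean: {mean_est:.2f}  min: {min_est:.2f}  true A: {A}")
```

With enough samples the minimum sits within a fraction of a unit of A, while the mean is dragged upward by the positive outliers.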

> 2) The noise the benchmarked program produces has to be measured too,
> because the running benchmarked program probably increases the noise
> for all other running programs.

How to measure that? Also, that noise does not need to be measured as 
much as eliminated to the extent possible. This is because the benchmark 
app noise is a poor model of the application-induced noise.

> In addition: the noise produced by a machine under heavy load might
> bring the performance of the benchmarked program down to zero.

Of course. That's why the documentation emphasizes the necessity of 
baselines. A measurement without baselines is irrelevant.
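The role of a baseline can be sketched like this (an illustrative Python sketch, not std.benchmark's API): time an empty function with the same harness, then subtract that overhead from the measured time.

```python
import time

def bench(fn, n=1000):
    """Minimum of n timings of fn, in seconds (the minimum is used
    because measurement noise is additive)."""
    best = float("inf")
    for _ in range(n):
        t0 = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - t0)
    return best

baseline = bench(lambda: None)            # harness overhead alone
measured = bench(lambda: sum(range(100)))  # work plus harness overhead
print(f"net time: {measured - baseline:.9f} s")
```

Without the baseline, the reported figure silently includes the call and timing overhead, which is exactly the irrelevance warned about above.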


Andrei
April 09, 2012
Re: std.benchmark ready for review. Manager sought after
On 08.04.2012 21:31, Andrei Alexandrescu wrote:
> On 4/8/12 11:59 AM, Denis Shelomovskij wrote:
>> Very good but minimum isn't a best guess. Personally I (and there will
>> be a lot of such maniacs I suppose) will think that this (minimum) time
>> can be significantly smaller than average time.
>
> I've analyzed this quite a bit at work and the average and median are
> not very informative. You need the mode, but in most benchmarks the mode
> is very close to the minimum, so using the minimum is even better.
>
> In speed measurements, all noise is additive (there's no noise that may
> make a benchmark appear to run faster). There are also quite a few
> outliers. Recording the average will include a fair amount of noise.

Yes, of course. I mean that an algorithm itself should satisfy some 
restrictions for such a measurement: it should run at the same speed 
every time. Many algorithms follow this convention, but not all.

Why will recording the average produce so much noise? As I see, floating 
point arithmetic is now used without a strong reason so it looks like a 
time of this part isn't valuable. Or is it just a temporary solution?

Anyway it should be configurable using a CT parameter, so it will not 
abuse one who doesn't need it.

>
> Clearly there is noise during normal use as well, but incorporating it
> in benchmarks as a matter of course reduces the usefulness of benchmarks
> as a mean to improve performance of the benchmarked code.

A graph is needed exactly because of that. Without a graph it really 
tells very little.

>
>> So a parameter (probably with a default value) should be added.
>> Something like enum of flags telling what we want to know. At least
>> these looks usable: minTime, <some mean time>, maxTime,
>> standardDeviation, graph (yes, good old ASCII art).
>
> Standard deviation is also not very useful because it includes all
> outliers (some of which are very far away from the mode).

So a graph is needed.

>
>> Yes, graph is needed.
>
> I am not sure about that. We may provide the raw measurement data for
> programs that want to plot things, but plotting is beyond the charter of
> std.benchmark.
>

Sorry, I meant a histogram, not a curve. A histogram can be shown in a 
console very well. And it is needed because it's the easiest way to show 
the benchmarked program's behaviour (and the noise behaviour). It also 
requires only about 80 integers to store the information and shouldn't 
produce much noise.

IMHO, a histogram gives lots of information and will be a good addition.
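The console histogram idea can be sketched like this (a minimal illustration, not the proposed std.benchmark API): bin the raw timings into a fixed number of counters and print one bar per bin.

```python
def ascii_histogram(samples, bins=20, width=50):
    """Print a console histogram of the samples, one counter per bin."""
    lo, hi = min(samples), max(samples)
    span = (hi - lo) or 1.0
    counts = [0] * bins
    for s in samples:
        i = min(int((s - lo) / span * bins), bins - 1)
        counts[i] += 1
    peak = max(counts)
    for i, c in enumerate(counts):
        left = lo + i * span / bins
        bar = "#" * (c * width // peak)
        print(f"{left:10.3f} | {bar} {c}")

# e.g. mostly fast runs plus one slow outlier
ascii_histogram([1.0, 1.1, 1.05, 1.07, 1.5, 3.2, 1.02, 1.08])
```

The storage is indeed just the bin counters, so keeping it alongside the timings adds essentially no overhead to the measurement itself.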


-- 
Денис В. Шеломовский
Denis V. Shelomovskij
April 09, 2012
Re: std.benchmark ready for review. Manager sought after
On 4/9/12 2:06 AM, Denis Shelomovskij wrote:
> Why will recording the average produce so much noise?

As I explained, the average takes noise and outliers (some very large, 
e.g. milliseconds in a benchmark that takes microseconds) into account. 
The minimum is shielded from this issue. In the limit, the minimum for 
infinitely many measurements is the sought-after result.

> As I see, floating
> point arithmetic is now used without a strong reason so it looks like a
> time of this part isn't valuable. Or is it just a temporary solution?

I don't understand "time of this part".

> Anyway it should be configurable using a CT parameter, so it will not
> abuse one who doesn't need it.

We'd like the framework to do the right thing, and the average does not 
seem to be it.

> IMHO, a histogram gives lots of information and will be a good addition.

I disagree.


Andrei
April 09, 2012
Re: std.benchmark ready for review. Manager sought after
Andrei Alexandrescu wrote:


> all noise is additive (there's no noise that may make a benchmark
> appear to run faster)

This is in doubt, because you yourself wrote "the machine itself has 
complex interactions". These complex interactions might lower the time 
needed for an operation of the benchmarked program.

Examples that come to mind:
a) needed data is already in a (faster) cache because it belongs to a 
memory block from which some data is needed by some program not 
belonging to the benchmarked set---and that block isn't replaced yet.
b) needed data is stored on an HDD whose I/O scheduler uses the elevator 
algorithm and serves the request by pure chance instantly, because the 
position of the needed data is between two positions accessed by some 
programs not belonging to the benchmarked set.

Especially an HDD, if used, will be responsible for a lot of the noise you 
define as "quantization noise (uniform distribution)", even if the head 
stays on the same cylinder. Not recognizing this noise would only mean 
that the data is cached, and interpreting the only true read from the 
HDD as a jerky outlier seems quite wrong.


>> 1) The "noise during normal use" has to be measured in order to
>> detect the sensitivity of the benchmarked program to that noise.
> How do you measure it, and what 
> conclusions do you draw other than that there's more or less other
> stuff going on on the machine, and the machine itself has complex
> interactions? 
> 
> Far as I can tell a time measurement result is:
> 
> T = A + Q + N

For example: run more than one instance of the benchmarked program in 
parallel, and use the thereby gathered data with statistical routines 
to split T into its additive components A, Q and N.


>> 2) The noise the benchmarked program produces has to be measured
>> too, because the running benchmarked program probably increases
>> the noise for all other running programs.
> 
> How to measure that?

Similar to the above note.


> Also, that noise does not need to be measured
> as much as eliminated to the extent possible.

I wouldn't define two programs to be equivalent based on the time until 
completion only. That time might be identical for both programs, but if 
only one of the programs increases the response time of the machine 
to unacceptability, I would choose the other. 

-manfred
April 09, 2012
Re: std.benchmark ready for review. Manager sought after
Added to trello.

-Steve
April 09, 2012
Re: std.benchmark ready for review. Manager sought after
Why is there so much emphasis on printBenchmarks()?

benchmark() and runBenchmarks() are clearly the core of this 
library, and yet they are treated as second-class citizens: "Oh, 
I guess you can use this". Normally, I wouldn't be so picky, but 
this is a standard library. Focus should be on functionality.

Providing formatted output is a nice bonus, but to me, it's just 
a bonus. Any benchmarking part of a large project is bound to 
format the output itself (to log benchmark results against 
revisions in a database or something like that).

Also, benchmark() and runBenchmarks() are kind of confusing at 
first glance. Something along the lines of benchmarkModule() and 
benchmarkAllModules() would be more sensible.
April 09, 2012
Re: std.benchmark ready for review. Manager sought after
On 4/9/12 10:23 AM, Francois Chabot wrote:
> Why is there so much emphasis on printBenchmarks()?
>
> benchmark() and runBenchmarks() are clearly the core of this library,
> and yet they are treated as second-class citizens: "Oh, I guess you can
> use this". Normally, I wouldn't be so picky, but this is a standard
> library. Focus should be on functionality.

The functionality is to make benchmarking easy to use, meaningful, and easy 
to interpret. I don't want to add a complicated library for 
postprocessing benchmarks because almost nobody would use it.

The first function in the documentation is what most people will want to 
use. The functions that provide the raw data are eminently available, so 
I disagree with the "second-class citizen" characterization. You want to 
use them, use them. They don't need to be given rockstar billing.


Andrei
April 09, 2012
Re: std.benchmark ready for review. Manager sought after
On 4/9/12 9:25 AM, Manfred Nowak wrote:
> Andrei Alexandrescu wrote:
>> all noise is additive (there's no noise that may make a benchmark
>> appear to run faster)
>
> This is in doubt, because you yourself wrote "the machine itself has
> complex interactions". These complex interactions might lower the time
> needed for an operation of the benchmarked program.
>
> Examples that come to mind:
> a) needed data is already in a (faster) cache because it belongs to a
> memory block from which some data is needed by some program not
> belonging to the benchmarked set---and that block isn't replaced yet.

Which is great, unless the program wants to measure the cache memory 
itself, in which case it would use special assembler instructions or 
large memset()s. (We do such measurements at Facebook.)

> b) needed data is stored on an HDD whose I/O scheduler uses the elevator
> algorithm and serves the request by pure chance instantly, because the
> position of the needed data is between two positions accessed by some
> programs not belonging to the benchmarked set.
>
> Especially an HDD, if used, will be responsible for a lot of the noise you
> define as "quantization noise (uniform distribution)", even if the head
> stays on the same cylinder. Not recognizing this noise would only mean
> that the data is cached, and interpreting the only true read from the
> HDD as a jerky outlier seems quite wrong.

If the goal is to measure the seek time of the HDD, the benchmark itself 
should make sure the HDD cache is cleared. (What I recall they do on 
Linux is unmounting and remounting the drive.) Otherwise, it adds a 
useless component to the timing.

>>> 1) The "noise during normal use" has to be measured in order to
>>> detect the sensitivity of the benchmarked program to that noise.
>> How do you measure it, and what
>> conclusions do you draw other than that there's more or less other
>> stuff going on on the machine, and the machine itself has complex
>> interactions?
>>
>> Far as I can tell a time measurement result is:
>>
>> T = A + Q + N
>
> For example: run more than one instance of the benchmarked program in
> parallel, and use the thereby gathered data with statistical routines
> to split T into its additive components A, Q and N.

I disagree with running two benchmarks in parallel because that exposes 
them to even more noise (scheduling, CPU count, current machine load 
etc). I don't understand the part of the sentence starting with "...use 
the thereby...", I'd be grateful if you elaborated.


Andrei
April 09, 2012
Re: std.benchmark ready for review. Manager sought after
> Which is great, unless the program wants to measure the cache 
> memory itself, in which case it would use special assembler 
> instructions or large memset()s. (We do such measurements at Facebook.)

I disagree. If a regression suddenly causes a function to become
heavily cache-bound, it should show up in benchmarks somehow,
regardless of the previous expected behavior of the function.

--
Francois