April 09, 2012
Re: std.benchmark ready for review. Manager sought after
On 09/04/2012 17:23, Francois Chabot wrote:
> Why is there so much emphasis on printBenchmarks()?
> 
> benchmark() and runBenchmarks() are clearly the core of this library,
> and yet they are relegated to second-class citizens: "Oh, I guess you can
> use this". Normally, I wouldn't be so picky, but this is a standard
> library. The focus should be on functionality.
> 
> Providing formatted output is a nice bonus, but to me, it's just a
> bonus. Any benchmarking part of a large project is bound to format the
> output itself (to log benchmark results against revisions in a database
> or something like that).
> 
> Also, benchmark() and runBenchmarks() are kind of confusing at first
> glance. Something along the lines of benchmarkModule() and
> benchmarkAllModules() would be more sensible.

The printBenchmarks facility is cool and should be included imho.
It helps make benchmarking as standard as unit testing.
We don't want to have to write the same boilerplate code again and again
for such trivial uses.
April 09, 2012
Re: std.benchmark ready for review. Manager sought after
On 4/9/12 11:29 AM, Francois Chabot wrote:
>> Which is great, unless the program wants to measure the cache memory
>> itself, in which case it would use special assembler instructions or
> large memset()s. (We do such things at Facebook.)
>
> I disagree. If a regression suddenly causes a function to become
> heavily cache-bound, it should show up in benchmarks somehow,
> regardless of the previous expected behavior of the function.

But cache binding depends on a variety of cache characteristics, i.e. 
the machine. The question is whether we accept a heavy dependence of the 
benchmark on the machine.

Andrei
April 09, 2012
Re: std.benchmark ready for review. Manager sought after
On 4/9/12 11:44 AM, Somedude wrote:
> It helps make benchmarking as standard as unit testing.
> We don't want to have to write the same boilerplate code again and again
> for such trivial uses.

Yes, I had unittest in mind when writing the library. If one needs more 
than one statement to get an informative benchmark off the ground, we 
failed.

Andrei
April 09, 2012
Re: std.benchmark ready for review. Manager sought after
Andrei Alexandrescu wrote:

>> For example, by running more than one instance of the benchmarked
>> program in parallel and using the thereby gathered statistical
>> routines to spread T into the additive components A, Q and N.

> I disagree with running two benchmarks in parallel because that
> exposes them to even more noise (scheduling, CPU count, current
> machine load etc).

I did not mean to run two or more benchmarks in parallel, but only 
more instances of the benchmarked program _without_ the environment 
supplied by the benchmark. Of course there may be more noise. But if 
so, that noise depends on the additional running instances and 
on nothing else.

Now let T(n) be the time needed to run n instances in parallel on a 
single processor. According to your definition and your remark above:

§   T(n) = n*A + Q + N + P(n)

where P(n)>=0 and P(1)==0 is the additional noise with which the 
benchmarked program disturbs itself.

Please observe that Q and N cancel out on subtraction:
§  T(2) - T(1) = A + P(2) - P(1)
§  T(3) - T(2) = A + P(3) - P(2)
...

P(n+1)-P(n) measures the sensitivity of the benchmarked program to 
its own noise. As you wrote yourself, its value should be close to 
zero until the machine is close to falling flat.
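
A made-up numerical illustration of the subtraction: suppose T(1) = 110 ms 
and T(2) = 205 ms on an otherwise quiet machine. Then
§  T(2) - T(1) = 95 ms = A + P(2)
so if the self-induced noise P(2) is small, A is roughly 95 ms, and, 
since P(1) == 0, the shared terms are Q + N = T(1) - A, i.e. about 15 ms.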


> I don't understand the part of the sentence starting with "...use
> the thereby...", I'd be grateful if you elaborated. 

Ohh..., an unrecognized deletion. I had written:
| use the thereby gathered _data to feed_ statistical routines to
| spread T
as elaborated above.

-manfred
April 10, 2012
Re: std.benchmark ready for review. Manager sought after
Andrei Alexandrescu wrote:
> Hello,
> 
> 
> I finally found the time to complete std.benchmark. I got to a very
> simple API design, starting where I like it: one line of code.
> 
> Code is in the form of a pull request at
> https://github.com/D-Programming-Language/phobos/pull/529. (There's
> some noise in there caused by my git n00biness). Documentation is at
> http://erdani.com/d/web/phobos-prerelease/std_benchmark.html.
> 
> If reasonable and at all possible, I'd like to bump the priority of
> this proposal. Clearly D's user base is highly interested in
> efficiency, and many of the upcoming libraries have efficiency as a
> vital part of their design. So we should get std.benchmark in soon
> and require that new additions come with benchmarks for their
> essential functionality.
> 
> My vision is that in the future Phobos will have a significant
> battery of benchmarks, which will help with improving Phobos and with
> porting it to new platforms.

I have come to think the same.

How come the time-based relative report and the percentage-based
relative report are mixed in one result? And how do I choose which one
I'd like to see in the output?

When benchmarking you can measure different things at the same time. In
this regard the current proposal is limited. It just measures wall clock
time. I believe extending the StopWatch to measure e.g. user CPU time is
a useful addition.
In general, allowing user-defined measurements would be great,
e.g. to measure the time spent in user mode:
import core.sys.posix.sys.times : times, tms;

// user CPU time consumed so far, in clock ticks (POSIX only)
() {
    tms t;
    times(&t);
    return t.tms_utime;
}

Note that this code does not need to be portable. You can also use a
version() block with a static assert in the else branch.
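
For instance, a minimal sketch of such a non-portable hook might look like
this (the helper name userTicks is made up for illustration):

// Hypothetical helper: user CPU time in clock ticks, POSIX only.
version (Posix)
{
    import core.sys.posix.sys.times : times, tms;

    long userTicks()
    {
        tms t;
        times(&t);
        return t.tms_utime;
    }
}
else
{
    static assert(false, "userTicks() is not implemented on this platform");
}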

Things that come to mind that I'd like to measure.
Time measurements:
 * User CPU time
 * System CPU time
 * Time spent in memory allocations
Count measurements:
 * Memory usage
 * L1/L2/L3 cache misses
 * Number of executed instructions
 * Number of memory allocations

Of course wall clock time is the ultimate measure when benchmarking.
But often you need to investigate further (doing more measurements).

Do you think adding this is worthwhile?

All in all the user interface has been greatly simplified.

Jens
April 10, 2012
Re: std.benchmark ready for review. Manager sought after
> I've analyzed this quite a bit at work and the average and median are  
> not very informative. You need the mode, but in most benchmarks the mode  
> is very close to the minimum, so using the minimum is even better.
>
How is it better?

> In speed measurements, all noise is additive (there's no noise that may  
> make a benchmark appear to run faster). There are also quite a few  
> outliers. Recording the average will include a fair amount of noise.
>
The benchmarked function itself could have a relevant variance, e.g.
when using randomized algorithms. You already get that from the LRU table
in the array append cache.
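
For concreteness, here is a rough sketch of the "take the minimum of several
runs" approach under discussion, written against std.datetime's StopWatch
rather than the proposed std.benchmark (the helper name bestOf is made up):

import std.datetime : StopWatch;

// Run fun() `trials` times and keep the fastest time in milliseconds.
// Additive noise can only slow a run down, so the minimum over many
// runs approaches the undisturbed cost of fun().
long bestOf(size_t trials, void delegate() fun)
{
    long best = long.max;
    foreach (i; 0 .. trials)
    {
        StopWatch sw;
        sw.start();
        fun();
        sw.stop();
        auto elapsed = sw.peek().msecs;
        if (elapsed < best)
            best = elapsed;
    }
    return best;
}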
April 10, 2012
Re: std.benchmark ready for review. Manager sought after
On 09.04.2012 17:26, Andrei Alexandrescu wrote:
> On 4/9/12 2:06 AM, Denis Shelomovskij wrote:
>> Why will recording the average produce so much noise?
>
> As I explained, the average takes noise and outliers (some very large,
> e.g. milliseconds in a benchmark that takes microseconds) into account.
> The minimum is shielded from this issue. In the limit, the minimum for
> infinitely many measurements is the sought-after result.
>
>> As I see, floating
>> point arithmetic is now used without a strong reason so it looks like a
>> time of this part isn't valuable. Or is it just a temporary solution?
>
> I don't understand "time of this part".

Looks like a misunderstanding because of my bad English: I read "Recording 
the average will include a fair amount of noise" as meaning that the process 
of "recording the average" itself produces "a fair amount of noise", not 
that the result includes noise.


-- 
Денис В. Шеломовский
Denis V. Shelomovskij
April 14, 2012
Re: std.benchmark ready for review. Manager sought after
(Resuming a few past discussions about std.benchmark.)

On 4/9/12 1:26 PM, Manfred Nowak wrote:
> Andrei Alexandrescu wrote:
>> I disagree with running two benchmarks in parallel because that
>> exposes them to even more noise (scheduling, CPU count, current
>> machine load etc).
>
> I did not mean to run two or more benchmarks in parallel, but only
> more instances of the benchmarked program _without_ the environment
> supplied by the benchmark. Of course there may be more noise. But if
> so, that noise depends on the additional running instances and
> on nothing else.

I doubt that's the case. Once we discuss running several instances in 
parallel, I predict the vagaries of scheduling become much more prevalent.

> Now let T(n) be the time needed to run n instances in parallel on a
> single processor. According to your definition and your remark above:
>
> §   T(n) = n*A + Q + N + P(n)
>
> where P(n)>=0 and P(1)==0 is the additional noise with which the
> benchmarked program disturbs itself.
>
> Please observe that Q and N cancel out on subtraction:
> §  T(2) - T(1) = A + P(2) - P(1)
> §  T(3) - T(2) = A + P(3) - P(2)
> ...
>
> P(n+1)-P(n) measures the sensitivity of the benchmarked program to
> its own noise. As you wrote yourself, its value should be close to
> zero until the machine is close to falling flat.

It would be great if that applied, but I'm afraid things get 
significantly more complex. For example, the noise will have a 
per-processor component, and the processes will compete for cache among 
themselves.


Andrei
April 14, 2012
Re: std.benchmark ready for review. Manager sought after
On 4/10/12 5:40 AM, Jens Mueller wrote:
> How come the time-based relative report and the percentage-based
> relative report are mixed in one result? And how do I choose which one
> I'd like to see in the output?

It's in the names. If the name of a benchmark starts with 
benchmark_relative_, then that benchmark is considered relative to the 
last non-relative benchmark. Using a naming convention allows complete 
automation in benchmarking a module.
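
For illustration, here is a sketch of what that convention could look like in
client code. Only the benchmark_/benchmark_relative_ prefixes come from the
explanation above; the uint n iteration parameter, the registration mixin and
the printBenchmarks() call form are my assumptions about the proposed API and
may not match the pull request exactly:

import std.array : appender;
import std.benchmark; // the module under review

// Baseline: picked up as an absolute measurement.
void benchmark_appenderPut(uint n)
{
    auto app = appender!(uint[])();
    foreach (i; 0 .. n)
        app.put(i);
}

// Reported relative to the preceding non-relative benchmark, because the
// name starts with benchmark_relative_.
void benchmark_relative_builtinAppend(uint n)
{
    uint[] arr;
    foreach (i; 0 .. n)
        arr ~= i;
}

// Assumed registration and driver call; the exact spelling may differ.
mixin(scheduleForBenchmarking);

void main()
{
    printBenchmarks();
}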

I figure it's fine that all results appear together because the absence 
of data in the relative column clarifies which is which.

> When benchmarking you can measure different things at the same time. In
> this regard the current proposal is limited. It just measures wall clock
> time. I believe extending the StopWatch to measure e.g. user CPU time is
> a useful addition.

Generally I fear piling too much on StopWatch because every feature adds 
its own noise. But there's value in collecting the result of times(). 
What would be the Windows equivalent?

> In general, allowing user-defined measurements would be great,
> e.g. to measure the time spent in user mode:
>
> import core.sys.posix.sys.times : times, tms;
>
> // user CPU time consumed so far, in clock ticks (POSIX only)
> () {
>     tms t;
>     times(&t);
>     return t.tms_utime;
> }
>
> Note that this code does not need to be portable. You can also use a
> version() block with a static assert in the else branch.
>
> Things that come to mind that I'd like to measure.
> Time measurements:
>    * User CPU time
>    * System CPU time
>    * Time spent in memory allocations
> Count measurements:
>    * Memory usage
>    * L1/L2/L3 cache misses
>    * Number of executed instructions
>    * Number of memory allocations
>
> Of course wall clock time is the ultimate measure when benchmarking.
> But often you need to investigate further (doing more measurements).
>
> Do you think adding this is worthwhile?

Absolutely. I just fear expanding the charter of the framework too
much. Let's see:

* Memory usage is, I think, difficult to measure on Windows.
* Estimating cache misses and executed instructions requires significant research.
* Counting memory allocations requires instrumenting druntime.


Andrei
April 15, 2012
Re: std.benchmark ready for review. Manager sought after
There have been quite a few good comments, but no offer to serve as 
review manager. Could someone please take on this role?

Again, it would be great to get std.benchmark in sooner rather than 
later because it can be used by subsequent submissions (many of which 
allege efficiency as a major advantage) to show they improve over the 
state of the art.

Thanks,

Andrei