March 23, 2012 Re: avgtime - Small D util for your everyday benchmarking needs | ||||
---|---|---|---|---|
| ||||
Posted in reply to Juan Manuel Cabo | On 23 March 2012 17:53, Juan Manuel Cabo <juanmanuel.cabo@gmail.com> wrote:
> But I think the most important change is that I'm now showing
> the 95% and 99% confidence intervals. (For the confidence intervals
> to mean anything, please everyone, remember to control
> your variables (don't defrag and benchmark :-) !!) so that apples
> are still apples and don't become oranges, and make sure N>30).
>
> More info on histogram and confidence intervals in the usage help.
Dude, this is awesome. I tend to just use time, but if I was doing anything more complicated, I'd use this. I would suggest changing the name while you still can. avgtime is not that informative a name given that it now does more than just "Average" times.
--
James Miller
|
March 23, 2012 Re: avgtime - Small D util for your everyday benchmarking needs | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Friday, 23 March 2012 at 05:16:20 UTC, Andrei Alexandrescu wrote:
[.....]
>> (man, the gaussian curve is everywhere, it never ceases to
>> perplex me).
>
> I'm actually surprised. I'm working on benchmarking lately and the distributions I get are very concentrated around the minimum.
>
> Andrei
Well, the shape of the curve depends a lot on how the random noise gets inside the measurement.
I like 'ls -lR' because the randomness comes from everywhere, and it's quite bell shaped. I guess there is a lot of I/O mess (even if I/O is all cached, there are lots of opportunities for kernel mutexes to mess everything up, I guess).
When testing "/bin/sleep 0.5", it will be quite a boring histogram.
And I guess that when testing something that's only CPU bound and doesn't make too many syscalls, the shape is more concentrated in a few values.
On the other hand, I'm getting some weird bimodal (two peaks) curves sometimes, like the one I put on the README.md. It's definitely because of my laptop's CPU throttling, because it went away when I disabled it. For the curious ones, in ubuntu 64bit, here is a way to disable throttling (WARNING: might get hot until you undo or reboot):
echo 1600000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo 1600000 > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
(yes, my cpu is 1.6GHz, but it rocks).
--jm
|
March 23, 2012 Re: avgtime - Small D util for your everyday benchmarking needs | ||||
---|---|---|---|---|
| ||||
Posted in reply to Manfred Nowak | On Thursday, 22 March 2012 at 17:13:58 UTC, Manfred Nowak wrote:
> Juan Manuel Cabo wrote:
>
>> like the unix 'time' command
>
> `version linux' is missing.
>
> -manfred
Linux only for now. Will make it work in windows this weekend.
I hope that's what you meant.
--jm
|
March 23, 2012 Re: avgtime - Small D util for your everyday benchmarking needs | ||||
---|---|---|---|---|
| ||||
Posted in reply to James Miller | On Friday, 23 March 2012 at 06:51:48 UTC, James Miller wrote:
> Dude, this is awesome. I tend to just use time, but if I was doing
> anything more complicated, I'd use this. I would suggest changing the
> name while you still can. avgtime is not that informative a name given
> that it now does more than just "Average" times.
>
> --
> James Miller
> Dude, this is awesome.
Thanks!! I appreciate your feedback!
> I would suggest changing the name while you still can.
Suggestions welcome!!
--jm
|
March 23, 2012 Re: avgtime - Small D util for your everyday benchmarking needs | ||||
---|---|---|---|---|
| ||||
Posted in reply to Manfred Nowak | On Friday, 23 March 2012 at 05:51:40 UTC, Manfred Nowak wrote:
>
> | For samples, if it is known that they are drawn from a symmetric
> | distribution, the sample mean can be used as an estimate of the
> | population mode.
I'm not printing the population mode, I'm printing the 'sample mode'. It has a very clear meaning: most frequent value. To have frequency, I group into 'bins' by precision: 12.345 and 12.3111 will both go to the 12.3 bin.
>
> and the program computes the variance as if the values of the sample
> follow a normal distribution, which is symmetric.
This program doesn't compute the variance. Maybe you are talking about another program. This program computes the standard deviation of the sample. The sample doesn't need to be of any distribution to have a standard deviation. It is not a distribution parameter, it is a statistic.
> Therefore the mode of the sample is of interest only, when the variance
> is calculated wrongly.
???
The 'sample mode', 'median' and 'average' can quickly tell you something about the shape of the histogram, without looking at it. If the three coincide, then maybe you are in normal distribution land.
The only place where I assume normal distribution is for the confidence intervals. And it's in the usage help.
If you want to support estimating weird probability distribution parameters, forking and pull requests are welcome. Rewrites too. Good luck detecting distribution shapes!!!! ;-)
>
> -manfred
PS: I should use the t student to make the confidence intervals, and for computing that I should use the sample standard deviation (/n-1), but that is a completely different story. The z normal with n>30 approximation is quite good. (I would have to embed a table for the t student tail factors, pull reqs welcome).
PS2: I now fixed the confusion with the confidence interval of the variable and the confidence interval of the mu average, I simply now show both. (release 0.4).
PS3: Statistics estimate distribution parameters.
--jm
|
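The 'sample mode' binning described in the post above could be sketched roughly as follows. This is only an illustration, not avgtime's actual code: the function name `sampleMode`, the `decimals` parameter, and the choice of truncation (rather than rounding) when assigning a value to a bin are all assumptions.

```d
import std.math : floor, pow;
import std.stdio : writefln;

/// Hypothetical helper: returns the most frequent value of `xs` after
/// truncating each sample to `decimals` decimal places.
/// Ties go to the bin that reaches the winning count first.
double sampleMode(const(double)[] xs, int decimals)
{
    assert(xs.length > 0, "need at least one sample");
    immutable double scale = pow(10.0, decimals);
    int[long] counts;   // bin index -> number of samples in that bin
    long bestBin = 0;
    int bestCount = 0;
    foreach (x; xs)
    {
        // Assumption: bins are formed by truncation, e.g. 12.345 -> bin 123.
        immutable long bin = cast(long) floor(x * scale);
        int count = 1;
        if (auto p = bin in counts)
            count = ++(*p);
        else
            counts[bin] = 1;
        if (count > bestCount)
        {
            bestCount = count;
            bestBin = bin;
        }
    }
    return bestBin / scale;
}

void main()
{
    // 12.345 and 12.3111 both land in the 12.3 bin, as in the post.
    writefln("%.1f", sampleMode([12.345, 12.3111, 12.9, 13.05], 1));
}
```

With one decimal of precision the two values from the post share a bin, so that bin wins with a count of two.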
March 23, 2012 Re: avgtime - Small D util for your everyday benchmarking needs | ||||
---|---|---|---|---|
| ||||
Posted in reply to Juan Manuel Cabo | On 23 March 2012 21:37, Juan Manuel Cabo <juanmanuel.cabo@gmail.com> wrote:
> PS: I should use the t student to make the confidence intervals,
> and for computing that I should use the sample standard
> deviation (/n-1), but that is a completely different story.
> The z normal with n>30 approximation is quite good.
> (I would have to embed a table for the t student tail factors,
> pull reqs welcome).
If it's possible to calculate it, then you can generate a table at compile-time using CTFE. Less error-prone, and controllable accuracy.
--
James Miller
|
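The CTFE suggestion above might look something like the sketch below. Everything in it is an assumption made for illustration: the names `approxT975`, `buildTable`, `tTable` and `maxNu` are hypothetical, and the quantile is computed with the first terms of a Cornish-Fisher style series around the normal quantile, chosen only because plain arithmetic is certain to run at compile time. The series is reasonable for moderately large degrees of freedom and poor for very small ones.

```d
import std.stdio : writefln;

enum int maxNu = 100;          // hypothetical table size
enum double z975 = 1.959964;   // standard normal 97.5% quantile

/// Series approximation of the 97.5% Student's t quantile for `nu`
/// degrees of freedom (a stand-in for an exact inverse).
double approxT975(int nu) pure nothrow @nogc @safe
{
    immutable double z = z975;
    immutable double z3 = z * z * z;
    immutable double z5 = z3 * z * z;
    return z + (z3 + z) / (4.0 * nu)
             + (5.0 * z5 + 16.0 * z3 + 3.0 * z) / (96.0 * nu * nu);
}

/// Fills entry i with the quantile for nu = i + 1.
double[maxNu] buildTable() pure nothrow @nogc @safe
{
    double[maxNu] table;
    foreach (i; 0 .. maxNu)
        table[i] = approxT975(i + 1);
    return table;
}

// A module-level immutable needs a compile-time initializer,
// so buildTable() is evaluated by CTFE here.
immutable double[maxNu] tTable = buildTable();

// Checked by the compiler: nu = 30 is close to the tabulated 2.042.
static assert(approxT975(30) > 2.041 && approxT975(30) < 2.043);

void main()
{
    writefln("t(0.975, nu=30) ~ %.4f", tTable[29]);
}
```

The `static assert` is one way to get the "less error-prone" property: a wrong formula fails the build instead of shipping a bad table.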
March 23, 2012 Re: avgtime - Small D util for your everyday benchmarking needs | ||||
---|---|---|---|---|
| ||||
Posted in reply to Juan Manuel Cabo | On 23/03/12 09:37, Juan Manuel Cabo wrote:
> On Friday, 23 March 2012 at 05:51:40 UTC, Manfred Nowak wrote:
>>
>> | For samples, if it is known that they are drawn from a symmetric
>> | distribution, the sample mean can be used as an estimate of the
>> | population mode.
>
> I'm not printing the population mode, I'm printing the 'sample mode'.
> It has a very clear meaning: most frequent value. To have frequency,
> I group into 'bins' by precision: 12.345 and 12.3111 will both
> go to the 12.3 bin.
>
>>
>> and the program computes the variance as if the values of the sample
>> follow a normal distribution, which is symmetric.
>
> This program doesn't compute the variance. Maybe you are talking
> about another program. This program computes the standard deviation
> of the sample. The sample doesn't need to be of any distribution
> to have a standard deviation. It is not a distribution parameter,
> it is a statistic.
>
>> Therefore the mode of the sample is of interest only, when the variance
>> is calculated wrongly.
>
> ???
>
> The 'sample mode', 'median' and 'average' can quickly tell you
> something about the shape of the histogram, without
> looking at it.
> If the three coincide, then maybe you are in normal distribution land.
>
> The only place where I assume normal distribution is for the
> confidence intervals. And it's in the usage help.
>
> If you want to support estimating weird probability
> distribution parameters, forking and pull requests are
> welcome. Rewrites too. Good luck detecting distribution
> shapes!!!! ;-)
>
>
>>
>> -manfred
>
> PS: I should use the t student to make the confidence intervals,
> and for computing that I should use the sample standard
> deviation (/n-1), but that is a completely different story.
> The z normal with n>30 approximation is quite good.
> (I would have to embed a table for the t student tail factors,
> pull reqs welcome).
No, it's easy. Student t is in std.mathspecial.
>
> PS2: I now fixed the confusion with the confidence interval
> of the variable and the confidence interval of the mu average,
> I simply now show both. (release 0.4).
>
> PS3: Statistics estimate distribution parameters.
>
> --jm
|
March 23, 2012 Re: avgtime - Small D util for your everyday benchmarking needs | ||||
---|---|---|---|---|
| ||||
Posted in reply to Don Clugston | On 23/03/12 11:20, Don Clugston wrote:
> On 23/03/12 09:37, Juan Manuel Cabo wrote:
>> On Friday, 23 March 2012 at 05:51:40 UTC, Manfred Nowak wrote:
>>>
>>> | For samples, if it is known that they are drawn from a symmetric
>>> | distribution, the sample mean can be used as an estimate of the
>>> | population mode.
>>
>> I'm not printing the population mode, I'm printing the 'sample mode'.
>> It has a very clear meaning: most frequent value. To have frequency,
>> I group into 'bins' by precision: 12.345 and 12.3111 will both
>> go to the 12.3 bin.
>>
>>>
>>> and the program computes the variance as if the values of the sample
>>> follow a normal distribution, which is symmetric.
>>
>> This program doesn't compute the variance. Maybe you are talking
>> about another program. This program computes the standard deviation
>> of the sample. The sample doesn't need to be of any distribution
>> to have a standard deviation. It is not a distribution parameter,
>> it is a statistic.
>>
>>> Therefore the mode of the sample is of interest only, when the variance
>>> is calculated wrongly.
>>
>> ???
>>
>> The 'sample mode', 'median' and 'average' can quickly tell you
>> something about the shape of the histogram, without
>> looking at it.
>> If the three coincide, then maybe you are in normal distribution land.
>>
>> The only place where I assume normal distribution is for the
>> confidence intervals. And it's in the usage help.
>>
>> If you want to support estimating weird probability
>> distribution parameters, forking and pull requests are
>> welcome. Rewrites too. Good luck detecting distribution
>> shapes!!!! ;-)
>>
>>
>>>
>>> -manfred
>>
>> PS: I should use the t student to make the confidence intervals,
>> and for computing that I should use the sample standard
>> deviation (/n-1), but that is a completely different story.
>> The z normal with n>30 approximation is quite good.
>> (I would have to embed a table for the t student tail factors,
>> pull reqs welcome).
>
> No, it's easy. Student t is in std.mathspecial.
Aargh, I didn't get around to copying it in. But this should do it.
import std.math : fabs, sqrt;
import std.mathspecial : betaIncompleteInv;

/** Inverse of Student's t distribution
 *
 * Given probability p and degrees of freedom nu,
 * finds the argument t such that the one-sided
 * studentsDistribution(nu,t) is equal to p.
 *
 * Params:
 * nu = degrees of freedom. Must be > 0
 * p = probability. 0 <= p <= 1
 */
real studentsTDistributionInv(int nu, real p)
in {
    assert(nu > 0);
    assert(p >= 0.0L && p <= 1.0L);
}
do
{
    // The extreme probabilities map to infinite arguments.
    if (p == 0) return -real.infinity;
    if (p == 1) return real.infinity;
    real rk, z;
    rk = nu;
    // Central region: invert around the median.
    if (p > 0.25L && p < 0.75L) {
        if (p == 0.5L) return 0;
        z = 1.0L - 2.0L * p;
        z = betaIncompleteInv(0.5L, 0.5L * rk, fabs(z));
        real t = sqrt(rk * z / (1.0L - z));
        if (p < 0.5L)
            t = -t;
        return t;
    }
    // Tails: work with the lower tail and restore the sign at the end.
    int rflg = -1; // sign of the result
    if (p >= 0.5L) {
        p = 1.0L - p;
        rflg = 1;
    }
    z = betaIncompleteInv(0.5L * rk, 0.5L, 2.0L * p);
    if (z < 0) return rflg * real.infinity;
    return rflg * sqrt(rk / z - rk);
}
|
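As a rough illustration of how a Student's t quantile would feed into the confidence interval of the mean discussed earlier in the thread, here is a small self-contained sketch. It is not avgtime's code: `tQuantile` and `meanCiHalfWidth` are hypothetical names, and `tQuantile` only covers the upper tail, using the same `betaIncompleteInv` relation as the tail branch of the function posted above. The standard deviation uses the n-1 divisor mentioned in the PS.

```d
import std.math : sqrt;
import std.mathspecial : betaIncompleteInv;
import std.stdio : writefln;

/// Hypothetical helper: Student's t quantile for a cumulative
/// probability p in (0.5, 1) and nu degrees of freedom.
real tQuantile(int nu, real p)
{
    assert(nu > 0 && p > 0.5L && p < 1.0L);
    // Two-sided tail mass 2*(1-p) equals I_x(nu/2, 1/2) with x = nu/(nu + t^2).
    immutable real x = betaIncompleteInv(0.5L * nu, 0.5L, 2.0L * (1.0L - p));
    return sqrt(nu / x - nu);
}

/// Hypothetical helper: half-width of the two-sided confidence interval
/// for the mean, using the sample standard deviation (n-1 divisor).
real meanCiHalfWidth(const(double)[] xs, real confidence)
{
    assert(xs.length > 1);
    immutable real n = xs.length;
    real total = 0;
    foreach (x; xs)
        total += x;
    immutable real mean = total / n;
    real squares = 0;
    foreach (x; xs)
        squares += (x - mean) * (x - mean);
    immutable real s = sqrt(squares / (n - 1));
    immutable int nu = cast(int)(xs.length - 1);
    return tQuantile(nu, 0.5L + confidence / 2) * s / sqrt(n);
}

void main()
{
    double[] samples = [1.0, 2.0, 3.0, 4.0, 5.0];
    writefln("95%% CI of the mean: 3 +/- %.4f", meanCiHalfWidth(samples, 0.95L));
}
```

For small samples like this one the t quantile (about 2.78 for 4 degrees of freedom) is noticeably larger than the normal 1.96, which is why the z approximation is only recommended for n>30.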
March 23, 2012 Re: avgtime - Small D util for your everyday benchmarking needs | ||||
---|---|---|---|---|
| ||||
Posted in reply to Manfred Nowak | On 3/23/12 12:51 AM, Manfred Nowak wrote:
> Andrei Alexandrescu wrote:
>
>> You may want to also print the mode of the distribution,
>> nontrivial but informative
>
> In case of this implementation and according to the given link: trivial
> and noninformative, because
>
> | For samples, if it is known that they are drawn from a symmetric
> | distribution, the sample mean can be used as an estimate of the
> | population mode.
>
> and the program computes the variance as if the values of the sample
> follow a normal distribution, which is symmetric.
>
> Therefore the mode of the sample is of interest only, when the variance
> is calculated wrongly.
Again, benchmarks I've seen are always asymmetric. Not sure why those shown here are symmetric. The mode should be very close to the minimum (and in fact I think taking the minimum is a pretty good approximation of the sought-after time).
Andrei
|
March 23, 2012 Re: avgtime - Small D util for your everyday benchmarking needs | ||||
---|---|---|---|---|
| ||||
Posted in reply to Juan Manuel Cabo | On 3/23/12 3:02 AM, Juan Manuel Cabo wrote:
> On Friday, 23 March 2012 at 05:16:20 UTC, Andrei Alexandrescu wrote:
> [.....]
>>> (man, the gaussian curve is everywhere, it never ceases to
>>> perplex me).
>>
>> I'm actually surprised. I'm working on benchmarking lately and the
>> distributions I get are very concentrated around the minimum.
>>
>> Andrei
>
>
> Well, the shape of the curve depends a lot on
> how the random noise gets inside the measurement.
[snip]
Hmm, well the way I see it, the observed measurements have the following composition:
X = T + Q + N
where T > 0 (a constant) is the "real" time taken by the processing, Q > 0 is the quantization noise caused by the limited resolution of the clock (can be considered 0 if the resolution is much smaller than the actual time), and N is noise caused by a variety of factors (other processes, throttling, interrupts, networking, memory hierarchy effects, and many more). The challenge is estimating T given a bunch of X samples.
N can probably be approximated by a Gaussian, although for short timings I noticed it's more like bursts that just cause outliers. But note that N is always positive (therefore not 100% Gaussian), i.e. there's no way to insert some noise that makes the code seem artificially faster. It's all additive.
Taking the mode of the distribution will estimate T + mode(N), which is informative because after all there's no way to eliminate noise. However, if the focus is improving T, we want an estimate as close to T as possible. In the limit, taking the minimum over infinitely many measurements of X would yield T.
Andrei
|
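The X = T + Q + N model from the post above can be explored with a small simulation. This is only a sketch under stated assumptions: Q is taken as 0 (clock resolution much finer than T), N is drawn from an exponential distribution purely because it is a simple strictly non-negative choice, and the names `simulateTimings`, `minimum` and `average` are hypothetical.

```d
import std.math : log;
import std.random : Random, uniform;
import std.stdio : writefln;

/// Simulates n timings X = T + N, where N >= 0 is exponential noise
/// with mean `noiseMean` (an assumed noise model, Q is ignored).
double[] simulateTimings(double t, double noiseMean, size_t n, uint seed)
{
    auto rng = Random(seed);
    auto xs = new double[n];
    foreach (ref x; xs)
    {
        immutable double u = uniform(0.0, 1.0, rng); // u in [0, 1)
        x = t - noiseMean * log(1.0 - u);            // inverse-CDF sampling
    }
    return xs;
}

/// Smallest observed timing.
double minimum(const(double)[] xs)
{
    assert(xs.length > 0);
    double m = xs[0];
    foreach (x; xs[1 .. $])
        if (x < m)
            m = x;
    return m;
}

/// Arithmetic mean of the timings.
double average(const(double)[] xs)
{
    assert(xs.length > 0);
    double total = 0;
    foreach (x; xs)
        total += x;
    return total / xs.length;
}

void main()
{
    immutable double t = 1.0; // the "real" time T, in seconds
    auto xs = simulateTimings(t, 0.05, 10_000, 42);
    writefln("T = %.4f  min = %.6f  mean = %.4f", t, minimum(xs), average(xs));
}
```

Because the noise is purely additive, the minimum can never fall below T and creeps toward it as more samples are taken, while the mean stays offset by roughly the mean of N.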