D Mir: standard deviation speed
tastyminerals
July 14, 2020
I am trying to implement a standard deviation calculation in Mir for benchmarking purposes.
I have two implementations. One is the straightforward std = sqrt(mean(abs(x - x.mean())**2)) and the other follows Welford's algorithm for computing variance (as described here: https://www.johndcook.com/blog/standard_deviation/).

However, although the first implementation should be the less efficient / slower one, the benchmarking results show a startling difference in its favour. I'd like to understand if I am doing something wrong and would appreciate some explanation.

# Naive std
import std.math : abs;
import mir.ndslice;
import mir.math.common : pow, sqrt, fastmath;
import mir.math.sum : sum;
import mir.math.stat : mean;

@fastmath private double sd0(T)(Slice!(T*, 1) flatMatrix)
{
    pragma(inline, false);
    if (flatMatrix.empty)
        return 0.0;
    double n = cast(double) flatMatrix.length;
    double mu = flatMatrix.mean;
    return (flatMatrix.map!(a => (a - mu).abs ^^ 2).sum!"fast" / n).sqrt;
}


# std with Welford's variance
@fastmath double sdWelford(T)(Slice!(T*, 1) flatMatrix)
{
    pragma(inline, false);
    if (flatMatrix.empty)
        return 0.0;

    double m0 = 0.0;
    double m1 = 0.0;
    double s0 = 0.0;
    double s1 = 0.0;
    double n = 0.0;
    foreach (x; flatMatrix.field)
    {
        ++n;
        m1 = m0 + (x - m0) / n;
        s1 = s0 + (x - m0) * (x - m1);
        m0 = m1;
        s0 = s1;
    }
    // switch to n - 1 for sample variance
    return (s1 / n).sqrt;
}

Benchmarking:

Naive std (1k loops):
  std of [60, 60] matrix 0.001727
  std of [300, 300] matrix 0.043452
  std of [600, 600] matrix 0.182177
  std of [800, 800] matrix 0.345367

std with Welford's variance (1k loops):
  std of [60, 60] matrix 0.0225476
  std of [300, 300] matrix 0.534528
  std of [600, 600] matrix 2.0714
  std of [800, 800] matrix 3.60142

jmh530
July 14, 2020
On Tuesday, 14 July 2020 at 19:04:45 UTC, tastyminerals wrote:
> [...]

It would be helpful to provide a link.

You should only need one accumulator each for the mean and the centered sum of squares. See the Python example in the Welford section of
https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
The extra copies may have broken the optimization somehow.
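For instance, a single-accumulator version might look something like this (just a sketch of the Wikipedia pseudocode, not the Mir implementation, and untested against your benchmark):

import mir.ndslice;
import mir.math.common : sqrt;

double sdWelfordSingle(T)(Slice!(T*, 1) x)
{
    if (x.empty)
        return 0.0;
    double mean = 0.0;
    double m2 = 0.0; // centered sum of squares
    double n = 0.0;
    foreach (e; x.field)
    {
        ++n;
        immutable delta = e - mean;
        mean += delta / n;
        m2 += delta * (e - mean); // note: uses the already-updated mean
    }
    // divide by (n - 1) instead for the sample standard deviation
    return sqrt(m2 / n);
}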

variance and standardDeviation were recently added to mir.math.stat. They have the option to switch between Welford's algorithm and the others. What you call the naive algorithm is VarianceAlgo.twoPass, and the Welford algorithm can be selected with VarianceAlgo.online, which is the default option. It would also be interesting if you re-did the analysis with the built-in Mir functions.
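Something along these lines should let you toggle the algorithm (written from memory, so double-check the exact template parameters against the mir.math.stat docs):

import mir.ndslice;
import mir.math.stat : standardDeviation, VarianceAlgo;

void main()
{
    auto x = [1.0, 2, 3, 4, 5].sliced;

    // Welford-style online algorithm (the current default)
    auto sdOnline = x.standardDeviation;

    // two-pass algorithm, i.e. what you implemented as the naive version
    auto sdTwoPass = x.standardDeviation!(VarianceAlgo.twoPass);
}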

There are some other small differences between your implementation and the one in Mir, beyond the issue discussed above. You take the absolute value before squaring and force the use of sum!"fast". Another difference is that VarianceAlgo.online in Mir uses a precise calculation of the mean rather than the fast running update that Welford's algorithm uses. This may have a modest impact on performance, but should provide more accurate results.
9il
July 15, 2020
On Tuesday, 14 July 2020 at 19:04:45 UTC, tastyminerals wrote:
> @fastmath private double sd0(T)(Slice!(T*, 1) flatMatrix)

@fastmath shouldn't really be used with summation algorithms, except the `"fast"` version of them. Otherwise, they may or may not behave like `"fast"`.
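For example (just an illustration of the idea, not code from Mir):

import mir.ndslice;
import mir.math.common : fastmath;
import mir.math.sum : sum;

// Fine: "fast" summation already allows the compiler to reorder operations,
// so @fastmath cannot make it behave worse than advertised.
@fastmath double totalFast(Slice!(double*, 1) x)
{
    return x.sum!"fast";
}

// Risky: @fastmath may let the compiler optimize away the compensation
// that "kbn" relies on, so the result may or may not actually be compensated.
@fastmath double totalKbn(Slice!(double*, 1) x)
{
    return x.sum!"kbn";
}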
July 15, 2020
On Wednesday, 15 July 2020 at 02:08:48 UTC, 9il wrote:
> On Tuesday, 14 July 2020 at 19:04:45 UTC, tastyminerals wrote:
>> @fastmath private double sd0(T)(Slice!(T*, 1) flatMatrix)
>
> @fastmath shouldn't really be used with summation algorithms, except the `"fast"` version of them. Otherwise, they may or may not behave like `"fast"`.

`mean` is the summation algorithm too
tastyminerals
July 15, 2020
On Tuesday, 14 July 2020 at 19:36:21 UTC, jmh530 wrote:
> [...]

Ok, the wiki page looks more informative; I shall look into my Welford implementation.

I've just used standardDeviation from Mir and it showed even worse results than both of the examples above.

Here is a (WIP) project as of now.
Line 160 in https://github.com/tastyminerals/mir_benchmarks_2/blob/master/source/basic_ops.d

std of [60, 60] matrix 0.0389492 (> 0.001727)
std of [300, 300] matrix 1.03592 (> 0.043452)
std of [600, 600] matrix 4.2875 (> 0.182177)
std of [800, 800] matrix 7.9415 (> 0.345367)

tastyminerals
July 15, 2020
On Wednesday, 15 July 2020 at 02:08:48 UTC, 9il wrote:
> On Tuesday, 14 July 2020 at 19:04:45 UTC, tastyminerals wrote:
>> @fastmath private double sd0(T)(Slice!(T*, 1) flatMatrix)
>
> @fastmath shouldn't really be used with summation algorithms, except the `"fast"` version of them. Otherwise, they may or may not behave like `"fast"`.

Good to know. So, it's fine to use it with sum!"fast" but better avoid it for general purposes.

9il
July 15, 2020
On Wednesday, 15 July 2020 at 06:00:46 UTC, tastyminerals wrote:
> [...]
>
> Good to know. So, it's fine to use it with sum!"fast" but better avoid it for general purposes.

They both are more precise by default.
9il
July 15, 2020
On Wednesday, 15 July 2020 at 06:55:51 UTC, 9il wrote:
> [...]
>
> They both are more precise by default.

This was a reply to your other post in the thread, sorry. Mir algorithms are more precise by default than the algorithms you have provided.
tastyminerals
July 15, 2020
On Wednesday, 15 July 2020 at 06:57:21 UTC, 9il wrote:
> [...]
>
> This was a reply to your other post in the thread, sorry. Mir algorithms are more precise by default than the algorithms you have provided.

Right. Is this why standardDeviation is significantly slower?


9il
July 15, 2020
On Wednesday, 15 July 2020 at 07:34:59 UTC, tastyminerals wrote:
> [...]
>
> Right. Is this why standardDeviation is significantly slower?

Yes. It allows you to pick a summation option; you can try options other than the default in your benchmarks.
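For example, something like this should let you compare summation kinds in your benchmark (parameter order written from memory; see the mir.math.stat and mir.math.sum docs):

import mir.ndslice;
import mir.math.stat : standardDeviation, VarianceAlgo;
import mir.math.sum : Summation;

void main()
{
    auto x = [1.0, 2, 3, 4, 5].sliced;

    auto sdDefault  = x.standardDeviation;  // precise summation by default
    auto sdFast     = x.standardDeviation!(VarianceAlgo.online, Summation.fast);
    auto sdPairwise = x.standardDeviation!(VarianceAlgo.online, Summation.pairwise);
}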