Collect Statistics efficiently and easily

Thread overview:
Brett, Sep 17, 2019
Paul Backus, Sep 17, 2019
Brett, Sep 19, 2019

September 17, 2019
I often need to compute statistics on a data set that is either still being generated or already complete.

The code usually looks like this:

M = max(M, v);
m = min(m, v);

but other statistics, such as the mean or standard deviation, might also need to be computed.

This may need to be done on several data sets simultaneously.

Is there a way to compute them all in one efficient line, probably using ranges? I'd like to avoid looping through a data set multiple times, since that would be quite inefficient.



September 17, 2019
You can use `std.algorithm.fold` to compute multiple results in a single pass:

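import std.algorithm : fold, max, min;

// v is a non-empty range of values; fold returns a tuple, one entry per function
auto stats = v.fold!(max, min);
M = stats[0]; // maximum
m = stats[1]; // minimum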
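
As a rough sketch of how the same `fold` call might be extended to the mean and standard deviation mentioned in the question: give `fold` one function and one seed per running value, then derive the statistics afterwards. The names `Summary` and `summarize` are hypothetical, not part of Phobos, and the sketch assumes numeric elements. It also uses the sum-of-squares formula for the variance, which can lose precision on large or widely spread data.

```d
import std.algorithm : fold, max, min;
import std.math : sqrt;

// Hypothetical result type; the name and fields are illustrative only.
struct Summary
{
    size_t count;
    double lo, hi, mean, stdDev;
}

// Computes every statistic in a single pass by giving `fold` one function
// and one seed per accumulator. Accepts any input range of numeric values.
Summary summarize(R)(R data)
{
    auto acc = data.fold!(
        min,                               // running minimum
        max,                               // running maximum
        (a, x) => a + x,                   // sum of values
        (a, x) => a + cast(double) x * x,  // sum of squared values
        (a, x) => a + 1                    // number of elements
    )(double.infinity, -double.infinity, 0.0, 0.0, size_t(0));

    immutable count = acc[4];
    immutable mean = acc[2] / count;
    // Population variance as E[x^2] - E[x]^2, clamped against rounding error.
    immutable variance = max(acc[3] / count - mean * mean, 0.0);
    return Summary(count, acc[0], acc[1], mean, sqrt(variance));
}
```

For example, `summarize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])` yields a minimum of 2, a maximum of 9, a mean of 5 and a standard deviation of 2. On an empty range the mean and standard deviation come out as NaN.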

September 19, 2019
That may work, but I'm already iterating and computing these values inside a loop.

What I'm specifically talking about is abstracting the computation of each kind of statistic.

If I converted my algorithm into a range, then maybe I could do something similar to what you suggest, but I would still need more than min and max (such as the average, standard deviation, and others).

It may be viable, but I'll have to think about it. I often find myself writing the same abstract code to compute the same statistics. Sometimes it involves a history and sometimes not; for example, I might want to compute the average and keep the last 5 values, or the 5 largest.
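
One possible way to abstract this without converting the loop into a range is a small accumulator that is fed one value per iteration. The sketch below is only an illustration: `RunningStats` and `LastN` are hypothetical names, not Phobos types. The mean and variance use Welford's online update, which is swapped in here because it stays accurate without storing the data, and `LastN` shows one simple way to keep a short history.

```d
import std.algorithm : max, min;
import std.math : sqrt;

// Hypothetical accumulator: updates all statistics one value at a time.
struct RunningStats
{
    size_t count;
    double lo = double.infinity;
    double hi = -double.infinity;
    double mean = 0.0;
    private double m2 = 0.0; // sum of squared deviations from the running mean

    // Folds a single value into every statistic (Welford's online update).
    void put(double x)
    {
        ++count;
        lo = min(lo, x);
        hi = max(hi, x);
        immutable delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
    }

    // Population variance; 0 until at least one value has been seen.
    double variance() const { return count ? m2 / count : 0.0; }

    double stdDev() const { return sqrt(variance()); }
}

// Hypothetical fixed-size history that keeps the most recent N values.
struct LastN(size_t N)
{
    private double[N] buf;
    private size_t total;

    // Overwrites the oldest slot once the buffer is full.
    void put(double x) { buf[total++ % N] = x; }

    // Values currently held, oldest first.
    double[] values() const
    {
        double[] result;
        immutable held = min(total, N);
        foreach (i; total - held .. total)
            result ~= buf[i % N];
        return result;
    }
}
```

Inside an existing loop this reduces to one call per value and per data set, for example `stats.put(v); recent.put(v);` with `RunningStats stats;` and `LastN!5 recent;` declared beforehand. Several data sets can be tracked at once by keeping one accumulator per set.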