H. S. Teoh
Posted in reply to Siarhei Siamashka
 On Thu, Nov 10, 2022 at 09:45:56PM +0000, Siarhei Siamashka via Digitalmarsd wrote:
> On Thursday, 10 November 2022 at 17:46:59 UTC, Ali Çehreli wrote:
> > On 11/10/22 08:57, H. S. Teoh wrote:
> >
> > > if there's a fast way to compute the standard deviation of the multinomial
According to the Wikipedia page on multinomial distribution (linked by
Timon), it states that the variance of X_i for n rolls of a ksided dice
(with probability p_i), where i is a specific outcome, is:
Var(X_i) = n*p_i*(1  p_i)
Don't really understand where this formula came from (as I said, that page is way above my head), but we can make use of it. In this case, p_i = 1/k since we're assuming unbiased dice rolls. So this reduces to:
Var(X_i) = (n/k)*(1  1/k)
Now, since the standard deviation is just the square root of the variance, we have:
sigma = sqrt((n/k)*(1  1/k))
which is easy to compute.
Furthermore, the mean is just:
E(X_i) = n*p_i = n/k
We can then plug this into the BoxMuller transform (which I found on
Wikipedia :P):
import std.random, std.math;
const sigma = sqrt((n/k)*(1.0  1.0/k));
auto u1 = uniform01();
auto u2 = uniform01();
auto mag = sigma*sqrt(2*log(u1));
auto z0 = mag * cos(2*PI*u2) + n/k;
auto z1 = mag * sin(2*PI*u2) + n/k;
which sets z0 and z1 to be two random variables having mean n/k and variance Var(X_i). So they can be used to populate two elements of the resulting array, and should have the correct distribution.
Repeat this procedure until all array elements are populated. (If the output array length is odd, i.e., k is odd, we can just discard the last z1.) Then compute the difference d between sum of the array and n (the expected sum), which should be relatively small, and run d iterations of individual dice rolls and adjust the array (++ if the sum is too small, or  if the sum is too large) to make it sum exactly to n.
> > I am happy with my solution which includes scientific terms like normalDistributianness. :p
> >
> > [...]
> >
> > Error checking etc. are missing but hundred million dice throws in a split second... Now that's fast! :p
>
> That's fast, but unfortunately doesn't seem to be correct. Your code outputs something like this:
>
> [16548205, 16741274, 16600777, 16792301, 16585808, 16731635]
>
> While the original slow reference H. S. Teoh's implementation outputs:
>
> [16669388, 16665882, 16668019, 16671186, 16664718, 16660807]
>
> It's easy to see even with a naked eye that the statistical properties
> are obviously not the same (your implementation has a significantly
> higher variance). And for comparison, a [Python/SciPy
> implementation](https://github.com/scipy/scipy/issues/17388#issuecomment1310913940)
> outputs:
>
> [16664115, 16666144, 16677962, 16660692, 16663181, 16667906]
[...]
I implemented what I described above, and got very goodlooking results.
Output:

1 modules passed unittests
[16666837, 16666500, 16664692, 16666306, 16663568, 16672097] sum=100000000
[16661247, 16666674, 16668562, 16673476, 16665867, 16664174] sum=100000000
[16663070, 16665221, 16664850, 16668355, 16668246, 16670258] sum=100000000
[16671423, 16665663, 16667209, 16669164, 16663916, 16662625] sum=100000000
[16666194, 16669936, 16662697, 16666535, 16666671, 16667967] sum=100000000
[16665977, 16667094, 16662753, 16670737, 16666762, 16666677] sum=100000000
[16665518, 16666940, 16671829, 16664487, 16665344, 16665882] sum=100000000
[16658486, 16666671, 16667778, 16665901, 16667471, 16673693] sum=100000000
[16668149, 16669431, 16661462, 16668241, 16668286, 16664431] sum=100000000
[16660879, 16671401, 16659154, 16671245, 16667720, 16669601] sum=100000000
[16665213, 16670794, 16665401, 16667548, 16663484, 16667560] sum=100000000
[16663091, 16671196, 16669946, 16670791, 16661162, 16663814] sum=100000000
[16664305, 16670070, 16662123, 16663467, 16673867, 16666168] sum=100000000
[16664395, 16667406, 16657623, 16673838, 16670227, 16666511] sum=100000000
[16671401, 16662181, 16672923, 16657766, 16674160, 16661569] sum=100000000
[16663059, 16669574, 16662338, 16668319, 16670541, 16666169] sum=100000000
[16672366, 16662296, 16666526, 16664284, 16669201, 16665327] sum=100000000
[16665787, 16669038, 16666956, 16668728, 16660096, 16669395] sum=100000000
[16669162, 16669034, 16662788, 16663672, 16669127, 16666217] sum=100000000
[16667016, 16669649, 16661551, 16670882, 16668304, 16662598] sum=100000000

(I've no idea how accurate it is, but visual inspection certainly indicates that it's doing the right thing. And it's lightningfast. :P)
Code (compile with `dmd unittest main`):

import std;
/// BoxMuller transform.
double[2] gaussianPoint(double[2] mean, double deviation)
{
import std.math : cos, log, sin, sqrt, PI, round;
import std.random : uniform01;
auto u = uniform01();
auto v = uniform01();
auto x0 = sqrt(2.0*log(u)) * cos(2.0*PI*v);
auto y0 = sqrt(2.0*log(u)) * sin(2.0*PI*v);
double[2] result;
result[0] = mean[0] + cast(int)round(deviation * x0);
result[1] = mean[1] + cast(int)round(deviation * y0);
return result;
}
/// Roll ksided dice N times and tally each output.
int[] diceDistrib(int k, int N)
in (k > 0)
in (N > 0)
out (r; r.sum == N)
{
import std.math : sqrt, round;
import std.range : chunks;
import std.random : uniform;
const mean = cast(double)N / k;
const deviation = sqrt(mean * (1.0  1.0 / k));
double[2] means = [ mean, mean ];
// Populate output array with initial values with the correct mean and
// standard deviation.
auto result = new int[k];
foreach (chunk; result.chunks(2))
{
auto z = gaussianPoint(means, deviation);
chunk[0] = cast(int) round(z[0]);
if (chunk.length > 1)
chunk[1] = cast(int) round(z[1]);
}
// Tweak resulting array until it sums exactly to N.
auto total = result.sum;
while (total != N)
{
auto i = uniform(0, k);
if (total < N)
{
result[i]++;
total++;
}
else
{
result[i];
total;
}
}
return result;
}
unittest
{
foreach (i; 0 .. 20)
{
import std.stdio;
auto N = 100_000_000;
auto dist = diceDistrib(6, N);
writefln("%s sum=%d", dist, dist[].sum);
assert(dist.sum == N);
}
}

//
Now, just for fun, I added a writeln before the `while (total != N)` to print out just how big of a discrepancy to N the array sum is. Turns out, quite a bit: for N = 100_000_000 as above, the discrepancies range from approximately 25000 to 25000, with the typical discrepancy being about 34 digits long, which means that the code is spending quite a bit of time in that final loop. Which suggests that a possible improvement is to recursively run the initial approximation step, setting sub_N = (N  array.sum).
I'll leave that to a future iteration, though. Being able to compute a hundred million dice rolls in a split second is already good enough for what I need. :D
T

Answer: Because it breaks the logical sequence of discussion. / Question: Why is top posting bad?
