November 11, 2022

On Thursday, 10 November 2022 at 23:15:24 UTC, H. S. Teoh wrote:

> Being able to compute a hundred million dice rolls in a split second is already good enough for what I need. :-D

How important is it for you to actually have a statistically correct solution for this particular problem?

If something is off, then it may eventually be discovered by somebody in the future. Here's one famous example: https://www.wondriumdaily.com/gregor-mendel-fake-data/

November 11, 2022
On Fri, Nov 11, 2022 at 10:52:47AM +0000, Siarhei Siamashka via Digitalmars-d wrote:
> On Thursday, 10 November 2022 at 23:15:24 UTC, H. S. Teoh wrote:
> > Being able to compute a hundred million dice rolls in a split second is already good enough for what I need. :-D
> 
> How important is it for you to actually have a statistically correct solution for this particular problem?
> 
> If something is off, then it may eventually be discovered by somebody in the future. Here's one famous example: https://www.wondriumdaily.com/gregor-mendel-fake-data/

Relax, this isn't for generating fake data. :-D  It's for a simulation, and actually my use case mainly involves small values of N. So technically I don't need to optimize it to this extent; it's just a nice-to-have and a fun exercise to make it resistant to performance degradation by unusually large inputs.


T

-- 
Mediocrity has been pushed to extremes.
November 11, 2022
On Friday, 11 November 2022 at 13:27:45 UTC, H. S. Teoh wrote:
> On Fri, Nov 11, 2022 at 10:52:47AM +0000, Siarhei Siamashka via Digitalmars-d wrote:

>
> Relax, this isn't for generating fake data. :-D  It's for a simulation, and actually my use case mainly involves small values of N. So technically I don't need to optimize it to this extent; it's just a nice-to-have and a fun exercise to make it resistant to performance degradation by unusually large inputs.
>
>
> T

I guess it simulates the simulator.
November 12, 2022
On 11.11.22 14:27, H. S. Teoh wrote:
> On Fri, Nov 11, 2022 at 10:52:47AM +0000, Siarhei Siamashka via Digitalmars-d wrote:
>> On Thursday, 10 November 2022 at 23:15:24 UTC, H. S. Teoh wrote:
>>> Being able to compute a hundred million dice rolls in a split second
>>> is already good enough for what I need. :-D
>>
>> How important is it for you to actually have a statistically correct
>> solution for this particular problem?
>>
>> If something is off, then it may eventually be discovered by
>> somebody in the future. Here's one famous example:
>> https://www.wondriumdaily.com/gregor-mendel-fake-data/
> 
> Relax, this isn't for generating fake data. :-D  It's for a simulation,
> and actually my use case mainly involves small values of N. So
> technically I don't need to optimize it to this extent; it's just a
> nice-to-have and a fun exercise to make it resistant to performance
> degradation by unusually large inputs.
> 
> 
> T
> 

I think the question was more: does it matter to you whether or not the simulation models an accurate distribution of results? What kind of code is consuming those dice roll frequency tables?

Note that even though the results are random, the distribution of results itself is not random and can in principle be compared precisely.

It's pretty clear that you are not getting the right distribution, although I have not investigated in detail how far what you simulate deviates from the true multinomial distribution that you are actually attempting to simulate. Still, I suspect that for some use cases the deviation would be significant.
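To make the "can in principle be compared precisely" point concrete: the counts are multinomial, so the count of any single face is marginally Binomial(n, 1/6), and a simulated frequency table can be checked against that exact distribution with a standard goodness-of-fit test. A minimal sketch (Python is used here purely for illustration; the die-rolling simulator is a stand-in, not the code from this thread):

```python
import math
import random
from collections import Counter

def roll_counts(n, k, rng):
    """Simulate n rolls of a fair k-sided die; return the frequency table."""
    counts = [0] * k
    for _ in range(n):
        counts[rng.randrange(k)] += 1
    return counts

def binom_pmf(n, p, x):
    """Exact Binomial(n, p) probability of x successes."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

rng = random.Random(42)
n, k, trials = 60, 6, 20_000

# Empirical distribution of the count of face 0 across many simulated tables.
observed = Counter(roll_counts(n, k, rng)[0] for _ in range(trials))

# Pearson chi-square statistic against the exact Binomial(n, 1/k) marginal.
chi2 = 0.0
dof = 0
for x in range(n + 1):
    expected = trials * binom_pmf(n, 1 / k, x)
    if expected >= 5:  # usual rule of thumb for the chi-square approximation
        chi2 += (observed.get(x, 0) - expected) ** 2 / expected
        dof += 1
print(f"chi-square = {chi2:.1f} over {dof} bins")
```

A correct sampler yields a statistic comparable to the number of bins; a sampler with the wrong distribution produces a value that grows without bound as `trials` increases.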
November 12, 2022

On Thursday, 10 November 2022 at 23:15:24 UTC, H. S. Teoh wrote:

> According to the Wikipedia page on multinomial distribution (linked by Timon), it states that the variance of X_i for n rolls of a k-sided dice (with probability p_i), where i is a specific outcome, is:
>
> Var(X_i) = np_i(1 - p_i)
>
> Don't really understand where this formula came from (as I said, that page is way above my head), but we can make use of it.
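The formula above has a simple explanation: the count X_i of any single face is marginally binomially distributed, X_i ~ Binomial(n, p_i), and the variance of a binomial is n·p_i·(1 - p_i). A quick empirical sanity check (a sketch in Python, used here purely for illustration):

```python
import random

n, k, trials = 30, 6, 50_000
p = 1 / k
rng = random.Random(1)

# For each experiment, count how often face 0 comes up in n rolls.
samples = [sum(rng.randrange(k) == 0 for _ in range(n)) for _ in range(trials)]

mean = sum(samples) / trials
var = sum((s - mean) ** 2 for s in samples) / trials

print(f"mean: {mean:.3f}  (theory n*p = {n * p:.3f})")
print(f"variance: {var:.3f}  (theory n*p*(1-p) = {n * p * (1 - p):.3f})")
```

With n = 30 and p = 1/6, both the sample mean and the sample variance land close to the theoretical values 5.0 and 25/6.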

This is where things take a wrong turn. In reality, you need more than just a matching mean and variance to correctly simulate an arbitrary probability distribution: https://en.wikipedia.org/wiki/Moment_(mathematics)

Every n-th moment needs to be correct too. Some of these moments have special names (n=1 mean, n=2 variance, n=3 skewness, n=4 kurtosis, ...). If you only take care of the mean and variance when simulating a random distribution, then it's somewhat similar to approximating sin(x) ≈ x - (x^3 / 3!) by keeping only the first two terms of its Taylor series.
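To illustrate why the first two moments are not enough: here are two distributions with identical mean and variance but different third central moments, so any method that matches only mean and variance cannot tell them apart. (A sketch in Python with exact fractions; the two specific distributions are just examples constructed for this illustration.)

```python
from fractions import Fraction as F

def central_moments(dist, n_max=3):
    """dist is a list of (value, probability) pairs; returns
    [mean, 2nd central moment, ..., n_max-th central moment]."""
    mean = sum(p * v for v, p in dist)
    return [mean] + [sum(p * (v - mean) ** n for v, p in dist)
                     for n in range(2, n_max + 1)]

# A: uniform on {0, 1, 2} -- mean 1, variance 2/3, third central moment 0
A = [(F(0), F(1, 3)), (F(1), F(1, 3)), (F(2), F(1, 3))]
# B: two-point distribution with the same mean and variance, but skewed
#    -- mean 1, variance 2/3, third central moment -2/9
B = [(F(0), F(2, 5)), (F(5, 3), F(3, 5))]

print(central_moments(A))
print(central_moments(B))
```

The first two moments agree exactly, yet the distributions are visibly different; the mismatch only shows up from the third moment onward.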

I wonder what's the reason for not using the mir-random library, as suggested in the earlier comments? Do you want to avoid having an extra dependency?

/+dub.sdl:
dependency "mir-random" version="~>2.2.19"
+/
import std, mir.random.engine, mir.random.ndvariable;

uint[k] diceDistrib(uint k)(uint N)
  in(k > 0)
  in(N > 0)
  out(r; r[].sum == N)  // the counts always add up to the number of rolls
{
  uint[k] result;
  double[k] p;
  p[] = 1.0 / k;                   // fair die: each face has probability 1/k
  auto rv = multinomialVar(N, p);  // exact multinomial sampler from mir-random
  rv(rne, result[]);               // draw one sample using the default engine
  return result;
}

void main()
{
  writeln(diceDistrib!6(100_000_000));
}