Thread overview
[OT] Generating distribution of N dice rolls
Nov 10, 2022
H. S. Teoh
Nov 10, 2022
Ali Çehreli
Nov 10, 2022
Sergey
Nov 10, 2022
Timon Gehr
Nov 10, 2022
jmh530
Nov 10, 2022
Timon Gehr
Nov 10, 2022
Timon Gehr
Nov 10, 2022
jmh530
Nov 21, 2022
Abdulhaq
Nov 21, 2022
H. S. Teoh
November 09, 2022
This is technically OT, but I thought I'd pick the smart brains here for my project, which happens to be written in D. ;-)

Basically, I want to write a function that takes 2 uint arguments k and N, and simulates rolling N k-sided dice and counting how many 1's, 2's, 3's, ... k's were rolled. Something like this:

	uint[k] diceDistrib(uint k)(uint N)
		in(k > 0)
		in(N > 0)
		out(r; r[].sum == N)
	{
		uint[k] result;
		foreach (i; 0 .. N) {
			result[uniform(0, k)]++;
		}
		return result;
	}

The above code works and does what I want, but since N may be large, I'd like to refactor the code to loop over k instead of N. I.e., instead of actually rolling N dice and tallying the results, the function would generate the elements of the output array directly, such that the distribution of the array elements follow the same probabilities as the above code.

Note that in all cases, the output array must sum to N; it is not enough to merely simulate the roll distribution probabilistically.

Any ideas?  (Or links if this is a well-studied problem with a known
solution.)

<ObDReference> I love how D's new contract syntax makes it so conducive to expressing programming problem requirements. ;-) </ObDReference>


T

-- 
When you breathe, you inspire. When you don't, you expire. -- The Weekly Reader
November 09, 2022
On 11/9/22 18:10, H. S. Teoh wrote:

> Note that in all cases, the output array must sum to N;

I don't know what the standard deviation should be and whether it is a function of N but the following makes sense:

- for all k, pick a random number with mean == N/k and standard deviation == ???

- if the sum > N, remove an item from a random bucket for sum - N times

- if the sum < N, add an item to a random bucket for N - sum times

Ali

November 10, 2022
On Thursday, 10 November 2022 at 02:10:32 UTC, H. S. Teoh wrote:
> This is technically OT, but I thought I'd pick the smart brains here for my project, which happens to be written in D. ;-)
>
> Basically, I want to write a function that takes 2 uint arguments k and N, and simulates rolling N k-sided dice and counting how many 1's, 2's, 3's, ... k's were rolled. Something like this:
>
> 	uint[k] diceDistrib(uint k)(uint N)
> 		in(k > 0)
> 		in(N > 0)
> 		out(r; r[].sum == N)
> 	{
> 		uint[k] result;
> 		foreach (i; 0 .. N) {
> 			result[uniform(0, k)]++;
> 		}
> 		return result;
> 	}
>
> The above code works and does what I want, but since N may be large, I'd like to refactor the code to loop over k instead of N. I.e., instead of actually rolling N dice and tallying the results, the function would generate the elements of the output array directly, such that the distribution of the array elements follow the same probabilities as the above code.
>
> Note that in all cases, the output array must sum to N; it is not enough to merely simulate the roll distribution probabilistically.
>
> Any ideas?  (Or links if this is a well-studied problem with a known
> solution.)
>
> <ObDReference> I love how D's new contract syntax makes it so conducive to expressing programming problem requirements. ;-) </ObDReference>
>
>
> T

They should have uniform distribution with well known mean and std. To have exactly N rolls you can estimate with distribution function numbers for (k-1) sides (let their sum will be M), and the rest (N-M) put into k’s side

https://scientificgems.wordpress.com/2015/11/03/mathematics-in-action-probability/

November 10, 2022
On 10.11.22 03:10, H. S. Teoh wrote:
> This is technically OT, but I thought I'd pick the smart brains here for
> my project, which happens to be written in D. ;-)
> 
> Basically, I want to write a function that takes 2 uint arguments k and
> N, and simulates rolling N k-sided dice and counting how many 1's, 2's,
> 3's, ... k's were rolled. Something like this:
> 
> 	uint[k] diceDistrib(uint k)(uint N)
> 		in(k > 0)
> 		in(N > 0)
> 		out(r; r[].sum == N)
> 	{
> 		uint[k] result;
> 		foreach (i; 0 .. N) {
> 			result[uniform(0, k)]++;
> 		}
> 		return result;
> 	}
> 
> The above code works and does what I want, but since N may be large, I'd
> like to refactor the code to loop over k instead of N. I.e., instead of
> actually rolling N dice and tallying the results, the function would
> generate the elements of the output array directly, such that the
> distribution of the array elements follow the same probabilities as the
> above code.
> 
> Note that in all cases, the output array must sum to N; it is not enough
> to merely simulate the roll distribution probabilistically.
> 
> Any ideas?  (Or links if this is a well-studied problem with a known
> solution.)
> ...

The distribution of the output is multinomial(n,k,[1/k,1/k,...,1/k]), i.e., n trials, k events, and the events follow an uniform distribution.
https://en.wikipedia.org/wiki/Multinomial_distribution

It is at least as hard to sample as binomial(n,1/k), because that is the marginal distribution of each component of the result.

I guess one approach is this (if you can find a way to sample from binomial distributions that is good enough for your use case):

auto tot=n;
foreach(i;0..k){
    result[i]=binomial(tot,1.0/(k-i));
    tot-=result[i];
}
return result;

This samples the result one component at a time.

> <ObDReference> I love how D's new contract syntax makes it so conducive
> to expressing programming problem requirements. ;-) </ObDReference>
> 

:)

November 10, 2022

On Thursday, 10 November 2022 at 10:45:50 UTC, Timon Gehr wrote:

>

[snip]

The distribution of the output is multinomial(n,k,[1/k,1/k,...,1/k]), i.e., n trials, k events, and the events follow an uniform distribution.
https://en.wikipedia.org/wiki/Multinomial_distribution

It is at least as hard to sample as binomial(n,1/k), because that is the marginal distribution of each component of the result.

I guess one approach is this (if you can find a way to sample from binomial distributions that is good enough for your use case):

[snip]

mir-random has the ability to sample from both the binomial [1] and multinomial [2] distributions (you would want the multinomial here, unless I'm missing something). I don't know how well mir-random handles large values of N or K, but there are also well-known normal and poisson approximations of the binomial distribution (I make some use of them here [3] for binomials) at least.

I wasn't exactly sure what you meant by the events follow a uniform distribution, but it's certainly possible to simulate from the multinomial to get the counts. You should also be able to calculate the PMF for it to compare the simulations against. The PMF shouldn't be uniform.

[1] http://mir-random.libmir.org/mir_random_variable.html#BinomialVariable
[2] http://mir-random.libmir.org/mir_random_ndvariable.html#MultinomialVariable
[3] https://github.com/libmir/mir-stat/blob/master/source/mir/stat/distribution/binomial.d

November 10, 2022
On 10.11.22 12:59, jmh530 wrote:
> On Thursday, 10 November 2022 at 10:45:50 UTC, Timon Gehr wrote:
>> [snip]
>>
>> The distribution of the output is multinomial(n,k,[1/k,1/k,...,1/k]), i.e., n trials, k events, and the events follow an uniform distribution.
>> https://en.wikipedia.org/wiki/Multinomial_distribution
>>
>> It is at least as hard to sample as binomial(n,1/k), because that is the marginal distribution of each component of the result.
>>
>> I guess one approach is this (if you can find a way to sample from binomial distributions that is good enough for your use case):
>>
>> [snip]
> 
> mir-random has the ability to sample from both the binomial [1] and multinomial [2] distributions (you would want the multinomial here, unless I'm missing something).

Well, I said that he needs to sample from a multinomial distribution. I just pointed out that you can sample from multinomial(n,k,...) by sampling from binomial distributions k times, and that to be able to sample from multinomial, you need to be able to sample from binomial anyway.

It turns out mir-random is doing precisely what I suggested:
https://github.com/libmir/mir-random/blob/master/source/mir/random/ndvariable.d#L344-L365

> I don't know how well mir-random handles large values of N or K,

FWIW this is what it does (haven't looked at it closely though): https://github.com/libmir/mir-random/blob/master/source/mir/random/variable.d#L1730-L1770

> but there are also well-known normal and poisson approximations of the binomial distribution (I make some use of them here [3] for binomials) at least.
> ...

Well, that seems a bit more involved. In case that is more accurate/efficient than mir-random, I guess one can sample from multinomial using that

> I wasn't exactly sure what you meant by the events follow a uniform distribution,

A multinomial is the distribution of a frequency table obtained after drawing n times from a categorical distribution. The OP is only interested in the special case where that categorical distribution is uniform. n is the number of samples, k is the number of events. It's the terminology used by the Wikipedia article I linked.

> but it's certainly possible to simulate from the multinomial to get the counts. You should also be able to calculate the PMF for it to compare the simulations against. The PMF shouldn't be uniform.
> ...

Well, it should be multinomial. :)

> 
> [1] http://mir-random.libmir.org/mir_random_variable.html#BinomialVariable
> [2] http://mir-random.libmir.org/mir_random_ndvariable.html#MultinomialVariable
> [3] https://github.com/libmir/mir-stat/blob/master/source/mir/stat/distribution/binomial.d

November 10, 2022
On 10.11.22 13:21, Timon Gehr wrote:
> n is the number of samples, k is the number of events. It's the terminology used by the Wikipedia article I linked.

Actually it uses "trials" instead of "samples".
November 10, 2022
On Thursday, 10 November 2022 at 12:21:30 UTC, Timon Gehr wrote:
> [snip]
> A multinomial is the distribution of a frequency table obtained after drawing n times from a categorical distribution. The OP is only interested in the special case where that categorical distribution is uniform. n is the number of samples, k is the number of events. It's the terminology used by the Wikipedia article I linked.
> [snip]

Ah, I see what you're saying now. When p = 1 / K, a categorical distribution is the same as a uniform distribution. I tend to think of multinomial as a generalization of binomial, but it's really a generalization of both binomial and categorical.
November 21, 2022
On Monday, 21 November 2022 at 10:44:48 UTC, Cabal Powel wrote:
> On Thursday, 10 November 2022 at 02:10:32 UTC, H. S. Teoh wrote:

>
> Hope this will help you.

The original article that you copied this from has some useful images:

https://towardsdatascience.com/modelling-the-probability-distributions-of-dice-b6ecf87b24ea


November 21, 2022
On Mon, Nov 21, 2022 at 11:08:24AM +0000, Abdulhaq via Digitalmars-d wrote:
> On Monday, 21 November 2022 at 10:44:48 UTC, Cabal Powel wrote:
> > On Thursday, 10 November 2022 at 02:10:32 UTC, H. S. Teoh wrote:
> 
> > 
> > Hope this will help you.
> 
> The original article that you copied this from has some useful images:
> 
> https://towardsdatascience.com/modelling-the-probability-distributions-of-dice-b6ecf87b24ea

It's a copy-n-paste spambot promoting a link and masquerading as something legitimate by posting (an incomplete version of) a relevant article.


T

-- 
In a world without fences, who needs Windows and Gates? -- Christian Surchi