November 12, 2005
Carlos Santander wrote:
> Niko Korhonen escribió:
> 
>>
>> Here is some example code:
>>
>> import std.random;
>>
>> void main()
>> {
>>   // Range 1-100
>>   uint r = cast(uint)((rand() / cast(double)uint.max) * 99.0) + 1;
>> }
>>
>> Modulo arithmetics would be simpler (uint r = rand() % 99 + 1), but the randomness aspect might suffer from that.
>>
> 
> I'm not familiar with what you just said, so I have to ask: what are you talking about? What's the possible suffering?
> 

I think it is because if you use modulo arithmetics, the numbers in the range from 0 to (uint.max % rangelength) occur 1 times more often that the numbers from (uint.max % rangelength + 1) to rangelength-1.
For example lets suppose uint.max = 255 and rangelength = 100:

0-55  : There are 3 possible matches (example for 18: 18, 118, 218)
56-99 : There are 2 possible matches (example for 58: 58, 158)

This has a very small impact for small ranges since uint.max is very big.


-- 
Bruno Medeiros - CS/E student
"Certain aspects of D are a pathway to many abilities some consider to be... unnatural."
November 12, 2005
Bruno Medeiros wrote:

[...]
> 0-55  : There are 3 possible matches
>        (example for 18: 18, 118, 218)
> 56-99 : There are 2 possible matches
>        (example for 58: 58, 158)
[...]

Quite true, because you enumerate all the 256 cases, 56*3+44*2=256, you describe the distribution correctly.

But now have a closer look on the computation with casting to double as suggested by the first answerer:

>> uint r= cast(uint)((rand() / cast(double)uint.max) * 99.0) + 1;

The value of the subexpression `rand() / cast(double)uint.max'
equals `1.0' at most once, because if `rand()' is not equal to
`uint.max' the value of this subexpression is less than `1.0'.
Therefore `100', the largest integer value form the given range,
is drawn only once.

Continuing your example with `uint.max==255' yields, that the remaining 255 cases of 2 respectiveley 3 possible matches occur 42 respectiveley 57 times, because 57*3+42*2=255.

Whereas with the modulo computation one knows which numbers are preferred, in the example the numbers from the range [0..55], the numbers preferred ar unknown in case of casting to double: it depends on the rounding during the cast, which numbers are preferred.

Is it even possible, that the rounding errors accumulate and that some numbers will even occur 4 times, whereas others only occur once? Who can prove, that this will not happen?

In total: under the assumption, that `uint.max' is big enough the probability, that the `100' is drawn at random in the casting-to-double-suggestion is near to zero and the remaining numbers are drawn with nearly the same probability, which underlies a variation. Whereas with the modulo-suggestion all numbers are  drawn with nearly the same probability with even less variation.

Quite a difference, isn't it?

-manfred
November 12, 2005
Manfred Nowak escribió:
> Bruno Medeiros wrote:
> 
> [...]
> 
>>0-55  : There are 3 possible matches
>>       (example for 18: 18, 118, 218)
>>56-99 : There are 2 possible matches
>>       (example for 58: 58, 158) 
> 
> [...]
> 
> Quite true, because you enumerate all the 256 cases,
> 56*3+44*2=256, you describe the distribution correctly. 
> 
> ...
> 
> -manfred

Ok, I understand. Thanks.

-- 
Carlos Santander Bernal
November 13, 2005
"Niko Korhonen" <niktheblak@hotmail.com> wrote in message news:dl1tpv$aam$1@digitaldaemon.com...
> import std.random;
>
> void main()
> {
>    // Range 1-100
>    uint r = cast(uint)((rand() / cast(double)uint.max) * 99.0) + 1;
> }
>
> Modulo arithmetics would be simpler (uint r = rand() % 99 + 1), but the
> randomness aspect might suffer from that.

Using the integer arithmetic method is better (i.e. faster and simpler) and it does not suffer from randomness problems.

And actually, it should be:

(uint r = rand() % 100 + 1)


November 13, 2005
"Bruno Medeiros" <daiphoenixNO@SPAMlycos.com> wrote in message news:dl5306$30ec$1@digitaldaemon.com...
> I think it is because if you use modulo arithmetics, the numbers in the
> range from 0 to (uint.max % rangelength) occur 1 times more often that
> the numbers from (uint.max % rangelength + 1) to rangelength-1.
> For example lets suppose uint.max = 255 and rangelength = 100:
>
> 0-55  : There are 3 possible matches (example for 18: 18, 118, 218)
> 56-99 : There are 2 possible matches (example for 58: 58, 158)
>
> This has a very small impact for small ranges since uint.max is very big.

True, however, the floating point version has the same issue.


November 14, 2005
Manfred Nowak wrote:
...
>>>uint r= cast(uint)((rand() / cast(double)uint.max) * 99.0) + 1;
> 
> 
> The value of the subexpression `rand() / cast(double)uint.max'
> equals `1.0' at most once, because if `rand()' is not equal to
> `uint.max' the value of this subexpression is less than `1.0'.
> Therefore `100', the largest integer value form the given range,
> is drawn only once. 
> 
> Continuing your example with `uint.max==255' yields, that the
> remaining 255 cases of 2 respectiveley 3 possible matches occur 42
> respectiveley 57 times, because 57*3+42*2=255. 
> 
> Whereas with the modulo computation one knows which numbers are
> preferred, in the example the numbers from the range [0..55], the
> numbers preferred ar unknown in case of casting to double: it
> depends on the rounding during the cast, which numbers are
> preferred. 
> 
> Is it even possible, that the rounding errors accumulate and that
> some numbers will even occur 4 times, whereas others only occur
> once? Who can prove, that this will not happen?
> 
> In total: under the assumption, that `uint.max' is big enough the
> probability, that the `100' is drawn at random in the
> casting-to-double-suggestion is near to zero and the remaining
> numbers are drawn with nearly the same probability, which
> underlies a variation. Whereas with the modulo-suggestion all
> numbers are  drawn with nearly the same probability with even less
> variation. 
> 
> Quite a difference, isn't it?
> 
> -manfred
No, there are no numbers who "occur" 4 times, since that would imply that *all* target numbers except the last one (0-98) would "occur" 3 times at least, meaning that the original random numbers would have to be at least 3*99 + 1 (near 300 basically, not just 255).

This stuff reminds me of line to pixel rasterization algorithms :P

-- 
Bruno Medeiros - CS/E student
"Certain aspects of D are a pathway to many abilities some consider to be... unnatural."
November 14, 2005
In article <dl8f8c$cfj$1@digitaldaemon.com>, Walter Bright says...
>
>
>"Niko Korhonen" <niktheblak@hotmail.com> wrote in message news:dl1tpv$aam$1@digitaldaemon.com...
>> import std.random;
>>
>> void main()
>> {
>>    // Range 1-100
>>    uint r = cast(uint)((rand() / cast(double)uint.max) * 99.0) + 1;
>> }
>>
>> Modulo arithmetics would be simpler (uint r = rand() % 99 + 1), but the
>> randomness aspect might suffer from that.
>
>Using the integer arithmetic method is better (i.e. faster and simpler) and it does not suffer from randomness problems.

I think Niko may be referring to the fact that some poor implementations of rand() suffer from having a lower degree of "randomness" in the less significant bits than in the more significant bits.

Though, if you need a random number generator that is uniform and gives a high degree of "randomness", rand() is a very poor choice. Other generators, such as a Mersenne Twister, will perform better.

/Oskar


November 14, 2005
Walter Bright wrote:
> "Bruno Medeiros" <daiphoenixNO@SPAMlycos.com> wrote in message
> news:dl5306$30ec$1@digitaldaemon.com...
> 
>>I think it is because if you use modulo arithmetics, the numbers in the
>>range from 0 to (uint.max % rangelength) occur 1 times more often that
>>the numbers from (uint.max % rangelength + 1) to rangelength-1.
>>For example lets suppose uint.max = 255 and rangelength = 100:
>>
>>0-55  : There are 3 possible matches (example for 18: 18, 118, 218)
>>56-99 : There are 2 possible matches (example for 58: 58, 158)
>>
>>This has a very small impact for small ranges since uint.max is very big.
> 
> 
> True, however, the floating point version has the same issue.
> 
> 
Hum, didn't tought of that initially. However, the float/ratio version at least distributes the offness across the range.

-- 
Bruno Medeiros - CS/E student
"Certain aspects of D are a pathway to many abilities some consider to be... unnatural."
November 14, 2005
Oskar Linde wrote:
> In article <dl8f8c$cfj$1@digitaldaemon.com>, Walter Bright says...
> 
>>
>>"Niko Korhonen" <niktheblak@hotmail.com> wrote in message
>>news:dl1tpv$aam$1@digitaldaemon.com...
>>
>>>import std.random;
>>>
>>>void main()
>>>{
>>>   // Range 1-100
>>>   uint r = cast(uint)((rand() / cast(double)uint.max) * 99.0) + 1;
>>>}
>>>
>>>Modulo arithmetics would be simpler (uint r = rand() % 99 + 1), but the
>>>randomness aspect might suffer from that.
>>
>>Using the integer arithmetic method is better (i.e. faster and simpler) and
>>it does not suffer from randomness problems.
> 
> 
> I think Niko may be referring to the fact that some poor implementations of
> rand() suffer from having a lower degree of "randomness" in the less
> significant bits than in the more significant bits.

Yes, although I think those days are long gone.

> Though, if you need a random number generator that is uniform and
> gives a high degree of "randomness", rand() is a very poor choice.
> Other generators, such as a Mersenne Twister, will perform better.
> 
> /Oskar

I find it a little disturbing that the docs for std.random don't give the name of the algorithm which is being used, or any information about its statistical properties.

There's a useful listing of random generators at www.agner.org (though note that most of his code is GPL'ed), and another one in www.boost.org

std.random could be greatly improved upon.
November 14, 2005
Bruno Medeiros wrote:

[...]
> No, there are no numbers who "occur" 4 times, since that would imply that *all* target numbers except the last one (0-98) would "occur" 3 times at least, meaning that the original random numbers would have to be at least 3*99 + 1 (near 300 basically, not just 255).

I do not agree with this try of a proof.

What I meant with my question is, that there might be three consecutive "buckets", that are drawn with the absolute values 232 if and only if there would not be any rounding erros resulting from the divison.

If rounding errors occur, then one draw normally falling into the middle bucket might be actually drawn from the left bucket, resulting in the absolute values 322, which is no problem, but if the normal draw would be out of the right bucket but because of the rounding error is actually drawn from the middle bucket the absolute values are 241.

Do you agree, that in all cases the total sum of draws does not change?

-manfred