Jump to page: 1 2
Thread overview
[Issue 10550] New: Xorshift32 and Xorshift160 do not generate uniformly-distributed random numbers
Jul 08, 2013
Masahiro Nakagawa
Jul 08, 2013
Masahiro Nakagawa
July 05, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=10550

           Summary: Xorshift32 and Xorshift160 do not generate
                    uniformly-distributed random numbers
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody@puremagic.com
        ReportedBy: joseph.wakeling@webdrake.net


--- Comment #0 from Joseph Rushton Wakeling <joseph.wakeling@webdrake.net> 2013-07-05 07:11:03 PDT ---
The output of Xorshift32 and Xorshift160 departs strongly from uniformity.

The issue for Xorshift32 can be seen clearly even in this very simple piece of code:

    auto rng = Xorshift32(unpredictableSeed);

    foreach(r; rng.take(20))
    {
        writeln(r, "\t", rng.min, "\t", rng.max);
    }

... where we can see that the generated values are always much closer to rng.min than rng.max.

A more sophisticated example (but still very simple) is provided in the test of uniformity in https://github.com/WebDrake/std.random.test -- specifically, in: https://github.com/WebDrake/std.random.test/blob/master/uniform.d https://github.com/WebDrake/std.random.test/blob/master/test/stats.d

... which generates millions of random numbers using uniform(0.0, 0.1, rng), divides them up into bins of width 0.05, and outputs the histogram and cumulative distribution values.

For Xorshift32, this gives:

Generating 10000000 random numbers in [0, 1) with XorshiftEngine!(uint, 32, 13,
17, 5)
    5    1    1
    10    0    1
    15    0    1
    20    0    1
    25    0    1
    30    0    1
    35    0    1
    40    0    1
    45    0    1
    50    0    1
    55    0    1
    60    0    1
    65    0    1
    70    0    1
    75    0    1
    80    0    1
    85    0    1
    90    0    1
    95    0    1
    100    0    1

... while for Xorshift160 it gives:

Generating 10000000 random numbers in [0, 1) with XorshiftEngine!(uint, 160, 2,
1, 4)
    5    0.0535713    0.0535713
    10    0.0208334    0.0744047
    15    0.0446429    0.119048
    20    0.047619    0.166667
    25    0.0714286    0.238095
    30    0.0238096    0.261905
    35    0.0208333    0.282738
    40    0.0416667    0.324405
    45    0.0595238    0.383929
    50    0.0446429    0.428572
    55    0.0476191    0.476191
    60    0.0595239    0.535714
    65    0.0714285    0.607143
    70    0.0535714    0.660714
    75    0.0535713    0.714286
    80    0.0744045    0.78869
    85    0.0744049    0.863095
    90    0.0357143    0.898809
    95    0.0238095    0.922619
    100    0.0773811    1

Other RNGs, including other Xorshift types, appear to generate correct proportions.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
July 05, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=10550



--- Comment #1 from Joseph Rushton Wakeling <joseph.wakeling@webdrake.net> 2013-07-05 07:35:07 PDT ---
The Xorshift32 non-uniformity can be fixed by correcting the update rule in
popFront(), from:

        static if (bits == 32)
        {
            temp      = seeds_[0] ^ (seeds_[0] << a);
            temp      = temp >> b;
            seeds_[0] = temp ^ (temp << c);
        }

to:

        static if (bits == 32)
        {
            temp      = seeds_[0] ^ (seeds_[0] << a);
            temp      = temp ^ (temp >> b);
            seeds_[0] = temp ^ (temp << c);
        }

See p.3 of http://www.jstatsoft.org/v08/i14/paper -- the current implementation appears to be a typo when copying the first from the list of possible update rules.

However, if this change is made, the Xorshift unittests fail for the checks against the reference edition:

    auto checking = [
        [2463534242UL, 267649, 551450, 53765, 108832, 215250, 435468, 860211,
660133, 263375],
        [362436069UL, 2113136921, 19051112, 3010520417, 951284840, 1213972223,
3173832558, 2611145638, 2515869689, 2245824891],
        [521288629UL, 1950277231, 185954712, 1582725458, 3580567609,
2303633688, 2394948066, 4108622809, 1116800180, 3357585673],
        [88675123UL, 3701687786, 458299110, 2500872618, 3633119408, 516391518,
2377269574, 2599949379, 717229868, 137866584],
        [5783321UL, 93724048, 491642011, 136638118, 246438988, 238186808,
140181925, 533680092, 285770921, 462053907],
        [0UL, 246875399, 3690007200, 1264581005, 3906711041, 1866187943,
2481925219, 2464530826, 1604040631, 3653403911]
    ];

    alias TypeTuple!(Xorshift32, Xorshift64, Xorshift96, Xorshift128,
Xorshift160, Xorshift192) XorshiftTypes;

    foreach (I, Type; XorshiftTypes)
    {
        Type rnd;

        foreach (e; checking[I])
        {
            assert(rnd.front == e);
            rnd.popFront();
        }
    }

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
July 05, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=10550



--- Comment #2 from Joseph Rushton Wakeling <joseph.wakeling@webdrake.net> 2013-07-05 07:35:38 PDT ---
I also think the choice of a, b, c values may be in error: currently we have

    alias XorshiftEngine!(uint, 32,  13, 17, 5)  Xorshift32;

... but I think this is most likely a typo for

    alias XorshiftEngine!(uint, 32,  13, 17, 15)  Xorshift32;

as the paper states that there should be a < c and the triple 13, 17, 5 is not found among the list of valid triples (while 13, 17, 15 is).

However, correcting this does not prevent the unittest fail described in the previous comment.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
July 05, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=10550



--- Comment #3 from Joseph Rushton Wakeling <joseph.wakeling@webdrake.net> 2013-07-05 07:38:56 PDT ---
(In reply to comment #2)
> I also think the choice of a, b, c values may be in error: currently we have
> 
>     alias XorshiftEngine!(uint, 32,  13, 17, 5)  Xorshift32;

This seems to stem from a descriptive passage at the top of p.4 of the paper, where this choice of a, b, c values is described as one of the author's favourites.  However, I suspect that this is a typo as it violates the a < c rule described elsewhere in the paper and (as already stated) the 13, 17, 5 triple is not found in the table of appropriate triples for 32-bit Xorshift.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
July 05, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=10550



--- Comment #4 from Joseph Rushton Wakeling <joseph.wakeling@webdrake.net> 2013-07-05 08:12:24 PDT ---
Uniformity in Xorshift160 can be restored by tweaking the update rules:

        else static if (bits == 160)
        {
            temp      = seeds_[0] ^ (seeds_[0] >> a);
            seeds_[0] = seeds_[1];
            seeds_[1] = seeds_[2];
            seeds_[2] = seeds_[3];
            seeds_[3] = seeds_[4];
            seeds_[4] = seeds_[4] ^ (seeds_[4] >> c) ^ temp ^ (temp >> b);
        }

... which faithfully reproduce what is given at the bottom of p.4 of the paper, and changing the first line to:

            temp      = seeds_[0] ^ (seeds_[0] << a);

Note that this change was pure guesswork on the grounds that other bit-values of the algorithm had this alternative formulation.  It also results in a failure of the unittest on line 1032 of std.random.

Unfortunately George Marsaglia has died, so we can't ask him about typos in his papers. :-(  I was not able to find any erratum to the published article, but the discrepancies already identified make me suspect that it must be in error in several places.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
July 08, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=10550


bearophile_hugs@eml.cc changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bearophile_hugs@eml.cc


--- Comment #5 from bearophile_hugs@eml.cc 2013-07-08 01:46:42 PDT ---
Maybe this bug should have a "major" importance. And maybe a warning note in site ddocs should be added in the meantime.

These tests can help: http://en.wikipedia.org/wiki/Diehard_tests


>Unfortunately George Marsaglia has died, so we can't ask him about typos in his
papers.<

Some people should live longer. Probably some of his collaborators or people that have used his generators have some errata list or some suggestions to help.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
July 08, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=10550


Joseph Rushton Wakeling <joseph.wakeling@webdrake.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |major


--- Comment #6 from Joseph Rushton Wakeling <joseph.wakeling@webdrake.net> 2013-07-08 02:46:40 PDT ---
(In reply to comment #5)
> Maybe this bug should have a "major" importance. And maybe a warning note in site ddocs should be added in the meantime.

Agree with the "major" importance, tweaked accordingly.

> These tests can help: http://en.wikipedia.org/wiki/Diehard_tests

These are now a little out of date, but there is the "dieharder" suite which is actually available as a utility in many Linux distros.  I plan on incorporating that into my test suite -- I think it should be as simple as just getting D to generate random variates using whatever method and then piping them through to dieharder.

> Some people should live longer. Probably some of his collaborators or people that have used his generators have some errata list or some suggestions to help.

I did not find anything yet, but I'll keep looking.  I assumed that his homepage would still exist with remarks like this on it, but didn't track down anything useful so far.

There are other people who've written follow-up papers on Xorshift who could be worth contacting.  I may do that if I can't find obvious documentary material.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
July 08, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=10550



--- Comment #7 from Joseph Rushton Wakeling <joseph.wakeling@webdrake.net> 2013-07-08 05:58:49 PDT ---
What I'd really like is to have a source for the checking values for Xorshift used in the unittests.  Masahiro, do you recall how you obtained these values? They're not in Marsaglia's paper, and Google searches to track down their source are proving fruitless. :-(

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
July 08, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=10550



--- Comment #8 from Masahiro Nakagawa <repeatedly@gmail.com> 2013-07-08 07:28:26 PDT ---
(In reply to comment #7)
> What I'd really like is to have a source for the checking values for Xorshift used in the unittests.  Masahiro, do you recall how you obtained these values? They're not in Marsaglia's paper, and Google searches to track down their source are proving fruitless. :-(

I remember correctly, I generated test cases from paper based C implementation.
I implemented C version first and generated test cases for D unittest.
After that, I implemented XorshiftEngine for D-ish code.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
July 08, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=10550



--- Comment #9 from Joseph Rushton Wakeling <joseph.wakeling@webdrake.net> 2013-07-08 07:51:04 PDT ---
(In reply to comment #8)
> I remember correctly, I generated test cases from paper based C implementation.
> I implemented C version first and generated test cases for D unittest.
> After that, I implemented XorshiftEngine for D-ish code.

Ahhh, OK.  Then I think we can reasonably assume that these test sequences, like the code in std.random, reflect typos in Marsaglia's paper.

For Xorshift32 we can see that Marsaglia's C implementation at the very top of p.32 contains at least one typo -- the second bitshift y=(y>>17) should be y^=(y>>17).  I am strongly suspicious that there is a second typo for the value of c, which should be 15 rather than 5.

It's definitely not the only such typo in the paper -- Panneton and L'Ecuyer
(2005) note that the (a, b, c) triple (9, 5, 1) should be (9, 5, 14).

In any case, we lose nothing by using the triple (13, 17, 15) since it's acknowledged in the paper as a valid choice.

For Xorshift160 I think that the code given has a typo for the bitshift with respect to a, as in all other such code examples the bitshift is in the opposite direction to that for b and c.  In general, the bitshifts seem to follow a rule of two in one direction, one in the other.

In summary, I think we can proceed as follows:

   * confirm with experts in the field the typos in the paper

   * generate new checking sequences with corrected versions of the C code

   * correct the D code and unittests accordingly

I am actually inclined to jump ahead with getting the patches to Phobos done, because I'm pretty confident my analysis here is correct :-)

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
« First   ‹ Prev
1 2