[GSoC] Mir.random.flex - Generic non-uniform random sampling (page 2)

On Monday, 22 August 2016 at 15:34:47 UTC, Seb wrote:
> http://blog.mir.dlang.io/random/2016/08/22/transformed-density-rejection-sampling.html

What are the columns "mu time" and "sigma^2 time" of the benchmark table in the Sampling subsection?

On Tuesday, 23 August 2016 at 11:58:53 UTC, tn wrote:
> On Monday, 22 August 2016 at 15:34:47 UTC, Seb wrote:
>> http://blog.mir.dlang.io/random/2016/08/22/transformed-density-rejection-sampling.html
>
> What are the columns "mu time" and "sigma^2 time" of the benchmark table in the Sampling subsection?

In statistics mu is often used to describe the average, sigma the standard deviation and sigma^2 the variance. It was absolutely unnecessary to use this notation here (especially because standard deviation, not variance was measured). I fixed that and also added how many samples were generated per run (10M), thanks!

Btw, in case you didn't find the link within the article, the benchmark is available within Mir: https://github.com/libmir/mir/blob/master/benchmarks/flex/normal_dist.d
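A small illustration of the units point, in Python rather than D (the real benchmark, normal_dist.d, is D code and is not reproduced here). Given the wall-clock time of each run, the mean and the standard deviation are both in ms, while the variance would be in ms^2, so a column of ms values can only be sigma, not sigma^2. The helper name `summarize_runs` and the timings are hypothetical.

```python
import statistics


def summarize_runs(times_ms):
    """Summarize per-run wall-clock times given in milliseconds.

    The mean and the standard deviation (sigma) have the same unit as the
    input (ms); the variance (sigma^2) is in ms^2.
    """
    mean = statistics.mean(times_ms)          # ms
    stdev = statistics.stdev(times_ms)        # sample standard deviation, ms
    variance = statistics.variance(times_ms)  # sample variance, ms^2
    return mean, stdev, variance


# Hypothetical timings of five benchmark runs (not measured data).
runs = [102.0, 98.0, 101.0, 99.0, 100.0]
mean, stdev, variance = summarize_runs(runs)
# prints: mean = 100.0 ms, sigma = 1.58 ms, sigma^2 = 2.50 ms^2
print(f"mean = {mean:.1f} ms, sigma = {stdev:.2f} ms, sigma^2 = {variance:.2f} ms^2")
```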

On Tuesday, 23 August 2016 at 12:12:30 UTC, Seb wrote:
> [...]
> I fixed that and also added how many samples were generated per run (10M), thanks!

Btw I quickly added the run-time for C++ (for G++ and clang++ with -O3) and it doesn't look that bad:

http://blog.mir.dlang.io/random/2016/08/22/transformed-density-rejection-sampling.html#sampling
https://github.com/libmir/mir/pull/307

Note that for comparison between languages the speed of the random engine plays a major role and that superior performance was never a top priority for this generic method (statistical quality is far more important). However, I will work more on being faster than <random> for common distributions over the next weeks ;-)

On Tuesday, 23 August 2016 at 12:12:30 UTC, Seb wrote:
> On Tuesday, 23 August 2016 at 11:58:53 UTC, tn wrote:
>> On Monday, 22 August 2016 at 15:34:47 UTC, Seb wrote:
>>> http://blog.mir.dlang.io/random/2016/08/22/transformed-density-rejection-sampling.html
>>
>> What are the columns "mu time" and "sigma^2 time" of the benchmark table in the Sampling subsection?
>
> In statistics mu is often used to describe the average, sigma the standard deviation and sigma^2 the variance. It was absolutely unnecessary to use this notation here (especially because standard deviation, not variance was measured). I fixed that and also added how many samples were generated per run (10M), thanks!

Thanks for the clarification and the fix. (I am familiar with the usage of mu and sigma, but somehow I first thought that the columns corresponded to two different measurements. After the initial confusion I realized the correct meaning, but still wasn't sure about it due to the contradiction between the name of the column (sigma^2, not sigma) and the values (ms, not ms^2 or something). So I thought it would be better to ask.)

Another question: You mention that statistical quality is important, but it is not clear if flex has better or worse quality than Box-Muller and Ziggurat in the case of sampling from normal distribution. Or is the difference negligible? (I realize that the real strength of flex is its versatility.)

August 23, 2016

Posted by Seb in reply to tn

On Tuesday, 23 August 2016 at 13:12:28 UTC, tn wrote:
> Another question: You mention that statistical quality is important, but it is not clear if flex has better or worse quality than Box-Muller and Ziggurat in the case of sampling from normal distribution. Or is the difference negligible? (I realize that the real strength of flex is its versatility.)

tl;dr: yes, it's negligible for the normal distribution.

Excellent question. I didn't cover this in much detail because:

(1) it has already been done extensively in the literature. For example, see Table IV (page 32) of "Gaussian Random Number Generators" by Thomas et al. (2007) for chi-squared and high-sigma tests [1] (btw, in all random libraries I looked at, e.g. <random>, the Box-Muller method is used for the normal distribution)
(2) the authors of the Tinflex algorithm wrote their own library, UNU.RAN [2], to prove that the transformed density rejection method is close to the inversion method (without numerical errors, it would be "perfect"). Under the hood, UNU.RAN uses polynomial interpolation of the inverse CDF, which is needed to compute statistical properties automatically. This method is on our roadmap [3].
(3) the hat/squeeze and cumulative histograms of nearly all example distributions at [4] look pretty good [5] (if the cumulative histogram is identical to the CDF curve, the errors are negligible)
(4) this is just a preview release, and automatic testing of user-generated distributions requires [3]
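The kind of check described in (1) and (3) can be tried quickly. The Python sketch below is not Mir code and not the test methodology of Thomas et al.; it draws standard normal variates with the basic Box-Muller transform and by inversion, using the standard library's `statistics.NormalDist.inv_cdf` (Python 3.8+) as a stand-in for an accurately inverted CDF (this is not UNU.RAN's polynomial interpolation). It then reports the largest gap between the empirical CDF (a cumulative histogram with one bin per sample) and the exact CDF, i.e. the Kolmogorov-Smirnov statistic. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def box_muller(rng):
    """One standard normal variate via the basic Box-Muller transform.

    The second (sine) variate of each pair is simply discarded here.
    """
    u1 = 1.0 - rng.random()  # in (0, 1], so log(u1) is finite
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)


def inversion(rng):
    """One standard normal variate by inversion: X = F^-1(U)."""
    u = rng.random()
    while u == 0.0:  # inv_cdf requires 0 < p < 1
        u = rng.random()
    return STD_NORMAL.inv_cdf(u)


def max_cdf_gap(samples, cdf):
    """Largest distance between the empirical CDF of the samples and cdf.

    The empirical CDF jumps from i/n to (i+1)/n at the i-th sorted sample,
    so both sides of every jump are compared with the exact CDF.
    """
    xs = sorted(samples)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        gap = max(gap, abs((i + 1) / n - f), abs(f - i / n))
    return gap


if __name__ == "__main__":
    rng = random.Random(42)
    n = 20_000
    for name, draw in [("Box-Muller", box_muller), ("inversion", inversion)]:
        d = max_cdf_gap([draw(rng) for _ in range(n)], STD_NORMAL.cdf)
        print(f"{name:10s} KS distance = {d:.4f} (n = {n})")
```

For a correct sampler the distance shrinks roughly like 1/sqrt(n), while a systematic bias shows up as a gap that stops shrinking as n grows. Such a test only catches fairly coarse errors; the high-sigma (tail) tests mentioned in (1) need far more samples.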

[1] http://www.doc.ic.ac.uk/~wl/papers/07/csur07dt.pdf
[2] http://statmath.wu.ac.at/unuran/
[3] https://github.com/libmir/mir/issues/46
[4] https://github.com/libmir/mir/tree/master/examples/flex_plot
[5] https://drive.google.com/open?id=0BwdiZp7qSaBhZXRJNHhSN1RHR3c

On Tuesday, 23 August 2016 at 13:01:29 UTC, Seb wrote:
> On Tuesday, 23 August 2016 at 12:12:30 UTC, Seb wrote:
>> [...]
>> I fixed that and also added how many samples were generated per run (10M), thanks!
>
> Btw I quickly added the run-time for C++ (for G++ and clang++ with -O3)

I think you should add compiler/runtimelib versions for all measurements, and all relevant flags.
I'm happy to see you are using LDC for benchmarking; don't forget to mention the LLVM version for LDC's version.
You already wrote `-O3` for the C++ measurements, did you not use -mcpu=native or some other performance changing flags?

Cheers,
  Johan

On Tuesday, 23 August 2016 at 16:54:18 UTC, Johan Engelen wrote:
> On Tuesday, 23 August 2016 at 13:01:29 UTC, Seb wrote:
>> On Tuesday, 23 August 2016 at 12:12:30 UTC, Seb wrote:
>>> [...]
>>> I fixed that and also added how many samples were generated per run (10M), thanks!
>>
>> Btw I quickly added the run-time for C++ (for G++ and clang++ with -O3)
>
> I think you should add compiler/runtimelib versions for all measurements, and all relevant flags.
> I'm happy to see you are using LDC for benchmarking;

What else? ;-)

> don't forget to mention the LLVM version for LDC's version.

Oh, that's in the header of the benchmark script:

https://github.com/libmir/mir/blob/master/benchmarks/flex/normal_dist.d

I excluded it from the post to avoid visual noise, as I thought that those who are interested will check it anyhow.

> You already wrote `-O3` for the C++ measurements,

Now that I have added the C++ flags directly to the script, this has probably become very confusing. Sorry, I will replace the C++ flags in the post with a link to the benchmark soon.

> did you not use -mcpu=native or some other performance changing flags?

See the benchmark file above.

Seb, this is awesome!

On Monday, 22 August 2016 at 15:34:47 UTC, Seb wrote:
> http://blog.mir.dlang.io/random/2016/08/19/intro-to-random-sampling.html

Code samples with too-long lines do not render properly in my browsers (current Chrome on Linux and a pretty outdated Firefox on Windows :) ).

Taking this example:

    S sample(S, RNG, Pdf, Hat, HatInv, Squeeze)(ref RNG gen, Pdf pdf, Hat hat,
                                                HatInv hatInvCDF, Squeeze sq)
    {

Then the second line of parameters does not render at all.

Will definitely play around with your code!
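To experiment with the idea behind the quoted signature before digging into the D code, here is a Python sketch of rejection sampling with inversion of the hat plus a squeeze, with parameters named after the D signature (gen, pdf, hat, hatInvCDF, sq). It is not the blog post's implementation. The half-normal target, the hat exp(1/2 - x) and the squeeze max(0, 1 - x^2/2) are assumptions chosen because the bounds are easy to verify: (x - 1)^2 >= 0 gives the hat, and exp(-t) >= 1 - t gives the squeeze. All function names except `sample` are hypothetical.

```python
import math
import random


def sample(gen, pdf, hat, hat_inv_cdf, sq):
    """Rejection sampling with inversion of the hat and a squeeze.

    Assumes sq(x) <= pdf(x) <= hat(x) everywhere, and that hat_inv_cdf maps a
    uniform number in [0, 1) to a draw from the density proportional to hat.
    """
    while True:
        x = hat_inv_cdf(gen.random())  # propose x from the hat by inversion
        y = gen.random() * hat(x)      # uniform height under the hat at x
        if y <= sq(x):                 # cheap acceptance below the squeeze
            return x
        if y <= pdf(x):                # exact test, only needed above the squeeze
            return x
        # y lies between pdf(x) and hat(x): reject and retry


# Hypothetical example target: unnormalized half-normal density on [0, inf).
def half_normal_pdf(x):
    return math.exp(-0.5 * x * x)


def exp_hat(x):
    # exp(-x^2/2) <= exp(1/2 - x) because (x - 1)^2 >= 0
    return math.exp(0.5 - x)


def exp_hat_inv_cdf(u):
    # The hat is proportional to an Exp(1) density, whose CDF is 1 - exp(-x).
    return -math.log(1.0 - u)


def quadratic_squeeze(x):
    # exp(-t) >= 1 - t, clipped at 0 so the squeeze is never negative
    return max(0.0, 1.0 - 0.5 * x * x)


if __name__ == "__main__":
    gen = random.Random(2016)
    xs = [sample(gen, half_normal_pdf, exp_hat, exp_hat_inv_cdf, quadratic_squeeze)
          for _ in range(20_000)]
    print(sum(xs) / len(xs), "vs. exact mean", math.sqrt(2.0 / math.pi))
```

With this hat about 76% of proposals are accepted (the ratio of the areas under pdf and hat, sqrt(pi/2) / e^(1/2)), and roughly three quarters of the accepted points fall below the squeeze, so pdf is never evaluated for them. That saving is the point of a squeeze when pdf is expensive.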

On Tuesday, 23 August 2016 at 22:48:31 UTC, Stefan wrote:
> Seb, this is awesome!

Thanks :)

> On Monday, 22 August 2016 at 15:34:47 UTC, Seb wrote:
>> http://blog.mir.dlang.io/random/2016/08/19/intro-to-random-sampling.html
>
> Code samples with too-long lines do not render properly in my browsers (current Chrome on Linux and a pretty outdated Firefox on Windows :) ).
>
> Taking this example:
>
>     S sample(S, RNG, Pdf, Hat, HatInv, Squeeze)(ref RNG gen, Pdf pdf, Hat hat,
>                                                 HatInv hatInvCDF, Squeeze sq)
>     {
>
> Then the second line of parameters does not render at all.

It was a weird whitespace issue. Thanks for letting me know! I fixed it, and it now looks normal in the "latest, greatest" Chromium, FF and Chrome on Android:

http://blog.mir.dlang.io/random/2016/08/19/intro-to-random-sampling.html#rejection-with-inversion
http://blog.mir.dlang.io/random/2016/08/19/intro-to-random-sampling.html#squeeze-functions

> Will definitely play around with your code!

Don't hesitate to ping me / us with any questions -> https://github.com/libmir/mir/issues
