[GSoC] Mir.random.flex - Generic non-uniform random sampling
August 22, 2016
Hey all,

I am proud to publish a report of my GSoC work as two extensive blog posts, which explain non-uniform random sampling and the mir.random.flex package (part of Mir > 0.16-beta2):

http://blog.mir.dlang.io/random/2016/08/19/intro-to-random-sampling.html
http://blog.mir.dlang.io/random/2016/08/22/transformed-density-rejection-sampling.html


Before I start my personal retrospective, I want to use this opportunity to give a huge thanks and acknowledgement to my two awesome mentors, Ilya Yaroshenko (9il) and Joseph Wakeling (WebDrake).

As I wrote my first line of D code only this February, I have learned quite a lot during the last few months. GitHub can list all merged contributions, which might show that I have become quite familiar with dlang over time:

https://github.com/search?l=&o=desc&q=author%3Awilzbach+is%3Amerged+user%3Adlang&ref=advsearch&s=comments&type=Issues&utf8=%E2%9C%93

… and with other D repositories:

https://github.com/search?l=D&o=desc&q=author%3Awilzbach+is%3Amerged&ref=searchresults&s=comments&type=Issues&utf8=%E2%9C%93

I am pretty sure you know me by now from the NG, GitHub, IRC, Twitter, Bugzilla, DConf16, the DWiki, #d on StackOverflow or /r/d_language, so I will skip a longer introduction ;-)

Over the next weeks and months I will continue my work on mir.random, which is supposed to supersede std.random, so in case you aren’t following the Mir project [1, 2], stay tuned!

Best regards,

Seb

[1] https://github.com/libmir/mir
[2] https://twitter.com/libmir
August 22, 2016
On Monday, 22 August 2016 at 15:34:47 UTC, Seb wrote:
> Hey all,
>
> I am proud to publish a report of my GSoC work as two extensive blog posts, which explain non-uniform random sampling and the mir.random.flex package (part of Mir > 0.16-beta2):
>
> http://blog.mir.dlang.io/random/2016/08/19/intro-to-random-sampling.html
> http://blog.mir.dlang.io/random/2016/08/22/transformed-density-rejection-sampling.html
>
>

Thanks for the well-done blog posts, especially the first one.

Does your implementation make any use of CTFE?
August 22, 2016
On Monday, 22 August 2016 at 15:34:47 UTC, Seb wrote:
> Hey all,
>
> I am proud to publish a report of my GSoC work as two extensive blog posts, which explain non-uniform random sampling and the mir.random.flex package (part of Mir > 0.16-beta2):
>
> http://blog.mir.dlang.io/random/2016/08/19/intro-to-random-sampling.html
> http://blog.mir.dlang.io/random/2016/08/22/transformed-density-rejection-sampling.html

It's really nice to see that GSoC has been such a huge success so far. Everyone has done some really great work.


> Over the next weeks and months I will continue my work on mir.random, which is supposed to supersede std.random, so in case you aren’t following the Mir project [1, 2], stay tuned!
>
> Best regards,
>
> Seb
>
> [1] https://github.com/libmir/mir
> [2] https://twitter.com/libmir

I'm curious, have you come up with a solution to what is probably the biggest problem with std.random, i.e. its use of value types and copying? I remember a lot of discussion about this and it seemed at the time that the only really solid solution was to make all random generators classes, though I think DIP1000 *may* help here.
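
A minimal sketch of the pitfall, for anyone who hasn't run into it (take5 is just a made-up helper, not anything from std.random or mir):

import std.random : Mt19937, uniform;
import std.stdio : writeln;

// The generator is taken *by value*, so its state is copied and the
// caller's generator is never advanced.
int[] take5(Mt19937 gen)
{
    int[] result;
    foreach (_; 0 .. 5)
        result ~= uniform(0, 100, gen);
    return result;
}

void main()
{
    auto gen = Mt19937(42);
    writeln(take5(gen));
    writeln(take5(gen)); // prints the exact same five numbers again
}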
August 23, 2016
On Monday, 22 August 2016 at 15:34:47 UTC, Seb wrote:
> Hey all,
>
> I am proud to publish a report of my GSoC work as two extensive blog posts, which explain non-uniform random sampling and the mir.random.flex package (part of Mir > 0.16-beta2):
>
> http://blog.mir.dlang.io/random/2016/08/19/intro-to-random-sampling.html
> http://blog.mir.dlang.io/random/2016/08/22/transformed-density-rejection-sampling.html

Reddit: https://www.reddit.com/r/programming/comments/4z4sp7/an_introduction_to_nonuniform_random_sampling/
August 23, 2016
On Monday, 22 August 2016 at 18:09:28 UTC, Meta wrote:
> On Monday, 22 August 2016 at 15:34:47 UTC, Seb wrote:
>> Hey all,
>>
>> I am proud to publish a report of my GSoC work as two extensive blog posts, which explain non-uniform random sampling and the mir.random.flex package (part of Mir > 0.16-beta2):
>>
>> http://blog.mir.dlang.io/random/2016/08/19/intro-to-random-sampling.html
>> http://blog.mir.dlang.io/random/2016/08/22/transformed-density-rejection-sampling.html
>
> It's really nice to see that GSoC has been such a huge success so far. Everyone has done some really great work.
>
>
>> Over the next weeks and months I will continue my work on mir.random, which is supposed to supersede std.random, so in case you aren’t following the Mir project [1, 2], stay tuned!
>>
>> Best regards,
>>
>> Seb
>>
>> [1] https://github.com/libmir/mir
>> [2] https://twitter.com/libmir
>
> I'm curious, have you come up with a solution to what is probably the biggest problem with std.random, i.e. its use of value types and copying? I remember a lot of discussion about this and it seemed at the time that the only really solid solution was to make all random generators classes, though I think DIP1000 *may* help here.

This is an API problem, and it will not be fixed. Making D a scripting-like language is bad for science. For example, druntime (Fibers and Mutexes) is useless because it is too high-level and poorly featured at the same time.

The main problem with std.random is that std.random.uniform is broken in the context of non-uniform sampling. The same goes for 99% of uniform algorithms: they just ignore the fact that, for example, for [0, 1) the exponent and mantissa should be generated separately, with appropriate probabilities for the exponent.
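
A rough sketch of that idea, just the principle and not mir's implementation: the exponent is drawn with fair coin flips and the mantissa is filled with uniform bits, so small values keep full precision instead of being multiples of a fixed 2^-k.

import std.random : Mt19937, uniform;

double uniform01Dense(ref Mt19937 gen)
{
    double scale = 0.5;                              // start in [0.5, 1)
    while (uniform!"[]"(0, 1, gen) == 0 && scale > double.min_normal)
        scale *= 0.5;                                // each halving has probability 1/2
    // 52 uniform bits behind the implicit leading 1 give a mantissa in [1, 2)
    double mantissa = 1.0 + (uniform!ulong(gen) >>> 12) * (1.0 / (1UL << 52));
    return scale * mantissa;                         // uniform on (0, 1)
}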
August 23, 2016
On Monday, 22 August 2016 at 15:34:47 UTC, Seb wrote:
> I am proud to publish a report of my GSoC work as two extensive blog posts, which explain non-uniform random sampling and the mir.random.flex package (part of Mir > 0.16-beta2):

Fantastic work!
August 23, 2016
On Monday, 22 August 2016 at 15:34:47 UTC, Seb wrote:
> http://blog.mir.dlang.io/random/2016/08/19/intro-to-random-sampling.html
> http://blog.mir.dlang.io/random/2016/08/22/transformed-density-rejection-sampling.html

Found a typo:

Search for "performance boost performance boost"
August 23, 2016
On Tuesday, 23 August 2016 at 05:40:24 UTC, Ilya Yaroshenko wrote:
> This is an API problem, and it will not be fixed. Making D a scripting-like language is bad for science. For example, druntime (Fibers and Mutexes) is useless because it is too high-level and poorly featured at the same time.

Yes, this is not an issue that is immediately fixable without introducing other issues (e.g. defining everything as a class brings immediate issues related to heap allocation).

In the long run it would obviously be nice to address that issue, but it would have been a major blocker to throw that onto Seb's shoulders (as we all recognized quite quickly when we started discussing it).  It was (rightly) not the focus of this project.

For this reason, the random distributions introduced in mir are implemented as functors (as is the case with random distributions in C++11 <random>) rather than as ranges.
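
Roughly, the functor style looks something like this (a hypothetical sketch for illustration, not the actual mir.random.flex API):

import std.random : Mt19937, uniform;
import std.math : log;

// The distribution is a callable struct; the generator is passed by
// reference on every call, so no RNG state is ever copied.
struct Exponential
{
    double lambda;

    double opCall(G)(ref G gen)
    {
        return -log(1.0 - uniform(0.0, 1.0, gen)) / lambda;
    }
}

void main()
{
    auto gen = Mt19937(42);
    auto dist = Exponential(2.0);
    auto x = dist(gen); // draw one sample; gen is advanced in place
}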

> The main problem with std.random is that std.random.uniform is broken in the context of non-uniform sampling. The same goes for 99% of uniform algorithms: they just ignore the fact that, for example, for [0, 1) the exponent and mantissa should be generated separately, with appropriate probabilities for the exponent.

Just as a point of terminology: we should make clear here that this is about sampling from a non-uniform distribution.  It shouldn't be confused with "sampling" in the sense of what is done by (say) `RandomSample`.
August 23, 2016
On Monday, 22 August 2016 at 17:13:10 UTC, jmh530 wrote:
> On Monday, 22 August 2016 at 15:34:47 UTC, Seb wrote:
>> Hey all,
>>
>> I am proud to publish a report of my GSoC work as two extensive blog posts, which explain non-uniform random sampling and the mir.random.flex package (part of Mir > 0.16-beta2):
>>
>> http://blog.mir.dlang.io/random/2016/08/19/intro-to-random-sampling.html
>> http://blog.mir.dlang.io/random/2016/08/22/transformed-density-rejection-sampling.html
>>
>>
>
> Thanks for the well-done blog posts, especially the first one.

I am glad to hear this!

> Does your implementation make any use of CTFE?

If you are referring to whether the intervals can be calculated at CT, unfortunately they can't be, for four main reasons:

- FP math at CT (it's already hard to deal with at RT, see e.g. my recent complaint [1]) - the problem is that the Flex algorithm is very sensitive to numerical errors, so an erroneous change at the lowest end (e.g. 10^-15) can lead to totally different numbers with a seeded random engine
- std.container due to pointers (I doubt this can/will be fixed in the near future)
- std.math due to inline assembly and other tricks (this can be fixed and I will submit a couple of PRs soon)
- speed of the CTFE engine (see e.g. [2] for std.regex)

That being said, CTFE is of course used to compute mixins and constants and to specialize functions. Moreover, thanks to all the speed-ups described in the second blog post, constructing the intervals takes about 0.1 ms, so for the majority of users it shouldn't even be noticeable, and the tiny minority for whom it is can still manually inline the intervals.
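
For example, a generic way to pay that one-time cost at program start instead of on first use is a module constructor; buildIntervals and Interval below are just placeholder names, not the flex API:

module myapp.sampler;

// Placeholder types and functions, not the actual mir.random.flex names.
struct Interval { double left, right; }

Interval[] intervals; // constructed once, reused for every sample

static this() // runs once per thread at startup
{
    intervals = buildIntervals(); // the ~0.1 ms runtime setup
}

Interval[] buildIntervals()
{
    // stands in for the Flex preprocessing step
    return [Interval(0.0, 1.0), Interval(1.0, 2.0)];
}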

[1] http://forum.dlang.org/post/hjaiavlfkoamenidomsa@forum.dlang.org
[2] http://forum.dlang.org/post/iqcrnokalollrejcabad@forum.dlang.org
August 23, 2016
On Tuesday, 23 August 2016 at 08:10:50 UTC, Nordlöw wrote:
> On Monday, 22 August 2016 at 15:34:47 UTC, Seb wrote:
>> http://blog.mir.dlang.io/random/2016/08/19/intro-to-random-sampling.html
>> http://blog.mir.dlang.io/random/2016/08/22/transformed-density-rejection-sampling.html
>
> Found a typo:
>
> Search for "performance boost performance boost"

Thanks! Fixed.

Btw, in case someone is interested, the blog posts are written in GitHub-flavored Markdown with a couple of custom Jekyll extensions (e.g. the math formulas are rendered on the server with KaTeX [1]):

https://github.com/libmir/blog/blob/master/_posts/2016-08-19-transformed-density-rejection-sampling.md


[1] https://khan.github.io/KaTeX/