Thread overview
Vitter's algorithm for random sampling
1 day ago
Alex
1 day ago
https://getkerf.wordpress.com/2016/03/30/the-best-algorithm-no-one-knows-about/

Interesting. I think it has a place in std.random, and also it might help improve some of the existing stuff in there.
1 day ago
On Friday, 13 September 2019 at 21:49:45 UTC, Andrei Alexandrescu wrote:
> https://getkerf.wordpress.com/2016/03/30/the-best-algorithm-no-one-knows-about/
>
> Interesting. I think it has a place in std.random, and also it might help improve some of the existing stuff in there.

tsv-sample in eBay's tsv utilities contains a number of sampling implementations, including several classic algorithms. They use std.random facilities wherever possible. It does not contain an implementation of Vitter's algorithm D, because algorithm D requires knowing the record set size ahead of time. However one of the algorithms, bernoulli skip sampling, does use the "skip" mechanism listed in section 2, para 2 of Vitter's paper.

The code documentation contains references for the different algorithms used. Like all the tsv-utils stuff, they are fast. No published benchmarks though.

Links:
* User documentation: https://github.com/eBay/tsv-utils/blob/master/docs/ToolReference.md#tsv-sample-reference
* Source code: https://github.com/eBay/tsv-utils/blob/master/tsv-sample/src/tsv_utils/tsv-sample.d
* Code documentation (via Adam's doc generator): https://tsv-utils.dpldocs.info/tsv_utils.tsv_sample.html

--Jon
1 day ago
On Friday, 13 September 2019 at 21:49:45 UTC, Andrei Alexandrescu wrote:
> https://getkerf.wordpress.com/2016/03/30/the-best-algorithm-no-one-knows-about/
>
> Interesting. I think it has a place in std.random, and also it might help improve some of the existing stuff in there.

As far as I know, there is one in mir.
http://mir.dlang.io/mir_random_algorithm.html#.VitterStrides
23 hours ago
On Friday, 13 September 2019 at 21:49:45 UTC, Andrei Alexandrescu wrote:
> https://getkerf.wordpress.com/2016/03/30/the-best-algorithm-no-one-knows-about/
>
> Interesting. I think it has a place in std.random, and also it might help improve some of the existing stuff in there.

Er .... isn't that exactly what IS in std.random? I ask because I was the one who implemented Vitter's algorithm there, way back in 2012 :-)