January 23, 2017
Command line tool for weighted reservoir sampling

I released a new tool for weighted random sampling of tabular data files: tsv-sample. It's one of several tools recently added to the TSV file toolkit I released last year. These tools are especially useful when data files are too large to read comfortably into memory in R and similar apps.

I'll publish an announcement of the broader set of tool updates in the next few weeks. I have some performance benchmarks to finish first. However, weighted reservoir sampling algorithms are interesting, and I thought there might be enough interest to warrant a separate announcement.

Repo: https://github.com/eBay/tsv-utils-dlang
tsv-sample code: https://github.com/eBay/tsv-utils-dlang/blob/master/tsv-sample/src/tsv-sample.d

--Jon
Copyright © 1999-2021 by the D Language Foundation