dstats alpha 4
dsimcha
December 21, 2008
The latest alpha of my statistics library is out on Scrapple.  I've decided that it needed a catchier name, so I've renamed it to dstats.  Anyhow, it's undergone some pretty drastic upgrades since the last time I announced a release.  We now have the following relatively major stuff that wasn't there before, plus some minor stuff that I forgot to mention.

Much improved docs.

Much better API organization.

A decent information theory module, with entropy, mutual information, joint entropy and conditional mutual information.

Mean, variance, standard deviation, skewness, kurtosis, Pearson correlation
and covariance can now be calculated online as data is received, or from arrays.

Improved memory allocation for scratch space used within functions by using the TempAlloc region allocator.

A combination generator that can generate all (N choose R) combinations from a
sequence.

Equal frequency and equal width binning functions.

Laplace and Cauchy CDFs and inverse CDFs.

Student's one- and two-sample and Welch's T-tests.

The Wald-Wolfowitz runs test for randomness of a sequence.

Random number generation from the normal, Cauchy, Laplace, Bernoulli, binomial, and hypergeometric distributions.

Lots of bug fixes and performance improvements.
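
To illustrate what calculating summary statistics "online as data is received" means, here is a minimal sketch in Python rather than D. It is not the dstats API: the class name OnlineSummary and its put() method are hypothetical, and the update rule shown is Welford's algorithm, which is an assumption about one reasonable way to do this and not necessarily what dstats uses internally.

```python
import math


class OnlineSummary:
    """Running mean and sample variance, updated one observation at a time.

    Hypothetical illustration only; uses Welford's update so the data
    never has to be stored in an array.
    """

    def __init__(self):
        self.n = 0        # number of observations seen so far
        self.mean = 0.0   # running mean
        self._m2 = 0.0    # running sum of squared deviations from the mean

    def put(self, x):
        """Fold one new observation into the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Uses the updated mean, which keeps the recurrence numerically stable.
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Unbiased sample variance; NaN until at least two observations."""
        return self._m2 / (self.n - 1) if self.n > 1 else float("nan")

    @property
    def stdev(self):
        """Sample standard deviation."""
        return math.sqrt(self.variance)


if __name__ == "__main__":
    summary = OnlineSummary()
    for value in [2, 4, 4, 4, 5, 5, 7, 9]:
        summary.put(value)
    print(summary.mean, summary.variance, summary.stdev)
```

Each call to put() does a constant amount of work and keeps only three numbers, so the same object can summarize a stream that is too large to hold in memory, and feeding it an existing array gives the same answer as a two-pass calculation up to rounding.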

Unfortunately, D1 doesn't have enough features for me to create the API I want in dstats, so I've been targeting the bleeding edge compilers.  dstats will only work on DMD 2.022.  Even with 2.021, the TLS GC bug that was fixed in 2.022 will break it.
December 22, 2008
dsimcha wrote:
> The latest alpha of my statistics library is out on Scrapple.  I've decided
> that it needed a catchier name, so I've renamed it to dstats.  Anyhow, it's
> undergone some pretty drastic upgrades since the last time I announced a
> release.  We now have the following relatively major stuff that wasn't there
> before, plus some minor stuff that I forgot to mention.
> 
> Much improved docs.
> 
> Much better API organization.
> 
> A decent information theory module, with entropy, mutual information, joint
> entropy and conditional mutual information.
> 
> Mean, variance, standard deviation, skewness, kurtosis, Pearson correlation
> and covariance can now be calculated online as data is received, or from arrays.
> 
> Improved memory allocation for scratch space used within functions by using
> the TempAlloc region allocator.
> 
> A combination generator that can generate all (N choose R) combinations from a
> sequence.
> 
> Equal frequency and equal width binning functions.
> 
> Laplace and Cauchy CDFs and inverse CDFs
> 
> Student's one- and two-sample and Welch's T-tests.
> 
> The Wald-Wolfowitz runs test for randomness of a sequence.
> 
> Random number generation from the normal, Cauchy, Laplace, Bernoulli,
> binomial, and hypergeometric distributions.
> 
> Lots of bug fixes and performance improvements.
> 
> Unfortunately, D1 doesn't have enough features for me to create the API I want
> in dstats, so I've been targeting the bleeding edge compilers.  dstats will
> only work on DMD 2.022.  Even with 2.021, the TLS GC bug that was fixed in
> 2.022 will break it.

Looks great!
Could you please review the new 'math.random' package in Tango? I hate competition; it would be great if we could take the best of both.
December 22, 2008
== Quote from Don (nospam@nospam.com)'s article
> Looks great!
> Could you please review the new 'math.random' package in Tango? I hate
> competition, would be great if we could take the best of both.

Agreed, but as I mentioned, I've been targeting D2, and that's what I generally use.  I realize dstats duplicates some functionality that is already in Tango, but because Tango doesn't support D2 yet, and because relying on Tango for the functionality that does overlap would create a massive dependency for only a few modules, I think that duplicating a little functionality may be, for now, the lesser of two evils.  If you have any suggestions about how to deal with this, I'd be open to them.