October 23, 2008 Statistics library | ||||
---|---|---|---|---|
| ||||
Since there's really no good comprehensive statistics library for D (Tango has a little bit, the beginnings of a few are on dsource, but nothing much), I've been rolling my own statistics functions as necessary. Almost by accident, it seems like I've built up the beginnings of a decent statistics library. I'm debating whether it might be interesting enough to people to be worth releasing, and whether enough community help would be available to really make it production quality, or to merge it with other people's efforts in this area. The following functionality is currently available:

Correlation (Pearson, Spearman rho, Kendall tau). Note that the Kendall tau correlation is a very efficient O(N log N) version.

Mean, standard deviation, variance, kurtosis, percent variance for arrays of numeric values.

Shannon entropy, mutual information.

Kolmogorov-Smirnov tests.

Binomial, hypergeometric, normal, Poisson, and Kolmogorov CDFs; hypergeometric, Poisson, and binomial PDFs.

Inverse normal distribution, and normally distributed random number generation.

A struct to generate all possible permutations of a sequence.

On the other hand, I'm a scientist, not a full-time programmer, and although I can write working code, I have no clue what it takes to get code up to the gold standard of "production." Also, this library is very D2-dependent, and I have no interest in back-porting it. Of course, if by some chance someone else wanted to back-port it, they'd be more than welcome.

Most of the code is covered one way or another by unit tests, although I cheated a lot by having some unit tests depend on multiple functions.

Is there any interest in this from others in the D community? Do other people think that D would benefit from having a decent statistics library? Other comments?
|
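For readers wondering how Kendall tau can be computed in O(N log N) rather than by comparing all O(N^2) pairs: a common approach is to sort the pairs by one variable and then count inversions in the other variable with a merge sort. The following is a minimal Python sketch of that idea only. It is not the code from the library described in the post, the function names are invented for illustration, and it assumes there are no tied values in either input (a real implementation needs extra tie handling).

```python
def count_inversions(values):
    """Return (sorted copy, inversion count) via merge sort in O(N log N)."""
    n = len(values)
    if n <= 1:
        return list(values), 0
    mid = n // 2
    left, inv_left = count_inversions(values[:mid])
    right, inv_right = count_inversions(values[mid:])
    merged = []
    inversions = inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than every element still waiting in left,
            # so it forms one inversion with each of them.
            inversions += len(left) - i
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions


def kendall_tau_no_ties(x, y):
    """Kendall tau for paired samples, ASSUMING no ties in x or in y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    # Order the y values by their paired x values.
    y_by_x = [yi for _, yi in sorted(zip(x, y))]
    # Each inversion in y_by_x is exactly one discordant pair.
    _, discordant = count_inversions(y_by_x)
    total_pairs = n * (n - 1) // 2
    # tau = (concordant - discordant) / total, with concordant = total - discordant.
    return 1.0 - 2.0 * discordant / total_pairs
```

With identical orderings every pair is concordant and the result is 1.0; with exactly reversed orderings every pair is discordant and the result is -1.0.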
October 23, 2008 Re: Statistics library | ||||
---|---|---|---|---|
| ||||
Posted in reply to dsimcha | dsimcha, I think the struct to generate permutations is out of place there and would fit better in a module like my comb (combinatorics) module. Besides that detail, I like the idea of having a standard module with basic statistics, so I am interested :-) Bye, bearophile |
October 23, 2008 Re: Statistics library | ||||
---|---|---|---|---|
| ||||
Posted in reply to dsimcha | Reply to dsimcha,
> Since there's really no good comprehensive statistics library for D
> (Tango has a little bit, the beginnings of a few are on dsource, but
> nothing much), I've been rolling my own statistics functions as
> necessary. Almost by accident, it seems like I've built up the
> beginnings of a decent statistics library. I'm debating whether it
> might be interesting enough to people to be worth releasing, and
> whether enough community help would be available to really make it
> production quality, or to merge it with other people's efforts in this
> area.
Well, for starters, just ask and I'll get you access to put it on scrapple. That's if you don't want to go to the trouble of having your own project (it's not much trouble, BTW).
|
October 23, 2008 Re: Statistics library | ||||
---|---|---|---|---|
| ||||
Posted in reply to dsimcha | dsimcha wrote:
> Since there's really no good comprehensive statistics library for D (Tango has
> a little bit, the beginnings of a few are on dsource, but nothing much), I've
> been rolling my own statistics functions as necessary. Almost by accident, it
> seems like I've built up the beginnings of a decent statistics library. I'm
> debating whether it might be interesting enough to people to be worth
> releasing, and whether enough community help would be available to really make
> it production quality, or to merge it with other people's efforts in this
> area. The following functionality is currently available:
>
> Correlation (Pearson, Spearman rho, Kendall tau). Note that the Kendall
> tau correlation is a very efficient O(N log N) version.
>
> Mean, standard deviation, variance, kurtosis, percent variance for arrays of
> numeric values.
>
> Shannon entropy, mutual information.
>
> Kolmogorov-Smirnov tests
>
> Binomial, hypergeometric, normal, Poisson, Kolmogorov CDFs, hypergeometric,
> Poisson, binomial PDFs.
>
> Inverse normal distribution, and normally distributed random number generation.
>
> A struct to generate all possible permutations of a sequence.
>
>
> On the other hand, I'm a scientist, not a full-time programmer, and although I
> can write working code, I have no clue what it takes to get code up to the
> gold standard of "production." Also, this library is very D2-dependent, and I
> have no interest in back-porting it. Of course if by some chance someone else
> wanted to back-port it, they'd be more than welcome.
>
> Most of the code is covered somehow or another by unit tests, although I
> cheated a lot by having some unit tests depend on multiple functions.
>
> Is there any interest in this from others in the D community? Do other people
> think that D would benefit from having a decent statistics library? Other
> comments?
If the community is interested, I'd be glad to take over your code and put it in Phobos.
Andrei
|
October 24, 2008 Re: Statistics library | ||||
---|---|---|---|---|
| ||||
Posted in reply to dsimcha | On Fri, Oct 24, 2008 at 7:43 AM, dsimcha <dsimcha@yahoo.com> wrote:
> Since there's really no good comprehensive statistics library for D (Tango has a little bit, the beginnings of a few are on dsource, but nothing much), I've been rolling my own statistics functions as necessary. Almost by accident, it seems like I've built up the beginnings of a decent statistics library. I'm debating whether it might be interesting enough to people to be worth releasing, and whether enough community help would be available to really make it production quality, or to merge it with other people's efforts in this area. The following functionality is currently available:
>
> Correlation (Pearson, Spearman rho, Kendall tau). Note that the Kendall
> tau correlation is a very efficient O(N log N) version.
>
> Mean, standard deviation, variance, kurtosis, percent variance for arrays of numeric values.
>
> Shannon entropy, mutual information.
>
> Kolmogorov-Smirnov tests
>
> Binomial, hypergeometric, normal, Poisson, Kolmogorov CDFs, hypergeometric, Poisson, binomial PDFs.
>
> Inverse normal distribution, and normally distributed random number generation.
>
> A struct to generate all possible permutations of a sequence.
I don't know what a lot of those things are, but statistics to me means you will probably have (or eventually want) things like covariance which are best represented as matrices. Does your package also have a matrix library?
--bb
|
October 24, 2008 Re: Statistics library | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | Reply to Andrei,
> dsimcha wrote:
>
>> Since there's really no good comprehensive statistics library for D
>> (Tango has a little bit, the beginnings of a few are on dsource, but
>> nothing much), I've been rolling my own statistics functions as
>> necessary. Almost by accident, it seems like I've built up the
>> beginnings of a decent statistics library. I'm debating whether it
>> might be interesting enough to people to be worth releasing, and
>> whether enough community help would be available to really make it
>> production quality, or to merge it with other people's efforts in
>> this area. The following functionality is currently available:
>>
>> Correlation (Pearson, Spearman rho, Kendall tau). Note that the
>> Kendall tau correlation is a very efficient O(N log N) version.
>>
>> Mean, standard deviation, variance, kurtosis, percent variance for
>> arrays of numeric values.
>>
>> Shannon entropy, mutual information.
>>
>> Kolmogorov-Smirnov tests
>>
>> Binomial, hypergeometric, normal, Poisson, Kolmogorov CDFs,
>> hypergeometric, Poisson, binomial PDFs.
>>
>> Inverse normal distribution, and normally distributed random number
>> generation.
>>
>> A struct to generate all possible permutations of a sequence.
>>
>> On the other hand, I'm a scientist, not a full-time programmer, and
>> although I can write working code, I have no clue what it takes to
>> get code up to the gold standard of "production." Also, this library
>> is very D2-dependent, and I have no interest in back-porting it. Of
>> course if by some chance someone else wanted to back-port it, they'd
>> be more than welcome.
>>
>> Most of the code is covered somehow or another by unit tests,
>> although I cheated a lot by having some unit tests depend on multiple
>> functions.
>>
>> Is there any interest in this from others in the D community? Do
>> other people think that D would benefit from having a decent
>> statistics library? Other comments?
>>
> If the community is interested, I'd be glad to take over your code and
> put it in Phobos.
>
> Andrei
>
Even better would be getting it into both Phobos and Tango. It shouldn't be hard, as I can't imagine it depends on much.
|
October 24, 2008 Re: Statistics library | ||||
---|---|---|---|---|
| ||||
Posted in reply to Bill Baxter | == Quote from Bill Baxter (wbaxter@gmail.com)'s article
> On Fri, Oct 24, 2008 at 7:43 AM, dsimcha <dsimcha@yahoo.com> wrote:
> > Since there's really no good comprehensive statistics library for D (Tango has a little bit, the beginnings of a few are on dsource, but nothing much), I've been rolling my own statistics functions as necessary. Almost by accident, it seems like I've built up the beginnings of a decent statistics library. I'm debating whether it might be interesting enough to people to be worth releasing, and whether enough community help would be available to really make it production quality, or to merge it with other people's efforts in this area. The following functionality is currently available:
> >
> > Correlation (Pearson, Spearman rho, Kendall tau). Note that the Kendall
> > tau correlation is a very efficient O(N log N) version.
> >
> > Mean, standard deviation, variance, kurtosis, percent variance for arrays of numeric values.
> >
> > Shannon entropy, mutual information.
> >
> > Kolmogorov-Smirnov tests
> >
> > Binomial, hypergeometric, normal, Poisson, Kolmogorov CDFs, hypergeometric, Poisson, binomial PDFs.
> >
> > Inverse normal distribution, and normally distributed random number generation.
> >
> > A struct to generate all possible permutations of a sequence.
> I don't know what a lot of those things are, but statistics to me
> means you will probably have (or eventually want) things like
> covariance which are best represented as matrices. Does your package
> also have a matrix library?
> --bb
No, it doesn't have a matrix library right now. I make no claim that it is in any way complete right now, but I do think it has some pretty useful stuff that's not likely to be anywhere else for D.
|
October 24, 2008 Re: Statistics library | ||||
---|---|---|---|---|
| ||||
Posted in reply to dsimcha | On Fri, Oct 24, 2008 at 9:39 AM, dsimcha <dsimcha@yahoo.com> wrote:
> == Quote from Bill Baxter (wbaxter@gmail.com)'s article
>> On Fri, Oct 24, 2008 at 7:43 AM, dsimcha <dsimcha@yahoo.com> wrote:
>> > Since there's really no good comprehensive statistics library for D (Tango has a little bit, the beginnings of a few are on dsource, but nothing much), I've been rolling my own statistics functions as necessary. Almost by accident, it seems like I've built up the beginnings of a decent statistics library. I'm debating whether it might be interesting enough to people to be worth releasing, and whether enough community help would be available to really make it production quality, or to merge it with other people's efforts in this area. The following functionality is currently available:
>> >
>> > Correlation (Pearson, Spearman rho, Kendall tau). Note that the Kendall
>> > tau correlation is a very efficient O(N log N) version.
>> >
>> > Mean, standard deviation, variance, kurtosis, percent variance for arrays of numeric values.
>> >
>> > Shannon entropy, mutual information.
>> >
>> > Kolmogorov-Smirnov tests
>> >
>> > Binomial, hypergeometric, normal, Poisson, Kolmogorov CDFs, hypergeometric, Poisson, binomial PDFs.
>> >
>> > Inverse normal distribution, and normally distributed random number generation.
>> >
>> > A struct to generate all possible permutations of a sequence.
>> I don't know what a lot of those things are, but statistics to me
>> means you will probably have (or eventually want) things like
>> covariance which are best represented as matrices. Does your package
>> also have a matrix library?
>> --bb
>
> No, it doesn't have a matrix library right now. I make no claim that it is in any way complete right now, but I do think it has some pretty useful stuff that's not likely to be anywhere else for D.
Ok, so it's mainly for 1d statistics then?
--bb
|
October 24, 2008 Re: Statistics library | ||||
---|---|---|---|---|
| ||||
Posted in reply to BCS | == Quote from BCS (ao@pathlink.com)'s article
> Even better would be getting it into both Phobos and Tango. It shouldn't be hard, as I can't imagine it depends on much.
First, Tango needs to be ported to D2 (I realize that this is happening) or my code needs to be ported to D1. Anyhow, here are the dependencies:
Non-trivial, i.e. used in several places:
std.math, std.traits, std.functional, and some custom sorting functions I wrote, which could just be included.
Trivial, i.e. used in only one or two small places, where I'm pretty sure Tango has a drop-in replacement:
std.bigint (for factorial, although all functions that actually use a factorial are calculated in log space, and therefore don't depend on this), std.algorithm (for swap, isSorted), and std.random.
|
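The remark about factorials being "calculated in log space" refers to a standard numerical trick: instead of forming n! exactly (which needs a big-integer type and overflows floating point quickly), work with log(n!) and exponentiate only at the end. The Python sketch below illustrates this for a binomial probability. It is an assumption about the general technique, not the library's actual D code; the function names are hypothetical, and `math.lgamma` stands in for whatever log-gamma routine the library uses.

```python
import math


def log_factorial(n):
    """log(n!) computed via the log-gamma function: n! = gamma(n + 1)."""
    return math.lgamma(n + 1)


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), evaluated in log space."""
    if not 0 <= k <= n:
        return 0.0
    # Handle the degenerate cases separately to avoid log(0).
    if p == 0.0:
        return 1.0 if k == 0 else 0.0
    if p == 1.0:
        return 1.0 if k == n else 0.0
    log_pmf = (
        log_factorial(n) - log_factorial(k) - log_factorial(n - k)  # log C(n, k)
        + k * math.log(p)
        + (n - k) * math.log1p(-p)  # log1p(-p) is log(1 - p), accurate for small p
    )
    return math.exp(log_pmf)
```

Because only logarithms of the factorials are ever formed, a call such as `binomial_pmf(600, 2000, 0.3)` works even though 2000! is far too large to represent as a double.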
October 24, 2008 Re: Statistics library | ||||
---|---|---|---|---|
| ||||
Posted in reply to Bill Baxter | == Quote from Bill Baxter (wbaxter@gmail.com)'s article
> Ok, so it's mainly for 1d statistics then?
> --bb
Right.
|