Statistics library
October 23, 2008
Since there's really no good comprehensive statistics library for D (Tango has a little bit, the beginnings of a few are on dsource, but nothing much), I've been rolling my own statistics functions as necessary.  Almost by accident, it seems like I've built up the beginnings of a decent statistics library.  I'm debating whether it might be interesting enough to people to be worth releasing, and whether enough community help would be available to really make it production quality, or to merge it with other people's efforts in this area.  The following functionality is currently available:

Correlation (Pearson, Spearman rho, Kendall tau).  Note that the Kendall tau correlation is a very efficient O(N log N) version.

Mean, standard deviation, variance, kurtosis, percent variance for arrays of numeric values.

Shannon entropy, mutual information.

Kolmogorov-Smirnov tests.

Binomial, hypergeometric, normal, Poisson, and Kolmogorov CDFs; hypergeometric, Poisson, and binomial PDFs.

Inverse normal distribution, and normally distributed random number generation.

A struct to generate all possible permutations of a sequence.


On the other hand, I'm a scientist, not a full-time programmer, and although I can write working code, I have no clue what it takes to get code up to the gold standard of "production."  Also, this library is very D2-dependent, and I have no interest in back-porting it.  Of course if by some chance someone else wanted to back-port it, they'd be more than welcome.

Most of the code is covered one way or another by unit tests, although I cheated a lot by having some unit tests depend on multiple functions.

Is there any interest in this from others in the D community?  Do other people think that D would benefit from having a decent statistics library?  Other comments?
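
A note on the O(N log N) Kendall tau mentioned in the feature list: the post does not show its implementation, so the following is only a minimal sketch of one common way to reach that complexity, written in Python rather than D2 for brevity. It sorts the pairs by x and counts inversions in y with a merge sort. The function names are hypothetical, and the sketch assumes there are no tied values (tau-a); it is not dsimcha's code.

```python
def count_inversions(values):
    """Return (sorted copy, number of pairs i < j with values[i] > values[j])."""
    n = len(values)
    if n <= 1:
        return list(values), 0
    mid = n // 2
    left, inv_left = count_inversions(values[:mid])
    right, inv_right = count_inversions(values[mid:])
    merged = []
    inversions = inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than every remaining element of left,
            # so it forms one inversion with each of them.
            merged.append(right[j])
            j += 1
            inversions += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions


def kendall_tau(xs, ys):
    """Kendall tau-a for paired samples; assumes no ties in xs or ys."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    # Sort indices by x; discordant pairs are then inversions in y.
    order = sorted(range(n), key=lambda k: xs[k])
    ys_by_x = [ys[k] for k in order]
    _, discordant = count_inversions(ys_by_x)
    total_pairs = n * (n - 1) // 2
    # concordant = total_pairs - discordant, so tau = (C - D) / total_pairs
    return (total_pairs - 2 * discordant) / total_pairs
```

Both the sort and the inversion count take O(N log N) time, compared with O(N^2) for checking every pair directly. Handling ties (tau-b) needs extra bookkeeping that is left out here.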
October 23, 2008
dsimcha, I think the struct to generate permutations is out of place there, and would fit better in a module like my comb (combinatorics) module.

Besides that detail, I like the idea of having a standard module with basic statistics, so I am interested :-)

Bye,
bearophile
October 23, 2008
Reply to dsimcha,

> Since there's really no good comprehensive statistics library for D
> (Tango has a little bit, the beginnings of a few are on dsource, but
> nothing much), I've been rolling my own statistics functions as
> necessary.  Almost by accident, it seems like I've built up the
> beginnings of a decent statistics library.  I'm debating whether it
> might be interesting enough to people to be worth releasing, and
> whether enough community help would be available to really make it
> production quality, or to merge it with other people's efforts in this
> area.

Well for starters, just ask and I'll get you access to put it on scrapple. That's if you don't want to go to the trouble of having your own project (it's not much trouble BTW)


October 23, 2008
dsimcha wrote:
> Since there's really no good comprehensive statistics library for D (Tango has
> a little bit, the beginnings of a few are on dsource, but nothing much), I've
> been rolling my own statistics functions as necessary.  Almost by accident, it
> seems like I've built up the beginnings of a decent statistics library.  I'm
> debating whether it might be interesting enough to people to be worth
> releasing, and whether enough community help would be available to really make
> it production quality, or to merge it with other people's efforts in this
> area.  The following functionality is currently available:
> 
> Correlation (Pearson, Spearman rho, Kendall tau).  Note that the Kendall
> tau correlation is a very efficient O(N log N) version.
> 
> Mean, standard deviation, variance, kurtosis, percent variance for arrays of
> numeric values.
> 
> Shannon entropy, mutual information.
> 
> Kolmogorov-Smirnov tests
> 
> Binomial, hypergeometric, normal, Poisson, Kolmogorov CDFs, hypergeometric,
> Poisson, binomial PDFs.
> 
> Inverse normal distribution, and normally distributed random number generation.
> 
> A struct to generate all possible permutations of a sequence.
> 
> 
> On the other hand, I'm a scientist, not a full-time programmer, and although I
> can write working code, I have no clue what it takes to get code up to the
> gold standard of "production."  Also, this library is very D2-dependent, and I
> have no interest in back-porting it.  Of course if by some chance someone else
> wanted to back-port it, they'd be more than welcome.
> 
> Most of the code is covered somehow or another by unit tests, although I
> cheated a lot by having some unit tests depend on multiple functions.
> 
> Is there any interest in this from others in the D community?  Do other people
> think that D would benefit from having a decent statistics library?  Other
> comments?

If the community is interested, I'd be glad to take over your code and put it in Phobos.

Andrei
October 24, 2008
On Fri, Oct 24, 2008 at 7:43 AM, dsimcha <dsimcha@yahoo.com> wrote:
> Since there's really no good comprehensive statistics library for D (Tango has a little bit, the beginnings of a few are on dsource, but nothing much), I've been rolling my own statistics functions as necessary.  Almost by accident, it seems like I've built up the beginnings of a decent statistics library.  I'm debating whether it might be interesting enough to people to be worth releasing, and whether enough community help would be available to really make it production quality, or to merge it with other people's efforts in this area.  The following functionality is currently available:
>
> Correlation (Pearson, Spearman rho, Kendall tau).  Note that the Kendall
> tau correlation is a very efficient O(N log N) version.
>
> Mean, standard deviation, variance, kurtosis, percent variance for arrays of numeric values.
>
> Shannon entropy, mutual information.
>
> Kolmogorov-Smirnov tests
>
> Binomial, hypergeometric, normal, Poisson, Kolmogorov CDFs, hypergeometric, Poisson, binomial PDFs.
>
> Inverse normal distribution, and normally distributed random number generation.
>
> A struct to generate all possible permutations of a sequence.


I don't know what a lot of those things are, but statistics to me means you will probably have (or eventually want) things like covariance which are best represented as matrices.  Does your package also have a matrix library?

--bb
October 24, 2008
Reply to Andrei,

> dsimcha wrote:
> 
>> Since there's really no good comprehensive statistics library for D
>> (Tango has a little bit, the beginnings of a few are on dsource, but
>> nothing much), I've been rolling my own statistics functions as
>> necessary.  Almost by accident, it seems like I've built up the
>> beginnings of a decent statistics library.  I'm debating whether it
>> might be interesting enough to people to be worth releasing, and
>> whether enough community help would be available to really make it
>> production quality, or to merge it with other people's efforts in
>> this area.  The following functionality is currently available:
>> 
>> Correlation (Pearson, Spearman rho, Kendall tau).  Note that the
>> Kendall tau correlation is a very efficient O(N log N) version.
>> 
>> Mean, standard deviation, variance, kurtosis, percent variance for
>> arrays of numeric values.
>> 
>> Shannon entropy, mutual information.
>> 
>> Kolmogorov-Smirnov tests
>> 
>> Binomial, hypergeometric, normal, Poisson, Kolmogorov CDFs,
>> hypergeometric, Poisson, binomial PDFs.
>> 
>> Inverse normal distribution, and normally distributed random number
>> generation.
>> 
>> A struct to generate all possible permutations of a sequence.
>> 
>> On the other hand, I'm a scientist, not a full-time programmer, and
>> although I can write working code, I have no clue what it takes to
>> get code up to the gold standard of "production."  Also, this library
>> is very D2-dependent, and I have no interest in back-porting it.  Of
>> course if by some chance someone else wanted to back-port it, they'd
>> be more than welcome.
>> 
>> Most of the code is covered somehow or another by unit tests,
>> although I cheated a lot by having some unit tests depend on multiple
>> functions.
>> 
>> Is there any interest in this from others in the D community?  Do
>> other people think that D would benefit from having a decent
>> statistics library?  Other comments?
>> 
> If the community is interested, I'd be glad to take over your code and
> put it in Phobos.
> 
> Andrei
> 

Even better would be getting it into both Phobos and Tango. That shouldn't be hard, as I can't imagine it depends on much.


October 24, 2008
== Quote from Bill Baxter (wbaxter@gmail.com)'s article
> On Fri, Oct 24, 2008 at 7:43 AM, dsimcha <dsimcha@yahoo.com> wrote:
> > Since there's really no good comprehensive statistics library for D (Tango has a little bit, the beginnings of a few are on dsource, but nothing much), I've been rolling my own statistics functions as necessary.  Almost by accident, it seems like I've built up the beginnings of a decent statistics library.  I'm debating whether it might be interesting enough to people to be worth releasing, and whether enough community help would be available to really make it production quality, or to merge it with other people's efforts in this area.  The following functionality is currently available:
> >
> > Correlation (Pearson, Spearman rho, Kendall tau).  Note that the Kendall
> > tau correlation is a very efficient O(N log N) version.
> >
> > Mean, standard deviation, variance, kurtosis, percent variance for arrays of numeric values.
> >
> > Shannon entropy, mutual information.
> >
> > Kolmogorov-Smirnov tests
> >
> > Binomial, hypergeometric, normal, Poisson, Kolmogorov CDFs, hypergeometric, Poisson, binomial PDFs.
> >
> > Inverse normal distribution, and normally distributed random number generation.
> >
> > A struct to generate all possible permutations of a sequence.
> I don't know what a lot of those things are, but statistics to me
> means you will probably have (or eventually want) things like
> covariance which are best represented as matrices.  Does your package
> also have a matrix library?
> --bb

No, it doesn't have a matrix library right now.  I make no claim that it is in any way complete right now, but I do think it has some pretty useful stuff that's not likely to be anywhere else for D.
October 24, 2008
On Fri, Oct 24, 2008 at 9:39 AM, dsimcha <dsimcha@yahoo.com> wrote:
> == Quote from Bill Baxter (wbaxter@gmail.com)'s article
>> On Fri, Oct 24, 2008 at 7:43 AM, dsimcha <dsimcha@yahoo.com> wrote:
>> > Since there's really no good comprehensive statistics library for D (Tango has a little bit, the beginnings of a few are on dsource, but nothing much), I've been rolling my own statistics functions as necessary.  Almost by accident, it seems like I've built up the beginnings of a decent statistics library.  I'm debating whether it might be interesting enough to people to be worth releasing, and whether enough community help would be available to really make it production quality, or to merge it with other people's efforts in this area.  The following functionality is currently available:
>> >
>> > Correlation (Pearson, Spearman rho, Kendall tau).  Note that the Kendall
>> > tau correlation is a very efficient O(N log N) version.
>> >
>> > Mean, standard deviation, variance, kurtosis, percent variance for arrays of numeric values.
>> >
>> > Shannon entropy, mutual information.
>> >
>> > Kolmogorov-Smirnov tests
>> >
>> > Binomial, hypergeometric, normal, Poisson, Kolmogorov CDFs, hypergeometric, Poisson, binomial PDFs.
>> >
>> > Inverse normal distribution, and normally distributed random number generation.
>> >
>> > A struct to generate all possible permutations of a sequence.
>> I don't know what a lot of those things are, but statistics to me
>> means you will probably have (or eventually want) things like
>> covariance which are best represented as matrices.  Does your package
>> also have a matrix library?
>> --bb
>
> No, it doesn't have a matrix library right now.  I make no claim that it is in any way complete right now, but I do think it has some pretty useful stuff that's not likely to be anywhere else for D.

Ok, so it's mainly for 1d statistics then?

--bb
October 24, 2008
== Quote from BCS (ao@pathlink.com)'s article
> Even better would be getting it into both Phobos and Tango. That shouldn't be hard, as I can't imagine it depends on much.

First, Tango needs to be ported to D2 (I realize that this is happening) or my code needs to be ported to D1.  Anyhow, here are the dependencies:

Non-trivial, i.e. in several places:
std.math, std.traits, std.functional, some custom sorting functions I wrote, which could just be included

Trivial, i.e. in only one or two small places, pretty sure Tango has a drop-in replacement:
std.bigint (for factorial, although all functions that actually use a factorial are calculated in log space, and therefore don't depend on this), std.algorithm (for swap, isSorted), std.random
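
To illustrate the remark about factorials being calculated in log space: the library's own code is not shown in this thread, so the following is only a rough sketch of the general technique, in Python rather than D2. It evaluates a binomial probability using the log-gamma function instead of explicit factorials, which avoids both big-integer arithmetic and floating-point overflow. The names log_binomial_coefficient and binomial_pmf are hypothetical.

```python
import math


def log_binomial_coefficient(n, k):
    """log(n choose k), using lgamma(m + 1) == log(m!)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def binomial_pmf(k, n, p):
    """P(X == k) for X ~ Binomial(n, p), computed in log space."""
    if not 0 <= k <= n:
        return 0.0
    # Handle the endpoints separately so log(0) is never evaluated.
    if p == 0.0:
        return 1.0 if k == 0 else 0.0
    if p == 1.0:
        return 1.0 if k == n else 0.0
    log_pmf = (log_binomial_coefficient(n, k)
               + k * math.log(p)
               + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)
```

With n = 1000, the factorial 1000! is far too large for a double, but the sum of logarithms stays small, so binomial_pmf(500, 1000, 0.5) evaluates without overflow.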
October 24, 2008
== Quote from Bill Baxter (wbaxter@gmail.com)'s article
> Ok, so it's mainly for 1d statistics then?
> --bb

Right.