October 24, 2008
Andrei Alexandrescu wrote:
> If the community is interested, I'd be glad to take over your code and put it in Phobos.

I'm interested.
October 24, 2008
dsimcha wrote:
> Since there's really no good comprehensive statistics library for D (Tango has
> a little bit, the beginnings of a few are on dsource, but nothing much), I've
> been rolling my own statistics functions as necessary.  Almost by accident, it
> seems like I've built up the beginnings of a decent statistics library.  I'm
> debating whether it might be interesting enough to people to be worth
> releasing, and whether enough community help would be available to really make
> it production quality, or to merge it with other people's efforts in this
> area.  The following functionality is currently available:


> Binomial, hypergeometric, normal, Poisson, Kolmogorov CDFs, hypergeometric,
> Poisson, binomial PDFs.  Inverse normal distribution,

Most of these are in Tango (not Kolmogorov). Are yours different in some way?

> A struct to generate all possible permutations of a sequence.

>
> Correlation (Pearson, Spearman rho, Kendall tau).   Note that the  Kendall
> tau correlation is a very efficient O(N log N) version.
>
> Mean, standard deviation, variance, kurtosis, percent variance for arrays of
> numeric values.
>
> Shannon entropy, mutual information.
>
> Kolmogorov-Smirnov tests

Sounds good.
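As an illustration of the entropy and mutual-information items, here is a minimal plug-in (maximum-likelihood) sketch in D. Entropy is computed from the empirical frequencies of the distinct values, and mutual information uses the identity I(X;Y) = H(X) + H(Y) - H(X,Y). The function names `entropy` and `mutualInfo` and the choice of base-2 logarithms (bits) are assumptions for this example, not the posted library's API.

```d
import std.math : log2;
import std.typecons : Tuple;

/// Plug-in Shannon entropy, in bits, of the empirical distribution of `data`.
/// (Hypothetical name; illustration only.)
real entropy(T)(const T[] data)
{
    size_t[T] counts;  // occurrences of each distinct value
    foreach (v; data)
        counts[v]++;
    real h = 0;
    foreach (c; counts)
    {
        immutable real p = cast(real) c / data.length;
        h -= p * log2(p);
    }
    return h;
}

/// Plug-in mutual information in bits: I(X;Y) = H(X) + H(Y) - H(X,Y).
/// (Hypothetical name; illustration only.)
real mutualInfo(T, U)(const T[] x, const U[] y)
{
    assert(x.length == y.length, "x and y must be paired observations");
    // Pair up the observations so the joint distribution can be counted.
    auto joint = new Tuple!(T, U)[](x.length);
    foreach (i; 0 .. x.length)
        joint[i] = Tuple!(T, U)(x[i], y[i]);
    return entropy(x) + entropy(y) - entropy(joint);
}
```

The plug-in entropy estimate is known to be biased low for small samples, so this is a definition-level sketch rather than a production estimator.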

> 
> 
> On the other hand, I'm a scientist, not a full-time programmer, 

Me too!

> Is there any interest in this from others in the D community?  Do other people
> think that D would benefit from having a decent statistics library?  

Yes. Which is why I put the existing stuff into Tango.
October 24, 2008
== Quote from BCS (ao@pathlink.com)'s article
> Reply to dsimcha,
> > Since there's really no good comprehensive statistics library for D (Tango has a little bit, the beginnings of a few are on dsource, but nothing much), I've been rolling my own statistics functions as necessary.  Almost by accident, it seems like I've built up the beginnings of a decent statistics library.  I'm debating whether it might be interesting enough to people to be worth releasing, and whether enough community help would be available to really make it production quality, or to merge it with other people's efforts in this area.
> Well for starters, just ask and I'll get you access to put it on scrapple. That's if you don't want to go to the trouble of having your own project (it's not much trouble BTW)

Sounds good at least for now.  It's only about 1500 lines of code including unittests, comments, etc.  I'll put it up on scrapple with a permissive license, and people can make suggestions, and integrate it into Phobos and Tango as they see fit.
October 24, 2008
== Quote from Don (nospam@nospam.com.au)'s article
> > Binomial, hypergeometric, normal, Poisson, Kolmogorov CDFs, hypergeometric, Poisson, binomial PDFs.  Inverse normal distribution,
> Most of these are in Tango (not Kolmogorov). Are yours different in some
> way?

They calculate the exact log factorial using a caching scheme.  Not sure how much accuracy this actually buys, though it costs some memory.  I should probably change the logFactorial function to a gamma approximation at least for large N.
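A minimal sketch of the kind of caching scheme described above, written against current D2/Phobos. It assumes the cache is a lazily grown table whose entry n holds log(n!) as a running sum of log(k); the names `logFactorial` and `logFactCache` are hypothetical and not taken from the posted code. The unittest checks the cache against `std.mathspecial.logGamma` using log(n!) = logGamma(n + 1).

```d
import std.math : abs, log;
import std.mathspecial : logGamma;

// Hypothetical cache: logFactCache[n] holds log(n!). It grows on demand and
// is never shrunk, so memory is linear in the largest n ever requested.
// (Module-level variables are thread-local by default in D2.)
private real[] logFactCache;

/// Log factorial as a cached running sum: log(n!) = log((n-1)!) + log(n).
real logFactorial(size_t n)
{
    if (logFactCache.length == 0)
        logFactCache ~= 0.0L;  // log(0!) == log(1) == 0
    // Extend the table up to index n, reusing every entry computed so far.
    while (logFactCache.length <= n)
    {
        immutable k = logFactCache.length;  // next index to fill
        logFactCache ~= logFactCache[$ - 1] + log(cast(real) k);
    }
    return logFactCache[n];
}

unittest
{
    // log(n!) == logGamma(n + 1), so the cached sums should agree with
    // logGamma up to accumulated rounding error.
    immutable size_t[] ns = [0, 1, 2, 10, 100, 170];
    foreach (n; ns)
        assert(abs(logFactorial(n) - logGamma(n + 1.0L)) < 1e-9);
}
```

The trade-off is the one mentioned: exact sums for every n up to the largest requested, paid for with a table that only grows.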

Also, Tango doesn't have hypergeometric.

> > A struct to generate all possible permutations of a sequence.
> >
> > Correlation (Pearson, Spearman rho, Kendall tau).  Note that the Kendall
> > tau correlation is a very efficient O(N log N) version.
> >
> > Mean, standard deviation, variance, kurtosis, percent variance for arrays of
> > numeric values.
> >
> > Shannon entropy, mutual information.
> >
> > Kolmogorov-Smirnov tests
> Sounds good.

This is more the part that I thought might be useful.
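The O(N log N) bound for Kendall's tau is usually achieved with Knight's algorithm: sort the pairs by x, then count discordant pairs as inversions of the y sequence during a merge sort. The D sketch below illustrates that idea under stated assumptions: it computes tau-a, it assumes there are no ties in either x or y (handling ties needs the tau-b corrections), and the names `countInversions` and `kendallTau` are hypothetical rather than the functions in the posted library.

```d
import std.algorithm.sorting : sort;
import std.array : array;
import std.range : iota;

/// Counts pairs i < j with a[i] > a[j] via merge sort, sorting `a` in place.
/// `buf` is scratch space and must have the same length as `a`.
ulong countInversions(double[] a, double[] buf)
{
    if (a.length < 2)
        return 0;
    immutable mid = a.length / 2;
    ulong inv = countInversions(a[0 .. mid], buf[0 .. mid])
              + countInversions(a[mid .. $], buf[mid .. $]);
    // Merge the sorted halves into buf, counting cross-half inversions.
    size_t i = 0, j = mid, k = 0;
    while (i < mid && j < a.length)
    {
        if (a[i] <= a[j])
            buf[k++] = a[i++];
        else
        {
            inv += mid - i;  // a[j] is below every remaining left element
            buf[k++] = a[j++];
        }
    }
    while (i < mid)
        buf[k++] = a[i++];
    while (j < a.length)
        buf[k++] = a[j++];
    a[] = buf[];
    return inv;
}

/// Kendall's tau-a in O(N log N), ASSUMING no ties in x or in y.
double kendallTau(const double[] x, const double[] y)
{
    assert(x.length == y.length && x.length >= 2);
    immutable n = x.length;
    // Order the indices by x, then read y off in that order.
    auto idx = iota(n).array;
    idx.sort!((p, q) => x[p] < x[q]);
    auto ys = new double[n];
    foreach (k, p; idx)
        ys[k] = y[p];
    // With x sorted and no ties, each inversion in ys is a discordant pair
    // and every other pair is concordant.
    immutable ulong discordant = countInversions(ys, new double[n]);
    immutable ulong totalPairs = cast(ulong) n * (n - 1) / 2;
    immutable ulong concordant = totalPairs - discordant;
    return (cast(double) concordant - cast(double) discordant) / totalPairs;
}
```

The initial sort and the merge-sort inversion count are each O(N log N), which is where the overall bound comes from, compared with O(N^2) for checking every pair directly.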


> > On the other hand, I'm a scientist, not a full-time programmer,
> Me too!
> > Is there any interest in this from others in the D community?  Do other people think that D would benefit from having a decent statistics library?
> Yes. Which is why I put the existing stuff into Tango.

BCS has offered me Scrapple access, so I'll post the code there under a permissive license.  From there, Tango and Phobos devs can look at it and do as they see fit.



October 24, 2008
Reply to dsimcha,

> == Quote from BCS (ao@pathlink.com)'s article
> 
>> Reply to dsimcha,
>> 
>>> Since there's really no good comprehensive statistics library for D
>>> (Tango has a little bit, the beginnings of a few are on dsource, but
>>> nothing much), I've been rolling my own statistics functions as
>>> necessary.  Almost by accident, it seems like I've built up the
>>> beginnings of a decent statistics library.  I'm debating whether it
>>> might be interesting enough to people to be worth releasing, and
>>> whether enough community help would be available to really make it
>>> production quality, or to merge it with other people's efforts in
>>> this area.
>>> 
>> Well for starters, just ask and I'll get you access to put it on
>> scrapple. That's if you don't want to go to the trouble of having
>> your own project (it's not much trouble BTW)
>> 
> Sounds good at least for now.  It's only about 1500 lines of code
> including unittests, comments, etc.  I'll put it up on scrapple with a
> permissive license, and people can make suggestions, and integrate it
> into Phobos and Tango as they see fit.
> 

If you don't already have access send me your username and I'll add you.


October 25, 2008
dsimcha wrote:
> == Quote from Don (nospam@nospam.com.au)'s article
>>> Binomial, hypergeometric, normal, Poisson, Kolmogorov CDFs, hypergeometric,
>>> Poisson, binomial PDFs.  Inverse normal distribution,
>> Most of these are in Tango (not Kolmogorov). Are yours different in some
>> way?
> 
> They calculate the exact log factorial using a caching scheme.  Not sure how much
> accuracy this actually buys, though it costs some memory.  I should probably
> change the logFactorial function to a gamma approximation at least for large N.

That shouldn't be necessary. If logGamma() isn't giving an accurate log factorial (within a couple of bits of precision), that's a problem with logGamma. Please file a bug report.
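The identity behind the logGamma() remark, for integer n >= 0: the log factorial is the log-gamma function evaluated at n + 1, so an exact cached sum and an accurate logGamma() should agree to rounding error. The Stirling series (as n grows) is included only to show why large n is cheap to evaluate directly; it is not a statement about how Tango's logGamma() is actually implemented.

```latex
\ln n! \;=\; \ln\Gamma(n+1) \;=\; \sum_{k=2}^{n} \ln k,
\qquad
\ln\Gamma(n+1) \;=\; n\ln n - n + \tfrac{1}{2}\ln(2\pi n) + \frac{1}{12n} - \frac{1}{360n^{3}} + O\!\left(n^{-5}\right).
```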

> Also, Tango doesn't have hypergeometric.

You're right. It's still on my hard disk; I wasn't quite happy with it.

> 
>>> A struct to generate all possible permutations of a sequence.
>>>
>>> Correlation (Pearson, Spearman rho, Kendall tau).  Note that the Kendall
>>> tau correlation is a very efficient O(N log N) version.
>>>
>>> Mean, standard deviation, variance, kurtosis, percent variance for arrays of
>>> numeric values.
>>>
>>> Shannon entropy, mutual information.
>>>
>>> Kolmogorov-Smirnov tests
>> Sounds good.
> 
> This is more the part that I thought might be useful.
> 
> 
>>> On the other hand, I'm a scientist, not a full-time programmer,
>> Me too!
>>> Is there any interest in this from others in the D community?  Do other people
>>> think that D would benefit from having a decent statistics library?
>> Yes. Which is why I put the existing stuff into Tango.
> 
> BCS has offered me Scrapple access, so I'll post the code there under a permissive
> license.  From there, Tango and Phobos devs can look at it and do as they see fit.

Cool!
October 26, 2008
== Quote from BCS (ao@pathlink.com)'s article
> Reply to dsimcha,
> > == Quote from BCS (ao@pathlink.com)'s article
> >
> >> Reply to dsimcha,
> >>
> >>> Since there's really no good comprehensive statistics library for D (Tango has a little bit, the beginnings of a few are on dsource, but nothing much), I've been rolling my own statistics functions as necessary.  Almost by accident, it seems like I've built up the beginnings of a decent statistics library.  I'm debating whether it might be interesting enough to people to be worth releasing, and whether enough community help would be available to really make it production quality, or to merge it with other people's efforts in this area.
> >>>
> >> Well for starters, just ask and I'll get you access to put it on scrapple. That's if you don't want to go to the trouble of having your own project (it's not much trouble BTW)
> >>
> > Sounds good at least for now.  It's only about 1500 lines of code including unittests, comments, etc.  I'll put it up on scrapple with a permissive license, and people can make suggestions, and integrate it into Phobos and Tango as they see fit.
> >
> If you don't already have access send me your username and I'll add you.

Username:  dsimcha

Yes, I realize that it's best to do things like this off the newsgroup, but your email address doesn't seem to work.
October 27, 2008
Reply to dsimcha,

> Yes, I realize that it's best to do things like this off the
> newsgroup, but your email address doesn't seem to work.
> 

Sorry. I figure I get enough spam as it is. Besides, there are about two dozen other ways to reach me if you are persistent enough.

Oh. You're in; have fun.

(I really ought to make up some boilerplate like this: http://en.wikipedia.org/wiki/Sudo#Design ;)


October 27, 2008
If my vote counts, I am all for it. :)
October 27, 2008
Now that I've figured out how the heck to use SVN, it's up on scrapple. Basically everything has unittests and has been dogfooded by me, but if there are any bugs I missed, please file them.  I'm actually kind of surprised at the level of interest in this project.

http://dsource.org/projects/scrapple/browser/trunk/statistics

Just a reminder:  The current version makes pretty heavy use of D2 features, so it is completely incompatible with D1.  D1 compatibility was not even a consideration in the design of anything.  I have no intention of backporting to D1, since D2 is getting pretty close to ready anyhow.

Also, I'm not a lawyer, but the license I put in the header of the files is intended to be very permissive, so that Andrei and Don can integrate whatever functionality they see fit into Phobos and Tango.  If you see a problem with the way I have worded the license, let me know and I'll change it.