Making mir.random.ndvariable.multivariateNormalVar create bigger data sets than 2

Feb 27, 2018

kerdemdemir

Feb 27, 2018

jmh530

Sep 10, 2018

I need a classifier in my project. Since I believe it is the easiest to implement, I am trying to implement logistic regression.

I am trying to do the same as this Python example: https://beckernick.github.io/logistic-regression-from-scratch/

I need two data sets to test with.

This works (https://run.dlang.io/is/yGa4a0):

    double[2] x1;
    Random* gen = threadLocalPtr!Random;

    auto mu = [0.0, 0.0].sliced;
    auto sigma = [1.0, 0.75, 0.75, 1].sliced(2,2);
    auto rv = multivariateNormalVar(mu, sigma);
    rv(gen, x1[]);
    writeln(x1);

But when I increase my data set size from double[2] to double[100], I get an assertion failure:

    mir-random-0.4.3/mir-random/source/mir/random/ndvariable.d(378): Assertion failure

which is:

    assert(result.length == n);

How can I get a result vector with a size of something like 5000?

Erdemdem

On Tuesday, 27 February 2018 at 09:23:49 UTC, kerdemdemir wrote:
> I need a classifier in my project.
> Since I believe it is the easiest to implement, I am trying to implement logistic regression.
>
> I am trying to do the same as this Python example: https://beckernick.github.io/logistic-regression-from-scratch/
>
> I need two data sets to test with.
>
> This works (https://run.dlang.io/is/yGa4a0):
>
> double[2] x1;
> Random* gen = threadLocalPtr!Random;
>
> auto mu = [0.0, 0.0].sliced;
> auto sigma = [1.0, 0.75, 0.75, 1].sliced(2,2);
> auto rv = multivariateNormalVar(mu, sigma);
> rv(gen, x1[]);
> writeln(x1);
>
> But when I increase my data set size from double[2] to double[100], I get an assertion failure:
>
> mir-random-0.4.3/mir-random/source/mir/random/ndvariable.d(378): Assertion failure
>
> which is:
> assert(result.length == n);
>
> How can I get a result vector with a size of something like 5000?
>
> Erdemdem

I haven't made much use of mir.random yet...

The dimension 2 in this case is the dimension of the random variable itself. What you want to do is simulate multiple times from this 2-dimensional random variable.

It looks like the examples on the main Readme page use mir.random.algorithm.range. I tried that below, but I got errors. I did notice that the MultivariateNormalVariable documentation says it is still in beta.

    void main()
    {
        import mir.random : Random, unpredictableSeed;
        import mir.random.ndvariable : MultivariateNormalVariable;
        import mir.random.algorithm : range;
        import mir.ndslice.slice : sliced;
        import std.range : take;

        auto mu = [10.0, 0.0].sliced;
        auto sigma = [2.0, -1.5, -1.5, 2.0].sliced(2,2);

        auto rng = Random(unpredictableSeed);
        auto sample = range!rng
            (MultivariateNormalVariable!double(mu, sigma))
            .take(10);
    }

However, doing it manually with a for loop works.

    void main()
    {
        import mir.random : rne;
        import mir.random.ndvariable : multivariateNormalVar;
        import mir.random.algorithm : range;
        import mir.ndslice.slice : sliced;
        import std.stdio : writeln;

        auto mu = [10.0, 0.0].sliced;
        auto sigma = [2.0, -1.5, -1.5, 2.0].sliced(2,2);
        auto rv = multivariateNormalVar(mu, sigma);

        // 100 draws, each one a 2-element vector
        double[2][100] x;
        for (size_t i = 0; i < 100; i++) {
            rv(rne, x[i][]);
        }
        writeln(x);
    }

Nevertheless, it probably can't hurt to file an issue if you can't get something like the first one to work. I would think it should just work.

On Tuesday, 27 February 2018 at 15:08:42 UTC, jmh530 wrote:
> Nevertheless, it probably can't hurt to file an issue if you can't get something like the first one to work. I would think it should just work.

The problem is that `mir.random.ndvariable` doesn't satisfy `mir.random.variable.isRandomVariable!T`. ndvariables have a slightly different interface from variables: instead of `rv(gen)` returning a result, `rv(gen, dst)` writes to dst. I agree that the various methods for working with variables should be enhanced to work with ndvariables.

On Tuesday, 27 February 2018 at 16:42:00 UTC, Nathan S. wrote:
> On Tuesday, 27 February 2018 at 15:08:42 UTC, jmh530 wrote:
>> Nevertheless, it probably can't hurt to file an issue if you can't get something like the first one to work. I would think it should just work.
>
> The problem is that `mir.random.ndvariable` doesn't satisfy `mir.random.variable.isRandomVariable!T`. ndvariables have a slightly different interface from variables: instead of `rv(gen)` returning a result, `rv(gen, dst)` writes to dst. I agree that the various methods for working with variables should be enhanced to work with ndvariables.

So, I see that the interface will have to be slightly different for ndvariable than for variable. With the exception of MultivariateNormalVariable, the same ndvariable instance can be called to fill output of any length "n", so one can't meaningfully create a range based on just the ndvariable without further specification. What would "front" return? For MultivariateNormalVariable "n" is constrained but it is a runtime parameter rather than a compile-time parameter.

You'll want to ping @9il / Ilya Yaroshenko to discuss what the API should be like for this.
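As a rough illustration of the two calling conventions discussed in this post, here is a minimal sketch that does not use mir at all. `UniformVar` and `UniformNdVar` are hypothetical stand-ins built on Phobos's `std.random`; they are not part of mir-random, and only the shape of the two `opCall` signatures is meant to carry over.

```d
import std.random : Random, uniform01;

/// Hypothetical scalar variable: each call returns one draw.
struct UniformVar
{
    double opCall(ref Random gen)
    {
        return uniform01(gen);
    }
}

/// Hypothetical N-dimensional variable: each call fills a buffer
/// supplied by the caller, whatever its length happens to be.
struct UniformNdVar
{
    void opCall(ref Random gen, double[] dst)
    {
        foreach (ref e; dst)
            e = uniform01(gen);
    }
}

void main()
{
    import std.stdio : writeln;

    auto gen = Random(42);

    UniformVar v;
    double x = v(gen);        // value-returning style

    UniformNdVar nv;
    double[3] buf;
    nv(gen, buf[]);           // buffer-filling style

    writeln(x, " ", buf);
}
```

Because the caller picks the buffer length, a range built from `UniformNdVar` alone would have no way to know how long each `front` should be, which is the question raised above.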

February 27, 2018

Re: Making mir.random.ndvariable.multivariateNormalVar create bigger data sets than 2

Posted by jmh530
in reply to Nathan S.

On Tuesday, 27 February 2018 at 17:24:22 UTC, Nathan S. wrote:
> On Tuesday, 27 February 2018 at 16:42:00 UTC, Nathan S. wrote:
>> On Tuesday, 27 February 2018 at 15:08:42 UTC, jmh530 wrote:
>>> Nevertheless, it probably can't hurt to file an issue if you can't get something like the first one to work. I would think it should just work.
>>
>> The problem is that `mir.random.ndvariable` doesn't satisfy `mir.random.variable.isRandomVariable!T`. ndvariables have a slightly different interface from variables: instead of `rv(gen)` returning a result, `rv(gen, dst)` writes to dst. I agree that the various methods for working with variables should be enhanced to work with ndvariables.
>
> So, I see that the interface will have to be slightly different for ndvariable than for variable. With the exception of MultivariateNormalVariable, the same ndvariable instance can be called to fill output of any length "n", so one can't meaningfully create a range based on just the ndvariable without further specification. What would "front" return? For MultivariateNormalVariable "n" is constrained but it is a runtime parameter rather than a compile-time parameter.
>
> You'll want to ping @9il / Ilya Yaroshenko to discuss what the API should be like for this.

Honestly, I think the post above was my first use of mir.random, so I'm nowhere near familiar enough at this point to add much useful feedback. I'm definitely glad that it is getting worked on and plan on using it in the future.

The only thing I would note is that there are not just N-dimensional random variables, there are also N×N-dimensional random variables (not sure what else there could be, but it would be significantly less popular). A Wishart distribution (used for the distribution of covariance matrices) can be simulated by multiplying the transpose of a matrix of zero-mean multivariate normal draws by itself. This produces an N×N matrix. Ideally, the API could handle this type of distribution as well.
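To make the Wishart construction concrete, here is a minimal sketch for the 2×2 case using only Phobos. Everything in it is an assumption made for illustration: the function names are hypothetical, the normal draws come from a hand-written Box-Muller transform rather than mir.random, the mean is fixed at zero, and the covariance is assumed to be positive definite so that the Cholesky factor exists. Summing `dof` outer products of draws is the same as multiplying the transposed draw matrix by itself.

```d
import std.math : cos, log, sqrt, PI;
import std.random : Random, uniform01;

/// Standard normal draw via the Box-Muller transform.
double standardNormal(ref Random gen)
{
    // 1 - uniform01 lies in (0, 1], so log never sees zero.
    immutable u1 = 1.0 - uniform01(gen);
    immutable u2 = uniform01(gen);
    return sqrt(-2.0 * log(u1)) * cos(2.0 * PI * u2);
}

/// One draw from a zero-mean bivariate normal with covariance
/// [[s11, s12], [s12, s22]], using a hand-written 2x2 Cholesky factor.
double[2] bivariateNormal(ref Random gen, double s11, double s12, double s22)
{
    immutable l11 = sqrt(s11);
    immutable l21 = s12 / l11;
    immutable l22 = sqrt(s22 - l21 * l21);
    immutable z1 = standardNormal(gen);
    immutable z2 = standardNormal(gen);

    double[2] x;
    x[0] = l11 * z1;
    x[1] = l21 * z1 + l22 * z2;
    return x;
}

/// Sum of `dof` outer products x * x^T, which is a 2x2 Wishart draw.
double[2][2] wishart2(ref Random gen, size_t dof,
                      double s11, double s12, double s22)
{
    double[2][2] w = [[0.0, 0.0], [0.0, 0.0]];
    foreach (_; 0 .. dof)
    {
        auto x = bivariateNormal(gen, s11, s12, s22);
        foreach (i; 0 .. 2)
            foreach (j; 0 .. 2)
                w[i][j] += x[i] * x[j];
    }
    return w;
}

void main()
{
    import std.stdio : writeln;

    auto gen = Random(2018);
    // Same covariance as the examples earlier in the thread.
    writeln(wishart2(gen, 5, 2.0, -1.5, 2.0));
}
```

The result type here is a matrix rather than a vector, which is the case an ndvariable-style `rv(gen, dst)` interface with a one-dimensional `dst` would not cover directly.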

Another type of distribution I sometimes see is from Bayesian statistics (less common than typical distributions and could probably be built on top of what is already in mir.random, but I figured it couldn't hurt to bring it to your attention). A normal-inverse-gamma distribution is one example of these types of distributions. Simulating from this distribution would produce a pair of the mean and variance, not just one value. This would contrast with multivariate normal in that you would know it has two dimensions at compile-time.
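A minimal sketch of such a draw, again using only Phobos and hypothetical names, might look like the following. To stay self-contained it restricts the shape parameter to half-integers (`alpha = twoAlpha / 2`) so that the gamma draw can be built from a chi-square sum of squared normals; that restriction is a simplification made for this sketch, not something proposed in the thread.

```d
import std.math : cos, log, sqrt, PI;
import std.random : Random, uniform01;
import std.typecons : Tuple;

/// Standard normal draw via the Box-Muller transform.
double standardNormal(ref Random gen)
{
    immutable u1 = 1.0 - uniform01(gen); // in (0, 1]
    immutable u2 = uniform01(gen);
    return sqrt(-2.0 * log(u1)) * cos(2.0 * PI * u2);
}

/// Hypothetical result type: its size (two values) is known at compile time.
alias MeanVariance = Tuple!(double, "mean", double, "variance");

/// Draws (mean, variance) from a normal-inverse-gamma distribution with
/// parameters mu0, lambda, alpha = twoAlpha / 2 and beta.
MeanVariance normalInverseGamma(ref Random gen, double mu0, double lambda,
                                uint twoAlpha, double beta)
{
    assert(twoAlpha > 0 && lambda > 0 && beta > 0);

    // Chi-square with twoAlpha degrees of freedom.
    double chi2 = 0.0;
    foreach (_; 0 .. twoAlpha)
    {
        immutable z = standardNormal(gen);
        chi2 += z * z;
    }

    // chi2 / (2 * beta) is Gamma(shape alpha, rate beta),
    // so its reciprocal is inverse-gamma(alpha, beta).
    immutable variance = 2.0 * beta / chi2;

    // Given the variance, the mean is normal with variance `variance / lambda`.
    immutable mean = mu0 + sqrt(variance / lambda) * standardNormal(gen);
    return MeanVariance(mean, variance);
}

void main()
{
    import std.stdio : writeln;

    auto gen = Random(2018);
    auto draw = normalInverseGamma(gen, 0.0, 1.0, 6, 2.0);
    writeln(draw.mean, " ", draw.variance);
}
```

Since the return type is a fixed two-field tuple, a variable like this fits the value-returning `rv(gen)` convention without needing a destination buffer.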

Cross-posting from the github issue (https://github.com/libmir/mir-random/issues/77) with a workaround (execute it at https://run.dlang.io/is/Swr1xU):

----

I am not sure what the correct interface should be for this in the long run, but for now you can use a wrapper function to convert an ndvariable to a variable:

```d
/++
Converts an N-dimensional variable to a fixed-dimensional variable.
+/
auto specifyDimension(ReturnType, NDVariable)(NDVariable vr)
    if (__traits(isStaticArray, ReturnType) && __traits(compiles, {static assert(NDVariable.isRandomVariable);}))
{
    import mir.random : isSaturatedRandomEngine;
    import mir.random.variable : isRandomVariable;
    static struct V
    {
        enum bool isRandomVariable = true;
        NDVariable vr;
        ReturnType opCall(G)(scope ref G gen)
            if (isSaturatedRandomEngine!G)
        {
            ReturnType ret;
            vr(gen, ret[]);
            return ret;
        }
        ReturnType opCall(G)(scope G* gen)
            if (isSaturatedRandomEngine!G)
        {
            return opCall!(G)(*gen);
        }
    }
    static assert(isRandomVariable!V);
    V v = { vr };
    return v;
}
```

So `main` from your above example becomes:

```d
void main()
{
    import std.stdio;
    import mir.random : Random, threadLocalPtr;
    import mir.random.ndvariable : multivariateNormalVar;
    import mir.random.algorithm : range;
    import mir.ndslice.slice : sliced;
    import std.range : take;

    auto mu = [10.0, 0.0].sliced;
    auto sigma = [2.0, -1.5, -1.5, 2.0].sliced(2,2);

    Random* rng = threadLocalPtr!Random;
    auto sample = rng
        .range(multivariateNormalVar(mu, sigma).specifyDimension!(double[2]))
        .take(10);
    writeln(sample);
}
```

On Tuesday, 27 February 2018 at 21:54:34 UTC, Nathan S. wrote:
> Cross-posting from the github issue (https://github.com/libmir/mir-random/issues/77) with a workaround (execute it at https://run.dlang.io/is/Swr1xU):
> ----
> [snip]

Step in the right direction at least.

On Tuesday, 27 February 2018 at 09:23:49 UTC, kerdemdemir wrote:
> I need a classifier in my project.
> Since I believe it is the easiest to implement, I am trying to implement logistic regression.
>
> [...]

Mir Random v1.0.0 has new `range` overloads that can work with NdRandomVariable. Example: https://run.dlang.io/is/jte3gx
