Generalized Linear Models and Stochastic Gradient Descent in D
June 10, 2017
Hi all,

I have written a draft article on Generalized Linear Models and Stochastic Gradient Descent in D (https://github.com/dataPulverizer/glm-stochastic-gradient-descent-d/blob/master/article.ipynb) and would greatly appreciate your suggestions.

Apologies for the spartan nature of the article formatting. That will change for the final draft.

Thank you in advance

DP
June 11, 2017
On Saturday, 10 June 2017 at 20:03:16 UTC, data pulverizer wrote:
> Hi all,
>
> I have written a draft article on Generalized Linear Models and Stochastic Gradient Descent in D (https://github.com/dataPulverizer/glm-stochastic-gradient-descent-d/blob/master/article.ipynb) and would greatly appreciate your suggestions.
>
> Apologies for the spartan nature of the article formatting. That will change for the final draft.
>
> Thank you in advance
>
> DP

Maybe it's the default rendering, but the open math font is hard to read as the subscripts get vertically compressed.

My suggestions:

Distinguish between the likelihood functions for gamma and normal rather than calling them both L(x). Maybe L subscript uppercase gamma and L subscript N?

Links to Wikipedia for the technical terms (e.g. dispersion, chi squared, curvature); again, the vertical compression of the math font does not help here (subscripts of fractions). It will expand your audience if they don't get lost in the introduction!

Speaking of not losing your audience: give a link to the NRA and/or a brief explanation of how it generalises to higher dimensions (a graph or animation for the 2D case would be good; perhaps take something from Wikipedia).

I don't think it is necessary to show the signatures of the BLAS and LAPACKE functions; a short description and link should suffice. Also, any reason you don't use GLAS?

I would just have gParamCalcs as its own function (unless you are trying to show off that particular feature of D).

omit the parentheses of .array() and .reduce()

You use .array a lot: how much of that is necessary? I don't think it is in zip(k.repeat().take(n).array(), x, y, mu)

`return(curv);` should be `return curve;`

Any reason you don't square the tolerance rather than sqrt the parsDiff?

for(int i = 0; i < nepochs; ++i) => foreach(i; iota(nepochs))?

zip(pars, x).map!(a => a[0]*a[1]).reduce!((a, b) => a + b); => dot(pars, x)?
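
For example, a minimal (untested) sketch using dotProduct, which is already in Phobos:

import std.numeric : dotProduct;

void main()
{
    double[] pars = [0.5, 1.5, 2.0];
    double[] x    = [1.0, 2.0, 3.0];

    // same result as zip(pars, x).map!(a => a[0]*a[1]).reduce!((a, b) => a + b)
    assert(dotProduct(pars, x) == 9.5);
}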

There's a lot of code and text; some images and graphs would be nice, particularly in combination with a more real-world example use case.

Factor out code like a[2].repeat().take(a[1].length) to a function, perhaps use some more BLAS routines for things like

.map!( a =>
                        zip(a[0].repeat().take(a[1].length),
                            a[1],
                            a[2].repeat().take(a[1].length),
                            a[3].repeat().take(a[1].length))
                        .map!(a => -a[2]*(a[0]/a[3])*a[1])
                        .array())
                    .array();

to make it more obvious what the calculation is doing.
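
e.g. something along these lines (an untested sketch; `broadcast` is just a name I made up for the helper):

import std.array : array;
import std.range : repeat, take;

// factors out the recurring `x.repeat().take(n)` pattern
auto broadcast(T)(T x, size_t n)
{
    return x.repeat().take(n);
}

void main()
{
    // a[2].repeat().take(a[1].length) then reads as broadcast(a[2], a[1].length)
    auto k = broadcast(2.0, 3).array;
    assert(k == [2.0, 2.0, 2.0]);
}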

It might not be the point of the article but it would be good to show some performance figures, I'm sure optimisation tips will be forthcoming.
June 11, 2017
On Saturday, 10 June 2017 at 20:03:16 UTC, data pulverizer wrote:
> Hi all,
>
> I have written a draft article on Generalized Linear Models and Stochastic Gradient Descent in D (https://github.com/dataPulverizer/glm-stochastic-gradient-descent-d/blob/master/article.ipynb) and would greatly appreciate your suggestions.
>
> Apologies for the spartan nature of the article formatting. That will change for the final draft.
>
> Thank you in advance
>
> DP

The code has a huge number of allocations. For example, T[][] matrices are allocated and then concatenated to be used in BLAS.

Why not use the ndslice and Lubeck [1] libraries instead?

[1] https://github.com/kaleidicassociates/lubeck

Ilya
June 11, 2017
> Why not use the ndslice and Lubeck [1] libraries instead?
>
> [1] https://github.com/kaleidicassociates/lubeck
>
> Ilya

It already has a high-level ndslice interface for inv (inverse) and mtimes (matmul).
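
For example, the Newton-Raphson step from the article could look roughly like this (an untested sketch):

import lubeck : inv, mtimes;
import mir.ndslice : sliced;

void main()
{
    // update direction curvature^(-1) * gradient, 2x2 toy example
    auto curv = [2.0, 0.0,
                 0.0, 4.0].sliced(2, 2);
    auto grad = [1.0, 2.0].sliced(2, 1);
    auto step = curv.inv.mtimes(grad);
    assert(step[0, 0] == 0.5 && step[1, 0] == 0.5);
}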
June 11, 2017
It is obvious that you took time and care to review the article. Thank you very much!

On Sunday, 11 June 2017 at 00:40:23 UTC, Nicholas Wilson wrote:
>
> Maybe it's the default rendering, but the open math font is hard to read as the subscripts get vertically compressed.
>
> My suggestions:
>
> Distinguish between the likelihood functions for gamma and normal rather than calling them both L(x). Maybe L subscript uppercase gamma and L subscript N?
>

Good idea!

> Links to Wikipedia for the technical terms (e.g. dispersion, chi squared, curvature); again, the vertical compression of the math font does not help here (subscripts of fractions). It will expand your audience if they don't get lost in the introduction!
Yes, I should definitely add clarifying references. I should probably also note that the curvature is the Hessian; however, I recently developed a dislike for calling mathematical constructs after people or by odd names. I was seriously thinking about calling Newton-Raphson something else, but that might be taking it too far.

I'll end up writing the final draft in HTML so I can add a decent HTML LaTeX package to modify the size of the equations.

> Speaking of not losing your audience: give a link to the NRA and/or a brief explanation of how it generalises to higher dimensions (a graph or animation for the 2D case would be good; perhaps take something from Wikipedia).

NRA? I don't understand that acronym in reference to the article.

I shall mention the generalisation of the equations over multiple observations.

I agree that there is a danger of losing the audience, and perhaps some graphics would be nice.


> I don't think it is necessary to show the signatures of the BLAS and LAPACKE functions; a short description and link should suffice. Also, any reason you don't use GLAS?

True

> I would just have gParamCalcs as its own function (unless you are trying to show off that particular feature of D).

I think it's easier to use a mixin since the same code is required in several places: mu and k are used in gLogLik, gGradient, and gCurvature, and xB is also used in gLogLik. Showing off the use of mixins is also a plus.
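
Roughly the idea (a cut-down illustrative sketch; the formulas here are placeholders, not the article's actual gamma-likelihood expressions):

import std.math : exp;
import std.numeric : dotProduct;

// the shared calculations live in one string that each function mixes in,
// so xB and mu are computed the same way everywhere
enum gParamCalcs = q{
    auto xB = dotProduct(pars, x); // linear predictor
    auto mu = exp(xB);             // fitted mean (placeholder link function)
};

double gLogLik(double[] pars, double[] x)
{
    mixin(gParamCalcs); // xB and mu are now in scope
    return xB - mu;     // placeholder body
}

double gGradient(double[] pars, double[] x)
{
    mixin(gParamCalcs);
    return mu - xB;     // placeholder body
}

void main()
{
    assert(gLogLik([0.1, 0.2], [1.0, 2.0]) == -gGradient([0.1, 0.2], [1.0, 2.0]));
}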

> omit the parentheses of .array() and .reduce()

Yes

> You use .array a lot: how much of that is necessary? I dont think it is in zip(k.repeat().take(n).array(), x, y, mu)

Yes, I should remove the points where .array() is not necessary

> `return(curv);` should be `return curve;`

Thanks, that's my R bleeding into my D! So it should be:

return curv;

> Any reason you don't square the tolerance rather than sqrt the parsDiff?

The calculation is the L2 norm, which ends in a sqrt; it is later used for the stopping criterion, as in the equation.
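
i.e. roughly this (the names here are my own, just sketching the stopping rule):

import std.math : sqrt;

// L2 norm of the parameter update; the final sqrt is what makes it the L2 norm
double parsDiff(double[] parsNew, double[] parsOld)
{
    double s = 0;
    foreach (i; 0 .. parsNew.length)
        s += (parsNew[i] - parsOld[i]) ^^ 2;
    return sqrt(s);
}

void main()
{
    double tol = 1e-8;
    // stop iterating once the update is small enough
    assert(parsDiff([1.0, 2.0], [1.0, 2.0]) < tol);
}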

> for(int i = 0; i < nepochs; ++i) => foreach(i; iota(nepochs))?

hmm potato

> zip(pars, x).map!(a => a[0]*a[1]).reduce!((a, b) => a + b); => dot(pars, x)?

Fair point. When I started writing the article I considered attempting to write the whole thing in D functional style - with no external libraries. In the end I didn't want to write a matrix inverse in functional style, so I rolled it back somewhat and started adding C calls, which is more sensible.

> There's a lot of code and text; some images and graphs would be nice, particularly in combination with a more real-world example use case.

I would agree that the article does need to be less austere - however, the article is about the GLM algorithm rather than its uses. I think the analyst should know whether they need a GLM or not - there are many sources that explain applications of GLM; I could perhaps reference some.

> Factor out code like a[2].repeat().take(a[1].length) to a function, perhaps use some more BLAS routines for things like
>
> .map!( a =>
>                         zip(a[0].repeat().take(a[1].length),
>                             a[1],
>                             a[2].repeat().take(a[1].length),
>                             a[3].repeat().take(a[1].length))
>                         .map!(a => -a[2]*(a[0]/a[3])*a[1])
>                         .array())
>                     .array();
>
> to make it more obvious what the calculation is doing.

Yes

> It might not be the point of the article but it would be good to show some performance figures, I'm sure optimisation tips will be forthcoming.

Since I am using whatever CBLAS implementation is installed, I'm not sure that benchmarks would really mean much. Ilya raised a good point about the amount of copying I am doing - as I was writing it I thought so too. I address this below.

Thanks again for taking time to review the article!

My main takeaway from writing this article is that it would be quite straightforward to write a small GLM package in D - I'd use quite a different approach, with struct/class GLM objects, to remove the copying issues and to give a consistent interface to the user.
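
Something like this rough sketch (all names hypothetical): the model object would own its buffers, so intermediate arrays are allocated once rather than rebuilt on every iteration.

// a rough interface sketch, not a working implementation
struct GLM(T)
{
    T[] pars; // fitted coefficients
    T[] work; // reusable work buffer, allocated once in fit

    void fit(T[][] x, T[] y, size_t maxIter = 100, T tol = 1e-8)
    {
        work.length = y.length; // allocate once, reuse across iterations
        // ... Newton-Raphson / SGD iterations writing into work ...
    }

    T[] predict(T[][] x)
    {
        // ... compute fitted values from pars ...
        return null;
    }
}

void main()
{
    GLM!double model;
    // usage: model.fit(x, y); auto yhat = model.predict(x);
}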

An additional takeaway was that the use of array operations like

a[] = b[]*c[]

or

d[] -= e[] - f

created odd effects in my calculations: the outputs were wrong and for ages I didn't know why. I later ended up removing those expressions from the code altogether, which remedied the problem.




June 11, 2017
On Sunday, 11 June 2017 at 01:57:52 UTC, 9il wrote:
> The code has a huge number of allocations. For example, T[][] matrices are allocated and then concatenated to be used in BLAS.

You are right - I realised this as I was writing the script, but I address this point later...

> Why not use the ndslice and Lubeck [1] libraries instead?
>
> [1] https://github.com/kaleidicassociates/lubeck
>
> Ilya

Thank you for mentioning the Lubeck package; I did not know about it and it looks very interesting.

The article is exploratory, and I also assume that the person reading it is busy. I tend to gravitate towards Phobos because it's there - it's the standard library and comes with D, it's easy to write code with it, and it's easy for a reader to access. If I write an article with code I want it to be likely that:

1. Anyone can download and run the code immediately and it will work.
2. If someone sees the article in 6 months or 3 years and downloads the code it will work.
3. The reader will be able to look up all the functions I have used on the D website, which makes it very easy for learners.

At this stage the numerical computing ecosystem in D is not mature and could change drastically. I added a link to the Mir library at the top because I wanted people to be aware of the Mir project.

The article is more about GLM in D than performance, but I can point to the Lubeck package in the article and mention your observation on the allocations - making it clearer upfront.

As I said in the previous reply, I did learn a lot from writing the article and I think the performance observation is highly relevant for building a GLM package in D.
June 11, 2017
On Sunday, 11 June 2017 at 01:59:37 UTC, 9il wrote:
>> Why not use the ndslice and Lubeck [1] libraries instead?
>>
>> [1] https://github.com/kaleidicassociates/lubeck
>>
>> Ilya
>
> It already has a high-level ndslice interface for inv (inverse) and mtimes (matmul).

p.s. I think the work you guys are doing on Lubeck is very important! Numerical computing in D desperately needs a high-level linear algebra package.
June 11, 2017
On Sunday, 11 June 2017 at 10:21:03 UTC, data pulverizer wrote:
>> Speaking of not losing your audience: give a link to the NRA and/or a brief explanation of how it generalises to higher dimensions (a graph or animation for the 2D case would be good; perhaps take something from Wikipedia).
>
> NRA? I don't understand that acronym in reference to the article.
>

Sorry, Newton-Raphson Algorithm.

> I shall mention the generalisation of the equations over multiple observations.
>
> I agree that there is a danger of losing the audience, and perhaps some graphics would be nice.
>

I suspect that most of the people who would know what Newton-Raphson is took it in undergrad math and probably didn't cover the multidimensional case (I didn't).
>
>> I don't think it is necessary to show the signatures of the BLAS and LAPACKE functions; a short description and link should suffice. Also, any reason you don't use GLAS?
>
> True
>
>> I would just have gParamCalcs as its own function (unless you are trying to show off that particular feature of D).
>
> I think it's easier to use a mixin since the same code is required in several places: mu and k are used in gLogLik, gGradient, and gCurvature, and xB is also used in gLogLik. Showing off the use of mixins is also a plus.

Then I would state why and give a brief explanation of what that mixin does.
>
>> omit the parentheses of .array() and .reduce()
>
> Yes
>
>> You use .array a lot: how much of that is necessary? I don't think it is in zip(k.repeat().take(n).array(), x, y, mu)
>
> Yes, I should remove the points where .array() is not necessary
>
>> `return(curv);` should be `return curve;`
>
> Thanks, that's my R bleeding into my D! So it should be:
>
> return curv;
>
>> Any reason you don't square the tolerance rather than sqrt the parsDiff?
>
> The calculation is the L2 norm, which ends in a sqrt; it is later used for the stopping criterion, as in the equation.
>
>> for(int i = 0; i < nepochs; ++i) => foreach(i; iota(nepochs))?
>
> hmm potato
>
>> zip(pars, x).map!(a => a[0]*a[1]).reduce!((a, b) => a + b); => dot(pars, x)?
>
> Fair point. When I started writing the article I considered attempting to write the whole thing in D functional style - with no external libraries. In the end I didn't want to write a matrix inverse in functional style, so I rolled it back somewhat and started adding C calls, which is more sensible.
>
>> There's a lot of code and text; some images and graphs would be nice, particularly in combination with a more real-world example use case.
>
> I would agree that the article does need to be less austere - however, the article is about the GLM algorithm rather than its uses. I think the analyst should know whether they need a GLM or not - there are many sources that explain applications of GLM; I could perhaps reference some.

Fair enough, I'm just thinking about breadth of audience.

>
>> Factor out code like a[2].repeat().take(a[1].length) to a function, perhaps use some more BLAS routines for things like
>>
>> .map!( a =>
>>                         zip(a[0].repeat().take(a[1].length),
>>                             a[1],
>>                             a[2].repeat().take(a[1].length),
>>                             a[3].repeat().take(a[1].length))
>>                         .map!(a => -a[2]*(a[0]/a[3])*a[1])
>>                         .array())
>>                     .array();
>>
>> to make it more obvious what the calculation is doing.
>
> Yes
>
>> It might not be the point of the article but it would be good to show some performance figures, I'm sure optimisation tips will be forthcoming.
>
> Since I am using whatever CBLAS implementation is installed, I'm not sure that benchmarks would really mean much. Ilya raised a good point about the amount of copying I am doing - as I was writing it I thought so too. I address this below.
>
> Thanks again for taking time to review the article!
>

No problem, always happy to help.

> My main takeaway from writing this article is that it would be quite straightforward to write a small GLM package in D - I'd use quite a different approach, with struct/class GLM objects, to remove the copying issues and to give a consistent interface to the user.
>
> An additional takeaway was that the use of array operations like
>
> a[] = b[]*c[]
>
> or
>
> d[] -= e[] - f
>
> created odd effects in my calculations: the outputs were wrong and for ages I didn't know why. I later ended up removing those expressions from the code altogether, which remedied the problem.

Hmm, did you report those? They _should_  just work.
June 11, 2017
On Sunday, 11 June 2017 at 01:59:37 UTC, 9il wrote:
>> Why not use the ndslice and Lubeck [1] libraries instead?
>>
>> [1] https://github.com/kaleidicassociates/lubeck
>>
>> Ilya
>
> It already has a high-level ndslice interface for inv (inverse) and mtimes (matmul).

p.p.s. Okay, I can see how writing an article using D's latest numerical library will help increase knowledge of packages in the ecosystem. Fair enough. I shall look into rewriting it to include Lubeck.
June 11, 2017
On Sunday, 11 June 2017 at 11:30:03 UTC, Nicholas Wilson wrote:
>> An additional takeaway was that the use of array operations like
>>
>> a[] = b[]*c[]
>>
>> or
>>
>> d[] -= e[] - f
>>
>> created odd effects in my calculations: the outputs were wrong and for ages I didn't know why. I later ended up removing those expressions from the code altogether, which remedied the problem.
>
> Hmm, did you report those? They _should_  just work.

Unfortunately I overwrote the code before the first commit to GitHub and did not report it. The whole experience is too traumatic for me to repeat :-) but I think it's something to do with the way the arrays are being referenced.
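
For the record, my understanding of how these operations should behave (assuming equal-length, non-overlapping slices and a pre-allocated destination):

void main()
{
    double[] b = [1.0, 2.0, 3.0];
    double[] c = [4.0, 5.0, 6.0];
    auto a = new double[b.length]; // destination must already have the right length
    a[] = b[] * c[];               // element-wise product
    assert(a == [4.0, 10.0, 18.0]);

    double[] d = [10.0, 10.0, 10.0];
    double[] e = [1.0, 2.0, 3.0];
    double f = 0.5;
    d[] -= e[] - f;                // d[i] -= e[i] - f for each i
    assert(d == [9.5, 8.5, 7.5]);
}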