May 31, 2013 A simple way to do compile time loop unrolling

Just want to share a new way I just discovered to do loop unrolling.

template Unroll(alias CODE, alias N)
{
    static if (N == 1)
        enum Unroll = format(CODE, 0);
    else
        enum Unroll = Unroll!(CODE, N-1)~format(CODE, N-1);
}

after that you can write stuff like

mixin(Unroll!("v[%1$d]"~op~"=rhs.v[%1$d];", 3));

and it gets expanded to

v[0]+=rhs.v[0];v[1]+=rhs.v[1];v[2]+=rhs.v[2];

I find this method simpler than with foreach() and a tuple range, and also faster because it's identical to hand unrolling.
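
For anyone who wants to compile the snippet, here is a minimal self-contained sketch. The `Vec3` struct, its field `v`, the `opOpAssign` wrapper and the import are assumptions added for illustration; the post itself only shows the template and the `mixin` line.

```d
import std.format : format;

// Recursive template from the post: builds the unrolled source text at compile time.
template Unroll(alias CODE, alias N)
{
    static if (N == 1)
        enum Unroll = format(CODE, 0);
    else
        enum Unroll = Unroll!(CODE, N-1) ~ format(CODE, N-1);
}

// The generated string can be checked at compile time.
static assert(Unroll!("v[%1$d]+=rhs.v[%1$d];", 3)
    == "v[0]+=rhs.v[0];v[1]+=rhs.v[1];v[2]+=rhs.v[2];");

// Hypothetical 3-component vector, only here to give the mixin a home.
struct Vec3
{
    float[3] v;

    // Splices `op` into the unrolled statements, so one body covers +=, -=, *=, /=.
    ref Vec3 opOpAssign(string op)(in Vec3 rhs)
    {
        mixin(Unroll!("v[%1$d]" ~ op ~ "=rhs.v[%1$d];", 3));
        return this;
    }
}

void main()
{
    auto a = Vec3([1.0f, 2.0f, 3.0f]);
    a += Vec3([10.0f, 20.0f, 30.0f]);
    assert(a.v == [11.0f, 22.0f, 33.0f]);
}
```

Note that the recursion bottoms out at `N == 1`, so as written the template expects `N` to be at least 1.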

May 31, 2013 Re: A simple way to do compile time loop unrolling

Posted in reply to finalpatch

Minor improvement:

template Unroll(alias CODE, alias N, alias SEP="")
{
    static if (N == 1)
        enum Unroll = format(CODE, 0);
    else
        enum Unroll = Unroll!(CODE, N-1, SEP)~SEP~format(CODE, N-1);
}

So vector dot product can be unrolled like this:

mixin(Unroll!("v1[%1$d]*v2[%1$d]", 3, "+"));

which becomes:

v1[0]*v2[0]+v1[1]*v2[1]+v1[2]*v2[2]

On Friday, 31 May 2013 at 14:06:19 UTC, finalpatch wrote:
> Just want to share a new way I just discovered to do loop unrolling.
>
> template Unroll(alias CODE, alias N)
> {
> static if (N == 1)
> enum Unroll = format(CODE, 0);
> else
> enum Unroll = Unroll!(CODE, N-1)~format(CODE, N-1);
> }
>
> after that you can write stuff like
>
> mixin(Unroll!("v[%1$d]"~op~"=rhs.v[%1$d];", 3));
>
> and it gets expanded to
>
> v[0]+=rhs.v[0];v[1]+=rhs.v[1];v[2]+=rhs.v[2];
>
> I find this method simpler than with foreach() and a tuple range, and also faster because it's identical to hand unrolling.
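
A small sketch of the separator variant in use, assuming a free function `dot` over `float[3]` (the function name, parameter types and import are illustrative, not from the post). Here the mixin produces an expression rather than a list of statements.

```d
import std.format : format;

// Separator-aware version from the post: SEP is placed between the expansions.
template Unroll(alias CODE, alias N, alias SEP = "")
{
    static if (N == 1)
        enum Unroll = format(CODE, 0);
    else
        enum Unroll = Unroll!(CODE, N-1, SEP) ~ SEP ~ format(CODE, N-1);
}

static assert(Unroll!("v1[%1$d]*v2[%1$d]", 3, "+")
    == "v1[0]*v2[0]+v1[1]*v2[1]+v1[2]*v2[2]");

// Hypothetical helper: the generated text is mixed in as a single expression.
float dot(in float[3] v1, in float[3] v2)
{
    return mixin(Unroll!("v1[%1$d]*v2[%1$d]", 3, "+"));
}

void main()
{
    float[3] a = [1, 2, 3];
    float[3] b = [4, 5, 6];
    assert(dot(a, b) == 32); // 1*4 + 2*5 + 3*6
}
```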

May 31, 2013 Re: A simple way to do compile time loop unrolling

Posted in reply to finalpatch

On 31.05.2013 16:06, finalpatch wrote:
> Just want to share a new way I just discovered to do loop unrolling.
>
> template Unroll(alias CODE, alias N)
> {
> static if (N == 1)
> enum Unroll = format(CODE, 0);
> else
> enum Unroll = Unroll!(CODE, N-1)~format(CODE, N-1);
> }
>
> after that you can write stuff like
>
> mixin(Unroll!("v[%1$d]"~op~"=rhs.v[%1$d];", 3));
>
> and it gets expanded to
>
> v[0]+=rhs.v[0];v[1]+=rhs.v[1];v[2]+=rhs.v[2];
>
> I find this method simpler than with foreach() and a tuple range, and
> also faster because it's identical to hand unrolling.
The advantage of foreach unrolling is that the compiler can choose the unrolling depth itself, since different depths may be faster or slower on different CPU targets. It is also an opportunity for loop vectorization. But I doubt that either is available in DMD; I'm not sure about GDC and LDC.
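
For comparison, a sketch of what leaving the loop to the compiler might look like, reading the point above as being about an ordinary `foreach`. The `Vec3` struct and its `opOpAssign` are assumptions for illustration. Whether a given compiler actually unrolls or vectorizes this loop depends on the compiler and its optimization flags; nothing in the code forces either.

```d
// Hypothetical 3-component vector using a plain loop instead of a string mixin.
struct Vec3
{
    float[3] v;

    ref Vec3 opOpAssign(string op)(in Vec3 rhs)
    {
        // The trip count (v.length == 3) is known at compile time, so an
        // optimizing backend is free to pick its own unrolling strategy.
        foreach (i; 0 .. v.length)
            mixin("v[i] " ~ op ~ "= rhs.v[i];");
        return this;
    }
}

void main()
{
    auto a = Vec3([1.0f, 2.0f, 3.0f]);
    a *= Vec3([2.0f, 2.0f, 2.0f]);
    assert(a.v == [2.0f, 4.0f, 6.0f]);
}
```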

May 31, 2013 Re: A simple way to do compile time loop unrolling

Posted in reply to finalpatch

On 5/31/13 10:06 AM, finalpatch wrote:
> Just want to share a new way I just discovered to do loop unrolling.
>
> template Unroll(alias CODE, alias N)
> {
> static if (N == 1)
> enum Unroll = format(CODE, 0);
> else
> enum Unroll = Unroll!(CODE, N-1)~format(CODE, N-1);
> }
>
> after that you can write stuff like
>
> mixin(Unroll!("v[%1$d]"~op~"=rhs.v[%1$d];", 3));
>
> and it gets expanded to
>
> v[0]+=rhs.v[0];v[1]+=rhs.v[1];v[2]+=rhs.v[2];
>
> I find this method simpler than with foreach() and a tuple range, and
> also faster because it's identical to hand unrolling.
Hehe, first shot is always a trip isn't it. Welcome aboard.
We should have something like that in phobos.
Andrei

May 31, 2013 Re: A simple way to do compile time loop unrolling

Posted in reply to Andrei Alexandrescu

Andrei Alexandrescu:
> We should have something like that in phobos.

Better (some part of static foreach):
http://d.puremagic.com/issues/show_bug.cgi?id=4085

Bye,
bearophile
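
For context, `static foreach` was added to the language in later D releases, well after this thread. A minimal sketch of the same unrolling with it, assuming a compiler recent enough to support the feature; the `Vec3` struct is again a hypothetical stand-in.

```d
// Hypothetical 3-component vector; the loop body is instantiated once per index.
struct Vec3
{
    float[3] v;

    ref Vec3 opOpAssign(string op)(in Vec3 rhs)
    {
        // `i` is a compile-time constant in each copy of the body,
        // so this expands to three separate statements.
        static foreach (i; 0 .. 3)
        {
            mixin("v[i] " ~ op ~ "= rhs.v[i];");
        }
        return this;
    }
}

void main()
{
    auto a = Vec3([1.0f, 2.0f, 3.0f]);
    a += Vec3([10.0f, 20.0f, 30.0f]);
    assert(a.v == [11.0f, 22.0f, 33.0f]);
}
```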

May 31, 2013 Re: A simple way to do compile time loop unrolling

Posted in reply to Piotr Szturmaj

On Fri, 31 May 2013 16:33:19 +0200, Piotr Szturmaj <bncrbme@jadamspam.pl> wrote:
> It is also an opportunity to do loop vectorization. But I doubt that either is available in DMD, not sure about GDC and LDC.

GDC once vectorized something for me, where I used a struct of 4 ubyte fields. I don't remember if it was a loop at all. I think all I did was operate on 3 of the fields in sequence, applying the same operations, and the compiler loaded the whole struct into an SSE register. It really paid off speed-wise!

But when you think about it, working with RGB or XYZW vectors is a common task in programming, so I can see why they put so much work into vectorization. The caveat is just that you have to remember to add a fourth dummy field to XYZ or RGB.

--
Marco

May 31, 2013 Re: A simple way to do compile time loop unrolling

Posted in reply to finalpatch

On Friday, 31 May 2013 at 14:06:19 UTC, finalpatch wrote:
> Just want to share a new way I just discovered to do loop unrolling.
>
> template Unroll(alias CODE, alias N)
> {
> static if (N == 1)
> enum Unroll = format(CODE, 0);
> else
> enum Unroll = Unroll!(CODE, N-1)~format(CODE, N-1);
> }
>
> after that you can write stuff like
>
> mixin(Unroll!("v[%1$d]"~op~"=rhs.v[%1$d];", 3));
>
> and it gets expanded to
>
> v[0]+=rhs.v[0];v[1]+=rhs.v[1];v[2]+=rhs.v[2];
>
> I find this method simpler than with foreach() and a tuple range, and also faster because it's identical to hand unrolling.

Remember that in D, most side-effect free functions can be run at compile time. No need for recursive template trickery:

mixin(iota(3).map!(i => format("v[%1$d]+=rhs.v[%1$d];", i)).join());
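
A self-contained sketch of the one-liner, with the imports it needs. The `code` enum and the `Vec3` struct are assumptions for illustration; storing the string in an `enum` forces the range pipeline to be evaluated at compile time, where it can be checked with `static assert`.

```d
import std.algorithm : map;
import std.array : join;
import std.format : format;
import std.range : iota;

// Ordinary range code; it runs in CTFE because an enum needs a compile-time value.
enum code = iota(3).map!(i => format("v[%1$d]+=rhs.v[%1$d];", i)).join();
static assert(code == "v[0]+=rhs.v[0];v[1]+=rhs.v[1];v[2]+=rhs.v[2];");

// Hypothetical 3-component vector; only += is generated in this sketch.
struct Vec3
{
    float[3] v;

    ref Vec3 opOpAssign(string op : "+")(in Vec3 rhs)
    {
        mixin(code);
        return this;
    }
}

void main()
{
    auto a = Vec3([1.0f, 2.0f, 3.0f]);
    a += Vec3([10.0f, 20.0f, 30.0f]);
    assert(a.v == [11.0f, 22.0f, 33.0f]);
}
```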

May 31, 2013 Re: A simple way to do compile time loop unrolling

Posted in reply to Peter Alexander

On Fri, 31 May 2013 19:30:10 +0200
"Peter Alexander" <peter.alexander.au@gmail.com> wrote:
>
>
> mixin(iota(3).map!(i => format("v[%1$d]+=rhs.v[%1$d];", i)).join());
Dayamn! I knew CTFE had improved considerably over the last year or so, but even I didn't expect something like that to be working already. That's crazy! :)

June 01, 2013 Re: A simple way to do compile time loop unrolling

Posted in reply to Peter Alexander

Wow! That's so very cool! We can make it even nicer with

template Unroll(alias CODE, alias N, alias SEP="")
{
    enum t = replace(CODE, "%", "%1$d");
    enum Unroll = iota(N).map!(i => format(t, i)).join(SEP);
}
And use % as the placeholder instead of the ugly %1$d:
mixin(Unroll!("v1[%]*v2[%]", 3, "+"));
It actually gets quite readable now.
On Friday, 31 May 2013 at 17:30:13 UTC, Peter Alexander wrote:
> Remember that in D, most side-effect free functions can be run at compile time. No need for recursive template trickery:
>
> mixin(iota(3).map!(i => format("v[%1$d]+=rhs.v[%1$d];", i)).join());
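
A compilable sketch of this last version, with the imports it relies on. The `dot` helper is a hypothetical example, not from the post. One consequence of the `replace` call worth keeping in mind: every `%` in `CODE` becomes a placeholder, so the snippet being unrolled cannot itself contain a modulo operator or a literal percent sign.

```d
import std.algorithm : map;
import std.array : join, replace;
import std.format : format;
import std.range : iota;

template Unroll(alias CODE, alias N, alias SEP = "")
{
    // Turn the short "%" placeholder into a positional format specifier.
    enum t = replace(CODE, "%", "%1$d");
    // Expand for indices 0 .. N-1 and glue the pieces together with SEP.
    enum Unroll = iota(N).map!(i => format(t, i)).join(SEP);
}

static assert(Unroll!("v1[%]*v2[%]", 3, "+")
    == "v1[0]*v2[0]+v1[1]*v2[1]+v1[2]*v2[2]");

// Hypothetical helper using the unrolled expression.
float dot(in float[3] v1, in float[3] v2)
{
    return mixin(Unroll!("v1[%]*v2[%]", 3, "+"));
}

void main()
{
    float[3] a = [1, 2, 3];
    float[3] b = [4, 5, 6];
    assert(dot(a, b) == 32); // 1*4 + 2*5 + 3*6
}
```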