August 17, 2001 Optimising | ||||
---|---|---|---|---|
| ||||
I'm from the fcpu project (f-cpu.org), whose goal is to create a "free CPU". One of our main problems is the compiler. I have written some algorithms in C to improve matrix multiplication (using gcc on a P3). I used loop unrolling and software pipelining, broke a lot of false dependencies, and moved calculations out of the inner loop. I reordered the loops to align data accesses and applied strip mining. Most of these are processor dependent and should be done by the compiler. On average I gained a factor of 4 compared to the usual algorithm; my best gain, on a 1.2 GHz Athlon processor with 512*512 matrices, was x25! One of the main problems is the aliasing problem induced by pointers. In my tests, I lose 25% of the performance by using pointers instead of arrays. So what will be done in D to improve the use of new instructions? All new CPUs have conditional moves (which avoid flushing the pipeline) and vector instructions (like MMX and SSE). We could also add compiling for 2 or more processors. What do you think? nicO |
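The thread does not show nicO's matrix code, so the following is only a minimal sketch of the unrolling idea, using a plain dot product as a stand-in. The function names `dot_simple` and `dot_unrolled` are hypothetical. The unrolled version keeps four independent partial sums so that consecutive additions no longer wait on each other. How much this helps, if at all, depends on the processor and the compiler.

```c
#include <stddef.h>

/* Straightforward dot product: every addition depends on the previous
   one, so the loop is limited by the latency of a single add. */
double dot_simple(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Unrolled by four with four independent accumulators. The partial
   sums do not depend on each other, so a pipelined FPU can overlap
   them. Note: this reorders the additions, so for general
   floating-point input the result can differ from dot_simple in the
   last bits. */
double dot_unrolled(const double *a, const double *b, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    /* Remainder loop for lengths that are not a multiple of four. */
    for (; i < n; i++)
        s0 += a[i] * b[i];

    return (s0 + s1) + (s2 + s3);
}
```

Because the reordering changes floating-point rounding, a compiler is normally not allowed to do this transformation on its own without a relaxed floating-point mode, which is one reason it tends to be done by hand.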
August 18, 2001 Re: Optimising | ||||
---|---|---|---|---|
| ||||
Posted in reply to nicO | Since D supports rectangular arrays, much more aggressive array optimization becomes possible.
nicO wrote in message <3B7DFFFC.7FB87E4F@ifrance.com>...
>I'm from the fcpu project (f-cpu.org), whose goal is to create a "free CPU".
>
>One of our main problems is the compiler. I have written some algorithms in C to improve matrix multiplication (using gcc on a P3). I used loop unrolling and software pipelining, broke a lot of false dependencies, and moved calculations out of the inner loop. I reordered the loops to align data accesses and applied strip mining. Most of these are processor dependent and should be done by the compiler. On average I gained a factor of 4 compared to the usual algorithm; my best gain, on a 1.2 GHz Athlon processor with 512*512 matrices, was x25!
>
>One of the main problems is the aliasing problem induced by pointers. In my tests, I lose 25% of the performance by using pointers instead of arrays.
>
>So what will be done in D to improve the use of new instructions? All new CPUs have conditional moves (which avoid flushing the pipeline) and vector instructions (like MMX and SSE). We could also add compiling for 2 or more processors.
>
>What do you think?
>
>nicO
|
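To illustrate why a rectangular array gives an optimizer more to work with, here is a hedged C sketch (C rather than D, and not taken from any D compiler). It contrasts a contiguous two-dimensional array with an array of row pointers. The sizes `ROWS` and `COLS` and the function names are assumptions made for the example.

```c
#include <stddef.h>

enum { ROWS = 3, COLS = 4 };  /* hypothetical sizes for the illustration */

/* Rectangular array: one contiguous block. Element (i, j) sits at the
   fixed offset i * COLS + j, so distinct rows can never overlap and
   the address arithmetic is fully visible to the compiler. */
double sum_rect(double m[ROWS][COLS])
{
    double s = 0.0;
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            s += m[i][j];
    return s;
}

/* Array of row pointers: each row costs an extra pointer load, and
   nothing stops two entries from pointing at the same storage, so the
   compiler has to be conservative when rows are written. */
double sum_rows(double *const *rows, size_t nrows, size_t ncols)
{
    double s = 0.0;
    for (size_t i = 0; i < nrows; i++)
        for (size_t j = 0; j < ncols; j++)
            s += rows[i][j];
    return s;
}
```

With the pointer form, `rows[0]` and `rows[1]` may legally refer to the same row. That aliasing possibility is what blocks loop reordering once the loop also stores into the matrix.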
August 18, 2001 Re: Optimising | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter | BTW, Microsoft's compiler has a "/Oa assume no aliasing" switch. kind of playing with fire I assume ;)
"Walter" <walter@digitalmars.com> wrote in message news:9lkk0g$2qdq$2@digitaldaemon.com...
> Since D supports rectangular arrays, much more aggressive array optimization
> becomes possible.
>
> nicO wrote in message <3B7DFFFC.7FB87E4F@ifrance.com>...
> >I'm from the fcpu project (f-cpu.org), whose goal is to create a "free CPU".
> >
> >One of our main problems is the compiler. I have written some algorithms in C to improve matrix multiplication (using gcc on a P3). I used loop unrolling and software pipelining, broke a lot of false dependencies, and moved calculations out of the inner loop. I reordered the loops to align data accesses and applied strip mining. Most of these are processor dependent and should be done by the compiler. On average I gained a factor of 4 compared to the usual algorithm; my best gain, on a 1.2 GHz Athlon processor with 512*512 matrices, was x25!
> >
> >One of the main problems is the aliasing problem induced by pointers. In my tests, I lose 25% of the performance by using pointers instead of arrays.
> >
> >So what will be done in D to improve the use of new instructions? All new CPUs have conditional moves (which avoid flushing the pipeline) and vector instructions (like MMX and SSE). We could also add compiling for 2 or more processors.
> >
> >What do you think?
> >
> >nicO
|
August 18, 2001 Re: Optimising | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter | Walter wrote:
>
> Since D supports rectangular arrays, much more aggressive array optimization becomes possible.
>
Is that enough to introduce strip mining? Or to change the order of the loops?
> nicO wrote in message <3B7DFFFC.7FB87E4F@ifrance.com>...
> >I'm from the fcpu project (f-cpu.org), whose goal is to create a "free CPU".
> >
> >One of our main problems is the compiler. I have written some algorithms in C to improve matrix multiplication (using gcc on a P3). I used loop unrolling and software pipelining, broke a lot of false dependencies, and moved calculations out of the inner loop. I reordered the loops to align data accesses and applied strip mining. Most of these are processor dependent and should be done by the compiler. On average I gained a factor of 4 compared to the usual algorithm; my best gain, on a 1.2 GHz Athlon processor with 512*512 matrices, was x25!
> >
> >One of the main problems is the aliasing problem induced by pointers. In my tests, I lose 25% of the performance by using pointers instead of arrays.
> >
> >So what will be done in D to improve the use of new instructions? All new CPUs have conditional moves (which avoid flushing the pipeline) and vector instructions (like MMX and SSE). We could also add compiling for 2 or more processors.
> >
> >What do you think?
> >
> >nicO
|
August 18, 2001 Re: Optimising | ||||
---|---|---|---|---|
| ||||
Posted in reply to LuigiG | LuigiG wrote in message <9ll862$5ch$1@digitaldaemon.com>...
>BTW, Microsoft's compiler has a
>"/Oa assume no aliasing"
>switch.
>kind of playing with fire I assume ;)
I decided long ago not to support such a switch. It would really only be useful to someone who understood exactly how the internal compiler optimizations really worked, which is likely nobody <g>.
|
August 18, 2001 Re: Optimising | ||||
---|---|---|---|---|
| ||||
Posted in reply to nicO | nicO wrote in message <3B7EB156.6D6761E6@ifrance.com>...
>Walter wrote:
>>
>> Since D supports rectangular arrays, much more aggressive array optimization
>> becomes possible.
>>
>
>Is that enough to introduce strip mining? Or to change the order of the loops?
I think so.
|
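For readers unfamiliar with the two transformations being asked about, the following C sketch shows them applied by hand to a square matrix multiply stored in row-major order. It is an illustration under assumptions, not nicO's code and not what any D compiler is known to emit. The tile size `TILE`, the helper `min_size`, and both function names are hypothetical.

```c
#include <stddef.h>

/* Textbook i-j-k order: the inner loop walks b down a column, that is,
   with a stride of n elements in row-major storage. */
void matmul_ijk(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

#define TILE 32  /* hypothetical tile size; the best value depends on the cache */

static size_t min_size(size_t x, size_t y) { return x < y ? x : y; }

/* Loop interchange (i-k-j) plus strip mining of the k and j loops.
   The inner loop now walks b and c along rows with unit stride, and
   each TILE x TILE block of b is reused for every row i before the
   code moves on to the next block. */
void matmul_tiled(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0;

    for (size_t kk = 0; kk < n; kk += TILE) {
        size_t k_end = min_size(kk + TILE, n);
        for (size_t jj = 0; jj < n; jj += TILE) {
            size_t j_end = min_size(jj + TILE, n);
            for (size_t i = 0; i < n; i++)
                for (size_t k = kk; k < k_end; k++) {
                    /* Loaded once, outside the inner loop. */
                    double aik = a[i * n + k];
                    for (size_t j = jj; j < j_end; j++)
                        c[i * n + j] += aik * b[k * n + j];
                }
        }
    }
}
```

Both versions add the products for a given element of `c` in the same order of `k`, so they should produce identical results. A compiler can only make this rewrite itself if it can prove that `c` does not overlap `a` or `b`, which ties back to the aliasing question raised earlier in the thread.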
August 18, 2001 Re: Optimising | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter | "Walter" <walter@digitalmars.com> wrote in message news:9lmlj0$111a$3@digitaldaemon.com...
>
> LuigiG wrote in message <9ll862$5ch$1@digitaldaemon.com>...
> >BTW, Microsoft's compiler has a
> >"/Oa assume no aliasing"
> >switch.
> >kind of playing with fire I assume ;)
>
> I decided long ago not to support such a switch. It would really only be useful to someone who understood exactly how the internal compiler optimizations really worked, which is likely nobody <g>.
>
Yep, basically, if you know so much about the compiler that you can use the noalias switch, you don't need the noalias switch anymore.
|
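A narrower alternative to a global "assume no aliasing" switch is the C99 `restrict` qualifier, which makes the promise per pointer instead of for the whole program. The sketch below is only an illustration of the aliasing problem under discussion. The function names are hypothetical, and nothing here describes how the Microsoft switch or any D compiler behaves.

```c
#include <stddef.h>

/* Without further information the compiler must assume that dst may
   overlap *scale, so *scale has to be reloaded after every store. */
void scale_may_alias(float *dst, const float *src, const float *scale,
                     size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * *scale;
}

/* C99 restrict: the caller promises that the memory written through
   dst is not accessed through src or scale during the call. The
   compiler may then keep *scale in a register for the whole loop.
   Breaking the promise is undefined behaviour, which is the
   "playing with fire" part. */
void scale_no_alias(float *restrict dst, const float *restrict src,
                    const float *restrict scale, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * *scale;
}
```

Calling `scale_may_alias(buf, buf, &buf[0], 3)` on `{2, 3, 4}` is well defined and yields `{4, 12, 16}`, because the first store changes the scale factor used by the later iterations. The same call through `scale_no_alias` would be invalid, which shows how easily a blanket no-alias assumption can change a program's meaning.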
August 19, 2001 Re: Optimising | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter | Walter wrote:
>
> nicO wrote in message <3B7EB156.6D6761E6@ifrance.com>...
> >Walter wrote:
> >>
> >> Since D supports rectangular arrays, much more aggressive array optimization
> >> becomes possible.
> >>
> >
> >Is that enough to introduce strip mining? Or to change the order of the loops?
>
> I think so.
Good news!
|
August 19, 2001 Re: Optimising | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter | Walter wrote:
>
> nicO wrote in message <3B7EB156.6D6761E6@ifrance.com>...
> >Walter wrote:
> >>
> >> Since D supports rectangular arrays, much more aggressive array optimization
> >> becomes possible.
> >>
> >
> >Is that enough to introduce strip mining? Or to change the order of the loops?
>
> I think so.
And what about vector computing (to make use of MMX and SSE)?
|
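The thread ends without an answer to the vector question. As a rough picture of what "vector computing" means at the source level, here is a sketch that uses the GCC/Clang `vector_size` extension, which is a compiler extension and not standard C. The type name `v4sf` and the function `saxpy_v4` are assumptions for the example. Whether the compiler actually emits SSE instructions depends on the target and the compiler flags.

```c
#include <stddef.h>
#include <string.h>

/* GCC/Clang extension (not standard C): four floats handled as one
   16-byte value, the width of an SSE register. */
typedef float v4sf __attribute__((vector_size(16)));

/* y[i] += alpha * x[i], four elements per step. */
void saxpy_v4(float alpha, const float *x, float *y, size_t n)
{
    const v4sf va = { alpha, alpha, alpha, alpha };
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        v4sf vx, vy;
        memcpy(&vx, x + i, sizeof vx);  /* load, safe for unaligned data */
        memcpy(&vy, y + i, sizeof vy);
        vy = vy + va * vx;              /* element-wise multiply and add */
        memcpy(y + i, &vy, sizeof vy);  /* store the four results */
    }
    /* Scalar tail for lengths that are not a multiple of four. */
    for (; i < n; i++)
        y[i] += alpha * x[i];
}
```

A compiler can generate the same kind of code from the plain scalar loop only when it can show that `x` and `y` do not overlap, so automatic vectorization depends on the same aliasing information discussed earlier in the thread.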
Copyright © 1999-2021 by the D Language Foundation