Optimising

I'm from the fcpu project (f-cpu.org), whose goal is to create a "free CPU".

One of our main problems is the compiler. I have written some algorithms in C to improve matrix multiplication (using gcc on a P3). I used loop unrolling and software pipelining, broke a lot of false dependencies, and moved calculations out of the inner loop. I reordered the loops to align data accesses and did strip mining. Most of these are processor dependent and should be done by the compiler. On average I gained a factor of 4 compared to the usual algorithm; my best gain was on a 1.2 GHz Athlon processor with 512*512 matrices: x25!

One of the main problems is the aliasing problem induced by pointers. In my tests, I lost 25% of the performance by using pointers instead of arrays.

So what will be done in D to improve the use of new instructions? All new CPUs have conditional moves (which avoid flushing the pipeline) and vector instructions (like MMX and SSE). We could also add compiling for two or more processors.

What do you think?

nicO
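For readers unfamiliar with the loop reordering and inner-loop hoisting described above, here is a minimal C sketch. It is not nicO's code (the thread does not include it); the function names, the row-major layout, and the use of `double` are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Textbook i-j-k order: the inner loop walks b down a column,
   so consecutive reads of b are n doubles apart in memory. */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

/* Reordered i-k-j: a[i][k] is loaded once outside the inner loop,
   and the inner loop reads b and updates c along contiguous rows. */
void matmul_reordered(size_t n, const double *a, const double *b, double *c)
{
    assert(n > 0);
    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0;
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++) {
            const double aik = a[i * n + k];  /* loop-invariant load */
            const double *brow = &b[k * n];
            double *crow = &c[i * n];
            for (size_t j = 0; j < n; j++)
                crow[j] += aik * brow[j];
        }
}
```

Both functions compute the same product; only the memory access pattern differs. How much the second form gains depends on the processor and the matrix size, which is the point being made about leaving this to the compiler.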

Since D supports rectangular arrays, much more aggressive array optimization becomes possible.
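To make the layout difference concrete, here is a rough sketch in C rather than D. It assumes that "rectangular" means one contiguous block indexed by row and column, as opposed to an array of separate row pointers. The function names are hypothetical, and nothing here reflects how a D compiler actually implements such arrays.

```c
#include <assert.h>
#include <stddef.h>

/* Jagged layout: every row is a separate pointer. Each row costs an
   extra pointer load, and the compiler cannot assume that two rows
   are distinct or adjacent in memory. */
double sum_jagged(double *const *rows, size_t nrows, size_t ncols)
{
    assert(rows != NULL);
    double total = 0.0;
    for (size_t i = 0; i < nrows; i++)
        for (size_t j = 0; j < ncols; j++)
            total += rows[i][j];
    return total;
}

/* Rectangular layout: one contiguous block, element (i, j) lives at
   base[i * ncols + j], so the traversal is a single linear walk. */
double sum_rect(const double *base, size_t nrows, size_t ncols)
{
    assert(base != NULL);
    double total = 0.0;
    for (size_t i = 0; i < nrows; i++)
        for (size_t j = 0; j < ncols; j++)
            total += base[i * ncols + j];
    return total;
}
```

With the rectangular form the address of every element follows from the base pointer and the two indices alone, which is the kind of information a loop optimizer needs before it can reorder or split loops safely.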

August 18, 2001

Re: Optimising

Posted by LuigiG
in reply to Walter

BTW, Microsoft's compiler has a "/Oa assume no aliasing" switch.
Kind of playing with fire, I assume ;)
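As an illustration of why aliasing matters, here is a small C sketch. It uses the C99 `restrict` qualifier, which is not mentioned in this thread and is shown only as a per-pointer way of making the same "no aliasing" promise that a compiler-wide switch makes for a whole file. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* No promise given: dst may overlap *scale, so the compiler has to
   assume every store to dst[i] could change *scale and reload it. */
void scale_may_alias(double *dst, const double *src,
                     const double *scale, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * *scale;
}

/* C99 restrict: the caller promises that dst, src and scale do not
   overlap, so *scale can stay in a register and the loop can be
   vectorized. Breaking the promise is undefined behaviour. */
void scale_no_alias(double *restrict dst, const double *restrict src,
                    const double *restrict scale, size_t n)
{
    /* Cheap sanity check only; it does not detect partial overlap. */
    assert(dst != src && dst != scale);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * *scale;
}
```

The two functions give identical results when the arguments really are disjoint. The "playing with fire" part is that nothing stops a caller from passing overlapping pointers to the second one.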

Walter wrote:
> Since D supports rectangular arrays, much more aggressive array optimization becomes possible.

Is that enough to introduce strip mining? Or to change the order of the loops?
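For context, strip mining splits a loop into an outer loop over fixed-size strips and an inner loop within one strip. The C sketch below applies that to the k and j loops of a matrix multiply and moves the strip loops outward, which is usually called blocking or tiling. The strip length of 64 and all names are assumptions for illustration, not values taken from the thread.

```c
#include <assert.h>
#include <stddef.h>

enum { STRIP = 64 };  /* hypothetical strip length; tune per cache size */

static size_t min_size(size_t x, size_t y) { return x < y ? x : y; }

/* c = a * b for n x n row-major matrices. The k and j loops are each
   split into a loop over strips (kk, jj) and a loop within a strip,
   and the strip loops are placed outermost so that one
   STRIP x STRIP tile of b is reused for every row i. */
void matmul_strips(size_t n, const double *a, const double *b, double *c)
{
    assert(n > 0);
    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0;
    for (size_t kk = 0; kk < n; kk += STRIP) {
        const size_t kend = min_size(kk + STRIP, n);
        for (size_t jj = 0; jj < n; jj += STRIP) {
            const size_t jend = min_size(jj + STRIP, n);
            for (size_t i = 0; i < n; i++)
                for (size_t k = kk; k < kend; k++) {
                    const double aik = a[i * n + k];
                    for (size_t j = jj; j < jend; j++)
                        c[i * n + j] += aik * b[k * n + j];
                }
        }
    }
}
```

A compiler can only do this transformation itself if it can prove that a, b and c do not overlap and that the index arithmetic is regular, which is where the array representation comes in.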

LuigiG wrote in message <9ll862$5ch$1@digitaldaemon.com>...
> BTW, Microsoft's compiler has a "/Oa assume no aliasing" switch.
> Kind of playing with fire, I assume ;)

I decided long ago not to support such a switch. It would really only be useful to someone who understood exactly how the internal compiler optimizations really worked, which is likely nobody <g>.

nicO wrote in message <3B7EB156.6D6761E6@ifrance.com>...
> Is that enough to introduce strip mining? Or to change the order of the loops?

I think so.

"Walter" <walter@digitalmars.com> wrote in message news:9lmlj0$111a$3@digitaldaemon.com...
> I decided long ago not to support such a switch. It would really only be useful to someone who understood exactly how the internal compiler optimizations really worked, which is likely nobody <g>.

Yep. Basically, if you know so much about the compiler that you can use the noalias switch, you don't need the noalias switch anymore.

Walter wrote:
> I think so.

Good news!
