Thread overview
Optimising
Aug 17, 2001
nicO
Aug 18, 2001
Walter
Aug 18, 2001
LuigiG
Aug 18, 2001
Walter
Aug 18, 2001
LuigiG
Aug 18, 2001
nicO
Aug 18, 2001
Walter
Aug 19, 2001
nicO
Aug 19, 2001
nicO
Aug 19, 2001
Walter
Aug 19, 2001
nicO
Oct 29, 2001
Sean L. Palmer
Jan 10, 2002
Walter
Re: SIMD (was: Optimising)
Jan 10, 2002
Sean L. Palmer
Jan 10, 2002
Walter
Jan 10, 2002
Pavel Minayev
Jan 10, 2002
Walter
Jan 10, 2002
Pavel Minayev
Jan 12, 2002
Walter
Jan 12, 2002
Pavel Minayev
Jan 12, 2002
Pavel Minayev
August 17, 2001
I'm from the F-CPU project (f-cpu.org); the goal is to create a "free
CPU".

One of our main problems is the compiler. I have written some algorithms in C to improve matrix multiplication (using gcc on a P3). I used loop unrolling and software pipelining, broke a lot of false dependencies, and removed calculations from the inner loop. I reordered the loops to align data accesses and did strip mining. Most of these are processor-dependent and should be done by the compiler. I gained an average speedup of 4x compared to the usual algorithm; my best gain was on a 1.2 GHz Athlon processor with 512*512 matrices: x25!
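
As a rough illustration of two of the transformations named above (loop reordering and strip mining), here is a minimal C sketch. It is not the code that was benchmarked; the function names and the TILE value are made up for the example, and the best strip length depends on the target cache.

```c
#include <stddef.h>

#define TILE 32  /* hypothetical strip length; tune it to the target cache */

/* Naive i-j-k product of two n*n row-major matrices.
   The inner loop walks b down a column, i.e. with stride n. */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

/* Same product with the loops reordered to i-k-j, so the inner loop
   reads b and writes c with stride 1, and with k and j strip-mined
   into blocks of TILE elements so a block of b is reused across all i. */
void matmul_tiled(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0;

    for (size_t kk = 0; kk < n; kk += TILE) {
        size_t kmax = (kk + TILE < n) ? kk + TILE : n;
        for (size_t jj = 0; jj < n; jj += TILE) {
            size_t jmax = (jj + TILE < n) ? jj + TILE : n;
            for (size_t i = 0; i < n; i++)
                for (size_t k = kk; k < kmax; k++) {
                    double aik = a[i * n + k];  /* hoisted out of the inner loop */
                    for (size_t j = jj; j < jmax; j++)
                        c[i * n + j] += aik * b[k * n + j];
                }
        }
    }
}
```

Both functions compute the same product; any speed difference depends on the matrix size, the cache, and the compiler, so it has to be measured on the machine in question.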

One of the main problems is the aliasing problem induced by pointers. In my tests, I lose 25% of the performance by using pointers instead of arrays.
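
To make the aliasing problem concrete, here is a small C sketch. It assumes a C99 compiler and uses the C99 `restrict` qualifier, which is not mentioned in this post; the function names are invented for the example.

```c
#include <stddef.h>

/* No aliasing promise: dst may overlap src or *scale, so the compiler
   has to assume every store to dst[i] can change *scale and src[]. */
void scale_may_alias(size_t n, double *dst, const double *src,
                     const double *scale)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * *scale;
}

/* C99 restrict: the caller promises the three objects do not overlap,
   so *scale may be kept in a register and the loop reordered freely.
   Calling this with overlapping arguments is undefined behaviour. */
void scale_no_alias(size_t n, double *restrict dst,
                    const double *restrict src,
                    const double *restrict scale)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * *scale;
}
```

If `scale` points at `dst[0]`, the first function must pick up the value just written there on the next iteration, which is exactly the reload that blocks the optimization. How much this costs in practice depends on the compiler and the loop.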

So what will be done in D to improve the use of new instructions? All new CPUs have conditional moves (which avoid flushing the pipeline) and vector instructions (like MMX and SSE). We could also add the ability to compile for 2 or more processors.

What do you think?
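
As a sketch of the kind of code where a conditional move can replace a branch, compare the two C functions below. The names are invented for the example, and whether a given compiler actually emits a conditional move (or a SIMD compare) for the second form is up to the compiler and the target.

```c
#include <stddef.h>

/* Branchy form: a data-dependent jump per element, which can be
   mispredicted and force a pipeline flush. */
size_t count_below_branchy(const int *v, size_t n, int limit)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] < limit)
            count++;
    }
    return count;
}

/* Branchless form: the comparison yields 0 or 1 and is added directly,
   so no jump depends on the data. */
size_t count_below_branchless(const int *v, size_t n, int limit)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (size_t)(v[i] < limit);
    return count;
}
```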

nicO
August 18, 2001
Since D supports rectangular arrays, much more aggressive array optimization becomes possible.

nicO wrote in message <3B7DFFFC.7FB87E4F@ifrance.com>...
>I'm from the F-CPU project (f-cpu.org); the goal is to create a "free
>CPU".
>
>One of our main problems is the compiler. I have written some algorithms in C to improve matrix multiplication (using gcc on a P3). I used loop unrolling and software pipelining, broke a lot of false dependencies, and removed calculations from the inner loop. I reordered the loops to align data accesses and did strip mining. Most of these are processor-dependent and should be done by the compiler. I gained an average speedup of 4x compared to the usual algorithm; my best gain was on a 1.2 GHz Athlon processor with 512*512 matrices: x25!
>
>One of the main problems is the aliasing problem induced by pointers. In my tests, I lose 25% of the performance by using pointers instead of arrays.
>
>So what will be done in D to improve the use of new instructions? All new CPUs have conditional moves (which avoid flushing the pipeline) and vector instructions (like MMX and SSE). We could also add the ability to compile for 2 or more processors.
>
>What do you think?
>
>nicO


August 18, 2001
BTW, Microsoft's compiler has a
"/Oa assume no aliasing"
switch.
kind of playing with fire I assume ;)

"Walter" <walter@digitalmars.com> wrote in message news:9lkk0g$2qdq$2@digitaldaemon.com...
> Since D supports rectangular arrays, much more aggressive array optimization
> becomes possible.
>
> nicO wrote in message <3B7DFFFC.7FB87E4F@ifrance.com>...
> >I'm from the F-CPU project (f-cpu.org); the goal is to create a "free
> >CPU".
> >
> >One of our main problems is the compiler. I have written some algorithms in C to improve matrix multiplication (using gcc on a P3). I used loop unrolling and software pipelining, broke a lot of false dependencies, and removed calculations from the inner loop. I reordered the loops to align data accesses and did strip mining. Most of these are processor-dependent and should be done by the compiler. I gained an average speedup of 4x compared to the usual algorithm; my best gain was on a 1.2 GHz Athlon processor with 512*512 matrices: x25!
> >
> >One of the main problems is the aliasing problem induced by pointers. In my tests, I lose 25% of the performance by using pointers instead of arrays.
> >
> >So what will be done in D to improve the use of new instructions? All new CPUs have conditional moves (which avoid flushing the pipeline) and vector instructions (like MMX and SSE). We could also add the ability to compile for 2 or more processors.
> >
> >What do you think?
> >
> >nicO
>
>


August 18, 2001
Walter wrote:
> 
> Since D supports rectangular arrays, much more aggressive array optimization becomes possible.
> 

Is that enough to introduce strip mining? Or to change the order of the loops?

> nicO wrote in message <3B7DFFFC.7FB87E4F@ifrance.com>...
> >I'm from the F-CPU project (f-cpu.org); the goal is to create a "free
> >CPU".
> >
> >One of our main problems is the compiler. I have written some algorithms in C to improve matrix multiplication (using gcc on a P3). I used loop unrolling and software pipelining, broke a lot of false dependencies, and removed calculations from the inner loop. I reordered the loops to align data accesses and did strip mining. Most of these are processor-dependent and should be done by the compiler. I gained an average speedup of 4x compared to the usual algorithm; my best gain was on a 1.2 GHz Athlon processor with 512*512 matrices: x25!
> >
> >One of the main problems is the aliasing problem induced by pointers. In my tests, I lose 25% of the performance by using pointers instead of arrays.
> >
> >So what will be done in D to improve the use of new instructions? All new CPUs have conditional moves (which avoid flushing the pipeline) and vector instructions (like MMX and SSE). We could also add the ability to compile for 2 or more processors.
> >
> >What do you think?
> >
> >nicO
August 18, 2001
LuigiG wrote in message <9ll862$5ch$1@digitaldaemon.com>...
>BTW, Microsoft's compiler has a
>"/Oa assume no aliasing"
>switch.
>kind of playing with fire I assume ;)


I decided long ago not to support such a switch. It would really only be useful to someone who understood exactly how the internal compiler optimizations really worked, which is likely nobody <g>.


August 18, 2001
nicO wrote in message <3B7EB156.6D6761E6@ifrance.com>...
>Walter wrote:
>>
>> Since D supports rectangular arrays, much more aggressive array optimization
>> becomes possible.
>>
>
>Is that enough to introduce strip mining? Or to change the order of the loops?


I think so.


August 18, 2001
"Walter" <walter@digitalmars.com> wrote in message news:9lmlj0$111a$3@digitaldaemon.com...
>
> LuigiG wrote in message <9ll862$5ch$1@digitaldaemon.com>...
> >BTW, Microsoft's compiler has a
> >"/Oa assume no aliasing"
> >switch.
> >kind of playing with fire I assume ;)
>
>
> I decided long ago not to support such a switch. It would really only be useful to someone who understood exactly how the internal compiler optimizations really worked, which is likely nobody <g>.
>


Yep,
basically, if you know so much about the compiler that you can use the
noalias switch, you don't need the noalias switch anymore.



August 19, 2001
Walter wrote:
> 
> nicO wrote in message <3B7EB156.6D6761E6@ifrance.com>...
> >Walter wrote:
> >>
> >> Since D supports rectangular arrays, much more aggressive array optimization
> >> becomes possible.
> >>
> >
> >Is that enough to introduce strip mining? Or to change the order of the loops?
> 
> I think so.

Good news!
August 19, 2001
Walter wrote:
> 
> nicO wrote in message <3B7EB156.6D6761E6@ifrance.com>...
> >Walter wrote:
> >>
> >> Since D supports rectangular arrays, much more aggressive array optimization
> >> becomes possible.
> >>
> >
> >Is that enough to introduce strip mining? Or to change the order of the loops?
> 
> I think so.

And what about vector computing? (for the use of MMX and SSE)
August 19, 2001
nicO wrote in message <3B801890.5A7216C7@ifrance.com>...
>And what about vector computing? (for the use of MMX and SSE)

I don't know.

