August 19, 2001
Walter wrote:
> 
> nicO wrote in message <3B801890.5A7216C7@ifrance.com>...
> >And what about vector computing ? (for the use of MMX and SSE)
> 
> I don't know.

To make good use of current CPUs, any compiler should use SIMD instructions. The performance difference is large.

nicO
October 29, 2001
I think the key to taking advantage of SSE, MMX, 3DNow!, and other SIMD instruction sets is that the language needs to recognize the data types these instruction sets manipulate (usually short, power-of-two, constant-sized vectors of ints or floats of various sizes) and provide operators to work with these "extra" basic types: vector add and subtract, componentwise multiply, vector-by-scalar multiply, square root of one or all components, reciprocal square root, dot product, and matrix multiply (SSE/MMX don't provide much help for these last two); conditional testing (for every component in A, if it is less than the corresponding component in B, set the result component to 1, else set it to 0); bitwise ops for integer SIMD; and component swizzling/shuffling/rearrangement.

If you're keen on understanding how SIMD programming works, download the Microsoft Processor Pack for MSVC 6 and install it... it provides new __m64 and __m128 data types and a bunch of new intrinsic C functions that work on these types, as well as __declspec(align(16)), so you have a prayer of getting the compiler to help you meet these types' strict memory-alignment restrictions.

Perhaps if a few operators were added, constant-sized arrays of the basic int and float types could automatically take advantage of these instruction sets: operators that work on two arrays and produce an array as a result. This really ties into the operator overloading discussion.

Sean

"nicO" <nicolas.boulay@ifrance.com> wrote in message news:3B807A0C.33576068@ifrance.com...
> Walter wrote:
> >
> > nicO wrote in message <3B801890.5A7216C7@ifrance.com>...
> > >And what about vector computing ? (for the use of MMX and SSE)
> >
> > I don't know.
>
> To make good use of current CPUs, any compiler should use SIMD instructions. The performance difference is large.
>
> nicO


January 10, 2002
You're right. -Walter

"Sean L. Palmer" <spalmer@iname.com> wrote in message news:9rj7js$2sq8$1@digitaldaemon.com...
> I think the key to taking advantage of SSE, MMX, 3DNow!, and other SIMD instruction sets is that the language needs to recognize the data types these instruction sets manipulate (usually short, power-of-two, constant-sized vectors of ints or floats of various sizes) and provide operators to work with these "extra" basic types: vector add and subtract, componentwise multiply, vector-by-scalar multiply, square root of one or all components, reciprocal square root, dot product, and matrix multiply (SSE/MMX don't provide much help for these last two); conditional testing (for every component in A, if it is less than the corresponding component in B, set the result component to 1, else set it to 0); bitwise ops for integer SIMD; and component swizzling/shuffling/rearrangement.
>
> If you're keen on understanding how SIMD programming works, download the Microsoft Processor Pack for MSVC 6 and install it... it provides new __m64 and __m128 data types and a bunch of new intrinsic C functions that work on these types, as well as __declspec(align(16)), so you have a prayer of getting the compiler to help you meet these types' strict memory-alignment restrictions.
>
> Perhaps if a few operators were added, constant-sized arrays of the basic int and float types could automatically take advantage of these instruction sets: operators that work on two arrays and produce an array as a result. This really ties into the operator overloading discussion.
>
> Sean
>
> "nicO" <nicolas.boulay@ifrance.com> wrote in message news:3B807A0C.33576068@ifrance.com...
> > Walter wrote:
> > >
> > > nicO wrote in message <3B801890.5A7216C7@ifrance.com>...
> > > >And what about vector computing ? (for the use of MMX and SSE)
> > >
> > > I don't know.
> >
> > To make good use of current CPUs, any compiler should use SIMD instructions. The performance difference is large.
> >
> > nicO
>
>


January 10, 2002
If D had good native support for SIMD programming, that would be a great big selling point for D.  "D: the language of the future of parallel programming". ;)  That's what SIMD programming is: parallel programming. You do several operations in parallel and occasionally mix the results together. Modern processors can do several operations in parallel, but we usually have to resort to assembly language to program that way. D could abstract the whole concept of that kind of data structure and emulate it when the actual hardware isn't available (you could choose to have the compiler emit SSE or 3DNow! instructions, or emulate using the FPU).

Not just 3D graphics: a *lot* of higher-order math and physics calculation is done with vectors and matrices. This stuff needs to run as fast as possible, and a compiler is much better suited than a human to the job of reducing stalls and the like (mainly because we humans are lazy, and it's tedious, repetitive work that computers are well suited to do). If you don't have language support, you have to go through the assembly-language pathway, which is littered with mines, since even inline assembly usually causes all registers to be saved and restored, or at least gets in the optimizer's way. Non-inline assembly (not every platform supports inline assembly) involves call/return statements and more register save/load bullshit. That's not the kind of thing you want happening all over your tight, fast inner loops, where 99% of an application's time is spent... the parts that benefit most from vector computing.

You could even provide convenience operations that may not exist in hardware, such as inner product (which isn't available in the Intel SSE instruction set, yet definitely belongs in any hardware vector processor, IMO) or cross product (no instruction set provides this to my knowledge; it takes three or four instructions to emulate).

I think all the 3D engine programmers would switch to D overnight.  ;)

But not only that: the code generator could use these instructions to optimize other parts of the code too. It could vectorize this, for instance:

float a[4],b[4],c[4];

for (int i = 0; i<4; ++i)
  a[i] = b[i]*b[i] + c[i];

Into this:

a[] = b[] * b[] + c[];

Or perhaps with some Unicode operator (dot and cross product symbols both exist in Unicode). Let's change that pesky "evaluates right-to-left" assignment into something that reads more left-to-right, as in this case:

b[] * b[] + c[] ==> a[];

Which breaks down to this:

b[0] * b[0] + c[0] => a[0];
b[1] * b[1] + c[1] => a[1];
b[2] * b[2] + c[2] => a[2];
b[3] * b[3] + c[3] => a[3];

Let's just turn that line sideways, rotate it 90 degrees to the right, and drop the square brackets:

b0 b1 b2 b3
*  *  *  *
b0 b1 b2 b3
+  +  +  +
c0 c1 c2 c3
=> => => =>
a0 a1 a2 a3

Which would essentially boil down to this kind of assembly language:

fmov4 xr0, b
fmov4 xr1, c
fmul4 xr2,xr0,xr0
fadd4 xr2,xr2,xr1
fmov4 a, xr2

or even this:

fmov4 xr0, b
fmad4 xr2, xr0, xr0, c ; multiply each component of xr0 by itself, add the corresponding component of c, store all 4 into xr2
fmov4 a, xr2

Check out the instruction set for D3D's vertex or pixel shaders sometime. ;)  All new video cards have that kind of vector processor. Your kid's PS2 has two of those vector units. Every PC since the Pentium MMX has at least one that can do very fast operations on bytes. Yet I can't name a programming language above assembly that inherently supports such things.

Sean

"Walter" <walter@digitalmars.com> wrote in message news:a1j6gs$2b02$1@digitaldaemon.com...
> You're right. -Walter
>
> "Sean L. Palmer" <spalmer@iname.com> wrote in message news:9rj7js$2sq8$1@digitaldaemon.com...
> > I think the key to taking advantage of SSE, MMX, 3DNow!, and other SIMD instruction sets is that the language needs to recognize the data types these instruction sets manipulate (usually short, power-of-two, constant-sized vectors of ints or floats of various sizes) and provide operators to work with these "extra" basic types: vector add and subtract, componentwise multiply, vector-by-scalar multiply, square root of one or all components, reciprocal square root, dot product, and matrix multiply (SSE/MMX don't provide much help for these last two); conditional testing (for every component in A, if it is less than the corresponding component in B, set the result component to 1, else set it to 0); bitwise ops for integer SIMD; and component swizzling/shuffling/rearrangement.
> >
> > If you're keen on understanding how SIMD programming works, download the Microsoft Processor Pack for MSVC 6 and install it... it provides new __m64 and __m128 data types and a bunch of new intrinsic C functions that work on these types, as well as __declspec(align(16)), so you have a prayer of getting the compiler to help you meet these types' strict memory-alignment restrictions.
> >
> > Perhaps if a few operators were added, constant-sized arrays of the basic int and float types could automatically take advantage of these instruction sets: operators that work on two arrays and produce an array as a result. This really ties into the operator overloading discussion.
> >
> > Sean
> >
> > "nicO" <nicolas.boulay@ifrance.com> wrote in message news:3B807A0C.33576068@ifrance.com...
> > > Walter wrote:
> > > >
> > > > nicO wrote in message <3B801890.5A7216C7@ifrance.com>...
> > > > >And what about vector computing ? (for the use of MMX and SSE)
> > > >
> > > > I don't know.
> > >
> > > To make good use of current CPUs, any compiler should use SIMD instructions. The performance difference is large.
> > >
> > > nicO
> >
> >
>
>


January 10, 2002
Actually, I think D does with its array operation semantics. They aren't implemented :-(, but they are in the spec. -Walter

"Sean L. Palmer" <spalmer@iname.com> wrote in message news:a1jj23$2jl8$1@digitaldaemon.com...
> If D had good native support for SIMD programming, that would be a great big selling point for D.  "D: the language of the future of parallel programming". ;)  That's what SIMD programming is: parallel programming. You do several operations in parallel and occasionally mix the results together. Modern processors can do several operations in parallel, but we usually have to resort to assembly language to program that way. D could abstract the whole concept of that kind of data structure and emulate it when the actual hardware isn't available (you could choose to have the compiler emit SSE or 3DNow! instructions, or emulate using the FPU).
>
> Not just 3D graphics: a *lot* of higher-order math and physics calculation is done with vectors and matrices. This stuff needs to run as fast as possible, and a compiler is much better suited than a human to the job of reducing stalls and the like (mainly because we humans are lazy, and it's tedious, repetitive work that computers are well suited to do). If you don't have language support, you have to go through the assembly-language pathway, which is littered with mines, since even inline assembly usually causes all registers to be saved and restored, or at least gets in the optimizer's way. Non-inline assembly (not every platform supports inline assembly) involves call/return statements and more register save/load bullshit. That's not the kind of thing you want happening all over your tight, fast inner loops, where 99% of an application's time is spent... the parts that benefit most from vector computing.
>
> You could even provide convenience operations that may not exist in hardware, such as inner product (which isn't available in the Intel SSE instruction set, yet definitely belongs in any hardware vector processor, IMO) or cross product (no instruction set provides this to my knowledge; it takes three or four instructions to emulate).
>
> I think all the 3D engine programmers would switch to D overnight.  ;)
>
> But not only that: the code generator could use these instructions to optimize other parts of the code too. It could vectorize this, for instance:
>
> float a[4],b[4],c[4];
>
> for (int i = 0; i<4; ++i)
>   a[i] = b[i]*b[i] + c[i];
>
> Into this:
>
> a[] = b[] * b[] + c[];
>
> Or perhaps with some Unicode operator (dot and cross product symbols both exist in Unicode). Let's change that pesky "evaluates right-to-left" assignment into something that reads more left-to-right, as in this case:
>
> b[] * b[] + c[] ==> a[];
>
> Which breaks down to this:
>
> b[0] * b[0] + c[0] => a[0];
> b[1] * b[1] + c[1] => a[1];
> b[2] * b[2] + c[2] => a[2];
> b[3] * b[3] + c[3] => a[3];
>
> Let's just turn that line sideways, rotate it 90 degrees to the right, and drop the square brackets:
>
> b0 b1 b2 b3
> *  *  *  *
> b0 b1 b2 b3
> +  +  +  +
> c0 c1 c2 c3
> => => => =>
> a0 a1 a2 a3
>
> Which would essentially boil down to this kind of assembly language:
>
> fmov4 xr0, b
> fmov4 xr1, c
> fmul4 xr2,xr0,xr0
> fadd4 xr2,xr2,xr1
> fmov4 a, xr2
>
> or even this:
>
> fmov4 xr0, b
> fmad4 xr2, xr0, xr0, c ; multiply each component of xr0 by itself, add the corresponding component of c, store all 4 into xr2
> fmov4 a, xr2
>
> Check out the instruction set for D3D's vertex or pixel shaders sometime. ;)  All new video cards have that kind of vector processor. Your kid's PS2 has two of those vector units. Every PC since the Pentium MMX has at least one that can do very fast operations on bytes. Yet I can't name a programming language above assembly that inherently supports such things.
>
> Sean



January 10, 2002
"Walter" <walter@digitalmars.com> wrote in message news:a1jlrk$2lgs$1@digitaldaemon.com...

> Actually, I think D does with its array operation semantics. They aren't implemented :-(, but they are in the spec. -Walter

But there are no dot and cross products for vectors.


January 10, 2002
True.

"Pavel Minayev" <evilone@omen.ru> wrote in message news:a1k1lr$2shq$1@digitaldaemon.com...
> "Walter" <walter@digitalmars.com> wrote in message news:a1jlrk$2lgs$1@digitaldaemon.com...
>
> > Actually, I think D does with its array operation semantics. They aren't implemented :-(, but they are in the spec. -Walter
>
> But there are no dot and cross products for vectors.
>
>


January 10, 2002
"Walter" <walter@digitalmars.com> wrote in message news:a1kmio$2hu$2@digitaldaemon.com...

> True.

Then why not add them? As well as matrix multiply? And optimize using MMX/3DNow!/SSE?


January 12, 2002
"Pavel Minayev" <evilone@omen.ru> wrote in message news:a1knkr$d57$1@digitaldaemon.com...
> "Walter" <walter@digitalmars.com> wrote in message news:a1kmio$2hu$2@digitaldaemon.com...
>
> > True.
>
> Then why not add them? As well as matrix multiply? And optimize using MMX/3DNow!/SSE?

Because I need to finish what I've got first!


January 12, 2002
Walter wrote:

> "Pavel Minayev" <evilone@omen.ru> wrote in message news:a1knkr$d57$1@digitaldaemon.com...
> > "Walter" <walter@digitalmars.com> wrote in message news:a1kmio$2hu$2@digitaldaemon.com...
> >
> > > True.
> >
> > Then why not add them? As well as matrix multiply? And optimize using MMX/3DNow!/SSE?
>
> Because I need to finish what I've got first!

I nominate Walter to be the first human to be cloned...

With enough Walters, and if they all work the same hours and all think alike, then we may have a SIMD-based language author and compiler writer!


-BobC