Jump to page: 1 2
Thread overview
F Language
Jul 02, 2003
Mark Evans
Jul 02, 2003
Sean L. Palmer
Jul 02, 2003
Walter
Jul 03, 2003
Sean L. Palmer
Jul 03, 2003
Ilya Minkov
Jul 04, 2003
Sean L. Palmer
Jul 03, 2003
Mark Evans
Jul 04, 2003
Sean L. Palmer
Jul 06, 2003
Mark Evans
Jul 06, 2003
Sean L. Palmer
Jul 07, 2003
Mark Evans
July 02, 2003
The only thing I wish to highlight is the 'slicing' and array manipulation, nothing beyond that necessarily.  (FORTRAN is a column-major language, C is row-major; be careful.)

http://www.fortran.com/F/linux.htm http://www.fortran.com/F/



"Array Language

A sophisticated array language facilitates operations on whole arrays, contiguous and noncontiguous sections and slices of arrays. For example,

arr(5:1:-2, 3, 6:)

is a reference to the two-dimensional array created by taking the elements 5, 3,
and 1 in the first dimension of arr and elements from 6 to the upper bound of
the third dimension of arr, all in the 3rd plan of the array. If arr is a 5 by 6
by 7 array, the referenced elements would be (5,3,6), (3,3,6), (1,3,6), (5,3,7),
(3,3,7), and (1,3,7).

A simpler example shows calculating the sum inner product of a row and a column.


A(i,j) = sum(B(i,:)*C(:,j))

Sum is one of the more than one hundred intrinsic procedures found in F."


July 02, 2003
I could sure use slicing functionality like this.  Slices would have to be capable of having a (possibly negative) stride for this to work

Sum actually exists in Phobos/math2.d.    ;)    But it doesn't have an optimal implementation.  We need SIMD in a bad way.

Sean

"Mark Evans" <Mark_member@pathlink.com> wrote in message news:bdtrq6$1ohh$1@digitaldaemon.com...
>
> The only thing I wish to highlight is the 'slicing' and array
manipulation,
> nothing beyond that necessarily.  (FORTRAN is a column-major language, C
is
> row-major; be careful.)
>
> http://www.fortran.com/F/linux.htm http://www.fortran.com/F/
>
>
>
> "Array Language
>
> A sophisticated array language facilitates operations on whole arrays, contiguous and noncontiguous sections and slices of arrays. For example,
>
> arr(5:1:-2, 3, 6:)
>
> is a reference to the two-dimensional array created by taking the elements
5, 3,
> and 1 in the first dimension of arr and elements from 6 to the upper bound
of
> the third dimension of arr, all in the 3rd plan of the array. If arr is a
5 by 6
> by 7 array, the referenced elements would be (5,3,6), (3,3,6), (1,3,6),
(5,3,7),
> (3,3,7), and (1,3,7).
>
> A simpler example shows calculating the sum inner product of a row and a
column.
>
>
> A(i,j) = sum(B(i,:)*C(:,j))
>
> Sum is one of the more than one hundred intrinsic procedures found in F."


July 02, 2003
"Sean L. Palmer" <palmer.sean@verizon.net> wrote in message news:bdv2h6$3vn$1@digitaldaemon.com...
> Sum actually exists in Phobos/math2.d.    ;)    But it doesn't have an optimal implementation.  We need SIMD in a bad way.

Would you like to take a stab at writing sum using SIMD?


July 03, 2003
I would be happy to.  But it'll have to wait til at least tomorrow because right now I had a few beers and am not coding so well.  ;)

sum isn't easy with SSE because there's no direct instruction support for it.  But it can be done using roughly 1/4 the number of instructions required for FPU, if amortized over a large number of values.

For small arrays, say, 4 elements, it still takes 2 add's and 2 shuffle instructions.  Those would be required to sum up the result register at the end in any case.

A library implementation would be sub-optimal for constant sized arrays. It'd be better off done as an intrinsic somewhat similar to how memcpy is usually implemented in professional grade compilers.

Consider SSE code to Sum() a 16-byte aligned float[4] pointed to by eax, and store into a float pointed to by ebx:

movaps xmm0, xmmword ptr [eax]   ; xmm0 = x y z w
movhlps xmm1, xmm0                      ; xmm1 = z w ? ?
addps xmm0, xmm1                         ; xmm0 = (x+z) (y+w) ? ?
shufps xmm1, xmm0, 1                     ; xmm1 = (y+w) ? ? ?
addss xmm0, xmm1                         ; xmm0 = (x+y+z+w) ? ? ?
movss dword ptr [ebx], xmm0

Obviously a canned library routine knowing nothing about the size or alignment will not be able to come even close to this kind of performance for such a small array.  If the values start off and end up in registers, even better.  Plus the optimizer can schedule the instructions wisely.  The above has many dependency stall slots where other instructions can be inserted.

Now to extend that to handle any sized array, you first sum up as much as possible in groups of 4 like so:

mov eax, dword ptr [src]
mov ecx, dword ptr [count]
xorps xmm0,xmm0                    ; xmm0 = 0 0 0 0
shr ecx,2
jz done
sumloop:
movaps xmm1, xmmword ptr [eax]   ; obviously this can be unrolled and
pipelined better
add eax, 16
addps xmm0, xmm1
dec ecx
jnz sumloop
done:

Ok, so maybe I actually can code a little even while drunk.  ;)   You know what they say:  if you drink and code, statistically your program is more likely to crash.

Then sum up the 4 partial results you're left with using the first code snippet, and handle any leftovers using movss/addss.  You can use movss/addss to do the first part of a badly-aligned array as well, then once you find the alignment boundary, go to the fast aligned loop.

I could definitely write up a somewhat less than optimal SSE or MMX implementation to get you started, but the above pretty much covers it.  If you can write memcpy, you can write this.  I guess lack of time is a major factor.

Sean

P.S. Whatever happened to all those MMX/SSE/3dNow functions Burton and I put together for you?

"Walter" <walter@digitalmars.com> wrote in message news:bdv7pb$9es$4@digitaldaemon.com...
>
> "Sean L. Palmer" <palmer.sean@verizon.net> wrote in message news:bdv2h6$3vn$1@digitaldaemon.com...
> > Sum actually exists in Phobos/math2.d.    ;)    But it doesn't have an optimal implementation.  We need SIMD in a bad way.
>
> Would you like to take a stab at writing sum using SIMD?


July 03, 2003
hm... how about embedding something like this:

http://www.dcs.gla.ac.uk/~wpc/reports/compilers/compilerindex/x25.html


In article <be0g0d$1itj$1@digitaldaemon.com>, Sean L. Palmer says...
>
>P.S. Whatever happened to all those MMX/SSE/3dNow functions Burton and I put together for you?
>
>"Walter" <walter@digitalmars.com> wrote in message news:bdv7pb$9es$4@digitaldaemon.com...
>>
>> "Sean L. Palmer" <palmer.sean@verizon.net> wrote in message news:bdv2h6$3vn$1@digitaldaemon.com...
>> > Sum actually exists in Phobos/math2.d.    ;)    But it doesn't have an optimal implementation.  We need SIMD in a bad way.
>>
>> Would you like to take a stab at writing sum using SIMD?
>
>


July 03, 2003
SIMD is not what the post was about.  The semantics of array slicing and 'views' were the motivation.  That is what D needs fixed 'in a bad way.'

It is premature to worry about SIMD and it's a back-end issue anyway, not important to the language as such.

Mark


July 04, 2003
Wow, is all I can say.

Why don't the compilers I have to use have this kind of stuff?  I'm living in the friggin stone age.

That might be enough to make me switch back to Pascal!  ;)

Nah, I would miss generics too much.

Sean

"Ilya Minkov" <Ilya_member@pathlink.com> wrote in message news:be179r$2bk6$1@digitaldaemon.com...
> hm... how about embedding something like this:
>
> http://www.dcs.gla.ac.uk/~wpc/reports/compilers/compilerindex/x25.html


July 04, 2003
Right.  It is however important to set up the semantics of the language in such a way that efficient SIMD code generation is possible.  Whether or not the early compilers emit SIMD, powerful, expressive array handling would be a huge plus in the long run.

Sean

"Mark Evans" <Mark_member@pathlink.com> wrote in message news:be1n9b$2tn9$1@digitaldaemon.com...
>
> SIMD is not what the post was about.  The semantics of array slicing and
'views'
> were the motivation.  That is what D needs fixed 'in a bad way.'
>
> It is premature to worry about SIMD and it's a back-end issue anyway, not important to the language as such.
>
> Mark


July 06, 2003
Sean L. Palmer says...
>
>Right.  It is however important to set up the semantics of the language in such a way that efficient SIMD code generation is possible.  Whether or not the early compilers emit SIMD, powerful, expressive array handling would be a huge plus in the long run.
>
>Sean

OK, so how do back-end SIMD issues affect array semantics?  All I gather from your comments is that list operations should decompose into 4-number groups.  Is that a killer?  The F people have paid much attention to low-level performance issues such as address alignment.  I do not see big problems ahead.

Mark


July 06, 2003
Mostly I think what is important is that array operations are not expressed in terms of iterative loops, but by more implicit, inherently parallel methods.  Otherwise the compiler has to reconstruct the parallel form from the iterative form, which is difficult and painful, something not all compilers would be able to do well.  So we need the ability to do operations on entire array slices at a time, to do operations on array slices vs. array slices, to compute functions on all array elements and store the results as an array, that kind of thing.  a[] += b[] * c;  is in the D spec but is currently unimplemented.

F probably has done it right.  The best thing I saw before that for array manipulation was Yorick ( http://wuarchive.wustl.edu/languages/yorick/doc/index.html ) but Yorick is intended for interpretation;  it seems to be headed in the wrong direction.

4-number groups doesn't really have anything to do with it.  If the arrays in question happen to be multiples of 4, leftovers operations will not be necessary, but you can SIMD-ify any size array operation.  SIMD will only help (after the break-even point which is probably about size 3 or 4) and the larger the array the more it will help.

From my experiences, once the array gets big enough that it won't all fit in the processor cache, performance drops off dramatically, (memory bandwidth costs start to dominate) but still is a win over FPU.

An intelligent SIMD backend could in fact SIMD-ify any loop, not just array operations, by unrolling the loop 4 times and having each component of a SIMD vector contain a variable from one iteration of the loop.  So it executes 4 iterations at a time.  It could do this for any loop that doesn't have much branching,  doesn't depend on the results of prior iterations, or for which the compiler can predict and control such dependency.  But that has nothing whatsoever to do with array semantics.

Sean

"Mark Evans" <Mark_member@pathlink.com> wrote in message news:be83ou$38p$1@digitaldaemon.com...
> Sean L. Palmer says...
> >
> >Right.  It is however important to set up the semantics of the language
in
> >such a way that efficient SIMD code generation is possible.  Whether or
not
> >the early compilers emit SIMD, powerful, expressive array handling would
be
> >a huge plus in the long run.
> >
> >Sean
>
> OK, so how do back-end SIMD issues affect array semantics?  All I gather
from
> your comments is that list operations should decompose into 4-number
groups.  Is
> that a killer?  The F people have paid much attention to low-level
performance
> issues such as address alignment.  I do not see big problems ahead.
>
> Mark


« First   ‹ Prev
1 2