fixed-size array of values == SIMD (page 4)

May 14, 2002

Re: fixed-size array of values == SIMD

Posted by Sean L. Palmer
in reply to Walter

Permalink

Sean L. Palmer

Posted in reply to Walter

Permalink

"Walter" <walter@digitalmars.com> wrote in message news:abqfqk$574$1@digitaldaemon.com...
>
> "Sean L. Palmer" <spalmer@iname.com> wrote in message news:abmetg$2e7i$1@digitaldaemon.com...
> > I know this feels like one of the basics, Walter, and I know you're
> itching
> > to move on to bigger and better things, but the fact that moving blocks
of
> > data is one of the basics of computing indicates that it needs robust language support, not special cases.
>
> You're right, it is a basic thing, and I have not just blown it off. I use memcpy() very regularly, and I depend on it. With a few observations, I think you'll see my reasoning for disallowing overlapping copies, even if you don't agree with it:
>
> 1) 99% of memory copies are not overlapping.
> 2) overall program performance tends to be sensitive to memcpy speed.
> 3) putting the test in for each copy will sacrifice performance and add
> bloat to satisfy only 1% of the cases.

You are right.  I suppose.  <g>

> 4) I've seen memcpy code that *relied* on the wrong way overlapping copy behavior.

I still don't like special cases, but I have actually seen code that used a wrong-way memcpy before too, (though why they didn't just write a pattern fill routine I don't know..)  When writing overlapping memcpy, 99% of the time you'll want to do it the "right" way.  Leaving it explicit leaves open a possibility for the programmer to accidentally write bugs in what would otherwise be a perfectly formed compiler-generated memory shift.  Sample bug:  If I scroll up it works, but if I scroll down it destroys all entries with copies of the first entry.

> 5) Because of (4), I think if one needs to do an overlapping copy, one
does
> need to decide just what it means to do an overlapping copy, and
explicitly
> code appropriately. Leaving it implicit can be misleading.

They can always code an explicit loop if they want explicit control.  i.e. want to copy the "wrong" way.

> 6) In debug builds, checks for overlapping are inserted, so one will not inadvertantly fall prey to copy direction bugs.

Won't inadvertently fall prey to some bugs at least.

I hope your user's program doesn't die when his client tries to copy a record to clipboard, then later tries to paste it back onto the original data... if the cut is lazy you may very well end up with a "copying range exactly onto itself" NO-OP which is caught by a runtime check and causes the program to die.  Maybe if start and end of ranges of memory are identical it should fall thru as no-op instead of runtime assert?

But that's more runtime checks needed in debug build.

> 7) And most importantly, it gets rid of the "aliasing" problem C has that prevents array operations from being compiled as efficiently as FORTRAN.

I've never tried to code a highly optimized backend for arrays.  Can you describe this problem in more detail?  I'm curious to know what the compiler is probably doing under there...

I know aliasing more in terms of bugs that happen when you pass the same argument in as both an input and an output (via two different parameters). If the implementation relies on the two objects not being the same, you get a bug.  It prevents certain optimizations from being 100% safe.  I prefer to code without aliasing when possible so I can turn up the optimization more. There are many ways to have aliasing, I'm just curious how potential aliasing affects the array optimization problem.

Sean

"Sean L. Palmer" <spalmer@iname.com> wrote in message news:abqkno$9cv$1@digitaldaemon.com... > "Walter" <walter@digitalmars.com> wrote in message news:abqfqk$574$1@digitaldaemon.com... > > 7) And most importantly, it gets rid of the "aliasing" problem C has that > > prevents array operations from being compiled as efficiently as FORTRAN. > I've never tried to code a highly optimized backend for arrays. Can you describe this problem in more detail? I'm curious to know what the compiler > is probably doing under there... Let's say you are trying to compute a=b+c, where a, b and c are arrays. In C, you'd write: for (i = 0; i < n; i++) a[i] = b[i] + c[i]; If a CPU has parallel execution capabilities, they can be exploited to optimize code (such as a vector add instruction). Also, multiple iterations of the loop can be interleaved with the stores delayed. But in C these optimizations cannot be done, because if the arrays overlap then the result of one iteration of the loop can be dependent on the result of the previous iteration.

"Walter" <walter@digitalmars.com> wrote in news:abrcj6$tsq$1@digitaldaemon.com: > Let's say you are trying to compute a=b+c, where a, b and c are arrays. In C, you'd write: > > for (i = 0; i < n; i++) > a[i] = b[i] + c[i]; > > If a CPU has parallel execution capabilities, they can be exploited to optimize code (such as a vector add instruction). Also, multiple iterations of the loop can be interleaved with the stores delayed. But in C these optimizations cannot be done, because if the arrays overlap then the result of one iteration of the loop can be dependent on the result of the previous iteration. So is a[] = a[] + b[]; also going to be classified as and overlapped copy?

And if you can't guarantee that they don't overlap, then the optimization can't be done safely. Gotcha. Usually there'll be one of two situations: Either the compiler can tell that they don't overlap at compile time, or it can't. If it can't, it's worth spending a few instructions checking so it can do the optimized inner loop in the likely situation that they don't in fact overlap. It's probably gotta check for alignment anyway to use SIMD so it may as well check overlap as well. If they overlap, or if the alignment sucks, do the unoptimized loop. If the array sizes are large, the overhead of the checks will be negligible in comparison. If the array sizes are small, well, too bad there's nothing we can do to help. Or just disallow overlap. Problem solved, albeit with a sledgehammer. Sean "Walter" <walter@digitalmars.com> wrote in message news:abrcj6$tsq$1@digitaldaemon.com... > > "Sean L. Palmer" <spalmer@iname.com> wrote in message news:abqkno$9cv$1@digitaldaemon.com... > > "Walter" <walter@digitalmars.com> wrote in message news:abqfqk$574$1@digitaldaemon.com... > > > 7) And most importantly, it gets rid of the "aliasing" problem C has > that > > > prevents array operations from being compiled as efficiently as FORTRAN. > > I've never tried to code a highly optimized backend for arrays. Can you describe this problem in more detail? I'm curious to know what the > compiler > > is probably doing under there... > > Let's say you are trying to compute a=b+c, where a, b and c are arrays. In C, you'd write: > > for (i = 0; i < n; i++) > a[i] = b[i] + c[i]; > > If a CPU has parallel execution capabilities, they can be exploited to optimize code (such as a vector add instruction). Also, multiple iterations > of the loop can be interleaved with the stores delayed. But in C these optimizations cannot be done, because if the arrays overlap then the result > of one iteration of the loop can be dependent on the result of the previous > iteration.

"Patrick Down" <pat@codemoon.com> wrote in message news:Xns920E81D566F62patcodemooncom@63.105.9.61... > "Walter" <walter@digitalmars.com> wrote in news:abrcj6$tsq$1@digitaldaemon.com: > > Let's say you are trying to compute a=b+c, where a, b and c are arrays. In C, you'd write: > > > > for (i = 0; i < n; i++) > > a[i] = b[i] + c[i]; > > > > If a CPU has parallel execution capabilities, they can be exploited to optimize code (such as a vector add instruction). Also, multiple iterations of the loop can be interleaved with the stores delayed. But in C these optimizations cannot be done, because if the arrays overlap then the result of one iteration of the loop can be dependent on the result of the previous iteration. > > So is > > a[] = a[] + b[]; > > also going to be classified as and overlapped copy? Good question. It's not an overlapped copy because each array element is not dependent on any other array element in the same array.

"Patrick Down" <pat@codemoon.com> wrote in message news:Xns920BB36E7D600patcodemooncom@63.105.9.61... > "OddesE" <OddesE_XYZ@hotmail.com> wrote in news:abjvsl$fem$1 @digitaldaemon.com: > > > > So we do need a bool type, byte sized! > > > > typedef ubyte bool; > *snif* And we are back in the bad old C days. This totally sucks and we all know why! We need a byte sized *bool*! -- Stijn OddesE_XYZ@hotmail.com http://OddesE.cjb.net _________________________________________________ Remove _XYZ from my address when replying by mail

"OddesE" <OddesE_XYZ@hotmail.com> wrote in message news:abrr5k$1acl$1@digitaldaemon.com... > "Patrick Down" <pat@codemoon.com> wrote in message news:Xns920BB36E7D600patcodemooncom@63.105.9.61... > > "OddesE" <OddesE_XYZ@hotmail.com> wrote in news:abjvsl$fem$1 @digitaldaemon.com: > > > > > > So we do need a bool type, byte sized! > > > > > > > typedef ubyte bool; > > > > *snif* > > And we are back in the bad old C days. > This totally sucks and we all know why! > We need a byte sized *bool*! > > > -- > Stijn > OddesE_XYZ@hotmail.com > http://OddesE.cjb.net > _________________________________________________ > Remove _XYZ from my address when replying by mail > > I am forking this of into a new thread, because I think it is important... -- Stijn OddesE_XYZ@hotmail.com http://OddesE.cjb.net _________________________________________________ Remove _XYZ from my address when replying by mail

Forums