Thread overview
array traversal
Mar 20, 2007
arun
Mar 20, 2007
Sean Kelly
Mar 20, 2007
Tyler Knott
Mar 20, 2007
BCS
Mar 28, 2007
Stewart Gordon
Mar 28, 2007
Dan
Mar 28, 2007
Stewart Gordon
Mar 28, 2007
Ary Manzana
Mar 30, 2007
Don Clugston
March 20, 2007
i want to know during array traversal ,whether pointers do a better job than indices by considering the code snippet

int [10]array;
foreach( value;array) {
   //something
}

i heard that implementations decides which one to use.i want to know how those two traversal differs?


March 20, 2007
arun wrote:
> i want to know during array traversal ,whether pointers do a
> better job than indices

It depends on the platform and on the compiler.  Nowadays though, there generally isn't enough of a performance difference to matter either way.  Just write the code that looks the cleanest.


Sean
March 20, 2007
arun wrote:
> i want to know during array traversal ,whether pointers do a better job than indices by considering the code snippet
> 
> int [10]array;
> foreach( value;array) {
>    //something
> }
> 
> i heard that implementations decides which one to use.i want to know how those two traversal differs? 
> 
> 
At least on x86/x64 they shouldn't differ.  They should both compile down to the pseudo-code x86 instruction

mov value,[array.ptr+sizeof(int)*index]

which is read as "move the value at the address (array.ptr+size(int)*index) into value" using flat pointer arithmetic (i.e. not C-style; use 1-byte increments always, even for types where sizeof(type) > 1 byte).  The above examples uses a bit of pseudo-code.  The real version uses a few more instructions to make sure that index is within array's bounds (in debug mode) and temporarily store array.ptr and index in a CPU registers.
March 20, 2007
Reply to Tyler,

> arun wrote:
> 
>> i want to know during array traversal ,whether pointers do a better
>> job than indices by considering the code snippet
>> 
>> int [10]array;
>> foreach( value;array) {
>> //something
>> }
>> i heard that implementations decides which one to use.i want to know
>> how those two traversal differs?
>> 
> At least on x86/x64 they shouldn't differ.  They should both compile
> down to the pseudo-code x86 instruction
> 
> mov value,[array.ptr+sizeof(int)*index]
> 
> which is read as "move the value at the address
> (array.ptr+size(int)*index) into value" using flat pointer arithmetic
> (i.e. not C-style; use 1-byte increments always, even for types where
> sizeof(type) > 1 byte).  The above examples uses a bit of pseudo-code.
> The real version uses a few more instructions to make sure that index
> is within array's bounds (in debug mode) and temporarily store
> array.ptr and index in a CPU registers.
> 


I think that options he is taking about are these
for(T* ptr = &start; ptr !is &stop; ptr++)
{
 T value = *ptr
}

vs.

for(int i = 0; i< length; i++)
{
  T value = ptr[i];
}

only the second ever uses a multiplication


March 28, 2007
"BCS" <ao@pathlink.com> wrote in message news:ce0a3343870c8c939107e7e1a1e@news.digitalmars.com...
<snip>
> I think that options he is taking about are these
> for(T* ptr = &start; ptr !is &stop; ptr++)
> {
>  T value = *ptr
> }
>
> vs.
>
> for(int i = 0; i< length; i++)
> {
>   T value = ptr[i];
> }
>
> only the second ever uses a multiplication

And even then not necessarily - some compilers may optimise one form to the other.

Stewart. 

March 28, 2007
Stewart Gordon Wrote:

> "BCS" <ao@pathlink.com> wrote in message news:ce0a3343870c8c939107e7e1a1e@news.digitalmars.com... <snip>
> > I think that options he is taking about are these
> > for(T* ptr = &start; ptr !is &stop; ptr++)
> > {
> >  T value = *ptr
> > }
> >
> > vs.
> >
> > for(int i = 0; i< length; i++)
> > {
> >   T value = ptr[i];
> > }
> >
> > only the second ever uses a multiplication
> 
> And even then not necessarily - some compilers may optimise one form to the other.
> 
> Stewart.
> 

Hmm... let me think of how that looks in ASM.  The first one is wrong, because you're only ptr++, it should be ptr += OFFSET.  Apart from that, you're right the first code looks mildly better in ASM.

Looping through by adding to the pointer instead of adding to the array index and performing a 'lea'.

The good news is that D *should be* optimizing this out, and it ultimately only costs a half cycle which may either align or throw off the u/v pipes - something more important than 1/2 a cycle.

I wonder if Walter has it aligning the pipes as appropriate?
March 28, 2007
"Dan" <murpsoft@hotmail.com> wrote in message news:euecda$16oq$1@digitalmars.com...
> Stewart Gordon Wrote:
>
>> "BCS" <ao@pathlink.com> wrote in message
>> news:ce0a3343870c8c939107e7e1a1e@news.digitalmars.com...
>> <snip>
>>> I think that options he is taking about are these
>>> for(T* ptr = &start; ptr !is &stop; ptr++)
>>> {
>>>  T value = *ptr
>>> }
<snip>
> Hmm...  let me think of how that looks in ASM.  The first one
> is wrong, because you're only ptr++, it should be ptr +=
> OFFSET.  Apart from that, you're right the first code looks
> mildly better in ASM.

No it isn't.  The ++ operator (and similarly --, +, -, +=, -=) applied to a pointer works in the units of the size of the data type it points to.

> Looping through by adding to the pointer instead of adding to
> the array index and performing a 'lea'.
>
> The good news is that D *should be* optimizing this out, and
> it ultimately only costs a half cycle which may either align
> or throw off the u/v pipes - something more important than 1/2
> a cycle.

What on earth is a u/v pipe?

Stewart. 

March 28, 2007
Stewart Gordon escribió:
> "Dan" <murpsoft@hotmail.com> wrote in message news:euecda$16oq$1@digitalmars.com...
>> Stewart Gordon Wrote:
>>
>>> "BCS" <ao@pathlink.com> wrote in message
>>> news:ce0a3343870c8c939107e7e1a1e@news.digitalmars.com...
>>> <snip>
>>>> I think that options he is taking about are these
>>>> for(T* ptr = &start; ptr !is &stop; ptr++)
>>>> {
>>>>  T value = *ptr
>>>> }
> <snip>
>> Hmm...  let me think of how that looks in ASM.  The first one
>> is wrong, because you're only ptr++, it should be ptr +=
>> OFFSET.  Apart from that, you're right the first code looks
>> mildly better in ASM.
> 
> No it isn't.  The ++ operator (and similarly --, +, -, +=, -=) applied to a pointer works in the units of the size of the data type it points to.
> 
>> Looping through by adding to the pointer instead of adding to
>> the array index and performing a 'lea'.
>>
>> The good news is that D *should be* optimizing this out, and
>> it ultimately only costs a half cycle which may either align
>> or throw off the u/v pipes - something more important than 1/2
>> a cycle.
> 
> What on earth is a u/v pipe?
> 
> Stewart.

http://en.wikipedia.org/wiki/Pentium

(read: Major changes from the 486)
March 30, 2007
Dan wrote:
> Stewart Gordon Wrote:
> 
>> "BCS" <ao@pathlink.com> wrote in message news:ce0a3343870c8c939107e7e1a1e@news.digitalmars.com...
>> <snip>
>>> I think that options he is taking about are these
>>> for(T* ptr = &start; ptr !is &stop; ptr++)
>>> {
>>>  T value = *ptr
>>> }
>>>
>>> vs.
>>>
>>> for(int i = 0; i< length; i++)
>>> {
>>>   T value = ptr[i];
>>> }
>>>
>>> only the second ever uses a multiplication
>> And even then not necessarily - some compilers may optimise one form to the other.
>>
>> Stewart. 
>>
> 
> Hmm... let me think of how that looks in ASM.  The first one is wrong, because you're only ptr++, it should be ptr += OFFSET.  Apart from that, you're right the first code looks mildly better in ASM.
> 
> Looping through by adding to the pointer instead of adding to the array index and performing a 'lea'.

But if T.sizeof == 2, 4, or 8, the multiplication can be done in hardware for free. EG if i is stored in esi,

mov eax, [ptr + esi*8];

I'd be surprised if there are any modern compilers that actually perform a multiply.

> 
> The good news is that D *should be* optimizing this out, and it ultimately only costs a half cycle which may either align or throw off the u/v pipes - something more important than 1/2 a cycle.
> 
> I wonder if Walter has it aligning the pipes as appropriate?

BTW, the u-v pipe thing is not relevant for recent Pentiums any more. Core2 CPUs are more likely to be limited by the decoding stage than by the execution units.