October 02, 2007
On Tue, 02 Oct 2007 21:48:34 +0300, Walter Bright <newshound1@digitalmars.com> wrote:

> Vladimir Panteleev wrote:
>> Thanks, however dynamic arrays are unfortunately slow. To access a random element, you need a dereference, a multiplication+addition, another dereference, another multiplication+addition. Janice Caron's workaround, when modified to use a struct placed in the data segment, requires only one dereferencing/multiply+adding - while a static array placed in the data segment requires only arithmetic to access.
>
> If you define an array as:
>
> int[10000][] a = new int[10000][100];
>
> only one dereference is required to access the data.

Yes, only one - when there should be none at all. *sigh*

Note that one thing which the compiler fails to prevent, but which still chokes the linker, is having more than 16 MB of data, or even distinct types which need initial values (such as structs/classes). Example:

uint[0x800000] a,b,c,d;

void main()
{
}

So - please, Just Fix That Linker ;)

-- 
Best regards,
 Vladimir                          mailto:thecybershadow@gmail.com
October 02, 2007
Reply to Vladimir,

> On Tue, 02 Oct 2007 21:48:34 +0300, Walter Bright
> <newshound1@digitalmars.com> wrote:
> 
>> Vladimir Panteleev wrote:
>> 
>>> Thanks, however dynamic arrays are unfortunately slow. To access a
>>> random element, you need a dereference, a multiplication+addition,
>>> another dereference, another multiplication+addition. Janice Caron's
>>> workaround, when modified to use a struct placed in the data
>>> segment, requires only one dereferencing/multiply+adding - while a
>>> static array placed in the data segment requires only arithmetic to
>>> access.
>>> 
>> If you define an array as:
>> 
>> int[10000][] a = new int[10000][100];
>> 
>> only one dereference is required to access the data.
>> 
> Yes, only one - when there should be none at all. *sigh*
> 

Sorry, no can do. Even with stack based static arrays there is a single dereference. An array access always does a dereference. (might you be thinking about indirects?)


October 02, 2007
On Tue, 02 Oct 2007 22:30:25 +0300, BCS <ao@pathlink.com> wrote:

> Sorry, no can do. Even with stack based static arrays there is a single dereference. An array access always does a dereference. (might you be thinking about indirects?)

If you're talking about the current situation in D with huge arrays, then that's true (and those don't fit on the stack anyway).
If not, well, it's certainly very possible if you just declare the array in the data segment (or with "static" anywhere else). Then, like I said earlier, "its address and the address of every one of its elements is predetermined at compile time" - so "a static array placed in the data segment requires only arithmetic to access" ((base address + offset into element) + index * element size).

OPTLINK's limitation is around 16 MB of data for the data segment. This also includes initializations for structs, so you can't even have struct types with the gross size of more than 16 MB.

Other languages certainly allow it, and I'm sure that if I used a hex-edited dmd.exe with another linker - had I known of a better DMD-compatible one - everything would work.

-- 
Best regards,
 Vladimir                          mailto:thecybershadow@gmail.com
October 02, 2007
Reply to Vladimir,

> On Tue, 02 Oct 2007 22:30:25 +0300, BCS <ao@pathlink.com> wrote:
> 
> If you're talking about the current situation in D with huge arrays,
> then that's true (and those don't fit on the stack anyway).
> 
> If not, well, it's certainly very possible if you just declare the
> array in the data segment (or with "static" anywhere else). Then, like
> I said earlier, "its address and the address of every one of its
> elements is predetermined at compile time" - so "a static array placed
> in the data segment requires only arithmetic to access" ((base
> address + offset into element) + index * element size).
> 
> OPTLINK's limitation is around 16 MB of data for the data segment.
> This also includes initializations for structs, so you can't even have
> struct types with the gross size of more than 16 MB.
> 
> Other languages certainly allow it, and I'm sure that if I used a
> hex-edited dmd.exe with another linker - had I known of a better
> DMD-compatible one - everything would work.
> 

Unless I'm smoking something, the code to access any array will look the same regardless of where the array is.

Get base address
compute offset
add them
*dereference* that address

The only difference is how the base address is found. With static arrays it can be encoded as part of the opcode (I think); with general arrays, it is loaded from a variable; and with an array in the stack frame, it is pulled from the frame pointer*. But in all cases, there is still a dereference.


* BTW, accessing a local variable also does a dereference, just one where the offset is known at compile time.


October 02, 2007
On Wed, 03 Oct 2007 00:51:46 +0300, BCS <ao@pathlink.com> wrote:

> Unless I'm smoking something, the code to access any array will look the same regardless of where the array is.

Well, with how I understand this terminology, "dereferencing" means reading an address from a memory location, rather than having it as an immediate instruction operand. Consider:

int i;
i++;

This assembles as:

i dd ?
...
inc i

Assuming i is in the data segment (it has a fixed address), I do not consider this dereferencing, nor do I think anyone should.

Then we have:

int* i;
...
(*i)++;

Assuming i is in the data segment (it has a fixed address), this is a "single" dereference. To get the address of the integer we need to read it from a fixed memory location.

The same goes for arrays - except that in the first case it's a static array, and in the second it's a dynamic array, a new-allocated array, etc.

So, how does this boil down in performance? Let me show you:

int[0x4000000] bigArr;
...
int readValue(int index)
{
	return bigArr[index];
}

With -release -O, this assembles to:

readValue proc near
        mov     eax, ds:bigArr[eax*4]
        retn
readValue endp

Now, if bigArr were a dynamic array or referenced through a pointer, the resulting code would look like this:

readValue proc near
        mov     ecx, ds:bigArr
        mov     eax, [ecx+eax*4]
        retn
readValue endp

It gets worse if there is further nesting, e.g. when using multidimensional arrays.

That doesn't sound like a big deal, however:

1) when the array is in the data segment, we know the address of every one of its elements - so random access is faster
2) when using multi-dimensional dynamic arrays, there are more casts
3) rectangular static arrays are much faster when you want to access them by column - you just add (width*element_size) to the pointer, and go to the element on the next row
4) when you want to access an address inside the element (you have an array of structs), you must perform an addition after the multiplication (I think it can be part of the instruction in the newer instruction sets though) - while, when the array base address is predetermined, you can just use base_address + element_offset as the base address, then add index*element_size to that
etc.

-- 
Best regards,
 Vladimir                          mailto:thecybershadow@gmail.com
October 02, 2007
Vladimir Panteleev wrote:
> Yes, only one - when there should be none at all. *sigh*

For accessing a big array like that, it is hard to see how a single dereference through a register is going to affect the run time at all.


> Note that one thing which the compiler fails to prevent, but which
> still chokes the linker, is having more than 16 MB of data, or even
> distinct types which need initial values (such as structs/classes). Example:
> 
> uint[0x800000] a,b,c,d;
> 
> void main() { }
> 
> So - please, Just Fix That Linker ;)

Easier said than done :-(

October 02, 2007
Vladimir Panteleev wrote:
> That doesn't sound like a big deal, however:
> 
> 1) when the array is in the data segment, we know the address of
> every one of its elements - so random access is faster

You might be surprised. Intel has done a great deal of work optimizing and pipelining addressing modes. Furthermore, one would probably access such an array repeatedly, meaning the pointer to it would be cached in a register. I doubt you'll see a dime's worth of difference.

In any case, as has been shown time and again, where one thinks the time is spent in a program is always wrong. Even the experts are always wrong. Use a profiler!

> 2) when using
> multi-dimensional dynamic arrays, there are more casts

int[1000][] a;
...
a[3][4] = 2;

involves no casts.


> 3) rectangular
> static arrays are much faster when you want to access it by column -
> you just add (width*element_size) to the pointer, and go to the
> element on the next row

It's the same for int[1000][]

> 4) when you want to access an address inside
> the element (you have an array of structs), you must perform an
> addition after the multiplication (I think it can be part of the
> instruction in the newer instruction sets though) - while, when the
> array base address is predetermined, you can just use base_address +
> element_offset as the base address, then add index*element_size to
> that etc.

Using the address mode offset[reg1][reg2*size] mode is not any slower than using offset[reg2*size].

October 02, 2007
Reply to Vladimir,

> On Wed, 03 Oct 2007 00:51:46 +0300, BCS <ao@pathlink.com> wrote:
> 
>> Unless I'm smoking something, the code to access any array will look
>> the same regardless of where the array is.
>> 
> Well, with how I understand this terminology, "dereferencing" means
> reading an offset from a memory location, other than an immediate
> instruction operand. Consider:
> 
[...]

I think we are referring to the same thing, but counting different parts of it. What I'm counting is the times that a load is done from an address that is not an inline value, but comes from a register.

> readValue proc near
> mov     eax, ds:bigArr[eax*4]

If I'm reading this correctly, what is happening here is "load address (bigArr + eax*4)".

Unless IA32 has a special-case op for that, you're going to have to compute the address in another op, and even if there is a special case, I rather suspect that it will be exactly as fast either way. Furthermore, if more than one dimension is used you will have to do the computation in other ops no matter what.

> retn
> readValue endp

BTW where did you get that ASM dump? I don't recognize the format.

> 
> readValue proc near
> mov     ecx, ds:bigArr
> mov     eax, [ecx+eax*4]
> retn
> readValue endp
> It gets worse if there are further nests, e.g. when using
> multidimensional arrays.

Unless you are nesting dynamic arrays, it's just math, not dereferences (i1 + d1*i2 + d2*d1*i3 ...).


But it looks like we agree on that point.

> 3) rectangular static arrays are much faster when you want to access
> it by column - you just add (width*element_size) to the pointer, and
> go to the element on the next row

This can be done with a dynamic array of statically sized arrays.

> 
> 4) when you want to access an address inside the element (you have an
> array of structs), you must perform an addition after the
> multiplication (I think it can be part of the instruction in the newer
> instruction sets though) - while, when the array base address is
> predetermined, you can just use base_address + element_offset as the
> base address, then add index*element_size to that

It looks like you are using a (constA + constB * reg) address mode. I just looked in the "Intel Architecture Software Developer's Manual" and didn't find any reference to it. Am I missing something? (I didn't actually look very hard.)

> 
> etc.
> 


October 02, 2007
On Wed, 03 Oct 2007 02:09:56 +0300, Walter Bright <newshound1@digitalmars.com> wrote:

> Vladimir Panteleev wrote:
>> That doesn't sound like a big deal, however:
>>
>> 1) when the array is in the data segment, we know the address of every one of its elements - so random access is faster
>
> You might be surprised. Intel has done a great deal of work optimizing and pipelining addressing modes. Furthermore, one would probably access such an array repeatedly, meaning the pointer to it would be cached in a register. I doubt you'll see a dime's worth of difference.
>
> In any case, as has been shown time and again, where one thinks the time is spent in a program is always wrong. Even the experts are always wrong. Use a profiler!

I will, if I'm not satisfied with the performance :)

>> 2) when using
>> multi-dimensional dynamic arrays, there are more casts
>
> int[1000][] a;
> ...
> a[3][4] = 2;
>
> involves no casts.

Erm... I'm sure I wanted to write something other than "casts" there, but now I've even forgotten what I wanted to say originally. I think I meant that when you have nested dynamic arrays, there are even more dereferences.

>> 3) rectangular
>> static arrays are much faster when you want to access it by column -
>> you just add (width*element_size) to the pointer, and go to the
>> element on the next row
>
> It's the same for int[1000][]

Right. I always overlook that case!

>> 4) when you want to access an address inside
>> the element (you have an array of structs), you must perform an
>> addition after the multiplication (I think it can be part of the
>> instruction in the newer instruction sets though) - while, when the
>> array base address is predetermined, you can just use base_address +
>> element_offset as the base address, then add index*element_size to
>> that etc.
>
> Using the address mode offset[reg1][reg2*size] mode is not any slower than using offset[reg2*size].

Yes... that's the one I mentioned (though I'm used to seeing it written as [reg+reg2*size+offset]).

Thanks for replying. I wonder how hard it would be to write - or, at least, start a community project for - a Windows PE linker written from scratch - in D, of course :)

-- 
Best regards,
 Vladimir                          mailto:thecybershadow@gmail.com
October 02, 2007
Vladimir Panteleev wrote:
> On Wed, 03 Oct 2007 02:09:56 +0300, Walter Bright <newshound1@digitalmars.com> wrote:
> 
> Yes... that's the one I mentioned (though I'm used to seeing it written as [reg+reg2*size+offset]).
> 

> Thanks for replying. I wonder how hard would it be to write - or, at
> least, start a community project of a Windows PE linker written from
> scratch - in D, of course :)

For anyone whose first inclination is to hex-edit binaries when confronted with a minor annoyance, I'm guessing probably not that hard.  ;-)

--bb