October 03, 2007
On Wed, 03 Oct 2007 02:31:47 +0300, BCS <ao@pathlink.com> wrote:

> I think we are referring to the same thing, but counting different parts of it. What I'm counting is the times that a load is done from an address that is not an inline value, but comes from a register.

OK, so in the first array example that makes one load (only the value we need), and in the second - two (one to get the address of the array data, and another to get the value).
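To illustrate the two patterns, here's a hypothetical C sketch (the names are made up, not the actual D code): with a static array the base address is a link-time constant, so indexing is a single load; with a dynamic array the data pointer itself lives in memory, so there's an extra load to fetch it first.

```c
#include <stddef.h>

/* Hypothetical example: static vs. pointer-based array access. */
int big_static[4] = {10, 20, 30, 40}; /* base address fixed at link time       */
int *big_dynamic = big_static;        /* the pointer itself sits in memory     */

int read_static(size_t i)  { return big_static[i];  } /* one load              */
int read_dynamic(size_t i) { return big_dynamic[i]; } /* pointer load + value load */
```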

>> readValue proc near
>> mov     eax, ds:bigArr[eax*4]
>
> If I'm reading this correctly, what is happening here is "load address (bigArr + eax*4)"

No, that gets the value directly and puts it in eax. To get the address I would use the "lea" instruction instead of "mov" (which is useful if you want to read and then write to the address).

> unless IA32 has a special case op for that, you're going to have to compute
> the address in another op, and even if there is a special case, I rather suspect
> that it will be exactly as fast either way. Furthermore, if more than one
> dimension is used you will have to do the computation in other ops no matter
> what.

I posted the disassembly listing of the D code I supplied, so this code works (there are no missing instructions).

>> retn
>> readValue endp
>
> BTW where did you get that ASM dump? I don't recognize the format.

The demo version of Interactive Disassembler[1]. It's quite useful for a Windows D programmer's arsenal, since it understands D symbolic information. (They're loaded mangled, so I "demangled" them manually.)

>> readValue proc near
>> mov     ecx, ds:bigArr
>> mov     eax, [ecx+eax*4]
>> retn
>> readValue endp
>> It gets worse if there are further nests, e.g. when using
>> multidimensional arrays.
>
> unless you are nesting dynamic arrays it's just math, not dereferences (i1
> + d1*i2 + d2*d1*i3 ...)

Yes, I meant dynamic arrays (or pointers, for the first "link")...
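As a sketch of that index math (a hypothetical C helper, assuming row-major layout with dimensions d1, d2, ...), the whole multidimensional index reduces to arithmetic with no extra memory loads:

```c
#include <stddef.h>

/* Flatten (i1, i2, i3) into one element offset for a d1 x d2 x d3 block:
   offset = i1 + d1*i2 + d2*d1*i3 -- plain arithmetic, no dereferences. */
size_t flat_index(size_t i1, size_t i2, size_t i3, size_t d1, size_t d2)
{
    return i1 + d1 * i2 + d2 * d1 * i3;
}
```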

>> 3) rectangular static arrays are much faster when you want to access it by column - you just add (width*element_size) to the pointer, and go to the element on the next row
>
> this can be done with a dynamic array of static sized arrays.

Yeah. Forgot about those (again).
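A hypothetical C sketch of the column walk described in point 3, assuming a row-major width x height grid: stepping down a column is just one addition of the row stride per element.

```c
/* Sum one column by stepping the pointer a full row (width elements)
   at a time -- one addition per step, no per-row multiply. */
long sum_column(const int *base, int width, int height, int col)
{
    long sum = 0;
    const int *p = base + col;  /* column element in row 0 */
    for (int r = 0; r < height; r++) {
        sum += *p;
        p += width;             /* same column, next row */
    }
    return sum;
}
```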

>> 4) when you want to access an address inside the element (you have an array of structs), you must perform an addition after the multiplication (I think it can be part of the instruction in the newer instruction sets though) - while, when the array base address is predetermined, you can just use base_address + element_offset as the base address, then add index*element_size to that
>
> It looks like you are using a (constA + constB * reg) address mode. I just looked in the "Intel Architecture Software Developer's Manual" and didn't find any reference to it. Am I missing something? (I didn't actually look real hard)

ConstB must be a power of 2 - and I don't think I can find a link more easily than you (since I don't use any online - or any other, actually - reference). As WB mentioned in another reply, there's also a [reg1 + reg2*powerOfTwo + offset] addressing mode.
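To illustrate point 4 with a hypothetical C sketch: when the array's base address is known at link time, the field offset can be folded into the constant, leaving a single [const + index*size] access per field read.

```c
/* Hypothetical array of structs: &particles[0].y is a link-time constant,
   so particles[i].y needs only one [const + i*sizeof(struct Particle)] load. */
struct Particle { float x, y, z; };

static struct Particle particles[2] = {{1, 2, 3}, {4, 5, 6}};

float get_y(int i) { return particles[i].y; }
```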

-- 
Best regards,
 Vladimir                          mailto:thecybershadow@gmail.com
October 03, 2007
Reply to Vladimir,

> On Wed, 03 Oct 2007 02:31:47 +0300, BCS <ao@pathlink.com> wrote:
> 
>> I think we are referring to the same thing, but counting different
>> parts of it. What I'm counting is the times that a load is done from
>> an address that is not an inline value, but comes from a register.
>> 
> OK, so in the first array example that makes one load (only the value
> we need), and in the second - two (one to get the address of the array
> data, and another to get the value).
> 

Yes, assuming that you have to do some pointer work to get the array (if it's a static variable then you can do a "move from inline defined address").

>>> readValue proc near
>>> mov     eax, ds:bigArr[eax*4]
>> If I'm reading this correctly, what is happening here is "load address
>> (bigArr + eax*4)"
>> 
> No, that gets the value directly and puts it in eax. To get the
> address I would use the "lea" instruction instead of "mov" (which is
> useful if you want to read and then write to the address).
> 

Baahhh. I need to be more careful with what I actually say.

That should have been "load from address".

>> unless IA32 has a special case op for that, you're going to have to
>> compute the address in another op, and even if there is a special
>> case, I rather suspect that it will be exactly as fast either way.
>> Furthermore, if more than one dimension is used you will have to do
>> the computation in other ops no matter what.
> I posted the disassembly listing of the D code I supplied, so this
> code works (there are no missing instructions).
> 
[...]
>> It looks like you are using a (constA + constB * reg) address mode. I
>> just looked in the "Intel Architecture Software Developer's Manual"
>> and didn't find any reference to it. Am I missing something? (I
>> didn't actually look real hard)
>> 
> ConstB must be a power of 2 - and I don't think I can find a link more
> easily than you (since I don't use any online - or any other, actually
> - reference). As WB mentioned in another reply, there's also a [reg1 +
> reg2*powerOfTwo + offset] addressing mode.
> 

sweet

BTW I find these kinda handy:

http://www.intel.com/design/processor/manuals/253665.pdf    Volume 1: Basic Architecture
http://www.intel.com/design/processor/manuals/253666.pdf    Volume 2A: Instruction Set Reference, A-M
http://www.intel.com/design/processor/manuals/253667.pdf    Volume 2B: Instruction Set Reference, N-Z


found here:
http://www.intel.com/products/processor/manuals/index.htm

In the end I rather suspect that what really matters is how many times you read from memory, not how you got the address to read from.


October 03, 2007
Reply to Vladimir,

> The demo version of Interactive Disassembler[1]. It's quite useful for
> a Windows D programmer's arsenal, since it understands D symbolic
> information. (They're loaded mangled, so I "demangled" them manually.)

link?


October 03, 2007
"Walter Bright" <newshound1@digitalmars.com> wrote in message news:fduing$trk$1@digitalmars.com...
>>
>> So - please, Just Fix That Linker ;)
>
> Easier said than done :-(

Then Make That Backend Generate Some Other Format Besides OMF (Like ELF, Oh Wait Doesn't DMD On Linux Already Do That?), or something, _anything_, since this whole OPTLINK thing is having one suboptimal consequence after another. I think it's time to let the poor thing die.

Yes, I am _actually_ advocating the development of a different backend, or linker, or object format etc. over the release of D2.0.  Yes, I am actually more interested in that.


October 03, 2007
Jarrett Billingsley wrote:
> "Walter Bright" <newshound1@digitalmars.com> wrote in message news:fduing$trk$1@digitalmars.com...
>>> So - please, Just Fix That Linker ;)
>> Easier said than done :-(
> 
> Then Make That Backend Generate Some Other Format Besides OMF (Like ELF, Oh Wait Doesn't DMD On Linux Already Do That?), or something, _anything_, since this whole OPTLINK thing is having one suboptimal consequence after another. I think it's time to let the poor thing die.
> 
> Yes, I am _actually_ advocating the development of a different backend, or linker, or object format etc. over the release of D2.0.  Yes, I am actually more interested in that. 

There's GDC. Walter could make that the default branch (and actually mentioned that as a possibility at the conference). Though one of the benefits of D is a fast compiler, and I don't know how much faster gdc is compared to, say, gcc.
October 03, 2007
Jarrett Billingsley wrote:
> "Walter Bright" <newshound1@digitalmars.com> wrote in message news:fduing$trk$1@digitalmars.com...
>>> So - please, Just Fix That Linker ;)
>> Easier said than done :-(
> 
> Then Make That Backend Generate Some Other Format Besides OMF (Like ELF, Oh Wait Doesn't DMD On Linux Already Do That?), or something, _anything_, since this whole OPTLINK thing is having one suboptimal consequence after another. I think it's time to let the poor thing die.

It's not that bad, especially not this particular issue, which has a painless workaround.


> Yes, I am _actually_ advocating the development of a different backend, or linker, or object format etc. over the release of D2.0.  Yes, I am actually more interested in that. 
October 03, 2007
On Wed, 03 Oct 2007 03:33:28 +0300, BCS <ao@pathlink.com> wrote:

> Reply to Vladimir,
>
>> The demo version of Interactive Disassembler[1]. It's quite useful for a Windows D programmer's arsenal, since it understands D symbolic information. (They're loaded mangled, so I "demangled" them manually.)
> 
> link?

Oops.

[1] http://datarescue.com/idabase/idadowndemo.htm

-- 
Best regards,
 Vladimir                          mailto:thecybershadow@gmail.com
February 20, 2008
Christopher Wright wrote:

>> Yes, I am _actually_ advocating the development of a different backend, or linker, or object format etc. over the release of D2.0.  Yes, I am actually more interested in that. 
> 
> There's GDC. Walter could make that the default branch (and actually mentioned that as a possibility at the conference). Though one of the benefits of D is a fast compiler, and I don't know how much faster gdc is compared to, say, gcc.

It usually clocks in somewhere in between gcc and g++, but I'm sure
that says more about the language complexity than about the compilers...

Not sure how LLVM does for speed, as I've only used it through GCC.
But I think the speed-above-everything DMD and OPTLINK still "win".

--anders
February 20, 2008
On Wed, 20 Feb 2008 11:30:40 +0300, Anders F Björklund <afb@algonet.se> wrote:

> Christopher Wright wrote:
>
>>> Yes, I am _actually_ advocating the development of a different backend, or linker, or object format etc. over the release of D2.0.  Yes, I am actually more interested in that.
>>  There's GDC. Walter could make that the default branch (and actually mentioned that as a possibility at the conference). Though one of the benefits of D is a fast compiler, and I don't know how much faster gdc is compared to, say, gcc.
>
> It usually clocks in somewhere inbetween gcc and g++, but I'm sure
> that says more about the language complexity than about the compilers...
>
> Not sure how LLVM does for speed, as I've only used it through GCC.
> But I think the speed-above-everything DMD and OPTLINK still "win".
>
> --anders

It is somewhat outdated but still: http://www.digitalmars.com/d/1.0/cppstrings.html

There is a comparison at the end.