April 03, 2007
Derek Parnell Wrote:

> On Tue, 03 Apr 2007 10:48:01 -0400, Dan wrote:
> 
> > Walter Bright Wrote:
> >> I think what you need is a runtime check, which is provided in std.cpuid.
> > 
> > So what you're saying is, we can't optimize the compiler for a specific variation of the x86, and are therefore stuck with writing generic programs that branch off for each cpu kind during runtime?
> 
> 
> No, maybe you missed the point. It certainly is possible for one to create editions of an application for specific hardware configurations; and using the version() statement is a reasonable way to do that.
> 
>   version(SSE2) { . . . }
>   version(SSE)  { . . . }
> etc...
> 
>   dmd -version=SSE2 myapp.d
>   dmd -version=SSE myapp.d
> 
> However, such editions should be able to be generated regardless of which hardware architecture the compiler just happens to be running on at the time. In other words, setting the version values within the compiler based on the hardware at compilation time is not very useful. It would be better to set these version values at the compiler command line level, if one does really want hardware-specific editions of the app.

Yeah, I guess defining the versions yourself is totally possible, and reasonable.  : p
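
As a minimal sketch of the self-defined approach, assuming the identifiers SSE2 and SSE are chosen by the programmer and passed on the command line (the names `edition` and `describeEdition` are made up for illustration):

```d
// Build one edition per target, for example:
//   dmd -version=SSE2 myapp.d
//   dmd -version=SSE  myapp.d
// SSE2 and SSE are user-chosen identifiers here, not compiler-defined ones.
version (SSE2)
{
    enum string edition = "SSE2";
}
else version (SSE)
{
    enum string edition = "SSE";
}
else
{
    // No -version flag given: fall back to code that runs on any x86.
    enum string edition = "generic";
}

// Hypothetical helper that reports which edition was compiled in.
string describeEdition()
{
    return "built for " ~ edition;
}
```

Only one of the three blocks is compiled into the binary, so the choice costs nothing at runtime.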

Less persuasively, and more as a closing note, I tend to think that the more D is used for other platforms (GDC retargetable?) the more those predefined platform version identifiers will be wanted.

It's generally bad to have each programmer use different names for exactly the same thing.
April 04, 2007
Don Clugston wrote:
> Yes, it's possible to detect the CPU type at runtime, but the performance penalty is appalling for very short functions.

If it is, then one should put the switch at an enclosing level.
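
A rough sketch of what "an enclosing level" could look like, assuming a hypothetical `hasSSE2` flag that is filled in once at start-up from a CPU query. Both kernels below are placeholders that compute the same plain sum:

```d
// Hypothetical flag; assume it is set once at start-up from a CPU query.
bool hasSSE2 = false;

// Generic kernel: works on any CPU.
int sumGeneric(const(int)[] data)
{
    int total = 0;
    foreach (x; data)
        total += x;
    return total;
}

// Placeholder for an SSE2-specific kernel. A real one would use SSE2
// instructions; this stand-in just repeats the generic loop.
int sumSSE2(const(int)[] data)
{
    int total = 0;
    foreach (x; data)
        total += x;
    return total;
}

// The CPU check happens once per call, outside the per-element loop,
// rather than inside a very short function that is called per element.
int sum(const(int)[] data)
{
    return hasSSE2 ? sumSSE2(data) : sumGeneric(data);
}
```

The cost of the check is then spread over the whole array instead of being paid on every element.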
April 04, 2007

Walter Bright wrote:
> Don Clugston wrote:
>> Yes, it's possible to detect the CPU type at runtime, but the performance penalty is appalling for very short functions.
> 
> If it is, then one should put the switch at an enclosing level.

Out of interest, which is faster: a branch at the start of a function (say, just a comparison with a bool), or using function pointers that are set up to point to the correct implementation at start-up?

	-- Daniel

-- 
int getRandomNumber()
{
    return 4; // chosen by fair dice roll.
              // guaranteed to be random.
}

http://xkcd.com/

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP  http://hackerkey.com/
April 04, 2007
Daniel Keep wrote:
> 
> Walter Bright wrote:
>> Don Clugston wrote:
>>> Yes, it's possible to detect the CPU type at runtime, but the
>>> performance penalty is appalling for very short functions.
>> If it is, then one should put the switch at an enclosing level.
> 
> Out of interest, which is faster: a branch at the start of a function
> (say, just a comparison with a bool), or using function pointers that
> are set up to point to the correct implementation at start-up?
> 
> 	-- Daniel

I suspect the bool comparison would be *much* quicker, since the branch is trivially predictable, and will only cost a single clock cycle. AFAIK, it's only in the past two years that any CPUs have had branch prediction for indirect branches. OTOH, the version involving branches would probably be less code-cache efficient.
The fastest option would be to patch the CALL instructions directly, just as a linker does. DDL will probably be able to do it eventually.
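
For reference, the two options being compared might be sketched as follows. All names are hypothetical, the "SSE2" function is a placeholder, and the sketch makes no claim about which variant is faster on a given CPU:

```d
// Hypothetical flag, assumed to be set once at start-up.
bool hasSSE2 = false;

int addOneGeneric(int x) { return x + 1; }
int addOneSSE2(int x)    { return x + 1; } // placeholder for a tuned version

// Option 1: a branch on a bool at the start of the function.
int addOneBranch(int x)
{
    return hasSSE2 ? addOneSSE2(x) : addOneGeneric(x);
}

// Option 2: a function pointer set up once to the chosen implementation.
// Callers then make an indirect call through addOnePtr.
int function(int) addOnePtr = &addOneGeneric;

// Hypothetical start-up hook that configures both variants.
void initDispatch(bool cpuHasSSE2)
{
    hasSSE2 = cpuHasSSE2;
    addOnePtr = cpuHasSSE2 ? &addOneSSE2 : &addOneGeneric;
}
```

Calling `initDispatch` once at start-up keeps both variants consistent; patching the CALL sites directly, as suggested above, would remove both the branch and the indirection.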
April 04, 2007
Don Clugston wrote:
> Daniel Keep wrote:
>>
>> Walter Bright wrote:
>>> Don Clugston wrote:
>>>> Yes, it's possible to detect the CPU type at runtime, but the
>>>> performance penalty is appalling for very short functions.
>>> If it is, then one should put the switch at an enclosing level.
>>
>> Out of interest, which is faster: a branch at the start of a function
>> (say, just a comparison with a bool), or using function pointers that
>> are set up to point to the correct implementation at start-up?
>>
>>     -- Daniel
> 
> I suspect the bool comparison would be *much* quicker, since the branch is trivially predictable, and will only cost a single clock cycle. AFAIK, it's only in the past two years that any CPUs have had branch prediction for indirect branches. OTOH, the version involving branches would probably be less code-cache efficient.
> The fastest option would be to patch the CALL instructions directly, just as a linker does. DDL will probably be able to do it eventually.

I'm starting to see what you mean.  If DDL kept generalized fixup data around during runtime, it would be trivial to swap one function address for another.  The only catch is that this technique would only be available to dynamic modules - the pre-linked .exe code can't be modified this way.

I'll keep this in mind.

-- 
- EricAnderton at yahoo
April 04, 2007
Pragma wrote:
> for another.  The only catch is that this technique would only be available to dynamic modules - the pre-linked .exe code can't be modified this way.

Wouldn't ld's "--emit-relocs" (aka "-q") remove that limitation? From the ld man page:
=====
-q
--emit-relocs
    Leave relocation sections and contents in fully linked executables.
    Post link analysis and optimization tools may need this information
    in order to perform correct modifications of executables.  This
    results in larger executables.

    This option is currently only supported on ELF platforms.
=====

I have no idea if optlink has a similar option.
Is it even possible to store such information in PE files? (i.e. do they support arbitrary sections or a special section for this stuff?) I seem to remember you can append arbitrary data to PE files without breaking them, but obviously that's not ideal...
April 04, 2007
Don Clugston wrote:
> Daniel Keep wrote:
>>
>> Walter Bright wrote:
>>> Don Clugston wrote:
>>>> Yes, it's possible to detect the CPU type at runtime, but the
>>>> performance penalty is appalling for very short functions.
>>> If it is, then one should put the switch at an enclosing level.
>>
>> Out of interest, which is faster: a branch at the start of a function
>> (say, just a comparison with a bool), or using function pointers that
>> are set up to point to the correct implementation at start-up?
>>
>>     -- Daniel
> 
> I suspect the bool comparison would be *much* quicker, since the branch is trivially predictable, and will only cost a single clock cycle. AFAIK, it's only in the past two years that any CPUs have had branch prediction for indirect branches. OTOH, the version involving branches would probably be less code-cache efficient.
> The fastest option would be to patch the CALL instructions directly, just as a linker does. DDL will probably be able to do it eventually.

I've seen this type of thing used in Intel compiler generated code to good effect, and w/o really any adverse performance that I'm aware of.

Just a thought for D -- add a global CPU type to the D standard runtime that is set during initialization (i.e.: in Dmain) that can be used by user code like BLADE and by the compiler itself for (future) processor determined optimization branches.

Actually, I'm kind-of surprised one isn't there already (or is it?).

- Dave
April 04, 2007
Dave Wrote:
> Just a thought for D -- add a global CPU type to the D standard runtime that is set during initialization (i.e.: in Dmain) that can be used by user code like BLADE and by the compiler itself for (future) processor determined optimization branches.
> 
> Actually, I'm kind-of surprised one isn't there already (or is it?).

There is already a cpu variable/object/module or something within the phobos library which provides cpu related information.  The problem being discussed is that this information is stored and branches are made based on the cpu type during *runtime*.

If instead, it was pre-determined during compile time, these branches could be optimized out.  Since version() is designed to perform precisely this function, it makes sense to continue to use version() to identify and target cpu's during compile time.


April 05, 2007
Dan wrote:
> Dave Wrote:
>> Just a thought for D -- add a global CPU type to the D standard runtime that is set during initialization (i.e.: in Dmain) that can be used by user code like BLADE and by the compiler itself for (future) processor determined optimization branches.
>>
>> Actually, I'm kind-of surprised one isn't there already (or is it?).
> 
> There is already a cpu variable/object/module or something within the phobos library which provides cpu related information.  The problem being discussed is that this information is stored and branches are made based on the cpu type during *runtime*.
> 
> If instead, it was pre-determined during compile time, these branches could be optimized out.  Since version() is designed to perform precisely this function, it makes sense to continue to use version() to identify and target cpu's during compile time.

Yes. All that's required is for the spec to include standard names for the CPU types. I don't think any DMD compiler changes are required.

How about these:

X86_MMX   // necessary? (MMX is dead technology).
X86_SSE
X86_SSE2
X86_SSE3
X86_SSSE3 // is this really necessary?
X86_SSE4

Only change would be that the GDC compiler for X86_64 should set all of the above.
Right now, the predefined version identifiers cannot be set from the command line, so eventually the compiler would need a CPU switch to control them - it would specify that the compiler has freedom to use recent opcodes, but of course it could continue to generate exactly the same code. I think such a switch would be necessary for supporting the 3- and 4-element array types with swizzle functions discussed recently.

However, if we all agreed to use the same set of version identifiers, we can get going immediately.
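
A small sketch of how code could use the proposed names today, assuming they are supplied by hand (for example `dmd -version=X86_SSE3 myapp.d`) and that a higher level should imply the lower ones. `vectorPath` and `chosenPath` are invented for illustration:

```d
// X86_SSE3, X86_SSE2 and X86_SSE are the names proposed above. They are
// assumed to come from the command line; nothing here relies on the
// compiler defining them.
version (X86_SSE3) { version = X86_SSE2; } // SSE3 implies SSE2
version (X86_SSE2) { version = X86_SSE; }  // SSE2 implies SSE

version (X86_SSE2)
    enum string vectorPath = "sse2";
else version (X86_SSE)
    enum string vectorPath = "sse";
else
    enum string vectorPath = "scalar";

// Hypothetical helper reporting which path was compiled in.
string chosenPath()
{
    return vectorPath;
}
```

With the implication lines in one shared module, user code only needs to test for the lowest level it requires.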
April 06, 2007
Don Clugston wrote:
> Dan wrote:
>> Dave Wrote:
>>> Just a thought for D -- add a global CPU type to the D standard runtime that is set during initialization (i.e.: in Dmain) that can be used by user code like BLADE and by the compiler itself for (future) processor determined optimization branches.
>>>
>>> Actually, I'm kind-of surprised one isn't there already (or is it?).
>>
>> There is already a cpu variable/object/module or something within the phobos library which provides cpu related information.  The problem being discussed is that this information is stored and branches are made based on the cpu type during *runtime*.
>>
>> If instead, it was pre-determined during compile time, these branches could be optimized out.  Since version() is designed to perform precisely this function, it makes sense to continue to use version() to identify and target cpu's during compile time.
> 
> Yes. All that's required is for the spec to include standard names for the CPU types. I don't think any DMD compiler changes are required.
> 
> How about these:
> 
> X86_MMX   // necessary? (MMX is dead technology).
> X86_SSE
> X86_SSE2
> X86_SSE3
> X86_SSSE3 // is this really necessary?
> X86_SSE4
> 
> Only change would be that the GDC compiler for X86_64 should set all of the above.
> Right now, the predefined version identifiers cannot be set from the command line, so eventually the compiler would need a CPU switch to control them - it would specify that the compiler has freedom to use recent opcodes, but of course it could continue to generate exactly the same code. I think such a switch would be necessary for supporting the 3- and 4-element array types with swizzle functions discussed recently.
> 
> However, if we all agreed to use the same set of version identifiers, we can get going immediately.

I've done this manually with a set of DLLs before.  Basically the exe would be a tiny file that would load the correct DLL at startup.  You would only compile all the program DLLs when distributing the program.  This way you only branch once, when you pick the correct CPU at application start.

Now I should mention that Michael Abrash (of whom I'm a big fan) used a technique to simulate DirectX7 efficiently in software, where he created CPU code on the fly for different CPUs.  It didn't suffer from cache misses (which are typical in self-modifying programs) because the generated data was consumed a frame later.

If the compiler were able to make simple changes to the code at startup, like a virtual machine or interpreter, that would be cool.  You could in essence consider it a compressed version of the DLL strategy I employed, except you'd be able to make use of combinations more effectively and optimize for AMD chips, Intel chips, etc.

-Joel