Predefined Version expansion (page 2) - D Programming Language Discussion Forum

April 03, 2007

Re: Predefined Version expansion

Posted by Dan
in reply to Derek Parnell

Derek Parnell Wrote:

> On Tue, 03 Apr 2007 10:48:01 -0400, Dan wrote:
> 
> > Walter Bright Wrote:
> >> I think what you need is a runtime check, which is provided in std.cpuid.
> > 
> > So what you're saying is, we can't optimize the compiler for a specific variation of the x86, and are therefore stuck with writing generic programs that branch off for each cpu kind during runtime?
> 
> 
> No, maybe you missed the point. It certainly is possible for one to create editions of an application for specific hardware configurations; and using the version() statement is a reasonable way to do that.
> 
>   version(SSE2) { . . . }
>   version(SSE)  { . . . }
> etc...
> 
>   dmd -version=SSE2 myapp.d
>   dmd -version=SSE myapp.d
> 
> However, such editions should be able to be generated regardless of which hardware architecture the compiler just happens to be running on at the time. In other words, setting the version values within the compiler based on the hardware at compilation time is not very useful. It would be better to set these version values at the compiler command line level, if one really does want hardware-specific editions of the app.

Yeah, I guess defining the versions yourself is totally possible, and reasonable.  : p

Less persuasively, and more as a closing note, I tend to think that the more D is used on other platforms (GDC retargetable?), the more those predefined platform version identifiers will be wanted.

It's generally bad to have each programmer use different names for exactly the same thing.
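
The command-line approach Derek describes in the quote above could look roughly like this. It is only a sketch: SSE2 and SSE are user-chosen identifiers passed with -version=, not names the compiler defines, and editionName is a made-up helper.

```d
// SSE2 and SSE are user-defined version identifiers (an assumption for this
// sketch); pick one on the command line, e.g. `dmd -version=SSE2 myapp.d`.
version (SSE2)
{
    enum string edition = "SSE2";
}
else version (SSE)
{
    enum string edition = "SSE";
}
else
{
    enum string edition = "generic"; // no -version flag was given
}

/// Hypothetical helper: reports which edition this binary was built as.
string editionName()
{
    return edition;
}
```

Because the choice is made by a flag rather than by the build machine's CPU, every edition can be produced on the same machine.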

April 04, 2007

Re: Predefined Version expansion

Posted by Walter Bright
in reply to Don Clugston

Don Clugston wrote:
> Yes, it's possible to detect the CPU type at runtime, but the performance penalty is appalling for very short functions.

If it is, then one should put the switch at an enclosing level.
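
A minimal sketch of putting the switch at an enclosing level, assuming a present-day D toolchain where the runtime CPU flags live in core.cpuid (the thread refers to the older std.cpuid). The function names are hypothetical and both stand-in implementations are plain D.

```d
import core.cpuid : sse2; // runtime CPU feature flag from druntime

// Two stand-in implementations of a very short function. In real code the
// second one would contain SSE2-specific instructions.
float twiceGeneric(float x) { return x * 2.0f; }
float twiceSse2(float x)    { return x * 2.0f; }

/// Checks the CPU once per call to doubleAll, not once per element.
void doubleAll(float[] data)
{
    if (sse2)
    {
        foreach (ref x; data)
            x = twiceSse2(x);
    }
    else
    {
        foreach (ref x; data)
            x = twiceGeneric(x);
    }
}
```

The cost of the check is then spread over the whole array instead of being paid on every call to the short function.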

April 04, 2007

Re: Predefined Version expansion

Posted by Daniel Keep
in reply to Walter Bright

Walter Bright wrote:
> Don Clugston wrote:
>> Yes, it's possible to detect the CPU type at runtime, but the performance penalty is appalling for very short functions.
> 
> If it is, then one should put the switch at an enclosing level.

Out of interest, which is faster: a branch at the start of a function (say, just a comparison with a bool), or using function pointers that are set up to point to the correct implementation at start-up?

	-- Daniel

April 04, 2007

Re: Predefined Version expansion

Posted by Don Clugston
in reply to Daniel Keep

Daniel Keep wrote:
> 
> Walter Bright wrote:
>> Don Clugston wrote:
>>> Yes, it's possible to detect the CPU type at runtime, but the
>>> performance penalty is appalling for very short functions.
>> If it is, then one should put the switch at an enclosing level.
> 
> Out of interest, which is faster: a branch at the start of a function
> (say, just a comparison with a bool), or using function pointers that
> are set up to point to the correct implementation at start-up?
> 
> 	-- Daniel

I suspect the bool comparison would be *much* quicker, since the branch is trivially predictable, and will only cost a single clock cycle. AFAIK, it's only in the past two years that any CPUs have had branch prediction for indirect branches. OTOH, the version involving branches would probably be less code-cache efficient.
The fastest option would be to patch the CALL instructions directly, just as a linker does. DDL will probably be able to do it eventually.
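
For reference, the two options being compared could be sketched as below. This assumes core.cpuid from a current druntime; all other names are hypothetical, and the sketch makes no claim about which variant is faster on a given CPU.

```d
import core.cpuid : sse2;

// Stand-in implementations; both compute the same result here.
int addGeneric(int a, int b) { return a + b; }
int addSse2(int a, int b)    { return a + b; }

__gshared bool useSse2;                    // option 1: flag tested per call
__gshared int function(int, int) addImpl;  // option 2: pointer set once

shared static this()
{
    // Runs once at program start-up, before main.
    useSse2 = sse2;
    addImpl = useSse2 ? &addSse2 : &addGeneric;
}

/// Option 1: a conditional branch on a bool, then a direct call.
int addViaFlag(int a, int b)
{
    return useSse2 ? addSse2(a, b) : addGeneric(a, b);
}

/// Option 2: an indirect call through the function pointer.
int addViaPointer(int a, int b)
{
    return addImpl(a, b);
}
```

Timing both variants on the target hardware is the only reliable way to settle the question.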

April 04, 2007

Re: Predefined Version expansion

Posted by Pragma
in reply to Don Clugston

Don Clugston wrote:
> Daniel Keep wrote:
>>
>> Walter Bright wrote:
>>> Don Clugston wrote:
>>>> Yes, it's possible to detect the CPU type at runtime, but the
>>>> performance penalty is appalling for very short functions.
>>> If it is, then one should put the switch at an enclosing level.
>>
>> Out of interest, which is faster: a branch at the start of a function
>> (say, just a comparison with a bool), or using function pointers that
>> are set up to point to the correct implementation at start-up?
>>
>>     -- Daniel
> 
> I suspect the bool comparison would be *much* quicker, since the branch is trivially predictable, and will only cost a single clock cycle. AFAIK, it's only in the past two years that any CPUs have had branch prediction for indirect branches. OTOH, the version involving branches would probably be less code-cache efficient.
> The fastest option would be to patch the CALL instructions directly, just as a linker does. DDL will probably be able to do it eventually.

I'm starting to see what you mean.  If DDL kept generalized fixup data around during runtime, it would be trivial to swap one function address for another.  The only catch is that this technique would only be available to dynamic modules - the pre-linked .exe code can't be modified this way.

I'll keep this in mind.

-- 
- EricAnderton at yahoo

April 04, 2007

Re: Predefined Version expansion

Posted by Frits van Bommel
in reply to Pragma

Pragma wrote:
> The only catch is that this technique would only be available to dynamic modules - the pre-linked .exe code can't be modified this way.

Wouldn't ld's "--emit-relocs" (aka "-q") remove that limitation? From the ld man page:
=====
-q
--emit-relocs
    Leave relocation sections and contents in fully linked executables.
    Post link analysis and optimization tools may need this information
    in order to perform correct modifications of executables. This
    results in larger executables.

    This option is currently only supported on ELF platforms.
=====

I have no idea if optlink has a similar option.
Is it even possible to store such information in PE files? (i.e. do they support arbitrary sections or a special section for this stuff?) I seem to remember you can append arbitrary data to PE files without breaking them, but obviously that's not ideal...

April 04, 2007

Re: Predefined Version expansion

Posted by Dave
in reply to Don Clugston

Don Clugston wrote:
> Daniel Keep wrote:
>>
>> Walter Bright wrote:
>>> Don Clugston wrote:
>>>> Yes, it's possible to detect the CPU type at runtime, but the
>>>> performance penalty is appalling for very short functions.
>>> If it is, then one should put the switch at an enclosing level.
>>
>> Out of interest, which is faster: a branch at the start of a function
>> (say, just a comparison with a bool), or using function pointers that
>> are set up to point to the correct implementation at start-up?
>>
>>     -- Daniel
> 
> I suspect the bool comparison would be *much* quicker, since the branch is trivially predictable, and will only cost a single clock cycle. AFAIK, it's only in the past two years that any CPUs have had branch prediction for indirect branches. OTOH, the version involving branches would probably be less code-cache efficient.
> The fastest option would be to patch the CALL instructions directly, just as a linker does. DDL will probably be able to do it eventually.

I've seen this type of thing used in Intel compiler generated code to good effect, and w/o really any adverse performance that I'm aware of.

Just a thought for D -- add a global CPU type to the D standard runtime that is set during initialization (i.e., in Dmain) and that can be used by user code like BLADE and by the compiler itself for (future) processor-determined optimization branches.

Actually, I'm kind-of surprised one isn't there already (or is it?).

- Dave

April 04, 2007

Re: Predefined Version expansion

Posted by Dan
in reply to Dave

Dave Wrote:
> Just a thought for D -- add a global CPU type to the D standard runtime that is set during initialization (i.e.: in Dmain) that can be used by user code like BLADE and by the compiler itself for (future) processor determined optimization branches.
> 
> Actually, I'm kind-of surprised one isn't there already (or is it?).

Phobos already has a module that provides CPU-related information (the std.cpuid module Walter mentioned).  The problem being discussed is that this information is only available, and the branches on CPU type are only taken, at *runtime*.

If the CPU type were instead fixed at compile time, these branches could be optimized out.  Since version() is designed to perform precisely this function, it makes sense to continue to use version() to identify and target CPUs at compile time.

April 05, 2007

Re: Predefined Version expansion

Posted by Don Clugston
in reply to Dan

Dan wrote:
> Dave Wrote:
>> Just a thought for D -- add a global CPU type to the D standard runtime that is set during initialization (i.e.: in Dmain) that can be used by user code like BLADE and by the compiler itself for (future) processor determined optimization branches.
>>
>> Actually, I'm kind-of surprised one isn't there already (or is it?).
> 
> There is already a cpu variable/object/module or something within the phobos library which provides cpu related information.  The problem being discussed is that this information is stored and branches are made based on the cpu type during *runtime*.
> 
> If instead, it was pre-determined during compile time, these branches could be optimized out.  Since version() is designed to perform precisely this function, it makes sense to continue to use version() to identify and target cpu's during compile time.

Yes. All that's required is for the spec to include standard names for the CPU types. I don't think any DMD compiler changes are required.

How about these:

X86_MMX   // necessary? (MMX is dead technology).
X86_SSE
X86_SSE2
X86_SSE3
X86_SSSE3 // is this really necessary?
X86_SSE4

Only change would be that the GDC compiler for X86_64 should set all of the above.
Right now, the predefined version identifiers cannot be set from the command line, so eventually the compiler would need a CPU switch to control them - it would specify that the compiler has freedom to use recent opcodes, but of course it could continue to generate exactly the same code. I think such a switch would be necessary for supporting the 3- and 4-element array types with swizzle functions discussed recently.

However, if we all agreed to use the same set of version identifiers, we can get going immediately.
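
As a rough sketch of how code could use an agreed set of names today: the identifiers below are the ones proposed in this post, assumed to be supplied by the user with -version= since no compiler defines them, and sum is a hypothetical example function.

```d
// X86_SSE4 ... X86_SSE are the names proposed in the post above. They are
// not predefined by the compiler; this sketch assumes they are passed with
// -version=, e.g. `dmd -version=X86_SSE3 app.d`.
// Each level implies the ones below it.
version (X86_SSE4)  version = X86_SSSE3;
version (X86_SSSE3) version = X86_SSE3;
version (X86_SSE3)  version = X86_SSE2;
version (X86_SSE2)  version = X86_SSE;

/// Hypothetical example: the body is selected at compile time, so no
/// runtime CPU check remains in the generated code.
float sum(const float[] xs)
{
    float total = 0;
    version (X86_SSE)
    {
        // An SSE-specific loop would go here; plain code stands in for it.
        foreach (x; xs)
            total += x;
    }
    else
    {
        foreach (x; xs)
            total += x;
    }
    return total;
}
```

With shared names, a library written by one author and an application written by another would respond to the same -version= flag.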

April 06, 2007

Re: Predefined Version expansion

Posted by janderson
in reply to Don Clugston

Don Clugston wrote:
> Dan wrote:
>> Dave Wrote:
>>> Just a thought for D -- add a global CPU type to the D standard runtime that is set during initialization (i.e.: in Dmain) that can be used by user code like BLADE and by the compiler itself for (future) processor determined optimization branches.
>>>
>>> Actually, I'm kind-of surprised one isn't there already (or is it?).
>>
>> There is already a cpu variable/object/module or something within the phobos library which provides cpu related information.  The problem being discussed is that this information is stored and branches are made based on the cpu type during *runtime*.
>>
>> If instead, it was pre-determined during compile time, these branches could be optimized out.  Since version() is designed to perform precisely this function, it makes sense to continue to use version() to identify and target cpu's during compile time.
> 
> Yes. All that's required is for the spec to include standard names for the CPU types. I don't think any DMD compiler changes are required.
> 
> How about these:
> 
> X86_MMX   // necessary? (MMX is dead technology).
> X86_SSE
> X86_SSE2
> X86_SSE3
> X86_SSSE3 // is this really necessary?
> X86_SSE4
> 
> Only change would be that the GDC compiler for X86_64 should set all of the above.
> Right now, the predefined version identifiers cannot be set from the command line, so eventually the compiler would need a CPU switch to control them - it would specify that the compiler has freedom to use recent opcodes, but of course it could continue to generate exactly the same code. I think such a switch would be necessary for supporting the 3- and 4-element array types with swizzle functions discussed recently.
> 
> However, if we all agreed to use the same set of version identifiers, we can get going immediately.

I've done this manually with a set of DLLs before.  Basically the exe would be a tiny file that would load the correct DLL at startup.  You would only compile all the program DLLs when distributing the program. This way you only branch once, when you pick the DLL for the CPU at application start.

Now I should mention that Michael Abrash (of whom I'm a big fan) used a technique to simulate DirectX 7 efficiently in software, where he created CPU code on the fly for different CPUs.  It didn't suffer from cache misses (which are typical in self-modifying programs) because the generated data was consumed a frame later.

If the compiler were able to make simple changes to the code at startup, like a virtual machine or interpreter, that would be cool.  You could in essence consider it a compressed version of the DLL strategy I employed, except you'd be able to make use of combinations more effectively and optimize for AMD chips, Intel chips, etc.
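
The start-up selection step of the DLL strategy might be sketched like this. The file names are invented for illustration, the actual loading of the library is left out, and core.cpuid is assumed to be available from a current druntime.

```d
import core.cpuid : sse, sse2;

/// Returns the file name of the build to load for the given CPU features.
/// The file names are hypothetical.
string pickEdition(bool hasSse2, bool hasSse)
{
    if (hasSse2)
        return "app_sse2.dll";
    if (hasSse)
        return "app_sse.dll";
    return "app_generic.dll";
}

/// Makes the choice once, for the CPU the program is running on.
string pickEditionForThisCpu()
{
    return pickEdition(sse2, sse);
}
```

The launcher would call pickEditionForThisCpu once and hand the result to the platform's library loader.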

-Joel

Copyright © 1999-2021 by the D Language Foundation