On 4/2/2013 10:58 PM, Kai Nacke wrote:
While I understand your argument, I still feel a bit
uncomfortable with it. It creates a situation in which you can't
tell me whether a D program will compile by reading the source
unless I tell you the target architecture. I think this is
something really new.
The current situation is that conditional compilation is used if
interfaces etc. are not globally available. This principle is
now broken by the "invisible" rules that determine the
availability of vector operations.
Performant SIMD code is simply not portable between architectures.
The programmer writing SIMD code ought to be guaranteed he's getting
SIMD code, not workaround code that is 100x slower. I view a
compiler error as being far more visible than silently generating
unacceptably slow code.
My approach would be to define the following: if D_SIMD is
defined then only the optimal vector operations are available.
This ensures your goal of generating code with optimal
performance. If D_SIMD is not defined but a vendor specific SIMD
implementation is available then the rules of this
implementation hold (which may include generation of
"workaround" code). This has the advantage of being explicit:
    version(D_SIMD)
    {
        uint4 w = ...;
        uint4 v = w << 1;
    }
    else version(XYZ_SIMD)
    {
        uint4 w = ..., x = ...;
        // Not allowed by DMD, only fast on AltiVec
        uint4 v = x << w;
    }
Or am I missing something?
All the programmer really needs to do is use a version statement on
the architecture for the SIMD code for that architecture, and then
have a default with the workaround code. The point here will be that
he *knowingly* selects the slow workaround code. This is critical
for a systems programming language where programmers writing SIMD
code are not always experts at dumping the compiler output to see
what was generated.
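
A minimal sketch of that pattern, for illustration only. It assumes the
predefined D_SIMD version identifier stands in for the architecture check,
and the names addLanes and usingSimd are made up for this example. The
scalar branch is the workaround the programmer has knowingly opted into:

```d
import std.stdio : writeln;

version (D_SIMD)
{
    import core.simd;

    enum usingSimd = true;

    // Fast path: one vector add covers all four lanes.
    int[4] addLanes(int[4] a, int[4] b)
    {
        int4 va, vb;
        va.array = a;          // copy the lanes into vector registers
        vb.array = b;
        int4 sum = va + vb;    // native SIMD add
        return sum.array;
    }
}
else
{
    enum usingSimd = false;

    // Default branch: the slow scalar workaround, selected on purpose.
    int[4] addLanes(int[4] a, int[4] b)
    {
        int[4] r;
        foreach (i; 0 .. 4)
            r[i] = a[i] + b[i];
        return r;
    }
}

void main()
{
    int[4] a = [1, 2, 3, 4];
    int[4] b = [10, 20, 30, 40];
    writeln(usingSimd ? "SIMD path: " : "scalar fallback: ", addLanes(a, b));
}
```

Both branches give the same result; the difference is that falling back to
the loop is visible in the source rather than hidden in the code generator.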
2nd try: core.bitop.popcnt is a "workaround" for a missing popcnt
instruction. LDC provides an intrinsic for popcnt, but this is
lowered to the "workaround" code if the popcnt instruction is not
available. If we apply the same rules then this is verboten.
The workaround code for popcnt isn't 100x slower.
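
For a sense of scale, here is a sketch of a typical software popcount. This
is the classic bit-twiddling version, written for illustration; it is not
claimed to be the exact code druntime or LDC emits, and softPopcnt is a
made-up name. It costs about a dozen simple integer operations in place of
one instruction:

```d
import core.bitop : popcnt;
import std.stdio : writeln;

// Portable population count for a 32-bit value (illustrative fallback).
int softPopcnt(uint x) pure nothrow @nogc @safe
{
    x = x - ((x >> 1) & 0x5555_5555);                 // counts per 2-bit pair
    x = (x & 0x3333_3333) + ((x >> 2) & 0x3333_3333); // counts per nibble
    x = (x + (x >> 4)) & 0x0F0F_0F0F;                 // counts per byte
    return cast(int)((x * 0x0101_0101) >> 24);        // sum the four bytes
}

void main()
{
    uint value = 0b1011_0001;
    // Compare the sketch against the library routine.
    writeln(softPopcnt(value), " ", popcnt(value));
}
```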