On 4/2/2013 10:58 PM, Kai Nacke wrote:

While I understand your argument, I still feel a bit uncomfortable with it. It creates a situation in which you can't tell me whether a D program will compile by reading the source unless I tell you the target architecture. I think this is something really new.
The current situation is that conditional compilation is used if interfaces etc. are not globally available. This principle is now broken by the "invisible" rules which determine the availability of vector operations.

Performant SIMD code is simply not portable between architectures. The programmer writing SIMD code ought to be guaranteed he's getting SIMD code, not workaround code that is 100x slower. I view a compiler error as being far more visible than silently generating unacceptably slow code.




My approach would be to define the following: if D_SIMD is defined, then only the optimal vector operations are available. This ensures your goal of generating code with optimal performance. If D_SIMD is not defined but a vendor-specific SIMD implementation is available, then the rules of that implementation hold (which may include generation of "workaround" code). This has the advantage of being explicit:

    version(D_SIMD)
    {
        uint4 w = ...;
        uint4 v = w << 1;
    }
    else version(XYZ_SIMD)
    {
        uint4 w = ..., x = ...;
        // Not allowed by DMD; only fast on AltiVec
        uint4 v = x << w;
    }

Or am I missing something?

All the programmer really needs to do is use a version statement on the architecture for the SIMD code for that architecture, and then have a default with the workaround code. The point is that he *knowingly* selects the slow workaround code. This is critical for a systems programming language, where programmers writing SIMD code are not always experts at dumping the compiler output to see what was generated.
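
A rough sketch of that pattern, to make it concrete. The function name add4 and the choice of X86_64 as the SIMD target are assumptions made up for illustration, not anything from the compiler or druntime; the only library piece used is float4 from core.simd.

```d
// Hypothetical helper: add two groups of four floats.
version (X86_64)
{
    import core.simd : float4;

    // SIMD path, selected explicitly for this architecture.
    float[4] add4(float[4] a, float[4] b)
    {
        float4 va, vb;
        va.array = a;         // copy the scalars into the vector
        vb.array = b;
        float4 vr = va + vb;  // one vector add
        return vr.array;
    }
}
else
{
    // Default path: the programmer knowingly accepts scalar code here.
    float[4] add4(float[4] a, float[4] b)
    {
        float[4] r;
        foreach (i; 0 .. 4)
            r[i] = a[i] + b[i];
        return r;
    }
}
```

Both branches have the same signature, so callers do not care which one was compiled in, but anyone reading the source can see that the fallback exists and that it is scalar.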

2nd try: core.bitop.popcnt is a "workaround" for a missing popcnt instruction. LDC provides an intrinsic for popcnt, but it is lowered to the "workaround" code if the popcnt instruction is not available. If we apply the same rules, then this is verboten.

The workaround code for popcnt isn't 100x slower.
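
For a sense of scale, here is a sketch of the usual bit-parallel software popcount. The name softPopcnt is made up for this example and the code is not claimed to be what druntime or LDC actually emit; it only shows that a fallback of this kind is a short, fixed sequence of shifts, masks, and one multiply.

```d
import core.bitop : popcnt;

// Hypothetical software fallback: count the set bits in a 32-bit value.
int softPopcnt(uint x)
{
    x = x - ((x >> 1) & 0x5555_5555);                  // 2-bit partial sums
    x = (x & 0x3333_3333) + ((x >> 2) & 0x3333_3333);  // 4-bit partial sums
    x = (x + (x >> 4)) & 0x0F0F_0F0F;                  // 8-bit partial sums
    return cast(int)((x * 0x0101_0101) >> 24);         // add the four bytes
}
```

It should agree with core.bitop.popcnt for any input, for example softPopcnt(0b1011) and popcnt(0b1011u) both give 3.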