On 25 October 2012 01:00, bearophile <bearophileHUGS@lycos.com> wrote:
Manu:


The compiler would have to do some serious magic to optimise that;
flattening both sides of the if into parallel expressions, and then applying the mask to combine...

I think it's a small amount of magic.

The simple features shown in that paper are fully focused on SIMD programming, so they don't introduce anything that is clearly inefficient.


I'm personally not in favour of SIMD constructs that are anything less than
optimal (but I appreciate I'm probably in the minority here).


(The simple benchmarks of the paper show a 5-15% performance loss compared
to handwritten SIMD code.)


Right, as I suspected.

A 15% loss is very small if the programmer's alternative is writing scalar code that is 2 or 3 times slower :-)

The SIMD programmers who can't stand a 1% loss of performance use the intrinsics manually (or write in asm) and ignore everything else.

A much larger population of system programmers wish to use modern CPUs efficiently, but they don't have the time (or the skill, which means their programs are too often buggy) for assembly-level programming. Currently they use smart numerical C++ libraries, use modern Fortran versions, and/or write scalar C/C++ (or Fortran) code, add "restrict" annotations, and take a look at the produced asm hoping the modern compiler back-ends will vectorize it. This is not good enough, and it's far from a 15% loss.

This paper shows a third way, making this kind of programming simpler and approachable for a wider audience, with a small performance loss compared to handwritten code. This is what language designers have been doing for 60+ years :-)

I don't disagree with you, it is fairly cool!
I can't imagine D adopting those sorts of language features any time soon, but it's probably possible.
I guess the keys are defining the bool vector concept, and some tech to flatten both sides of a vector if statement, but that's far from simple... Particularly so if someone puts some unrelated code in those if blocks.
Chances are it offers too much freedom that wouldn't be well used or understood by the average programmer, and that still leaves it particularly worthwhile only in the hands of a fairly advanced/competent user.
The main error that most people make is thinking SIMD code is faster by nature. Truth is, in the hands of someone who doesn't know precisely what they're doing, SIMD code is almost always slower.
There are some cool new expressions offered here, fairly convenient (although easy[er?] to write in other ways too), but I don't think it would likely change that fundamental premise for the average programmer beyond some very simple parallel constructs that the compiler can easily get right.
I'd certainly love to see it, but is it realistic that someone would take the time to do all of that any time soon when the benefits are controversial? It may even open the possibility for unskilled people to write far worse code.

Let's consider your example above, for instance. I would rewrite it like this (given existing syntax):

// vector length of context = 1; current_mask = T
int4 v = [0,3,4,1];
int4 w = 3; // [3,3,3,3] via broadcast
uint4 m = maskLess(v, w); // [T,F,F,T] (T == ones, F == zeroes)
v += int4(1); // [1,4,5,2]

// the if block is trivially rewritten:
int4 trueSide = v + int4(2);
int4 falseSide = v + int4(3);
v = select(m, trueSide, falseSide); // [3,7,8,4]


Or the whole thing further simplified:
int4 v = [0,3,4,1];
int4 w = 3; // [3,3,3,3] via broadcast

// one convenient function does the comparison and select accordingly
v = selectLess(v, w, v + int4(1 + 2), v + int4(1 + 3)); // combine the prior few lines
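For anyone who wants to check the lane values in the comments above, here is a minimal scalar sketch in plain D. It assumes nothing about a real SIMD library: Int4 and Mask4 are just static arrays, and maskLess, select and selectLess are hypothetical stand-ins written out lane by lane, so it only illustrates the semantics, not the performance.

```d
import std.stdio : writeln;

alias Int4  = int[4];   // plain static array standing in for a SIMD int4 (assumption)
alias Mask4 = uint[4];  // all ones = true lane, all zeroes = false lane

// Hypothetical helper: per-lane a < b, producing a full-width mask.
Mask4 maskLess(Int4 a, Int4 b)
{
    Mask4 m;
    foreach (i; 0 .. 4)
        m[i] = a[i] < b[i] ? uint.max : 0u;
    return m;
}

// Hypothetical helper: bitwise blend, taking t where the mask is set and f elsewhere.
Int4 select(Mask4 m, Int4 t, Int4 f)
{
    Int4 r;
    foreach (i; 0 .. 4)
        r[i] = cast(int)((t[i] & m[i]) | (f[i] & ~m[i]));
    return r;
}

// Hypothetical helper: comparison and select in one call.
Int4 selectLess(Int4 a, Int4 b, Int4 t, Int4 f)
{
    return select(maskLess(a, b), t, f);
}

// The step-by-step version: mask first, increment, then blend both sides.
Int4 stepwise()
{
    Int4 v = [0, 3, 4, 1];
    Int4 w = 3;                 // broadcast: [3,3,3,3]
    Mask4 m = maskLess(v, w);   // [T,F,F,T]
    v[] += 1;                   // [1,4,5,2]
    Int4 trueSide, falseSide;
    trueSide[]  = v[] + 2;
    falseSide[] = v[] + 3;
    return select(m, trueSide, falseSide);
}

// The simplified version: the increment is folded into both sides.
Int4 fused()
{
    Int4 v = [0, 3, 4, 1];
    Int4 w = 3;
    Int4 t, f;
    t[] = v[] + (1 + 2);
    f[] = v[] + (1 + 3);
    return selectLess(v, w, t, f);
}

void main()
{
    writeln(stepwise());
    writeln(fused());
}
```

Both functions should produce the same four lanes, which is the point of the simplification.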

I actually find this more convenient. I also find the if syntax you demonstrate to be rather deceptive and possibly misleading. 'if' suggests a branch, whereas the construct you demonstrate will evaluate both sides every time. Inexperienced programmers may not really grasp that. Evaluating the true side and the false side inline, and then performing the select serially, is more honest; it's actually what the computer will do, and I don't really see it being particularly less convenient either.
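
To make the "both sides every time" point concrete, here is a rough scalar sketch in D. The names (evaluations, sideWork, branchy, blended) are all made up for illustration, and the counter tallies per-lane work rather than vector instructions, so treat it as a model of the cost and not a benchmark.

```d
import std.stdio : writeln;

int evaluations; // counts per-lane evaluations of either side (illustrative only)

// Stand-in for whatever work sits inside one side of the 'if'.
int sideWork(int x, int add)
{
    ++evaluations;
    return x + add;
}

// Scalar branch: each element runs exactly one side.
int[4] branchy(int[4] v, int w)
{
    evaluations = 0;
    int[4] r;
    foreach (i; 0 .. 4)
    {
        if (v[i] < w)
            r[i] = sideWork(v[i], 3);
        else
            r[i] = sideWork(v[i], 4);
    }
    return r;
}

// SIMD-style: both sides are computed for every lane, then blended by the mask.
int[4] blended(int[4] v, int w)
{
    evaluations = 0;
    int[4] t, f, r;
    foreach (i; 0 .. 4) t[i] = sideWork(v[i], 3); // true side, all lanes
    foreach (i; 0 .. 4) f[i] = sideWork(v[i], 4); // false side, all lanes
    foreach (i; 0 .. 4) r[i] = v[i] < w ? t[i] : f[i]; // the select
    return r;
}

void main()
{
    writeln(branchy([0, 3, 4, 1], 3), " lane evaluations: ", evaluations);
    writeln(blended([0, 3, 4, 1], 3), " lane evaluations: ", evaluations);
}
```

The results match, but the blended form does the work of both sides for every lane. That is cheap when each side is a single vector op, and much less so when someone puts unrelated or expensive code in those blocks.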