CPUs with 4-8 or more cores will be around soon, so pure functions are useful, but "simpler" forms of parallel processing are useful too. OpenMP syntax is not easy, while the syntax of Intel Ct (a complex set of libraries for C++) is very short, and to me it looks nice enough:
More info: http://techresearch.intel.com/articles/Tera-Scale/1514.htm
It contains a few functions that allow things like:
sumReduce([1, 2, 3, 4]) = 10
sumReduce([[1, 2], [3, 4, 5]]) = [3, 12]
sumReduce([(1 -> 1), (2 -> 1), (1-> 2)]) = [(1->3), (2->1)]
Pack([a, b, c, d, e, f], [0, 1, 1, 0, 1, 0]) = [b, c, e]
ShiftRight([a, b, c, d, e, f], 1, i) = [i, a, b, c, d, e]
RotateLeft([a, b, c, d, e, f], 2) = [c, d, e, f, a, b]
Note that they allow much more than the + - / * operations among arrays found in the D specs: they work on many kinds of collections, associative arrays too.
Such things may be written as user-level code, but they require better inlining and the use of SIMD instructions by the compiler; and to be used well they may enjoy some syntactic sugar too (Intel Ct already has almost enough sugar).
Such things are already built into the syntax of languages like Sisal, Parallel Pascal, and the future Fortress. They don't solve all parallel processing problems, but they make it possible to solve some numerically intensive ones.
Soon many different forms of parallel processing will become essential for any high-performance language.