Not auto-vectorization

May 22, 2012

bearophile

May 22, 2012

Denis Shelomovskij

May 22, 2012

Andrew Wiley

May 22, 2012

Martin Nowak

On Reddit they have linked an article that shows auto-vectorization in GCC 4.7:

http://locklessinc.com/articles/vectorize/
http://www.reddit.com/r/programming/comments/tz6ml/autovectorization_with_gcc_47/

GCC is good: it knows many tricks and contains a lot of pattern-matching code and other machinery to make such vectorizations possible. The C code in the article is almost transparent and standard (`restrict` is standard, and I think `__builtin_assume_aligned` isn't too hard to `#define` away when it's not available. Something like `-ffast-math` is available on most compilers, even though Walter doesn't like it). So it's good for optimizing legacy C code too.

But this article also shows why such a strategy is not usable for serious purposes. If small changes risk turning off such major optimizations, you can't rely on them much. More generally, writing low-level code and hoping the compiler recovers the high-level semantics of the code is a bit ridiculous. It's far better to express those semantics directly, in a standard way that's understood by all compilers of a language (all the more so because the code shown in that article has very simple semantics).

How is the development of the D SIMD ops going? Are those efforts (maybe with the help of another higher-level Phobos lib) going to avoid the silly problems shown in that article?

Bye,
bearophile
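For readers who have not opened the linked article, here is a minimal sketch of the kind of C code the post describes: `restrict` plus an alignment hint, with the hint defined away on compilers that lack it. The macro name `ASSUME_ALIGNED`, the function name `add_arrays`, and the 16-byte alignment are assumptions made for this illustration and are not taken from the article.

```c
#include <stddef.h>

/* __builtin_assume_aligned appeared in GCC 4.7. On anything else the macro
   (a hypothetical name) simply yields the pointer unchanged, so the code
   still compiles and only loses the alignment hint. */
#if defined(__GNUC__) && \
    (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7))
#  define ASSUME_ALIGNED(p, n) __builtin_assume_aligned((p), (n))
#else
#  define ASSUME_ALIGNED(p, n) (p)
#endif

/* dst[i] += src[i] for i in [0, n).
   restrict promises the two arrays do not overlap.
   Assumption of this sketch: the caller passes 16-byte-aligned pointers. */
void add_arrays(float *restrict dst, const float *restrict src, size_t n)
{
    float *d = ASSUME_ALIGNED(dst, 16);
    const float *s = ASSUME_ALIGNED(src, 16);

    for (size_t i = 0; i < n; i++)
        d[i] += s[i];
}
```

Whether a compiler actually vectorizes this loop depends on its version and flags, which is the fragility the post complains about: the hints are only requests, and nothing in the source fails if the optimization silently stops applying.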

22.05.2012 22:52, bearophile wrote:
> How is the development of the D SIMD ops going? Are those efforts (maybe
> with the help of another higher level Phobos lib) going to avoid the
> silly problems shown in that article?

So the question is: do we need an `aligned(T)(T)` function? It could work like the current `scoped` implementation, e.g.:
https://github.com/D-Programming-Language/phobos/pull/570/files#L0R3096

Thoughts?

--
Денис В. Шеломовский
Denis V. Shelomovskij

May 22, 2012

Re: Not auto-vectorization

Posted by Andrew Wiley
in reply to bearophile


On Tue, May 22, 2012 at 1:52 PM, bearophile <bearophileHUGS@lycos.com> wrote:

> On Reddit they have linked an article that shows auto vectorization in GCC 4.7:
>
> http://locklessinc.com/articles/vectorize/
>
> http://www.reddit.com/r/programming/comments/tz6ml/autovectorization_with_gcc_47/
>
> GCC is good: it knows many tricks and contains a lot of pattern-matching code and other machinery to make such vectorizations possible. The C code in the article is almost transparent and standard (`restrict` is standard, and I think `__builtin_assume_aligned` isn't too hard to `#define` away when it's not available. Something like `-ffast-math` is available on most compilers, even though Walter doesn't like it). So it's good for optimizing legacy C code too.
>
> But this article also shows why such a strategy is not usable for serious purposes. If small changes risk turning off such major optimizations, you can't rely on them much. More generally, writing low-level code and hoping the compiler recovers the high-level semantics of the code is a bit ridiculous. It's far better to express those semantics directly, in a standard way that's understood by all compilers of a language (all the more so because the code shown in that article has very simple semantics).
>

This is also why building a compiler that outputs C is a bad idea. Performance inevitably suffers because the C output must have the same or tighter semantic requirements than the input code, which makes high-level optimizations more difficult.

> GCC is good: it knows many tricks and contains a lot of pattern-matching code and other machinery to make such vectorizations possible. The C code in the article is almost transparent and standard (`restrict` is standard, and I think `__builtin_assume_aligned` isn't too hard to `#define` away when it's not available. Something like `-ffast-math` is available on most compilers, even though Walter doesn't like it). So it's good for optimizing legacy C code too.

I was really surprised that all vectorization approaches seem to be restricted to loops. I'd think that loop unrolling plus arithmetic vectorization should achieve most of what a specialized loop vectorization does.

http://forum.dlang.org/post/jf1s30$14mj$1@digitalmars.com
https://github.com/D-Programming-Language/phobos/blob/master/std/numeric.d#L2329
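To make the "unrolling plus arithmetic vectorization" idea concrete, here is a rough sketch in C of a loop unrolled by hand. The function name `saxpy_unrolled` and the unroll factor of four are assumptions chosen for illustration. After unrolling, the loop body is four independent, identical operations on adjacent elements, which is the straight-line pattern a non-loop vectorizer would have to recognize and merge into one 4-wide operation.

```c
#include <stddef.h>

/* y[i] += a * x[i] for i in [0, n), unrolled by four (illustrative only). */
void saxpy_unrolled(float a, const float *restrict x,
                    float *restrict y, size_t n)
{
    size_t i = 0;

    /* Main body: four adjacent elements per iteration. The statements do
       not depend on each other, so they could become one vector
       multiply-add if the compiler spots the pattern. */
    for (; i + 4 <= n; i += 4) {
        y[i + 0] += a * x[i + 0];
        y[i + 1] += a * x[i + 1];
        y[i + 2] += a * x[i + 2];
        y[i + 3] += a * x[i + 3];
    }

    /* Remainder: up to three leftover elements, handled one at a time. */
    for (; i < n; i++)
        y[i] += a * x[i];
}
```

GCC exposes a pass aimed at this kind of straight-line code through `-ftree-slp-vectorize`, but whether it fires on a given function is not guaranteed, so this sketch shows the shape of the input and makes no claim about the generated code.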
