Thread overview
BLADE 0.2Alpha: Vector operations with mixins, expression templates, and asm
Apr 03, 2007
Don Clugston
Apr 03, 2007
Walter Bright
Apr 04, 2007
Don Clugston
Apr 04, 2007
Georg Wrede
Apr 04, 2007
Walter Bright
Apr 05, 2007
David B. Held
Apr 05, 2007
Walter Bright
Re: BLADE 0.2Alpha: Vector operations with mixins, expression templates,
Apr 05, 2007
Dan
Apr 08, 2007
janderson
Apr 05, 2007
Don Clugston
Apr 05, 2007
Georg Wrede
From http://www.ddj.com/authors.htm
Apr 05, 2007
Georg Wrede
Apr 05, 2007
Walter Bright
Re: BLADE 0.2Alpha: Vector operations with mixins, expression templates,
Apr 05, 2007
Dan
Apr 05, 2007
Georg Wrede
Apr 06, 2007
Bill Baxter
Apr 06, 2007
Don Clugston
Apr 03, 2007
Walter Bright
Apr 03, 2007
Pragma
Apr 07, 2007
Bruno Medeiros
Apr 09, 2007
Pragma
Re: BLADE 0.2Alpha: Vector operations with mixins, expression templates,
Apr 09, 2007
Dan
Apr 09, 2007
Dave
Apr 09, 2007
Pragma
Apr 10, 2007
Paul Findlay
Apr 10, 2007
Dan
Apr 09, 2007
Don Clugston
Apr 09, 2007
Pragma
Apr 10, 2007
Don Clugston
Apr 10, 2007
Walter Bright
Apr 10, 2007
Don Clugston
Apr 10, 2007
Walter Bright
Apr 10, 2007
Pragma
Apr 10, 2007
Pragma
Apr 10, 2007
KlausO
Apr 10, 2007
Pragma
Apr 10, 2007
KlausO
Apr 10, 2007
Pragma
Apr 10, 2007
BCS
Apr 10, 2007
Walter Bright
Apr 03, 2007
Witold Baryluk
Apr 04, 2007
Don Clugston
Apr 04, 2007
Davidl
Apr 05, 2007
Jascha Wetzel
Apr 06, 2007
Mikola Lysenko
Apr 06, 2007
torhu
Apr 06, 2007
Bill Baxter
Apr 06, 2007
Mikola Lysenko
April 03, 2007
I have been trying to come up with a convincing use case for the new mixins (and for metaprogramming in general). My best effort to date can be found at:
http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blade.d

It generates near-optimal x87 asm code for BLAS1-style basic vector operations. 32-, 64-, and 80-bit vectors are all supported.

Compile with -version=BladeDebug to see the asm code which is generated.

Typical usage:

void main()
{
    auto p = Vec([1.0L, 2, 18]);    // a vector of 80-bit reals.
    auto q = Vec([3.5L, 1.1, 3.8]);  // ditto
    auto r = Vec([17.0f, 28.25, 1]); // a vector of 32-bit floats
    auto z = Vec([17.0i, 28.1i, 1i]); // a vector of 64-bit idoubles
    real d = dot(r, p+r+r);
    ireal e = dot(r, z);
    q -= ((r+p)*18.0L*314.1L - (p-r))* 35;
    d = dot(r, p+r+r);
}

Notice that mixed-precision operations (real[] + float[] - double[]) are supported.

Like the C++ Blitz++ library, expression templates are used to convert vector expressions into efficient element-wise operations. Unlike that library, however, there is no reliance on the compiler's optimiser. Instead, the expression template is manipulated as text, converted into postfix, and then passed to a simple CTFE compile-time assembler, which creates highly efficient asm code which is used as a mixin.
To understand the later parts of the code, you need some knowledge of x87 assembler. In fact, you probably need to have read Agner Fog's superb Pentium optimisation manual (www.agner.org).
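To make that concrete, here is a minimal sketch of the mixin part (not Blade's actual code; makeLoop and vecAdd are made-up names). A CTFE function returns the loop body as a string, and a mixin of that string forms the entire body of the function that eventually runs:

char[] makeLoop(char[] op)
{
    // Built entirely at compile time (CTFE): the element-wise loop as a string.
    return "for (size_t i = 0; i < dest.length; ++i)"
         ~ " dest[i] = a[i] " ~ op ~ " b[i];";
}

void vecAdd(real[] dest, real[] a, real[] b)
{
    // The mixin is the whole body of this function.
    mixin(makeLoop("+"));
}

Blade does the same thing one level up: the string it mixes in is an asm block built from the postfix form of the whole expression.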

Some observations:
* I was amazed at how simple the expression template code is (it is somewhat cluttered by the code to check for real/imaginary type mismatch errors).
* I've often read that the x87 floating-point stack is notoriously difficult for compilers to write code for, but it works quite well in this case.
* The major workarounds are:
- inability to use a tuple element directly from asm code (bug #1028);
- inability to define operators for built-in arrays (hence the use of 'Vec' wrappers).
- inability to index through a tuple in a CTFE function (solved by converting types into a string).
* There have been mutterings about how unhygienic/dangerous the new mixins are. In this case, the mixin forms the _entire_ body of the function. This is an interesting situation which I think a language purist will find more palatable.

Enjoy.
April 03, 2007
Don Clugston wrote:
> I have been trying to come up with a convincing use case for the new mixins (and for metaprogramming in general). My best effort to date can be found at:
> http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blade.d

This is pretty incredible.

Can I beg you to write an article about it?
April 03, 2007
Don Clugston wrote:
> * I've often read that the x87 floating-point stack is notoriously difficult for compilers to write code for, but it works quite well in this case.

The hard part is doing register allocation. Generating code for the stack machine is trivial. There are some complications for dealing with 87 stack overflows.
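
As a sketch of why the stack-machine part is the easy part: each postfix token maps directly onto one x87 instruction. (Illustrative only; loadOperand is a hypothetical placeholder, operand addressing is elided, and the eight-slot stack limit is ignored.)

char[] loadOperand(char c)
{
    // Placeholder: the real generator emits an fld through the right
    // array pointer / tuple element here.
    return "// fld <operand " ~ c ~ ">\n";
}

char[] postfixToX87(char[] postfix)
{
    char[] code;
    for (int i = 0; i < postfix.length; ++i)
    {
        char c = postfix[i];
        if (c >= 'a' && c <= 'z')
            code ~= loadOperand(c);          // push the operand onto ST
        else if (c == '+')
            code ~= "faddp ST(1), ST;\n";    // ST(1) += ST(0), pop
        else if (c == '*')
            code ~= "fmulp ST(1), ST;\n";    // ST(1) *= ST(0), pop
        else if (c == '-')
            code ~= "fsubp ST(1), ST;\n";    // ST(1) -= ST(0), pop
    }
    return code;                             // the result is left in ST(0)
}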
April 03, 2007
Don Clugston wrote:
> I have been trying to come up with a convincing use case for the new mixins (and for metaprogramming in general). My best effort to date can be found at:
> http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blade.d
> [...]

This is a work of art, Don - it's practically a compiler extension. Nice job. :)

For others that are interested in how this actually gets the job done, I found this in your documentation:

* THEORY:
* Expression templates are used to create an expression string of the form "(a+b*c)+d"
* and a tuple, the entries of which correspond to a, b, c, d, ...
* This string is converted to postfix. The postfix string is converted to
* a string containing x87 asm, which is then mixed into a function which accepts the tuple.
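
For illustration, the "converted to postfix" step could look roughly like this CTFE function (a sketch only; toPostfix, its single-letter operands, and the operator handling are hypothetical, not the routine Blade actually uses):

int prec(char c)
{
    // * and / bind tighter than + and -
    return (c == '*' || c == '/') ? 2 : 1;
}

char[] toPostfix(char[] expr)
{
    // Shunting-yard over a well-formed expression of single-letter operands.
    char[] output;
    char[] stack;
    for (int i = 0; i < expr.length; ++i)
    {
        char c = expr[i];
        if (c >= 'a' && c <= 'z')
            output ~= c;                    // operands go straight to the output
        else if (c == '(')
            stack ~= c;
        else if (c == ')')
        {
            while (stack[$ - 1] != '(')     // pop back to the matching '('
            {
                output ~= stack[$ - 1];
                stack = stack[0 .. $ - 1];
            }
            stack = stack[0 .. $ - 1];      // discard the '('
        }
        else                                // an operator: + - * /
        {
            while (stack.length && stack[$ - 1] != '('
                && prec(stack[$ - 1]) >= prec(c))
            {
                output ~= stack[$ - 1];
                stack = stack[0 .. $ - 1];
            }
            stack ~= c;
        }
    }
    while (stack.length)                    // flush the remaining operators
    {
        output ~= stack[$ - 1];
        stack = stack[0 .. $ - 1];
    }
    return output;
}

static assert(toPostfix("(a+b*c)+d") == "abc*+d+");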

-- 
- EricAnderton at yahoo
April 03, 2007
Hello,

Very good work. I'm impressed. Actually I was developing a very
similar program, but yours is practically all I needed.
I'm developing a linear algebra package in D, and now I'm optimising
it for vector machines, so your BLAS1-like package will be
helpful. :)

Thx.

-- 
Witold Baryluk
April 04, 2007
I think compilers now face a great challenge from compile-time back ends with power like this.

> I have been trying to come up with a convincing use case for the new mixins (and for metaprogramming in general). My best effort to date can be found at:
> http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blade.d
> [...]

April 04, 2007
Witold Baryluk wrote:
> Hello,
> 
> Very good work. I'm impressed. Actually I was developing a very
> similar program, but yours is practically all I needed.
> I'm developing a linear algebra package in D, and now I'm optimising
> it for vector machines, so your BLAS1-like package will be
> helpful. :)

Do you intend to implement BLAS2 and BLAS3-like functionality? I feel that I don't know enough about cache-efficient matrix blocking techniques to be confident in presenting code. I don't know how sophisticated the techniques are in libraries like ATLAS, but the ones used in Blitz++ should be easy to compete with.

Blade is still mostly proof-of-concept rather than being industrial-strength -- there are so many possibilities for improvement, it's not well tested, and I haven't even put any effort into making the code look nice. But as Davidl said, it shows that hard-core back-end optimisation can now be done in a library at compile time.

The code is quite modularised, so it would be straightforward to add a check for 'is it SSE-able' (i.e., are all the vectors floats) and 'is it SSE2-able' (are all the vectors doubles), and if so, send them off to specialised assemblers, otherwise send them to the x87 one. The postfixing step will involve searching for *+ combinations where FMA can be used. The ability to know the vector length in many cases can be a huge advantage for SSE code, where loop unrolling by a factor of 2 or 4 is always necessary. I haven't got that far yet.
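
As a sketch of what that dispatch could look like (the type-code letters and the isSSEable/isSSE2able/*Assembler names are all hypothetical, not Blade's): since the operand types are already flattened into a string, the checks are CTFE one-liners.

bool allOfType(char[] typecodes, char t)
{
    // true if every operand in the expression has element type 't'
    for (int i = 0; i < typecodes.length; ++i)
        if (typecodes[i] != t) return false;
    return true;
}

bool isSSEable(char[] typecodes)  { return allOfType(typecodes, 'f'); } // all float
bool isSSE2able(char[] typecodes) { return allOfType(typecodes, 'd'); } // all double

// At the point where the asm is generated, roughly:
//   static if (isSSEable(typecodes))       mixin(sseAssembler(postfix));
//   else static if (isSSE2able(typecodes)) mixin(sse2Assembler(postfix));
//   else                                   mixin(x87Assembler(postfix));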

I'm pretty sure that this type of technique can blow almost everything else out of the water. <g>
April 04, 2007
Walter Bright wrote:
> Don Clugston wrote:
>> I have been trying to come up with a convincing use case for the new mixins (and for metaprogramming in general). My best effort to date can be found at:
>> http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blade.d
> 
> This is pretty incredible.

I thought you'd like it. The sheer power of D these days...
"Flies like a rocket, steers like a bicycle."

> Can I beg you to write an article about it?

OK. Are you thinking somewhere like DDJ?

It's nicely focused (central idea being something like "D's metaprogramming is so powerful, that compiler back-end optimisation can easily be performed in compile-time libraries") -- but it would require loads of explanation of D's coolest features. Could be a challenge to make it easily readable.

But with the proposed changes to 'const', macros, and previous discussion about whether tuples should be automatically flattened, I have doubts over how stable that part of the language is. I wouldn't want to write something that didn't compile any more, or looked hopelessly obsolete by the time it was published.
I've been bitten before -- CTFE blasted almost all of my previous metaprogramming code into the stone age <g>. What's the shelf life of this stuff?
April 04, 2007
Don Clugston wrote:
> Could be a challenge to make it easily readable.

I've some experience proofreading and commenting similar articles. Only too happy to oblige!

Also, DDJ readers may not be the CUJ crowd, but they can still handle advanced stuff. Besides, some such magazines help you by rewriting the article anyway, so that it has the same overall "feel" as their other articles. Articles in SciAm, for example, feel the same no matter who the author is or what continent they come from.
April 04, 2007
Don Clugston wrote:
> Walter Bright wrote:
>> Can I beg you to write an article about it?
> 
> OK. Are you thinking somewhere like DDJ?

Anywhere!

> It's nicely focused (central idea being something like "D's metaprogramming is so powerful, that compiler back-end optimisation can easily be performed in compile-time libraries") -- but it would require loads of explanation of D's coolest features. Could be a challenge to make it easily readable.
> 
> But with the proposed changes to 'const', macros, and previous discussion about whether tuples should be automatically flattened, I have doubts over how stable that part of the language is. I wouldn't want to write something that didn't compile any more, or looked hopelessly obsolete by the time it was published.
> I've been bitten before -- CTFE blasted almost all of my previous metaprogramming code into the stone age <g>. What's the shelf life of this stuff?

Good question. I've already fixed #1028. But even if it looks stone age 6 months from now, it's still a decade ahead of any other language.