April 03, 2007
BLADE 0.2Alpha: Vector operations with mixins, expression templates, and asm
I have been trying to come up with a convincing use case for the new 
mixins (and for metaprogramming in general). My best effort to date can 
be found at:
http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blade.d

It generates near-optimal x87 asm code for BLAS1-style basic vector 
operations. 32, 64 and 80 bit vectors are all supported.

Compile with -version=BladeDebug to see the asm code which is generated.

Typical usage:

void main()
{
    auto p = Vec([1.0L, 2, 18]);    // a vector of 80-bit reals.
    auto q = Vec([3.5L, 1.1, 3.8]);  // ditto
    auto r = Vec([17.0f, 28.25, 1]); // a vector of 32-bit floats
    auto z = Vec([17.0i, 28.1i, 1i]); // a vector of 64-bit idoubles
    real d = dot(r, p+r+r);
    ireal e = dot(r, z);
    q -= ((r+p)*18.0L*314.1L - (p-r))* 35;
    d = dot(r, p+r+r);
}

Notice that mixed-precision operations (real[] + float[] - double[]) are 
supported.

Like the C++ Blitz++ library, expression templates are used to convert 
vector expressions into efficient element-wise operations. Unlike that 
library, however, there is no reliance on the compiler's optimiser. 
Instead, the expression template is manipulated as text, converted into 
postfix, and then passed to a simple CTFE compile-time assembler, which 
creates highly efficient asm code that is used as a mixin.
To understand the later parts of the code, you need some knowledge of 
x87 assembler. In fact, you probably need to have read Agner Fog's 
superb Pentium optimisation manual (www.agner.org).
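
For illustration, here is a minimal sketch of just the string-mixin step, 
in present-day D syntax (the names are hypothetical, and it emits ordinary 
D rather than the x87 asm Blade produces): a CTFE-able function builds the 
loop body as a string, and mixin() compiles it into the surrounding 
function.

// Hypothetical sketch only: generate an element-wise loop body as a
// string at compile time, then mix it in.
string elementwiseLoop(string expr)
{
    return "foreach (i; 0 .. result.length) { result[i] = " ~ expr ~ "; }";
}

double[] addVectors(double[] a, double[] b)
{
    auto result = new double[a.length];
    mixin(elementwiseLoop("a[i] + b[i]"));  // built and compiled at compile time
    return result;
}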

Some observations:
* I was amazed at how simple the expression template code is (it is 
somewhat cluttered by the code to check for real/imaginary type mismatch 
errors).
* I've often read that the x87 floating-point stack is notoriously 
difficult for compilers to write code for, but it works quite well in 
this case.
* The major workarounds are:
- inability to use a tuple element directly from asm code (bug #1028);
- inability to define operators for built-in arrays (hence the use of 
'Vec' wrappers; a rough sketch follows below);
- inability to index through a tuple in a CTFE function (solved by 
converting types into a string).
* There have been mutterings about how unhygienic/dangerous the new 
mixins are. In this case, the mixin forms the _entire_ body of the 
function. This is an interesting situation which I think a language 
purist will find more palatable.
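
As a rough illustration of the 'Vec' wrapper point, here is a hypothetical 
sketch in present-day operator syntax (opBinary rather than the opAdd-style 
operators D had in 2007). The real wrapper builds an expression template 
here instead of computing eagerly:

struct Vec
{
    real[] data;

    // Eager version, for illustration only: the expression-template
    // version returns an expression object instead, so that the whole
    // expression can be compiled to asm in one go.
    Vec opBinary(string op)(Vec rhs) if (op == "+" || op == "-")
    {
        auto result = new real[data.length];
        foreach (i, x; data)
            mixin("result[i] = x " ~ op ~ " rhs.data[i];");
        return Vec(result);
    }
}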

Enjoy.
April 03, 2007
Re: BLADE 0.2Alpha: Vector operations with mixins, expression templates, and asm
Don Clugston wrote:
> I have been trying to come up with a convincing use case for the new 
> mixins (and for metaprogramming in general). My best effort to date can 
> be found at:
> http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blade.d

This is pretty incredible.

Can I beg you to write an article about it?
April 03, 2007
Re: BLADE 0.2Alpha: Vector operations with mixins, expression templates, and asm
Don Clugston wrote:
> * I've often read that the x87 floating-point stack is notoriously 
> difficult for compilers to write code for, but it works quite well in 
> this case.

The hard part is doing register allocation. Generating code for the 
stack machine is trivial. There are some complications for dealing with 
x87 stack overflows.
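
As a toy illustration of why the stack-machine side is easy (hypothetical 
code, not Blade's or any compiler's, with the instruction operand forms 
simplified): each postfix token maps directly to one x87 instruction, and 
there is no register allocation to do.

string postfixToX87(string postfix)
{
    string code;
    foreach (c; postfix)
    {
        switch (c)
        {
            case '+': code ~= "fadd;\n"; break;  // combine the top two entries
            case '*': code ~= "fmul;\n"; break;  // likewise, with a multiply
            default:  code ~= "fld " ~ c ~ ";\n"; break;  // push an operand
        }
    }
    return code;
}
// postfixToX87("ab+c*") yields: fld a; fld b; fadd; fld c; fmul;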
April 03, 2007
Re: BLADE 0.2Alpha: Vector operations with mixins, expression templates, and asm
Don Clugston wrote:
> I have been trying to come up with a convincing use case for the new 
> mixins (and for metaprogramming in general). My best effort to date can 
> be found at:
> http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blade.d
> [...]

This is a work of art Don - it's practically a compiler extension.  Nice job. :)

For others that are interested in how this actually gets the job done, I found this in your documentation:

* THEORY:
* Expression templates are used to create an expression string of the form "(a+b*c)+d"
* and a tuple, the entries of which correspond to a, b, c, d, ...
* This string is converted to postfix. The postfix string is converted to
* a string containing x87 asm, which is then mixed into a function which accepts the tuple.
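
A hypothetical sketch of that last step in present-day D (illustrative 
names, plain D instead of x87 asm): the operands travel as a tuple, and the 
function body is mixed in, in its entirety, from a string generated at 
compile time.

import std.conv : to;

// Hypothetical: build "return ops[0] + ops[1] + ...;" at compile time.
string sumOfOperands(size_t n)
{
    string expr = "ops[0]";
    foreach (i; 1 .. n)
        expr ~= " + ops[" ~ to!string(i) ~ "]";
    return "return " ~ expr ~ ";";
}

real addAll(Operands...)(Operands ops)
{
    mixin(sumOfOperands(Operands.length));  // the mixin is the entire body
}

// addAll(1.0L, 2.0L, 3.0L) mixes in "return ops[0] + ops[1] + ops[2];"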

-- 
- EricAnderton at yahoo
April 03, 2007
Re: BLADE 0.2Alpha: Vector operations with mixins, expression templates, and asm
Hello,

Very good work. I'm impressed. Actually I was developing a very
similar program, but yours is practically all I was needing.
I'm developing a linear algebra package in D, and now I'm optimising
it for vector machines, so your BLAS1-like package will be
helpful. :)

Thx.

-- 
Witold Baryluk
April 04, 2007
Re: BLADE 0.2Alpha: Vector operations with mixins, expression templates, and asm
I think compilers will feel a great challenge from a compile-time 
back-end this powerful.

> I have been trying to come up with a convincing use case for the new  
> mixins (and for metaprogramming in general). My best effort to date can  
> be found at:
> http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blade.d
> [...]
April 04, 2007
Re: BLADE 0.2Alpha: Vector operations with mixins, expression templates, and asm
Witold Baryluk wrote:
> Hello,
> 
> Very good work. I'm impressed. Actually I was developing a very
> similar program, but yours is practically all I was needing.
> I'm developing a linear algebra package in D, and now I'm optimising
> it for vector machines, so your BLAS1-like package will be
> helpful. :)

Do you intend to implement BLAS2 and BLAS3-like functionality? I feel 
that I don't know enough about cache-efficient matrix blocking 
techniques to be confident in presenting code. I don't know how 
sophisticated the techniques are in libraries like ATLAS, but the ones 
used in Blitz++ should be easy to compete with.

Blade is still mostly proof-of-concept rather than being 
industrial-strength -- there are so many possibilities for improvement, 
it's not well tested, and I haven't even put any effort into making the 
code look nice. But as Davidl said, it shows that hard-core back-end 
optimisation can now be done in a library at compile time.

The code is quite modularised, so it would be straightforward to add a 
check for 'is it SSE-able' (ie, are all the vectors floats) and 'is it 
SSE2-able' (are all the vectors doubles), and if so, send them off into 
specialised assemblers, otherwise send them to the x87 one. The 
postfix-ing step will involve searching for *+ combinations where fma 
can be used. The ability to know the vector length in many cases can be 
a huge advantage for SSE code, where loop unrolling by a factor of 2 or 
4 is always necessary. I haven't got that far yet.
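
For instance, the 'is it SSE-able' check could be a compile-time test 
along these lines (a rough hypothetical sketch in present-day D syntax, 
not something Blade does yet):

// True when every vector in the tuple is a T[] slice.
template allElementsAre(T, Vectors...)
{
    static if (Vectors.length == 0)
        enum allElementsAre = true;
    else
        enum allElementsAre = is(Vectors[0] : T[])
            && allElementsAre!(T, Vectors[1 .. $]);
}

// allElementsAre!(float,  float[], float[])  -> true:  SSE path
// allElementsAre!(double, double[], float[]) -> false: fall back to x87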

I'm pretty sure that this type of technique can blow almost everything 
else out of the water. <g>
April 04, 2007
Re: BLADE 0.2Alpha: Vector operations with mixins, expression templates, and asm
Walter Bright wrote:
> Don Clugston wrote:
>> I have been trying to come up with a convincing use case for the new 
>> mixins (and for metaprogramming in general). My best effort to date 
>> can be found at:
>> http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blade.d
> 
> This is pretty incredible.

I thought you'd like it. The sheer power of D these days...
"Flies like a rocket, steers like a bicycle."

> Can I beg you to write an article about it?

OK. Are you thinking somewhere like DDJ?

It's nicely focused (central idea being something like "D's 
metaprogramming is so powerful that compiler back-end optimisation can 
easily be performed in compile-time libraries") -- but it would require 
loads of explanation of D's coolest features. Could be a challenge to 
make it easily readable.

But with the proposed changes to 'const', macros, and previous 
discussion about whether tuples should be automatically flattened, I 
have doubts over how stable that part of the language is. I wouldn't 
want to write something that didn't compile any more, or looked 
hopelessly obsolete by the time it was published.
I've been bitten before -- CTFE blasted almost all of my previous 
metaprogramming code into the stone age <g>. What's the shelf life of 
this stuff?
April 04, 2007
Re: BLADE 0.2Alpha: Vector operations with mixins, expression templates, and asm
Don Clugston wrote:
> Could be a challenge to make it easily readable.

I've some experience proofreading and commenting similar articles. Only 
too happy to oblige!

Also, DDJ readers may not be the CUJ crowd, but still, they can handle 
advanced stuff. Besides, some such magazines help you by rewriting the 
article anyway, so that it has the same overall "feel" as their other 
articles. Articles in SciAm, for example, feel the same no matter who 
the author is or what continent they come from.
April 04, 2007
Re: BLADE 0.2Alpha: Vector operations with mixins, expression templates, and asm
Don Clugston wrote:
> Walter Bright wrote:
>> Can I beg you to write an article about it?
> 
> OK. Are you thinking somewhere like DDJ?

Anywhere!

> It's nicely focused (central idea being something like "D's 
> metaprogramming is so powerful that compiler back-end optimisation can 
> easily be performed in compile-time libraries") -- but it would require 
> loads of explanation of D's coolest features. Could be a challenge to 
> make it easily readable.
> 
> But with the proposed changes to 'const', macros, and previous 
> discussion about whether tuples should be automatically flattened, I 
> have doubts over how stable that part of the language is. I wouldn't 
> want to write something that didn't compile any more, or looked 
> hopelessly obsolete by the time it was published.
> I've been bitten before -- CTFE blasted almost all of my previous 
> metaprogramming code into the stone age <g>. What's the shelf life of 
> this stuff?

Good question. I've already fixed #1028. But even if it looks stone age 
6 months from now, it's still a decade ahead of any other language.