April 06, 2007
Excellent work!  I have to say that I've had my misgivings about mixin statements, but I'm glad to see that BLADE and Don's hard work have proven me wrong.  Compile-time vector expressions are desperately needed, and this library goes a long way toward sorting things out.  As far as meta-type tricks go, this is by far the coolest I've ever seen in any programming language.

One very interesting application I would like to see with this technique would be a compile-time geometric algebra library.  If no one writes one by this summer, I will probably do it myself.  (Actually, when I heard the name BLADE, my heart skipped a beat thinking it was a GA library, but alas it is not...)  Maybe it is my unrestrained exuberance, but I would guess that a strong D implementation should easily crush the leading C++ GA library, GAIGEN, in both performance and ease of use.

For reference, here is the GAIGEN homepage:
http://www.science.uva.nl/ga/gaigen/content_gaigen.html

-Mikola Lysenko
April 06, 2007
Mikola Lysenko wrote:
<snip>
> One very interesting application I would like to see with this technique would be a compile time geometric algebra library.  If no one writes one by this summer, I will probably do it myself.  (Actually, when I heard the name BLADE, my heart skipped a beat thinking it was a GA library, but alas it is not...)  Maybe it is my unrestrained exuberance, I would guess that a strong D implementation should easily crush the leading C++ GA lib, GAIGEN in both performance and ease of use.
> 


Oh my GAD?  Get it? ;)
April 06, 2007
Georg Wrede wrote:
> Dan wrote:
>> Walter Bright Wrote:
>>
>>> Just wait 'till you get AST macros!
>>>
>>> But this stuff is *awesome*. Scratch DDJ, I don't want to wait 6 months. This needs to get out now, - I suggest www.artima.com, followed up with dzone, slashdot, and digg.
>>>
>>> It would also be a real treat to have you present this at Amazon's D conference in August.
>>
>>
>> Wow.  That's the most excited I've seen Walter.  : )
> 
> LOL, have to admit I agree!!!

It really is very cool and all, but did I just miss the benchmarks?  I need numbers, man!  You can generate all the assembly you want, but at the end of the day it's the numbers that people care about.

--bb
April 06, 2007
Mikola Lysenko wrote:
> Excellent work!  I have to say that I've had my misgivings about mixin statements, but I'm glad to see that BLADE and Don's hard work have proven me wrong.  Compile time vector expressions are desperately needed, and this library goes a long way to sorting things out.  As far as meta-type tricks go, this is by far the coolest I've ever seen in any  programming language.
> 
> One very interesting application I would like to see with this technique would be a compile time geometric algebra library.  If no one writes one by this summer, I will probably do it myself.  (Actually, when I heard the name BLADE, my heart skipped a beat thinking it was a GA library, but alas it is not...)  Maybe it is my unrestrained exuberance, I would guess that a strong D implementation should easily crush the leading C++ GA lib, GAIGEN in both performance and ease of use.
> 
> For reference, here is the GAIGEN homepage:
> http://www.science.uva.nl/ga/gaigen/content_gaigen.html
> 
> -Mikola Lysenko

Wow, that looks really sweet.  Last I heard, GA was good for working out the math, but too slow for real use.  Now it's fast too, apparently!  Now I've really got to learn GA.  And a D compile-time-fu version would be neat.

One question: from the GAIGEN page it sounds almost like GAIGEN generates, off-line, a set of primitive operations that are efficient for the chosen GA domain (3D Euclidean or whatever).  If so, that sounds pretty good to me.  What's wrong with using an off-line approach?  Are there some expressions that are just too difficult to optimize ahead of time, without seeing the whole expression as it will actually be used?  Or is it more that GAIGEN creates the primitive operations for a particular GA, like dot products, cross products, and matrix multiplies?  If so, it may still leave a lot of optimization opportunities on the floor.  For example, evaluating A*B*C*vec as ((A*B)*C)*vec vs. (A*(B*(C*vec))) can make a big difference if A, B, C are big matrices and vec is a small vector.
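To put rough numbers on that last point, here's a minimal sketch in D of the multiply-add counts for the two parenthesizations, assuming hypothetical dense n-by-n matrices (the sizes and function names are made up for illustration):

```d
import std.stdio;

// Multiply-add count for an (m x k) times (k x n) dense product: m*k*n.
ulong matmulCost(ulong m, ulong k, ulong n) { return m * k * n; }

void main()
{
    enum ulong n = 1000;  // hypothetical size: A, B, C are n x n, vec is n x 1

    // ((A*B)*C)*vec: two matrix-matrix products plus one matrix-vector product
    ulong leftAssoc = matmulCost(n, n, n) + matmulCost(n, n, n)
                    + matmulCost(n, n, 1);

    // A*(B*(C*vec)): three matrix-vector products
    ulong rightAssoc = 3 * matmulCost(n, n, 1);

    writefln("((A*B)*C)*vec: %s multiply-adds", leftAssoc);
    writefln("A*(B*(C*vec)): %s multiply-adds", rightAssoc);
}
```

At n = 1000 that's roughly a factor of 600 between the two orderings, a saving that a code generator can only find if it sees the whole expression rather than one operator at a time.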

--bb
April 06, 2007
Bill Baxter wrote:
> Mikola Lysenko wrote:
>> Excellent work!  I have to say that I've had my misgivings about mixin statements, but I'm glad to see that BLADE and Don's hard work have proven me wrong.  Compile time vector expressions are desperately needed, and this library goes a long way to sorting things out.  As far as meta-type tricks go, this is by far the coolest I've ever seen in any  programming language.
>>
>> One very interesting application I would like to see with this technique would be a compile time geometric algebra library.  If no one writes one by this summer, I will probably do it myself.  (Actually, when I heard the name BLADE, my heart skipped a beat thinking it was a GA library, but alas it is not...)  Maybe it is my unrestrained exuberance, I would guess that a strong D implementation should easily crush the leading C++ GA lib, GAIGEN in both performance and ease of use.
>>
>> For reference, here is the GAIGEN homepage:
>> http://www.science.uva.nl/ga/gaigen/content_gaigen.html
>>
>> -Mikola Lysenko
> 
> Wow, that looks really sweet.  Last I heard on GA was that it was good for working out the math, but too slow for real use.  Now it's fast too apparently!  Now I've really got to learn GA.  And a D compile-time fu version would be neat.
> 
> One question: from the GAIGEN page it sounds almost like GAIGEN generates off-line a set of primitive operations that are efficient for the chosen specific GA domain (3D euclidean or whatever).  If so, then that sounds pretty good to me.  What's wrong with using an off-line approach?  Are there some expressions things that are just too difficult to optimize ahead of time without seeing the whole expression as it will actually used?  Or is it more like GAIGEN creates the primitives operations for a particular GA like dot product, cross product, and matrix multiplies?  If so then it still maybe leaves a lot of optimization opportunities on the floor.  Like evaluating A*B*C*vec as ((A*B)*C)*vec vs. (A*(B*(C*vec))) can make a big difference if ABC are big matrices and vec is a small vector.
> 
> --bb

As I understand it, GAIGEN uses an external program to generate the implementation, and the implementation uses expression templates to actually perform the optimizations.  The process seems a bit contrived to me, but I'm sure they have their reasons.  Also, if you look at the performance statistics on that site, you will see that the geometric algebra implementation is actually SLOWER than the standard linear algebra (but not by much).  The main advantage of GA when programming is that the code is shorter and simpler.
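For anyone curious what that "shorter and simpler" code looks like, here's a minimal sketch of a 2D multivector with its geometric product. This is purely hypothetical illustration code, not GAIGEN's actual API: the basis is {1, e1, e2, e12} with e1*e1 = e2*e2 = 1 and e12 = e1*e2.

```d
// Hypothetical 2D geometric algebra multivector: s + e1 + e2 + e12,
// with e1*e1 = e2*e2 = 1, e1*e2 = -e2*e1 = e12, and e12*e12 = -1.
struct Mv2
{
    double s = 0, e1 = 0, e2 = 0, e12 = 0;

    // The geometric product, expanded over the four basis blades.
    Mv2 gp(Mv2 b)
    {
        Mv2 r;
        r.s   = s*b.s   + e1*b.e1  + e2*b.e2  - e12*b.e12;
        r.e1  = s*b.e1  + e1*b.s   - e2*b.e12 + e12*b.e2;
        r.e2  = s*b.e2  + e1*b.e12 + e2*b.s   - e12*b.e1;
        r.e12 = s*b.e12 + e1*b.e2  - e2*b.e1  + e12*b.s;
        return r;
    }
}

void main()
{
    Mv2 a, b;
    a.e1 = 1;  // the basis vector e1
    b.e2 = 1;  // the basis vector e2

    auto bv = a.gp(b);
    assert(bv.e12 == 1 && bv.s == 0);  // e1*e2 = e12, the unit bivector
    assert(a.gp(a).s == 1);            // e1*e1 = 1
    assert(bv.gp(bv).s == -1);         // e12*e12 = -1
}
```

One product subsumes dot products, wedge products, and rotations, which is where the code-size win comes from even when the raw arithmetic is no cheaper.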

Recently, GA has been getting some more widespread acceptance and commercial success.  Take Geomerics, for example; they're a small company that specializes in graphics and physics software based on GA-inspired algorithms.  Here is their website:

http://www.geomerics.com

They've got a very impressive realtime radiosity engine that seems to have gotten some press in the last few weeks.  There's also some interesting stuff in the technology section on their website.

Writing a GA library for D is definitely on my todo list (right after I get done with this semester of classes.)  I will probably work on it this summer unless someone beats me to it.

-Mik
April 06, 2007
Bill Baxter wrote:
> Georg Wrede wrote:
>> Dan wrote:
>>> Walter Bright Wrote:
>>>
>>>> Just wait 'till you get AST macros!
>>>>
>>>> But this stuff is *awesome*. Scratch DDJ, I don't want to wait 6 months. This needs to get out now, - I suggest www.artima.com, followed up with dzone, slashdot, and digg.
>>>>
>>>> It would also be a real treat to have you present this at Amazon's D conference in August.
>>>
>>>
>>> Wow.  That's the most excited I've seen Walter.  : )
>>
>> LOL, have to admit I agree!!!
> 
> It really is very cool and all, but did I just miss the benchmarks?  I need numbers, man!   You can generate all the assembly you want, but at the end of the day its the numbers that people care about.
> 
> --bb

You're right, but it wasn't posted as a finished product!
I post this sort of thing every now and again, largely to give Walter some feedback as to what the language is doing at the bleeding edge. In this case, I thought it would be relevant to AST macro development.
I also posted this to get an idea of whether there's much interest in this sort of thing -- I don't have a great deal of free time, so I want to make good use of it. I only got the code working a few days ago.
The speed of the code will likely be affected by how well DMD optimises the tuple copying. I haven't even looked at what code it's actually generating.

April 07, 2007
Pragma wrote:
> Don Clugston wrote:
>> I have been trying to come up with a convincing use case for the new mixins (and for metaprogramming in general). My best effort to date can be found at:
>> http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blade.d
>>
>> It generates near-optimal x87 asm code for BLAS1-style basic vector operations. 32, 64 and 80 bit vectors are all supported.
>>
>> Compile with -version=BladeDebug to see the asm code which is generated.
>>
>> Typical usage:
>>
>> void main()
>> {
>>     auto p = Vec([1.0L, 2, 18]);    // a vector of 80-bit reals.
>>     auto q = Vec([3.5L, 1.1, 3.8]);  // ditto
>>     auto r = Vec([17.0f, 28.25, 1]); // a vector of 32-bit floats
>>     auto z = Vec([17.0i, 28.1i, 1i]); // a vector of 64-bit idoubles
>>     real d = dot(r, p+r+r);
>>     ireal e = dot(r, z);
>>     q -= ((r+p)*18.0L*314.1L - (p-r))* 35;
>>     d = dot(r, p+r+r);
>> }
>>
>> Notice that mixed-length operations (real[] + float[] - double[]) are supported.
>>
>> Like the C++ Blitz++ library, expression templates are used to convert vector expressions into efficient element-wise operations. Unlike that library, however, there is no reliance on the compiler's optimiser. Instead, the expression template is manipulated as text, converted into postfix, and then passed to a simple CTFE compile-time assembler, which creates highly efficient asm code which is used as a mixin.
>> To understand the later parts of the code, you need some knowledge of x87 assembler. In fact, you probably need to have read Agner Fog's superb Pentium optimisation manual (www.agner.org).
>>
>> Some observations:
>> * I was amazed at how simple the expression template code is (it is somewhat cluttered by the code to check for real/imaginary type mismatch errors).
>> * I've often read that the x87 floating-point stack is notoriously difficult for compilers to write code for, but it works quite well in this case.
>> * The major workarounds are:
>> - inability to use a tuple element directly from asm code (bug #1028);
>> - inability to define operators for built-in arrays (hence the use of 'Vec' wrappers).
>> - inability to index through a tuple in a CTFE function (solved by converting types into a string).
>> * There have been mutterings about how unhygenic/dangerous the new mixins are. In this case, the mixin forms the _entire_ body of the function. This is an interesting situation which I think a language purist will find more palatable.
>>
>> Enjoy.
> 
> This is a work of art Don - it's practically a compiler extension.  Nice job. :)
> 
> For others that are interested in how this actually gets the job done, I found this in your documentation:
> 
> * THEORY:
> * Expression templates are used to create an expression string of the form "(a+b*c)+d"
> * and a tuple, the entries of which correspond to a, b, c, d, ...
> * This string is converted to postfix. The postfix string is converted to
> * a string containing x87 asm, which is then mixed into a function which accepts the tuple.
> 

Hmm, a minor question: is a string representation of the expressions better (easier to use, manipulate, etc.) than a tree representation?

-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
April 08, 2007
Dan wrote:
> Current compilers tend to dump IR's to a buffer and then adjust them according to parser rules.  The wrong approach.
> 
> I totally want to see this kind of programming get implemented into the compiler itself.  I know it's too much to ask, but if DMD could generate optimal floating point math and cache pipelines, it would rend the gaming industry from C++'s hands REAL quick like.
> 
> A C++ programmer can say:
> "oh yeah, it's a minor upgrade made for my convenience, but I'm already conveniently comfortable with C++"...
> 
> Until a decision maker sees that D performs 20% better, dramatically reduces debugging time, cuts source code volume, and has better version control.

That would be nice and would help; however, I think it'll take more than that to convince a game company to switch to D, particularly ones using XNA.

It would have to be a startup company that is not working on the 360, perhaps one targeting Nintendo, Sony, Linux, or Windows.  I say perhaps because if a company is going to switch to anything, it's probably XNA.  Now if we had D for .NET, maybe the story would be slightly different.

I do like the idea of using D for games; I just see it as being a hard sell.  Scripting and tools are probably the main gateway for D into the game-programming world.

Once a few game companies start using it in their main project and can demonstrate productivity/speed improvements then the wall may indeed start to crumble.

-Joel
April 09, 2007
Bruno Medeiros wrote:
> Pragma wrote:
>> Don Clugston wrote:
>>> I have been trying to come up with a convincing use case for the new mixins (and for metaprogramming in general). My best effort to date can be found at:
>>> http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blade.d 
>>>
>>>
>>> It generates near-optimal x87 asm code for BLAS1-style basic vector operations. 32, 64 and 80 bit vectors are all supported.
>>>
>>> Compile with -version=BladeDebug to see the asm code which is generated.
>>>
>>> Typical usage:
>>>
>>> void main()
>>> {
>>>     auto p = Vec([1.0L, 2, 18]);    // a vector of 80-bit reals.
>>>     auto q = Vec([3.5L, 1.1, 3.8]);  // ditto
>>>     auto r = Vec([17.0f, 28.25, 1]); // a vector of 32-bit floats
>>>     auto z = Vec([17.0i, 28.1i, 1i]); // a vector of 64-bit idoubles
>>>     real d = dot(r, p+r+r);
>>>     ireal e = dot(r, z);
>>>     q -= ((r+p)*18.0L*314.1L - (p-r))* 35;
>>>     d = dot(r, p+r+r);
>>> }
>>>
>>> Notice that mixed-length operations (real[] + float[] - double[]) are supported.
>>>
>>> Like the C++ Blitz++ library, expression templates are used to convert vector expressions into efficient element-wise operations. Unlike that library, however, there is no reliance on the compiler's optimiser. Instead, the expression template is manipulated as text, converted into postfix, and then passed to a simple CTFE compile-time assembler, which creates highly efficient asm code which is used as a mixin.
>>> To understand the later parts of the code, you need some knowledge of x87 assembler. In fact, you probably need to have read Agner Fog's superb Pentium optimisation manual (www.agner.org).
>>>
>>> Some observations:
>>> * I was amazed at how simple the expression template code is (it is somewhat cluttered by the code to check for real/imaginary type mismatch errors).
>>> * I've often read that the x87 floating-point stack is notoriously difficult for compilers to write code for, but it works quite well in this case.
>>> * The major workarounds are:
>>> - inability to use a tuple element directly from asm code (bug #1028);
>>> - inability to define operators for built-in arrays (hence the use of 'Vec' wrappers).
>>> - inability to index through a tuple in a CTFE function (solved by converting types into a string).
>>> * There have been mutterings about how unhygenic/dangerous the new mixins are. In this case, the mixin forms the _entire_ body of the function. This is an interesting situation which I think a language purist will find more palatable.
>>>
>>> Enjoy.
>>
>> This is a work of art Don - it's practically a compiler extension.  Nice job. :)
>>
>> For others that are interested in how this actually gets the job done, I found this in your documentation:
>>
>> * THEORY:
>> * Expression templates are used to create an expression string of the form "(a+b*c)+d"
>> * and a tuple, the entries of which correspond to a, b, c, d, ...
>> * This string is converted to postfix. The postfix string is converted to
>> * a string containing x87 asm, which is then mixed into a function which accepts the tuple.
>>
> 
> Hum, a minor question, is a string representation of the expressions better (easier to use, manipulate, etc.) than a tree representation?
> 

I know Don wrote this lib, but I think I can answer this.

In short: no.  But it's really an unintentionally loaded question: there are some rather severe limitations on what kinds of data structures you can create at compile time.  You're basically limited to tuples and strings, each of which has drawbacks of its own.

You can create a tree using tuples, but then you run into problems with overrunning the compiler's built-in limits on identifier length.  Strings are nowhere near as flexible out of the box, but pose no storage limitations, which gives a slight edge to string-based representations of data.  With some clever coding, strings can be made to store just about any structure imaginable (even if it does make for some ugly code).
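As a small illustration of the string-as-structure trick, here's an ordinary function that measures the nesting depth of a parenthesised expression string, simple enough for CTFE to evaluate it entirely at compile time (the function name and string encoding are made up for this sketch, not taken from BLADE):

```d
// Depth of the deepest parenthesised subexpression in a string-encoded
// tree.  Ordinary runtime code, but CTFE can run it at compile time too.
int maxDepth(string expr)
{
    int depth = 0, deepest = 0;
    for (size_t i = 0; i < expr.length; ++i)
    {
        if (expr[i] == '(')
        {
            ++depth;
            if (depth > deepest)
                deepest = depth;
        }
        else if (expr[i] == ')')
            --depth;
    }
    return deepest;
}

// Evaluated at compile time: no tuples, no identifier-length limits.
static assert(maxDepth("(a+(b*c))") == 2);
static assert(maxDepth("a+b") == 0);
```

The same pattern scales up: BLADE-style code walks such strings to build postfix and, ultimately, asm text for mixing in.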

-- 
- EricAnderton at yahoo
April 09, 2007
Pragma Wrote:
> In short: no.  But it's really an unintentionally loaded question: there are some rather severe limitations as to what kind of data structures you can create at compile time.  You're basically limited to tuples and strings, each of which have drawbacks of their own.
> 
> You can create a tree using tuples, but then you run into problems with over-running the limitations on identifier length built into the compiler.  Strings are no where near as flexible out of the box, but pose no storage limitations, which gives a slight edge to string-based representations of data.  With some clever coding, strings can be made to store just about any structure imaginable (even if it does make for some ugly code).

Well said.  Once the AST is reflected at compile time, I'm sure the code can be made vastly more refined; and perhaps additional code generators can be developed and *hopefully* integrated into D.

Particular generators that spark my interest tend to have to do with x87, the SSE extensions, and x86-64.  Most compilers to date don't properly use these features.  Having them integrated into D, Agner Fog-optimally, would go a long way toward replacing a lot of C++ gaming, video, rendering, and graphics-engine code.

*bigsmile