Search
Linear Algebra
Aug 03, 2005
mclysenk
Aug 03, 2005
Sean Kelly
Aug 03, 2005
Mike Capp
Aug 03, 2005
mclysenk
Aug 03, 2005
Sean Kelly
Aug 03, 2005
mclysenk
Aug 03, 2005
Sean Kelly
Aug 03, 2005
Knud Sørensen
Aug 28, 2005
Freejack
Sep 07, 2005
Don Clugston
```One year ago, I started learning D.  The language is positively lovely, with handy builtin dynamic arrays and powerful templates.  And it's fast too.

But, there's one irksome flaw D has in common with every other programming language I've encountered: no support for linear algebra. This can be implemented in either a module in the standard library or a language extension. Often my programs make use of vectors and matrices, since they typically involve some sort of computer graphics.  While it is easy enough to write my own classes to take care of this, I find it irksome that I need to reinvent the wheel so many times.

Here are some reasons for a standard vector:
1.  Vectors are frequently used, especially in games, computer graphics, physics
simulations, numerical applications. In some cases, they may even turn up more
frequently than strings.

2.  Implementing an efficient vector is very sensitive to the specifics of the
system and the situation in which it is used.  On most modern machines, there is
some sort of SIMD hardware built specifically to handle these sorts of problems,
but taking advantage of it requires either handcrafted assembly or special
compiler intrinsics to assure the memory is properly aligned and so on.  The
other option is lazy evaluation, which can be very difficult to implement.
In practice, the best approach lies somewhere between the fully vectorized SIMD
approach and lazy evaluation. On an x86, it is best to use lazy evaluation to
break the vectors into chunks of 4 floats, then use SIMD operations to solve
each part.  Implementing this on a case by case basis is rather tedious and
usually reserved only for extremely critical performance bottlenecks.

3.  There are many redundant implementations of vectors floating around, all of which contain essentially the same components.  A standardized implementation would make it easier to write common objects like physics engines and graphics wrappers that would all work together.

I've noticed that on the array page, there is a section on array operations.  If properly implemented, these may be a good start.  However, it would still be nice to have a library for higher mathematics as part of a standard library.

```
```In article <dcp6a2\$45\$1@digitaldaemon.com>, mclysenk@mtu.edu says...
>
>One year ago, I started learning D.  The language is positively lovely, with handy builtin dynamic arrays and powerful templates.  And it's fast too.
>
>But, there's one irksome flaw D has in common with every other programming language I've encountered: no support for linear algebra. This can be implemented in either a module in the standard library or a language extension. Often my programs make use of vectors and matrices, since they typically involve some sort of computer graphics.  While it is easy enough to write my own classes to take care of this, I find it irksome that I need to reinvent the wheel so many times.

For what it's worth, this is a feature Walter seems to be very much interested in.  It will probably be a post-1.0 feature, but I think Walter may also be waiting for a syntax/implementation scheme that is both powerful and elegant (expression templates in C++ are quite powerful, but are a beast to implement).

>Here are some reasons for a standard vector:
>1.  Vectors are frequently used, especially in games, computer graphics, physics
>simulations, numerical applications. In some cases, they may even turn up more
>frequently than strings.
>
>2.  Implementing an efficient vector is very sensitive to the specifics of the
>system and the situation in which it is used.  On most modern machines, there is
>some sort of SIMD hardware built specifically to handle these sorts of problems,
>but taking advantage of it requires either handcrafted assembly or special
>compiler intrinsics to assure the memory is properly aligned and so on.  The
>other option is lazy evaluation, which can be very difficult to implement.
>In practice, the best approach lies somewhere between the fully vectorized SIMD
>approach and lazy evaluation. On an x86, it is best to use lazy evaluation to
>break the vectors into chunks of 4 floats, then use SIMD operations to solve
>each part.  Implementing this on a case by case basis is rather tedious and
>usually reserved only for extremely critical performance bottlenecks.

However this is implemented, you can bet that the bulk of it will be implemented via callbacks in the guts of Phobos.  So if you want to use specialized operations you'll likely be able to, even if Walter doesn't provide this by default.  And as far as Ares is concerned, this is a sufficiently important feature that my first inclination would be to make it easily extensible without such low-level modifications, so perhaps this will happen in Phobos as well (assuming the two remain separate entities, that is).

>I've noticed that on the array page, there is a section on array operations.  If properly implemented, these may be a good start.  However, it would still be nice to have a library for higher mathematics as part of a standard library.

If you have some good ideas, by all means propose them.  Walter tends to be quite receptive to this sort of thing.

Sean

```
```In article <dcp6a2\$45\$1@digitaldaemon.com>, mclysenk@mtu.edu says...
>
>Here are some reasons for a standard vector:
>1.  Vectors are frequently used, especially in games, computer graphics, physics
>simulations, numerical applications. In some cases, they may even turn up more
>frequently than strings.

True, but I'm wondering if this might be two distinct requests. Game and graphics code, and a lot of physics code, deals with 2, 3 or 4-element vectors, and usually vectors of single-precision floats at that. Writing tuned implementations of those is a vastly simpler problem than generating optimally-efficient vectors for arbitrary sizes and types.

cheers
Mike

```
```On Wed, 03 Aug 2005 01:23:14 +0000, mclysenk wrote:

Hi

Try to go to the D wish list and look
if not the "vectorization" suggestion does it for you !
http://all-technology.com/eigenpolls/dwishlist/

Vectorization is currently the 3ed most voted for suggestion just after "array initialization/literals" and "Reflection API"
```
```In article <dcp9sp\$2ug\$1@digitaldaemon.com>, Mike Capp says...
>
>In article <dcp6a2\$45\$1@digitaldaemon.com>, mclysenk@mtu.edu says...
>>
>>Here are some reasons for a standard vector:
>>1.  Vectors are frequently used, especially in games, computer graphics, physics
>>simulations, numerical applications. In some cases, they may even turn up more
>>frequently than strings.
>
>True, but I'm wondering if this might be two distinct requests. Game and graphics code, and a lot of physics code, deals with 2, 3 or 4-element vectors, and usually vectors of single-precision floats at that. Writing tuned implementations of those is a vastly simpler problem than generating optimally-efficient vectors for arbitrary sizes and types.
>

That is true, but even the slightest improvement would be better than most math implementations I've seen.  When I start prototyping some algorithm, I'll quickly code up a vector class with just the basic operators to get started. Just having a standard vector would save me a great deal of time, any optimizations after the fact are just gravy.

When it comes to optimization, there is a lot the compiler can do. Library level solutions suffer due to several factors; lazy evaluation requires writing lots of very complex and hacky templates + tricky compiler flags, SIMD instructions require inline assembler and all its associated woes. The thing is the compiler can basically handle both, more elegantly than a programmer could with a library and inline assembly.

I like the idea of vectorization, but the problem is that the syntax they present is rather cumbersome.  What it seems that they are asking for is a parallel foreach, which is a good idea, but not exactly what I had in mind.  A better syntax would be something that allows a single operation to work on a whole list.  Something like the following would be nice,

vector float a[4] = [0, 0, 0, 1];
vector float b[4] = [1, 0, 0, 1];
vector float c[4] = [2, 3, 4, 5];
a *= 3.0;   //a == [0, 0, 0, 3]
a = a + b;  //a == [1, 0, 0, 4]
a = c * b;  //a == [2, 0, 0, 5]
a = b + 1.0;//a == [2, 1, 1, 2]
a = sin(b); //ILLEGAL, only arithemtic operations can be vectorized

//The same could work any arithmetic type
vector int x[3] = [3, 1, 2];
x <<= 1;   //x == [6, 2, 4]

The vector type qualifier would allow parallel arithmetic to be done on the vector.

```
```In article <dcpj88\$9k7\$1@digitaldaemon.com>, mclysenk@mtu.edu says...
>
>I like the idea of vectorization, but the problem is that the syntax they present is rather cumbersome.  What it seems that they are asking for is a parallel foreach, which is a good idea, but not exactly what I had in mind.  A better syntax would be something that allows a single operation to work on a whole list.

Why not keep the parallel foreach bit and leave optimizations as a QOI issue? Any array operation could be processed in blocks of 4, for example, provided the underlying implementation were done right.

Sean

```
```In article <dcqk1j\$15h4\$1@digitaldaemon.com>, Sean Kelly says...
>
>In article <dcpj88\$9k7\$1@digitaldaemon.com>, mclysenk@mtu.edu says...
>>
>>I like the idea of vectorization, but the problem is that the syntax they present is rather cumbersome.  What it seems that they are asking for is a parallel foreach, which is a good idea, but not exactly what I had in mind.  A better syntax would be something that allows a single operation to work on a whole list.
>
>Why not keep the parallel foreach bit and leave optimizations as a QOI issue? Any array operation could be processed in blocks of 4, for example, provided the underlying implementation were done right.

I agree optimization needs to be QOI, but parallel foreach loops are no substitute for array operations. What we need is a standard syntax for handling vector like objects. Examine the following vector class,

class Vector(T, int N)
{
public:

..

{
Vector r = new Vector();

for(size_t i=0; i<N; i++)
r[i] = x[i] + v[i];

return r;
}

Vector opMul(T s)
{
Vector r = new Vector();

for(size_t i=0; i<N; i++)
r[i] = x[i] * s;

return r;
}

..

private:
T[N] x;   //Data
}

The first thing to notice in this implementation all the repeated code in each of the operators. Now consider this expression,

Vector!(float, 4) a, b, c;
..
a = b + c * 5.0;

Translates to:

This creates unnecessary temporary objects. Lazy evaluation can fix this by
translating the expression into an equivalent component by component function,
but it requires alot of template trickery. To see an example of this, check out
http://www.flipcode.com/articles/article_fastervectormath.shtml

Of course, this doesn't work in D since templates aren't implictly instantiated like C++.  Instead, to get an optimized expression you would have to write the following,

Vector!(float, 4) a, b, c;
..
for(size_t i=0; i<4; i++)
a[i] = b[i] + c[i] * 5.0;

Writing out loops like this defeats the point of a vector class. I propose a special attribute for arrays to solve the optimization dilemma and give us a standard vector.

vector float[4] a, b, c;
..
a = b + c * 5.0;

In addition, having a language level construct lets a high quality implementation take advantage of specialized SIMD hardware features without compiler intrinsics.  For example, floats can be aligned on 16-byte boundaries in order to use the movaps instruction.

```
```In article <dcqq3b\$1ajd\$1@digitaldaemon.com>, mclysenk@mtu.edu says...
>
>I agree optimization needs to be QOI, but parallel foreach loops are no substitute for array operations. What we need is a standard syntax for handling vector like objects. Examine the following vector class,
>
>class Vector(T, int N)
>{
>public:
>
>..
>
>{
>Vector r = new Vector();
>
>for(size_t i=0; i<N; i++)
>r[i] = x[i] + v[i];
>
>return r;
>}
>
>Vector opMul(T s)
>{
>Vector r = new Vector();
>
>for(size_t i=0; i<N; i++)
>r[i] = x[i] * s;
>
>return r;
>}
>
>..
>
>private:
>T[N] x;   //Data
>}
>
>
>The first thing to notice in this implementation all the repeated code in each of the operators. Now consider this expression,
>
>Vector!(float, 4) a, b, c;
>..
>a = b + c * 5.0;
>
>
>Translates to:
>
>
>
>This creates unnecessary temporary objects. Lazy evaluation can fix this by
>translating the expression into an equivalent component by component function,
>but it requires alot of template trickery. To see an example of this, check out
>http://www.flipcode.com/articles/article_fastervectormath.shtml

Yup.  This is covered in detail in "C++ Templates" by Vandevoorde and Josuttis as well.  I was thinking about this particular issue this morning and realized that you might be right that using SIMD-type operations is a similar but not identical goal to lazy evaluation.  The point of lazy evaluation is to avoid the creation of temporaries (as you've said), as many science applications use absolutely massive data structures for calculations while SIMD operations are intended to speed up processing by handling operations in blocks.  I think the two goals should work together just fine, but it will take some care to avoid moving copies of double[4] all over the place.

>Of course, this doesn't work in D since templates aren't implictly instantiated like C++.  Instead, to get an optimized expression you would have to write the following,
>
>
>Vector!(float, 4) a, b, c;
>..
>for(size_t i=0; i<4; i++)
>a[i] = b[i] + c[i] * 5.0;
>
>
>Writing out loops like this defeats the point of a vector class. I propose a special attribute for arrays to solve the optimization dilemma and give us a standard vector.
>
>vector float[4] a, b, c;
>..
>a = b + c * 5.0;

I *think* basic operations could be done now if you're willing to live with using function calls:

But this obviously doesn't extend well to complex expressions, where the true power of lazy evaluation comes into play:

a = (b+c)+(d+e);

>In addition, having a language level construct lets a high quality implementation take advantage of specialized SIMD hardware features without compiler intrinsics.  For example, floats can be aligned on 16-byte boundaries in order to use the movaps instruction.

By all means propose something.  It's also worth noting that Walter has hinted at possibly supporting implicit template instantiation at some point in the future.  Likely not a 1.0 feature, but something to keep in the back if your head nevertheless.

Sean

```
```On Wed, 03 Aug 2005 01:23:14 +0000, mclysenk wrote:

> One year ago, I started learning D.  The language is positively lovely, with handy builtin dynamic arrays and powerful templates.  And it's fast too.
>
> But, there's one irksome flaw D has in common with every other programming language I've encountered: no support for linear algebra. This can be implemented in either a module in the standard library or a language extension. Often my programs make use of vectors and matrices, since they typically involve some sort of computer graphics.  While it is easy enough to write my own classes to take care of this, I find it irksome that I need to reinvent the wheel so many times.
>
> Here are some reasons for a standard vector:
> 1.  Vectors are frequently used, especially in games, computer graphics, physics
> simulations, numerical applications. In some cases, they may even turn up more
> frequently than strings.
>
> 2.  Implementing an efficient vector is very sensitive to the specifics of the
> system and the situation in which it is used.  On most modern machines, there is
> some sort of SIMD hardware built specifically to handle these sorts of problems,
> but taking advantage of it requires either handcrafted assembly or special
> compiler intrinsics to assure the memory is properly aligned and so on.  The
> other option is lazy evaluation, which can be very difficult to implement.
> In practice, the best approach lies somewhere between the fully vectorized SIMD
> approach and lazy evaluation. On an x86, it is best to use lazy evaluation to
> break the vectors into chunks of 4 floats, then use SIMD operations to solve
> each part.  Implementing this on a case by case basis is rather tedious and
> usually reserved only for extremely critical performance bottlenecks.
>
> 3.  There are many redundant implementations of vectors floating around, all of which contain essentially the same components.  A standardized implementation would make it easier to write common objects like physics engines and graphics wrappers that would all work together.
>
>
> I've noticed that on the array page, there is a section on array operations.  If properly implemented, these may be a good start.  However, it would still be nice to have a library for higher mathematics as part of a standard library.

When it comes to higher math, I've found that usually easier to use
another tool side by side with my C or D code specifically designed for
that purpose. Most of the time I just spawn J process(www.jsoftware.com)
along with my D application and pass the data back and forth using IPC or
J's C API.
The matrix math and such all run about %60 to %70 faster than when I had
hand coded the routines in C or D.
Trivial array operations don't need to take such an approach, but when
it's needed, it really is needed.

Other than that D kicks ass.

Freejack

```
```Freejack wrote:
> On Wed, 03 Aug 2005 01:23:14 +0000, mclysenk wrote:
>
>
>>One year ago, I started learning D.  The language is positively lovely, with
>>handy builtin dynamic arrays and powerful templates.  And it's fast too.
>>
>>But, there's one irksome flaw D has in common with every other programming
>>language I've encountered: no support for linear algebra. This can be
>>implemented in either a module in the standard library or a language extension.
>>Often my programs make use of vectors and matrices, since they typically involve
>>some sort of computer graphics.  While it is easy enough to write my own classes
>>to take care of this, I find it irksome that I need to reinvent the wheel so
>>many times.
>>
>>Here are some reasons for a standard vector:
>>1.  Vectors are frequently used, especially in games, computer graphics, physics
>>simulations, numerical applications. In some cases, they may even turn up more
>>frequently than strings.
>>
>>2.  Implementing an efficient vector is very sensitive to the specifics of the
>>system and the situation in which it is used.  On most modern machines, there is
>>some sort of SIMD hardware built specifically to handle these sorts of problems,
>>but taking advantage of it requires either handcrafted assembly or special
>>compiler intrinsics to assure the memory is properly aligned and so on.  The
>>other option is lazy evaluation, which can be very difficult to implement.
>>In practice, the best approach lies somewhere between the fully vectorized SIMD
>>approach and lazy evaluation. On an x86, it is best to use lazy evaluation to
>>break the vectors into chunks of 4 floats, then use SIMD operations to solve
>>each part.  Implementing this on a case by case basis is rather tedious and
>>usually reserved only for extremely critical performance bottlenecks.
>>
>>3.  There are many redundant implementations of vectors floating around, all of
>>which contain essentially the same components.  A standardized implementation
>>would make it easier to write common objects like physics engines and graphics
>>wrappers that would all work together.
>>
>>
>>I've noticed that on the array page, there is a section on array operations.  If
>>properly implemented, these may be a good start.  However, it would still be
>>nice to have a library for higher mathematics as part of a standard library.

I agree, but I think that if the proposed array operations were implemented, a mathematics library would not be too difficult to write (at least, much simpler than for any other language), and could be really fantastic.

As a previous poster said, there are two very different groups of users:

(a) high-speed, low accuracy. Especially for graphics and games, usually using 3-D vectors, 4x4 matrices. These will use SIMD if available.

(b) scientific, high accuracy requirements. May use vectors of reals.
Matrices and vectors will be large, and the size is usually unknown at compile time.

Group (a) is bigger by far. But I spend most of my time in group (b).
I really think that it is a mistake to think that these groups are tightly linked. My scientific physics work normally uses 3D and 4D vectors of 80-bit reals. You'd never do that in game physics, and the
presence of SIMD means that the resulting asm code is completely different (and therefore, the poor compiler writer has to write different optimisers for each group).

I think it would be helpful to state what you'd like to see in such a library, because your introductory comments suggest you're in group (a),
which I have a lot less experience with. I suspect they may be easier to
implement. Group (b) has the BLAS as a basic standard, I'm not aware of any similar standard for group (a).

> When it comes to higher math, I've found that usually easier to use
> another tool side by side with my C or D code specifically designed for
> that purpose. Most of the time I just spawn J process(www.jsoftware.com)
> along with my D application and pass the data back and forth using IPC or
> J's C API.

I had a look. Wow! J would have to be least readable language I've seen since Forth! :-)

> The matrix math and such all run about %60 to %70 faster than when I had
> hand coded the routines in C or D.

Something must be wrong. Were you using an optimised BLAS library?
What sort of operations are you performing?

> Trivial array operations don't need to take such an approach, but when
> it's needed, it really is needed.
>
> Other than that D kicks ass.

Yes, but I also think it has many of the ingredients needed to knock the socks off everything else in linear algebra, too. Someday.

-Don.
```