November 20, 2007
The Matrix to end all Matrix classes (Let's dream!)
No, I don't have the perfect Matrix. I just want to explore the question of 
whether it exists.
There is a plethora of C++ matrix libraries, all of which are far from perfect; 
they all involve trade-offs forced by various language limitations.
In particular, the use of expression templates creates all kinds of evil.

Requirements:
1. Must be compatible with future multi-dimensional array syntax such as
double [7,5] m;
2. Must be theoretically capable of optimal performance.
3. Needs to support slices. This means we also need to support strided
vectors. Therefore we also need matrix slices, and matrix references.
Probably distinguishing between the T[] and T[new] concepts.
4. For storage we need packed, triangular, symmetric, banded, and sparse
matrix support. (Possibly even 'calculated' matrix support, for
user-defined matrix types which generate entries on-demand).
5. Can be generalized to tensors.
6. Nice error messages.

I now have enough experience with my BLADE library to be confident that all of 
these features can be achieved in existing D. In particular, the crucial 
requirement 2 can be completely decoupled from the others.

But, there are many requirements (dozens?) which are not on my brief list, and 
perhaps they cannot all be met simultaneously, even in theory. Yet I'm not 
convinced that it's impossible. What are the other requirements?
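
To make requirement 4's "calculated" matrix idea concrete, here is a minimal 
sketch (the Hilbert type below is purely illustrative -- it is not part of 
BLADE or any existing library): a matrix type whose entries are computed on 
demand rather than stored.

// Illustrative only: a "calculated" matrix generates its entries on demand,
// so a 1000x1000 instance needs no 1000x1000 storage.
struct Hilbert
{
    size_t rows, cols;

    // Entry (i,j) of the Hilbert matrix, computed when requested.
    double opIndex(size_t i, size_t j) const
    {
        return 1.0 / (i + j + 1);
    }
}

void main()
{
    auto h = Hilbert(1000, 1000);   // no element storage is allocated
    assert(h[0, 0] == 1.0);
    assert(h[2, 3] == 1.0 / 6);
}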
November 20, 2007
Re: The Matrix to end all Matrix classes (Let's dream!)
Don Clugston wrote:
> No, I don't have the perfect Matrix. I just want to explore the question 
> of whether it exists.

A very interesting topic indeed. I will just give some quick thoughts to 
start with. First of all, a matrix is (generally) 2-dimensional, but 
there are many uses for higher-dimensional structures as well. Also, the 
1-dimensional vector is a useful structure that could be handled in the 
same generic framework. Is the target just 2-dimensional matrices, or 
also higher-dimensional arrays?

The matrix as a mathematical structure is quite different in semantics 
and use from generic multi-dimensional arrays and should perhaps be 
treated separately.

To avoid confusion, I will use "size" to refer to the number of elements 
of a matrix/array and "dimensionality" to refer to the number of 
dimensions of an array (a vector is 1-dimensional, a matrix is 
2-dimensional).

First, there is the separation of compile time and run time. One can 
identify (at least) 3 different needs, with different requirements for 
run-time/compile-time dynamics:

* Small static matrices/arrays (where the size and dimensionality are 
known at compile time)
* Dynamic matrices/arrays (where the sizes are determined at run-time, 
but dimensionality is fixed at compile time)
* Fully dynamic multidimensional arrays (where the dimensionality is 
determined at run time)

The third case could probably be left out in most cases, but the first 
two make an interesting separation. Knowing the size at compile time is 
mostly an advantage for smaller matrices (eg 4x4, as commonly used for 
geometric translations). Larger matrices (eg 1000x1000) probably gain 
much less from compile-time-known sizes. There is also the matter of 
template bloat, which means that compile-time sizes are best reserved 
for the smaller dimensions.
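
As a rough sketch of that split (hypothetical types, not from any existing D 
library): a statically sized matrix can carry its dimensions as template 
parameters and keep its storage inline, while a dynamically sized one stores 
its dimensions at run time and allocates.

// Sketch only: compile-time-sized vs run-time-sized matrices.
struct StaticMatrix(T, size_t R, size_t C)
{
    T[R * C] data;                      // storage lives inline, no allocation

    ref T opIndex(size_t i, size_t j) { return data[i * C + j]; }
}

struct DynMatrix(T)
{
    T[] data;
    size_t rows, cols;

    this(size_t r, size_t c) { rows = r; cols = c; data = new T[r * c]; }

    ref T opIndex(size_t i, size_t j) { return data[i * cols + j]; }
}

void main()
{
    StaticMatrix!(float, 4, 4) transform;       // eg a geometric transform
    transform[0, 0] = 1.0f;

    auto big = DynMatrix!(double)(1000, 1000);  // size decided at run time
    big[999, 999] = 3.14;
}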

> Requirements:
> 1. Must be compatible with future multi-dimensional array syntax such as
> double [7,5] m;
> 2. Must be theoretically capable of optimal performance.

Seems reasonable.

> 3. Needs to support slices. This means we also need to support strided
> vectors. Therefore we also need matrix slices, and matrix references.

Yes, both strided views and non-strided references are definitely needed.

> Probably distinguishing between the T[] and T[new] concepts.

Such a distinction is definitely an advantage. For large matrices, 
ownership of memory becomes important, as such structures may be too 
large to leave to the GC.

> 4. For storage we need packed, triangular, symmetric, banded, and sparse
> matrix support. (Possibly even 'calculated' matrix support, for
> user-defined matrix types which generate entries on-demand).

For generality, at least sparse (and possibly calculated) storage is useful 
for vectors and higher-dimensional arrays too.

> 5. Can be generalized to tensors.
> 6. Nice error messages.
> 
> I now have enough experience with my BLADE library to be confident that 
> all of these features can be achieved in existing D. In particular, the 
> crucial requirement 2 can be completely decoupled from the others.

Sounds excellent.

> But, there are many requirements (dozens?) which are not on my brief 
> list, and perhaps they cannot all be met simultaneously, even in theory. 
> Yet I'm not convinced that it's impossible. What are the other 
> requirements?

Naturally, different element types (as well as UDTs). Compatibility with 
external libraries (Fortran/C), with the possibility of switching between 
Fortran and C layout.
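
To illustrate the layout point (sketch only, not from any particular 
library): the same (i, j) element maps to a different flat offset depending 
on whether the storage is row-major (C) or column-major (Fortran).

// Mapping a 2-D index onto a flat buffer, C order vs Fortran order.
size_t cIndex(size_t i, size_t j, size_t rows, size_t cols)
{
    return i * cols + j;       // rows are contiguous
}

size_t fortranIndex(size_t i, size_t j, size_t rows, size_t cols)
{
    return j * rows + i;       // columns are contiguous
}

void main()
{
    // The 2x3 matrix [[1,2,3],[4,5,6]] stored both ways:
    double[6] cData       = [1, 2, 3, 4, 5, 6];
    double[6] fortranData = [1, 4, 2, 5, 3, 6];

    assert(cData[cIndex(1, 2, 2, 3)] == 6);
    assert(fortranData[fortranIndex(1, 2, 2, 3)] == 6);
}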

-- 
Oskar
November 20, 2007
Re: The Matrix to end all Matrix classes (Let's dream!)
> Requirements:
> 1. Must be compatible with future multi-dimensional array syntax such as
> double [7,5] m;
> 2. Must be theoretically capable of optimal performance.
> 3. Needs to support slices. This means we also need to support strided
> vectors. Therefore we also need matrix slices, and matrix references.
> Probably distinguishing between the T[] and T[new] concepts.
> 4. For storage we need packed, triangular, symmetric, banded, and sparse
> matrix support. (Possibly even 'calculated' matrix support, for
> user-defined matrix types which generate entries on-demand).
> 5. Can be generalized to tensors.
> 6. Nice error messages.

Another requirement would be elegant syntax that "fits" with D syntax.  Is 
that what you were trying to communicate with requirement 1?  If I remember 
correctly doesn't BLAS syntax look a little funny because it relies on 
mixins?

If you want requirement 2, then you must be able to leverage any available 
GPUs or GPGPUs.  This is going to be especially useful when Nvidia and AMD 
release their GPU-like processors that will be able to do double-precision 
floating-point operations.  Nvidia's (as yet unnamed) processor is due out in 
December.  AMD's Firestorm is due in the first quarter of '08.  My company 
has plans to develop a matrix library for physics simulations that will 
support these new processors.  However, if one were already available to us, 
then we wouldn't need to develop it.

-Craig
November 20, 2007
Re: The Matrix to end all Matrix classes (Let's dream!)
Don Clugston wrote:
> [about TMtEAMC]

Very interesting indeed! I do a lot of scientific
computation with Matlab, and I'd love to have a
powerful matrix lib for D. It's just I'm currently
swamped with other work, so I couldn't be much
help... :-(
The first thing that springs to mind is which
operations to choose for operator overloading.
Given that a distinction is made between column
and row vectors, we can use the operand order
to choose between inner and outer product.
But what about element-wise product?
Same goes for division.
Maybe "fake infix operators" could be used?

regards, frank
November 20, 2007
Re: The Matrix to end all Matrix classes (Let's dream!)
> Another requirement would be elegant syntax that "fits" with D syntax.  Is
> that what you were trying to communicate with requirement 1?  If I 
> remember correctly doesn't BLAS syntax look a little funny because it 
> relies on mixins?

Er, I meant BLADE not BLAS.
November 20, 2007
Re: The Matrix to end all Matrix classes (Let's dream!) - BLADE 0.4Alpha
For reference, here's the state of BLADE:

Syntax is:

mixin(vectorize("expression"));

eg,

mixin(vectorize("some_array -= 3.2 * (5.8 * another_array) * some_constant"));

It would be so nice to get rid of the mixin() and the quotes. But they don't 
impede development in any way. Ideally it would be:

vectorize(some_array -= 3.2 * (5.8 * another_array) * some_constant);

Most significant changes since my last post about it:
(a) it finally works in a usable form :-). Thanks Walter for some great bugfixes 
in the past few releases!  I'm using dmd 1.023.
(b) generates X87, SSE, SSE2, or inline D code, depending on the complexity of 
the expression and the types involved.
(c) an expression rewriting step has been added, which performs scalar/const 
folding, and indexing/slicing folding
   (eg, (A+7*B)[0..5] --->   A[0..5] + 7*(B[0..5]) ).
(d) it gives really, really nice error messages and debug output. No template 
garbage from the library internals -- just a straightforward one-line error 
message, giving the line in YOUR code which contains the error.

eg,   with an array a[],
    mixin(vectorize("any+= old*garbage"));
    mixin(vectorize("a+= 2"));

you get this output (and NOTHING ELSE):
demo.d(38): static assert  "BLADE: Undefined symbols: any old garbage"
demo.d(39): static assert  "BLADE: Rank mismatch (addition or subtraction)"

It's still not terribly useful, since it only generates code for packed vectors. 
But the infrastructure is very solid.

Here's a particularly nasty example which the constant folding can cope with, to 
generate SSE2 code:

    double [] a = new double[4];
    double [] d = [0.5, 2.8, 17.0, 28.25, 1, 56.2, 3.4];
    a[0..$] = [3.4, 565, 31.3, 41.8];
    double [4][] another = [[33.1, 4543, 43, 878.7], [5.14, 455, 554, 2.43]];

    mixin(vectorize(
` a += (d[2..$-1]*2.01*a[2]-another[][1])["abc".length-3..$]`));

-------
Generates this front-end code (compile with -debug=BladeFrontEnd to see it). 
Note that there are many asserts to give nice debug info at runtime, but the 
only runtime code is a single function call, which passes 3 pointers and a 
double into an asm function (there's no inlining work for the compiler to do):

------
// bladedemo.d(34)  a += (d[2..$-1]*2.01*a[2]-another[][1])["abc".length-3..$]
assert(a.length==another[][1][(3u-3)..$].length, `Vector length mismatch`);
assert(d[2..($-1)][(3u-3)..$].length==another[][1][(3u-3)..$].length, `Vector 
length mismatch`);
assert( (cast(size_t)(a.ptr)& 0x0F) == 0, `SSE Vector misalignment: a`);
assert( (cast(size_t)(d[2..($-1)][(3u-3)..$].ptr)& 0x0F) == 0, `SSE Vector 
misalignment: d[2..($-1)][(3u-3)..$]`);
assert( (cast(size_t)(another[][1][(3u-3)..$].ptr)& 0x0F) == 0, `SSE Vector 
misalignment: another[][1][(3u-3)..$]`);

SSEVECGEN!(2,"A+=((B*C)-D)",double*,double,double*,double*)(another[][1][(3u-3)..$].length,&a[0],((a[2])*2.01),&d[2..($-1)][(3u-3)..$][0],&another[][1][(3u-3)..$][0]);

-----
The function consists of this ASM code (compile with -debug=BladeBackEnd to see 
it; BTW all comments are auto-generated). Note there are only 8 asm instructions 
in the inner loop:

-------
// Operation : ACB*D-+A=

asm {
 push EBX;
  mov EAX, veclength;
  lea ECX, [8*EAX];     add ECX, values[0];  //A
  movsd XMM0, values[1];   shufpd XMM0, XMM0,0; //B
  lea EDX, [8*EAX];     add EDX, values[2];  //C
  lea EBX, [8*EAX];     add EBX, values[3];  //D
  xor EAX, EAX;
  sub EAX, veclength; // counter=-length
  jz short L2; // test for length==0

  align 16;
L1:
  movapd XMM1, [ECX + 8*EAX];  // A
  movapd XMM2, [EDX + 8*EAX];  // C
  mulpd XMM2, XMM0; // B*
  subpd  XMM2, [EBX + 8*EAX];  // D-
  addpd XMM1, XMM2;  //+
  movapd [ECX + 8*EAX], XMM1;  // A=
  add EAX,2;
  js L1;
L2:
  sub EAX, 2;
  jns L4;
  movsd XMM1, [ECX + 8*EAX+16];  // A
  movsd XMM2, [EDX + 8*EAX+16];  // C
  mulsd XMM2, XMM0; // B*
  subsd  XMM2, [EBX + 8*EAX+16];  // D-
  addsd XMM1, XMM2;  //+
  movsd [ECX + 8*EAX+16], XMM1;  // A=
L4:
;  pop EBX;
}
-------
November 20, 2007
Re: The Matrix to end all Matrix classes (Let's dream!) - BLADE 0.4Alpha
"Don Clugston" <dac@nospam.com.au> wrote in message 
news:fhv00l$13eq$1@digitalmars.com...
> For reference, here's the state of BLADE:
> [...]

Very very cool Don!  This obviously has huge potential.  Do you have any 
plans to support GPU's or multicores?  Have you looked into SSE3 or SSE4 to 
see if there are any new instructions that would be useful?

Double precision support is the most important to me, but I am also 
interested in single precision and extended precision.  To what degree are 
they supported?  I assume that SSE instructions can't be used for extended 
precision operations.

So there is only support for one dimensional arrays?  When will two 
dimensional arrays be supported?  Once two dimensional arrays are 
implemented, a simple test for the code generator would be to generate 
optimized code for single precision multiplication of 3x3 or 4x4 matrices. 
Optimal code for single precision 3x3 and 4x4 matrix multiplication is 
widely available on the internet.

-Craig
November 20, 2007
Re: The Matrix to end all Matrix classes (Let's dream!) - BLADE 0.4Alpha
"Don Clugston" <dac@nospam.com.au> wrote in message 
news:fhv00l$13eq$1@digitalmars.com...
> For reference, here's the state of BLADE:
> [...]

Another question.  How practical would this mixin technique be to generate 
optimal floating point code for regular floating point operations (not 
vectors or matrices)?  The DMD backend currently doesn't optimize floating 
point very well.  Perhaps this approach could be used for floating point 
operations in general.
November 20, 2007
Re: The Matrix to end all Matrix classes (Let's dream!) - BLADE 0.4Alpha
Craig Black wrote:
> "Don Clugston" <dac@nospam.com.au> wrote in message 
> news:fhv00l$13eq$1@digitalmars.com...
>> For reference, here's the state of BLADE:
>> -------
> 
> Very very cool Don!  This obviously has huge potential.  Do you have any 
> plans to support GPU's or multicores?  Have you looked into SSE3 or SSE4 to 
> see if there are any new instructions that would be useful?

Yes. Although it seems to me that the GPU world is about to change radically -- 
using OpenGL looks like a horrible hack without long-term future. I'll wait 
until we get a decent API, I think.
I have been looking at SSE3, 4, and 5, but I don't even have a Core2 machine...

> Double precision support is the most important to me, but I am also 
> interested in single precision and extended precision.  To what degree are 
> they supported?  

Yes.

(b) generates X87, SSE, SSE2, or inline D code, depending on the complexity of 
the expression and the types involved.

> I assume that SSE instructions can't be used for extended
> precision operations.

Yes. I don't think they're even worth doing when strided arrays are involved, 
except in special cases.

> 
> So there is only support for one dimensional arrays?  When will two 
> dimensional arrays be supported? 

Strided arrays are next, then 2D.

> Once two dimensional arrays are
> implemented, a simple test for the code generator would be to generate 
> optimized code for single precision multiplication of 3x3 or 4x4 matrices. 
> Optimal code for single precision 3x3 and 4x4 matrix multiplication is 
> widely available on the internet.
> 
> -Craig 
> 
>
November 20, 2007
Re: The Matrix to end all Matrix classes (Let's dream!) - BLADE 0.4Alpha
"Don Clugston" <dac@nospam.com.au> wrote in message 
news:fhv35u$18nc$1@digitalmars.com...
> Craig Black wrote:
>> "Don Clugston" <dac@nospam.com.au> wrote in message 
>> news:fhv00l$13eq$1@digitalmars.com...
>>> For reference, here's the state of BLADE:
>>> -------
>>
>> Very very cool Don!  This obviously has huge potential.  Do you have any 
>> plans to support GPU's or multicores?  Have you looked into SSE3 or SSE4 
>> to see if there are any new instructions that would be useful?
>
> Yes. Although it seems to me that the GPU world is about to change 
> radically -- 
> using OpenGL looks like a horrible hack without long-term future. I'll 
> wait until we get a decent API, I think.
> I have been looking at SSE3, 4, and 5, but I don't even have a Core2 
> machine...
>
>> Double precision support is the most important to me, but I am also 
>> interested in single precision and extended precision.  To what degree 
>> are they supported?
>
> Yes.
>
> (b) generates X87, SSE, SSE2, or inline D code, depending on the 
> complexity of the expression and the types involved.
>
>> I assume that SSE instructions can't be used for extended
>> precision operations.
>
> Yes. I don't think they're even worth doing when strided arrays are 
> involved, except in special cases.
>
>>
>> So there is only support for one dimensional arrays?  When will two 
>> dimensional arrays be supported?
>
> Strided arrays are next, then 2D.

What are strided arrays?  Are they similar to sparse arrays?
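
(As an illustrative aside, not BLADE code: a strided array is a dense view 
whose consecutive elements sit a fixed distance apart in memory -- eg a 
single column of a row-major matrix -- whereas a sparse array stores only 
its non-zero entries. A minimal sketch:)

// Sketch only: a strided view is dense, but its elements are a fixed
// "stride" apart in memory; nothing is omitted, the data is simply not
// contiguous (unlike a sparse array, which stores only nonzeros).
struct StridedView
{
    double* ptr;
    size_t  length;
    size_t  stride;    // distance (in elements) between consecutive entries

    ref double opIndex(size_t i) { return ptr[i * stride]; }
}

void main()
{
    // A 3x4 row-major matrix stored flat; column 2 is a strided view of it.
    auto data = new double[3 * 4];
    foreach (i, ref x; data) x = i;

    auto col2 = StridedView(&data[2], 3, 4);   // elements 2, 6, 10
    assert(col2[1] == 6);
    col2[2] = 99;                              // writes through to data[10]
    assert(data[10] == 99);
}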