Thread overview
Multidimensional Arrays
  Sep 08, 2006  Oskar Linde
  Sep 10, 2006  Don Clugston
  Feb 17, 2007  Bill Baxter
  Feb 17, 2007  Neal Becker
  Feb 18, 2007  Bill Baxter
  Mar 16, 2007  Oskar Linde
  Mar 16, 2007  Bill Baxter
September 08, 2006
I have been implementing a strided multidimensional array type and am interested in comments.

It is basically designed after Norbert Nemec's suggestion:
http://homepages.uni-regensburg.de/~nen10015/documents/D-multidimarray.html
+/- some things.

The basic array type looks like this:

struct Array(T,int dims = 1) {
	T* ptr;
	size_t[dims] range;
	ptrdiff_t[dims] stride;

	const dimensions = dims;
	alias T ElementType;

	...
}
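
The element lookup itself is just the usual strided address arithmetic. A minimal sketch of the idea (not the actual implementation code; the helper name is made up):

// element (i,j) of a 2-D view lives at ptr + i*stride[0] + j*stride[1],
// so views like transpose, reverse and diag only need to adjust
// ptr/range/stride -- the data itself is never touched.
T element2(T)(T* ptr, ptrdiff_t[2] stride, size_t i, size_t j)
{
    return ptr[cast(ptrdiff_t)i*stride[0] + cast(ptrdiff_t)j*stride[1]];
}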

Usage:

Initialization

// Create a 3x3 array
auto A = Array!(int,2)(3,3);

//This could use a nicer syntax
A.assign(5,6,7,
         3,1,9,
         8,8,1);

writefln("A = %s",A.toString);
/*
A = (3x3)
      5        6        7
      3        1        9
      8        8        1
*/

auto B = Array!(int,2)(3,3);
B.assign(0,0,1,
         0,2,1,
         7,8,9);

Element-wise operators:
writefln("A*B = %s",(A*B).toString);

Scalar operators:
writefln("5*B = %s",(5*B).toString);

As in Norbert's proposal:
C.transpose returns a transposed array view (all dimensions reversed)
C.transpose(dimA,dimB) swaps dimension dimA and dimB

C.diag returns an N-1 dimensional view of the diagonal of the matrix. E.g. constructing a 5x5 identity matrix:

// zeros is just a convenience wrapper to generate a zero initialized array
auto A = zeros!(int)(5,5);
A.diag[] = 1;
writefln("A = %s",A.toString);

A = (5x5)
      1        0        0        0        0
      0        1        0        0        0
      0        0        1        0        0
      0        0        0        1        0
      0        0        0        0        1

C.reverse(dim) reverses the given dimension (inverts the stride)
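
These are all O(1) view manipulations on the (ptr, range, stride) triple. A rough sketch of two of them, written against the field names above (not the actual member functions):

// transpose(dimA,dimB): just swap the bookkeeping for the two dimensions
void swapDims(T, int dims)(inout Array!(T,dims) a, int d0, int d1)
{
    auto r = a.range[d0];  a.range[d0]  = a.range[d1];  a.range[d1]  = r;
    auto s = a.stride[d0]; a.stride[d0] = a.stride[d1]; a.stride[d1] = s;
}

// reverse(d): start at the last element of that dimension and step backwards
void reverseDim(T, int dims)(inout Array!(T,dims) a, int d)
{
    a.ptr += (cast(ptrdiff_t)a.range[d] - 1) * a.stride[d];
    a.stride[d] = -a.stride[d];
}

// diag is similar: an N-1 dimensional view whose first stride is the sum
// of all strides, so it steps one cell along every axis at once.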

Indexing and slicing:

auto D = Array!(int,3)(7,7,7); // 3-D array, 7x7x7

D[1,2,3] => int (a single element)
D[1,2,all] => 1-D slice of length 7
D[1,all,all] => 7x7 slice
D[range(1,end-1), all, range(1,end-1)] => 5x7x5 subarray
D[end-1, range(end-3,end-1), all] => 2x7 slice

end works in the same way as $ does for regular arrays.
range(a,b) is the replacement for a..b (D doesn't otherwise support mixing indexing and slicing).
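
The helpers themselves don't need to be anything fancy. A stripped-down sketch of what range() could look like (the real one also accepts end-relative values like end-1, which this sketch leaves out):

struct IndexRange
{
    size_t a;
    size_t b;
}

IndexRange range(size_t a, size_t b)
{
    IndexRange r;
    r.a = a;
    r.b = b;
    return r;
}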

Slices/subarrays are of course views that share the same data:

auto M = zeros!(int)(4,4);
M[range(0,2), range(0,2)][] = 1;
M[range(2,end), range(2,end)][] = 1;
writefln("M = %s", M.toString);

M = (4x4)
      1        1        0        0
      1        1        0        0
      0        0        1        1
      0        0        1        1


.dup => Returns a compact copy of the array

In-place map over a function:

M.doMap((int y){return 1<<y;});

Functional map:

writefln("cos(M) = %s",M.map(&cos).toString);

cos(M) = (4x4)
 0.5403   0.5403   1.0000   1.0000
 0.5403   0.5403   1.0000   1.0000
 1.0000   1.0000   0.5403   0.5403
 1.0000   1.0000   0.5403   0.5403

Foreach iteration:
auto N = zeros!(int)(5,5);
int i = 0;
foreach(inout x; N)
  x = ++i;



Of course, custom types are supported. Some examples, just for fun. :)

auto Q = ones!(Rational!(long))(5,5) * 3;
writefln("Q/N = %s",(Q/N).toString);

Q/N = (5x5)
      3      3/2        1      3/4      3/5
    1/2      3/7      3/8      1/3     3/10
   3/11      1/4     3/13     3/14      1/5
   3/16     3/17      1/6     3/19     3/20
    1/7     3/22     3/23      1/8     3/25



auto U = zeros!(Uncertain!(double))(4,4);

double val = -8;
foreach(inout T;U) {
    // +- 10% uncertainty + 0.1
    T += Uncertain!(double)(val,abs(0.1*val)+0.1);
    val += 1;
}
writefln("U = %s",U.toString);

U = (4x4)
  (-8.0 ± 0.9)    (-7.0 ± 0.8)    (-6.0 ± 0.7)    (-5.0 ± 0.6)
  (-4.0 ± 0.5)    (-3.0 ± 0.4)    (-2.0 ± 0.3)    (-1.0 ± 0.2)
    (0. ± 0.1)     (1.0 ± 0.2)     (2.0 ± 0.3)     (3.0 ± 0.4)
   (4.0 ± 0.5)     (5.0 ± 0.6)     (6.0 ± 0.7)     (7.0 ± 0.8)

writefln("U/U = %s",(U/U).toString);

U/U = (4x4)
   (1.0 ± 0.2)     (1.0 ± 0.2)     (1.0 ± 0.2)     (1.0 ± 0.2)
   (1.0 ± 0.3)     (1.0 ± 0.3)     (1.0 ± 0.3)     (1.1 ± 0.4)
   (nan ± inf)     (1.1 ± 0.4)     (1.0 ± 0.3)     (1.0 ± 0.3)
   (1.0 ± 0.3)     (1.0 ± 0.2)     (1.0 ± 0.2)     (1.0 ± 0.2)




auto V = zeros!(Voltage)(4,4);
  val = 0;
  foreach(inout v; V) {
    v = val * volt;
    val += 0.1;
  }

auto R = zeros!(Resistance)(4,4);
  val = 100;
  foreach(inout r; R) {
    r = val * ohm;
    val *= 2;
  }

writefln("V = %s",V.toString);
writefln("R = %s",R.toString);
writefln("V/R = %s",(V/R).toString);


V = (4x4)
            0 V           100 mV           200 mV           300 mV
         400 mV           500 mV           600 mV           700 mV
         800 mV           900 mV         1e+03 mV            1.1 V
          1.2 V            1.3 V            1.4 V            1.5 V

R = (4x4)
          100 Ω            200 Ω            400 Ω            800 Ω
         1.6 kΩ           3.2 kΩ           6.4 kΩ          12.8 kΩ
        25.6 kΩ          51.2 kΩ           102 kΩ           205 kΩ
         410 kΩ           819 kΩ          1.64 MΩ          3.28 MΩ

V/R = (4x4)
            0 A           500 µA           500 µA           375 µA
         250 µA           156 µA          93.7 µA          54.7 µA
        31.2 µA          17.6 µA          9.77 µA          5.37 µA
        2.93 µA          1.59 µA           854 nA           458 nA





Comments:

The only tricky part about all this was getting opIndex working with compile time variadic arguments. I have not yet gotten DMD to inline the generated indexing functions, but it should be doable as they are all trivial. I have not been able to pursue this due to a DMD inlining bug that in some cases generates "xx is not an lvalue" when compiling with -inline.

The main issue that remains is the _r operators, which have to work a bit differently from the non-_r operators (*).

I see this all more as a proof of concept than anything else. If anyone is interested in the sources, I will find a way to put them up somewhere.

Future extensions would be writing Matrix types and possibly hooking them up with a BLAS library. Does anyone know if there exist BLAS implementations that support 80-bit reals?

Other things I want to look at are adding stride support to the range() type and investigating the possibility of generating expression templates.

*) Currently, if you have an:

opAdd(T)(T x) {...}

you can't also have an:

opAdd_r(T)(T x) {...}

since this will create ambiguous overloads. I find myself wanting to define opAdd_r only for the cases where no matching non-_r operator exists, but there is (AFAIK) no way to do that.
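
To illustrate with a made-up minimal type (not the array code itself):

struct S
{
    // bodies elided; only the signatures matter here
    S opAdd(T)(T x)   { S r; return r; }   // covers  s + x
    S opAdd_r(T)(T x) { S r; return r; }   // covers  x + s
}

void main()
{
    S a, b;
    // auto c = a + b; // error: both a.opAdd!(S)(b) and b.opAdd_r!(S)(a)
    //                 // match, so the expression is ambiguous
}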

I would go as far as suggesting a change to D: if both a matching opX and a matching opX_r overload exist, the opX overload is chosen. This demotes the opX_r overloads a bit and is consistent with the way commutative operators work, where, for example, if no a.opAdd(b) exists, b.opAdd(a) is called.

/Oskar
September 10, 2006
Oskar Linde wrote:
> I have been implementing a strided multidimensional array type and am interested in comments.

[snip]

This looks really promising -- that improved IFTI support in the last release has clearly opened a lot of possibilities. For now, I'll just comment on one thing.

> Future extensions would be writing Matrix types, and possibly hook them up with a BLAS library. Anyone know if there exists BLAS implementations that supports 80-bit reals?

This is a really interesting issue. Personally I don't know of any implementations, but I haven't really looked. My first reaction was, "you'd always be using doubles or floats in matrices, so you'd use SIMD". But even with 64-bit matrices, 80-bit temporary results can still be useful, especially for operations on the diagonal elements of matrices. Basically, all vector parameters would still be 64-bit doubles, but scalar parameters and returned results would be 80-bit.

And then I got thinking -- it's not clear to me that SIMD would be very beneficial when dealing with strided matrices, unless you're extremely clever (and this is probably why the AMD BLAS libraries have hand-crafted ASM even for the BLAS3 functions; you don't get much benefit otherwise). Consequently, I strongly suspect that you'd always want to treat strided arrays differently from built-in arrays (ie, stride == 1).
I don't know to what extent expression templates could keep track of whether an array has a stride or not, but there'd be a massive performance boost if you could.

-Don
February 17, 2007
Oskar Linde wrote:
> I have been implementing a strided multidimensional array type and am interested in comments.
> 
> It is basically designed after Norbert Nemec's suggestion:
> http://homepages.uni-regensburg.de/~nen10015/documents/D-multidimarray.html
> +/- some things.
> 
> The basic array type looks like this:
> 
> struct Array(T,int dims = 1) {
>     T* ptr;
>     size_t[dims] range;
>     ptrdiff_t[dims] stride;
> 
>     const dimensions = dims;
>     alias T ElementType;
> 
>     ...
> }

Hey Oskar,
I've been working on implementing something similar.  I'd be interested to see some parts of your implementation.
Mine's at
http://www.dsource.org/projects/multiarray

The main difference between our approaches seems to be that I've made 'dims' non-const (and therefore not a template parameter).

I thought that would be better since sometimes it's handy to reshape a 1-d array into an equivalent 2d Nx1 or 3-d Nx1x1.  And you have the potential to do broadcasting more easily that way too (E.g. multiply an N dim 1-d array times an Nx5 array just by duplicating the one axis implicitly).  The down side is that range and stride then become dynamic arrays.  I guess with your approach you can create a new view with the right # of dims when needed.  On the other hand that probably means you need more templated member functions, parameterized by Array dimension.

Another tradeoff is that I can also have indexing and slicing return arrays of different dimension, but I can't have indexing return a single element (because the return type of opIndex is Array not float and you can't have two different return types).  Instead I overloaded opCall for that.  So M(4,3) is the value of element 4,3, but M[4,3] is a 1-element view.
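
In code the split looks roughly like this (a stripped-down 2-D sketch, not the actual multiarray class):

class MultiArray2Sketch
{
    float[] data;
    size_t[2] range;
    size_t[2] stride;

    // M[i,j] must return the array type, so it yields a 1-element view
    MultiArray2Sketch opIndex(size_t i, size_t j)
    {
        auto v = new MultiArray2Sketch;
        size_t off = i*stride[0] + j*stride[1];
        v.data = data[off .. off+1];
        v.range[] = 1;
        v.stride[] = 1;
        return v;
    }

    // M(i,j) returns the element value itself
    float opCall(size_t i, size_t j)
    {
        return data[i*stride[0] + j*stride[1]];
    }
}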

I also made it a class in the end, though I started out with it being a struct.  Partly that decision was motivated by the idea that generally I don't really want to pass 6 or more 32-bit numbers around by value, plus the presence of pointer member data, so using a class seemed reasonable.  Having to remember to say 'new' everywhere has been a pain, though.

I have mine hooked up with some basic BLAS routines -- the routines for VV, MV, and MM multiplication, and the simplest least squares and linear system solvers.  I don't know of any 80-bit capable BLAS, but really single precision is good enough for most of what I do.  Double is mostly overkill, but I use it to be sure.

--bb

February 17, 2007
I think it makes a lot of sense to build arrays from 2 components.  One component is the storage.  The second component is the array, which is an interpretation of the storage.  I think this simplifies slicing and projections.
February 18, 2007
Neal Becker wrote:
> I think it makes a lot of sense to build arrays from 2 components.  One
> component is the storage.  The second component is the array, which is an
> interpretation of the storage.  I think this simplifies slicing and
> projections.

You mean exactly how Oskar and I are doing it?  Or you mean a more flexible notion of storage?  I agree it would be nice to be able to use different storage back ends besides strided memory.  Diagonal, Triangular, Sparse, Block, Banded, etc. would all be nice.  The Matrix Template Library did something like that, though I'd say with mixed results:
   http://www.osl.iu.edu/research/mtl/

The high-level idea is nice, but the end result was overly complicated IMHO, both internally and externally.  And it depended critically on good optimization from the compiler to get decent performance since everything was in C++ (no tie-in with optimized Fortran back-ends like ATLAS).

--bb
March 16, 2007
Bill Baxter skrev:
> Oskar Linde wrote:
>> I have been implementing a strided multidimensional array type and am interested in comments.
>>
>> It is basically designed after Norbert Nemec's suggestion:
>> http://homepages.uni-regensburg.de/~nen10015/documents/D-multidimarray.html 
>>
>> +/- some things.
>>
>> The basic array type looks like this:
>>
>> struct Array(T,int dims = 1) {
>>     T* ptr;
>>     size_t[dims] range;
>>     ptrdiff_t[dims] stride;
>>
>>     const dimensions = dims;
>>     alias T ElementType;
>>
>>     ...
>> }
> 
> Hey Oskar,
> I've been working on implementing something similar.  I'd be interested to see some parts of your implementation.
> Mine's at
> http://www.dsource.org/projects/multiarray

Hi Bill,

I'm sorry for the late reply. I've had a quite hectic last few months and not had time to dig into this. I have now at least extracted my old multiarray code and turned it into something that compiles (and converted it to use the new D variadic templates instead of the old variadic.d hack):

http://www.csc.kth.se/~ol/multiarray.tar.gz

the multiarray module is sci.matrix.multiarray. Everything is in a very crude shape.

> The main difference between our approaches seems to be that I've made 'dims' non-const (and therefore not a template parameter).
> 
> I thought that would be better since sometimes it's handy to reshape a 1-d array into an equivalent 2d Nx1 or 3-d Nx1x1.  And you have the potential to do broadcasting more easily that way too (E.g. multiply an N dim 1-d array times an Nx5 array just by duplicating the one axis implicitly).  The down side is that range and stride then become dynamic arrays.  

> I guess with your approach you can create a new view with the right # of dims when needed.  

Yes. This is very powerful.

Given a 10x10x10 array arr, arr[0,all,all] will be a 10x10 array, not a 1x10x10 array or anything else. And that is all deduced at compile time.
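
Roughly, opIndex is templated on the compile-time index argument types, so the dimensionality of the result can be computed from how many of the arguments are slices rather than plain integers. A hand-wavy sketch with made-up helper names:

struct All {}                    // stand-in for the `all` marker
struct Slice { size_t a, b; }    // stand-in for range(a,b)

// count how many of the index arguments keep a dimension alive
template CountSlices()
{
    const int CountSlices = 0;
}
template CountSlices(Head, Rest...)
{
    static if (is(Head == All) || is(Head == Slice))
        const int CountSlices = 1 + CountSlices!(Rest);
    else
        const int CountSlices = CountSlices!(Rest);
}

// opIndex(A...)(A args) can then return Array!(T, CountSlices!(A)) when
// CountSlices!(A) > 0, and a plain element when it is 0.
static assert(CountSlices!(int, All, Slice) == 2);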

> On the other hand that probably means you need more templated member functions, parameterized by Array dimension.

Possibly.

> Another tradeoff is that I can also have indexing and slicing return arrays of different dimension, but I can't have indexing return a single element (because the return type of opIndex is Array not float and you can't have two different return types).  Instead I overloaded opCall for that.  So M(4,3) is the value of element 4,3, but M[4,3] is a 1-element view.

In addition to being able to have full indexing return a single element vs partial indexing returning slices, I think the greatest advantage of having the dimensionality as a compile-time constant is the improved compile-time consistency checks.

I believe you will extremely rarely dynamically adjust the dimensionality of a multidimensional array. You are almost always required to know the dimensionality at compile time anyway. You can always make slices of different dimensionality. By giving a dimension a stride of 0, you could even duplicate the data over a dimension such as [1,1,1,1,... (10000 times) ...,1] while only occupying a single element's worth of memory. Reshaping has to be done explicitly (which I see as a good thing).
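
As a concrete (made-up) illustration of the stride-0 trick, using the field names from my first post:

// a 1-D view that presents the single element a.ptr[0] n times without
// copying it -- with stride 0, stepping along the dimension never moves
Array!(int,1) repeatView(Array!(int,1) a, size_t n)
{
    Array!(int,1) v;
    v.ptr = a.ptr;
    v.range[0] = n;
    v.stride[0] = 0;
    return v;
}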

In general, I think this falls back on the same conclusion as the discussion regarding intrinsic vector operations and small static arrays being value types a while ago. Low dimensionalities will most of the time turn out to be static, while larger dimensions are dynamic. The dimensionality of a multidimensional array will typically be static and of a low order (For example, I've personally never encountered a case needing more than 6 dimensions).

> I also made it a class in the end, though I started out with it being a struct.  Partly that decision was motivated by the idea that generally I don't really want to pass 6 or more 32-bit numbers around by value, plus the presence of pointer member data, so using a class seemed reasonable.  Having to remember to say 'new' everywhere has been a pain, though.

The biggest disadvantage I see is that data shuffling (such as a loop assigning slices of one array to slices of another) will result in the creation of lots of temporary heap-allocated class instances, where a struct would only need a small, constant amount of stack memory.

One option could possibly be to make slicing return proxy structs that (in the future, implicitly) convert to class instances.

Note that by-value passing doesn't necessarily mean any reduction in performance, as long as you are careful to make functions that don't change the shape of the array take it as an inout (future const/readonly ref) parameter; functions that do need their own copy would have to make the same data copy when dup-ing a class anyway.

Another option to avoid passing lots of data by value could possibly be to make a container struct containing a pointer to another struct containing all the data.

> I have mine hooked up with some basic BLAS routines -- the routines for VV, MV, and MM multiplication, and the simplest least squares and linear system solvers.  

That sounds really interesting. I will definitely give it a look.

/Oskar
March 16, 2007
Oskar Linde wrote:
> Bill Baxter skrev:

> Hi Bill,
> 
> I'm sorry for the late reply. I've had a quite hectic last few months and not had time to dig into this. I have now at least extracted my old multiarray code and turned it into something that compiles (and converted it to use the new D variadic templates instead of the old variadic.d hack):
> 
> http://www.csc.kth.se/~ol/multiarray.tar.gz

Great!  Thanks for putting this online.

> the multiarray module is sci.matrix.multiarray. Everything is in a very crude shape.
> 
>> The main difference between our approaches seems to be that I've made 'dims' non-const (and therefore not a template parameter).
>>
>> I thought that would be better since sometimes it's handy to reshape a 1-d array into an equivalent 2d Nx1 or 3-d Nx1x1.  And you have the potential to do broadcasting more easily that way too (E.g. multiply an N dim 1-d array times an Nx5 array just by duplicating the one axis implicitly).  The down side is that range and stride then become dynamic arrays.  
> 
>> I guess with your approach you can create a new view with the right # of dims when needed.  
> 
> Yes. This is very powerful.
> 
> Given a 10x10x10 array arr, arr[0,all,all] will be a 10x10 array, not a 1x10x10 array or anything else. And that is all deduced at compile time.

Hmm, how do you accomplish that?  I'll take a look at the code but it seems like you need a lot of partial specializations for the 0 case. Any opIndex call that has a zero as one of its arguments needs to have a special implementation because it's returning a different type.  But I thought specialization and IFTI were mutually exclusive.

>> On the other hand that probably means you need more templated member functions, parameterized by Array dimension.
> 
> Possibly.
> 
>> Another tradeoff is that I can also have indexing and slicing return arrays of different dimension, but I can't have indexing return a single element (because the return type of opIndex is Array not float and you can't have two different return types).  Instead I overloaded opCall for that.  So M(4,3) is the value of element 4,3, but M[4,3] is a 1-element view.
> 
> In addition to being able to have full indexing return a single element vs partial indexing returning slices, I think the greatest advantage of having the dimensionality as a compile-time constant is the improved compile-time consistency checks.
> 
> I believe you will extremely rarely dynamically adjust the dimensionality of a multidimensional array. You are almost always required to know the dimensionality at compile time anyway. You can always make slices of different dimensionality. By giving a dimension a stride of 0, you could even duplicate the data over a dimension such as [1,1,1,1,... (10000 times) ...,1] while only occupying a single element's worth of memory. Reshaping has to be done explicitly (which I see as a good thing).

What about allowing flexible expanding of dimensions?  I'm basically trying to make a pseudo-clone of numpy (www.numpy.org), and one thing it does a lot is "broadcasting".
  http://www.scipy.org/EricsBroadcastingDoc
  http://www.onlamp.com/pub/a/python/2000/09/27/numerically.html

> In general, I think this falls back on the same conclusion as the discussion regarding intrinsic vector operations and small static arrays being value types a while ago. Low dimensionalities will most of the time turn out to be static, while larger dimensions are dynamic. The dimensionality of a multidimensional array will typically be static and of a low order (For example, I've personally never encountered a case needing more than 6 dimensions).

Yeh, there is a limit.  Numpy is supposed to be dimension agnostic, but there is a hard-coded MAX_DIMS in the C code that implements the backend.  I think it's set to something like 32.

>> I also made it a class in the end, though I started out with it being a struct.  Partly that decision was motivated by the idea that generally I don't really want to pass 6 or more 32-bit numbers around by value, plus the presence of pointer member data, so using a class seemed reasonable.  Having to remember to say 'new' everywhere has been a pain, though.
> 
> The biggest disadvantage I see is that data shuffling (such as a loop assigning slices of one array to slices of another) will result in the creation of lots of temporary heap-allocated class instances, where a struct would only need a small, constant amount of stack memory.

Yep.  But basically since I was porting from Python code my thinking was "Python already does all those allocations, and millions more -- a D implementation can only be faster"  :-)

I think the thing that really got me to switch to class was that there were some IFTI bugs that only surfaced when using structs and which went away when using a class.  They're fixed now, so maybe it's time to go back to a struct.    *Especially* if we get some new constref goodness.

> One option could possibly be to make slicing return proxy structs that (in the future, implicitly) convert to class instances.
> 
> Note that by-value passing doesn't necessarily mean any reduction in performance, as long as you are careful to make functions that don't change the shape of the array take it as an inout (future const/readonly ref) parameter; functions that do need their own copy would have to make the same data copy when dup-ing a class anyway.

Yeh, I hate to make things 'inout' when I really want const reference behavior.  I'm going to go and rewrite a bunch of stuff as soon as we get const ref.

Another nice thing about going the class route with current D is that you can have a function which potentially returns the input unmodified.  For instance, a function that makes sure the input is at least 2-D: if it already is, it just returns the input as-is; otherwise it creates a new view.  With structs that'll turn into creating a new view no matter what.

On the other hand, you need those kinds of functions in Numpy a lot because you tend to write functions that take an "A" where "A" could be a python array, or a python tuple, or a numpy array of any dimension, or any other sequence type.  Using numpy.asarray_2d(A) at the top of your functions is in many ways a substitute for not having static typing.

It's hard to say without actually going through the exercise.  I don't have much experience using big flexible array packages in statically typed languages like C++.  My experience is with the array handling in Matlab and Numpy, both of which are dynamic.  Maybe static dimension is the way to go with a statically typed language.

> Another option to avoid passing lots of data by value could possibly be to make a container struct containing a pointer to another struct containing all the data.

Sounds like significantly more complexity for not so much benefit.

>> I have mine hooked up with some basic BLAS routines -- the routines for VV, MV, and MM multiplication, and the simplest least squares and linear system solvers.  
> 
> That sounds really interesting. I will definitely give it a look.



--bb