View mode: basic / threaded / horizontal-split · Log in · Help
May 09, 2004
Proposal for multidimensional arrays
Hi there,

I just finished the first version of my proposal for multidimensional arrays
in D:

http://homepages.uni-regensburg.de/~nen10015/documents/D-multidimarray.html

I believe, the core concept is so absolutely beautiful that you just have to
agree. (Sorry, but I think I've just fallen in love with it while working
on the details...)

Of course, many of the details may still need some refinement. Please:
before you bring any ideas for changes, make sure you read the comments in
the text. I tried to write down the rationals for most of the possibly
controversial decisions I had to make.

Ciao,
Nobbi
May 09, 2004
Re: Proposal for multidimensional arrays
Norbert Nemec wrote:

> Hi there,
> 
> I just finished the first version of my proposal for multidimensional
> arrays in D:
> 
>
http://homepages.uni-regensburg.de/~nen10015/documents/D-multidimarray.html
> 
> I believe, the core concept is so absolutely beautiful that you just have
> to agree. (Sorry, but I think I've just fallen in love with it while
> working on the details...)
> 
> Of course, many of the details may still need some refinement. Please:
> before you bring any ideas for changes, make sure you read the comments in
> the text. I tried to write down the rationals for most of the possibly
> controversial decisions I had to make.
> 
> Ciao,
> Nobbi

Very nicely written! Native support of 2d and n-d arrays would be nice, but
just as important is a good interface to BLAS, LAPACK and other Fortran
code. IIRC, BLAS can accept row-oriented data but LAPACK is column-oriented
only. The fact that Fortran indexing is 1-based is probably not going to be
a problem but transposing all those matrices to use LAPACK is a problem -
though I haven't timed anything to know for sure.

Some more mundane questions/comments:
- How about having n-d arrays be column oriented? Since your proposal uses a
new syntax it wouldn't be confusing to have column-oriented storage.

- Why remove dynamic arrays? Assigning to length (the note about assigning
to "size" confused me - did you mean "length"?) is pretty useful.

- N-D arrays don't have to have any way of getting the ranges. Perhaps you
could add a read-only property .length or .range or something that returns
a dynamic array of uints.

- Is is possible to reshape an n-d array or to perform the n-d equivalent to
"slicing a pointer":
char* ptr;
char[] arr = ptr[0..10];

-Ben
May 09, 2004
Re: Proposal for multidimensional arrays
Ben Hinkle wrote:
> Very nicely written! Native support of 2d and n-d arrays would be nice,
> but just as important is a good interface to BLAS, LAPACK and other
> Fortran code. IIRC, BLAS can accept row-oriented data but LAPACK is
> column-oriented only. The fact that Fortran indexing is 1-based is
> probably not going to be a problem but transposing all those matrices to
> use LAPACK is a problem - though I haven't timed anything to know for
> sure.
>
> Some more mundane questions/comments:
> - How about having n-d arrays be column oriented? Since your proposal uses
> a new syntax it wouldn't be confusing to have column-oriented storage.

Actually, I had this in mind from the very beginning. The proposal already
contains everything necessary to handle Fortran-arrays with the same ease
as C-arrays, I just forgot to mention it explicitely. A few additional tiny
features might make it more convenient, but I haven't made up my mind what
they would have to look like.

> - Why remove dynamic arrays? Assigning to length (the note about assigning
> to "size" confused me - did you mean "length"?) is pretty useful.

They are not removed, just renamed. The length of an array (of course,
length, sorry about talking about "size" there) never was really "dynamic".
Assigning to .length just obscured this fact. It is much clearer to either
use the extended .dup(int newlength) property if you want a new array with
the data copied, or use slicing if the new array reference should point to
the same location in memory.

I just promote calling the former dynamic arrays what they really are:
"array references", i.e. pointers with additional shape information.

> - N-D arrays don't have to have any way of getting the ranges. Perhaps you
> could add a read-only property .length or .range or something that returns
> a dynamic array of uints.

Of course, that has to be added. Anyhow, it should, of course, return a
fixed size array, since the number of dimensions is known at compile time.

> - Is is possible to reshape an n-d array or to perform the n-d equivalent
> to "slicing a pointer":
>  char* ptr;
>  char[] arr = ptr[0..10];

Might be an idea. Free reshaping beyond the capabilities of slicing and
transposing should then be done by first converting to a pointer and then
back to an array with that "slicing of pointers" idea of yours.

Actually, I have not yet thought of any low-level interface at all. Of
course one should be able to access all the internas of arrays in some way.
It would be unsafe, and it should be made clear that it should be avoided,
but it should still be possible in a comfortable, portable way.

I'll try to work in the details you mention ASAP.

Ciao,
Nobbi
May 10, 2004
Re: Proposal for multidimensional arrays
Reading your proposal I started wondering about your earlier attempts to
write Matrix/ND-array templates. What were the main issues? Was it syntax?
Was it performance concerns?

I've attached two files of my experiments for a day. One has some templates
and the other has an example. I think this basic approach was suggested in
a post on the now-defunct newsgroup. Assuming the templates can be inlined
it seems like a nice set of templates would work out. I haven't tried any
experiments, though. Did you try something like this?

-Ben
May 10, 2004
Re: Proposal for multidimensional arrays
Ben Hinkle wrote:

> Reading your proposal I started wondering about your earlier attempts to
> write Matrix/ND-array templates. What were the main issues? Was it syntax?
> Was it performance concerns?

Both: I could either have gone for pretty syntax, sacrificing performance
and clearness of the concepts, or going for performance and clearness, the
resulting syntax would have been ugly like something.

In neither case, the array templates would have been a good demonstration
for what I have in mind for the language, and in neither case, they would
have been really useful in practice.

> I've attached two files of my experiments for a day. One has some
> templates and the other has an example. I think this basic approach was
> suggested in a post on the now-defunct newsgroup. Assuming the templates
> can be inlined it seems like a nice set of templates would work out. I
> haven't tried any experiments, though. Did you try something like this?

I did something similar and even got a few steps beyond, working with
strides, but once I entered the subject of slicing and partial indexing,
everything just became a complete mess, and I realized that I spent so much
time trying to find workarounds for current shortcomings of the language
that the basic concepts got more and more washed out.

My resolution: The language is not yet flexible enough to implement my
vision of comfortable and efficient arrays in the library, and I judged
that instead of trying to get Walter to include five tiny changes, it might
be easier and more worthwhile to go for the big thing...

B.t.w: I saw that "reshape" function in your code. I thought about this
myself, but came to the conclusion that, in my concept of arrays with
strides, this would not work out too well. In general, reshaping would only
work for continuous arrays, and even then, it would be an unsafe operation
that should rather be done by casting to a pointer and back again.
May 10, 2004
Re: Proposal for multidimensional arrays
Norbert Nemec wrote:

> Ben Hinkle wrote:
> 
>> Reading your proposal I started wondering about your earlier attempts to
>> write Matrix/ND-array templates. What were the main issues? Was it
>> syntax? Was it performance concerns?
> 
> Both: I could either have gone for pretty syntax, sacrificing performance
> and clearness of the concepts, or going for performance and clearness, the
> resulting syntax would have been ugly like something.
> 
> In neither case, the array templates would have been a good demonstration
> for what I have in mind for the language, and in neither case, they would
> have been really useful in practice.

Most of the time the strides are 1 so building in the more general case can
hurt performance of the common case. Also n-d arrays can have complex
storage implementations (think sparse arrays, triangular arrays, etc) so
building something into the language will only be practical up to a certain
point. At some point the library will have to take over.

>> I've attached two files of my experiments for a day. One has some
>> templates and the other has an example. I think this basic approach was
>> suggested in a post on the now-defunct newsgroup. Assuming the templates
>> can be inlined it seems like a nice set of templates would work out. I
>> haven't tried any experiments, though. Did you try something like this?
> 
> I did something similar and even got a few steps beyond, working with
> strides, but once I entered the subject of slicing and partial indexing,
> everything just became a complete mess, and I realized that I spent so
> much time trying to find workarounds for current shortcomings of the
> language that the basic concepts got more and more washed out.
>
> My resolution: The language is not yet flexible enough to implement my
> vision of comfortable and efficient arrays in the library, and I judged
> that instead of trying to get Walter to include five tiny changes, it
> might be easier and more worthwhile to go for the big thing...

Slicing with a stride or slicing by arbitrary arrays (meaning the indices
don't have to be equally spaced or even distinct) came up for dynamic
arrays back on the old newsgroup but I can't remember Walter's reaction - I
don't think he posted any. A number of people expressed interest in more
powerful slicing for 1D arrays.

> B.t.w: I saw that "reshape" function in your code. I thought about this
> myself, but came to the conclusion that, in my concept of arrays with
> strides, this would not work out too well. In general, reshaping would
> only work for continuous arrays, and even then, it would be an unsafe
> operation that should rather be done by casting to a pointer and back
> again.

I agree that reshaping wouldn't be a member function of subarrays or if it
was it would extract the subarray before reshaping. I don't think it is
"unsafe", though, since you can't change the number of elements (not my
little reshape didn't check for this but it should). It is very handy in
MATLAB. It is a pity that my reshape can only change the dimension lengths
and not the number of dimensions - it would be great to have functions to
squeeze out empty dimensions (the equivalent to MATLAB's "squeeze") or to
flatten an n-d array into a vector (the equivalent to MATLAB's (:)
notation).

-Ben
May 10, 2004
Re: Proposal for multidimensional arrays
Ben Hinkle wrote:

> Most of the time the strides are 1 so building in the more general case
> can hurt performance of the common case.

True. My thought on this was: For multidimensional arrays, all dimensions
but the last one (in C-storage alignment) need strides != 1 in any case. In
your implementations, these strides are calculated from the ranges of the
lower dimensions. It does not change the access time, if, instead, the
stride is saved in a separate field. The array reference gets a bit bigger,
but usually, this does not mean much overhead.

For 1-d arrays, I believe, an unstrided, "lightweight" array type makes
sense. For more dimensions, the gains gets smaller and smaller, especially,
since the timecritical inner loop can easily be optimized by the compiler
for the special case of stride 1.

> Also n-d arrays can have complex
> storage implementations (think sparse arrays, triangular arrays, etc) so
> building something into the language will only be practical up to a
> certain point. At some point the library will have to take over.

True. I really believe that as much as possible should be put into the
language. Anyhow: rectangular arrays should clearly be a feature of the
language itself. The only reason that I went for striding as well was, that
the step from multidimensional arrays to fully strided arrays was so
surprisingly small.

> Slicing with a stride or slicing by arbitrary arrays (meaning the indices
> don't have to be equally spaced or even distinct) came up for dynamic
> arrays back on the old newsgroup but I can't remember Walter's reaction -
> I don't think he posted any. A number of people expressed interest in more
> powerful slicing for 1D arrays.

My opinion would be: striding with varying steps is clearly far too complex
to be put into the language. Striding of 1D-arrays only would be too
complex as well. Striding for ND-arrays comes naturally at minimal
additional cost, so if we want ND-arrays, we would be crazy to ignore the
power we can get by introducing full striding.

>> B.t.w: I saw that "reshape" function in your code. I thought about this
>> myself, but came to the conclusion that, in my concept of arrays with
>> strides, this would not work out too well. In general, reshaping would
>> only work for continuous arrays, and even then, it would be an unsafe
>> operation that should rather be done by casting to a pointer and back
>> again.
> 
> I agree that reshaping wouldn't be a member function of subarrays or if it
> was it would extract the subarray before reshaping. I don't think it is
> "unsafe", though, since you can't change the number of elements (not my
> little reshape didn't check for this but it should). It is very handy in
> MATLAB. It is a pity that my reshape can only change the dimension lengths
> and not the number of dimensions - it would be great to have functions to
> squeeze out empty dimensions (the equivalent to MATLAB's "squeeze") or to
> flatten an n-d array into a vector (the equivalent to MATLAB's (:)
> notation).

That is true for arrays that are stored continuously. As soon as you slice a
rectangular array (even without explicit strides) you will get an array
stored in a non-continuous area. Reshaping this without .dup-ing it first
would only be possible in very special cases.

In MATLAB, of course, there is no problem, since array contents are just
copied around intelligently.
May 11, 2004
Re: Proposal for multidimensional arrays
Norbert Nemec wrote:

> Ben Hinkle wrote:
> 
>> Reading your proposal I started wondering about your earlier attempts to
>> write Matrix/ND-array templates. What were the main issues? Was it
>> syntax? Was it performance concerns?
> 
> Both: I could either have gone for pretty syntax, sacrificing performance
> and clearness of the concepts, or going for performance and clearness, the
> resulting syntax would have been ugly like something.
> 
> In neither case, the array templates would have been a good demonstration
> for what I have in mind for the language, and in neither case, they would
> have been really useful in practice.

The only thing D lacks now in terms of syntax from your proposal is (aside
from the mtype[[N]]) the step in slicing. If A is an nd array then it
should be possible to write templates for
A[i][j][k]
A[i_0..i_1][j][k]
A[i][j_0..j_1][k]
A[i_0..i_1][j_0..j_1][k]
etc
and to add slicing step something like
A[i_0..i_1].by(i_step)[j][k]
A[i][j_0..j_1].by(j_step)[k]
etc
doesn't seem too bad. Or maybe just make a function
A.slice(i_0,i_1,i_step)[j][k]
A[i].slice(j_0,j_1,j_step)[k]

Are the templates really that nasty? The suggestion about opIndex and
opIndexAssign makes alot of sense to me (and, Walter, why isn't there any
opSliceAssign to overload slice assignment?) and it would be pretty easy to
convince Walter to act on it. 

If D had opCallAssign then something like
A(i)(j)(k) = 10
A(i_0,i_1)(j)(k) = 10
A(i_0,i_1,i_step)(j)(k) = 10
would look pretty usable and be extendable to more non-trivial indexing
expressions like
int[N] perm;
... fill perm with a permutation of 0..N-1 
A(perm,j,k) // permute the 1st dimension of A

Maybe with enough opFooAssign overloads we'd be all set :-)

-Ben
May 11, 2004
Re: Proposal for multidimensional arrays
Ben Hinkle wrote:

> Norbert Nemec wrote:
> 
>> Ben Hinkle wrote:
>> 
>>> Reading your proposal I started wondering about your earlier attempts to
>>> write Matrix/ND-array templates. What were the main issues? Was it
>>> syntax? Was it performance concerns?
>> 
>> Both: I could either have gone for pretty syntax, sacrificing performance
>> and clearness of the concepts, or going for performance and clearness,
>> the resulting syntax would have been ugly like something.
>> 
>> In neither case, the array templates would have been a good demonstration
>> for what I have in mind for the language, and in neither case, they would
>> have been really useful in practice.
> 
> The only thing D lacks now in terms of syntax from your proposal is (aside
> from the mtype[[N]]) the step in slicing. If A is an nd array then it
> should be possible to write templates for
>  A[i][j][k]
>  A[i_0..i_1][j][k]
>  A[i][j_0..j_1][k]
>  A[i_0..i_1][j_0..j_1][k]

The first one would work, because [i] could reduce the dimesionality by one.
For slicing, this will not work any more: [i_0..i_1] does not reduce the
dimensionality of the array, so [j] would again index the first dimension.

I've spent some thought on this, but I came to the conclusion that it is not
a good idea mixing [i][j] and [i,j]. It is a way to do a crude hack at the
moment, but you run into problems very quickly if you want to go beyond the
very basic indexing.

> Are the templates really that nasty?

No, but as I said: you can either go for a clean internal structure and
clumsy syntax for the user or try to get a comfortable syntax and end up
with messy and unmaintainable internals. It is the same in C++, even
though, with references, implicit instantiations and other features, I find
it a little bit more flexible if you want to implement a syntax that really
is comforable to the user. (But then, don't even try to understand or debug
the library internals...)

> opSliceAssign

For that, you would first have to say, what a range expression actually is.
The .. is not an operator, but it only exists within brackets. opSlice
should therefore be something like
       opSlice(int min0,int max0,int stride0,int min1,int max1,int stride1,...)

Might be an idea in the future, but I think we should not decide on this
now, as long as it is not clear what happens with respect to
multidimensional arrays in the language. If my proposal is stalled and it
is decided that something like this should never get into the language,
then it might make sense to think about little tweaks that allow to do the
same thing in the library. Otherwise, it seems much more reasonable to
design the details of opIndexAssign and opSlice in a way that fits into the
big picture.
Top | Discussion index | About this forum | D home