March 13, 2018
On Tuesday, 13 March 2018 at 12:23:23 UTC, jmh530 wrote:
> On Tuesday, 13 March 2018 at 10:35:15 UTC, 9il wrote:
>> On Tuesday, 13 March 2018 at 04:35:53 UTC, jmh530 wrote:
>>> [snip]
>>>
>>> What's TMMat?
>>
>> TMat is a transposed matrix. Not sure for now if it would be required.
>>
>
> There are some people who like being able to specify a whether a matrix has column or row layout. Would an option to control this be the same thing?

Good point. Would matrix(j, i) syntax solve this issue? One of reasons to introduce Mat is API simplicity. ndslice has 3 compile time params. I hope we would have only type for Mat, like Mat!double.
March 13, 2018
On Tuesday, 13 March 2018 at 13:02:45 UTC, Ilya Yaroshenko wrote:
> On Tuesday, 13 March 2018 at 12:23:23 UTC, jmh530 wrote:
>> On Tuesday, 13 March 2018 at 10:35:15 UTC, 9il wrote:
>>> On Tuesday, 13 March 2018 at 04:35:53 UTC, jmh530 wrote:
>>>> [snip]
>>>>
>>>> What's TMMat?
>>>
>>> TMat is a transposed matrix. Not sure for now if it would be required.
>>>
>>
>> There are some people who like being able to specify a whether a matrix has column or row layout. Would an option to control this be the same thing?
>
> Good point. Would matrix(j, i) syntax solve this issue? One of reasons to introduce Mat is API simplicity. ndslice has 3 compile time params. I hope we would have only type for Mat, like Mat!double.

I'm not sure I understand what your syntax solution does...

But I agree that there is a benefit from API simplicity. It would probably be easier to just say Mat is row-major and have another that is column-major (or have the options in ndslice). Nevertheless, it can't help to look at what other matrix libraries do.

Eigen's (C++ library) Matrix class uses template arguments to set storage order (_Options). It looks like Eigen has six template arguments.
https://eigen.tuxfamily.org/dox/classEigen_1_1Matrix.html

Numpy does the same thing at run-time
https://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html

Also, many of the languages that emphasize linear algebra strongly (Fortran, Matlab, etc) use column-major order. Row-order is most popular coming from C-based languages.
https://en.wikipedia.org/wiki/Row-_and_column-major_order
March 13, 2018
On Tuesday, 13 March 2018 at 14:13:02 UTC, jmh530 wrote:
[...]
> https://en.wikipedia.org/wiki/Row-_and_column-major_order

I think for mathematics it is more important for easy handling,
to be able to get the element of a matrix a_ij by a(i,j) and not only by a[i-1,j-1].

The underlying storage concept is less important and depends just on the used libs, which should be the best (fastest) available for the purpose.
March 13, 2018
On Tuesday, 13 March 2018 at 14:13:02 UTC, jmh530 wrote:
> On Tuesday, 13 March 2018 at 13:02:45 UTC, Ilya Yaroshenko wrote:
>> On Tuesday, 13 March 2018 at 12:23:23 UTC, jmh530 wrote:
>>> [...]
>>
>> Good point. Would matrix(j, i) syntax solve this issue? One of reasons to introduce Mat is API simplicity. ndslice has 3 compile time params. I hope we would have only type for Mat, like Mat!double.
>
> I'm not sure I understand what your syntax solution does...

matrix(j, i) == matrix[i, j] (reversed order)

> But I agree that there is a benefit from API simplicity. It would probably be easier to just say Mat is row-major and have another that is column-major (or have the options in ndslice). Nevertheless, it can't help to look at what other matrix libraries do.
>
> Eigen's (C++ library) Matrix class uses template arguments to set storage order (_Options). It looks like Eigen has six template arguments.
> https://eigen.tuxfamily.org/dox/classEigen_1_1Matrix.html

Interesting.  Maybe user can do:

alias Mat = Matrix!(double, ColMajor);

Fixed lengths can be considered too. Eigen has good ideas.
March 13, 2018
On Tuesday, 13 March 2018 at 15:47:36 UTC, Martin Tschierschke wrote:
> On Tuesday, 13 March 2018 at 14:13:02 UTC, jmh530 wrote:
> [...]
>> https://en.wikipedia.org/wiki/Row-_and_column-major_order
>
> I think for mathematics it is more important for easy handling,
> to be able to get the element of a matrix a_ij by a(i,j) and not only by a[i-1,j-1].
>
> The underlying storage concept is less important and depends just on the used libs, which should be the best (fastest) available for the purpose.

The underlying storage format is important for performance, especially cache lines. For instance, calculate the sum of the columns of a matrix stored in row major format vs. column major format. If it is stored column-wise, you can just loop right down the column.

In LAPACKE (note the E), the first parameter on just about every function is a variable controlling whether it is in row-major or column major. By contrast, I believe the original LAPACK (without E) was written for FORTRAN and is in column-major storage [1]. The documentation for LAPACKE notes:

"Note that using row-major ordering may require more memory and time than column-major ordering, because the routine must transpose the row-major order to the column-major order required by the underlying LAPACK routine."

It also is relevant if people who use mir want to interact with libraries that use different memory layouts. Alternately, people who use languages like Fortran that have row major formats might want to call mir code.

[1] mir-lapack uses Canonical slices in many of these functions. I assume this is correct, but I have a nagging feeling that I should compare the results of some of these functions with another language to really convince myself...When you increment an iterator on canonical it's still going in row order.
March 13, 2018
On Tuesday, 13 March 2018 at 15:47:36 UTC, Martin Tschierschke wrote:
>
> I think for mathematics it is more important for easy handling,
> to be able to get the element of a matrix a_ij by a(i,j) and not only by a[i-1,j-1].
> [snip]

I didn't really address this in my other post.

What you're talking about is 0-based or 1-based indexing. Most languages force you to choose, though there are a few languages that let you specify the type of index you want (Pascal, Chapel, and Ada come to mind).

While I've thought that the way Chapel does domains is cool, I guess I never gave much thought into implementing the optional 0 or 1 based indexing in D. Now that I'm thinking about it, I don't see why it couldn't be implemented. For instance, there's nothing stopping you from writing a function like below that has the same behavior.

auto access(T)(T x, size_t i, size_t j)
{
     return(x.opIndex(i - 1, j - 1));
}

What you really care about is the nice syntax. In that case, you could write an opIndex function that has different behavior based on a template parameter in Slice. Even something simple like below might work.

auto ref opIndex(Indexes...)(Indexes indexes) @safe
    if (isIndexSlice!Indexes)
{
    static if (defaultIndexingBehavior)
    {
        return this.opIndex!(indexes.length)([indexes]);
    }
    else
    {
        Indexes newIndexes;
        foreach(i, e; indexes)
        {
            newIndexes[i] = e - 1;
        }
        return this.opIndex!(indexes.length)([newIndexes]);
    }
}

The time-consuming part is that you'd have to go through all of mir where it relies on opIndex and ensure that both sets of behavior work.



March 13, 2018
On Tuesday, 13 March 2018 at 16:40:13 UTC, 9il wrote:
> On Tuesday, 13 March 2018 at 14:13:02 UTC, jmh530 wrote:
>> [snip]
>>
>> I'm not sure I understand what your syntax solution does...
>
> matrix(j, i) == matrix[i, j] (reversed order)
>

Hopefully, I made the issue more clean in my response to Martin. The issue with row/column major order is one-part performance and one-part interoperability with other Matrix libraries.

Perhaps most importantly for that syntax is to imagine how confusing a new user would find that syntax... A free-standing function, such as the simple one below, might be less confusing. Also, this is a solution for the Matrix type, but not so much the Slice type.

auto reverseIndex(T)(T x, size_t i, size_t j)
{
    return(x[j, i]);
}

The reverseIndex function is convenient, but you are looping through each element of the second column. Is there any performance advantage to using the reverseIndex function to do so? I suspect not. This is because you still have to jump around in memory because the underlying storage is row-major. You may not notice this effect when the CPU is able to pre-fetch the whole matrix and put it in cache, but as the matrix gets larger, then you can't fit it all in cache and it starts to matter more. Also, you might break vector operations.

Personally, while I think it's important to think about, I also don't think it's a hugely pressing issue so long as the API is flexible enough that you can add the functionality in the future.
March 13, 2018
On Tuesday, 13 March 2018 at 12:09:04 UTC, jmh530 wrote:
> On Tuesday, 13 March 2018 at 10:39:29 UTC, 9il wrote:
>> On Tuesday, 13 March 2018 at 05:36:06 UTC, J-S Caux wrote:
>>>
>>> Your suggestion [4] that matrix[i] returns a Vec is perhaps too inflexible. What one needs sometimes is to return a row, or a column of a matrix, so a notation like matrix[i, ..] or matrix[.., j] returning respectively a row or column would be useful.
>>
>> auto row = matrix[i]; // or matrix[i, 0 .. $];
>> auto col = matrix[0 .. $, j];
>
> Some kind of improvement that replaces 0 .. $ with some shorter syntax has been brought up in the past.
> https://github.com/libmir/mir-algorithm/issues/53

What I have settled on is Row(x,2), which returns a range that works with foreach. I tried x[_,2] to return Row(x,2) but didn't like reading it, so I went with x[_all,2] instead. Similarly for Col(x,2) and x[2,_all]. The exact form is bikeshedding and shouldn't make much difference. I use ByRow(x) and ByColumn(x) to iterate over the full matrix.

IME, if you try to mix row-order and column-order, or 0-based indexing and 1-based indexing, it's too complicated to write correct code that interacts with other libraries. I think you need to choose one and go with it.
March 13, 2018
On Tuesday, 13 March 2018 at 21:04:28 UTC, bachmeier wrote:
>
> What I have settled on is Row(x,2), which returns a range that works with foreach. I tried x[_,2] to return Row(x,2) but didn't like reading it, so I went with x[_all,2] instead. Similarly for Col(x,2) and x[2,_all]. The exact form is bikeshedding and shouldn't make much difference. I use ByRow(x) and ByColumn(x) to iterate over the full matrix.

This Row(x, 2) is essentially the same approach as Armadillo (it also has rows, cols, span). mir's select isn't quite the same thing.

_all is interesting.

mir's byDim that can iterate by both rows and columns.

>
> IME, if you try to mix row-order and column-order, or 0-based indexing and 1-based indexing, it's too complicated to write correct code that interacts with other libraries. I think you need to choose one and go with it.

Fair enough.

mir uses a row-order 0-based indexing approach by default. That's fine, I'm used to it at this point. What I was thinking about was that Slice's definition would change from
struct Slice(SliceKind kind, size_t[] packs, Iterator) to
struct Slice(SliceKind kind, size_t[] packs, Iterator, MemoryLayout layout = rowLayout)
so that the user has control over changing it on a object by object basis. Ideally, they would keep it the same across the entire program. Nevertheless, I would still prefer it so that all functions in mir provide the same result regardless of what layout is chosen (not sure you can say that about switching to 0-based indexing...). The idea would be that whatever is built on top of it shouldn't need to care about the layout. However, due to cache locality, some programs might run faster depending on the layout chosen.

With respect to interacting with libraries, I agree that a user should choose either row-order or column-order and stick to it. But what options are available for the user of a column-major language (or array library) to call mir if mir only makes available functions that handle row-major layouts? RCppArmadillo doesn't have an issue because both R and Armadillo are column-major. Going the other way, you'd probably know better than I would, but it looks like in embedr the only way I see to assign a D matrix to a RMatrix is by copying values. If a matrix was already in column-major form, then how easy much easier would it be to interact with R?
March 14, 2018
On Tuesday, 13 March 2018 at 22:08:10 UTC, jmh530 wrote:
> With respect to interacting with libraries, I agree that a user should choose either row-order or column-order and stick to it. But what options are available for the user of a column-major language (or array library) to call mir if mir only makes available functions that handle row-major layouts? RCppArmadillo doesn't have an issue because both R and Armadillo are column-major. Going the other way, you'd probably know better than I would, but it looks like in embedr the only way I see to assign a D matrix to a RMatrix is by copying values. If a matrix was already in column-major form, then how easy much easier would it be to interact with R?

With embedr, any data that is used by R has to be allocated by R. Therefore one would have to allocate R data (column-major) and then pass it to Mir. So I suppose you're right that it would be necessary to have some way to work with column-major data if you want to call Mir from R.