Thread overview
Mir Slice Column or Row Major
May 27, 2020
data pulverizer
May 27, 2020
data pulverizer
May 27, 2020
welkam
May 27, 2020
jmh530
May 28, 2020
9il
May 28, 2020
jmh530
May 28, 2020
welkam
May 27, 2020
Hi,

I have started running Kernel benchmarks calculations using Mir NDSlice, and I'm getting times that are much slower than expected. To check that I'm not making an obvious mistake, below are samples of the code I am using. The way the selection happens is that the `calculateKernelMatrix` function assumes that the data under the slice object is column major, if it is row major the calculation will be slow which could account for the issues I'm seeing. Thanks

Dot product functor
```
struct DotProduct(T)
{
  public:
  this(T _nothing)
  {}
  T opCall(U...)(Slice!(T*, U) x, Slice!(T*, U) y) const
  {
    T dist = 0;
    auto m = x.length;
    for(size_t i = 0; i < m; ++i)
    {
      dist += x[i] * y[i];
    }
    return dist;
  }
}
```

Kernel Matrix function:
```
auto calculateKernelMatrix(alias K, T, U...)(K!(T) kernel, Slice!(T*, U) data)
{
  size_t n = data.length!1;
  auto mat = slice!(T)(n, n);

  foreach(j; taskPool.parallel(iota(n)))
  {
    auto arrj = data[0..$, j];
    foreach(size_t i; j..n)
    {
      mat[i, j] = kernel(data[0..$, i], arrj);
      mat[j, i] = mat[i, j];
    }
  }
  return mat;
}
```

Benchmark Function
```
auto bench(alias K, T)(K!(T) kernel, long[] n, bool verbose = true)
{
  auto times = new double[n.length];
  auto sw = StopWatch(AutoStart.no);
  foreach(i; 0..n.length)
  {
    double[3] _times;
    auto data = UniformVariable!T(0, 1).randomSlice(784L, n[i]);
    foreach(ref t; _times[])
    {
      sw.start();
      auto mat = calculateKernelMatrix!(K, T)(kernel, data);
      sw.stop();
      t = sw.peek.total!"nsecs"/1000_000_000.0;
      sw.reset();
    }
    times[i] = sum(_times[])/3.0;
    if(verbose)
    {
      writeln("Average time for n = ", n[i], ", ", times[i], " seconds.");
      writeln("Detailed times: ", _times, "\n");
    }
  }
  return tuple(n, times);
}
```
May 27, 2020
On Wednesday, 27 May 2020 at 01:31:23 UTC, data pulverizer wrote:
> Hi,
>
> I have started running Kernel benchmarks calculations using Mir NDSlice, and I'm getting times that are much slower than expected.

I've swapped the calculation to row major and it's running as expected.


May 27, 2020
On Wednesday, 27 May 2020 at 01:31:23 UTC, data pulverizer wrote:
> column major

Cute puppies die when people access their arrays in column major.
May 27, 2020
On Wednesday, 27 May 2020 at 16:07:58 UTC, welkam wrote:
> On Wednesday, 27 May 2020 at 01:31:23 UTC, data pulverizer wrote:
>> column major
>
> Cute puppies die when people access their arrays in column major.

Not always true...many languages support column-major order (Fortran, most obviously). The Eigen C++ library allows the user to specify row major or column major. I had brought this up with Ilya early on in mir and he thought it would increase complexity to allow both and could also require more memory. So mir is row major.
May 28, 2020
On Wednesday, 27 May 2020 at 16:53:37 UTC, jmh530 wrote:
> On Wednesday, 27 May 2020 at 16:07:58 UTC, welkam wrote:
>> On Wednesday, 27 May 2020 at 01:31:23 UTC, data pulverizer wrote:
>>> column major
>>
>> Cute puppies die when people access their arrays in column major.
>
> Not always true...many languages support column-major order (Fortran, most obviously). The Eigen C++ library allows the user to specify row major or column major. I had brought this up with Ilya early on in mir and he thought it would increase complexity to allow both and could also require more memory. So mir is row major.

Actually it is a question of notation. For example, mir-lapack uses ndslice as column-major Fortran arrays. This may cause some headaches because the data needs to be transposed in mind. We can think about ndslice as about column-major nd-arrays with the reversed order of indexing.

The current template looks like

Slice(Iterator, size_t N = 1, SliceKind kind = 1)

If we add a special column-major notation, then it will look like

Slice(Iterator, size_t N = 1, SliceKind kind = Contiguous, PayloadOrder = RowMajor)

A PR that adds this feature will be accepted.

May 28, 2020
On Thursday, 28 May 2020 at 00:51:50 UTC, 9il wrote:
> snip
> Actually it is a question of notation. For example, mir-lapack uses ndslice as column-major Fortran arrays. This may cause some headaches because the data needs to be transposed in mind. We can think about ndslice as about column-major nd-arrays with the reversed order of indexing.
>
> The current template looks like
>
> Slice(Iterator, size_t N = 1, SliceKind kind = 1)
>
> If we add a special column-major notation, then it will look like
>
> Slice(Iterator, size_t N = 1, SliceKind kind = Contiguous, PayloadOrder = RowMajor)
>
> A PR that adds this feature will be accepted.

Oh, that is news to me. I was under the impression that such a PR would not be accepted. The prototype you have is exactly what I had been thinking (that’s what eigen does).

Unfortunately, I don’t think I have the time to ensure everything works properly with column major. I think my time right now is better spent on other mir stuff, but it’s good to know that the only obstacle is someone putting the work in.

May 28, 2020
On Wednesday, 27 May 2020 at 16:53:37 UTC, jmh530 wrote:
> Not always true...many languages support column-major order (Fortran, most obviously).

if your column major matrix is implemented as
matrix[row_index][column_index]
then ok no puppies will be hurt. But I dont see much value in such implementations.