Mir Slice Column or Row Major

May 27, 2020

data pulverizer

May 27, 2020

data pulverizer

May 27, 2020

May 27, 2020

May 28, 2020

May 28, 2020

May 28, 2020

May 27, 2020

Mir Slice Column or Row Major

Posted by data pulverizer

Permalink

data pulverizer

Permalink

Hi,

I have started running Kernel benchmarks calculations using Mir NDSlice, and I'm getting times that are much slower than expected. To check that I'm not making an obvious mistake, below are samples of the code I am using. The way the selection happens is that the `calculateKernelMatrix` function assumes that the data under the slice object is column major, if it is row major the calculation will be slow which could account for the issues I'm seeing. Thanks

Dot product functor
```
struct DotProduct(T)
{
  public:
  this(T _nothing)
  {}
  T opCall(U...)(Slice!(T*, U) x, Slice!(T*, U) y) const
  {
    T dist = 0;
    auto m = x.length;
    for(size_t i = 0; i < m; ++i)
    {
      dist += x[i] * y[i];
    }
    return dist;
  }
}
```

Kernel Matrix function:
```
auto calculateKernelMatrix(alias K, T, U...)(K!(T) kernel, Slice!(T*, U) data)
{
  size_t n = data.length!1;
  auto mat = slice!(T)(n, n);

  foreach(j; taskPool.parallel(iota(n)))
  {
    auto arrj = data[0..$, j];
    foreach(size_t i; j..n)
    {
      mat[i, j] = kernel(data[0..$, i], arrj);
      mat[j, i] = mat[i, j];
    }
  }
  return mat;
}
```

Benchmark Function
```
auto bench(alias K, T)(K!(T) kernel, long[] n, bool verbose = true)
{
  auto times = new double[n.length];
  auto sw = StopWatch(AutoStart.no);
  foreach(i; 0..n.length)
  {
    double[3] _times;
    auto data = UniformVariable!T(0, 1).randomSlice(784L, n[i]);
    foreach(ref t; _times[])
    {
      sw.start();
      auto mat = calculateKernelMatrix!(K, T)(kernel, data);
      sw.stop();
      t = sw.peek.total!"nsecs"/1000_000_000.0;
      sw.reset();
    }
    times[i] = sum(_times[])/3.0;
    if(verbose)
    {
      writeln("Average time for n = ", n[i], ", ", times[i], " seconds.");
      writeln("Detailed times: ", _times, "\n");
    }
  }
  return tuple(n, times);
}
```

On Wednesday, 27 May 2020 at 01:31:23 UTC, data pulverizer wrote: > Hi, > > I have started running Kernel benchmarks calculations using Mir NDSlice, and I'm getting times that are much slower than expected. I've swapped the calculation to row major and it's running as expected.

On Wednesday, 27 May 2020 at 16:07:58 UTC, welkam wrote: > On Wednesday, 27 May 2020 at 01:31:23 UTC, data pulverizer wrote: >> column major > > Cute puppies die when people access their arrays in column major. Not always true...many languages support column-major order (Fortran, most obviously). The Eigen C++ library allows the user to specify row major or column major. I had brought this up with Ilya early on in mir and he thought it would increase complexity to allow both and could also require more memory. So mir is row major.

On Wednesday, 27 May 2020 at 16:53:37 UTC, jmh530 wrote: > On Wednesday, 27 May 2020 at 16:07:58 UTC, welkam wrote: >> On Wednesday, 27 May 2020 at 01:31:23 UTC, data pulverizer wrote: >>> column major >> >> Cute puppies die when people access their arrays in column major. > > Not always true...many languages support column-major order (Fortran, most obviously). The Eigen C++ library allows the user to specify row major or column major. I had brought this up with Ilya early on in mir and he thought it would increase complexity to allow both and could also require more memory. So mir is row major. Actually it is a question of notation. For example, mir-lapack uses ndslice as column-major Fortran arrays. This may cause some headaches because the data needs to be transposed in mind. We can think about ndslice as about column-major nd-arrays with the reversed order of indexing. The current template looks like Slice(Iterator, size_t N = 1, SliceKind kind = 1) If we add a special column-major notation, then it will look like Slice(Iterator, size_t N = 1, SliceKind kind = Contiguous, PayloadOrder = RowMajor) A PR that adds this feature will be accepted.

On Thursday, 28 May 2020 at 00:51:50 UTC, 9il wrote: > snip > Actually it is a question of notation. For example, mir-lapack uses ndslice as column-major Fortran arrays. This may cause some headaches because the data needs to be transposed in mind. We can think about ndslice as about column-major nd-arrays with the reversed order of indexing. > > The current template looks like > > Slice(Iterator, size_t N = 1, SliceKind kind = 1) > > If we add a special column-major notation, then it will look like > > Slice(Iterator, size_t N = 1, SliceKind kind = Contiguous, PayloadOrder = RowMajor) > > A PR that adds this feature will be accepted. Oh, that is news to me. I was under the impression that such a PR would not be accepted. The prototype you have is exactly what I had been thinking (that’s what eigen does). Unfortunately, I don’t think I have the time to ensure everything works properly with column major. I think my time right now is better spent on other mir stuff, but it’s good to know that the only obstacle is someone putting the work in.

On Wednesday, 27 May 2020 at 16:53:37 UTC, jmh530 wrote: > Not always true...many languages support column-major order (Fortran, most obviously). if your column major matrix is implemented as matrix[row_index][column_index] then ok no puppies will be hurt. But I dont see much value in such implementations.

Forums