September 24, 2016
On 9/24/16 9:18 AM, John Colvin wrote:
> On Saturday, 24 September 2016 at 12:52:09 UTC, Andrei Alexandrescu wrote:
>> Could you also add a comparison with SciPy? People often say it's just
>> fine for scientific computing.
>
> That's just BLAS (so could be mkl, could be openBLAS, could be netlib,
> etc. just depends on the system and compilation choices) under the hood,
> you'd just see a small overhead from the python wrapping. Basically,
> everyone uses a BLAS or Eigen.

I see, thanks. To the extent the Python-specific overheads are measurable, it might make sense to include the benchmark.
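For example, the wrapper overhead could be measured directly with something like this (a rough sketch using only NumPy, whose `@` operator dispatches to whatever BLAS it was built against):

```python
import timeit

import numpy as np

# For small matrices the fixed Python/dispatch cost dominates; as n grows,
# the underlying BLAS gemm kernel dominates and the wrapper overhead
# becomes negligible relative to the total time.
for n in (4, 64, 512):
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    per_call = timeit.timeit(lambda: a @ b, number=100) / 100
    print(f"{n:4d}x{n:<4d} matmul: {per_call * 1e6:9.1f} us per call")
```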

> An Eigen comparison would be interesting.

That'd be awesome especially since the article text refers to it.


Andrei

September 24, 2016
On Saturday, 24 September 2016 at 13:49:35 UTC, Andrei Alexandrescu wrote:
>
> I see, thanks. To the extent the Python-specific overheads are measurable, it might make sense to include the benchmark.
>

Here are some benchmarks from Eigen and Blaze for comparison
http://eigen.tuxfamily.org/index.php?title=Benchmark
https://bitbucket.org/blaze-lib/blaze/wiki/Benchmarks

They don't include Python, for the reason mentioned above (no one would use a native Python implementation of matrix multiplication; it just calls some other library).

I don't see a reason to include it here.
September 24, 2016
On Saturday, 24 September 2016 at 12:52:09 UTC, Andrei Alexandrescu wrote:
> On 9/24/16 3:20 AM, Ilya Yaroshenko wrote:
>> [...]
>
> Awesome. Good to see that most of the graphs have a nice blue envelope :o). Could you also add a comparison with SciPy? People often say it's just fine for scientific computing.
>
> [...]

Thank you!!! --Ilya
September 24, 2016
On Saturday, 24 September 2016 at 13:18:14 UTC, John Colvin wrote:
> On Saturday, 24 September 2016 at 12:52:09 UTC, Andrei Alexandrescu wrote:
>> Could you also add a comparison with SciPy? People often say it's just fine for scientific computing.
>
> That's just BLAS (so could be mkl, could be openBLAS, could be netlib, etc. just depends on the system and compilation choices) under the hood, you'd just see a small overhead from the python wrapping. Basically, everyone uses a BLAS or Eigen.
>
> An Eigen comparison would be interesting.

It seems that libeigen_blas.dylib and libeigen_blas_static.a do not contain the _cblas_sgemm symbol, for example. Do they work for you?
September 24, 2016
On 09/24/2016 10:26 AM, jmh530 wrote:
> On Saturday, 24 September 2016 at 13:49:35 UTC, Andrei Alexandrescu wrote:
>>
>> I see, thanks. To the extent the Python-specific overheads are
>> measurable, it might make sense to include the benchmark.
>>
>
> Here are some benchmarks from Eigen and Blaze for comparison
> http://eigen.tuxfamily.org/index.php?title=Benchmark
> https://bitbucket.org/blaze-lib/blaze/wiki/Benchmarks
>
> They don't include Python, for the reason mentioned above (no one would
> use a native Python implementation of matrix multiplication; it just calls
> some other library).
>
> I don't see a reason to include it here.

OK. Yah, native Python wouldn't make sense. It may be worth mentioning that SciPy uses BLAS so it has the same performance profile.
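A short snippet could make that point concrete (sketch using NumPy only; like SciPy, it delegates dense products to whatever BLAS it was built against, and `show_config()` reveals which one is linked):

```python
import numpy as np

# Report the BLAS/LAPACK libraries this NumPy build is linked against.
np.show_config()

a = np.ones((3, 3))
b = np.eye(3)
# This product goes through the linked BLAS gemm kernel,
# not interpreted Python, so its performance profile is the BLAS's.
c = a @ b
print(c)
```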

Also, a great idea for a followup would be a blog post comparing the source code for a typical linear algebra real-world task. The idea being, yes the D version has parity with Intel, but there _is_ a reason to switch to it because of its ease of use.


Andrei

September 24, 2016
On Saturday, 24 September 2016 at 14:59:32 UTC, Ilya Yaroshenko wrote:
> On Saturday, 24 September 2016 at 13:18:14 UTC, John Colvin wrote:
>> On Saturday, 24 September 2016 at 12:52:09 UTC, Andrei Alexandrescu wrote:
>>> Could you also add a comparison with SciPy? People often say it's just fine for scientific computing.
>>
>> That's just BLAS (so could be mkl, could be openBLAS, could be netlib, etc. just depends on the system and compilation choices) under the hood, you'd just see a small overhead from the python wrapping. Basically, everyone uses a BLAS or Eigen.
>>
>> An Eigen comparison would be interesting.
>
> It seems that libeigen_blas.dylib and libeigen_blas_static.a do not contain the _cblas_sgemm symbol, for example. Do they work for you?

Fixed with Netlib CBLAS
September 24, 2016
First of all, awesome work. It's great to see that it's possible to match or even exceed the performance of hand-crafted assembly implementations with generic code.

I would suggest adding more information on how the Eigen results were obtained. Unlike OpenBLAS, Eigen's performance often varies by compiler, and it varies greatly depending on which preprocessor macros are defined. In particular, EIGEN_NO_DEBUG is not defined by default, so Eigen's runtime assertions stay enabled and reduce performance; EIGEN_FAST_MATH is enabled by default and trades a little accuracy for speed; and EIGEN_STACK_ALLOCATION_LIMIT matters greatly for performance on very small matrices (where MKL and especially OpenBLAS are very inefficient). It's been a while since I've used Eigen, so I may have forgotten one or two.
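For instance, the article could state the exact compile line. A plausible (hypothetical, not taken from the article) configuration showing the relevant knobs might be:

```shell
# Illustrative Eigen benchmark build (config sketch only):
#   EIGEN_NO_DEBUG               disables Eigen's runtime assertions
#   EIGEN_FAST_MATH=1            allows faster, slightly less accurate math
#   EIGEN_STACK_ALLOCATION_LIMIT stack-allocation cutoff in bytes;
#                                matters for very small matrices
g++ -O3 -march=native -DNDEBUG \
    -DEIGEN_NO_DEBUG -DEIGEN_FAST_MATH=1 \
    -DEIGEN_STACK_ALLOCATION_LIMIT=131072 \
    -I/path/to/eigen gemm_bench.cpp -o gemm_bench
```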

It may also be worth noting in the blog post that these are all single-threaded comparisons and that multithreaded implementations are on the way. This is obvious to anyone who's followed the development of Mir, but a general audience on Reddit will likely point it out as a deficiency unless it's stated upfront.
September 24, 2016
On Saturday, 24 September 2016 at 13:49:35 UTC, Andrei Alexandrescu wrote:
> On 9/24/16 9:18 AM, John Colvin wrote:
>> On Saturday, 24 September 2016 at 12:52:09 UTC, Andrei Alexandrescu wrote:
>>> Could you also add a comparison with SciPy? People often say it's just
>>> fine for scientific computing.
>>
>> That's just BLAS (so could be mkl, could be openBLAS, could be netlib,
>> etc. just depends on the system and compilation choices) under the hood,
>> you'd just see a small overhead from the python wrapping. Basically,
>> everyone uses a BLAS or Eigen.
>
> I see, thanks. To the extent the Python-specific overheads are measurable, it might make sense to include the benchmark.
>
>> An Eigen comparison would be interesting.
>
> That'd be awesome especially since the article text refers to it.
>
>
> Andrei

Eigen was added (but only the data; I still need to write the text). Relative charts were added. You were added to the "Acknowledgements" section. --Ilya
September 24, 2016
On Saturday, 24 September 2016 at 17:46:07 UTC, Ilya Yaroshenko wrote:
> On Saturday, 24 September 2016 at 13:49:35 UTC, Andrei Alexandrescu wrote:
>> On 9/24/16 9:18 AM, John Colvin wrote:
>>> On Saturday, 24 September 2016 at 12:52:09 UTC, Andrei Alexandrescu wrote:
>>>> Could you also add a comparison with SciPy? People often say it's just
>>>> fine for scientific computing.
>>>
>>> That's just BLAS (so could be mkl, could be openBLAS, could be netlib,
>>> etc. just depends on the system and compilation choices) under the hood,
>>> you'd just see a small overhead from the python wrapping. Basically,
>>> everyone uses a BLAS or Eigen.
>>
>> I see, thanks. To the extent the Python-specific overheads are measurable, it might make sense to include the benchmark.
>>
>>> An Eigen comparison would be interesting.
>>
>> That'd be awesome especially since the article text refers to it.
>>
>>
>> Andrei
>
> Eigen was added (but only the data; I still need to write the text). Relative charts were added. You were added to the "Acknowledgements" section. --Ilya

It would also be interesting to compare the results to Blaze [1]. According to https://www.youtube.com/watch?v=hfn0BVOegac, it is faster than Eigen and, in some cases, even faster than Intel MKL.

[1]: https://bitbucket.org/blaze-lib/blaze
September 24, 2016
On 09/24/2016 01:46 PM, Ilya Yaroshenko wrote:
> Eigen was added (but only data, still need to write text). Relative
> charts was added.

Looks awesome. Couple more nits after one more pass:

"numerical and scientific projects" -> "numeric and scientific projects"

"OpenBLAS Haswell computation kernels" -> "The OpenBLAS Haswell computation kernels"

"To add a new architecture or target an engineer" -> "To add a new architecture or target, an engineer"

"configurations are available for X87, SSE2, AVX, and AVX2 instruction sets" -> "configurations are available for the X87, SSE2, AVX, and AVX2 instruction sets"

In the machine description, you may want to specify the amount of L2 cache (I think it's 6 MB)

Instead of "Recent" MKL, a version number would be more precise

Relative performance plots should specify "percent", i.e. "Performance relative to Mir" -> "Performance relative to Mir [%]"

"General Matrix-matrix Multiplication" -> "General Matrix-Matrix Multiplication"


Andrei