January 03, 2016
On Sunday, 3 January 2016 at 00:17:23 UTC, Ilya Yaroshenko wrote:
> On Sunday, 3 January 2016 at 00:09:33 UTC, Jack Stouffer wrote:
>> On Saturday, 2 January 2016 at 23:51:09 UTC, Ilya Yaroshenko wrote:
>>> This benchmark is _not_ lazy, so ndslice is only 3.5 times faster than NumPy.
>>
>> I don't know what you mean here; I made sure to call std.array.array to force allocation.
>
> In the article:
>     auto means = 100_000.iota <---- 100_000.iota is a lazy range
>         .sliced(100, 1000)
>         .transposed
>         .map!(r => sum(r) / r.length)
>         .array;               <---- allocation of the result
>
> In GitHub:
>     means = data           <---- data is an allocated array; a fair test for the real world
>         .sliced(100, 1000)
>         .transposed
>         .map!(r => sum(r, 0L) / cast(double) r.length)
>         .array;             <---- allocation of the result
>  -- Ilya

I still have to disagree with you that the example I submitted was fair. Accessing global memory in D is going to be much slower than accessing stack memory, and since most std.ndslice calculations are going to be on the stack, I believe my benchmark is indicative of normal use.
January 03, 2016
On Sunday, 3 January 2016 at 18:56:07 UTC, Jack Stouffer wrote:
> On Sunday, 3 January 2016 at 00:17:23 UTC, Ilya Yaroshenko wrote:
>>  [...]
>
> I still have to disagree with you that the example I submitted was fair. Accessing global memory in D is going to be much slower than accessing stack memory, and since most std.ndslice calculations are going to be on the stack, I believe my benchmark is indicative of normal use.

No, for real-world math calculations, most std.ndslice data will be in global memory.
Examples: all of SciD (if it is ported to ndslice), a future BLAS, a future LAPACK.
-- Ilya
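The pattern Ilya describes, computing over data that already lives in allocated memory, can be sketched in NumPy terms (the array shape mirrors the 100×1000 example from the article; the variable names are illustrative):

```python
import numpy as np

# Real-world numeric code typically operates on data that already
# lives in allocated (heap/global) memory -- loaded from disk or
# produced by a previous pipeline stage -- not on a lazy range.
data = np.arange(100_000, dtype=np.float64).reshape(100, 1000)

# Column means: every one of the 100_000 elements must actually
# be read from memory, unlike with a lazily generated sequence.
means = data.mean(axis=0)
print(means[:3])
```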
January 03, 2016
On Saturday, 2 January 2016 at 19:49:05 UTC, Jack Stouffer wrote:
> http://jackstouffer.com/blog/nd_slice.html
>
> https://www.reddit.com/r/programming/comments/3z6f7a/using_d_and_stdndslice_as_a_numpy_replacement/

Nicely written, good to see you explain all the code, enjoyed reading it.
January 03, 2016
On 1/2/16 6:24 PM, Ilya Yaroshenko wrote:
> On Saturday, 2 January 2016 at 23:23:38 UTC, Ilya Yaroshenko wrote:
>> On Saturday, 2 January 2016 at 19:49:05 UTC, Jack Stouffer wrote:
>>> http://jackstouffer.com/blog/nd_slice.html
>>>
>>> https://www.reddit.com/r/programming/comments/3z6f7a/using_d_and_stdndslice_as_a_numpy_replacement/
>>>
>>
>> I just wanted to write to you that dip80-ndslice was moved to mir
>> http://code.dlang.org/packages/mir
>>
>>     "dependencies": {
>>         "dip80-ndslice": "~>0.8.7"
>>     },
>>
>> Ilya
>
> EDIT:
>
>      "dependencies": {
>          "mir": "~>0.9.0-beta"
>      }

What is the relationship between mir and std.experimental.ndslice? -- Andrei
January 03, 2016
On Sunday, 3 January 2016 at 23:18:16 UTC, Andrei Alexandrescu wrote:
> On 1/2/16 6:24 PM, Ilya Yaroshenko wrote:
>> On Saturday, 2 January 2016 at 23:23:38 UTC, Ilya Yaroshenko wrote:
>>> On Saturday, 2 January 2016 at 19:49:05 UTC, Jack Stouffer wrote:
>>>> http://jackstouffer.com/blog/nd_slice.html
>>>>
>>>> https://www.reddit.com/r/programming/comments/3z6f7a/using_d_and_stdndslice_as_a_numpy_replacement/
>>>>
>>>
>>> I just wanted to write to you that dip80-ndslice was moved to mir
>>> http://code.dlang.org/packages/mir
>>>
>>>     "dependencies": {
>>>         "dip80-ndslice": "~>0.8.7"
>>>     },
>>>
>>> Ilya
>>
>> EDIT:
>>
>>      "dependencies": {
>>          "mir": "~>0.9.0-beta"
>>      }
>
> What is the relationship between mir and std.experimental.ndslice? -- Andrei

1. mir.ndslice is a developer version of std.experimental.ndslice.
2. mir can be used with a DMD front end >= 2.068, so ndslice can be used with LDC 0.17.0-alpha. This is important for benchmarks.
3. mir is going to be a testing package for the future std.la (a generic BLAS implementation).

-- Ilya
January 04, 2016
On Sunday, 3 January 2016 at 18:56:07 UTC, Jack Stouffer wrote:
> I still have to disagree with you that the example I submitted was fair. Accessing global memory in D is going to be much slower than accessing stack memory, […]

What leads you to this belief? (Beyond cache locality considerations, which are not so important if the data is large.)

> and since most std.ndslice calculations are going to be on the stack, I believe my benchmark is indicative of normal use.

Your iota example does not read the data from memory at all (neither stack nor heap), instead computing it on the fly.
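As a minimal sketch of this distinction (in Python for brevity; the numbers are illustrative):

```python
# A lazy range produces each value on demand, so summing it never
# reads a backing buffer; summing a materialized list must load
# every element from memory.
lazy_sum = sum(range(100_000))   # values computed on the fly
data = list(range(100_000))      # allocated in memory
eager_sum = sum(data)            # values read from memory

assert lazy_sum == eager_sum     # same result, different memory traffic
print(lazy_sum)                  # 4999950000
```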

 — David
January 03, 2016
On 1/3/16 6:52 PM, Ilya wrote:
> 1. mir.ndslice is a developer version of std.experimental.ndslice
> 2. mir can be used with DMD front end >= 2.068, so ndslice can be used
> with LDC 0.17.0-alpha. It is important for benchmarks.
> 3. mir is going to be a testing package for the future std.la (generic
> BLAS implementation)

The care you show for your users is impressive. We should take a page from your book, hopefully with your own help. Thanks! -- Andrei

January 04, 2016
On Monday, 4 January 2016 at 00:24:51 UTC, David Nadlinger wrote:
> On Sunday, 3 January 2016 at 18:56:07 UTC, Jack Stouffer wrote:
>> I still have to disagree with you that the example I submitted was fair. Accessing global memory in D is going to be much slower than accessing stack memory, […]
>
> What leads you to this belief? (Beyond cache locality considerations, which are not so important if the data is large.)

The example in the article and the example I submitted to DlangScience/examples have very different speeds: https://github.com/DlangScience/examples/blob/master/mean_of_columns.d

Article example:       5 µs
DlangScience example: 145 µs

Both compiled with LDC.


January 04, 2016
On Monday, 4 January 2016 at 01:03:49 UTC, Jack Stouffer wrote:
> On Monday, 4 January 2016 at 00:24:51 UTC, David Nadlinger wrote:
>> On Sunday, 3 January 2016 at 18:56:07 UTC, Jack Stouffer wrote:
>>> I still have to disagree with you that the example I submitted was fair. Accessing global memory in D is going to be much slower than accessing stack memory, […]
>>
>> What leads you to this belief? (Beyond cache locality considerations, which are not so important if the data is large.)
>
> The example in the article and the example I submitted to DlangScience/examples have very different speeds: https://github.com/DlangScience/examples/blob/master/mean_of_columns.d
>
> Article example:       5 µs
> DlangScience example: 145 µs
>
> Both compiled with LDC.

To be clear: there is NO data in the article example; only CPU registers are used. It is not a fair comparison. -- Ilya
January 04, 2016
On Monday, 4 January 2016 at 01:09:30 UTC, Ilya wrote:
> To be clear: there is NO data in the article example; only CPU registers are used. It is not a fair comparison. -- Ilya

OK, I see where I made the mistake, and I apologize. I believed that since I was only timing the np.mean line of code, the lazy generation of the data in the D version would have no effect on the comparison. But I overlooked the fact that lazy generation means no memory needs to be accessed in the D code, so it is not an apples-to-apples comparison with the NumPy code.
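An apples-to-apples version forces allocation up front and times only the mean computation, e.g. (a rough Python sketch; the timings will vary by machine):

```python
import timeit

import numpy as np

# Allocate the data up front, as the NumPy benchmark does, so the
# timed region must actually traverse memory.
data = np.arange(100_000, dtype=np.float64).reshape(100, 1000)

# Time only the column-mean computation itself.
t = timeit.timeit(lambda: data.mean(axis=0), number=1_000)
print(f"np.mean over allocated data: {t / 1_000 * 1e6:.1f} µs/call")
```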

Thank you for clarifying this, I will update the article.