January 03, 2016
On Sunday, 3 January 2016 at 00:17:23 UTC, Ilya Yaroshenko wrote:
> On Sunday, 3 January 2016 at 00:09:33 UTC, Jack Stouffer wrote:
>> On Saturday, 2 January 2016 at 23:51:09 UTC, Ilya Yaroshenko wrote:
>>> This benchmark is _not_ lazy, so ndslice is only 3.5 times faster than NumPy.
>>
>> I don't know what you mean here; I made sure to call std.array.array to force allocation.
>
> In the article:
>     auto means = 100_000.iota <---- 100_000.iota is a lazy range
>         .sliced(100, 1000)
>         .transposed
>         .map!(r => sum(r) / r.length)
>         .array;               <---- allocation of the result
>
> In GitHub:
>     means = data           <---- data is an allocated array; a fair test for the real world
>         .sliced(100, 1000)
>         .transposed
>         .map!(r => sum(r, 0L) / cast(double) r.length)
>         .array;             <---- allocation of the result
>  -- Ilya

I still have to disagree with you that the example I submitted was fair. Accessing global memory in D is going to be much slower than accessing stack memory, and since most std.ndslice calculations are going to be on the stack, I believe my benchmark is indicative of normal use.
January 03, 2016
On Sunday, 3 January 2016 at 18:56:07 UTC, Jack Stouffer wrote:
> On Sunday, 3 January 2016 at 00:17:23 UTC, Ilya Yaroshenko wrote:
>>  [...]
>
> I still have to disagree with you that the example I submitted was fair. Accessing global memory in D is going to be much slower than accessing stack memory, and since most std.ndslice calculations are going to be on the stack, I believe my benchmark is indicative of normal use.

No, for real-world math calculations, most std.ndslice data will be in global memory.
Examples: all of SciD (if it is ported to ndslice), a future BLAS, a future LAPACK.
-- Ilya
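The pattern Ilya describes, computing over data that already lives in allocated memory, can be sketched in NumPy terms (the array shape mirrors the 100×1000 example from the article; the variable names are illustrative):

```python
import numpy as np

# Real-world numeric code typically operates on data that already
# lives in allocated (heap/global) memory -- loaded from disk or
# produced by a previous pipeline stage -- not on a lazy range.
data = np.arange(100_000, dtype=np.float64).reshape(100, 1000)

# Column means: every one of the 100_000 elements must actually
# be read from memory, unlike with a lazily generated sequence.
means = data.mean(axis=0)
print(means[:3])
```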
January 03, 2016
On Saturday, 2 January 2016 at 19:49:05 UTC, Jack Stouffer wrote:
> http://jackstouffer.com/blog/nd_slice.html
>
> https://www.reddit.com/r/programming/comments/3z6f7a/using_d_and_stdndslice_as_a_numpy_replacement/

Nicely written, good to see you explain all the code, enjoyed reading it.
January 03, 2016
On 1/2/16 6:24 PM, Ilya Yaroshenko wrote:
> On Saturday, 2 January 2016 at 23:23:38 UTC, Ilya Yaroshenko wrote:
>> On Saturday, 2 January 2016 at 19:49:05 UTC, Jack Stouffer wrote:
>>> http://jackstouffer.com/blog/nd_slice.html
>>>
>>> https://www.reddit.com/r/programming/comments/3z6f7a/using_d_and_stdndslice_as_a_numpy_replacement/
>>>
>>
>> I just wanted to write to you that dip80-ndslice was moved to mir
>> http://code.dlang.org/packages/mir
>>
>>     "dependencies": {
>>         "dip80-ndslice": "~>0.8.7"
>>     },
>>
>> Ilya
>
> EDIT:
>
>      "dependencies": {
>          "mir": "~>0.9.0-beta"
>      }

What is the relationship between mir and std.experimental.ndslice? -- Andrei
January 03, 2016
On Sunday, 3 January 2016 at 23:18:16 UTC, Andrei Alexandrescu wrote:
> On 1/2/16 6:24 PM, Ilya Yaroshenko wrote:
>> On Saturday, 2 January 2016 at 23:23:38 UTC, Ilya Yaroshenko wrote:
>>> On Saturday, 2 January 2016 at 19:49:05 UTC, Jack Stouffer wrote:
>>>> http://jackstouffer.com/blog/nd_slice.html
>>>>
>>>> https://www.reddit.com/r/programming/comments/3z6f7a/using_d_and_stdndslice_as_a_numpy_replacement/
>>>>
>>>
>>> I just wanted to write to you that dip80-ndslice was moved to mir
>>> http://code.dlang.org/packages/mir
>>>
>>>     "dependencies": {
>>>         "dip80-ndslice": "~>0.8.7"
>>>     },
>>>
>>> Ilya
>>
>> EDIT:
>>
>>      "dependencies": {
>>          "mir": "~>0.9.0-beta"
>>      }
>
> What is the relationship between mir and std.experimental.ndslice? -- Andrei

1. mir.ndslice is a developer version of std.experimental.ndslice.
2. mir can be used with a DMD front end >= 2.068, so ndslice can be used with LDC 0.17.0-alpha. This is important for benchmarks.
3. mir is going to be a testing package for the future std.la (a generic BLAS implementation).

-- Ilya
January 04, 2016
On Sunday, 3 January 2016 at 18:56:07 UTC, Jack Stouffer wrote:
> I still have to disagree with you that the example I submitted was fair. Accessing global memory in D is going to be much slower than accessing stack memory, […]

What leads you to this belief? (Beyond cache locality considerations, which are not so important if the data is large.)

> and since most std.ndslice calculations are going to be on the stack, I believe my benchmark is indicative of normal use.

Your iota example does not read the data from memory at all (neither stack nor heap), instead computing it on the fly.
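As a minimal sketch of this distinction (in Python for brevity; the numbers are illustrative):

```python
# A lazy range produces each value on demand, so summing it never
# reads a backing buffer; summing a materialized list must load
# every element from memory.
lazy_sum = sum(range(100_000))   # values computed on the fly
data = list(range(100_000))      # allocated in memory
eager_sum = sum(data)            # values read from memory

assert lazy_sum == eager_sum     # same result, different memory traffic
print(lazy_sum)                  # 4999950000
```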

 — David
January 03, 2016
On 1/3/16 6:52 PM, Ilya wrote:
> 1. mir.ndslice is a developer version of std.experimental.ndslice
> 2. mir can be used with DMD front end >= 2.068, so ndslice can be used
> with LDC 0.17.0-alpha. It is important for benchmarks.
> 3. mir is going to be a testing package for the future std.la (generic
> BLAS implementation)

The care you show for your users is impressive. We should take a page from your book, hopefully with your own help. Thanks! -- Andrei

January 04, 2016
On Monday, 4 January 2016 at 00:24:51 UTC, David Nadlinger wrote:
> On Sunday, 3 January 2016 at 18:56:07 UTC, Jack Stouffer wrote:
>> I still have to disagree with you that the example I submitted was fair. Accessing global memory in D is going to be much slower than accessing stack memory, […]
>
> What leads you to this belief? (Beyond cache locality considerations, which are not so important if the data is large.)

The example in the article and the example I submitted to DlangScience/examples have very different speeds: https://github.com/DlangScience/examples/blob/master/mean_of_columns.d

Article example:       5 µs
DlangScience example: 145 µs

Both compiled with LDC.


January 04, 2016
On Monday, 4 January 2016 at 01:03:49 UTC, Jack Stouffer wrote:
> On Monday, 4 January 2016 at 00:24:51 UTC, David Nadlinger wrote:
>> On Sunday, 3 January 2016 at 18:56:07 UTC, Jack Stouffer wrote:
>>> I still have to disagree with you that the example I submitted was fair. Accessing global memory in D is going to be much slower than accessing stack memory, […]
>>
>> What leads you to this belief? (Beyond cache locality considerations, which are not so important if the data is large.)
>
> The example in the article and the example I submitted to DlangScience/examples have very different speeds: https://github.com/DlangScience/examples/blob/master/mean_of_columns.d
>
> Article example:       5 µs
> DlangScience example: 145 µs
>
> Both compiled with LDC.

To be clear: there is NO data in the article example; only CPU registers are used. It is not a fair comparison. -- Ilya
January 04, 2016
On Monday, 4 January 2016 at 01:09:30 UTC, Ilya wrote:
> To be clear: there is NO data in the article example; only CPU registers are used. It is not a fair comparison. -- Ilya

OK, I see where I made the mistake, and I apologize. I believed that since I was only timing the np.mean line of code, the lazy generation of the data in the D version would have no effect on the comparison. But I overlooked the fact that lazy generation means no memory needs to be accessed in the D code, so it is not an apples-to-apples comparison with the NumPy code.
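An apples-to-apples version forces allocation up front and times only the mean computation, e.g. (a rough Python sketch; the timings will vary by machine):

```python
import timeit

import numpy as np

# Allocate the data up front, as the NumPy benchmark does, so the
# timed region must actually traverse memory.
data = np.arange(100_000, dtype=np.float64).reshape(100, 1000)

# Time only the column-mean computation itself.
t = timeit.timeit(lambda: data.mean(axis=0), number=1_000)
print(f"np.mean over allocated data: {t / 1_000 * 1e6:.1f} µs/call")
```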

Thank you for clarifying this, I will update the article.