Lazy caching map()?

Re: Lazy caching map | Mir version

Mar 19, 2018

Today I found myself needing a lazy, caching version of map() on an array. More precisely, given an array `T[] src` of source data and a function func(T) that's pretty expensive to compute, return an object `result` such that: - result[i] == func(src[i]), for 0 ≤ i < src.length. - If result[j] is never actually used, func(src[j]) is never invoked (lazy). - If result[j] is referenced multiple times, a cached value is returned after the first time, i.e., the result of func(src[j]) is cached, and func(src[j]) is never invoked more than once. I couldn't figure out how to build this using Phobos primitives, so I wrote my own implementation of it. Pretty simple, really, it's just a wrapper struct that lazily initializes an array of Nullable!T and lazily populates it with func(j) when opIndex(j) is invoked, and just returns the cached value if opIndex(j) has been invoked before. Can this be done using current Phobos primitives? T -- Don't throw out the baby with the bathwater. Use your hands...

On Friday, 9 March 2018 at 19:41:46 UTC, H. S. Teoh wrote: > Today I found myself needing a lazy, caching version of map() on an array. More precisely, given an array `T[] src` of source data and a function func(T) that's pretty expensive to compute, return an object `result` such that: > > - result[i] == func(src[i]), for 0 ≤ i < src.length. > > - If result[j] is never actually used, func(src[j]) is never invoked > (lazy). > > - If result[j] is referenced multiple times, a cached value is returned > after the first time, i.e., the result of func(src[j]) is cached, and > func(src[j]) is never invoked more than once. > > I couldn't figure out how to build this using Phobos primitives, so I wrote my own implementation of it. Pretty simple, really, it's just a wrapper struct that lazily initializes an array of Nullable!T and lazily populates it with func(j) when opIndex(j) is invoked, and just returns the cached value if opIndex(j) has been invoked before. > > Can this be done using current Phobos primitives? > > > T Unless I'm misunderstanding your requirements, are you looking for std.functionl.memoize? auto fun(int a) { writeln("here"); return a + 1; } void main() { auto a = [1, 2, 3]; auto b = a.map!(memoize!fun); writeln(b[1]); writeln(b[1]); writeln(b[1]); } will print "here" just once.

March 19, 2018

Re: Lazy caching map | Mir version

Posted by 9il
in reply to H. S. Teoh

Permalink

9il

Posted in reply to H. S. Teoh

Permalink

On Friday, 9 March 2018 at 19:41:46 UTC, H. S. Teoh wrote:
> Today I found myself needing a lazy, caching version of map() on an array.  More precisely, given an array `T[] src` of source data and a function func(T) that's pretty expensive to compute, return an object `result` such that:
>
> - result[i] == func(src[i]), for 0 ≤ i < src.length.
>
> - If result[j] is never actually used, func(src[j]) is never invoked
>   (lazy).
>
> - If result[j] is referenced multiple times, a cached value is returned
>   after the first time, i.e., the result of func(src[j]) is cached, and
>   func(src[j]) is never invoked more than once.
>
> Can this be done using current Phobos primitives?
>
>
> T

Phobos implementation based on memoize would use hash map internally, which is slow for this use case.

mir-algorithm v0.9.5 (DUB would be updated soon) got `cached` and `cachedGC` topologies.

Example:

// Mir's map is required
import mir.ndslice: cachedGC, map, sliced;

auto ar = new int[5];
// init ar ...
auto cachedLazyMap = ar.map!fun.cachedGC;

`cachedGC` [1] internally allocates a vector of caches and a packed bitarray.

`cached` [2] allows to reuse already allocated buffers.

The result support all random access range primitive, slicing and etc.

ndslice architecture allows to add new random access topologies very easily.

[1] http://docs.algorithm.dlang.io/latest/mir_ndslice_topology.html#.cachedGC
[2] http://docs.algorithm.dlang.io/latest/mir_ndslice_topology.html#.cached

Best regards,
Ilya Yaroshenko

On Monday, 19 March 2018 at 16:47:16 UTC, Graham Fawcett wrote: > On Friday, 9 March 2018 at 19:41:46 UTC, H. S. Teoh wrote: >> Today I found myself needing a lazy, caching version of map() on an array. More precisely, given an array `T[] src` of source data and a function func(T) that's pretty expensive to compute, return an object `result` such that: >> >> [...] > > Map the sources to std.parallelism.Task instances, and yieldForce() the instances when you need the results? On second thought, that won't work. AFIAK there's no way to take an unscheduled task and force it to execute on the current thread immediately and yield the result. But with such an operation, you could have a thread-local future/promise type that would meet your needs. Graham

Forums