Thread overview
Lazy caching map()?
Mar 09, 2018
H. S. Teoh
Mar 10, 2018
aliak
Re: Lazy caching map | Mir version
Mar 19, 2018
9il
Mar 19, 2018
Graham Fawcett
Mar 19, 2018
Graham Fawcett
March 09, 2018
Today I found myself needing a lazy, caching version of map() on an array.  More precisely, given an array `T[] src` of source data and a function func(T) that's pretty expensive to compute, return an object `result` such that:

- result[i] == func(src[i]), for 0 ≤ i < src.length.

- If result[j] is never actually used, func(src[j]) is never invoked
  (lazy).

- If result[j] is referenced multiple times, a cached value is returned
  after the first time, i.e., the result of func(src[j]) is cached, and
  func(src[j]) is never invoked more than once.

I couldn't figure out how to build this using Phobos primitives, so I wrote my own implementation of it.  Pretty simple, really, it's just a wrapper struct that lazily initializes an array of Nullable!T and lazily populates it with func(j) when opIndex(j) is invoked, and just returns the cached value if opIndex(j) has been invoked before.

Can this be done using current Phobos primitives?


T

-- 
Don't throw out the baby with the bathwater. Use your hands...
March 10, 2018
On Friday, 9 March 2018 at 19:41:46 UTC, H. S. Teoh wrote:
> Today I found myself needing a lazy, caching version of map() on an array.  More precisely, given an array `T[] src` of source data and a function func(T) that's pretty expensive to compute, return an object `result` such that:
>
> - result[i] == func(src[i]), for 0 ≤ i < src.length.
>
> - If result[j] is never actually used, func(src[j]) is never invoked
>   (lazy).
>
> - If result[j] is referenced multiple times, a cached value is returned
>   after the first time, i.e., the result of func(src[j]) is cached, and
>   func(src[j]) is never invoked more than once.
>
> I couldn't figure out how to build this using Phobos primitives, so I wrote my own implementation of it.  Pretty simple, really, it's just a wrapper struct that lazily initializes an array of Nullable!T and lazily populates it with func(j) when opIndex(j) is invoked, and just returns the cached value if opIndex(j) has been invoked before.
>
> Can this be done using current Phobos primitives?
>
>
> T

Unless I'm misunderstanding your requirements, are you looking for std.functionl.memoize?

auto fun(int a) {
    writeln("here");
    return a + 1;
}

void main() {
    auto a = [1, 2, 3];
    auto b = a.map!(memoize!fun);

    writeln(b[1]);
    writeln(b[1]);
    writeln(b[1]);
}

will print "here" just once.
March 19, 2018
On Friday, 9 March 2018 at 19:41:46 UTC, H. S. Teoh wrote:
> Today I found myself needing a lazy, caching version of map() on an array.  More precisely, given an array `T[] src` of source data and a function func(T) that's pretty expensive to compute, return an object `result` such that:
>
> - result[i] == func(src[i]), for 0 ≤ i < src.length.
>
> - If result[j] is never actually used, func(src[j]) is never invoked
>   (lazy).
>
> - If result[j] is referenced multiple times, a cached value is returned
>   after the first time, i.e., the result of func(src[j]) is cached, and
>   func(src[j]) is never invoked more than once.
>
> Can this be done using current Phobos primitives?
>
>
> T

Phobos implementation based on memoize would use hash map internally, which is slow for this use case.

mir-algorithm v0.9.5 (DUB would be updated soon) got `cached` and `cachedGC` topologies.

Example:

// Mir's map is required
import mir.ndslice: cachedGC, map, sliced;

auto ar = new int[5];
// init ar ...
auto cachedLazyMap = ar.map!fun.cachedGC;

`cachedGC` [1] internally allocates a vector of caches and a packed bitarray.

`cached` [2] allows to reuse already allocated buffers.

The result support all random access range primitive, slicing and etc.

ndslice architecture allows to add new random access topologies very easily.

[1] http://docs.algorithm.dlang.io/latest/mir_ndslice_topology.html#.cachedGC
[2] http://docs.algorithm.dlang.io/latest/mir_ndslice_topology.html#.cached

Best regards,
Ilya Yaroshenko

March 19, 2018
On Friday, 9 March 2018 at 19:41:46 UTC, H. S. Teoh wrote:
> Today I found myself needing a lazy, caching version of map() on an array.  More precisely, given an array `T[] src` of source data and a function func(T) that's pretty expensive to compute, return an object `result` such that:
>
> [...]

Map the sources to std.parallelism.Task instances, and yieldForce() the instances when you need the results?

Graham
March 19, 2018
On Monday, 19 March 2018 at 16:47:16 UTC, Graham Fawcett wrote:
> On Friday, 9 March 2018 at 19:41:46 UTC, H. S. Teoh wrote:
>> Today I found myself needing a lazy, caching version of map() on an array.  More precisely, given an array `T[] src` of source data and a function func(T) that's pretty expensive to compute, return an object `result` such that:
>>
>> [...]
>
> Map the sources to std.parallelism.Task instances, and yieldForce() the instances when you need the results?

On second thought, that won't work. AFIAK there's no way to take an unscheduled task and force it to execute on the current thread immediately and yield the result. But with such an operation, you could have a thread-local future/promise type that would meet your needs.

Graham