Thread overview: Re: Lazy caching map()?
- Jonathan M Davis (Mar 09, 2018)
- H. S. Teoh (Mar 09, 2018)
- H. S. Teoh (Mar 13, 2018)
- aliak (Mar 14, 2018)
- H. S. Teoh (Mar 14, 2018)
March 09, 2018
On Friday, March 09, 2018 11:41:46 H. S. Teoh via Digitalmars-d wrote:
> Today I found myself needing a lazy, caching version of map() on an array.  More precisely, given an array `T[] src` of source data and a function func(T) that's pretty expensive to compute, return an object `result` such that:
>
> - result[i] == func(src[i]), for 0 ≤ i < src.length.
>
> - If result[j] is never actually used, func(src[j]) is never invoked
>   (lazy).
>
> - If result[j] is referenced multiple times, a cached value is returned
>   after the first time, i.e., the result of func(src[j]) is cached, and
>   func(src[j]) is never invoked more than once.
>
> I couldn't figure out how to build this using Phobos primitives, so I wrote my own implementation of it.  Pretty simple, really, it's just a wrapper struct that lazily initializes an array of Nullable!T and lazily populates it with func(j) when opIndex(j) is invoked, and just returns the cached value if opIndex(j) has been invoked before.
>
> Can this be done using current Phobos primitives?
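
For reference, a minimal sketch of the kind of wrapper struct described in the quote above could look like the following. It is only an illustration of the stated requirements; the names (LazyCachedMap, lazyCachedMap) are made up and this is not the poster's actual code.

    import std.typecons : Nullable;

    // Caches func(src[i]) in a lazily allocated Nullable array, so func runs
    // at most once per index and only for indices that are actually accessed.
    struct LazyCachedMap(alias func, T)
    {
        private T[] src;
        private Nullable!(typeof(func(T.init)))[] cache;

        auto opIndex(size_t i)
        {
            if (cache.length == 0)
                cache = new Nullable!(typeof(func(T.init)))[src.length];
            if (cache[i].isNull)
                cache[i] = func(src[i]);   // computed on first access only
            return cache[i].get;
        }

        @property size_t length() const { return src.length; }
    }

    auto lazyCachedMap(alias func, T)(T[] src)
    {
        return LazyCachedMap!(func, T)(src);
    }

    unittest
    {
        auto r = lazyCachedMap!(x => x * 2)([1, 2, 3]);
        assert(r[1] == 4 && r[1] == 4);   // second access hits the cache
    }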

Wasn't that what std.algorithm.iteration.cache was supposed to solve? I've never used it, so I don't know how well it fits what you need, but IIRC, using it with map was basically the use that it was invented for.
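
For readers who haven't used it, the intended pattern is roughly the following (a minimal illustration, not code from the thread):

    import std.algorithm.iteration : cache, map;

    void main()
    {
        // cache evaluates the mapped function once per element as the range
        // is traversed; repeated reads of .front return the stored value
        // instead of re-running the (expensive) function.
        auto r = [1, 2, 3].map!(x => x * 10).cache;
        foreach (x; r) { /* each element is computed exactly once */ }
    }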

- Jonathan M Davis


March 09, 2018
On Fri, Mar 09, 2018 at 12:47:14PM -0700, Jonathan M Davis via Digitalmars-d wrote:
> On Friday, March 09, 2018 11:41:46 H. S. Teoh via Digitalmars-d wrote:
> > [...]  More precisely, given an array `T[] src` of source data and a function func(T) that's pretty expensive to compute, return an object `result` such that:
> >
> > - result[i] == func(src[i]), for 0 ≤ i < src.length.
> >
> > - If result[j] is never actually used, func(src[j]) is never invoked
> >   (lazy).
> >
> > - If result[j] is referenced multiple times, a cached value is returned
> >   after the first time, i.e., the result of func(src[j]) is cached, and
> >   func(src[j]) is never invoked more than once.
[...]
> > Can this be done using current Phobos primitives?
> 
> Wasn't that what std.algorithm.iteration.cache was supposed to solve? I've never used it, so I don't know how well it fits what you need, but IIRC, using it with map was basically the use that it was invented for.
[...]

The problem with cache() is that it does not return a random-access range, and for my use case random access is mandatory, because I cannot predict ahead of time which indices will be used.  Worse yet, cache() eagerly evaluates .front and .back, which is a no-no in my case (it should not compute elements that are not asked for).

These two limitations defeat the entire purpose: no random access means I have to iterate to find the elements I want, and eager evaluation means func() will be computed as I iterate, even if I never call .front. So basically the entire result array will be computed.

It's also not trivial to extend cache() to handle random access, because that means it has to allocate in order to store cached elements, whereas the current implementation statically allocates space in the returned range to store cached .front and .back elements.
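
To make the eager-evaluation point concrete, here is a small standalone demo (not code from the thread):

    import std.algorithm.iteration : cache, map;
    import std.stdio : writeln;

    void main()
    {
        auto r = [1, 2, 3]
            .map!((x) { writeln("computing ", x); return x * 10; })
            .cache;
        // "computing 1" has already been printed at this point: cache
        // evaluates .front on construction, and each popFront evaluates the
        // next element, whether or not it is ever read.  And, as noted above,
        // the result is not a random-access range, so indexing like r[0] is
        // not available.
    }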


T

-- 
An imaginary friend squared is a real enemy.
March 13, 2018
On Sat, Mar 10, 2018 at 08:54:16PM +0000, aliak via Digitalmars-d wrote:
> On Friday, 9 March 2018 at 19:41:46 UTC, H. S. Teoh wrote:
> > [...]  More precisely, given an array `T[] src` of source data and a function func(T) that's pretty expensive to compute, return an object `result` such that:
> > 
> > - result[i] == func(src[i]), for 0 ≤ i < src.length.
> > 
> > - If result[j] is never actually used, func(src[j]) is never invoked
> >   (lazy).
> > 
> > - If result[j] is referenced multiple times, a cached value is
> >   returned after the first time, i.e., the result of func(src[j]) is
> >   cached, and func(src[j]) is never invoked more than once.
[...]
> > Can this be done using current Phobos primitives?
[...]
> Unless I'm misunderstanding your requirements, are you looking for std.functional.memoize?
> 
> import std.algorithm : map;
> import std.functional : memoize;
> import std.stdio : writeln;
> 
> auto fun(int a) {
>     writeln("here");
>     return a + 1;
> }
> 
> void main() {
>     auto a = [1, 2, 3];
>     auto b = a.map!(memoize!fun);
> 
>     writeln(b[1]);
>     writeln(b[1]);
>     writeln(b[1]);
> }
> 
> will print "here" just once.

Very nice. Using memoize did occur to me, but I needed an array interface to it.  Didn't think of using memoize with map, for some reason.  Thanks for the idea!

However, looking at the implementation of memoize, it seems that it's caching the result globally, with very little control over the cache except its size. I wonder if there's a way to control it better, i.e., free the cache once there are no more references to it, and have the cache size depend on the data size.  Basically, in my use case, once I map an array it's not predictable in advance how many elements will be accessed (i.e., it's hard to decide on an optimal cache size for memoize), but this access will happen pretty soon afterwards, and then the array will be discarded (i.e., there's no point holding on to old entries in the cache).  I would prefer that the cache be cleared once the mapped array has been GC'd, but memoize() seems to hold on to the cached results indefinitely.
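
For what it's worth, one way to get a per-instance cache whose lifetime is tied to the range itself is to close over a locally allocated cache array instead of relying on memoize's global one. A rough sketch (cachedMap is a made-up name, not Phobos code):

    import std.algorithm.iteration : map;
    import std.range : iota;
    import std.typecons : Nullable;

    // The cache is owned by the returned range's closure, so it is sized to
    // the source array and becomes collectible together with the range.
    auto cachedMap(alias fun, T)(T[] src)
    {
        auto cache = new Nullable!(typeof(fun(T.init)))[src.length];
        return iota(src.length).map!((size_t i) {
            if (cache[i].isNull)
                cache[i] = fun(src[i]);   // at most one call per index
            return cache[i].get;
        });
    }

The result still supports random access (iota and map both forward opIndex), and once nothing references the returned range any more, the cache goes away with it.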


T

-- 
Almost all proofs have bugs, but almost all theorems are true. -- Paul Pedersen
March 14, 2018
On Tuesday, 13 March 2018 at 17:24:35 UTC, H. S. Teoh wrote:
> Very nice. Using memoize did occur to me, but I needed an array interface to it.  Didn't think of using memoize with map, for some reason.  Thanks for the idea!
>
> However, looking at the implementation of memoize, it seems that it's caching the result globally, with very little control over the cache except its size. I wonder if there's a way to control it better, i.e., free the cache once there are no more references to it, and have the cache size depend on the data size.  Basically, in my use case, once I map an array it's not predictable in advance how many elements will be accessed (i.e., it's hard to decide on an optimal cache size for memoize), but this access will happen pretty soon afterwards, and then the array will be discarded (i.e., there's no point holding on to old entries in the cache).  I would prefer that the cache be cleared once the mapped array has been GC'd, but memoize() seems to hold on to the cached results indefinitely.
>
>
> T

No worries! And ugh, can't think off the top of my head how to improve it other than to make it a type and give it scope so that it can die at some point. But that would make it uglier to use as well.

Btw, I just saw someone posted a link to an old forum post of yours: https://forum.dlang.org/post/mailman.2562.1403196857.2907.digitalmars-d@puremagic.com

Mind if I add that (or a version of it) to a library I'm writing? (it's an optional type on dub)

Cheers
- Ali
March 14, 2018
On Wed, Mar 14, 2018 at 12:12:44AM +0000, aliak via Digitalmars-d wrote: [...]
> Btw, I just saw someone posted a link to an old forum post of yours: https://forum.dlang.org/post/mailman.2562.1403196857.2907.digitalmars-d@puremagic.com
> 
> Mind if I add that (or a version of it) to a library I'm writing?
> (it's an optional type on dub)
[...]

Sure, go right ahead!  Glad you found it useful!


T

-- 
"The number you have dialed is imaginary. Please rotate your phone 90 degrees and try again."