Truly lazy ranges, transient .front, and std.range.Generator
August 15, 2015
Hello all,

One of the design considerations I've been mulling over recently, in terms of random number generation functionality, is that while we talk of ranges being lazily evaluated, in fact this isn't strictly true.

Most ranges are in fact a mix of lazy and eager: lazy in their use of popFront, but eager in that the current value of .front is always pre-calculated, usually starting with the first value being calculated in the constructor.  This is fine for many ranges, whose outcome is in any case deterministic, but it's not appropriate for many other cases, where the appropriate moment of evaluation of 'front' is arguably _the moment where front is called_.

This kind of requirement clearly holds anywhere where the values of the range depend on some kind of external input, and that arguably should include things like sources of randomness (even if they are, under the hood, only pseudo-random).

One possible support for this is given by std.range.Generator, which implements a very simple wrapper of a callable entity via the following range primitives:

    enum empty = false;

    auto ref front() @property
    {
        return fun();
    }

    void popFront() { }
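
(For illustration, this is the range that the std.range.generate helper constructs; a minimal usage sketch, with the callable supplied as a lambda:)

```d
import std.array : array;
import std.random : uniform;
import std.range : generate, take;

void main()
{
    // front simply forwards to the lambda, so each value is
    // produced only at the moment front is queried
    auto firstFive = generate!(() => uniform(0, 100)).take(5).array;
    assert(firstFive.length == 5);
}
```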

Now, first, I should say that I think this is a good example of how the classification and concept of ranges is incomplete -- there is a case for an even simpler range than InputRange, one which _just_ has .empty and .front defined, and which we might call a TransientRange -- but second, it's problematic, inasmuch as it's an incomplete solution to the problem of truly lazy ranges.

In some cases we're going to want true laziness of evaluation, but _not_ transience of .front.  In these cases, the _first_ time .front is called, its value should be freshly evaluated; but thereafter, the value is cached, until .popFront() is called, at which point .front will be re-evaluated lazily the next time it's called.  Something like std.algorithm.cache is inappropriate here, precisely because it's eager in its calculation of the cached values.

As far as I can see, we don't have a valid solution for this case.  We have some functionality in std.random -- e.g. randomCover and randomSample -- which implement rather shaky workarounds for that lack.

I can't imagine this isn't a known case/problem, but as far as I can see any discussion of it has been more on GitHub than here.  So before rushing off with Yet Another Custom Solution, I thought I'd raise the issue here, to ask what the current state of thought is on these kinds of challenge, and what may be upcoming in people's roadmaps.

Thanks & best wishes,

    -- Joe
August 16, 2015
On Saturday, 15 August 2015 at 10:06:13 UTC, Joseph Rushton Wakeling wrote:
> ...

I had this issue recently when reading from a command-line-style TCP connection; I needed to read the line up to the \n separator, but consuming the separator meant waiting for the next byte that would never arrive unless a new command was sent.

So I made a wrapper range that evaluates the wrapped range's popFront only when front/empty is first called ("just in time"). Source code here: https://gist.github.com/ColonelThirtyTwo/0dfe76520efcda02d848

You can throw it in a UFCS chain anywhere except (for some reason) after something that takes a delegate template parameter like map. For example:

    auto reader = SocketReader(socket).joiner.jitRange.map!(byt => cast(char) byt);
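
(For reference, a minimal sketch of what such a just-in-time wrapper might look like -- the real implementation is in the gist above; this version is only illustrative:)

```d
import std.range.primitives : isInputRange, empty, front, popFront;

/// Defers the wrapped range's popFront until front or empty
/// is next queried ("just in time").
struct JitRange(R) if (isInputRange!R)
{
    private R source;
    private bool pendingPop = false;

    // Apply any deferred popFront before inspecting the source
    private void settle()
    {
        if (pendingPop)
        {
            source.popFront();
            pendingPop = false;
        }
    }

    @property bool empty() { settle(); return source.empty; }
    @property auto front() { settle(); return source.front; }

    // Merely record that a pop is due; settle() performs it later.
    // settle() first, so successive popFront calls are handled too.
    void popFront() { settle(); pendingPop = true; }
}

auto jitRange(R)(R r) { return JitRange!R(r); }

unittest
{
    import std.algorithm.comparison : equal;
    assert([1, 2, 3].jitRange.equal([1, 2, 3]));
}
```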

August 18, 2015
On Saturday, 15 August 2015 at 10:06:13 UTC, Joseph Rushton Wakeling wrote:
> ...
>
> In some cases we're going to want true laziness of evaluation, but _not_ transience of .front.  In these cases, the _first_ time .front is called, its value should be freshly evaluated; but thereafter, the value is cached, until .popFront() is called, at which point .front will be re-evaluated lazily the next time it's called.  Something like std.algorithm.cache is inappropriate here, precisely because it's eager in its calculation of the cached values.
>
> ...
>
>     -- Joe

Do you mean something like this:

```d
import std.traits : ReturnType;

struct Range(alias fun)  // fun: the wrapped callable
{
public:

    enum empty = false;

    auto ref front() @property
    {
        if(mustBeEvaluated)
        {
            cache = fun();
            mustBeEvaluated = false;
        }

        return cache;
    }

    void popFront()
    {
        mustBeEvaluated = true;
    }

private:
    ReturnType!fun cache;
    bool mustBeEvaluated = true;
}
```
?
August 18, 2015
On 17/08/15 00:33, Alex Parrill via Digitalmars-d wrote:
> On Saturday, 15 August 2015 at 10:06:13 UTC, Joseph Rushton Wakeling wrote:
>> ...
>
> I had this issue recently when reading from a command-line-style TCP connection;
> I needed to read the line up to the \n separator, but consuming the separator
> meant waiting for the next byte that would never arrive unless a new command was
> sent.
>
> So I made a wrapper range that evaluates the wrapped range's popFront only when
> front/empty is first called ("just in time"). Source code here:
> https://gist.github.com/ColonelThirtyTwo/0dfe76520efcda02d848
>
> You can throw it in a UFCS chain anywhere except (for some reason) after
> something that takes a delegate template parameter like map. For example:
>
>      auto reader = SocketReader(socket).joiner.jitRange.map!(byt => cast(char)
> byt);

Interesting, thanks for sharing that.  I'm not at all surprised that an IO-based range should have similar issues to the one I describe; a comparable use-case I was thinking of was reading bytes from /dev/urandom.

My own concern was whether there was any sort of generic functionality for constructing lazy-front ranges like this; I'll share more on my own approach in my reply to HaraldZealot below.
August 18, 2015
On 18/08/15 10:59, HaraldZealot via Digitalmars-d wrote:
> On Saturday, 15 August 2015 at 10:06:13 UTC, Joseph Rushton Wakeling wrote:
>> ...
>>
>> In some cases we're going to want true laziness of evaluation, but _not_
>> transience of .front.  In these cases, the _first_ time .front is called, its
>> value should be freshly evaluated; but thereafter, the value is cached, until
>> .popFront() is called, at which point .front will be re-evaluated lazily the
>> next time it's called.  Something like std.algorithm.cache is inappropriate
>> here, precisely because it's eager in its calculation of the cached values.
>>
>> ...
>>
>>     -- Joe
>
> Do you mean something like that:

Yes, broadly like your example, although note that your version won't handle multiple popFront() calls in succession without any .front call in-between.

My own approach, which I knocked up earlier this month, was to create a generic mixin template that could be used to construct a lazy input range:

// ---------------------------------------------------------
mixin template LazyGen(T)
{
  private:
    T front__;
    bool evaluated__ = false;

  public:
    enum bool empty = false;

    T front() @property
    {
        if (!this.evaluated__)
        {
            this.front__ = this.evaluateFront__();
            this.evaluated__ = true;
        }

        return this.front__;
    }

    void popFront()
    {
        if (this.evaluated__)
        {
            this.evaluated__ = false;
        }
        else
        {
            this.front__ = this.evaluateFront__();
        }

        assert(!this.evaluated__);
    }
}
// ---------------------------------------------------------

... which could be mixin'd to any struct or class defining an evaluateFront__ method; although I think in retrospect I might rewrite that a bit to be more generic, not making assumptions about persistent values or the .empty condition:

// ---------------------------------------------------------
mixin template LazyPopFront()
{
    private bool evaluated__ = false;

    public auto front() @property
    {
        if (!this.evaluated__)
        {
            this.popFront__();  // the 'true', private popFront
            this.evaluated__ = true;
        }

        return this.front__();  // private @property method
    }

    public void popFront()
    {
        if (this.evaluated__)
        {
            this.evaluated__ = false;
        }
        else
        {
            this.popFront__();
        }

        assert(!this.evaluated__);
    }
}
// ---------------------------------------------------------

There are probably some other subtleties that need to be taken into account here, but broadly I think the above covers the fundamental requirements for a range whose front is truly lazily evaluated.

Anyway, I'll try and follow up in the not-too-distant future with some more concrete examples of how this can be used for some interesting stuff with RNGs. It would be interesting to know if the above solves any use-cases for anyone else, too.
August 18, 2015
On Tuesday, 18 August 2015 at 19:37:47 UTC, Joseph Rushton Wakeling wrote:
>
> Yes, broadly like your example, although note that your version won't handle multiple popFront() calls in succession without any .front call in-between.

Good point! And you show an acceptable solution.

I would prefer to parametrize your last example with aliases to functions:

```d
mixin template LazyPopFront(alias frontImplementation, alias popFrontImplementation)
...
```
This gives more flexibility.
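
Concretely, the parametrized version might look something like this (a sketch combining the suggestion with your mixin above; frontImplementation and popFrontImplementation are the aliased private implementations):

```d
mixin template LazyPopFront(alias frontImplementation, alias popFrontImplementation)
{
    private bool evaluated__ = false;

    public auto front() @property
    {
        if (!this.evaluated__)
        {
            popFrontImplementation();  // the 'true' advance
            this.evaluated__ = true;
        }

        return frontImplementation();
    }

    public void popFront()
    {
        if (this.evaluated__)
        {
            this.evaluated__ = false;
        }
        else
        {
            popFrontImplementation();
        }

        assert(!this.evaluated__);
    }
}
```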


One thing that worries me: what if someone starts to do odd things, like creating a shared instance of this range?


August 18, 2015
On 18/08/15 22:47, HaraldZealot via Digitalmars-d wrote:
> I would prefer to parametrize your last example with aliases to functions:
>
> ```d
> mixin template LazyPopFront(alias frontImplementation, alias
> popFrontImplementation)
> ...
> ```
> This gives more flexibility.

Yea, good call.


> One thing that worries me: what if someone starts to do odd things, like
> creating a shared instance of this range?

What kind of issues did you have in mind?