protocol for using InputRanges (page 2) - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » General » protocol for using InputRanges (page 2)

March 23, 2014

Re: protocol for using InputRanges

Posted by Szymon Gatner
in reply to Szymon Gatner

Szymon Gatner

Posted in reply to Szymon Gatner

On Sunday, 23 March 2014 at 09:34:28 UTC, Szymon Gatner wrote:
> On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:
>>
>> 1. If you know the range is not empty, is it allowed to call r.front without calling r.empty first?
>
> IMO: yes. Logic of empty() sould be const and not have side effects.
>
>>
>> If this is true, extra logic will need to be added to r.front in many cases.
>>
>> 2. Can r.front be called n times in a row? I.e. is calling front() destructive?
>>
>> If true, this means that r.front will have to cache a copy in many cases.
>
> Yes, all true. Not sure if there is something like that in Phobos but if for example you had RandomValueRange, ever call to front() should return the same random number until popFront() is called at which point internal front variable is being recalculated and cached.
>
> Since only popFront() is non-const, it is there where all the magic/mutation should happen.
>
> In my code I also have CachingRange which calls front() on underlying range the first time and then stores result internally. I use it for ranges which generate front() on the fly and I knot it is expensive. I could cache that calculation directly in that underlying range but CachingRange is reusable.

I tend to think about pair of C++ iterators, which are generalization of pointers. So for C pointers 'first' and 'last':

*first -> front()
first != last -> empty()
++first -> popFont()

And I stick to semantics.

March 23, 2014

Re: protocol for using InputRanges

Posted by Joseph Rushton Wakeling

Joseph Rushton Wakeling

On 23/03/14 08:53, Jonathan M Davis wrote:
> But again, front and empty should normally function as if they were variables.
> They should be property functions and should be pure (or at least act like
> they're pure). I'm sure that a _lot_ of code will break if that isn't
> followed.

There are some not-very-nice exceptions to that in std.random, where in many of the range types you have stuff along the following lines:

    auto front() @property
    {
        if (notInitialized)
        {
            init();
        }
        return _value;
    }

see e.g. MersenneTwister, RandomSample, and others.  It's all a nasty consequence of that RNGs-as-value-types problem we've discussed previously.

However, the class-based std.random2 that is being discussed on the announce list fixes that:
http://forum.dlang.org/thread/cyytvhixkqlbwkmiugex@forum.dlang.org

> In general though, I think that most of us would agree that front and empty
> should be treated as properties - i.e. as if they were variables - and that
> they should have try to stick to those semantics as closely as possible.
> Ranges that stray from that seriously risk not working with a lot of range-
> based code.

Yes, absolutely.

March 23, 2014

Re: protocol for using InputRanges

Posted by Timon Gehr
in reply to Walter Bright

Timon Gehr

Posted in reply to Walter Bright

On 03/23/2014 01:50 AM, Walter Bright wrote:
>
> 2. Can r.front be called n times in a row? I.e. is calling front()
> destructive?
>
> If true, this means that r.front will have to cache a copy in many cases.

An analogous problem exists and is more severe for RandomAccessRanges.

import std.algorithm, std.range;
class Obj{ this(int){} }
void main(){
    auto x=[1,2,3].map!(a=>new Obj(a));
    auto a=x.front;
    auto b=x.front;
    auto c=x[0];
    auto d=x[0];
    assert(a is b); // fail
    assert(b is c); // fail
    assert(c is d); // fail
}

March 23, 2014

Re: protocol for using InputRanges

Posted by Tommi
in reply to Walter Bright

Tommi

Posted in reply to Walter Bright

On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:
> It's become clear to me that we've underspecified what an InputRange is. The normal way to use it is:
>
>     while (!r.empty) {
>         auto e = r.front;
>         ... do something with e ...
>         r.popFront();
>     }
>
> no argument there. But there are two issues:
>
> 1. If you know the range is not empty, is it allowed to call r.front without calling r.empty first?
>
> If this is true, extra logic will need to be added to r.front in many cases.
>
> 2. Can r.front be called n times in a row? I.e. is calling front() destructive?
>
> If true, this means that r.front will have to cache a copy in many cases.

Is InputRange supposed to be a one-pass range and am I supposed to able to use InputRange as a lightweight wrapper for a stream? If the answer is yes, then I think the fundamental issue is that the empty-front-popFront interface is not optimal for something like InputRange. But given that's the interface we have, I think that InputRange's front must be allowed to be destructive because the stream it could potentially be wrapping is destructive.

March 23, 2014

Re: protocol for using InputRanges

Posted by Tommi
in reply to Jonathan M Davis

Tommi

Posted in reply to Jonathan M Davis

On Sunday, 23 March 2014 at 07:54:12 UTC, Jonathan M Davis wrote:
>
> If calling front were destructive, that would break a lot of code.

I thought that "breaking existing code" meant either "causing existing code do something it wasn't supposed to do" or "causing existing code not compile", but you're using it in the meaning of "making existing code not conform to the language specification". Are you sure that's the correct use of the expression?

March 24, 2014

Re: protocol for using InputRanges

Posted by Steven Schveighoffer
in reply to Tommi

Steven Schveighoffer

Posted in reply to Tommi

On Sun, 23 Mar 2014 09:00:17 -0400, Tommi <tommitissari@hotmail.com> wrote:

> On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:
>> It's become clear to me that we've underspecified what an InputRange is. The normal way to use it is:
>>
>>     while (!r.empty) {
>>         auto e = r.front;
>>         ... do something with e ...
>>         r.popFront();
>>     }
>>
>> no argument there. But there are two issues:
>>
>> 1. If you know the range is not empty, is it allowed to call r.front without calling r.empty first?
>>
>> If this is true, extra logic will need to be added to r.front in many cases.
>>
>> 2. Can r.front be called n times in a row? I.e. is calling front() destructive?
>>
>> If true, this means that r.front will have to cache a copy in many cases.
>
> Is InputRange supposed to be a one-pass range and am I supposed to able to use InputRange as a lightweight wrapper for a stream? If the answer is yes, then I think the fundamental issue is that the empty-front-popFront interface is not optimal for something like InputRange. But given that's the interface we have, I think that InputRange's front must be allowed to be destructive because the stream it could potentially be wrapping is destructive.

A range interface only works for a buffered stream, which naturally allows caching.

-Steve

March 24, 2014

Re: protocol for using InputRanges

Posted by Steven Schveighoffer
in reply to Szymon Gatner

Steven Schveighoffer

Posted in reply to Szymon Gatner

On Sun, 23 Mar 2014 05:34:26 -0400, Szymon Gatner <noemail@gmail.com> wrote:

> On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:
>>
>> 1. If you know the range is not empty, is it allowed to call r.front without calling r.empty first?
>
> IMO: yes. Logic of empty() sould be const and not have side effects.

Here is the crux of the issue. I think aside from Walter's question, we have situations where it is expensive to construct each value of a range. For a range that is never used, that is an expense that you don't have to pay.

There was a post a little while ago about how File.byLine reads the first line on construction. Consider the case where stdin.byLine needs to be constructed, but not *iterated* before writing some information to stdout. This gives us a predicament that the first line has to be available before you can output any instructions. As we all know, stdin will block on first read, unless you piped in a file or something. This makes an interactive program simply incorrect.

From the library point of view, caching the first line on construction is the most sensible thing -- This allows empty, front, and popFront to all be consistent across the entire iteration. Making empty or front do something different on the first access requires awkward machinery (a boolean indicating it should do so, plus a check every call thereafter). But from the user's point of view, sometimes you want lazy access to the first element, for logical reasons.

I think in light of such possibilities, empty and front should not have to be const or pure. Logically, it can be viewed that way, but not technically.

-Steve

March 24, 2014

Re: protocol for using InputRanges

Posted by monarch_dodra
in reply to Timon Gehr

monarch_dodra

Posted in reply to Timon Gehr

On Sunday, 23 March 2014 at 11:52:53 UTC, Timon Gehr wrote:
> On 03/23/2014 01:50 AM, Walter Bright wrote:
>>
>> 2. Can r.front be called n times in a row? I.e. is calling front()
>> destructive?
>>
>> If true, this means that r.front will have to cache a copy in many cases.
>
> An analogous problem exists and is more severe for RandomAccessRanges.
>
> import std.algorithm, std.range;
> class Obj{ this(int){} }
> void main(){
>     auto x=[1,2,3].map!(a=>new Obj(a));
>     auto a=x.front;
>     auto b=x.front;
>     auto c=x[0];
>     auto d=x[0];
>     assert(a is b); // fail
>     assert(b is c); // fail
>     assert(c is d); // fail
> }

I have an open proposal for a "cache" range adaptor that somewhat eliviates the problem:
https://github.com/D-Programming-Language/phobos/pull/1364

But it restricts the range to bidirectional (for the reasons above).

Actually, it's mimited to Bidir, but if you *do* use it as bidirectional, then 1 element will be evaluated twice. So I'm wondering if it might not be better to simply restrict it to forward...

March 24, 2014

Re: protocol for using InputRanges

Posted by w0rp
in reply to Szymon Gatner

w0rp

Posted in reply to Szymon Gatner

On Sunday, 23 March 2014 at 09:38:43 UTC, Szymon Gatner wrote:
> On Sunday, 23 March 2014 at 09:26:53 UTC, w0rp wrote:
>> I understand it like this.
>>
>> * empty - Are there no more values?
>> * front - Get me the current value.
>> * popFront - Advance to the next value.
>
> That is correct.
>
>>
>> In terms of how I implement an InputRange in general, I typically end up with this.
>>
>> * empty - Advance and cache "current value," return true if we ran out of values.
>> * front - enforce(!empty), which in turn caches the current value, which we then return.
>> * popFront - enforce(!empty), clear the cached value so we can later advance.
>>
>> So .front gives you the same thing again and again until you call popFront, you could indeed call .front before .empty, but you may get an exception. This obviously isn't how I implement all InputRanges, as there are better ways to write other ranges, such as ranges which operate on sets of integers or wrap arrays. This is just my last resort general case InputRange implementation.
>>
>> .front assignment obviously replaces the cached value.
>
> That is not consistent with the first part of your reply and is incorrect imho. Calling empty() twice should not destroy anything, even according to your understanding. Neither should front(). User should be able to call them as many times as he wishes getting same value every time. Only popFront() mutate the range.

You read wrong. Say you have this sequence of three numbers and 'x' represents the cached value being empty, for a range 'r.'

---
1 2 3
x

r.empty

1 2 3
^

r.empty

1 2 3
^

r.front == 1

r.popFront()

1 2 3
  x

r.empty

1 2 3
  ^
---

front and popFront each enforce(!empty) first, so this works, etc.

---
1 2 3
x

r.popFront()

# enforce(!empty);

1 2 3
^

# Clear the cached value.

1 2 3
  x

---

So it works for any sequence of values, and you can call .empty and .front as much as you want without changing the result. It just so happens that it's actually .empty which does the work of actually fetching the next value. There are better ways to implement this for certain ranges, but this works for anything.

March 24, 2014

Re: protocol for using InputRanges

Posted by Dicebot
in reply to Walter Bright

Dicebot

Posted in reply to Walter Bright

On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:
> It's become clear to me that we've underspecified what an InputRange is. The normal way to use it is:
>
>     while (!r.empty) {
>         auto e = r.front;
>         ... do something with e ...
>         r.popFront();
>     }
>
> no argument there. But there are two issues:
>
> 1. If you know the range is not empty, is it allowed to call r.front without calling r.empty first?
>
> If this is true, extra logic will need to be added to r.front in many cases.
>
> 2. Can r.front be called n times in a row? I.e. is calling front() destructive?
>
> If true, this means that r.front will have to cache a copy in many cases.

I think there is one design mistake with current InputRange rules that makes usage so inconsistent. We have `empty` but don't have any distinct `not yet started` state. If calling `popFront` at least once was required before accessing `front`, I can't imagine the case where having any non-trivial code in `front` would have been necessary.

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation