protocol for using InputRanges (page 15) - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » General » protocol for using InputRanges (page 15)

March 28, 2014

Re: protocol for using InputRanges

Posted by Szymon Gatner
in reply to Tobias Müller

Szymon Gatner

Posted in reply to Tobias Müller

On Friday, 28 March 2014 at 23:14:56 UTC, Tobias Müller wrote:
> On Thursday, 27 March 2014 at 20:49:16 UTC, Walter Bright wrote:
>> On 3/27/2014 12:21 PM, Rainer Schuetze wrote:
>>> This loop is intuitive. Not being allowed to call empty or front multiple times
>>> or not at all is unintuitive. They should not be named as if they are properties
>>> then.
>>
>> I can concede that. But I can't concede being able to call front without first calling empty, or calling popFront without calling empty and front, or requiring 'pump priming' in the constructor.
>
> Disclaimer: I'm a C++ programmer just lurking here, I've never
> actually used D.
>
> I find it very counter-intuitive that 'empty' is required before
> front or popFront.
> Since 'pump priming' in the constructor isn't wanted either, i'd
> suggest the following protocol:
>
> while (popFront())
> {
>     front;
> }
>

In c++ if you had a list or deque you would obviously did if(empty()) before calling front() too, and the before a call to pop_front(). Container is kind of range and in c++ it looks exactly the same:

while (!cont.empty())
{
  auto& var = cont.front();
  cont.pop_front(); // consume
}

3 operations are needed.

March 29, 2014

Re: protocol for using InputRanges

Posted by Andrei Alexandrescu
in reply to Steven Schveighoffer

Andrei Alexandrescu

Posted in reply to Steven Schveighoffer

On 3/28/14, 3:42 AM, Steven Schveighoffer wrote:
> On Thu, 27 Mar 2014 20:25:56 -0400, Walter Bright
> <newshound2@digitalmars.com> wrote:
>
>> On 3/27/2014 2:31 PM, Steven Schveighoffer wrote:
>>> Adding range primitives on top of a stream does not make sense.
>>
>> Are ready to implement a parallel universe of stream based algorithms
>> to go alongside all the range based ones and be ready to constantly
>> justify that to everyone?
>
> Not necessary. You just need to implement one range on top of a buffered
> stream, and then it works with all other algorithms that accept input
> ranges.
>
>> I'm also curious what a generic read() should do when the stream is
>> empty.
>
> Returns 0 elements read.

It increasingly seems to me we need to do this. -- Andrei

March 29, 2014

Re: protocol for using InputRanges

Posted by w0rp
in reply to Walter Bright

w0rp

Posted in reply to Walter Bright

On Friday, 28 March 2014 at 22:05:33 UTC, Walter Bright wrote:
> On 3/28/2014 11:40 AM, Dmitry Olshansky wrote:
>> Which uses C's BUFFERED I/O and it reads from it byte by byte via getc. Even
>> though sys calls are amortized by C runtime, we have a function call per byte.
>> No wonder it's SLOW.
>
> How about a PR to fix it?

Yes. I hold the opinion that there is not a whole lot of reason why something like byChunk can't be fast like we might desire. If it's down to getting chunks of data the right way, whatever that is, and the problem is an IO bound problem, then I don't see why we can't submit pull requests to offer optimisations on it. I don't think we need to go down this "the way it does IO isn't fast enough so we need to replace it" route. Just optimise the existing thing more.

March 29, 2014

Re: protocol for using InputRanges

Posted by w0rp
in reply to Tobias Müller

w0rp

Posted in reply to Tobias Müller

On Friday, 28 March 2014 at 23:14:56 UTC, Tobias Müller wrote:
> On Thursday, 27 March 2014 at 20:49:16 UTC, Walter Bright wrote:
>> On 3/27/2014 12:21 PM, Rainer Schuetze wrote:
>>> This loop is intuitive. Not being allowed to call empty or front multiple times
>>> or not at all is unintuitive. They should not be named as if they are properties
>>> then.
>>
>> I can concede that. But I can't concede being able to call front without first calling empty, or calling popFront without calling empty and front, or requiring 'pump priming' in the constructor.
>
> Disclaimer: I'm a C++ programmer just lurking here, I've never
> actually used D.
>
> I find it very counter-intuitive that 'empty' is required before
> front or popFront.
> Since 'pump priming' in the constructor isn't wanted either, i'd
> suggest the following protocol:
>
> while (popFront())
> {
>     front;
> }
>
> popFront is then required to return !empty.
> 'empty' as a separate property getter can stay but is not
> required for the protocol.
>
> This way, it's clear that the work to fetch the next element is
> always done in popFront.
>
> Generally I find dependecies between functions problematic that
> require a specific call sequence. If they can be removed, they
> should.
>
> With my proposed solution, there's still one minor dependency,
> namely that front is not valid before the first popFront. This
> could be solved by again combining the two, as proposed by
> someone else in this thread.
>
> Tobi

Well, it might seem kind of weird at first that an InputRange has these concepts in three distinct methods, but they really are three separate operations. After I have spent enough time working with Java Iterators, I have found it much nicer to be able to get the current value or see if I'm at the end or not without having to worry about caching the current value myself. Also as I've said before it gives you some opportunity to make trying to fetch something out of an empty range less error prone, because you don't have to check for a null value, etc.

March 29, 2014

Re: protocol for using InputRanges

Posted by Steven Schveighoffer
in reply to Andrei Alexandrescu

Steven Schveighoffer

Posted in reply to Andrei Alexandrescu

On Fri, 28 Mar 2014 20:14:39 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> On 3/28/14, 3:42 AM, Steven Schveighoffer wrote:
>> On Thu, 27 Mar 2014 20:25:56 -0400, Walter Bright
>> <newshound2@digitalmars.com> wrote:
>>
>>> On 3/27/2014 2:31 PM, Steven Schveighoffer wrote:
>>>> Adding range primitives on top of a stream does not make sense.
>>>
>>> Are ready to implement a parallel universe of stream based algorithms
>>> to go alongside all the range based ones and be ready to constantly
>>> justify that to everyone?
>>
>> Not necessary. You just need to implement one range on top of a buffered
>> stream, and then it works with all other algorithms that accept input
>> ranges.
>>
>>> I'm also curious what a generic read() should do when the stream is
>>> empty.
>>
>> Returns 0 elements read.
>
> It increasingly seems to me we need to do this. -- Andrei

It is in the works. I need to finish it. I've been incorporating Dmitry's I/O buffer into my design.

-Steve

March 29, 2014

Re: protocol for using InputRanges

Posted by Steven Schveighoffer
in reply to Walter Bright

Steven Schveighoffer

Posted in reply to Walter Bright

On Thu, 27 Mar 2014 19:52:52 -0400, Walter Bright <newshound2@digitalmars.com> wrote:

> On 3/27/2014 2:46 PM, Steven Schveighoffer wrote:
>> What I am protesting is the idea that you *know* input exists, you must still call empty.
>> Front is enough in that case.
>
> You can't use that on generic code, because generic code cannot assume that front will not fail.

There is a difference here that I want to establish.

First, I completely agree that for generic code, for when an unknown range is passed in, empty is required to be called to verify that front is valid.

Second, while I agree that empty should be required to verify front is valid on unknown ranges, it should not be required, when it is known that it's not empty.

What you are proposing is that we take advantage of the requirement for empty on unknown ranges to require it on known ones. I disagree, the code looks awkward and incorrect. When I know something is empty, calling empty again doesn't give me information. instead empty now becomes "is it empty, and if it needs priming, prime it." I'd rather see another primitive, since empty does not convey that message.

A counter proposal -- What if we create a global method in std.range called prime. prime(r) will call r.prime if it exists, otherwise calls r.empty, and ignores the result. Code that wants to work with "primable" ranges, can call that function, and it's harmless for normal ranges, but will allow primable ranges to fetch the first element.

I think penalizing all range-using code to force a non-functional empty call would 1. artificially invalidate lots of currently valid code and 2. look completely awkward in cases where it's not normally necessary.

-Steve

March 29, 2014

Re: protocol for using InputRanges

Posted by Steven Schveighoffer
in reply to Marc Schütz

Steven Schveighoffer

Posted in reply to Marc Schütz

On Fri, 28 Mar 2014 07:47:22 -0400, Marc Schütz <schuetzm@gmx.net> wrote:

> On Thursday, 27 March 2014 at 16:12:36 UTC, monarch_dodra wrote:

>>
>> I call shenanigans. This is the code:
>>
>>             @property bool empty()
>>             {
>>                 if (nLeft)
>>                     return false;
>>                 if (r.empty)
>>                     return true;
>>
>> If you initialized the next element in both constructor and popFront, then you'd get rid of both these checks.
>>
>
> ... but of course lose laziness.

In this case, laziness is not critical. Decoding the element is an O(1) operation, and when looping through, you will decode it anyway.

When processing a 20 element string, the cost of checking to see if you have decoded on every empty or front call may override the front-loaded cost of decoding the first element on construction. It's sure to add to the cost if you are processing all 20 elements, since you decode them all anyway.

On other ranges, it's more important when the first element costs a lot to fetch. HOWEVER, it's not critically important to delay that unless you are not going to process that element. For example, if you are foreach'ing over all the elements, the delay doesn't matter.

I'd rather the higher level code decide whether to delay or not, depending on the situation. Requiring a protocol change for such detailed knowledge seems unbalanced.

-Steve

March 29, 2014

Re: protocol for using InputRanges

Posted by Walter Bright
in reply to Steven Schveighoffer

Walter Bright

Posted in reply to Steven Schveighoffer

On 3/28/2014 3:42 AM, Steven Schveighoffer wrote:
>> I'm also curious what a generic read() should do when the stream is empty.
>
> Returns 0 elements read.

Meaning it must also write through a pointer if it did read something.

How is that faster than a 1 element buffer?

March 29, 2014

Re: protocol for using InputRanges

Posted by Andrei Alexandrescu
in reply to Dmitry Olshansky

Andrei Alexandrescu

Posted in reply to Dmitry Olshansky

On 3/28/14, 11:40 AM, Dmitry Olshansky wrote:
> 28-Mar-2014 22:29, Walter Bright пишет:
>> On 3/28/2014 10:11 AM, Dmitry Olshansky wrote:
>>> WAT? The overhead is in issuing system calls, you'd want to do as
>>> little of them
>>> as possible. Reading byte by byte is an exemplar of idiocy in I/O code.
>>
>> That's why we have things like byLine().
>>
>
> Which uses C's BUFFERED I/O and it reads from it byte by byte via getc.
> Even though sys calls are amortized by C runtime, we have a function
> call per byte. No wonder it's SLOW.

byLine doesn't use getc. -- Andrei

March 29, 2014

Re: protocol for using InputRanges

Posted by monarch_dodra
in reply to Andrei Alexandrescu

monarch_dodra

Posted in reply to Andrei Alexandrescu

On Saturday, 29 March 2014 at 00:14:40 UTC, Andrei Alexandrescu wrote:
> It increasingly seems to me we need to do this. -- Andrei

What I find funny here is that walter is saying we must enforce that empty always be called in the range - "for performance".

Yet about a year ago, he made a proposal for a "SentinelInputRange", where you could do away with empty altogether because "tl,dr: PERFORMANCE!".

http://forum.dlang.org/thread/kgmatj$1v8b$1@digitalmars.com

If you need to deviate from the rules for extra performance, then fine. Just document that it is a lower level range, with useage limitations.

However, making a global design change because *1* type of range needs it, "for performance", I'm not fine with.

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation