protocol for using InputRanges (page 12) - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » General » protocol for using InputRanges (page 12)

March 28, 2014

Re: protocol for using InputRanges

Posted by Johannes Pfau
in reply to Walter Bright

Johannes Pfau

Posted in reply to Walter Bright

Am Thu, 27 Mar 2014 17:20:25 -0700
schrieb Walter Bright <newshound2@digitalmars.com>:

> On 3/27/2014 2:56 PM, Andrei Alexandrescu wrote:
> > On 3/27/14, 2:24 PM, Walter Bright wrote:
> >> The range protocol is designed to work with streams.
> >
> > It's designed to work with containers.
> 
> I know we talked about streams when we designed it.
> 
> 
> >> It's a giant fail
> >> if they do not, or if you want to create a separate, non-range
> >> universe to deal with streams.
> >
> > It's not a giant fail, we just need to adjust the notion.
> 
> Are you suggesting that ranges needn't support streams?
> 
> Note also that I suggested a way Steven could create an adapter with the behavior he desired, yet still adhere to protocol. No notion adjustments required.

Ranges have equivalents in other languages:
iterators in c++,
IEnumerator in c#,
Iterator in java

all these languages have special stream types for raw data. I don't think it's bad if we also have streams/ranges separate in D.

March 28, 2014

Re: protocol for using InputRanges

Posted by Paolo Invernizzi
in reply to Walter Bright

Paolo Invernizzi

Posted in reply to Walter Bright

On Friday, 28 March 2014 at 00:41:42 UTC, Walter Bright wrote:
> On 3/27/2014 3:23 PM, QAston wrote:
>> The protocol is not intuitive.
>
> I find empty-front-popFront as perfectly intuitive. I don't find the counter proposals, which come with baggage like constructors that may fail, and front() that may fail in unspecified ways, or throwing entire paradigms out the window, as intuitive.
>
> But I concede that other people think differently. Not everyone thinks the same. But consider this: floating point math is not intuitive. There has never been a shortage of proposals to make fp intuitive, but they've all failed because they are impractical.
>
> Sometimes ya gotta go with what works.

I _strongly_ agree with Walter: people learning D in my groups have no problems with the empty-front-popFront sequence.

Please don't complicate or change the notion of range: you can find an adjustment that don't break code, but for sure that will break the mindset of people.

For what concern us, everyone here is happy with the fact that empty *must* be checked prior to front/popFront.

--
Paolo

March 28, 2014

Re: protocol for using InputRanges

Posted by RivenTheMage
in reply to Andrei Alexandrescu

RivenTheMage

Posted in reply to Andrei Alexandrescu

On Thursday, 27 March 2014 at 21:54:00 UTC, Andrei Alexandrescu
wrote:
> On 3/27/14, 1:58 PM, Walter Bright wrote:
>> On 3/27/2014 12:50 PM, Andrei Alexandrescu wrote:
>>> Yah, agreed. -- Andrei
>>
>> Completely unworkable. To determine if a range is empty or not, it may
>> have to actually read from its input. TTYs are an example. Although
>> empty may then cache the result, and not read the second time it is
>> called, an observer of TTY could see that an item was read from the TTY.
>
> That's a good point too.
>
> Andrei

Maybe new kind of range can help?
Which has "bool popFront()" instead of "bool empty() + void
popFront()".

March 28, 2014

Re: protocol for using InputRanges

Posted by Regan Heath
in reply to Paolo Invernizzi

Regan Heath

Posted in reply to Paolo Invernizzi

On Fri, 28 Mar 2014 08:59:34 -0000, Paolo Invernizzi <paolo.invernizzi@no.address> wrote:
> For what concern us, everyone here is happy with the fact that empty *must* be checked prior to front/popFront.

This is actually not true.

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/

March 28, 2014

Re: protocol for using InputRanges

Posted by Walter Bright
in reply to Johannes Pfau

Walter Bright

Posted in reply to Johannes Pfau

On 3/28/2014 1:32 AM, Johannes Pfau wrote:
> Ranges have equivalents in other languages:
> iterators in c++,
> IEnumerator in c#,
> Iterator in java
>
> all these languages have special stream types for raw data. I don't
> think it's bad if we also have streams/ranges separate in D.

Do you see a point to be able to, in an algorithm, seamlessly swap a socket with a string?

empty-front-popFront works with streams and non-streams. Why break this?

March 28, 2014

Re: protocol for using InputRanges

Posted by Steven Schveighoffer
in reply to Walter Bright

Steven Schveighoffer

Posted in reply to Walter Bright

On Thu, 27 Mar 2014 20:25:56 -0400, Walter Bright <newshound2@digitalmars.com> wrote:

> On 3/27/2014 2:31 PM, Steven Schveighoffer wrote:
>> Adding range primitives on top of a stream does not make sense.
>
> Are ready to implement a parallel universe of stream based algorithms to go alongside all the range based ones and be ready to constantly justify that to everyone?

Not necessary. You just need to implement one range on top of a buffered stream, and then it works with all other algorithms that accept input ranges.

> I'm also curious what a generic read() should do when the stream is empty.

Returns 0 elements read.

-Steve

March 28, 2014

Re: protocol for using InputRanges

Posted by Marc Schütz
in reply to monarch_dodra

Marc Schütz

Posted in reply to monarch_dodra

On Thursday, 27 March 2014 at 16:12:36 UTC, monarch_dodra wrote:
> On Thursday, 27 March 2014 at 15:45:14 UTC, Andrei Alexandrescu wrote:
>> On 3/27/14, 8:44 AM, Steven Schveighoffer wrote:
>>> On Thu, 27 Mar 2014 11:40:59 -0400, Andrei Alexandrescu
>>> <SeeWebsiteForEmail@erdani.org> wrote:
>>>
>>>> On 3/27/14, 5:35 AM, Steven Schveighoffer wrote:
>>>>> BTW, I think there has been recent talk about not focusing on dust when
>>>>> we are laying bricks. This would have my vote as useless dust. This does
>>>>> not solve the problem of streams-as-ranges, because streams don't make
>>>>> good ranges. It doesn't really solve any problems as far as I can tell.
>>>>
>>>> I think byXchar are important and are not streams. -- Andrei
>>>
>>> What's broken about those?
>>
>> Speed.
>
> I call shenanigans. This is the code:
>
>             @property bool empty()
>             {
>                 if (nLeft)
>                     return false;
>                 if (r.empty)
>                     return true;
>
> If you initialized the next element in both constructor and popFront, then you'd get rid of both these checks.
>

... but of course lose laziness.

> Furthermore, popFront() has a guaranteed "1 call" per element. Weareas "empty" and "front" may be called several times in a row for a single element.
>
> If you want efficiency, stop being "member-wise" lazy, and put some eagerness in your code.
>

Does a flag "isInitialized" even have an effect on performance where it matters?

It should only be relevant in tight loops, and there I expect a moderately well-working optimizer to inline calls to `empty` and `front`. It can then see that it has just set "isInitialized" to true/false, and remove the checks completely, effectively generating code as if the value were only fetched/computed in `empty`. That only leaves the larger struct size as a potential problem. However I've seen LDC being able to "decompose" a struct into separate variables, so I guess it can remove the unnecessary member in our case.

In other situations, where you already have more complex code, an additional conditional branch will not have as much weight anyway.

Does somehow have numbers for realistic programs, with GDC/LDC/DMD? I suspect this entire discussion is moot, because we can easily have non-eagerly initialized ranges without sacrificing performance in most situations.

March 28, 2014

Re: protocol for using InputRanges

Posted by Chris
in reply to Walter Bright

Chris

Posted in reply to Walter Bright

On Friday, 28 March 2014 at 04:58:28 UTC, Walter Bright wrote:
> On 3/27/2014 3:21 AM, Chris wrote:
>> I agree. I've been using ranges for a while now and have tried
>> different implementations based on advice given on this forum and
>> depending on each case. After reading this thread I looked at
>> some of my ranges and I have to say I still have no clue as to
>> what should and should _not_ be done (regardless of whether you
>> _can_ do it). For a while I thought that it's my lack of
>> understanding, but this thread shows that everyone has a
>> different view of ranges. Guidelines with use cases would be
>> great. I remember I mentioned this in another thread already. The
>> thing is that experimenting without proper guidelines leaves your
>> code in an inconsistent state where you have two or more ranges
>> doing technically the same thing but each with a different logic.
>
> Following the protocol empty-front-popFront always works.
>
> Where the differences come in is when people skip calling empty.

Not only there. The whole issue of front and popFront has not been made clear yet (unless it's my lack of understanding). What should and should not be done there, even if you stick to the basic protocol (!empty > front > popFront).

I like the concept of ranges and use them a lot, because they help me to create independent components. But sometimes I think they are just glorified for-loops in the sense that we know more or less what the input is, and part of the reason why we have this thread is that we want everything to be specialized and generic at the same time. E.g.

@monarch_dodra wrote:

"It's not that *you* know a particular range doesn't need "empty"
called, it's that the algorithm you are using has already
previously validated there are elements in it.

For example, the splitter algorithm will first save a copy of its
range, and then walk it, searching for the "splitting" elements.
It then realizes it has walked N elements."

But then it's no longer generic. If your range depends on another algorithm (that you assume or know to have checked something), then it's no longer generic. In many cases "generic" means an overhead of checks, and is anathema to performance tuning. I think it's the issue of generic vs. specialized / optimized that started this thread. And this is a tough one to sort out.

March 28, 2014

Re: protocol for using InputRanges

Posted by w0rp
in reply to Chris

w0rp

Posted in reply to Chris

I think something has gone wrong in this discussion somewhere. It is like this.

Do not assume .empty is const, this is unreasonable.

Do not assume .front is const, this is unreasonable.

.popFront has the only high-level observable difference in removing something from the front of an InputRange.

I say it is unreasonable to expect .empty or .front to be const because a range which is strictly an InputRange (not a ForwardRange...) is likely to be doing some kind of work where stepping through its sequence of data is destructive. As soon as the thing is read in the implementation of the InputRange, you can't read that thing again. You've cached it or it is gone. Thus .empty and .front must be non-const and lazily evaluate caching the next value.

I do not see it as a major issue to check .empty or call .front many times, even with this caching behaviour. Commonly this is as simple as checking whether or not you still have a value to provide inside the InputRange. We use assert(!empty) etc. because honestly I cannot imagine another way to make this safe.

If you really, really hate this double-checking in .empty or .front, you could try to change InputRanges so they behave like Java Iterators. I believe this is a really bad idea. You have to start thinking about returning nullable types. You have to push the job of caching values on to users of the InputRanges, as in each and every algorithm. It's just a really bad choice, and with the kind of boxing you'll need to do for nullable types, you'll surely end off worse as far as performance goes anyway.

Really, the range protocol is as Walter specified at the start of this thread. The documentation should reflect how .empty or .front may result in caching the next value. I really don't think this is a major performance problem worth worrying about, though. If it's really, really worth it for certain standard algorithms, perhaps something can be done in private range methods or similar to offer some kind of speed boost. ('package' privacy for std.* is feasible.) I don't think there is a lot to be gained from this in general, though.

Streams are different, and may be more appropriate for some functions for performance reasons. I still prefer ranges for code where performance is less critical, because if you aren't very worried about performance, the range API is a lot nicer. I'm not dead set on maximum performance if it makes life a lot harder in my code, personally. I view these kinds of improvements as premature optimisations if I'm writing something new.

March 28, 2014

Re: protocol for using InputRanges

Posted by Chris
in reply to w0rp

Chris

Posted in reply to w0rp

Earlier Walter wrote:

"I don't like being in the position of when I need high performance code, I have
to implement my own ranges & algorithms, or telling customers they need to do so."

I don't think there is a one size fits all. What if customers ask for maximum security? In any language, if I want high performance, I have to be prepared to walk on thin ice. If I want things to be safe and / or generic, I have to accept additonal checks (= perfomance penalties). I don't think that a language can solve the fundamental problems concerning programming / mathematical logic with all the contradictory demands involved. It can give us the tools to cope with those problems, but not solve them out of the box.

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation