March 27, 2014 Re: protocol for using InputRanges | ||||
---|---|---|---|---|
| ||||
Posted in reply to Steven Schveighoffer | On 3/26/2014 7:19 PM, Steven Schveighoffer wrote:
> if(!r.empty)
> {
> auto r2 = map!(x => x * 2)(r);
> do
> {
> auto x = r2.front;
> ...
> } while(!r2.empty);
> }
>
> Should we be required to re-verify that r2 is not empty before using it? It
> clearly is not, and would be an artificial requirement (one that map likely
> would not enforce!).
>
> This sounds so much like a convention that simply won't be followed, and the
> result will be perfectly valid code.
The idea is that there are three functions: empty, front, and popFront. The functionality of the range will be distributed between these 3 functions. Different things being ranged over will naturally split their operations into the 3 functions in different ways.
If the 3 functions can be called in any order, then extra logic has to be added to them to signal to each other. This slows things down.
To write generic code calling ranges, it's necessary to stop assuming what is happening under the hood of empty, front, and popFront, and stick to the protocol.
|
March 27, 2014 Re: protocol for using InputRanges | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | On Wed, 26 Mar 2014 22:48:16 -0400, Walter Bright <newshound2@digitalmars.com> wrote:
> On 3/26/2014 7:19 PM, Steven Schveighoffer wrote:
>> if(!r.empty)
>> {
>> auto r2 = map!(x => x * 2)(r);
>> do
>> {
>> auto x = r2.front;
>> ...
>> } while(!r2.empty);
>> }
>>
>> Should we be required to re-verify that r2 is not empty before using it? It
>> clearly is not, and would be an artificial requirement (one that map likely
>> would not enforce!).
>>
>> This sounds so much like a convention that simply won't be followed, and the
>> result will be perfectly valid code.
>
> The idea is that there are three functions: empty, front, and popFront. The functionality of the range will be distributed between these 3 functions. Different things being ranged over will naturally split their operations into the 3 functions in different ways.
>
> If the 3 functions can be called in any order, then extra logic has to be added to them to signal to each other. This slows things down.
>
> To write generic code calling ranges, it's necessary to stop assuming what is happening under the hood of empty, front, and popFront, and stick to the protocol.
OK, but it's logical to assume you *can* avoid a call to empty if you know what's going on under the hood, no? Then at that point, you have lost the requirement -- people will avoid calling empty because they can get away with it, and then altering the under-the-hood requirements cause code breakage later.
Case in point, the pull request I referenced, the author originally tried to just use empty to lazily initialize filter, but it failed due to existing code in phobos that did not call empty on filtered data before processing. He had to instrument all 3 calls.
-Steve
|
March 27, 2014 Re: protocol for using InputRanges | ||||
---|---|---|---|---|
| ||||
Posted in reply to Steven Schveighoffer | On 3/26/2014 7:55 PM, Steven Schveighoffer wrote:
> OK, but it's logical to assume you *can* avoid a call to empty if you know
> what's going on under the hood, no? Then at that point, you have lost the
> requirement -- people will avoid calling empty because they can get away with
> it, and then altering the under-the-hood requirements cause code breakage later.
>
> Case in point, the pull request I referenced, the author originally tried to
> just use empty to lazily initialize filter, but it failed due to existing code
> in phobos that did not call empty on filtered data before processing. He had to
> instrument all 3 calls.
As with *any* API, if you look under the hood and make assumptions about the behavior based on a particular implementation, assumptions that are not part of the API, the risk of breakage inevitably follows.
If you've identified Phobos code that uses ranges but does not follow the protocol, the Phobos code is broken - please file a bugzilla issue on it.
|
March 27, 2014 Re: protocol for using InputRanges | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Wednesday, March 26, 2014 10:36:08 Andrei Alexandrescu wrote:
> On 3/26/14, 8:37 AM, Steven Schveighoffer wrote:
> > On Wed, 26 Mar 2014 11:09:04 -0400, Regan Heath <regan@netmail.co.nz>
> >
> > wrote:
> >> On Wed, 26 Mar 2014 12:30:53 -0000, Steven Schveighoffer
> >>
> >> <schveiguy@yahoo.com> wrote:
> >>> Gah, I didn't cut out the right rules. I meant the two rules that empty must be called before others. Those are not necessary.
> >>
> >> I see. I was thinking we ought to make empty mandatory to give more guaranteed structure for range implementors, so lazy initialisation can be done in one place only, etc etc.
> >
> > Yes, but when you know that empty is going to return false, there isn't any logical reason to call it. It is an awkward requirement.
> >
> > I had the same thinking as you, why pay for an extra check for all 3 calls? But there was already evidence that people were avoiding empty.
>
> I think requiring users to call empty before front on input ranges is a concession we should make.
I don't know. It's not the end of the world if we require it (at least with a range that doesn't have length), since you're almost always going to be forced to call empty anyway, because ranges are generally used in generic code. However, it seems really off to me for there to be any real work going on in empty, and I'd be leery of any range which actually required empty to be called before calling front (though I definitely think that calling front on an empty range should not be considered okay and would expect that to generally be undefined behavior).
Regardless, if the range has length, then I'd think that quite a few of the algorithms which actually used length wouldn't actually need to call empty, making calling empty unnecessary overhead. But for the most part, I think that a lot of the weird cases go away when you end up with length and random-access ranges and the like, since that usually means that the range isn't generative but likely has an array underneath (though map certainly has its quirks thanks to the fact that front could technically allocate a new object on every call, and it can be random-access).
- Jonathan M Davis
|
March 27, 2014 Re: protocol for using InputRanges | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | On 27.03.2014 03:48, Walter Bright wrote: > On 3/26/2014 7:19 PM, Steven Schveighoffer wrote: >> if(!r.empty) >> { >> auto r2 = map!(x => x * 2)(r); >> do >> { >> auto x = r2.front; >> ... >> } while(!r2.empty); >> } >> >> Should we be required to re-verify that r2 is not empty before using >> it? It >> clearly is not, and would be an artificial requirement (one that map >> likely >> would not enforce!). >> >> This sounds so much like a convention that simply won't be followed, >> and the >> result will be perfectly valid code. > > The idea is that there are three functions: empty, front, and popFront. > The functionality of the range will be distributed between these 3 > functions. Different things being ranged over will naturally split their > operations into the 3 functions in different ways. > > If the 3 functions can be called in any order, then extra logic has to > be added to them to signal to each other. This slows things down. > > To write generic code calling ranges, it's necessary to stop assuming > what is happening under the hood of empty, front, and popFront, and > stick to the protocol. > IIUC what you are proposing would be covered better by removing front and return the value by popFront. Instead of forcing strong, breaking and sometimes unintuitive semantics we could also improve dmd's optimizer. This caching range example: /////////////////////////////////// T getc(T)(); struct irange(T) { bool _cached; T _cache; this(T[] arr) { _cached = false; } bool empty() { if(_cached) return false; _cache = getc!T(); return (_cache < 0); } T front() { empty(); return _cache; } void popFront() { _cached = false; } } int foo(irange!int rg) { int sum = 0; while(!rg.empty) { sum += rg.front; rg.popFront; } return sum; } /////////////////////////////////// generates this code with "dmd2.065 -O -inline -release -noboundscheck -m64": _D7irange23fooFS7irange213__T6irangeTiZ6irangeZi: 0000: push rbp 0001: mov rbp,rsp 0004: push rax 0005: push rsi 0006: mov qword ptr [rbp+10h],rcx 000A: xor esi,esi 000C: lea rcx,[rbp+10h] 0010: sub rsp,20h 0014: call _D7irange213__T6irangeTiZ6irange5emptyMFZb 0019: add rsp,20h 001D: xor al,1 001F: je 0050 0021: lea rcx,[rbp+10h] 0025: sub rsp,20h 0029: call _D7irange213__T6irangeTiZ6irange5emptyMFZb 002E: add rsp,20h 0032: mov eax,dword ptr [rbp+14h] 0035: add esi,eax 0037: mov byte ptr [rbp+10h],0 003B: lea rcx,[rbp+10h] 003F: sub rsp,20h 0043: call _D7irange213__T6irangeTiZ6irange5emptyMFZb 0048: add rsp,20h 004C: xor al,1 004E: jne 0021 0050: mov eax,esi 0052: pop rsi 0053: lea rsp,[rbp] 0057: pop rbp 0058: ret _D7irange213__T6irangeTiZ6irange5emptyMFZb: 0000: push rbp 0001: mov rbp,rsp 0004: mov qword ptr [rbp+10h],rcx 0008: cmp byte ptr [rcx],0 000B: je 0011 000D: xor eax,eax 000F: pop rbp 0010: ret 0011: sub rsp,20h 0015: call _D7irange211__T4getcTiZ4getcFZi 001A: add rsp,20h 001E: mov rcx,qword ptr [rbp+10h] 0022: mov dword ptr [rcx+4],eax 0025: shr eax,1Fh 0028: mov byte ptr [rcx],al 002A: xor al,1 002C: pop rbp 002D: ret and this with gdc (-O2 on godbolt.org): int example.foo(example.irange!(int).irange): mov rax, rdi push rbx xor ebx, ebx shr rax, 32 test dil, dil je .L5 .L2: add ebx, eax .L5: call int example.getc!(int).getc() test eax, eax js .L2 mov eax, ebx pop rbx ret All traces of the caching but the initial state check have been removed, no extra calls. |
March 27, 2014 Re: protocol for using InputRanges | ||||
---|---|---|---|---|
| ||||
Posted in reply to Rainer Schuetze |
On 27.03.2014 07:53, Rainer Schuetze wrote:
> bool empty() {
> if(_cached) return false;
> _cache = getc!T();
> return (_cache < 0);
> }
Ouch, pasted before fixing:
bool empty() {
if(_cached) return false;
_cache = getc!T();
_cached = _cache >= 0; // this line added
return !_cached;
}
The asm is with an unintended "_cached = _cache < 0;", but that's the same, just accepting negative input instead of non-negative.
|
March 27, 2014 Re: protocol for using InputRanges | ||||
---|---|---|---|---|
| ||||
Posted in reply to Ola Fosheim Grøstad | On Wednesday, 26 March 2014 at 18:04:44 UTC, Ola Fosheim Grøstad
wrote:
> On Wednesday, 26 March 2014 at 17:36:08 UTC, Andrei Alexandrescu wrote:
>> I think requiring users to call empty before front on input ranges is a concession we should make.
>
> Then the name should change to "ready". It makes sense to require the user to check that the range is "ready", but not to check that it is "not empty". This will also make more sense for async implementations that will block if "not ready".
>
> IMO the whole interface needs rethinking if you want to gracefully support async data streams where you need to distinguish between: "ready" vs "empty", "front" vs "firstavailable". Both quick-sort, merge-sort, filter and map work well with async data streams.
Why not?
It is how iterators work in most OO languages.
--
Paulo
|
March 27, 2014 Re: protocol for using InputRanges | ||||
---|---|---|---|---|
| ||||
Posted in reply to Rainer Schuetze | On 3/26/2014 11:53 PM, Rainer Schuetze wrote: > IIUC what you are proposing would be covered better by removing front and return > the value by popFront. Different things being ranged over make different decisions as to what goes into each function. > Instead of forcing strong, breaking and sometimes unintuitive semantics I don't understand what is so unintuitive about: while (!empty) { ... front ...; popFront(); } What I do find unintuitive and unattractive is all those flags in the range implementation. > we could also improve dmd's optimizer. Yes, that's true. The thing about optimizers, though, is they work on patterns. If your code doesn't quite match the right pattern, it won't tickle the optimization you might be expecting. The pattern for recognizing the ROL and ROR patterns is one such. |
March 27, 2014 Re: protocol for using InputRanges | ||||
---|---|---|---|---|
| ||||
Posted in reply to Rainer Schuetze | On 3/26/2014 11:53 PM, Rainer Schuetze wrote: > This caching range example: > > /////////////////////////////////// > T getc(T)(); > > struct irange(T) > { > bool _cached; > T _cache; > > this(T[] arr) { _cached = false; } > bool empty() { > if(_cached) return false; > _cache = getc!T(); > return (_cache < 0); What happens if empty is called twice in a row? Two characters get read! That isn't right. > } > T front() { empty(); return _cache; } What happens if empty returns true? EOF? I don't think that's intuitive. You could have front throw, but that prevents anyone from building nothrow ranges. > void popFront() { _cached = false; } > } |
March 27, 2014 Re: protocol for using InputRanges | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jonathan M Davis | On 3/26/2014 11:48 PM, Jonathan M Davis wrote: > However, it seems really off to me for there to be any real work going on in > empty, Many things, like some ttys, can not be tested for being empty without reading it, and reading of a tty is destructive. > and I'd be leery of any range which actually required empty to be > called before calling front (though I definitely think that calling front on > an empty range should not be considered okay and would expect that to > generally be undefined behavior). That's exactly why empty should be called before front - because front is expected to succeed. |
Copyright © 1999-2021 by the D Language Foundation