March 27, 2014
On 3/26/2014 7:19 PM, Steven Schveighoffer wrote:
> if(!r.empty)
> {
>     auto r2 = map!(x => x * 2)(r);
>     do
>     {
>        auto x = r2.front;
>        ...
>     } while(!r2.empty);
> }
>
> Should we be required to re-verify that r2 is not empty before using it? It
> clearly is not, and would be an artificial requirement (one that map likely
> would not enforce!).
>
> This sounds so much like a convention that simply won't be followed, and the
> result will be perfectly valid code.

The idea is that there are three functions: empty, front, and popFront. The functionality of the range will be distributed between these 3 functions. Different things being ranged over will naturally split their operations into the 3 functions in different ways.

If the 3 functions can be called in any order, then extra logic has to be added to them to signal to each other. This slows things down.

To write generic code calling ranges, it's necessary to stop assuming what is happening under the hood of empty, front, and popFront, and stick to the protocol.

March 27, 2014
On Wed, 26 Mar 2014 22:48:16 -0400, Walter Bright <newshound2@digitalmars.com> wrote:

> On 3/26/2014 7:19 PM, Steven Schveighoffer wrote:
>> if(!r.empty)
>> {
>>     auto r2 = map!(x => x * 2)(r);
>>     do
>>     {
>>        auto x = r2.front;
>>        ...
>>     } while(!r2.empty);
>> }
>>
>> Should we be required to re-verify that r2 is not empty before using it? It
>> clearly is not, and would be an artificial requirement (one that map likely
>> would not enforce!).
>>
>> This sounds so much like a convention that simply won't be followed, and the
>> result will be perfectly valid code.
>
> The idea is that there are three functions: empty, front, and popFront. The functionality of the range will be distributed between these 3 functions. Different things being ranged over will naturally split their operations into the 3 functions in different ways.
>
> If the 3 functions can be called in any order, then extra logic has to be added to them to signal to each other. This slows things down.
>
> To write generic code calling ranges, it's necessary to stop assuming what is happening under the hood of empty, front, and popFront, and stick to the protocol.

OK, but it's logical to assume you *can* avoid a call to empty if you know what's going on under the hood, no? Then at that point, you have lost the requirement -- people will avoid calling empty because they can get away with it, and then altering the under-the-hood requirements cause code breakage later.

Case in point, the pull request I referenced, the author originally tried to just use empty to lazily initialize filter, but it failed due to existing code in phobos that did not call empty on filtered data before processing. He had to instrument all 3 calls.

-Steve
March 27, 2014
On 3/26/2014 7:55 PM, Steven Schveighoffer wrote:
> OK, but it's logical to assume you *can* avoid a call to empty if you know
> what's going on under the hood, no? Then at that point, you have lost the
> requirement -- people will avoid calling empty because they can get away with
> it, and then altering the under-the-hood requirements cause code breakage later.
>
> Case in point, the pull request I referenced, the author originally tried to
> just use empty to lazily initialize filter, but it failed due to existing code
> in phobos that did not call empty on filtered data before processing. He had to
> instrument all 3 calls.

As with *any* API, if you look under the hood and make assumptions about the behavior based on a particular implementation, assumptions that are not part of the API, the risk of breakage inevitably follows.

If you've identified Phobos code that uses ranges but does not follow the protocol, the Phobos code is broken - please file a bugzilla issue on it.

March 27, 2014
On Wednesday, March 26, 2014 10:36:08 Andrei Alexandrescu wrote:
> On 3/26/14, 8:37 AM, Steven Schveighoffer wrote:
> > On Wed, 26 Mar 2014 11:09:04 -0400, Regan Heath <regan@netmail.co.nz>
> > 
> > wrote:
> >> On Wed, 26 Mar 2014 12:30:53 -0000, Steven Schveighoffer
> >> 
> >> <schveiguy@yahoo.com> wrote:
> >>> Gah, I didn't cut out the right rules. I meant the two rules that empty must be called before others. Those are not necessary.
> >> 
> >> I see.  I was thinking we ought to make empty mandatory to give more guaranteed structure for range implementors, so lazy initialisation can be done in one place only, etc etc.
> > 
> > Yes, but when you know that empty is going to return false, there isn't any logical reason to call it. It is an awkward requirement.
> > 
> > I had the same thinking as you, why pay for an extra check for all 3 calls? But there was already evidence that people were avoiding empty.
> 
> I think requiring users to call empty before front on input ranges is a concession we should make.

I don't know. It's not the end of the world if we require it (at least with a range that doesn't have length), since you're almost always going to be forced to call empty anyway, because ranges are generally used in generic code. However, it seems really off to me for there to be any real work going on in empty, and I'd be leery of any range which actually required empty to be called before calling front (though I definitely think that calling front on an empty range should not be considered okay and would expect that to generally be undefined behavior).

Regardless, if the range has length, then I'd think that quite a few of the algorithms which actually used length wouldn't actually need to call empty, making calling empty unnecessary overhead. But for the most part, I think that a lot of the weird cases go away when you end up with length and random-access ranges and the like, since that usually means that the range isn't generative but likely has an array underneath (though map certainly has its quirks thanks to the fact that front could technically allocate a new object on every call, and it can be random-access).

- Jonathan M Davis
March 27, 2014

On 27.03.2014 03:48, Walter Bright wrote:
> On 3/26/2014 7:19 PM, Steven Schveighoffer wrote:
>> if(!r.empty)
>> {
>>     auto r2 = map!(x => x * 2)(r);
>>     do
>>     {
>>        auto x = r2.front;
>>        ...
>>     } while(!r2.empty);
>> }
>>
>> Should we be required to re-verify that r2 is not empty before using
>> it? It
>> clearly is not, and would be an artificial requirement (one that map
>> likely
>> would not enforce!).
>>
>> This sounds so much like a convention that simply won't be followed,
>> and the
>> result will be perfectly valid code.
>
> The idea is that there are three functions: empty, front, and popFront.
> The functionality of the range will be distributed between these 3
> functions. Different things being ranged over will naturally split their
> operations into the 3 functions in different ways.
>
> If the 3 functions can be called in any order, then extra logic has to
> be added to them to signal to each other. This slows things down.
>
> To write generic code calling ranges, it's necessary to stop assuming
> what is happening under the hood of empty, front, and popFront, and
> stick to the protocol.
>

IIUC what you are proposing would be covered better by removing front and return the value by popFront.

Instead of forcing strong, breaking and sometimes unintuitive semantics we could also improve dmd's optimizer. This caching range example:

///////////////////////////////////
T getc(T)();

struct irange(T)
{
  bool _cached;
  T _cache;

  this(T[] arr) { _cached = false; }
  bool empty() {
    if(_cached) return false;
    _cache = getc!T();
    return (_cache < 0);
  }
  T front() { empty(); return _cache; }
  void popFront() { _cached = false; }
}

int foo(irange!int rg)
{
  int sum = 0;
  while(!rg.empty)
  {
    sum += rg.front;
  	rg.popFront;
  }
  return sum;
}
///////////////////////////////////

generates this code with "dmd2.065 -O -inline -release -noboundscheck -m64":

_D7irange23fooFS7irange213__T6irangeTiZ6irangeZi:
  0000: push        rbp
  0001: mov         rbp,rsp
  0004: push        rax
  0005: push        rsi
  0006: mov         qword ptr [rbp+10h],rcx
  000A: xor         esi,esi
  000C: lea         rcx,[rbp+10h]
  0010: sub         rsp,20h
  0014: call        _D7irange213__T6irangeTiZ6irange5emptyMFZb
  0019: add         rsp,20h
  001D: xor         al,1
  001F: je          0050
  0021: lea         rcx,[rbp+10h]
  0025: sub         rsp,20h
  0029: call        _D7irange213__T6irangeTiZ6irange5emptyMFZb
  002E: add         rsp,20h
  0032: mov         eax,dword ptr [rbp+14h]
  0035: add         esi,eax
  0037: mov         byte ptr [rbp+10h],0
  003B: lea         rcx,[rbp+10h]
  003F: sub         rsp,20h
  0043: call        _D7irange213__T6irangeTiZ6irange5emptyMFZb
  0048: add         rsp,20h
  004C: xor         al,1
  004E: jne         0021
  0050: mov         eax,esi
  0052: pop         rsi
  0053: lea         rsp,[rbp]
  0057: pop         rbp
  0058: ret

_D7irange213__T6irangeTiZ6irange5emptyMFZb:
  0000: push        rbp
  0001: mov         rbp,rsp
  0004: mov         qword ptr [rbp+10h],rcx
  0008: cmp         byte ptr [rcx],0
  000B: je          0011
  000D: xor         eax,eax
  000F: pop         rbp
  0010: ret
  0011: sub         rsp,20h
  0015: call        _D7irange211__T4getcTiZ4getcFZi
  001A: add         rsp,20h
  001E: mov         rcx,qword ptr [rbp+10h]
  0022: mov         dword ptr [rcx+4],eax
  0025: shr         eax,1Fh
  0028: mov         byte ptr [rcx],al
  002A: xor         al,1
  002C: pop         rbp
  002D: ret

and this with gdc (-O2 on godbolt.org):

int example.foo(example.irange!(int).irange):
	mov	rax, rdi
	push	rbx
	xor	ebx, ebx
	shr	rax, 32
	test	dil, dil
	je	.L5
.L2:
	add	ebx, eax
.L5:
	call	int example.getc!(int).getc()
	test	eax, eax
	js	.L2
	mov	eax, ebx
	pop	rbx
	ret

All traces of the caching but the initial state check have been removed, no extra calls.
March 27, 2014

On 27.03.2014 07:53, Rainer Schuetze wrote:
> bool empty() {
>      if(_cached) return false;
>      _cache = getc!T();
>      return (_cache < 0);
>    }

Ouch, pasted before fixing:

  bool empty() {
    if(_cached) return false;
    _cache = getc!T();
    _cached = _cache >= 0; // this line added
    return !_cached;
  }

The asm is with an unintended "_cached = _cache < 0;", but that's the same, just accepting negative input instead of non-negative.

March 27, 2014
On Wednesday, 26 March 2014 at 18:04:44 UTC, Ola Fosheim Grøstad
wrote:
> On Wednesday, 26 March 2014 at 17:36:08 UTC, Andrei Alexandrescu wrote:
>> I think requiring users to call empty before front on input ranges is a concession we should make.
>
> Then the name should change to "ready". It makes sense to require the user to check that the range is "ready", but not to check that it is "not empty". This will also make more sense for async implementations that will block if "not ready".
>
> IMO the whole interface needs rethinking if you want to gracefully support async data streams where you need to distinguish between: "ready" vs "empty", "front" vs "firstavailable". Both quick-sort, merge-sort, filter and map work well with async data streams.

Why not?

It is how iterators work in most OO languages.

--
Paulo
March 27, 2014
On 3/26/2014 11:53 PM, Rainer Schuetze wrote:
> IIUC what you are proposing would be covered better by removing front and return
> the value by popFront.

Different things being ranged over make different decisions as to what goes into each function.

> Instead of forcing strong, breaking and sometimes unintuitive semantics

I don't understand what is so unintuitive about:

    while (!empty) { ... front ...; popFront(); }

What I do find unintuitive and unattractive is all those flags in the range implementation.

> we could also improve dmd's optimizer.

Yes, that's true. The thing about optimizers, though, is they work on patterns. If your code doesn't quite match the right pattern, it won't tickle the optimization you might be expecting. The pattern for recognizing the ROL and ROR patterns is one such.
March 27, 2014
On 3/26/2014 11:53 PM, Rainer Schuetze wrote:
>  This caching range example:
>
> ///////////////////////////////////
> T getc(T)();
>
> struct irange(T)
> {
>    bool _cached;
>    T _cache;
>
>    this(T[] arr) { _cached = false; }
>    bool empty() {
>      if(_cached) return false;
>      _cache = getc!T();
>      return (_cache < 0);

What happens if empty is called twice in a row? Two characters get read! That isn't right.

>    }
>    T front() { empty(); return _cache; }

What happens if empty returns true? EOF? I don't think that's intuitive. You could have front throw, but that prevents anyone from building nothrow ranges.

>    void popFront() { _cached = false; }
> }

March 27, 2014
On 3/26/2014 11:48 PM, Jonathan M Davis wrote:
> However, it seems really off to me for there to be any real work going on in
> empty,

Many things, like some ttys, can not be tested for being empty without reading it, and reading of a tty is destructive.

> and I'd be leery of any range which actually required empty to be
> called before calling front (though I definitely think that calling front on
> an empty range should not be considered okay and would expect that to
> generally be undefined behavior).

That's exactly why empty should be called before front - because front is expected to succeed.