November 17, 2017
On Thursday, 16 November 2017 at 18:34:54 UTC, Steven Schveighoffer wrote:
> On 11/16/17 8:10 AM, ag0aep6g wrote:
>> On 11/16/2017 09:03 AM, Tony wrote:
>>> However, when I use the class with foreach, opIndex gets called to create a dynamic array, rather than the empty(), front(), popFront() routines being used. I would prefer that it use the three methods rather than create a dynamic array.
>> 
>> https://issues.dlang.org/show_bug.cgi?id=14619
>
> I took a shot at fixing. Way more complex than I realized.
>

I was initially miffed that I had added empty(), popFront() and pop() and they weren't being used, but I don't have a problem with using [] instead of them. Maybe call it a feature and document it.

But I do have a complaint about the methods empty(), popFront() and pop(). I think they should have a special syntax or name to reflect that they are not general purpose methods. __empty() or preferably __forEachDone().  empty() is typically used to say if a container has no data,  not if you are at the end of external foreach loop processing. pop() and popFront() also would typically have different meanings with certain containers and their names don't reflect that they have a special "external foreach loop" purpose.
November 16, 2017
On Fri, Nov 17, 2017 at 01:06:31AM +0000, Tony via Digitalmars-d-learn wrote: [...]
> But I do have a complaint about the methods empty(), popFront() and
> pop(). I think they should have a special syntax or name to reflect
> that they are not general purpose methods. __empty() or preferably
> __forEachDone().  empty() is typically used to say if a container has
> no data,  not if you are at the end of external foreach loop
> processing. pop() and popFront() also would typically have different
> meanings with certain containers and their names don't reflect that
> they have a special "external foreach loop" purpose.

It should be .empty, .popFront, and .front, not .pop.

Also, these methods are *range* primitives, and over time, we have come to a consensus that generally speaking, it's a bad idea to conflate containers with ranges over containers.  The main thing is that iterating over a range is supposed to consume it, which is usually not what you want with a container.

The usual idiom is to separate the two concepts, and have the container provide a mechanism for returning a range over its contents, usually via .opIndex with no arguments, or .opSlice. Then you would just write:

	foreach (e; myContainer[]) { // [] calls .opIndex/.opSlice
		...
	}

Unfortunately, built-in arrays, which are also ranges, are an exception to this rule, and because they are so ubiquitous in D, they mislead newcomers about when and where range primitives should be implemented. Generally speaking, built-in arrays should not be considered exemplary in this respect, but rather should be understood as exceptions.  The general convention is to separate your containers from ranges over their contents, and to provide an .opIndex / .opSlice that constructs a range over the container when needed.
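
As a rough sketch of that convention (the names Stack, Range and push are made up for illustration, not taken from any library), a container can hand out a separate range type from a no-argument opIndex:

```d
import std.algorithm : equal, map;
import std.stdio : writeln;

// Hypothetical container; it has no range primitives of its own.
struct Stack
{
    private int[] data;

    void push(int x) { data ~= x; }

    // The range is a separate type that only views the container's data.
    static struct Range
    {
        private int[] view;
        bool empty() const { return view.length == 0; }
        int front() const { return view[0]; }
        void popFront() { view = view[1 .. $]; }
    }

    // s[] calls this and builds a fresh range every time.
    Range opIndex() { return Range(data); }
}

void main()
{
    Stack s;
    s.push(1);
    s.push(2);
    s.push(3);

    foreach (e; s[]) writeln(e); // consumes a temporary range, not the container
    foreach (e; s[]) writeln(e); // the container is untouched, so this works again

    // The same range also plugs into generic algorithms.
    assert(s[].map!(a => a * 2).equal([2, 4, 6]));
}
```

Consuming the range returned by s[] only shrinks the range's own slice, so the container keeps all of its elements.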

The other consideration is that if you don't really need range functionality, i.e., the only thing you want to do with your container is to put it in a foreach loop, then you can sidestep this whole mess and just implement .opApply for your container and call it a day.  Of course, then you won't be able to use generic algorithms like those in std.algorithm with your container, but if you didn't intend to anyway, it's not a big deal.
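
A minimal sketch of the opApply route, assuming a made-up container called Bag that only ever needs to appear in foreach:

```d
import std.stdio : writeln;

// Hypothetical container that supports foreach through opApply only.
struct Bag
{
    private int[] data;

    int opApply(scope int delegate(ref int) dg)
    {
        foreach (ref x; data)
        {
            // A non-zero result means the loop body left early
            // (break, return, goto), so pass it straight back.
            if (auto r = dg(x))
                return r;
        }
        return 0;
    }
}

void main()
{
    auto b = Bag([1, 2, 3]);
    foreach (x; b) writeln(x);
    foreach (x; b) writeln(x); // nothing was consumed, so all three print again
}
```

Bag is not a range here, so functions in std.algorithm that expect range primitives will not accept it directly.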


T

-- 
Heads I win, tails you lose.
November 17, 2017
On Friday, 17 November 2017 at 01:16:38 UTC, H. S. Teoh wrote:

>
> It should be .empty, .popFront, and .front, not .pop.
>
> Also, these methods are *range* primitives, and over time, we have come to a consensus that generally speaking, it's a bad idea to conflate containers with ranges over containers.  The main thing is that iterating over a range is supposed to consume it, which is usually not what you want with a container.
>
> The usual idiom is to separate the two concepts, and have the container provide a mechanism for returning a range over its contents, usually via .opIndex with no arguments, or .opSlice. Then you would just write:
>
> 	foreach (e; myContainer[]) { // [] calls .opIndex/.opSlice
> 		...
> 	}
>
> Unfortunately, built-in arrays, which are also ranges, are an exception to this rule, and because they are so ubiquitous in D, they mislead newcomers about when and where range primitives should be implemented. Generally speaking, built-in arrays should not be considered exemplary in this respect, but rather should be understood as exceptions.  The general convention is to separate your containers from ranges over their contents, and to provide an .opIndex / .opSlice that constructs a range over the container when needed.
>
> The other consideration is that if you don't really need range functionality, i.e., the only thing you want to do with your container is to put it in a foreach loop, then you can sidestep this whole mess and just implement .opApply for your container and call it a day.  Of course, then you won't be able to use generic algorithms like those in std.algorithm with your container, but if you didn't intend to anyway, it's not a big deal.
>
>
> T

Thanks T! Good information, especially "iterating over a range is supposed to consume it". I have been reading dlang.org->Documentation->Language Reference, but should have also read dlang.org->Dlang-Tour->Ranges. That page, though, makes a distinction about "range consumption" with regard to a "reference type" or a "value type", and it isn't clear to me why there would be a difference.
November 17, 2017
On Friday, 17 November 2017 at 03:15:12 UTC, Tony wrote:

>
> Thanks T! Good information, especially "iterating over a range is supposed to consume it". I have been reading dlang.org->Documentation->Language Reference, but  should have also read dlang.org->Dlang-Tour->Ranges. Although that page

You might also find use in this article (poorly adapted from Chapter 6 of Learning D by the publisher, but still readable):

https://www.packtpub.com/books/content/understanding-ranges

> makes a distinction about "range consumption" with regard to a "reference type" or a "value type" and it isn't clear to me why there would be a difference.

With a value type, you're consuming a copy of the original range, so you can reuse it after. With a reference type, you're consuming the original range and therefore can't reuse it.


========
struct ValRange {
    int[] items;
    bool empty() @property { return items.length == 0; }
    int front() @property { return items[0]; }
    void popFront() { items = items[1 .. $]; }
}

class RefRange {
    int[] items;
    this(int[] src) { items = src; }
    bool empty() @property { return items.length == 0; }
    int front() @property { return items[0]; }
    void popFront() { items = items[1 .. $]; }
}

void main() {
    import std.stdio;

    int[] ints = [1, 2, 3];
    auto valRange = ValRange(ints);

    writeln("Val 1st Run:");
    foreach(i; valRange) writeln(i);
    assert(!valRange.empty);

    writeln("Val 2nd Run:");
    foreach(i; valRange) writeln(i);
    assert(!valRange.empty);

    auto refRange = new RefRange(ints);

    writeln("Ref 1st Run:");
    foreach(i; refRange) writeln(i);
    assert(refRange.empty);

    writeln("Ref 2nd Run:");
    foreach(i; refRange) writeln(i); // prints nothing
}
November 17, 2017
On Friday, November 17, 2017 07:40:35 Mike Parker via Digitalmars-d-learn wrote:
> On Friday, 17 November 2017 at 03:15:12 UTC, Tony wrote:
> > Thanks T! Good information, especially "iterating over a range is supposed to consume it". I have been reading dlang.org->Documentation->Language Reference, but  should have also read dlang.org->Dlang-Tour->Ranges. Although that page
>
> You might also find use in this article (poorly adapted from Chapter 6 of Learning D by the publisher, but still readable):
>
> https://www.packtpub.com/books/content/understanding-ranges
>
> > makes a distinction about "range consumption" with regard to a "reference type" or a "value type" and it isn't clear to me why there would be a difference.
>
> With a value type, you're consuming a copy of the original range, so you can reuse it after. With a reference type, you're consuming the original range and therefore can't reuse it.

Technically, per the range API, you can _never_ reuse a range. The only legitimate way to get a copy of a range to then iterate over separately is to call save on the range (which of course requires it to then be a forward range and not just an input range). However, unfortunately, for many common range types (dynamic arrays included), calling save and copying the range have the same semantics. So, it's easy to write code that will work with many ranges without calling save anywhere but falls flat on its face as soon as you use a range that actually requires that save be called (typically because it's either a reference type, or it's a pseudo-reference type where only some state gets copied when the range is copied, and you get particularly weird behaviors when reusing the range that was copied).

So, while you can get away with reusing a range where save and copying the range do the same thing, it's an incredibly bad idea in general and definitely causes problems in generic code. Certainly, it should only be done when you're dealing with a specific range type where you know what the semantics of copying it are. In general, as soon as you've copied a range, it should never be used again unless it's assigned a new value.

Of course, if something is truly an input range (and not a forward range that merely doesn't have save declared like it should), then it's particularly bad to be trying to use copies of ranges, because any range that can't be a forward range is by definition either a reference type or a pseudo-reference type where a copy is not fully distinct from the original (typically where making a copy isn't possible or where it would be too expensive to do so).

Personally, I'm inclined to think that we should never have had save and should have required that reference type ranges which are forward ranges be wrapped in a struct where copying it does the same thing that save does now, but I seriously doubt that we could make a change that big now. And we'd still have to watch out for how input ranges are different, since copying them wouldn't and couldn't work the same (and while getting rid of save like that would really clean up some range stuff that uses forward ranges, it would make it a lot harder to distinguish between input and forward ranges). So, it's not like there would be a perfect solution even if we were redesigning things from scratch.

Ultimately, folks just need to be aware that they need to call save when they want to actually copy a range, and they need to unit test their code well to make sure that it works with various range types if it's generic code (and that it works with the exact ranges that it uses if it's not generic). Unfortunately, it's usually the case that when you first test range-based code with a range that doesn't implicitly save when it's copied, you find that your code doesn't work with it, because save wasn't explicitly called when it needed to be. But at least you then catch it and can fix your code.
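
To make that concrete, here is a small sketch of generic code that calls save. Both countElements and Countdown are hypothetical names invented for the example. save is marked @property as a precaution, on the assumption that some Phobos versions check typeof(r.save) in isForwardRange:

```d
import std.range.primitives : empty, front, isForwardRange, popFront, save;

// Hypothetical helper: counts elements without consuming the caller's range.
size_t countElements(R)(R range)
if (isForwardRange!R)
{
    size_t n;
    // Iterate over an explicit, independent copy obtained from save.
    for (auto r = range.save; !r.empty; r.popFront())
        ++n;
    return n;
}

// Hypothetical reference-type forward range, used to test the helper.
final class Countdown
{
    private int n;
    this(int n) { this.n = n; }
    bool empty() { return n == 0; }
    int front() { return n; }
    void popFront() { --n; }
    @property Countdown save() { return new Countdown(n); }
}

void main()
{
    auto c = new Countdown(3);
    assert(countElements(c) == 3);
    assert(!c.empty);                      // still usable, because save was called
    assert(countElements([1, 2, 3]) == 3); // dynamic arrays work as well
}
```

If the loop were written as `for (auto r = range; ...)` instead, the array case would still pass, but the class case would leave c empty, which is exactly the kind of bug that only shows up when testing with a reference-type range.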

- Jonathan M Davis

November 17, 2017
On Friday, 17 November 2017 at 07:40:35 UTC, Mike Parker wrote:

>
> You might also find use in this article (poorly adapted from Chapter 6 of Learning D by the publisher, but still readable):
>
> https://www.packtpub.com/books/content/understanding-ranges
>
>> makes a distinction about "range consumption" with regard to a "reference type" or a "value type" and it isn't clear to me why there would be a difference.
>
> With a value type, you're consuming a copy of the original range, so you can reuse it after. With a reference type, you're consuming the original range and therefore can't reuse it.
>
>
> ========
> struct ValRange {
>     int[] items;
>     bool empty() @property { return items.length == 0; }
>     int front() @property { return items[0]; }
>     void popFront() { items = items[1 .. $]; }
> }
>
> class RefRange {
>     int[] items;
>     this(int[] src) { items = src; }
>     bool empty() @property { return items.length == 0; }
>     int front() @property { return items[0]; }
>     void popFront() { items = items[1 .. $]; }
> }
>
> void main() {
>     import std.stdio;
>
>     int[] ints = [1, 2, 3];
>     auto valRange = ValRange(ints);
>
>     writeln("Val 1st Run:");
>     foreach(i; valRange) writeln(i);
>     assert(!valRange.empty);
>
>     writeln("Val 2nd Run:");
>     foreach(i; valRange) writeln(i);
>     assert(!valRange.empty);
>
>     auto refRange = new RefRange(ints);
>
>     writeln("Ref 1st Run:");
>     foreach(i; refRange) writeln(i);
>     assert(refRange.empty);
>
>     writeln("Ref 2nd Run:");
>     foreach(i; refRange) writeln(i); // prints nothing
> }

Thanks for the reference and the code. I will have to iterate over the packtpub text for a while, consulting the docs. I see that the code runs as you say, but I don't understand what's going on. You say with regard to a "value type": "you're consuming a copy of the original range", but I don't see anything different between the processing in the struct versus in the class. They both have a dynamic array variable that they re-assign a "slice" to (or maybe they modify it to be the sliced version). Anyway, I can't see why the one in the struct shrinks and then goes back to what it was originally. It's as if the compiler made calls that aren't shown.

November 17, 2017
On Friday, November 17, 2017 17:37:01 Tony via Digitalmars-d-learn wrote:
> On Friday, 17 November 2017 at 07:40:35 UTC, Mike Parker wrote:
> > You might also find use in this article (poorly adapted from Chapter 6 of Learning D by the publisher, but still readable):
> >
> > https://www.packtpub.com/books/content/understanding-ranges
> >
> >> makes a distinction about "range consumption" with regard to a "reference type" or a "value type" and it isn't clear to me why there would be a difference.
> >
> > With a value type, you're consuming a copy of the original range, so you can reuse it after. With a reference type, you're consuming the original range and therefore can't reuse it.
> >
> >
> > ========
> > struct ValRange {
> >
> >     int[] items;
> >     bool empty() @property { return items.length == 0; }
> >     int front() @property { return items[0]; }
> >     void popFront() { items = items[1 .. $]; }
> >
> > }
> >
> > class RefRange {
> >
> >     int[] items;
> >     this(int[] src) { items = src; }
> >     bool empty() @property { return items.length == 0; }
> >     int front() @property { return items[0]; }
> >     void popFront() { items = items[1 .. $]; }
> >
> > }
> >
> > void main() {
> >
> >     import std.stdio;
> >
> >     int[] ints = [1, 2, 3];
> >     auto valRange = ValRange(ints);
> >
> >     writeln("Val 1st Run:");
> >     foreach(i; valRange) writeln(i);
> >     assert(!valRange.empty);
> >
> >     writeln("Val 2nd Run:");
> >     foreach(i; valRange) writeln(i);
> >     assert(!valRange.empty);
> >
> >     auto refRange = new RefRange(ints);
> >
> >     writeln("Ref 1st Run:");
> >     foreach(i; refRange) writeln(i);
> >     assert(refRange.empty);
> >
> >     writeln("Ref 2nd Run:");
> >     foreach(i; refRange) writeln(i); // prints nothing
> >
> > }
>
> Thanks for the reference and the code. I will have to iterate over the packtpub text for a while, consulting the docs. I see that the code runs as you say, but I don't understand what's going on. You say with regard to a "value type": "you're consuming a copy of the original range", but I don't see anything different between the processing in the struct versus in the class. They both have a dynamic array variable that they re-assign a "slice" to (or maybe they modify it to be the sliced version). Anyway, I can't see why the one in the struct shrinks and then goes back to what it was originally. It's as if the compiler made calls that aren't shown.

When you have

foreach(e; range)

it gets lowered to something like

for(auto r = range; !r.empty; r.popFront())
{
    auto e = r.front;
}

So, the range is copied when you use it in a foreach. In the case of a class, it's just the reference that's copied. So, both "r" and "range" refer to the same object, but with a struct, you get two separate copies. So, when foreach iterates over "r", "range" isn't mutated.

So, in the general case, if you want to use a range in foreach without consuming the range, it needs to be a forward range, and you need to call save. e.g.

foreach(e; range.save)

For many ranges, copying a range is equivalent to calling save, but for some it is not (most notably classes, since copying a class reference just means that you get two references to the same object). So, it's pretty typical for folks to write code that doesn't use save where it should and that works just fine with dynamic arrays and many structs but which fails miserably when you pass it a class. Also, just because something is a struct doesn't mean that copying it does a deep enough copy. If multiple variables hold state in the struct, and some of them are reference types and some are value types, then copying the struct does not result in an independent copy - and you can get really weird results when that happens. That's why it's important to test range-based functions with a variety of range types if they're intended to work with ranges in general as opposed to a specific type. Then you can ensure that you aren't accidentally relying on some aspect of a specific range type.
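
As an illustration of the mixed-state case, here is a sketch with a made-up struct range (Taker) that holds one reference-type member and one value-type member. The names are invented for the example:

```d
import std.stdio : writeln;

// Hypothetical shared state: a class, so copies of Taker point at the same object.
final class Source
{
    int next;
    int take() { return next++; }
}

// Hypothetical pseudo-reference range: part of its state is shared, part is copied.
struct Taker
{
    Source source;  // reference state, shared between copies
    int remaining;  // value state, duplicated by each copy

    bool empty() { return remaining == 0; }
    int front() { return source.next; }
    void popFront() { source.take(); --remaining; }
}

void main()
{
    auto t = Taker(new Source, 3);

    foreach (e; t) writeln(e); // prints 0, 1, 2
    assert(!t.empty);          // the copy's counter ran down, t's did not

    foreach (e; t) writeln(e); // prints 3, 4, 5: neither a rerun nor empty
}
```

The second loop neither repeats the first one (as a value-type range would) nor prints nothing (as a class would), because foreach copied `remaining` but shared `source`.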

- Jonathan M Davis

November 18, 2017
On Friday, 17 November 2017 at 17:55:30 UTC, Jonathan M Davis wrote:
>
> When you have
>
> foreach(e; range)
>
> it gets lowered to something like
>
> for(auto r = range; !r.empty; r.popFront())
> {
>     auto e = r.front;
> }
>
> So, the range is copied when you use it in a foreach. In the case of a class, it's just the reference that's copied. So, both "r" and "range" refer to the same object, but with a struct, you get two separate copies. So, when foreach iterates over "r", "range" isn't mutated.

Ah, I get it now ("r=range; process r"), thanks!

>
> So, in the general case, if you want to use a range in foreach without consuming the range, it needs to be a forward range, and you need to call save. e.g.
>
> foreach(e; range.save)
>

Seems like you can make class-based ranges work across multiple foreach loops without having to call save, although maybe it falls apart in other usage. It also doesn't appear that the compiler requires an @property annotation as specified in the interface:

import std.stdio : writeln;

class RefRange {
    int foreach_index;
    int[] items;
    this(int[] src)
    {
        items = src;
    }

    bool empty()
    {
        if (foreach_index == items.length)
        {
            foreach_index = 0; // reset for another foreach
            return true;
        }
        return false;
    }
    int front() { return items[foreach_index]; }
    void popFront() { foreach_index++; }
}

void main() {
    import std.stdio;

    int[] ints = [1, 2, 3];
    auto refRange = new RefRange(ints);

    writeln("Ref 1st Run:");
    foreach(i; refRange) writeln(i);
    assert( ! refRange.empty);
    writeln("Ref 2nd Run:");
    foreach(i; refRange) writeln(i); // works
}
------------------------------------------
Ref 1st Run:
1
2
3
Ref 2nd Run:
1
2
3
November 18, 2017
On Saturday, 18 November 2017 at 05:24:30 UTC, Tony wrote:

Forgot to handle premature foreach exit:

import std.stdio : writeln;

class RefRange {
    int foreach_index;
    int[] items;
    this(int[] src)
    {
        items = src;
    }

    bool empty()
    {
        if (foreach_index == items.length)
        {
            foreach_index = 0; // reset for another foreach
            return true;
        }
        return false;
    }
    int front() { return items[foreach_index]; }
    void popFront() { foreach_index++; }
    void resetIteration() { foreach_index = 0; }
}

void main() {

    int[] ints = [1, 2, 3];
    auto refRange = new RefRange(ints);

    writeln("Ref 1st Run:");
    foreach(i; refRange)
    {
        writeln(i);
        if ( i == 2 )
        {
            refRange.resetIteration();
            break;
        }
    }
    assert( ! refRange.empty);
    writeln("Ref 2nd Run:");
    foreach(i; refRange) writeln(i); // works
}
-------------------------------
Ref 1st Run:
1
2
Ref 2nd Run:
1
2
3

November 18, 2017
On Saturday, November 18, 2017 05:24:30 Tony via Digitalmars-d-learn wrote:
> On Friday, 17 November 2017 at 17:55:30 UTC, Jonathan M Davis
> > So, in the general case, if you want to use a range in foreach without consuming the range, it needs to be a forward range, and you need to call save. e.g.
> >
> > foreach(e; range.save)
>
> Seems like you can make class-based ranges work across multiple foreach loops without having to call save, although maybe it falls apart in other usage.

A range that was a class would just be completely consumed by foreach unless you used break to exit the loop early. But my point is that in generic code, you can't copy a range and then use it any more after copying it without assigning it a new range, because the behavior is unspecified, and different ranges will act differently. Similarly, in generic code, you must call save when you want to get another copy of the range that can then be iterated independently. In non-generic code, you can choose to depend on the behavior of the specific range that you're using, but for the same code to work with all types of ranges, you're a bit more restricted.

> It also doesn't appear that the compiler
> requires an @property annotation as specified in the interface :

@property does almost nothing in general. All it really does is affect how some stuff like typeof works. So, code introspection can be affected by @property, but whether you put @property on something like empty or front really doesn't matter. A lot of us do it out of habit and/or to make it clear that it's intended to be used as a property function, but any function which returns a value but has no parameters can be used as a getter property regardless of whether @property is used. And what the range API really cares about is that the code in isInputRange, isForwardRange, etc. compiles, not that anything is marked with @property. It's not even guaranteed that everything involved is a function - e.g. infinite ranges are ranges where empty is known to be false at compile time, and that's usually done by defining empty as an enum, in which case it's not a function at all. However, it can be used the same way, so the same code works with a range that defines empty as a function and a range that defines it as an enum or variable, since the range API expects empty to be used without parens.
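
For example, a sketch of an infinite range (Naturals is a made-up name) where empty is an enum and nothing is marked @property:

```d
import std.algorithm : equal;
import std.range : take;
import std.range.primitives : isInfinite, isInputRange;

// Hypothetical infinite range: empty is a compile-time constant, not a function.
struct Naturals
{
    int n;
    enum bool empty = false;
    int front() { return n; } // no @property
    void popFront() { ++n; }
}

// The range traits only check that the relevant expressions compile.
static assert(isInputRange!Naturals);
static assert(isInfinite!Naturals);

void main()
{
    // Generic code reads r.empty without parens, so the enum works like a function.
    assert(Naturals(0).take(3).equal([0, 1, 2]));
}
```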

Originally, @property was supposed to do more with optional parens going away, but it never really happened (optional parens became too popular once UFCS was added to the language). The main problem that it may yet be made to solve is property functions which return callables. Right now, if you use () on a property function (whether it has @property on it or not), they're the optional parens of the function call, whereas if the property were a variable, the parens would either trigger opCall or call a delegate (depending on the type of the variable). So, you can't currently turn a public variable that's a callable into a property function and have it work as a property function. @property could be made to indicate that in that case, the single set of parens should be called on the return value rather than the function itself, in which case, you could have a property that's a callable, but while that has been discussed, it's never happened. So, for now at least, @property really doesn't do much.
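
A small sketch of the callable problem, using a hypothetical Button struct with a delegate variable (onClick) and a property function (handler) that returns the same delegate:

```d
// Hypothetical type, invented to contrast a variable with a property function.
struct Button
{
    // A variable holding a delegate: onClick() invokes the delegate.
    int delegate() onClick;

    // A property function returning a delegate: handler() only calls
    // handler itself, and a second set of parens is needed for the delegate.
    @property int delegate() handler() { return onClick; }
}

void main()
{
    Button b;
    b.onClick = () => 42;

    assert(b.onClick() == 42);   // the parens call the delegate

    auto dg = b.handler();       // the parens are just the getter call
    static assert(is(typeof(dg) == int delegate()));

    assert(b.handler()() == 42); // two sets of parens to reach the result
}
```

So swapping the onClick variable for a property function changes what a single pair of parens means at every call site.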

- Jonathan M Davis