Jump to page: 1 2
Thread overview
Should range foreach be iterating over an implicit copy?
May 16, 2012
Nick Sabalausky
May 17, 2012
Xinok
Jun 07, 2012
Ken
May 16, 2012
Era Scarecrow
May 17, 2012
Jesse Phillips
May 17, 2012
Nick Sabalausky
May 17, 2012
Dmitry Olshansky
Jun 08, 2012
Ken
May 17, 2012
Erik Jensen
May 18, 2012
Jonathan M Davis
May 17, 2012
bearophile
May 17, 2012
Jonathan M Davis
May 17, 2012
Jonathan M Davis
May 16, 2012
A small debate has broken out over on D.learn ( http://forum.dlang.org/thread/jovicg$jta$1@digitalmars.com#post-jovicg:24jta:241:40digitalmars.com ) that I thought I should move here.

Basically, the issue is this: Currently, when you have a struct-based range, foreach will iterate over a *copy* of it:

    Range r;
    auto save = r;
    foreach(e; r)
        assert(r == save);
    assert(r == save);

One side of the argument is that this behavior is correct and expected since structs are value types, and iterating shouldn't consume the range.

My argument is this:

First of all, foreach is conceptually a flow-control statement, not a function. So I'd intuitively expect:

    Range r;
    foreach(e; r) {...}

To work like this:

    Range r;
    for(; !r.empty; r.popFront)
    {
        auto e = r.front;
        ...
    }

Additionally, iterating a range normally *does* consume it. And that's expected, as it should be: Imagine, for example, an input range that read from stdin. Leaving that range unconsumed would make no sense at all. Actually, any input range can be expected to *not* leave an uniterated copy behind: if it *could* have an uniterated copy left behind, it would be a forward range, not an input range.

It actually seems wrong to use the current foreach over an input range: Sometimes it might work by pure luck (as in the original example), but you can expect that for some input ranges it would fail miserably, because an input range is *not* a forward range and by definition cannot reliably save a previous state. So iterating over an input range would leave the original input range in an undefined state (actually, that could even be true for certain forward ranges if foreach doesn't implicitly start out by calling "r.save" and the range doesn't expect to be saved by copying).

Suppose foreach *didn't* iterate over a copy: If you wanted to still have an un-iterated version (of an actual forward range, or an input range that you knew to be safe), then that's trivial:

    Foo a;
    b = a.save(); // Or "b = a;" if you really know what you're doing.
    foreach(elem; a) {}
    assert(a.empty);
    assert(!b.empty);

Note, however, that there is no such simple way to go the other way around: to have the current "foreach over a copy" behavior and have access to the actual iterated range. Maybe if we had "foreach(e; ref range)", but AFAIK we don't.

One counter-argument that was raised is that TDPL has an example on page 381 that indicates foreach iterates over an implicit copy. I don't have a copy handy ATM, so I can't look at it myself, but I'd love to see Andrei weigh in on this: I'm curious if this example in TDPL made that copy deliberately, or if the full implications of that copy were just an oversight.


May 16, 2012
On 5/16/12 4:37 PM, Nick Sabalausky wrote:
> One counter-argument that was raised is that TDPL has an example on page 381
> that indicates foreach iterates over an implicit copy. I don't have a copy
> handy ATM, so I can't look at it myself, but I'd love to see Andrei weigh in
> on this: I'm curious if this example in TDPL made that copy deliberately, or
> if the full implications of that copy were just an oversight.

It is deliberate and the intent is that millions of programmers used to foreach from other languages don't scream "where is my range???"

Andrei

May 16, 2012
On Wednesday, 16 May 2012 at 21:37:54 UTC, Nick Sabalausky wrote:
> A small debate has broken out over on D.learn (
> http://forum.dlang.org/thread/jovicg$jta$1@digitalmars.com#post-jovicg:24jta:241:40digitalmars.com )
> that I thought I should move here.
>
> Basically, the issue is this: Currently, when you have a struct-based range, foreach will iterate over a *copy* of it:

http://forum.dlang.org/post/dskrijudairkopeoyqkc@forum.dlang.org

I've replied in the other one, but I'll repost it here, perhaps andrei can correct me.

---

On Wednesday, 16 May 2012 at 20:50:55 UTC, Nick Sabalausky wrote:

> I was initially open to the idea I may have just misunderstood something, but I'm becoming more and more convinced that the current "foreach over an implicit copy" behavior is indeed a bad idea.
>
> I'd love to see Andrei weigh in on this. I'm curious if the example you pointed out in TDPL made the copy deliberately, or if the effects of that copy were just an oversight.

 I remember going over hard and long over that section. I think it's deliberate.

 Range can refer to many things, and I'm assuming array is one of them. So... If the array is consumed when using foreach, that seems dumb right? (An array in the end is just a small struct afterall...)

---
int[] x = [1,2,3];
foreach(i; x) {
 //stuff
}

 assert(x.length == 0, "Should have been consumed!");
---

 Structs being value types are by value and so are copied, not referenced. Although referencing a copy does seem a bit silly...

int[] x = [1,2,3];
foreach(ref i; x) {
 i = 10;
}

assert(x == [10,10,10], "But i changed them to 10!!");

--

 That means that foreach(ref; ref) if it's considered (And I think it makes sense to) would work for structs in these few cases. (Course you'd still consume dynamic arrays that way)

 In std.stream are classes, so they are consumed. I'm probably just rambling...
May 17, 2012
On Wednesday, 16 May 2012 at 21:37:54 UTC, Nick Sabalausky wrote:
> A small debate has broken out over on D.learn (
> http://forum.dlang.org/thread/jovicg$jta$1@digitalmars.com#post-jovicg:24jta:241:40digitalmars.com )
> that I thought I should move here.

I am in agreement, a value type rage is confusing to use

http://d.puremagic.com/issues/show_bug.cgi?id=7067

I don't agree with Andrei that we want to stop the "hey, where'd my range go" because we spend so much effort to get that to be expected behavior (and it is). The hybrid array really doesn't help, but I wouldn't want it changed.

If .save is not called on the range then the expectation is that it will be consumed. It is clear that calling a function with a value type would likely mean it won't be consumed, but that is the odd case, the expectation is that the range would be consumed.

And this leads to what is suggested in the referenced bug, a wrapper to prevent an implicit forward range.
May 17, 2012
On Wednesday, 16 May 2012 at 21:40:39 UTC, Andrei Alexandrescu wrote:
> On 5/16/12 4:37 PM, Nick Sabalausky wrote:
>> One counter-argument that was raised is that TDPL has an example on page 381
>> that indicates foreach iterates over an implicit copy. I don't have a copy
>> handy ATM, so I can't look at it myself, but I'd love to see Andrei weigh in
>> on this: I'm curious if this example in TDPL made that copy deliberately, or
>> if the full implications of that copy were just an oversight.
>
> It is deliberate and the intent is that millions of programmers used to foreach from other languages don't scream "where is my range???"
>
> Andrei

I think this is the correct behavior as well. Otherwise, we'd have to take extra care when writing generic code to ensure it doesn't consume the range but doesn't attempt to call .save on non-range types.
May 17, 2012
I'm a little discouraged that my concern about "input ranges can't save their state, and yet that's exactly what happens implicitly" hasn't been addressed. I was hoping to at least get a "That's not really a problem and here's why..."

However, that said...

"Andrei Alexandrescu" wrote...
> It is deliberate and the intent is that millions of programmers used to foreach from other languages don't scream "where is my range???"

"Era Scarecrow" wrote...
> Range can refer to many things, and I'm assuming array is one of them. So... If the array is consumed when using foreach, that seems dumb right? (An array in the end is just a small struct afterall...)

I admit, those are fair points.

With those in mind, I've given it some more thought, and here are my current...umm...thoughts:

- A range is not a collection, it's a *view* of a collection (or of something else). This is a necessary distinction because ranges and collections work in fundamentally different ways: A range is, *by necessity* consumed as it's iterated - that's what popFront *does*, that's fundamentally how ranges work. For many collections, OTOH, it makes *no* sense to consume the collection while iterating it. Thus, range != collection, it's a view of a collection.

- Ranges are D's answer to iterators. I don't think people used to iterators from other languages would expect their iterator to magically reset itself after being used. So I see no reason why they would expect a range (ie, an iterator-pair-with-benefits) to behave differently than that.

- D's arrays conflate the ideas of "collection" and "range", hence the odd edge case Era pointed out, and hence the "need" for foreach to automatically make a copy. But in my (not super-extensive) experience creating ranges, I've found that to be a problematic pattern (due to the fundamental distinction between a range and a collection), and learned to prefer making my iterable things *return* a range, rather than actually *be* ranges.

- Admittedly, it would be annoying if foreach had to be used like this on all collections: "foreach(e; myArray.rangeOf)", so I guess it would make sense for a range to automatically be obtained when foreach-ing over a collection. However, I'm still not 100% sold on the current method of doing that (making foreach automatically copy struct-based ranges), partly because of the questionable implications it has for input ranges, and partly because (for struct-ranges) it leaves no way to access the range that's actually being iterated.

- At the very least, perhaps input ranges just shouldn't be allowed to be structs? After all, structs are intended to be copied around, but input ranges, by definition, can't have their current state copied.


May 17, 2012
Nick Sabalausky:

> One side of the argument is that this behavior is correct and expected since structs are value types, and iterating
> shouldn't consume the range.

In that D.learn thread I've shown that iterating on a fixed-size
array, that is a value, doesn't perform a copy of the array.


> Imagine, for example, an input range that read from stdin.
> Leaving that range unconsumed would make no sense at all.
> Actually, any input range can be expected to *not* leave an uniterated copy behind: if it *could* have an uniterated
> copy left behind, it would be a forward range, not an input
> range.

Seems right.

Bye,
bearophile
May 17, 2012
On Thu, 17 May 2012 01:48:06 -0400, Nick Sabalausky <SeeWebsiteToContactMe@semitwist.com> wrote:

> I'm a little discouraged that my concern about "input ranges can't save
> their state, and yet that's exactly what happens implicitly" hasn't been
> addressed. I was hoping to at least get a "That's not really a problem and
> here's why..."

input ranges can save their state if they are also forward ranges.

>
> However, that said...
>
> "Andrei Alexandrescu" wrote...
>> It is deliberate and the intent is that millions of programmers used to
>> foreach from other languages don't scream "where is my range???"
>
> "Era Scarecrow" wrote...
>> Range can refer to many things, and I'm assuming array is one of them.
>> So... If the array is consumed when using foreach, that seems dumb right?
>> (An array in the end is just a small struct afterall...)
>
> I admit, those are fair points.
>
> With those in mind, I've given it some more thought, and here are my
> current...umm...thoughts:

[snip]

This is not a problem with ranges, this is an issue with value types versus reference types.  I believe someone has created a byRef struct that wraps a range and iterates it byRef (maybe dsimcha?)

Almost all ranges except true input-only ranges (i.e. input ranges that are only input ranges) should be value types.

If we *didn't* do this, you would need heap data for each range.

-Steve
May 17, 2012
On 17.05.2012 18:18, Steven Schveighoffer wrote:
> On Thu, 17 May 2012 01:48:06 -0400, Nick Sabalausky
> <SeeWebsiteToContactMe@semitwist.com> wrote:
>
>> I'm a little discouraged that my concern about "input ranges can't save
>> their state, and yet that's exactly what happens implicitly" hasn't been
>> addressed. I was hoping to at least get a "That's not really a problem
>> and
>> here's why..."
>
> input ranges can save their state if they are also forward ranges.
>

s/ save state / copy themselves
Fixed :)
That's why I tried to come up with BacktrackRange or some such. An input range that you can "shake" to reset to some saved point, but can't have two separate states at _the same time_ (nor full copy, it's like ... file)



-- 
Dmitry Olshansky
May 17, 2012
On Wed, 16 May 2012 17:37:14 -0400, Nick Sabalausky <SeeWebsiteToContactMe@semitwist.com> wrote:

> A small debate has broken out over on D.learn (
> http://forum.dlang.org/thread/jovicg$jta$1@digitalmars.com#post-jovicg:24jta:241:40digitalmars.com )
> that I thought I should move here.
>
> Basically, the issue is this: Currently, when you have a struct-based range,
> foreach will iterate over a *copy* of it:
>
>     Range r;
>     auto save = r;
>     foreach(e; r)
>         assert(r == save);
>     assert(r == save);
>
> One side of the argument is that this behavior is correct and expected since
> structs are value types, and iterating shouldn't consume the range.
>
> My argument is this:
>
> First of all, foreach is conceptually a flow-control statement, not a
> function. So I'd intuitively expect:
>
>     Range r;
>     foreach(e; r) {...}
>
> To work like this:
>
>     Range r;
>     for(; !r.empty; r.popFront)
>     {
>         auto e = r.front;
>         ...
>     }

Hm... proposal:

foreach(e; ref r)
{
}

equates to your desired code.  Would this help?

-Steve
« First   ‹ Prev
1 2