foreach() behavior on ranges
August 24, 2021

Consider a simple input range that can be iterated with empty(), front() and popFront(). That is comfortable to use with foreach(), but what if the foreach loop is cancelled? If the range isn't depleted yet and is used again, front() will supply the same data twice in the next foreach().

For some reason, foreach() does not call popFront() on a break or continue statement. There is no way to detect this unless the range itself tracks its status and does an implicit popFront() when needed, but then this whole interface becomes kind of useless.

On the other hand, there is opApply(), which is designed for foreach() and signals via a non-zero result that the loop was cancelled. But does this mean that every range must implement it to work correctly with foreach()?

This is very inconsistent. Either foreach() should refuse ranges that have no opApply() method, or the interfaces should have a reset() or cancel() method that foreach() calls if it is implemented.

How do you handle this issue? Are your ranges designed with this bug, or do you always implement opApply()?
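A minimal D sketch of the situation described above. Numbers and twoLoops are hypothetical names made up for this illustration. Numbers is a class, so foreach copies only the reference and advances the object itself; the second loop therefore starts on the element the first loop broke on.

```d
import std.stdio : writeln;

// Hypothetical class-based input range over a fixed array.
// Being a class, foreach copies only the reference, so the loop
// advances this very object.
class Numbers
{
    private int[] data;
    private size_t pos;

    this(int[] data) { this.data = data; }

    bool empty() const { return pos >= data.length; }
    int front() const { return data[pos]; }
    void popFront() { ++pos; }
}

// Runs two foreach loops over the same range object; the first one breaks
// as soon as it sees `stopAt`. Returns everything both loops saw, in order.
int[] twoLoops(int[] input, int stopAt)
{
    auto r = new Numbers(input);
    int[] seen;
    foreach (n; r)
    {
        seen ~= n;
        if (n == stopAt)
            break; // no popFront() happens here
    }
    foreach (n; r)
        seen ~= n; // starts again at the element the first loop broke on
    return seen;
}

void main()
{
    writeln(twoLoops([1, 2, 3, 4], 2)); // [1, 2, 2, 3, 4]
}
```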

August 24, 2021

On Tuesday, 24 August 2021 at 08:36:18 UTC, frame wrote:

> Consider a simple input range that can be iterated with empty(), front() and popFront(). That is comfortable to use with foreach(), but what if the foreach loop is cancelled? If the range isn't depleted yet and is used again, front() will supply the same data twice in the next foreach().
>
> For some reason, foreach() does not call popFront() on a break or continue statement. There is no way to detect this unless the range itself tracks its status and does an implicit popFront() when needed, but then this whole interface becomes kind of useless.
>
> On the other hand, there is opApply(), which is designed for foreach() and signals via a non-zero result that the loop was cancelled. But does this mean that every range must implement it to work correctly with foreach()?
>
> This is very inconsistent. Either foreach() should refuse ranges that have no opApply() method, or the interfaces should have a reset() or cancel() method that foreach() calls if it is implemented.
>
> How do you handle this issue? Are your ranges designed with this bug, or do you always implement opApply()?

A range should always be a struct, so its state is copied when the foreach loop is created.

This means the state resets every time a loop is started.

If your range uses some internal state that can't be copied, or your ranges are not structs, then your ranges are inherently incorrect.

This is roughly what a foreach loop over a range compiles to:

for (auto copy = range; !copy.empty; copy.popFront())
{
    auto e = copy.front; // e is the foreach loop variable
    ...
}
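To check the lowering above, a small D sketch comparing a plain foreach with the hand-written for loop. The Count struct and the helper functions are hypothetical, written only for this illustration; the last helper confirms that foreach leaves the original struct variable untouched.

```d
import std.stdio : writeln;

// Hypothetical struct range yielding cur, cur + 1, ..., end - 1.
struct Count
{
    int cur, end;
    bool empty() const { return cur >= end; }
    int front() const { return cur; }
    void popFront() { ++cur; }
}

// Plain foreach over the range.
int[] viaForeach(Count range)
{
    int[] seen;
    foreach (e; range)
        seen ~= e;
    return seen;
}

// The same loop written out by hand, following the lowering above.
int[] viaLowering(Count range)
{
    int[] seen;
    for (auto copy = range; !copy.empty; copy.popFront())
    {
        auto e = copy.front;
        seen ~= e;
    }
    return seen;
}

// True if foreach advanced the variable it was given (it does not for a struct).
bool foreachAdvancesOriginal()
{
    auto range = Count(1, 4);
    foreach (e; range) {}
    return range.cur != 1;
}

void main()
{
    writeln(viaForeach(Count(1, 4)));   // [1, 2, 3]
    writeln(viaLowering(Count(1, 4)));  // [1, 2, 3]
    writeln(foreachAdvancesOriginal()); // false
}
```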

This is easy to see in this example:

https://run.dlang.io/is/YFuWHn

Which prints:
1
2
1
2
3
4
5
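The linked program isn't reproduced in the thread. The following hypothetical D program (Count and breakThenRestart are made-up names) prints the same sequence, assuming a struct range over 1 to 5 and a first loop that breaks after 2.

```d
import std.stdio : writeln;

// Hypothetical struct range yielding cur, cur + 1, ..., end - 1.
struct Count
{
    int cur, end;
    bool empty() const { return cur >= end; }
    int front() const { return cur; }
    void popFront() { ++cur; }
}

// Breaks out of one loop, then starts another over the same variable.
int[] breakThenRestart()
{
    auto range = Count(1, 6);
    int[] seen;
    foreach (n; range)
    {
        seen ~= n;
        if (n == 2)
            break;
    }
    foreach (n; range) // the first loop advanced a copy, so this starts at 1 again
        seen ~= n;
    return seen;
}

void main()
{
    foreach (n; breakThenRestart())
        writeln(n); // prints 1 2 1 2 3 4 5, one per line
}
```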

Unless I'm misunderstanding your concern?

August 24, 2021

On Tuesday, 24 August 2021 at 08:36:18 UTC, frame wrote:

> Consider a simple input range that can be iterated with empty(), front() and popFront(). That is comfortable to use with foreach(), but what if the foreach loop is cancelled? If the range isn't depleted yet and is used again, front() will supply the same data twice in the next foreach().

I think you strayed from the beaten path in a second way as soon as your range's lifetime escaped a single expression, so that it could be used in two foreach loops. With ranges, as you do more unusual things, you're already encouraged to use a more advanced range. And ranges already have caveats for surprising behavior, like map/filter interactions that redundantly execute code. So I see this as a documentation problem. The current behavior of 'if you break, then the next foreach gets what you broke on' is probably desirable for some uses:

import std;

class MyIntRange {
    int[] _elements;
    size_t _offset;

    this(int[] elems) { _elements = elems; }

    bool empty() { return !_elements || _offset >= _elements.length; }

    int front() { return _elements[_offset]; }

    void popFront() { _offset++; }
}

void main() {
    auto ns = new MyIntRange([0, 1, 1, 2, 3, 4, 4, 4, 5]);
    // calls writeln() as many times as there are numbers:
    while (!ns.empty) {
        foreach (odd; ns) {
            if (odd % 2 == 0) break;
            writeln("odd: ", odd);
        }
        foreach (even; ns) {
            if (even % 2 != 0) break;
            writeln("even: ", even);
        }
    }
}
August 24, 2021

On Tuesday, 24 August 2021 at 09:15:23 UTC, bauss wrote:

> A range should always be a struct, so its state is copied when the foreach loop is created.

This does not conform to the aggregate expression described in the manual, which also allows a class object.

> This means the state resets every time a loop is started.

Yes, it should reset, so foreach() also needs to handle that correctly.

August 24, 2021

On Tuesday, 24 August 2021 at 09:26:20 UTC, jfondren wrote:

> I think you strayed from the beaten path in a second way as soon as your range's lifetime escaped a single expression, so that it could be used in two foreach loops. With ranges, as you do more unusual things, you're already encouraged to use a more advanced range. And ranges already have caveats for surprising behavior, like map/filter interactions that redundantly execute code. So I see this as a documentation problem. The current behavior of 'if you break, then the next foreach gets what you broke on' is probably desirable for some uses:

Yes, I have a special case where a delegate jumps back to the range because something must be buffered before it can be delivered.

> import std;
>
> class MyIntRange {
>     int[] _elements;
>     size_t _offset;
>
>     this(int[] elems) { _elements = elems; }
>
>     bool empty() { return !_elements || _offset >= _elements.length; }
>
>     int front() { return _elements[_offset]; }
>
>     void popFront() { _offset++; }
> }
>
> void main() {
>     auto ns = new MyIntRange([0, 1, 1, 2, 3, 4, 4, 4, 5]);
>     // calls writeln() as many times as there are numbers:
>     while (!ns.empty) {
>         foreach (odd; ns) {
>             if (odd % 2 == 0) break;
>             writeln("odd: ", odd);
>         }
>         foreach (even; ns) {
>             if (even % 2 != 0) break;
>             writeln("even: ", even);
>         }
>     }
> }

That is just weird. It's not logical, and it's a source of bugs. I mean, we use foreach() to avoid loop bugs. And then relying on this is supposed to be desirable behavior?

August 24, 2021

On 8/24/21 4:36 AM, frame wrote:

> Consider a simple input range that can be iterated with empty(), front() and popFront(). That is comfortable to use with foreach(), but what if the foreach loop is cancelled? If the range isn't depleted yet and is used again, front() will supply the same data twice in the next foreach().
>
> For some reason, foreach() does not call popFront() on a break or continue statement.

continue calls popFront. break does not.
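To see that difference in practice, here is a small sketch with a hypothetical class range (Numbers) and helper (frontAfterLoop): continue still reaches the popFront() step of the loop, while break skips it, so the element that triggered the break remains front.

```d
import std.stdio : writeln;

// Hypothetical class range over an int array.
class Numbers
{
    private int[] data;
    private size_t pos;
    this(int[] data) { this.data = data; }
    bool empty() const { return pos >= data.length; }
    int front() const { return data[pos]; }
    void popFront() { ++pos; }
}

// Records every element the loop sees and returns the range's front afterwards.
int frontAfterLoop(out int[] seen)
{
    auto r = new Numbers([1, 2, 3, 4, 5]);
    foreach (n; r)
    {
        seen ~= n;
        if (n % 2 == 1)
            continue; // still advances: continue goes to the loop's popFront()
        if (n == 4)
            break;    // exits without popFront(), so 4 stays as r.front
    }
    return r.front;
}

void main()
{
    int[] seen;
    writeln(frontAfterLoop(seen)); // 4
    writeln(seen);                 // [1, 2, 3, 4]
}
```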

> There is no way to detect this unless the range itself tracks its status and does an implicit popFront() when needed, but then this whole interface becomes kind of useless.

If you need to, you can call popFront after the loop, or just before the break. I have to say, the term "useless" does not even come close to describing ranges using foreach in my experience.
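A minimal sketch of the "popFront just before the break" option, again with a hypothetical class range named Numbers and a made-up helper popBeforeBreak: the first loop consumes the element it stops on, so a second foreach over the same object resumes after it.

```d
import std.stdio : writeln;

// Hypothetical class range over an int array.
class Numbers
{
    private int[] data;
    private size_t pos;
    this(int[] data) { this.data = data; }
    bool empty() const { return pos >= data.length; }
    int front() const { return data[pos]; }
    void popFront() { ++pos; }
}

// Two loops over one range object; the first consumes the element it stops on.
int[] popBeforeBreak()
{
    auto r = new Numbers([1, 2, 3, 4, 5]);
    int[] seen;
    foreach (n; r)
    {
        seen ~= n;
        if (n == 2)
        {
            r.popFront(); // consume the element we stopped on
            break;
        }
    }
    foreach (n; r) // resumes at 3
        seen ~= n;
    return seen;
}

void main()
{
    writeln(popBeforeBreak()); // [1, 2, 3, 4, 5]
}
```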

> On the other hand, there is opApply(), which is designed for foreach() and signals via a non-zero result that the loop was cancelled. But does this mean that every range must implement it to work correctly with foreach()?

opApply has to return different values because it needs to pass instructions through to the compiler-generated code. The compiler wrote the delegate to return that message, so you need to pass it through. The specific non-zero value is significant, not just the fact that it is non-zero. For instance, if the loop body ends with a break somelabel; statement, the compiler has to know which label to jump to.

The correct behavior for opApply is: if the delegate returns non-zero, return that value immediately. It should not do anything else. Would you be happy with a break somelabel; actually triggering output? What if it just continued the loop instead? You don't get to decide what happens at that point; you are acting as the compiler.
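A sketch of an opApply that follows this rule, using a hypothetical struct Bag and helper labeledBreak: it returns the delegate's non-zero result unchanged, which is what lets a labeled break out of an enclosing loop work.

```d
import std.stdio : writeln;

// Hypothetical container that supports foreach only through opApply.
struct Bag
{
    int[] items;

    int opApply(scope int delegate(int) dg)
    {
        foreach (item; items)
        {
            // A non-zero code tells the compiler-generated code what the loop
            // body did (break, break to a label, return, ...): pass it back as is.
            if (auto result = dg(item))
                return result;
        }
        return 0;
    }
}

// An outer loop around a foreach over Bag, left with a labeled break.
int[] labeledBreak()
{
    auto bag = Bag([1, 2, 3]);
    int[] seen;
    outer: foreach (i; 0 .. 3)
    {
        foreach (item; bag)
        {
            seen ~= i * 10 + item;
            if (i == 1 && item == 2)
                break outer; // only works because opApply returns dg's result
        }
    }
    return seen;
}

void main()
{
    writeln(labeledBreak()); // [1, 2, 3, 11, 12]
}
```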

> This is very inconsistent. Either foreach() should refuse ranges that have no opApply() method, or the interfaces should have a reset() or cancel() method that foreach() calls if it is implemented.
>
> How do you handle this issue? Are your ranges designed with this bug, or do you always implement opApply()?

It's not a bug. So there is no need to "handle" it.

The pattern of using a for(each) loop to align certain things occurs all the time in code. Imagine a loop that is looking for a certain line in a file, and breaks when the line is there. Would you really want the compiler to unhelpfully throw away that line for you?

And if that is what you want, put popFront in the loop before you exit. You can't "unpopFront" something, so this provides the most flexibility.
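A small sketch of that search pattern, assuming a hypothetical class range Lines that stands in for a file reader, plus a made-up helper fromMarker: after the break, the matching line is still front, so the next loop starts with it.

```d
import std.algorithm.searching : startsWith;
import std.stdio : writeln;

// Hypothetical class range over lines of text, standing in for a file reader.
class Lines
{
    private string[] lines;
    this(string[] lines) { this.lines = lines; }
    bool empty() { return lines.length == 0; }
    string front() { return lines[0]; }
    void popFront() { lines = lines[1 .. $]; }
}

// Skips lines until one starts with `marker`, then collects the rest,
// including the matching line, which the break left as front.
string[] fromMarker(string[] text, string marker)
{
    auto r = new Lines(text);
    foreach (line; r)
        if (line.startsWith(marker))
            break;
    string[] rest;
    foreach (line; r)
        rest ~= line;
    return rest;
}

void main()
{
    auto text = ["From: a", "To: b", "Subject: hi", "", "body"];
    writeln(fromMarker(text, "Subject:")); // ["Subject: hi", "", "body"]
}
```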

-Steve

August 24, 2021
On Tue, Aug 24, 2021 at 08:36:18AM +0000, frame via Digitalmars-d-learn wrote:
> Consider a simple input range that can be iterated with empty(),
> front() and popFront(). That is comfortable to use with foreach() but
> what if the foreach loop will be cancelled? If a range isn't depleted
> yet and continued it will supply the same data twice on front() in the
> next use of foreach().

Generally, if you need precise control over range state between multiple loops, you really should think about using a while loop instead of a foreach loop, and call .popFront where it's needed.


> For some reason, foreach() does not call popFront() on a break or continue
> statement. There is no way to detect it except the range itself tracks its
> status and does an implicit popFront() if needed - but then this whole
> interface is some kind of useless.

In some cases, you *want* to retain the same element between loops, e.g., if you're iterating over elements of some category and stop when you encounter something that belongs to the next category -- you wouldn't want to consume that element, but leave it to the next loop to consume it.  So it's not a good idea to have break call .popFront automatically.  Similarly, sometimes you might want to reuse an element (e.g., the loop body detects a condition that warrants retrying).

Basically, once you need anything more than a single sequential iteration over a range, it's better to be explicit about what exactly you want, rather than depend on implicit semantics, which may lead to surprising results.

	while (!range.empty) {
		doSomething(range.front);
		if (someCondition) {
			range.popFront;
			break;
		} else if (someOtherCondition) {
			// Don't consume current element
			break;
		} else if (skipElement) {
			range.popFront;
			continue;
		} else if (retryElement) {
			continue;
		}
		range.popFront;	// normal iteration
	}
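A concrete, runnable version of the "stop at the next category" case, with the hypothetical helper parityRuns grouping consecutive numbers by parity: the inner while loop looks at the element that starts the next group but leaves it for the outer loop.

```d
import std.range.primitives : empty, front, popFront;
import std.stdio : writeln;

// Splits the input into runs of equal parity using explicit while loops.
// The element that starts the next run is inspected but not consumed,
// so the next iteration of the outer loop picks it up.
int[][] parityRuns(int[] input)
{
    auto range = input; // a slice is itself a range; this copy gets consumed
    int[][] runs;
    while (!range.empty)
    {
        immutable parity = range.front % 2;
        int[] run;
        while (!range.empty && range.front % 2 == parity)
        {
            run ~= range.front;
            range.popFront();
        }
        runs ~= run;
    }
    return runs;
}

void main()
{
    writeln(parityRuns([1, 3, 2, 4, 5])); // [[1, 3], [2, 4], [5]]
}
```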


T

August 24, 2021

On Tuesday, 24 August 2021 at 13:02:38 UTC, Steven Schveighoffer wrote:

> On 8/24/21 4:36 AM, frame wrote:
>
> > Consider a simple input range that can be iterated with empty(), front() and popFront(). That is comfortable to use with foreach(), but what if the foreach loop is cancelled? If the range isn't depleted yet and is used again, front() will supply the same data twice in the next foreach().
> >
> > For some reason, foreach() does not call popFront() on a break or continue statement.
>
> continue calls popFront. break does not.

Right, of course: continue moves on to the next iteration, which calls popFront().

> If you need to, you can call popFront after the loop, or just before the break. I have to say, the term "useless" does not even come close to describing ranges using foreach in my experience.

I disagree, because foreach() is a language construct and therefore it should behave in a logical way. The methods are fine in ranges, or when something is done manually. But in the case of foreach() it's just unexpected.

It becomes useless for foreach() because you can't rely on these methods if other code breaks the loop and you still need to use that range, as in my case. It's also useless for ranges themselves: there is no need for a separate popFront() if it is not called in a logical way. Then even empty() could fetch the next data if needed. It only makes sense if the language's own code calls the methods in a strict order and guarantees that this order is always maintained.

> It's not a bug. So there is no need to "handle" it.
>
> The pattern of using a for(each) loop to align certain things occurs all the time in code. Imagine a loop that is looking for a certain line in a file, and breaks when the line is there. Would you really want the compiler to unhelpfully throw away that line for you?

I don't get this point. If it breaks from the loop then it changes the scope anyway, so my data should be already processed or copied. What is thrown away here?

> And if that is what you want, put popFront in the loop before you exit. You can't "unpopFront" something, so this provides the most flexibility.
>
> -Steve

Yes, this is the solution, but not how it should be. If a programmer calls the range methods inside a foreach loop, I would expect a bug there. There shouldn't be a need to manipulate the range just because I break out of the foreach loop.

Java, for example, just uses next() and hasNext(). You can't run into this bug there, because next() both returns the element and moves the cursor.

PHP has a rewind() method, so every foreach() resets the iterator, or can clean up before its next use.

But D just leaves your range in an inconsistent state in the middle of an iteration cycle. This feels wrong. The next foreach() does not continue with popFront() but with empty() again, because it relies on the range methods being called in a given order. As there is no rewind or exit method, this order should also be maintained when foreach exits, preparing the range for its next use. That's it.

You don't see a bug here?

August 24, 2021
On Tuesday, 24 August 2021 at 16:45:27 UTC, H. S. Teoh wrote:

>
> In some cases, you *want* to retain the same element between loops, e.g., if you're iterating over elements of some category and stop when you encounter something that belongs to the next category -- you wouldn't want to consume that element, but leave it to the next loop to consume it.  So it's not a good idea to have break call .popFront automatically.  Similarly, sometimes you might want to reuse an element (e.g., the loop body detects a condition that warrants retrying).

I'm only talking about foreach() and that you shouldn't need to mix it with manual method calls. Such iterations are another topic.

August 24, 2021

On Tuesday, 24 August 2021 at 08:36:18 UTC, frame wrote:

> How do you handle this issue? Are your ranges designed with this bug, or do you always implement opApply()?

This is expected behavior, imho. I think what you need is a forward range, not an input range. By the contract of an input range, it is a consumable object, so once it has been used in a foreach it can't be used again. It is similar to an Iterator or a stream object in Java.

A forward range also exposes the capability to create save points, which is actually used by foreach to do what Java does with the Iterable interface, for example.
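A sketch of a class forward range, assuming a hypothetical Numbers class whose save() returns an independent copy. Note that, going by the lowering shown earlier in the thread, foreach copies the range with a plain assignment and does not call save() itself; for a class range that copy is just another reference, so here the save point is taken explicitly with r.save before each loop.

```d
import std.range.primitives : isForwardRange;
import std.stdio : writeln;

// Hypothetical class forward range over an int array.
class Numbers
{
    private int[] data;
    private size_t pos;

    this(int[] data, size_t pos = 0) { this.data = data; this.pos = pos; }

    bool empty() const { return pos >= data.length; }
    int front() const { return data[pos]; }
    void popFront() { ++pos; }

    // A save point: an independent object at the same position.
    Numbers save() { return new Numbers(data, pos); }
}

static assert(isForwardRange!Numbers);

// Both loops iterate over saved copies, so breaking out of the first one
// does not affect the second.
int[] iterateSavedCopies()
{
    auto r = new Numbers([1, 2, 3]);
    int[] seen;
    foreach (n; r.save)
    {
        seen ~= n;
        if (n == 2)
            break;
    }
    foreach (n; r.save)
        seen ~= n;
    return seen;
}

void main()
{
    writeln(iterateSavedCopies()); // [1, 2, 1, 2, 3]
}
```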

Then there are bidirectional and random-access ranges that offer even more capabilities.

As far as I know, opApply is from the pre-range era and is kind of left as an option to provide easy foreach integration. In this case you can think of objects having opApply as forward ranges, though only for foreach constructs.
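For comparison, a sketch of an object that supports foreach only through opApply (the Collection class and breakThenAgain helper are hypothetical). Each foreach calls opApply again, which walks the items from the start, so breaking out of one loop does not affect the next.

```d
import std.stdio : writeln;

// Hypothetical class that supports foreach only via opApply.
// Each foreach starts a fresh walk over the items.
class Collection
{
    private int[] items;
    this(int[] items) { this.items = items; }

    int opApply(scope int delegate(int) dg)
    {
        foreach (item; items)
            if (auto result = dg(item))
                return result; // pass the loop body's code straight back
        return 0;
    }
}

int[] breakThenAgain()
{
    auto c = new Collection([1, 2, 3]);
    int[] seen;
    foreach (n; c)
    {
        seen ~= n;
        if (n == 2)
            break;
    }
    foreach (n; c) // starts from the first item again
        seen ~= n;
    return seen;
}

void main()
{
    writeln(breakThenAgain()); // [1, 2, 1, 2, 3]
}
```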

Regards,
Alexandru.
