August 24, 2021

On Tuesday, 24 August 2021 at 09:15:23 UTC, bauss wrote:

> A range should be a struct always and thus its state is copied when the foreach loop is created.

Actually, the range contracts don't say that it needs to be a by-value type. It can also be a reference type, e.g. a class.

> Which means the state resets every time the loop is initiated.

True for any forward range and above, not true for input ranges. The problem with them is that some of them are structs, and even if they are not forward ranges, they still show this behavior because of the implicit copy on assignment, which can make the code confusing.
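A minimal D sketch of the copy behavior described here. `SliceRange` and `SliceRangeClass` are hypothetical types invented for illustration. Neither has `save`, so both are plain input ranges, yet `foreach` leaves the struct's original untouched while it consumes the class instance.

```d
// Hypothetical struct input range over an int slice (no save(), so it is
// not a forward range).
struct SliceRange
{
    int[] data;
    bool empty() const { return data.length == 0; }
    int front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; }
}

// The same primitives as a class: every copy refers to one shared object.
class SliceRangeClass
{
    int[] data;
    this(int[] d) { data = d; }
    bool empty() const { return data.length == 0; }
    int front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; }
}

void main()
{
    auto s = SliceRange([1, 2, 3]);
    foreach (v; s) {}      // foreach iterates a copy of the struct
    assert(s.front == 1);  // so the original still starts at 1

    auto c = new SliceRangeClass([1, 2, 3]);
    foreach (v; c) {}      // the copy is a reference to the same object
    assert(c.empty);       // so the original has been consumed
}
```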

> If your range uses some internal state that can't be copied, or your ranges are not structs, then your ranges are inherently incorrect.

If we follow the definition of ranges, they must not be copyable at all. The only way to copy/save would be to have a .save method and call it. Again, this is not followed properly even by Phobos implementations.

Note that a better approach would be to replace .save in the definition of forward range with a copy constructor. Then all non-compliant ranges would suddenly become compliant, while those that have a .save method should be refactored to a copy-constructor version.
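A rough sketch of what the copy-constructor idea could look like. `PtrRange` is hypothetical and keeps its position on the heap, so a plain bit-copy would share state. Its copy constructor gives each copy its own position, which is the job `save` does today. This only illustrates the proposal; it is not how Phobos currently defines forward ranges.

```d
// Hypothetical range whose iteration state lives on the heap.
struct PtrRange
{
    int* pos;
    int last;

    this(int first, int last)
    {
        pos = new int(first);
        this.last = last;
    }

    // Copy constructor: each copy gets its own position, like save() would.
    this(ref return scope PtrRange rhs)
    {
        pos = new int(*rhs.pos);
        last = rhs.last;
    }

    bool empty() const { return *pos > last; }
    int front() const { return *pos; }
    void popFront() { ++*pos; }
}

void main()
{
    auto r = PtrRange(1, 3);
    int sum;
    foreach (v; r)         // foreach copies r, which runs the copy constructor
        sum += v;
    assert(sum == 6);
    assert(r.front == 1);  // r's own heap state was not advanced
}
```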

> This is what a foreach loop on a range actually compiles to:
>
> for (auto copy = range; !copy.empty; copy.popFront())
> {
>     auto e = copy.front; // the foreach loop variable
>     ...
> }

You should add .save on the assignment if the range is a forward range, or just remove the assignment if it is not.
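A minimal sketch of the `.save` variant of that loop, assuming a hypothetical class-based forward range `Counter`. Iterating over `r.save` instead of a plain copy leaves the caller's range untouched, even though a plain copy of a class reference would alias it.

```d
import std.range.primitives : isForwardRange;

// Sums a forward range by iterating over r.save instead of a plain copy,
// so the caller's range is left where it was.
int sumSaved(R)(R r)
if (isForwardRange!R)
{
    int total;
    for (auto copy = r.save; !copy.empty; copy.popFront())
        total += copy.front;
    return total;
}

// Hypothetical class-based forward range: copying the reference would alias
// it, so save() has to allocate a fresh object.
class Counter
{
    int current, last;
    this(int current, int last) { this.current = current; this.last = last; }
    bool empty() const { return current > last; }
    int front() const { return current; }
    void popFront() { ++current; }
    Counter save() { return new Counter(current, last); }
}

static assert(isForwardRange!Counter);

void main()
{
    auto c = new Counter(1, 3);
    assert(sumSaved(c) == 6);
    assert(c.front == 1);      // the caller's range was not consumed
    assert(sumSaved(c) == 6);  // so it can be iterated again
}
```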

Best regards,
Alexandru.

August 24, 2021

On Tuesday, 24 August 2021 at 19:06:44 UTC, Alexandru Ermicioi wrote:

> [...]
>
> You should add .save on assignment if range is a forward range, or just remove the assignment if it is not.

Just out of curiosity: if a range implementation uses malloc in save, is the destructor the only place where that memory can be freed? I worry about that especially when using @nogc range implementations with the standard library. I don't have a list of the functions in Phobos that call save. Is a save function only meaningful for ranges that use the GC?

August 24, 2021

On 8/24/21 2:12 PM, frame wrote:

> > You can call popFront if you need to after the loop, or just before the break. I have to say, the term "useless" does not even come close to describing ranges using foreach in my experience.
>
> I disagree, because foreach() is a language construct and therefore it should behave in a logical way. The methods are fine in ranges or if something is done manually. But in the case of foreach() it's just unexpected.

I can't agree at all. It's totally expected.

If you have a for loop:

int i;
for(i = 0; i < someArr.length; ++i)
{
   if(someArr[i] == desiredValue) break;
}

You are saying, "compiler, please execute the ++i when I break from the loop because I already processed that one". How can that be expected? I would never expect that. When I break, it means "stop the loop, I'm done", and then I use i which is where I expected it to be.
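The same thing happens with `foreach` over a range once the range has reference semantics. A sketch, assuming Phobos' `std.range.refRange` (which wraps a pointer to the range) and made-up data: after the `break`, the array is left at the element the loop stopped on, just like `i` in the for loop.

```d
import std.range : refRange;

void main()
{
    int[] arr = [3, 7, 9, 7];

    // refRange wraps a pointer, so the copy foreach makes still drives arr.
    foreach (v; refRange(&arr))
    {
        if (v == 7) break;  // stop on the first 7
    }

    // arr now starts at the element we stopped on; nothing was skipped.
    assert(arr == [7, 9, 7]);
}
```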

> It becomes useless for foreach() because you can't rely on them if other code breaks the loop and you need to use that range, like in my case. But also for ranges, there is no need for a popFront() if it is not called in a logical way. Then even empty() could fetch the next data if needed. It only makes sense if the language's own code calls it in a strict order and ensures that this order is always kept.

There is no problem with the ordering. What seems to be the issue is that you aren't used to the way ranges work.

What's great about D is that there is a solution for you:

import std.range; // ElementType, plus the range primitives for arrays

struct EagerPopfrontRange(R)
{
   R source;
   ElementType!R front; // element cached by the last popFront
   bool empty;
   void popFront() {
     if(source.empty) empty = true;
     else {
        front = source.front;
        source.popFront; // advance the source immediately
     }
   }
}

auto epf(R)(R inputRange) {
   auto result = EagerPopfrontRange!R(inputRange);
   result.popFront; // eager!
   return result;
}

// usage
foreach(v; someRange.epf) { /* loop body */ }

Now if you break from the loop, the original range is pointing at the element after the one you last were processing.
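One way to see the wrapper's effect, assuming the source has reference semantics (here via `std.range.refRange`; with a plain struct source, `epf` only advances its own copy). The struct and `epf` are repeated from the post so the example is self-contained, and the data is made up.

```d
import std.range; // ElementType, refRange, and range primitives for arrays

// The wrapper from the post, with comments added.
struct EagerPopfrontRange(R)
{
    R source;
    ElementType!R front;
    bool empty;

    void popFront()
    {
        if (source.empty) empty = true;
        else
        {
            front = source.front;  // cache the element...
            source.popFront;       // ...and advance the source right away
        }
    }
}

auto epf(R)(R inputRange)
{
    auto result = EagerPopfrontRange!R(inputRange);
    result.popFront; // eager!
    return result;
}

void main()
{
    int[] arr = [3, 7, 9, 7];
    foreach (v; refRange(&arr).epf)
    {
        if (v == 7) break;
    }
    // The source has already moved past the 7 we stopped on.
    assert(arr == [9, 7]);
}
```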

> > It's not a bug. So there is no need to "handle" it.
> >
> > The pattern of using a for(each) loop to align certain things occurs all the time in code. Imagine a loop that is looking for a certain line in a file, and breaks when the line is there. Would you really want the compiler to unhelpfully throw away that line for you?
>
> I don't get this point. If it breaks from the loop then it changes the scope anyway, so my data should already be processed or copied. What is thrown away here?

Why does the loop have to contain all your code? Maybe you have code after the loop. Maybe the loop's purpose is to align the range based on some criteria (e.g. take this byLine range and prime it so it contains the first line of the thing I'm looking for).

> > And if that is what you want, put popFront in the loop before you exit. You can't "unpopFront" something, so this provides the most flexibility.
>
> Yes, this is the solution, but not the way it should be. If the programmer uses the range methods within the foreach loop, you would expect some bug. There shouldn't be a need to manipulate the range just because I break the foreach loop.

You shouldn't need to in most circumstances. I don't think I've ever needed to do this. And I use foreach on ranges all the time.

Granted, I probably would use a while loop to align a range rather than foreach.
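A sketch of that while-loop alignment, using `std.string.lineSplitter` on a made-up string in place of `File.byLine` so it runs without a file. The line that stops the loop stays available as `front`.

```d
import std.algorithm.searching : startsWith;
import std.string : lineSplitter;

void main()
{
    // Made-up input; in real code this might come from File.byLine.
    string text = "header\n# comment\nBEGIN data\npayload";
    auto lines = text.lineSplitter;

    // Align the range: skip lines until the one we are looking for.
    while (!lines.empty && !lines.front.startsWith("BEGIN"))
        lines.popFront();

    // The matching line has not been thrown away; it is still front.
    assert(!lines.empty && lines.front == "BEGIN data");
    lines.popFront();
    assert(lines.front == "payload");
}
```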

> Java, for example, just uses next() and hasNext(). You can't run into a bug here because one method must move the cursor.

This gives a giant clue as to the problem: you aren't used to this. Java's iterator interface is different from D's. It consumes the element as you fetch it, instead of acting like a pointer to a current element. Once it gives you the element, it's done with it.

D's ranges are closer to a C++ iterator pair (which is modeled after a pair of pointers).
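To illustrate the difference, a small sketch using `std.range.iota`. Reading `front` does not move the range, while a Java-style `next` has to read and advance in one step. The `next` helper is hypothetical, not part of Phobos.

```d
import std.range : iota;

// Hypothetical Java-style next(): read the element and consume it at once.
auto next(R)(ref R r)
{
    auto e = r.front;
    r.popFront();
    return e;
}

void main()
{
    auto r = iota(1, 4);   // 1, 2, 3

    // D style: front works like dereferencing a pointer; reading it twice
    // does not move anything.
    assert(r.front == 1);
    assert(r.front == 1);
    r.popFront();          // moving on is a separate, explicit step
    assert(r.front == 2);

    // Java style, emulated: every fetch also advances.
    assert(next(r) == 2);
    assert(next(r) == 3);
    assert(r.empty);
}
```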

> PHP has a rewind() method. So any foreach() would reset the range or could clean up before the next use of it.

I'm surprised you bring PHP as an example, as it appears their foreach interface works EXACTLY as D does:

$arriter = new ArrayIterator(array(1, 2, 3, 4));
foreach($arriter as $val) { if ($val == 2) break; }
print($arriter->current()); // 2

> But D just leaves your range in an inconsistent state between iteration cycles. This feels just wrong. The next foreach() would not continue with popFront() but with empty() again, because it relies on the range's methods being called in a given order. As there is no rewind or exit method, this order should be maintained on foreach exit too, preparing for the next use. That's it.
>
> You don't see a bug here?

I believe the bug is in your expectations. While Java-like iteration would be a possible API D could have chosen, it's not what D chose.

-Steve

August 24, 2021
On 8/24/21 1:44 PM, Ferhat Kurtulmuş wrote:

> Just out of curiosity, if a range implementation uses malloc in save, is
> it only possible to free the memory with the dtor?

Yes, but it depends on the specific case. For example, if the type has a clear() function that does cleanup, then one might call that. I don't see it as being different from any other resource management.

> Is a save function only meaningful for GC ranges?

save() is for storing the iteration state of a range. It should seldom require memory allocation, unless we're dealing with e.g. stdin, where we would have to store input lines just to support save(). It would not be a good design to hide such potentially expensive storage of lines behind save().

To me, save() should mostly be as trivial as returning a copy of the struct object to preserve the state of the original range. Here is a trivial generator:

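import std.range : isForwardRange;

struct Squares {
  int current;

  enum empty = false;  // never empty: an infinite range

  int front() const {
    return current * current;
  }

  void popFront() {
    ++current;
  }

  // Returning a copy of the struct is enough to save the iteration state.
  auto save() {
    return this;
  }
}

// The copying save() above is what makes Squares a forward range.
static assert(isForwardRange!Squares);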

void main() {
  auto r = Squares(0);
  r.popFront();  // Drop 0 * 0
  r.popFront();  // Drop 1 * 1

  auto copy = r.save;
  copy.popFront();  // Drop 2 * 2 only from the copy

  assert(r.front == 2 * 2);  // Saved original still has 2 * 2
}

Ali


August 25, 2021

On Tuesday, 24 August 2021 at 19:06:44 UTC, Alexandru Ermicioi wrote:

> On Tuesday, 24 August 2021 at 09:15:23 UTC, bauss wrote:
>
> > A range should be a struct always and thus its state is copied when the foreach loop is created.
>
> Actually, the range contracts don't say that it needs to be a by-value type. It can also be a reference type, e.g. a class.

Of course it doesn't disallow classes, but it's generally advised that you use structs, and that's what you want in 99% of cases. It's usually a red flag when a range starts being a reference type.

August 25, 2021

On Tuesday, 24 August 2021 at 18:52:19 UTC, Alexandru Ermicioi wrote:

> A forward range also exposes the capability to create save points, which foreach actually uses to do what, for example, Java does with the Iterable interface.

I know, but foreach() doesn't call save().
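A quick check of this claim, assuming a hypothetical forward range `Counting` whose `save` increments a counter. After the loop the counter is still zero, because `foreach` starts from a plain copy of the range, as in the `for (auto copy = range; ...)` lowering quoted earlier in the thread.

```d
import std.range.primitives : isForwardRange;

// Hypothetical forward range that counts how often save() is called.
struct Counting
{
    static int saveCalls;  // thread-local counter shared by all instances
    int current;
    int last;

    bool empty() const { return current > last; }
    int front() const { return current; }
    void popFront() { ++current; }
    Counting save() { ++saveCalls; return this; }
}

static assert(isForwardRange!Counting);

void main()
{
    auto r = Counting(1, 3);
    int sum;
    foreach (v; r)
        sum += v;
    assert(sum == 6);
    assert(Counting.saveCalls == 0);  // foreach never called save()
    assert(r.front == 1);             // it iterated a plain copy instead
}
```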

August 25, 2021

On Tuesday, 24 August 2021 at 21:15:02 UTC, Steven Schveighoffer wrote:

> If you have a for loop:
>
> int i;
> for(i = 0; i < someArr.length; ++i)
> {
>    if(someArr[i] == desiredValue) break;
> }
>
> You are saying, "compiler, please execute the ++i when I break from the loop because I already processed that one". How can that be expected? I would never expect that. When I break, it means "stop the loop, I'm done", and then I use i which is where I expected it to be.

I get your point: you see foreach() as a raw translation to the for loop, and I'm fine with that. Automatically calling popFront() on break is also only a suggestion, in case there is no other mechanism to tell the range that we have cancelled it.

> > It becomes useless for foreach() because you can't rely on them if other code breaks the loop and you need to use that range, like in my case. But also for ranges, there is no need for a popFront() if it is not called in a logical way. Then even empty() could fetch the next data if needed. It only makes sense if the language's own code calls it in a strict order and ensures that this order is always kept.
>
> There is no problem with the ordering. What seems to be the issue is that you aren't used to the way ranges work.

Ehm, no...

-> empty()
-> front()
-> popFront()
-> empty()
-> front()
break;

next foreach():
-> empty()
-> front()

clearly violates the order for me.
Well, nobody said that we must advance the range, but come on...
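The sequence above can be reproduced with a hypothetical class-based range `Logged` that records every primitive call (a class, so both loops share one object). The second loop starts with `empty` and `front` on the element where the first one broke off, with no `popFront` in between.

```d
// Hypothetical class-based input range that records every primitive call.
// Being a class, both foreach loops below operate on the same object.
class Logged
{
    int[] data;
    string[] log;

    this(int[] data) { this.data = data; }

    bool empty() { log ~= "empty"; return data.length == 0; }
    int front() { log ~= "front"; return data[0]; }
    void popFront() { log ~= "popFront"; data = data[1 .. $]; }
}

void main()
{
    auto r = new Logged([1, 2, 3]);

    foreach (v; r)          // first loop: stops on 2
        if (v == 2) break;

    foreach (v; r)          // next loop: picks up at 2 again
        break;

    assert(r.log == ["empty", "front", "popFront", "empty", "front",
                     "empty", "front"]);
    assert(r.data == [2, 3]);
}
```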

> What's great about D is that there is a solution for you:
>
> [...]
>
> Now if you break from the loop, the original range is pointing at the element after the one you last were processing.

This is nice. But foreach() should do this automatically, so that no wrapper is needed.
foreach() should be seen as a special construct that does that, not just a dumb alias for the for loop. Why? Because it is a convenient language construct and its usage should be easy. Again, there should be no need for an additional popFront() just because I break the loop.

> I'm surprised you bring PHP as an example, as it appears their foreach interface works EXACTLY as D does:

Yeah, but the point is, there is a rewind() method, and it is called at the start of every foreach().

August 25, 2021

On Wednesday, 25 August 2021 at 06:51:36 UTC, bauss wrote:

> Of course it doesn't disallow classes, but it's generally advised that you use structs, and that's what you want in 99% of cases. It's usually a red flag when a range starts being a reference type.

Well, sometimes you can't avoid reference types, for example when you need to hide the range's implementation. But yes, in most cases it's best to use simpler ways to represent ranges.
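One common reason for a reference-type range is hiding the concrete type behind an interface. A sketch, assuming Phobos' `std.range.interfaces` (`InputRange`, `inputRangeObject`); `evens` is a hypothetical function. Callers only see `InputRange!int`, never the `filter`/`iota` type behind it.

```d
import std.algorithm.iteration : filter;
import std.array : array;
import std.range : iota;
import std.range.interfaces : InputRange, inputRangeObject;

// Callers only see the InputRange!int interface; the concrete
// filter/iota range type stays hidden behind a class object.
InputRange!int evens(int limit)
{
    return inputRangeObject(iota(0, limit).filter!(x => x % 2 == 0));
}

void main()
{
    InputRange!int r = evens(7);
    assert(r.array == [0, 2, 4, 6]);
}
```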

August 25, 2021

On Tuesday, 24 August 2021 at 09:15:23 UTC, bauss wrote:

> A range should be a struct always and thus its state is copied when the foreach loop is created.

That's quite a strong assumption, because its state might be a reference type, or it might not have state in a meaningful sense: consider an input range that wraps reading from a socket, or one that just reads from /dev/urandom.

Deterministic copying per foreach loop is only guaranteed for forward ranges.

August 25, 2021

On Wednesday, 25 August 2021 at 08:15:18 UTC, frame wrote:

> I know, but foreach() doesn't call save().

Hmm, this is probably a regression, or I missed the point in time when foreach switched to using the copy constructor for forward ranges.

Do we have a well-defined description of what input, forward, and other well-known ranges are, and how they interact with language features?

For some reason I didn't manage to find anything on dlang.org.