Thread overview
News and problems about foreach loops
Nov 03, 2012
bearophile
Nov 03, 2012
Peter Alexander
Nov 03, 2012
bearophile
Nov 03, 2012
Andrej Mitrovic
Nov 03, 2012
Andrej Mitrovic
November 03, 2012
(This post is not crystal clear because unfortunately now I don't have a lot of time to make it better, but I think it's acceptable.)

In DMD 2.061 there will be some interesting changes, one of them regards the foreach loops. Foreach loops are everywhere in D code, it's one of the most used D constructs, so this is an important topic; and avoiding problems and corner cases, and making it not bug-prone is quite important.

Hara, Walter and others have started to implement a change I have suggested time ago (from Python loops):


import std.stdio;
void main() {
    foreach (i; 0 .. 10)
        i++; // line 4
    auto array = [10, 20, 30, 40, 50];
    foreach (i, item; array) {
        writeln(item);
        i++; // line 8
    }
}



With warnings this now gives:

test.d(4): Warning: variable modified in foreach body requires ref storage class
test.d(8): Warning: variable modified in foreach body requires ref storage class
10
30
50



Anoter change is visible here, in past this code required:

struct Foo { int x; }
void main() {
    auto data = [Foo(10), Foo(20), Foo(30)];
    foreach (const Foo f; data) {}
}



Now you can omit the type, this is very handy:

struct Foo { int x; }
void main() {
    auto data = [Foo(10), Foo(20), Foo(30)];
    foreach (const f; data) {}
}




Unfortunately currently this compiles and runs, but there is already a patch to forbid it:
https://github.com/D-Programming-Language/dmd/pull/1249


void main() {
    foreach (const i; 0 .. 10)
        i++;
}


See also:
http://d.puremagic.com/issues/show_bug.cgi?id=6214
http://d.puremagic.com/issues/show_bug.cgi?id=4090
http://d.puremagic.com/issues/show_bug.cgi?id=6652



Given how foreach now works on integer ranges, there is one more situation worth discussing. In D this is a recognized common source of bugs:


struct F { int x; }
void main() {
    auto fs = [F(10), F(20), F(30)];
    foreach (f; fs)
        f.x += 1;
    assert(fs[0].x == 10);
}



I think there are 3 main use cases for foreach loops on collections of structs:

Case1: The struct iterated on is constant or it's not modified in the loop body:

import std.stdio;
struct F { int x; }
void main() {
    auto data = [F(10), F(20), F(30)];
    foreach (f; data)
        writeln(f);
    foreach (const f; data)
        writeln(f);
    foreach (const ref f; data)
        writeln(f);
    foreach (immutable f; data)
        writeln(f);
    immutable data2 = [F(10), F(20), F(30)];
    foreach (immutable ref f; data2)
        writeln(f);
}



Case2: you want to modify the structs of the array:

import std.stdio;
struct F { int x; }
void main() {
    auto data = [F(10), F(20), F(30)];
    foreach (ref f; data)
        f.x++;
    writeln(data);
}



Case3: you want to modify just the copy of the struct, but you don't want to modify the structs inside the array:

import std.stdio, std.range;
void main() {
    auto iotas = [iota(0,3), iota(3,7), iota(7,10)];
    foreach (it; iotas) {
        it.popFront();
        it.popFront();
        writeln(it);
    }
    writeln(iotas);
}




[Note: in foreach ranged loops you only have Case1 and Case2 (and in practice Case2 is not common):

import std.stdio;
void main() {
    foreach (const i; 0 .. 10)
        writeln(i);
}

]


Currently in D Case1 requires nothing, but today it's also more handy than before to mark the struct loop variable with const or immutable or even "const ref", to make sure you will not change it inside the loop.

Case2 requires ref. If you see "ref" (but not "const ref") you know inside the loop the struct will probably be modified.

Currently Case3 is denoted as Case1, but in my code this is a much less common case. It's easy to write by mistake the Case3 pattern when you want to write the Case2 pattern.


So if there is a warning that warns against code like (later the warning will become an error, I think. It's a very small breaking change):

import std.stdio;
void main() {
    foreach (i; 0 .. 4)
        i++;
}



then maybe the fake case3 too should become a warning and later an error (this is another breaking change):

// Solution1:
void main() {
    auto data = [0, 1, 2, 3];
    foreach (x; data)
        x++; // warning
}

So the compiler always require "const" or "ref" in the loops, so you can't write true Case3 any more. You have to copy the struct manually:

import std.stdio, std.range;
void main() {
    auto iotas = [iota(0,3), iota(3,7), iota(7,10)];
    foreach (it; iotas) {
        auto it2 = it;
        it2.popFront();
        it2.popFront();
        writeln(it2);
    }
    writeln(iotas);
}


What are the alternative solutions?

A second solution is to require some kind of tagging to tell the D compiler that the programmer really wants the Case3:

// Solution2:
void main() {
    auto data = [0, 1, 2, 3];
    foreach (@mutable x; data)
        x++;
    foreach (/*mut*/ x; data)
        x++;
}


A third solution is use idioms, and do not change D. It means that on default the programmer puts always a "const" in foreach. This avoids most bugs caused by fake Case3, and you don't use it in the uncommon true Cases3.

There are few more alternative ways to face this problem. Including doing nothing :-)

Bye,
bearophile
November 03, 2012
On Saturday, 3 November 2012 at 14:04:45 UTC, bearophile wrote:
>
> A third solution is use idioms, and do not change D. It means that on default the programmer puts always a "const" in foreach. This avoids most bugs caused by fake Case3, and you don't use it in the uncommon true Cases3.
>
> There are few more alternative ways to face this problem. Including doing nothing :-)

I never use Case3, except accidentally. It's possibly the most common language-caused bug in my D code.

That said, I'm not a fan of introducing new keywords or introducing breaking changes in D code (even though I don't use Case3). I think this might just be something we have to live with, although making use of "const" idiomatic may be a good step forward (unfortunately, it's more unnecessary typing, so it's unlikely to catch on).
November 03, 2012
On 11/3/12, bearophile <bearophileHUGS@lycos.com> wrote:
> Now you can omit the type, this is very handy:
>
> struct Foo { int x; }
> void main() {
>      auto data = [Foo(10), Foo(20), Foo(30)];
>      foreach (const f; data) {}
> }

That's really cool that we have this now. But this error message needs to improve:

    foreach (const f; data)
    {
        f.x++;
    }

test.d(9): Error: can only initialize const member x inside constructor

It should be something like "cannot modify const member x".
November 03, 2012
On 11/3/12, Andrej Mitrovic <andrej.mitrovich@gmail.com> wrote:
> It should be something like "cannot modify const member x".

In fact the error is already read but the else statement is not taken in Expression::checkModifiable:

    if (var && var->storage_class & STCctorinit)  // goes here
    {
        const char *p = var->isStatic() ? "static " : "";
        error("can only initialize %sconst member %s inside %sconstructor",
            p, var->toChars(), p);
    }
    else
    {
        OutBuffer buf;
        MODtoBuffer(&buf, type->mod);
        error("cannot modify %s expression %s", buf.toChars(), toChars());
    }
November 03, 2012
Peter Alexander:

> I'm not a fan of introducing new keywords or introducing breaking changes in D code (even though I don't use Case3). I think this might just be something we have to live with,

Using a custom property is maybe acceptable (this @copy doesn't need to be a built-in, probably it's not too much hard to define it with user-defined attributes):


void main() {
    auto data = [0, 1, 2, 3];
    foreach (@copy x; data)
        x++;
}


But there are other solutions, like asking D tool (like D IDEs or future D lints) to show a warning here.

Another solution is to require an already present keyword ("new" means it's a copy, it's not related to heap allocations) for the Case3:

void main() {
    auto data = [0, 1, 2, 3];
    foreach (new x; data)
        x++;
}

Bye,
bearophile