Parameter storage classes on foreach variables
May 17

As of now, foreach admits ref variables as in foreach (ref x; xs). There, ref can be used for two conceptually different things:

  • Avoiding copies
  • Mutating the values in place

If mutating in place is desired, ref is an excellent choice.
However, if mere copy avoiding is desired, another great option would be in.
On parameters, in avoids expensive copies, but still makes trivial ones.
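To make the two uses of ref concrete, here is a minimal sketch that only relies on what foreach accepts today. The names Big, doubleAll and sumFirst are made up for illustration; ref const stands in for the "avoid copies, don't mutate" case that in would cover under the proposal.

```d
struct Big
{
    int[256] payload; // large enough that a copy per iteration is wasteful
}

// Use 1: mutate the elements in place.
void doubleAll(int[] xs)
{
    foreach (ref x; xs)
        x *= 2;
}

// Use 2: only avoid copies; const documents that nothing is mutated.
int sumFirst(Big[] bigs)
{
    int total;
    foreach (i, ref const b; bigs)
    {
        assert(&b == &bigs[i]); // b aliases the element, no copy was made
        total += b.payload[0];
    }
    return total;
}

void main()
{
    auto xs = [1, 2, 3];
    doubleAll(xs);
    assert(xs == [2, 4, 6]);

    auto bigs = new Big[](2);
    bigs[0].payload[0] = 4;
    bigs[1].payload[0] = 5;
    assert(sumFirst(bigs) == 9);
}
```

Nothing in the loop header tells the reader which of the two uses a bare ref is meant for; the const in the second loop is what carries that information.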

A type supplying opApply can, in principle, easily provide an implementation where the callback takes an argument by in or out:

struct Range
{
    int opApply(scope int delegate(size_t, in X) callback)
    {
        X x;
        if (auto result = callback(0, x)) return result;
        return 0;
    }
}

For out, it’s not really different.
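As a rough sketch of both variants, the following calls opApply directly with a delegate literal instead of going through foreach, since how foreach itself should treat in and out is exactly what is under discussion. X, Source and Sink are hypothetical types made up for this example.

```d
struct X
{
    int value;
}

struct Source
{
    X item = X(7);

    // The callback receives the element by in: read-only access.
    int opApply(scope int delegate(size_t, in X) callback)
    {
        return callback(0, item);
    }
}

struct Sink
{
    X[] received;

    // The callback receives the element by out: it is reset to X.init
    // on entry and filled in by the callback.
    int opApply(scope int delegate(size_t, out X) callback)
    {
        X slot = X(99);
        if (auto result = callback(0, slot))
            return result;
        received ~= slot;
        return 0;
    }
}

void main()
{
    Source src;
    int seen;
    src.opApply((size_t i, in X x) { seen = x.value; return 0; });
    assert(seen == 7);

    Sink sink;
    sink.opApply((size_t i, out X x) {
        assert(x.value == 0); // the 99 was overwritten with X.init
        x.value = 5;
        return 0;
    });
    assert(sink.received == [X(5)]);
}
```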

However, how do classical ranges (empty, front, popFront) fare with these?
First in.

foreach (in x; xs) { … }
// lowers to
{
    auto __xs = xs;
    for (; !__xs.empty; __xs.popFront)
    {
        static if (/* should be ref */)
            const scope ref x = __xs.front;
        else
            const scope x = __xs.front;
        …
    }
}
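Written out by hand, the by-value branch of that lowering might look like the sketch below. Countdown and sumLowered are hypothetical names, and the loop variable is called r rather than __xs; the by-reference branch is left out.

```d
struct Countdown
{
    int n;
    bool empty() const { return n == 0; }
    int front() const { return n; }
    void popFront() { --n; }
}

int sumLowered(Countdown xs)
{
    int sum;
    // Hand-written form of the by-value branch of the lowering.
    for (auto r = xs; !r.empty; r.popFront())
    {
        const scope x = r.front;
        static assert(!__traits(compiles, x = 0)); // x cannot be reassigned
        sum += x; // loop body
    }
    return sum;
}

void main()
{
    assert(sumLowered(Countdown(3)) == 6); // 3 + 2 + 1
}
```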

The first notable observation is that out makes no sense for input ranges. Rather, it would make sense for, well, output ranges: Every time the loop reaches the end, a put is issued, whereas continue means “this loop iteration did not produce a value, but continue” and break means “end the loop”:

foreach (out T x; xs) { … }
// lowers to
{
    auto __xs = xs; // or xs[]
    for (; !__xs.empty /* or __xs.length > 0 or nothing */;)
    {
        auto x = T.init;
        …
        __xs.put(x); /* or similar */
    }
}

The program should assign x in its body. If control reaches the end of the loop, the value is put in the output range.
Since an output range need not, in general, be finite, the loop is endless by design. If the range has an empty member, however, it is used as the loop condition; for types with length but no empty, the condition is __xs.length > 0. For arrays and slices, the put operation is __xs[0] = x; __xs = __xs[1 .. $];.
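For slices, the proposed lowering can be emulated by hand today. The sketch below uses a hypothetical fillWithEvens whose "loop body" skips odd numbers with continue, so only even values are put; the call to std.range.primitives.put at the end shows that Phobos already performs the same two steps on a slice.

```d
import std.range.primitives : put;

int[] fillWithEvens(int[] xs)
{
    auto rest = xs[];
    int next = 1; // stands in for whatever state the loop body uses
    for (; rest.length > 0;)
    {
        auto x = int.init;

        // loop body: produce a value, or skip this iteration
        x = next++;
        if (x % 2 != 0)
            continue; // no value produced, nothing is put

        // end of body reached: the implicit put for slices
        rest[0] = x;
        rest = rest[1 .. $];
    }
    return xs;
}

void main()
{
    assert(fillWithEvens(new int[](3)) == [2, 4, 6]);

    // put on a slice writes the first element and advances the slice.
    auto b = new int[](2);
    auto rest = b[];
    put(rest, 7);
    assert(b == [7, 0] && rest.length == 1);
}
```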

If T is not given explicitly and xs is not an array or slice, the compiler should try to infer T from the single parameter of a non-overloaded xs.put. If that is not possible, it’s an error.

Dynamic arrays and slices should support size_t keys as well:

foreach (i, out x; xs) { … }
// lowers to
{
    auto __xs = xs[];
    for (size_t __i = 0; __xs.length > 0; ++__i)
    {
        size_t i = __i;
        auto x = typeof(xs[0]).init;
        …
        __xs[0] = x;
        __xs = __xs[1 .. $];
    }
}

Associative arrays specifically can be filled using out keys and values:

int[string] aa;
foreach (out key, out value; aa) { … }
// lowers to
{
    auto __aa = aa;
    for (;;)
    {
        KeyType key = KeyType.init;
        ValueType value = ValueType.init;
        …
        __aa[key] = value;
    }
}

At some point, a break is needed, otherwise the loop is infinite.
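A hand-written version of that lowering, with the break in place, could look like the following sketch. lengthsOf is a hypothetical helper; string and int take the place of KeyType and ValueType.

```d
int[string] lengthsOf(string[] words)
{
    int[string] aa;
    size_t n;
    for (;;)
    {
        string key = string.init;
        int value = int.init;

        // loop body: either leave the loop or assign key and value
        if (n == words.length)
            break;
        key = words[n];
        value = cast(int) words[n].length;
        ++n;

        // end of body reached: the implicit insertion
        aa[key] = value;
    }
    return aa;
}

void main()
{
    assert(lengthsOf(["a", "bb", "ccc"]) == ["a": 1, "bb": 2, "ccc": 3]);
}
```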

May 19
On 5/17/24 20:59, Quirin Schroll wrote:
> As of now, `foreach` admits `ref` variables as in `foreach (ref x; xs)`. There, `ref` can be used for two conceptually different things:
> * Avoiding copies
> * Mutating the values in place
> 
> If mutating in place is desired, `ref` is an excellent choice.
> However, if mere copy avoiding is desired, another great option would be `in`.

I contest that `in` is a great option every time mere avoiding of copies is desired (because it implies transitive `const`).

In general, extending `foreach` to `in` and `out` makes some sense, but `out` is likely to be quite controversial, especially the output range lowering. When I think of `foreach`, I think of consuming a range, not producing one.
May 20
On Saturday, 18 May 2024 at 22:43:48 UTC, Timon Gehr wrote:
>> If mutating in place is desired, `ref` is an excellent choice.
>> However, if mere copy avoiding is desired, another great option would be `in`.
>
> I contest that `in` is a great option every time mere avoiding of copies is desired (because it implies transitive `const`).

Did you mean "isn't a great option"?
And if so, presumably we still need `auto ref`:
https://github.com/dlang/DIPs/blob/master/DIPs/other/DIP1022.md

> In general, extending `foreach` to `in` and `out` makes some sense, but `out` is likely to be quite controversial, especially the output range lowering. When I think of `foreach`, I think of consuming a range, not producing one.

+1
May 20
On 5/20/24 16:29, Nick Treleaven wrote:
> On Saturday, 18 May 2024 at 22:43:48 UTC, Timon Gehr wrote:
>>> If mutating in place is desired, `ref` is an excellent choice.
>>> However, if mere copy avoiding is desired, another great option would be `in`.
>>
>> I contest that `in` is a great option every time mere avoiding of copies is desired (because it implies transitive `const`).
> ...

The negation is in the word "contest".
(Stated more clearly: Sometimes `in` cannot be used because `const` is transitive.)

> Did you mean "isn't a great option"?
> And if so, presumably we still need `auto ref`:
> https://github.com/dlang/DIPs/blob/master/DIPs/other/DIP1022.md
> ...

Would be good, also for local variables outside of `foreach`.

May 21

On Saturday, 18 May 2024 at 22:43:48 UTC, Timon Gehr wrote:
> On 5/17/24 20:59, Quirin Schroll wrote:
>> As of now, foreach admits ref variables as in foreach (ref x; xs). There, ref can be used for two conceptually different things:
>> * Avoiding copies
>> * Mutating the values in place
>>
>> If mutating in place is desired, ref is an excellent choice.
>> However, if mere copy avoiding is desired, another great option would be in.
>
> I contest that in is a great option every time mere avoiding of copies is desired (because it implies transitive const).

True. In generic code, one basically can’t use const, and therefore in, because a type can become simply unusable once it is const. (Prime example would be delegate types once they’re fixed.)

In case you know the type and it’s a type that works well being const, then in might be a great option.
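Two small cases where the const implied by in gets in the way, as a sketch with made-up names (Countdown, Box, canAdvance, canWriteThrough, advance): a range cannot be advanced through an in parameter, and data behind a pointer cannot be written through one.

```d
struct Countdown
{
    int n;
    bool empty() const { return n == 0; }
    int front() const { return n; }
    void popFront() { --n; }
}

struct Box
{
    int* target;
}

// in implies const, so a range received this way cannot be advanced.
bool canAdvance(R)(in R r)
{
    return __traits(compiles, r.popFront());
}

// const is transitive: data reachable through b is const as well.
bool canWriteThrough(in Box b)
{
    return __traits(compiles, *b.target = 1);
}

// With plain ref, the range stays usable.
void advance(R)(ref R r)
{
    r.popFront();
}

void main()
{
    int slot;
    assert(!canAdvance(Countdown(3)));
    assert(!canWriteThrough(Box(&slot)));

    auto c = Countdown(3);
    advance(c);
    assert(c.n == 2);
}
```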

> In general, extending foreach to in and out makes some sense, but out is likely to be quite controversial, especially the output range lowering. When I think of foreach, I think of consuming a range, not producing one.

I thought the same, but on the other hand, there’s a keyword, so it absolutely won’t happen accidentally. It may just surprise people to read it in someone else’s code.

My sense is that the stuff in a foreach header before the semicolon should support exactly the same things a lambda parameter list would, simply because it may become a lambda passed to opApply. If it isn’t, well, it’s up for discussion what to do with it. Making it invalid is always an option.
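The correspondence between the foreach header and a lambda parameter list can be seen with ref, which works today. In this sketch (Pair is a made-up type), the foreach loop and the explicit opApply call with a delegate literal do the same kind of work.

```d
struct Pair
{
    int a, b;

    int opApply(scope int delegate(ref int) dg)
    {
        if (auto result = dg(a))
            return result;
        if (auto result = dg(b))
            return result;
        return 0;
    }
}

void main()
{
    auto p = Pair(1, 2);

    // The loop body becomes a delegate taking (ref int x).
    foreach (ref x; p)
        x *= 10;
    assert(p.a == 10 && p.b == 20);

    // The same thing spelled out as a lambda passed to opApply.
    p.opApply((ref int x) { x += 1; return 0; });
    assert(p.a == 11 && p.b == 21);
}
```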

May 21

On Friday, 17 May 2024 at 18:59:13 UTC, Quirin Schroll wrote:
> foreach (out T x; xs) { … }
> // lowers to
> {
>     auto __xs = xs; // or xs[]
>     for (; !__xs.empty /* or __xs.length > 0 or nothing */;)
>     {
>         auto x = T.init;
>         …
>         __xs.put(x); /* or similar */
>     }
> }

[...]

> int[string] aa;
> foreach (out key, out value; aa) { … }
> // lowers to
> {
>     auto __aa = aa;
>     for (;;)
>     {
>         KeyType key = KeyType.init;
>         ValueType value = ValueType.init;
>         …
>         __aa[key] = value;
>     }
> }

I don't like these special-case rewrites. Binding an array/AA/range element to an out loop variable should have exactly the same semantics as binding a function argument to an out parameter. That is,

  • The element must be an lvalue.
  • The element is bound by reference.
  • Upon being bound, the element is set to its .init value.

So, no implicit calls to put, no implicit insertion of AA elements, etc.
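For reference, this is how out behaves on function parameters today, followed by one possible reading of what the same rules would mean for a loop over a slice. The names produce, ignore and resetThenFill are made up, and resetThenFill is only an emulation of the suggested semantics using ref, not an existing feature.

```d
// out parameter: bound by reference, reset to .init on entry.
void produce(out int x)
{
    assert(x == int.init);
    x = 42;
}

void ignore(out int x)
{
}

// Emulation of foreach (i, out x; xs) under the suggested semantics.
int[] resetThenFill(int[] xs)
{
    foreach (i, ref x; xs)
    {
        x = typeof(x).init; // what binding to out would do
        if (i != 1)
            x = cast(int)(i + 1) * 10; // body assigns only some elements
    }
    return xs;
}

void main()
{
    int a = 7;
    produce(a);
    assert(a == 42);

    int b = 7;
    ignore(b);
    assert(b == 0); // reset even though the callee never assigned it

    // An rvalue cannot be bound to an out parameter.
    static assert(!__traits(compiles, produce(5)));

    assert(resetThenFill([1, 2, 3]) == [10, 0, 30]);
}
```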

Aside from that, this seems like a good idea to me. 👍