Printing a range of ranges drains them
May 27

If you print a range of ranges (that are not arrays) with writeln, even if the nested range is a forward range, writeln will drain the nested ranges.

example:

import std.stdio;
import std.range;
struct R
{
    int* ptr;
    size_t len;
    int front() {return  *ptr;}
    void popFront() { ++ptr; --len; }
    bool empty() {return len == 0;}
    typeof(this) save() { return this; }
}

static assert(isForwardRange!R);

void main()
{
    int[] arr = [1, 2, 3];
    auto r = R(arr.ptr, arr.length);
    R[] mdarr = [r, r, r];
    writeln(mdarr);
    writeln(mdarr);
}

Output:

[[1, 2, 3], [1, 2, 3], [1, 2, 3]]
[[], [], []]

If you do this with nested arrays, it does not drain the inner arrays.

You can fix by un-reffing the elements of the outer array: writeln(mdarr.map!(e => e));
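
A minimal sketch of that workaround, assuming the same `R` as above. It formats through `std.format.format` rather than `writeln` so the two results can be compared, on the assumption that both go through the same formatting path. The helper name `render` is made up for illustration.

```d
import std.algorithm : map;
import std.format : format;
import std.range : isForwardRange;

struct R
{
    int* ptr;
    size_t len;
    int front() { return *ptr; }
    void popFront() { ++ptr; --len; }
    bool empty() { return len == 0; }
    typeof(this) save() { return this; }
}

static assert(isForwardRange!R);

// Hypothetical helper: format each element through a by-value copy.
string render(R[] outer)
{
    // The lambda returns a copy of the element, so formatting drains
    // that temporary rather than the element stored in `outer`.
    return format("%s", outer.map!(e => e));
}

void main()
{
    int[] arr = [1, 2, 3];
    auto r = R(arr.ptr, arr.length);
    R[] mdarr = [r, r, r];

    auto first = render(mdarr);
    auto second = render(mdarr);
    assert(first == second);    // same text both times
    assert(mdarr[0].len == 3);  // the stored ranges are untouched
}
```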

So, does anyone expect this behavior? If so, can you explain why you think this is intentionally designed this way?

I wanted to file a bug, but I was shocked that this behavior as far as I can tell has always existed, and nobody has ever filed a bug on it.

-Steve

May 27

On Monday, 27 May 2024 at 00:25:42 UTC, Steven Schveighoffer wrote:

> So, does anyone expect this behavior? If so, can you explain why you think this is intentionally designed this way?

I think everything is as it should be, because each element in mdarr is a copy of the same range. The elements will appear empty unless you rewind them. You can see this with the reward() function:

    //...
    auto a = R(arr.ptr, arr.length);
    auto arrs = [ a, a, a ];

    void reward(R[] r)
    {
        foreach(i,ref e; r)
        {
            ++i;
            foreach(_; 0..i)
            {
                --e.ptr;
                ++e.len;
            }
        }
    }
    writeln(arrs); // [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
    reward(arrs);
    writeln(arrs); // [[3], [2, 3], [1, 2, 3]]
}

SDB@79

May 27
On Sunday, May 26, 2024 6:25:42 PM MDT Steven Schveighoffer via Digitalmars-d wrote:
> If you print a range of ranges (that are not arrays) with `writeln`, even if the nested range is a forward range, `writeln` will drain the nested ranges.
>
> example:
>
> ```d
> import std.stdio;
> import std.range;
> struct R
> {
>      int* ptr;
>      size_t len;
>      int front() {return  *ptr;}
>      void popFront() { ++ptr; --len; }
>      bool empty() {return len == 0;}
>      typeof(this) save() { return this; }
> }
>
> static assert(isForwardRange!R);
>
> void main()
> {
>      int[] arr = [1, 2, 3];
>      auto r = R(arr.ptr, arr.length);
>      R[] mdarr = [r, r, r];
>      writeln(mdarr);
>      writeln(mdarr);
> }
> ```
>
> Output:
>
> ```
> [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
> [[], [], []]
> ```
>
> If you do this with nested arrays, it does not drain the inner arrays.
>
> You can fix by un-reffing the elements of the outer array:
> `writeln(mdarr.map!(e => e));`
>
> So, does anyone expect this behavior? If so, can you explain why you think this is intentionally designed this way?
>
> I wanted to file a bug, but I was shocked that this behavior as far as I can tell has always existed, and nobody has ever filed a bug on it.

I don't recall ever really thinking about it. I don't think that it's something that I've done very often, and when I have, it was probably for debugging. And in many cases, if you want readable output, it makes sense to use foreach to loop through the outer range and print out each inner range individually, in which case, you can call save on the inner ranges. That's usually what I'd do if I know that I'm printing out a range of ranges.

Given that writeln needs to work with basic input ranges, having it not consume the inner ranges would result in different behavior between basic input ranges and forward ranges, which wouldn't be great. So, arguably, having it consume them is the correct choice, but I'd have to spend a fair bit of time thinking through the implications to come to a properly informed conclusion.

Realistically though, I expect that it's an issue that was never really thought through, and the current behavior is accidental whether it's truly desirable behavior or not.

- Jonathan M Davis



May 27

On Monday, 27 May 2024 at 06:31:37 UTC, Jonathan M Davis wrote:

> ...
> Given that writeln needs to work with basic input ranges, having it not consume the inner ranges would result in different behavior between basic input ranges and forward ranges, which wouldn't be great. So, arguably, having it consume them is the correct choice, but I'd have to spend a fair bit of time thinking through the implications to come to a properly informed conclusion.
> ...

The same situation can be shown with iota(), and the example below also shows that the behavior is contradictory:

import std.stdio;
import std.range;

alias strings = char[][];
enum form = "[%(%s, %)]";
void main()
{
    auto num = iota(1, 4);
    auto range = [num, num, num];

    write("[ ");
    foreach(rng; range)
      rng.write(" ");
    writeln("]");

    range.writefln!form;
    range.writefln!form;

    strings str;
    auto s = "123".dup;
    str = [s, s, s];

    str.writefln!form;
    str.writefln!form;
} /*

   [ [1, 2, 3] [1, 2, 3] [1, 2, 3] ]
   [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
   [[], [], []]
   ["123", "123", "123"]
   ["123", "123", "123"]

//*/

When we do the same experiment with the strings above, we get different results. Moreover, foreach() similarly does not consume the inner ranges...

Now there is a contradiction on the writeln() side!

SDB@79

May 27

On Monday, 27 May 2024 at 06:31:37 UTC, Jonathan M Davis wrote:

> I don't recall ever really thinking about it. I don't think that it's something that I've done very often, and when I have, it was probably for debugging. And in many cases, if you want readable output, it makes sense to use foreach to loop through the outer range and print out each inner range individually, in which case, you can call save on the inner ranges. That's usually what I'd do if I know that I'm printing out a range of ranges.

This is what writeln does (loops over the individual elements). You can even format the nested ranges using writefln and the %(...%) format specifier.

> Given that writeln needs to work with basic input ranges, having it not consume the inner ranges would result in different behavior between basic input ranges and forward ranges, which wouldn't be great. So, arguably, having it consume them is the correct choice, but I'd have to spend a fair bit of time thinking through the implications to come to a properly informed conclusion.

I don't think you are grasping how surprising this is. If you are debugging something, and you want to see what something looks like at the moment, you print it. In this case, the act of printing modifies the thing you are debugging! And it doesn't even look like it did anything, because it printed fine. It's only on the second printing you see there is a problem. So you think "what happened between the first printing and the second printing?".

This is actually the use case I was looking at yesterday when I discovered (probably rediscovered) this issue. Would you expect printing a struct to modify the struct? Well, it does if it includes one of these range-of-ranges!

Note also, if you make the outer range by-ref, it will consume all the inner ranges but not the outer range, even if it uses by-ref elements. In other words, it has different behavior on the outer range, vs the inner range. This is because writeln accepts its parameters by value, but the underlying formatValue uses auto ref (to support non-copyable range elements).
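
A rough sketch of the by-value versus `auto ref` distinction. The functions `drainByValue` and `drainAutoRef` and the `Counter` range are hypothetical stand-ins, not the actual Phobos code.

```d
// Hypothetical stand-ins for the two parameter styles; not Phobos code.
void drainByValue(Range)(Range r)
{
    while (!r.empty) r.popFront();   // works on a copy of the argument
}

void drainAutoRef(Range)(auto ref Range r)
{
    while (!r.empty) r.popFront();   // an lvalue argument binds by reference
}

// A tiny countdown range used only for this illustration.
struct Counter
{
    int n;
    int front() { return n; }
    void popFront() { --n; }
    bool empty() { return n == 0; }
}

void main()
{
    auto a = Counter(3);
    drainByValue(a);
    assert(a.n == 3);   // the caller's range is untouched

    auto b = Counter(3);
    drainAutoRef(b);
    assert(b.n == 0);   // the caller's range was drained

    drainAutoRef(Counter(3));   // an rvalue is simply taken by value
}
```

The outer range plays the role of `a` here, and each lvalue element plays the role of `b`.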

And, by the way, nested arrays are not consumed, even if they are inside a range with lvalue elements. So that is another outlier. And also likely why nobody has complained about this -- most people use arrays for their ranges.

> Realistically though, I expect that it's an issue that was never really thought through, and the current behavior is accidental whether it's truly desirable behavior or not.

I tend to agree. I'm going to file an issue on it. I think any forward ranges should be passed via .save to their respective formatters. This should fix the problem, and is what most people would expect.
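
Roughly what that could look like, as a sketch only. `formatNested` is a made-up helper working on an array of ranges, not a proposed Phobos patch, and `R` is the struct from the first post.

```d
import std.array : join;
import std.format : format;
import std.range.primitives : isForwardRange, isInputRange;

struct R
{
    int* ptr;
    size_t len;
    int front() { return *ptr; }
    void popFront() { ++ptr; --len; }
    bool empty() { return len == 0; }
    typeof(this) save() { return this; }
}

// Hypothetical helper: formats an array of ranges, handing each
// forward inner range to the formatter via .save.
string formatNested(Inner)(Inner[] outer)
if (isInputRange!Inner)
{
    string[] parts;
    foreach (ref inner; outer)
    {
        static if (isForwardRange!Inner)
            parts ~= format("%s", inner.save);  // the formatter gets a saved copy
        else
            parts ~= format("%s", inner);       // a basic input range cannot be saved
    }
    return "[" ~ parts.join(", ") ~ "]";
}

void main()
{
    int[] arr = [1, 2, 3];
    auto r = R(arr.ptr, arr.length);
    R[] mdarr = [r, r, r];

    assert(formatNested(mdarr) == "[[1, 2, 3], [1, 2, 3], [1, 2, 3]]");
    assert(formatNested(mdarr) == "[[1, 2, 3], [1, 2, 3], [1, 2, 3]]");
    assert(mdarr[0].len == 3);  // nothing was drained
}
```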

When I posed this question, my thought was that the behavior was unintuitive, but given the length of time this has existed, I thought maybe someone has a good reason why the code is this way, and I'm just not seeing it.

Note for the range redesign -- this is going to make things tricky as we won't have a save to use. We will have to explicitly copy the range before passing to the formatValue function (as long as it's a forward range). This is kind of a drawback, I'll put that on the range redesign thread.

-Steve

May 27

On Monday, 27 May 2024 at 00:25:42 UTC, Steven Schveighoffer wrote:

> So, does anyone expect this behavior? If so, can you explain why you think this is intentionally designed this way?

This is correct behavior for ref front ranges with imperative pop

ref front is a violation of the "views of data", but given the current api how else is sorting going to work?

I tried functional pop in my api experiment; it was hard to get right and will come with tradeoffs I doubt people will accept

possible solutions are:

  1. specialty n-depth range functions

  2. an upper level to the api, so auto i=foo[].find!F.key; foo[i]=... is the correct way to mutate data

  3. treat ranges of ranges as rare and unimportant

May 28

On Monday, 27 May 2024 at 16:28:28 UTC, monkyyy wrote:

> On Monday, 27 May 2024 at 00:25:42 UTC, Steven Schveighoffer wrote:
> > So, does anyone expect this behavior? If so, can you explain why you think this is intentionally designed this way?
>
> This is correct behavior for ref front ranges with imperative pop

Is the situation the same as this example?

void main()
{
  class R
  {
    wchar* ptr;
    size_t len;

    this(T)(T[] range)
    {
      ptr = cast(wchar*)range.ptr;
      len = range.length;
    }

    auto empty() => len == 0;
    auto front() => *ptr++;
    auto popFront() => len--;
    auto save()
    {
      auto r = new R([]);
      r.len = len;
      r.ptr = ptr;
      return r;
    }
  }

  auto c = ['€', '₺', '₽'];
  auto r = new R(c);

  assert(!r.empty);

  import std.conv : text;
  auto str = r.text; // "€₺₽"

  assert(r.empty);
}

Okay, the objections are about inner ranges. But while the class R above is consumed, if you rewrite it as a struct and remove the new operator, a copy of the range is taken instead. Or is the difference between the class and the struct simply that a class is a reference type?
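
A small sketch of the difference, using hypothetical countdown ranges and a generic `drain` function rather than the R above:

```d
// Hypothetical ranges for illustration: same interface, one struct, one class.
struct CountdownStruct
{
    int n;
    bool empty() { return n == 0; }
    int front() { return n; }
    void popFront() { --n; }
}

class CountdownClass
{
    int n;
    this(int n) { this.n = n; }
    bool empty() { return n == 0; }
    int front() { return n; }
    void popFront() { --n; }
}

// Takes its argument by value and consumes it.
void drain(Range)(Range r)
{
    while (!r.empty) r.popFront();
}

void main()
{
    auto s = CountdownStruct(3);
    drain(s);
    assert(s.n == 3);   // the struct was copied, the original is intact

    auto c = new CountdownClass(3);
    drain(c);
    assert(c.n == 0);   // only the reference was copied, the object is drained
}
```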

Thanks...

SDB@79

May 28

On Tuesday, 28 May 2024 at 05:54:36 UTC, Salih Dincer wrote:

> On Monday, 27 May 2024 at 16:28:28 UTC, monkyyy wrote:
> > On Monday, 27 May 2024 at 00:25:42 UTC, Steven Schveighoffer wrote:
> > > So, does anyone expect this behavior? If so, can you explain why you think this is intentionally designed this way?
> >
> > This is correct behavior for ref front ranges with imperative pop
>
> Is the situation the same as this example?

There's no reason why this issue can't be easily fixed, because when narrow strings or wchar are involved, there is no problem with being unable to save(). Here is the proof:

void main()
{
  ushort[] i = [1, 2, 3];
  auto r = R(i);
  auto arr = [r, r, r];

  import std.conv : text;
  auto str = arr.text;
  assert(!arr.empty);

  foreach(n; arr)
  {
    n.writefln!"%(%d, %)";
  } // no problem
}

SDB@79

May 28

On Tuesday, 28 May 2024 at 05:54:36 UTC, Salih Dincer wrote:

> On Monday, 27 May 2024 at 16:28:28 UTC, monkyyy wrote:
> > On Monday, 27 May 2024 at 00:25:42 UTC, Steven Schveighoffer wrote:
> > > So, does anyone expect this behavior? If so, can you explain why you think this is intentionally designed this way?
> >
> > This is correct behavior for ref front ranges with imperative pop
>
> Is the situation the same as this example?

I think so. Should sorting fail if you use pointers rather than ref?

When writing generic code, a ref range and a pointer to a range probably aren't different (given pointer flattening on argument call); maybe there's some edge case that's detectable, but I would never write it, and it's probably incorrect for other pointers.

Consumption and mutability are part of the range spec, and have to be for file io/sorting to work with the current goals.

So a mutable reference to a consuming range will drain unless something else prevents it.