June 24, 2020
On Wednesday, 24 June 2020 at 16:17:58 UTC, Andrei Alexandrescu wrote:
> On 6/24/20 11:55 AM, Paul Backus wrote:
>> To be clear: I agree with you *in principle* that pure input ranges should be non-copyable.
>
> By what principle two input ranges should absolutely never use the same data feed?

By no principle. If a given data feed can be used in such manner - then used it shall be. By two distinct input ranges that keep their own state and don't breed references to said state.

> I'm not sure, but with a no-quarter-given approach to copying input ranges, simple tasks      like "feed input from this file, or from stdin, and output to this other file or to stdout" mutate from trivial into little research projects

I don't see how. This and everything of the same simplicity stays exactly the same:

import std.stdio, std.array, std.algorithm;

void main()
{
    stdin
        .byLine
        .map!"a.length"
        .each!writeln;
}

Alleged issues materialize when you start using lvalues. Which I've already shown an example of a few posts back.

Does a CsvReader really need to allocate, especially if my data is numbers? The answer in current incarnation of ranges is "yes". Well then, I'd rather write one that doesn't. Thanks but no thanks, Phobos.

> (btw by the same (misguded imho) arguments "pure" output ranges ought to also be noncopyable).

Perhaps.
June 24, 2020
On Wednesday, 24 June 2020 at 16:25:31 UTC, Adam D. Ruppe wrote:
> I'm barely reading this thread, but could @live be useful with input ranges too?

I don't think it can be, at least in its current form, but my reasoning behind making them non-copyable is pretty much the same as one behind @live and pointers: at any given moment, there can be only one owner of an input range's iteration state.
For some reason, Andrei equates that to not being able to read from the same source. ¯\_(ツ)_/¯
June 25, 2020
On Thursday, 18 June 2020 at 15:55:06 UTC, jmh530 wrote:
> [snip]

Added bugzilla
https://issues.dlang.org/show_bug.cgi?id=20978
June 25, 2020
On Wednesday, 24 June 2020 at 20:05:27 UTC, Stanislav Blinov wrote:
> On Wednesday, 24 June 2020 at 16:25:31 UTC, Adam D. Ruppe wrote:
>> I'm barely reading this thread, but could @live be useful with input ranges too?
>
> I don't think it can be, at least in its current form, but my reasoning behind making them non-copyable is pretty much the same as one behind @live and pointers: at any given moment, there can be only one owner of an input range's iteration state.
> For some reason, Andrei equates that to not being able to read from the same source. ¯\_(ツ)_/¯

I think the missing piece here is that it's entirely possible for an input range to "borrow" its iteration state from somewhere else, rather than owning it. For example, a pointer to an input range is, itself, considered an input range, because of how the `.` operator does automatic dereferencing:

struct Example
{
    int i;

    bool empty() { return false; }
    int front() { return i; }
    void popFront() { ++i; }
}

static assert(isInputRange!(Example*));

Clearly there is no need for `Example*` to be non-copyable, even if we think `Example` ought to be.
June 26, 2020
On Wednesday, 24 June 2020 at 16:08:05 UTC, Andrei Alexandrescu wrote:
> On 6/24/20 10:17 AM, Joseph Rushton Wakeling wrote:
>> I'm struck that no one AFAICS has suggested the following alternative: instead of `tail`, to allow popFront() (and if implemented, popBack()) to return an `Option!(ElementType, None)`.
>> 
>> What is returned would be either the popped element, or None if no more elements remain (with the nice by-product of no longer getting assert failures if pop{Front,Back} are called when the range is empty).
>
> I think the discussion about tail() had to do with immutable ranges. Having popXxx() return something would still have it mutate the range.

Ah, gotcha, thanks, I'd missed that context.

Nevertheless, I'd like the option-returning popFront to be considered on its own merits, because it seems to me to add useful distinctions between what input and forward ranges to do.

On immutability: it feels like this relates at least in part to not necessarily distinguishing adequately between ranges versus containers (or more generally, ranges versus owners of memory).

Have I missed anything in noting that ranges can be divided roughly along the following lines:

  - processing external input, where the internal state is just some small
    memory buffer or other collection of variables used to store the latest
    data read

  - generative mechanisms where the range elements are produced by a sequential
    algorithm

  - iteration across data stored in memory

... where immutability only really makes sense in the first place w.r.t. the last of these?  And there, it doesn't really make sense that the range be const or immutable -- only that the underlying data is?
June 26, 2020
On Friday, 26 June 2020 at 14:46:24 UTC, Joseph Rushton Wakeling wrote:
> Have I missed anything in noting that ranges can be divided roughly along the following lines:
>
>   - processing external input, where the internal state is just some small
>     memory buffer or other collection of variables used to store the latest
>     data read
>
>   - generative mechanisms where the range elements are produced by a sequential
>     algorithm
>
>   - iteration across data stored in memory
>
> ... where immutability only really makes sense in the first place w.r.t. the last of these?  And there, it doesn't really make sense that the range be const or immutable -- only that the underlying data is?

Because const and immutable are transitive, any data structure that includes a range, either directly or by reference, is itself forced to be mutable. This makes D's const and immutable significantly less useful than they otherwise would be; i.e., it makes D a worse language.

The fact that you, personally, cannot think of a use-case for immutable ranges off the top of your head is not a strong argument against supporting them. All language and library features ought to work together unless there is a specific reason for them not to (as is the case, for example, with garbage collection and BetterC).

So I think the burden of proof is on you here. If we're designing a new range interface, is there a good reason it *shouldn't* work with const and immutable?

One argument in favor of requiring mutability is that using tail() instead of popFront() could result in a loss of performance. I am working on investigating that claim using the examples provided by Jon Degenhardt earlier in this thread. Another, which I find less convincing, is that writing algorithms recursively instead of iteratively would be too "difficult" or "onerous". If you have any more to add, I am interested in hearing them.

[1] https://issues.dlang.org/show_bug.cgi?id=5377
June 26, 2020
On Wednesday, 24 June 2020 at 17:00:37 UTC, Jon Degenhardt wrote:
> On Tuesday, 23 June 2020 at 21:19:39 UTC, Paul Backus wrote:
>> Do you (or anyone else reading this--feel free to chime in) have an example in mind of such an "elaborate range"? If so, I'd be happy to do the legwork of running additional experiments. No need to settle for speculation here when we can have actual data.
>
> I don't think they'll satisfy what you are looking for, but I do have a few examples of ranges doing non-trivial things in public repos. You are welcome to take a look:
>
> * bufferedByLine - https://github.com/eBay/tsv-utils/blob/master/common/src/tsv_utils/common/utils.d#L831
> * inputSourceRange - https://github.com/eBay/tsv-utils/blob/master/common/src/tsv_utils/common/utils.d#L1443
> * parseFieldList - https://github.com/eBay/tsv-utils/blob/master/common/src/tsv_utils/common/fieldlist.d#L291
> * findFieldGroups - https://github.com/eBay/tsv-utils/blob/master/common/src/tsv_utils/common/fieldlist.d#L1045
> * parseNumericFieldList - https://github.com/eBay/tsv-utils/blob/master/common/src/tsv_utils/common/fieldlist.d#L2029
>
> None of them are used in performance sensitive regions of code, so they haven't been benchmarked or optimized.

A correction to the last entry. The line number in the URL is to a different range. Still a valid range to look at, but in the end it just returns a Phobos range. The one I meant to include is a more typical range. The correct entries, replacing the last one:

* parseNumericFieldGroup - https://github.com/eBay/tsv-utils/blob/master/common/src/tsv_utils/common/fieldlist.d#L2029
* parseNumericFieldList - https://github.com/eBay/tsv-utils/blob/master/common/src/tsv_utils/common/fieldlist.d#L2482

June 27, 2020
On Friday, 26 June 2020 at 16:03:23 UTC, Paul Backus wrote:
> The fact that you, personally, cannot think of a use-case for immutable ranges off the top of your head is not a strong argument against supporting them. All language and library features ought to work together unless there is a specific reason for them not to (as is the case, for example, with garbage collection and BetterC).
>
> So I think the burden of proof is on you here. If we're designing a new range interface, is there a good reason it *shouldn't* work with const and immutable?

There's no need to be combative here.  I'm not trying to argue a side, but to make sure we get clarity on what it means for ranges to be compatible with const and immutable in this way.

Whether we like it or not, pretty much all ranges currently work via some under-the-hood mutability, whether it's just updating the pointer of a slice, incrementing an internal variable, or reading from some external input into an internal buffer.

In some cases there are ready ways out of that.  For a forward range whose internal state is cheap to duplicate, there should be no problem just (automatically) making a mutable deep copy to iterate over.  If we're talking about a range that iterates over the contents of some memory locations, it seems natural enough that it should be possible to automatically create a mutable shallow copy that iterates over immutable memory locations (analogous to immutable(T[]) => immutable(T)[] and so on).

But -- for example -- what would it mean to have an input range as part of a const or immutable data structure?  Forget the strict interpretation of the current D keywords, let's just think conceptually for a moment.  Do we want to think of an immutable input range as just a source of input, with any under-the-hood mutability just an implementation detail that shouldn't be of interest to the external caller?  Do we consider it a contradiction in terms?

I'm inclined to agree with the suggestion that input ranges should either be non-copyable or have reference semantics, but I don't think that really resolves the issue given the transitiveness of const and immutable.

I also think it may be worth making a point of comparison to Rust's iterators.  Obviously Rust has a different approach to immutability than D -- the main concern is preventing multiple mutable references to the same piece of data -- but there is a very clear separation between iterators versus what they iterate over, and generally speaking (probably because of that separation) iterators themselves seem to be always implemented as mutable.

> One argument in favor of requiring mutability is that using tail() instead of popFront() could result in a loss of performance. I am working on investigating that claim using the examples provided by Jon Degenhardt earlier in this thread.

FWIW my objection to `tail` was probably misplaced.  What I was interested in was coming from a completely different angle to the const/immutable concern.

Specifically, I'm interested in drawing a clear line between input ranges -- where it is not really a given that `front` can be pre-populated -- versus forward ranges.  For example, given a RandomSample, one arguably does not want `front` to be pre-determined on construction, but from the moment one starts consuming the range.  Given the current range API that means some unpleasant is-this-the-first-call special-casing under the hood of the `front` method.

So, what I'm interested in is whether we can reliably rework the input range API such that the only way to determine the next element is to actually fetch it.  That would also work with a `tail`-like approach if the function concerned would return something like:

    Tuple!(Option!(ElementType, None), Tail)

... but I'm not sure if that would interfere with other concerns about that approach (e.g. would it get in the way of optimizing performance of the recursive approach?).

> Another, which I find less convincing, is that writing algorithms recursively instead of iteratively would be too "difficult" or "onerous". If you have any more to add, I am interested in hearing them.

I don't have specific examples, but it's worth noting that one of the plus points of D is that it tends to allow you to write any given code in the simplest and most natural way, rather than forcing you through more complicated paradigms for the sake of it.

So, I'd personally be reluctant to force all writers of ranges to write their code in a more complicated/virtuous way unless their use case specifically requires it.

With that in mind, it might be better to identify ways to simplify the _implementation_ of ranges that need to support const/immutable use cases.  That is, keep it pay-as-you-go -- you don't have to write ranges in a way that supports const/immutable if you don't have that use-case -- but if you do, the cost you have to pay is less.
June 27, 2020
On Saturday, 27 June 2020 at 12:25:34 UTC, Joseph Rushton Wakeling wrote:

> But -- for example -- what would it mean to have an input range as part of a const or immutable data structure?  Forget the strict interpretation of the current D keywords, let's just think conceptually for a moment.  Do we want to think of an immutable input range as just a source of input, with any under-the-hood mutability just an implementation detail that shouldn't be of interest to the external caller?  Do we consider it a contradiction in terms?

An immutable or const input range just cannot be. An input range can't not be mutable.
If a range itself can implement a `tail()`, it's not an input range, it's a forward range. The only way for `tail()` to work with input ranges is to be a free function:

R tail(R)(R r)
if (isInputRangeWithAHypotheticalAPI!R)
{
    r.next(); // fetchNext(), advance(), whatever the API is
    return move(r);
}

and used like this:

input = move(input).tail;

(Yes, I'm still insisting that input ranges must be non-copyable).

> I also think it may be worth making a point of comparison to Rust's iterators.  Obviously Rust has a different approach to immutability than D -- the main concern is preventing multiple mutable references to the same piece of data -- but there is a very clear separation between iterators versus what they iterate over, and generally speaking (probably because of that separation) iterators themselves seem to be always implemented as mutable.

True input ranges are basically Rust's iterators, i.e. their API should be

Optional!T next(); // in other words, a bool and a T

or variations thereof, as outlined by Andrei some pages back. IMHO, the above API is the cleanest. But there is also place for buffered input ranges (i.e. the currently existing API), see below.

> So, what I'm interested in is whether we can reliably rework the input range API such that the only way to determine the next element is to actually fetch it.

Both APIs have their place. A "true" input range (i.e. a generator) yields temporaries, and only needs a "fetch next" primitive. But the *algorithms* may require a buffer. Consider a `find`: it needs to return the needle and the remainder of the range. In order to find the needle it needs to pop it off from the "fetch next" haystack, irreversibly advancing it. If we want to keep `find` generic over input and forward ranges (i.e. returning a range), it will have to wrap the haystack into a buffered input range with the front/popFront API, placing the found needle at the front, i.e. hypothetically:

auto find(alias pred, Range, Needle)(Range range, auto ref const Needle needle)
if (isInputRangeWithAHypotheticalAPI!R)
{
    static struct Result
    {
        private typeof(Range.init.next()) buf;
        private Range src;

        // So long as the Optional holds a valid value, range is not empty
        @property bool empty() const { return !buf; }

        // Accessing an empty Optional would be illegal,
        // so the non-empty() invariant is observed.
        ref front() inout { return buf.value; }

        void popFront() { buf = src.next(); }
    }

    typeof(range.next()) current; // i.e. an Optional!ElementType
    while (true)
    {
        current = range.next;
        if (!current || pred(current.value, needle))
            break;
    }

    return Result(move(current), move(range));
}
June 27, 2020
On Saturday, 27 June 2020 at 13:11:09 UTC, Stanislav Blinov wrote:
> On Saturday, 27 June 2020 at 12:25:34 UTC, Joseph Rushton Wakeling wrote:
>
>> But -- for example -- what would it mean to have an input range as part of a const or immutable data structure?  Forget the strict interpretation of the current D keywords, let's just think conceptually for a moment.  Do we want to think of an immutable input range as just a source of input, with any under-the-hood mutability just an implementation detail that shouldn't be of interest to the external caller?  Do we consider it a contradiction in terms?
>
> An immutable or const input range just cannot be. An input range can't not be mutable.
> If a range itself can implement a `tail()`, it's not an input range, it's a forward range.

This is false.

While it is true that a pure input range cannot exist without the presence of mutable state *somewhere*, it does not follow that the mutable state must be stored in (or pointed to by) the range itself. Consider the following example:

    auto fdByChar(int fd)
    {
        import core.sys.posix.unistd;

        static struct Result
        {
            private int fd;
            bool empty;
            char front;

            Result tail() const
                in (!empty)
            {
                char next;
                ssize_t nread = read(fd, cast(void*) &next, 1);

                if (nread == 0)
                    return Result(fd, true, char.init);
                else if (nread == 1)
                    return Result(fd, false, next);
                else
                    throw new Exception("Error reading file");
            }
        }

        return Result(fd, false, char.init).tail;
    }


This is a pure input range. Copying it and iterating the copy will invalidate the original. Yet it is perfectly capable of being `immutable`. Here's an example of usage:

    void each(alias fun, Range)(Range range)
    {
        if (range.empty) return;
        fun(range.front);
        range.tail.each!fun;
    }

    void main()
    {
        import std.stdio;
        immutable stdinByChar = fdByChar(0);
        stdinByChar.each!writeln;
    }


Of course, there is mutable state involved here, but that state is not in the range--it's in the kernel. So the `immutable` qualifier on the range does not apply to it.