April 16, 2018
On 04/16/2018 06:15 AM, Shachar Shemesh wrote:
> Input ranges provide no guarantees about copying the range itself. If you do it and expect *anything* (which it seems you do), you have a bug. If you need to copy the range itself, you absolutely need to require a forward range.
> 
> A forward range with ref front absolutely promises that both iterations reference the same elements.
> 
> I find myself sincerely hoping that you are making these examples up in order to win this discussion turned into an argument for some reason, instead of the way phobos is actually written, because if you're right phobos is *broken*.
> 
> The good news is that if Phobos is broken, this is a great way to flush out the places it's broken. Let's add UTs calling all Phobos functions accepting input ranges (as opposed to forward ranges) with non-copyable ranges (i.e. - the range itself is non-copyable) and fix the compilation errors. This shouldn't take more than a few days to do.

I'm currently attempting something similar: fix Phobos to work with RefRange. RefRange is copyable, but assignment is weird. Assignment isn't part of the range interface, so I'm replacing it with `move`:

https://issues.dlang.org/show_bug.cgi?id=18657
https://github.com/dlang/phobos/pull/6346

That PR doesn't fix all of Phobos, it's just a start. Going over all of Phobos (and getting it merged) might take a while.

Thinking about it, those `move`s probably fix the code for non-copyable ranges as well. There might be considerable overlap here.
April 16, 2018
On Monday, April 16, 2018 07:15:36 Shachar Shemesh via Digitalmars-d wrote:
> On 16/04/18 01:28, Jonathan M Davis wrote:
> > The fact that foreach copies the range that it's given makes using ref with anything other than arrays a bit of an iffy proposition. It will compile regardless of whether front returns by ref or not (which arguably is a bug), but it will only affect the original elements if copying the range doesn't copy the elements
>
> You keep referring to ranges that contain the data, but I am racking my mind and failing to find examples.

Pretty much any generative range does. e.g. iota contains data, and each element is then calculated from the current value. The same goes for stuff like random number generators. It's quite common for ranges to have popFront calculate the next front and store it as a member rather than have front do the calculation, since then multiple calls to front don't incur the cost of whatever has to be done to get the next element, and they can just return the same value each time. In fact, that's generally considered to be best practice. It's true that most ranges do not contain every element directly in a struct, but some do (e.g. std.range.only does, which is a way to generate a range from a list of elements without allocating anything on the heap), and many ranges at least store the value for front. So, I'd say that a fairly large percentage of ranges store at least one value directly in the range object.

> Plus, as I've said before, if ranges are expected to contain data, copying them seems like an incredibly bad idea. The whole idea behind casually copying the range about is that it's cheap to do.

Yes, it's generally assumed that it's cheap to copy a range, but it's also generally assumed by D code in general that copying is cheap. Relatively few functions in Phobos accept arguments by ref. Some do accept by auto ref - either to forward the refness or to avoid a copy - but since passing by ref doesn't work with rvalues, most code in Phobos doesn't do so unless it's intended to mutate the argument and give the caller access to the result. I would strongly suggest that anyone who is looking to put an object that is expensive to copy in a range consider making it so that that object is on the heap rather than directly in the range.

> > But as it stands, using ref with foreach in generic code is not going to go well. If anything, it's actually a code smell.
>
> You are conflating ref with owning data, which I find unsubstantiated.

My point is that if you have

foreach(ref e; range)
{
}

then unless range is a dynamic array, the odds are very high that it's a bug, because then you risk problems like

foreach(ref e; range)
{
    e = foo;
}

Outside of dynamic arrays, that assignment is usually pointless. ref implies that it's going to actually affect the range that you passed to foreach, but in most cases, it won't. Because the range is copied, in order for ref to affect elements of the range that is passed to foreach, the range must be a reference type or must act like a dynamic array in the sense that each copy refers to exactly the same objects - which is true for some ranges (e.g. those over a container) - but for many ranges, it isn't (e.g. generative ranges aren't usually backed by anything). Also, the front of most ranges does not return by ref, which would be required for the ref on foreach to actually allow you to mutate the element. As such, I'd argue that the fact that ref is allowed on foreach for all ranges is a bug. At minimum, it should require that front return by ref, and even then, depending on the range, it won't necessary work to assign to the element and have it affect the range that was copied when it was passed to foreach. So, while it will work in _some_ cases to pass a range to foreach and then use ref, in general, I would consider it a code smell, because the odds are high that it does not do what the programmer who wrote it assumed that it would do.

> > And if a templated function fails to compile if it's given a particular argument, and that failure is not due to the template constraint, that usually means that the template constraint is incomplete.
>
> I do not accept this argument outright. In fact, our experience at Weka has drove us to the exact opposite direction.
>
> We deliberately set up functions to include invalid cases and then either let them fail because of lacking constraints, or even set up a static assert to fail the function generation rather than fail matching.
>
> We found that the error messages produced by the alternative are simply too difficult to parse out and figure what went wrong.

For better or for worse, it's generally considered a bug in Phobos if it's possible to write code that gets past the template constraint and then does not compile. There are reasons why that can be a problematic approach, but that is the approach that we've taken thus far. As such, if isInputRange started allowing non-copyable elements, it would be expected that all range-based functions in Phobos would then have to either require copyability in their template constraint or actually work with non-copyable types.

> > it would mean that _every_ range-based function then has to worry about non-copyable elements
>
> No, just those in libraries. The same way they have to worry about big O complexity, type safety, @nogc and a bunch of other things.
>
> When you write a library, you have to think about your users. This doesn't change anything about that.

It changes it by adding yet another complication to worry about when implementing ranges and range-based functions that no one implementing range-based code currently has to worry about. And writing bug-free generic, range-based code can already be surprisingly difficult without adding yet another complication into the mix.

> > Not all algorithms can call front just once without
> > copying it
>
> Huh? Where did that come from?
>
> Phobos already assumes that calling front needs to be an O(1) operation. You can call front as many times as you like. Why would you assume otherwise?

Sure, you can call front as many times as you want. But in some cases, it's more expensive to call front multiple times than it is to call it once and store the result. Also, while the result of front may be equivalent every time, it may not be exactly the same object every time. For instance,

auto r2 = range.map!(a => to!string())();

is going to to result in a range that allocates a string every time that front is called. This sort of thing is not an uncommon thing to do with map, and while many ranges are implemented to do most of their work in popFront (thereby making calling front cheap), there's no requirement that they do so. So, there are ranges that do their work in front rather than popFront, and in the case of map, it actually has to be done in front, because map can be random-access, and using popFront to calculate the value only works for front, not arbitrary elements. In such cases, calling front multiple times is definitely worse than just calling it once and storing the result. I think that pretty much the only time that it's better to call front multiple times rathre than calling it once and storing the result is when it returns by ref, because if it doesn't return by ref, you're either going to get a copy every time you call front, or the range is going to be creating a new object every time you call front. In those cases, storing the result of front and reusing it would be cheaper. And since most ranges do not return by ref, most ranges are not avoiding copies or any expensive calculations by calling front multiple times. If anything, they're incurring them.

Now, arguably, in the cases where a generic function needs to access front multiple times, it would be best if it tested front for refness and then used static ifs so that it stored the result of front when it didn't return by ref and called it multiple times when it did, but that would get ugly fast, and I'm not aware of any Phobos code that currently does that.

> > The result of all of this is that in generic code, you have no clue
> > whatsoever what the semantics are of copying a range around, because it
> > depends entirely on the copying semantics of whatever range that code is
> > currently being instantiated with. As such, generic code basically has
> > to
> > assume that once you copy a range, you can't mutate the original (since
> > it may or may not be affected by mutating the copy and may or may not
> > be cleanly affected by mutating the copy),
>
> Jonathan, I'm sorry, but what you are describing is called "a bug".
>
> Input ranges provide no guarantees about copying the range itself. If you do it and expect *anything* (which it seems you do), you have a bug. If you need to copy the range itself, you absolutely need to require a forward range.
>
> A forward range with ref front absolutely promises that both iterations reference the same elements.
>
> I find myself sincerely hoping that you are making these examples up in order to win this discussion turned into an argument for some reason, instead of the way phobos is actually written, because if you're right phobos is *broken*.
>
> The good news is that if Phobos is broken, this is a great way to flush out the places it's broken. Let's add UTs calling all Phobos functions accepting input ranges (as opposed to forward ranges) with non-copyable ranges (i.e. - the range itself is non-copyable) and fix the compilation errors. This shouldn't take more than a few days to do.

I'm not making anything up. Very little range-based code passes ranges by reference. They're copied by value all the time, even if they're not forward ranges. foreach itself copies a range before iterating over it. Yes, if you want to iterate over two independent copies, you need a forward range, and you need to call save, but ranges are copied constantly. It's just that it's then a bug to do anything with the original after the copy has been made. As it stands, a range that cannot be copied is going to work with almost nothing in Phobos and would be borderline useless. And adding ref everywhere to range-based functions to change that would destroy our ability to chain function calls with ranges. Also, a large percentage of range-based functions return a range that wraps another range, and that means copying the range that's passed in. It simply wouldn't work with many functions to accept a range by ref without then copying the range inside the function anyway. The assumption that input ranges are copyable is built into almost everything range-based in all of Phobos. The rare exceptions are functions like std.conv.parse which are specifically intended to mutate the range that's passed in rather than copying it.

> >> And allowing ranges with uncopyable members would turn this potential
> >> deadly run-time bugs into easy to find compile time errors. Sound like
> >> a
> >> great reason to allow it, to me.
> >
> > Ranges have to be copyable.
>
> And why would a range iterating over uncopyable objects not be copyable? How are the two related?

Because many ranges do stuff like store the value of front after calculating it with popFront or store some set of values to calculate new values. No such range would work with non-copyable element types even if isInputRange allowed them. Now, that doesn't mean that _some_ ranges couldn't be made to work with non-copyable element types, but the fact that ranges must be copyable means that any range that worked with non-copyable element types could not be a struct that stored any of its elements as a direct member.

- Jonathan M Davis

April 17, 2018
This is going nowhere.

Let's flesh this out face to face at dconf.

Shachar

On 16/04/18 12:01, Jonathan M Davis wrote:
> On Monday, April 16, 2018 07:15:36 Shachar Shemesh via Digitalmars-d wrote:
>> On 16/04/18 01:28, Jonathan M Davis wrote:
>>> The fact that foreach copies the range that it's given makes using ref
>>> with anything other than arrays a bit of an iffy proposition. It will
>>> compile regardless of whether front returns by ref or not (which
>>> arguably is a bug), but it will only affect the original elements if
>>> copying the range doesn't copy the elements
>>
>> You keep referring to ranges that contain the data, but I am racking my
>> mind and failing to find examples.
> 
> Pretty much any generative range does. e.g. iota contains data, and each
> element is then calculated from the current value. The same goes for stuff
> like random number generators. It's quite common for ranges to have popFront
> calculate the next front and store it as a member rather than have front do
> the calculation, since then multiple calls to front don't incur the cost of
> whatever has to be done to get the next element, and they can just return
> the same value each time. In fact, that's generally considered to be best
> practice. It's true that most ranges do not contain every element directly
> in a struct, but some do (e.g. std.range.only does, which is a way to
> generate a range from a list of elements without allocating anything on the
> heap), and many ranges at least store the value for front. So, I'd say that
> a fairly large percentage of ranges store at least one value directly in the
> range object.
> 
>> Plus, as I've said before, if ranges are expected to contain data,
>> copying them seems like an incredibly bad idea. The whole idea behind
>> casually copying the range about is that it's cheap to do.
> 
> Yes, it's generally assumed that it's cheap to copy a range, but it's also
> generally assumed by D code in general that copying is cheap. Relatively few
> functions in Phobos accept arguments by ref. Some do accept by auto ref -
> either to forward the refness or to avoid a copy - but since passing by ref
> doesn't work with rvalues, most code in Phobos doesn't do so unless it's
> intended to mutate the argument and give the caller access to the result. I
> would strongly suggest that anyone who is looking to put an object that is
> expensive to copy in a range consider making it so that that object is on
> the heap rather than directly in the range.
> 
>>> But as it stands, using ref with foreach in generic code is not going to
>>> go well. If anything, it's actually a code smell.
>>
>> You are conflating ref with owning data, which I find unsubstantiated.
> 
> My point is that if you have
> 
> foreach(ref e; range)
> {
> }
> 
> then unless range is a dynamic array, the odds are very high that it's a
> bug, because then you risk problems like
> 
> foreach(ref e; range)
> {
>      e = foo;
> }
> 
> Outside of dynamic arrays, that assignment is usually pointless. ref implies
> that it's going to actually affect the range that you passed to foreach, but
> in most cases, it won't. Because the range is copied, in order for ref to
> affect elements of the range that is passed to foreach, the range must be a
> reference type or must act like a dynamic array in the sense that each copy
> refers to exactly the same objects - which is true for some ranges (e.g.
> those over a container) - but for many ranges, it isn't (e.g. generative
> ranges aren't usually backed by anything). Also, the front of most ranges
> does not return by ref, which would be required for the ref on foreach to
> actually allow you to mutate the element. As such, I'd argue that the fact
> that ref is allowed on foreach for all ranges is a bug. At minimum, it
> should require that front return by ref, and even then, depending on the
> range, it won't necessary work to assign to the element and have it affect
> the range that was copied when it was passed to foreach. So, while it will
> work in _some_ cases to pass a range to foreach and then use ref, in
> general, I would consider it a code smell, because the odds are high that it
> does not do what the programmer who wrote it assumed that it would do.
> 
>>> And if a templated function fails to compile if it's given a particular
>>> argument, and that failure is not due to the template constraint, that
>>> usually means that the template constraint is incomplete.
>>
>> I do not accept this argument outright. In fact, our experience at Weka
>> has drove us to the exact opposite direction.
>>
>> We deliberately set up functions to include invalid cases and then
>> either let them fail because of lacking constraints, or even set up a
>> static assert to fail the function generation rather than fail matching.
>>
>> We found that the error messages produced by the alternative are simply
>> too difficult to parse out and figure what went wrong.
> 
> For better or for worse, it's generally considered a bug in Phobos if it's
> possible to write code that gets past the template constraint and then does
> not compile. There are reasons why that can be a problematic approach, but
> that is the approach that we've taken thus far. As such, if isInputRange
> started allowing non-copyable elements, it would be expected that all
> range-based functions in Phobos would then have to either require
> copyability in their template constraint or actually work with non-copyable
> types.
> 
>>> it would mean that _every_ range-based function then has to worry about
>>> non-copyable elements
>>
>> No, just those in libraries. The same way they have to worry about big O
>> complexity, type safety, @nogc and a bunch of other things.
>>
>> When you write a library, you have to think about your users. This
>> doesn't change anything about that.
> 
> It changes it by adding yet another complication to worry about when
> implementing ranges and range-based functions that no one implementing
> range-based code currently has to worry about. And writing bug-free generic,
> range-based code can already be surprisingly difficult without adding yet
> another complication into the mix.
> 
>>> Not all algorithms can call front just once without
>>> copying it
>>
>> Huh? Where did that come from?
>>
>> Phobos already assumes that calling front needs to be an O(1) operation.
>> You can call front as many times as you like. Why would you assume
>> otherwise?
> 
> Sure, you can call front as many times as you want. But in some cases, it's
> more expensive to call front multiple times than it is to call it once and
> store the result. Also, while the result of front may be equivalent every
> time, it may not be exactly the same object every time. For instance,
> 
> auto r2 = range.map!(a => to!string())();
> 
> is going to to result in a range that allocates a string every time that
> front is called. This sort of thing is not an uncommon thing to do with map,
> and while many ranges are implemented to do most of their work in popFront
> (thereby making calling front cheap), there's no requirement that they do
> so. So, there are ranges that do their work in front rather than popFront,
> and in the case of map, it actually has to be done in front, because map can
> be random-access, and using popFront to calculate the value only works for
> front, not arbitrary elements. In such cases, calling front multiple times
> is definitely worse than just calling it once and storing the result. I
> think that pretty much the only time that it's better to call front multiple
> times rathre than calling it once and storing the result is when it returns
> by ref, because if it doesn't return by ref, you're either going to get a
> copy every time you call front, or the range is going to be creating a new
> object every time you call front. In those cases, storing the result of
> front and reusing it would be cheaper. And since most ranges do not return
> by ref, most ranges are not avoiding copies or any expensive calculations by
> calling front multiple times. If anything, they're incurring them.
> 
> Now, arguably, in the cases where a generic function needs to access front
> multiple times, it would be best if it tested front for refness and then
> used static ifs so that it stored the result of front when it didn't return
> by ref and called it multiple times when it did, but that would get ugly
> fast, and I'm not aware of any Phobos code that currently does that.
> 
>>> The result of all of this is that in generic code, you have no clue
>>> whatsoever what the semantics are of copying a range around, because it
>>> depends entirely on the copying semantics of whatever range that code is
>>> currently being instantiated with. As such, generic code basically has
>>> to
>>> assume that once you copy a range, you can't mutate the original (since
>>> it may or may not be affected by mutating the copy and may or may not
>>> be cleanly affected by mutating the copy),
>>
>> Jonathan, I'm sorry, but what you are describing is called "a bug".
>>
>> Input ranges provide no guarantees about copying the range itself. If
>> you do it and expect *anything* (which it seems you do), you have a bug.
>> If you need to copy the range itself, you absolutely need to require a
>> forward range.
>>
>> A forward range with ref front absolutely promises that both iterations
>> reference the same elements.
>>
>> I find myself sincerely hoping that you are making these examples up in
>> order to win this discussion turned into an argument for some reason,
>> instead of the way phobos is actually written, because if you're right
>> phobos is *broken*.
>>
>> The good news is that if Phobos is broken, this is a great way to flush
>> out the places it's broken. Let's add UTs calling all Phobos functions
>> accepting input ranges (as opposed to forward ranges) with non-copyable
>> ranges (i.e. - the range itself is non-copyable) and fix the compilation
>> errors. This shouldn't take more than a few days to do.
> 
> I'm not making anything up. Very little range-based code passes ranges by
> reference. They're copied by value all the time, even if they're not forward
> ranges. foreach itself copies a range before iterating over it. Yes, if you
> want to iterate over two independent copies, you need a forward range, and
> you need to call save, but ranges are copied constantly. It's just that it's
> then a bug to do anything with the original after the copy has been made. As
> it stands, a range that cannot be copied is going to work with almost
> nothing in Phobos and would be borderline useless. And adding ref everywhere
> to range-based functions to change that would destroy our ability to chain
> function calls with ranges. Also, a large percentage of range-based
> functions return a range that wraps another range, and that means copying
> the range that's passed in. It simply wouldn't work with many functions to
> accept a range by ref without then copying the range inside the function
> anyway. The assumption that input ranges are copyable is built into almost
> everything range-based in all of Phobos. The rare exceptions are functions
> like std.conv.parse which are specifically intended to mutate the range
> that's passed in rather than copying it.
> 
>>>> And allowing ranges with uncopyable members would turn this potential
>>>> deadly run-time bugs into easy to find compile time errors. Sound like
>>>> a
>>>> great reason to allow it, to me.
>>>
>>> Ranges have to be copyable.
>>
>> And why would a range iterating over uncopyable objects not be copyable?
>> How are the two related?
> 
> Because many ranges do stuff like store the value of front after calculating
> it with popFront or store some set of values to calculate new values. No
> such range would work with non-copyable element types even if isInputRange
> allowed them. Now, that doesn't mean that _some_ ranges couldn't be made to
> work with non-copyable element types, but the fact that ranges must be
> copyable means that any range that worked with non-copyable element types
> could not be a struct that stored any of its elements as a direct member.
> 
> - Jonathan M Davis
> 

April 17, 2018
On 4/15/18 6:14 AM, Dmitry Olshansky wrote:
> On Sunday, 15 April 2018 at 06:39:43 UTC, Jonathan M Davis wrote:
>> On Sunday, April 15, 2018 07:26:54 Shachar Shemesh via Digitalmars-d wrote:
>>> [...]
>>
>> It's extremely common for range-based functions to copy front. Even foreach does it. e.g.
>>
>> [...]
> 
> 
> Arguably it should “move” them.
> 
> This would have worked if input ranges didn’t have to cache front to support multiple ‘front’ calls before popFront. Which is a painful misfeature really as all algorithms would optimize for touching front once anyway.

Ugh, then you get things like fungetc.

However, there are certainly cases where it's expensive to compute front, and therefore, requires a cache. A new primitive that accesses front and pops it at the same time would be useful. We can have selected algorithms that can deal with it use that primitive instead (and instrument foreach to do it). Potentially, we can create a wrapper for all input ranges such that they support it.

> Input range should “produce” elements not keep cache of them, forward ranges may cache current item alright.

Well, sometimes you HAVE to cache it. For instance File.byLine is clearly an input range, and not a forward range. But it MUST cache because it's i/o that has to be buffered to be looked at! No reason to force caching on the user at that point.

-Steve
April 17, 2018
On Tue, Apr 17, 2018 at 11:14:02AM -0400, Steven Schveighoffer via Digitalmars-d wrote:
> On 4/15/18 6:14 AM, Dmitry Olshansky wrote:
[...]
> > Input range should “produce” elements not keep cache of them, forward ranges may cache current item alright.
> 
> Well, sometimes you HAVE to cache it. For instance File.byLine is clearly an input range, and not a forward range. But it MUST cache because it's i/o that has to be buffered to be looked at! No reason to force caching on the user at that point.
[...]

In the past I've used range-like (i.e., iteration) APIs that do not cache the current element. To be quite frank, they were a royal pain to work with, because they force user code to be needlessly cluttered with ad hoc caching schemes that pollute the code all over the place. Instead of having the range handle the caching, all downstream code that uses the range has to implement their own caching and/or otherwise implement ugly ad hoc workarounds for the lack thereof.  Andrei got it right when he designed the range API with a cached .front, and I would be opposed to any range API design that involves a non-caching .front.

For generative ranges like std.range.generate, well, .generate exists precisely for that reason: to be the single implementation of a caching .front that wraps around a non-caching generating function, so that range implementers don't have to write their own caching code, but only need to care about writing the generating function and let .generate handle the caching for them.  The downstream user also doesn't need to worry about the odd semantics of a non-caching .front, but can use the range API as usual.  Win-win.


T

-- 
Recently, our IT department hired a bug-fix engineer. He used to work for Volkswagen.
April 17, 2018
On Tuesday, 17 April 2018 at 15:14:02 UTC, Steven Schveighoffer wrote:
> On 4/15/18 6:14 AM, Dmitry Olshansky wrote:
>> On Sunday, 15 April 2018 at 06:39:43 UTC, Jonathan M Davis wrote:
>>> On Sunday, April 15, 2018 07:26:54 Shachar Shemesh via Digitalmars-d wrote:
>>>> [...]
>>>
>>> It's extremely common for range-based functions to copy front. Even foreach does it. e.g.
>>>
>>> [...]
>> 
>> 
>> Arguably it should “move” them.
>> 
>> This would have worked if input ranges didn’t have to cache front to support multiple ‘front’ calls before popFront. Which is a painful misfeature really as all algorithms would optimize for touching front once anyway.
>
> Ugh, then you get things like fungetc.
>

For many algorithms indeed you can’t stop w/o peeking ahead. It’s just that lookahead of 1 is hardcoded in the range definition b/c it’s enough for a lot of things.

But then once you are past lookahead of 1, it has to do .save + restore if overshot. ungetc of 2 if you’d like.

> However, there are certainly cases where it's expensive to compute front, and therefore, requires a cache. A new primitive that accesses front and pops it at the same time would be useful. We can have selected algorithms that can deal with it use that primitive instead (and instrument foreach to do it).

Might be an option. But as long as you have front/popFront/empty as well, no concurrent stuff.

> Potentially, we can create a wrapper for all input ranges such that they support it.
>
>> Input range should “produce” elements not keep cache of them, forward ranges may cache current item alright.
>
> Well, sometimes you HAVE to cache it. For instance File.byLine

File caches, byLine actually does
“read up until EOL and give me slice of that”

I do not see where caching of char[] plays any significant role. Except that due to how ranges work it likely has to do the readline at _empty_ or on construction + popFront.

> is clearly an input range, and not a forward range. But it MUST cache because it's i/o that has to be buffered to be looked at! No reason to force caching on the user at that point.
> -Steve


April 20, 2018
> foreach(ref e; range)
> {
> }

On idea is to have `ref` foreach to say that you would like to iterate your range without copying. The syntax could be:

foreach(e; ref range)
{
}

or:
ref foreach(e; range)
{
}
At least it will not break existing code. But this means that in each case you should make a decision about `ref`/non ref.

1 2
Next ›   Last »