Jump to page: 1 2
Thread overview
isInputRange is false for non-copyable types
Apr 15, 2018
Shachar Shemesh
Apr 15, 2018
Jonathan M Davis
Apr 15, 2018
Dmitry Olshansky
Apr 15, 2018
Dmitry Olshansky
Apr 15, 2018
Jonathan M Davis
Apr 15, 2018
Dmitry Olshansky
Apr 15, 2018
Jonathan M Davis
Apr 17, 2018
H. S. Teoh
Apr 17, 2018
Dmitry Olshansky
Apr 15, 2018
Shachar Shemesh
Apr 15, 2018
Jonathan M Davis
Apr 16, 2018
Shachar Shemesh
Apr 16, 2018
ag0aep6g
Apr 16, 2018
Jonathan M Davis
Apr 17, 2018
Shachar Shemesh
Apr 20, 2018
Uranuz
April 15, 2018
import std.range;
import std.stdio;

struct S {
    int a;

    @disable this(this);
}

void main() {
    writeln(isInputRange!(S[])); // Prints false
}

The reason, as far as I can tell, is that an input range requires that we can do:

auto a = ir.front; // Assuming ir isn't empty

That doesn't work for non-copyable types, for obvious reasons.

With that said, that seems more like a deficiency in the input range definition than anything. There is no reason we shouldn't be able to iterator type operations over non-copyable types.

Shachar
April 15, 2018
On Sunday, April 15, 2018 07:26:54 Shachar Shemesh via Digitalmars-d wrote:
> import std.range;
> import std.stdio;
>
> struct S {
>      int a;
>
>      @disable this(this);
> }
>
> void main() {
>      writeln(isInputRange!(S[])); // Prints false
> }
>
> The reason, as far as I can tell, is that an input range requires that we can do:
>
> auto a = ir.front; // Assuming ir isn't empty
>
> That doesn't work for non-copyable types, for obvious reasons.
>
> With that said, that seems more like a deficiency in the input range definition than anything. There is no reason we shouldn't be able to iterator type operations over non-copyable types.

It's extremely common for range-based functions to copy front. Even foreach does it. e.g.

import std.stdio;

struct S
{
    this(int i)
    {
        _i = i;
    }

    this(this)
    {
        writefln("copied: %s", _i);
    }

    int _i;
}

void main()
{
    foreach(e; [S(0), S(1), S(2)])
    {
    }
}

prints

copied: 0
copied: 1
copied: 2

If non-copyable types were allowed, then no function which accepted a range would be allowed to copy front without first checking that front was copyable (something that most code simply isn't going to do), and some functions outright require being able to copy front. They could test for copyability, but that would complicate a lot of range-based functions for a relatively rare corner case (much as it obviously matters to any code that has that corner case).

Remember also that a large percentage of ranges that don't wrap dynamic arrays put their elements on the stack, which means that whenever the range is copied, the elements are copied. e.g. if the above example uses std.range.only instead of an array literal, it actually prints

copied: 0
copied: 1
copied: 2
copied: 0
copied: 1
copied: 2
copied: 0
copied: 1
copied: 2

I'm not sure why it ends up printing each of them 3 times rather than 2, but the fact that foreach copies any range that it's given means that in order to have ranges allow non-copyable types, we'd have to change how foreach worked, which would break a lot of code. Right now,

foreach(e; range)
{
}

is lowered to something like

for(auto __range = range; !__range.empty; __range.popFront())
{
    auto e = __range.front;
}

That requires both copying the range and copying front. We could theoretically change it so that everywhere that e was used, it would be replaced with __range.front so that front wasn't copied and probably wouldn't break code in the process (though it could affect performance), but if we made it so that the range itself wasn't copied, then instead of iterating over a copy, foreach would consume the original range, which would definitely break a lot of code.

Now, generic range-based code really shouldn't ever use a range after it's been copied, since whether mutating the copy affects the original depends on the implementation of the range (and thus generic code should assume that foreach may have consumed the range and call save if it doesn't want that to happen), but lots of code does it anyway, and if the code isn't generic, it can work just fine, since then the code can depend on the semantics of that specific range type instead of having to work with the semantics of arbitrary ranges. e.g. if code is written specifically for dynamic arrays rather than ranges in general, then calling save to iterate over a separate slice would be silly, but if the code doesn't currently call save, and foreach were changed to not copy the range, then all of a sudden code like

string str = "hello world";
foreach(c; str)
{
}

would consume the dynamic array instead of iterating over a separate slice.

Also, in order for a range to work with a non-copyable type, either front would have to return by ref or it would have to construct a new object to return so that it could be moved rather than copied. I expect that quite a few ranges would have problems with such a restriction, and certainly, even if a range could have its front changed to return by ref, the range would no longer be able to guarantee that the caller couldn't mutate the value of front, which would cause problems in at least some cases.

In addition, it's quite possible that forcing functions to not copy front would hurt performance. In some cases, by using front directly, you can avoid a copy (e.g. it returns by ref), but in others, the value of front is calculated every time it's called. So, depending on the implementation of the range, calling front repeatedly can be more expensive than simply copying the value and reusing it. As such, it's not necessarily a good idea to force range-based code to not copy front. Also, it's not uncommon for ranges to store the value of front (e.g. it's calculated every time that popFront is called), and that wouldn't work with non-copyable types. Also, if such ranges were changed to calculate the value in front to then work with non-copyable types, whatever calculation they were doing would then have to be done every time front was called instead of being guaranteed to only happen once, which would harm performance in many cases.

So, while aside from the issues with foreach, some specific cases might work with non-copyable types, it causes a lot of complications for ranges in general, and best case, we'd end up with most existing range-based code failing to compile with non-copyable types, and we'd have to go to a bunch of extra work in Phobos to either rework stuff so that it would work with non-copyable types or make it test for copyability. Either way, most range-based code outside of Phobos would almost certainly continue to fail to compile with non-copyable types. We have enough problems already because of save frequently not being used correctly and how badly reference type ranges interact with everything because of that. Having to support non-copyable types would then add yet one more thing that many range implementations would fail miserably at and that solid range implementations would have to test for.

And of course, I don't see how we could support non-copyable types even if we wanted to because of the issues with foreach. Just like with save, we could take a different approach if we didn't care about breaking code, but there's no way that we could make a change like that unless we could find a clean deprecation path to change the behavior, and I have no clue how we could do that with foreach. Either way, a really compelling case would have to be made as to why all of the changes required to support non-copyable types would be worth it, especially considering that this would affect foreach rather than just a stray function or two in Phobos and as such would potentially affect almost every D program in existence.

IIRC, when this has come up previously, Andrei ruled that supporting non-copyable types simply wasn't worth the extra complication, though a quick search doesn't turn up anything where he talked about it. So, I can't verify that at the moment.

- Jonathan M Davis

April 15, 2018
On Sunday, 15 April 2018 at 06:39:43 UTC, Jonathan M Davis wrote:
> On Sunday, April 15, 2018 07:26:54 Shachar Shemesh via Digitalmars-d wrote:
>> [...]
>
> It's extremely common for range-based functions to copy front. Even foreach does it. e.g.
>
> [...]


Arguably it should “move” them.

This would have worked if input ranges didn’t have to cache front to support multiple ‘front’ calls before popFront. Which is a painful misfeature really as all algorithms would optimize for touching front once anyway.

Input range should “produce” elements not keep cache of them, forward ranges may cache current item alright.

>
> [...]

April 15, 2018
On Sunday, 15 April 2018 at 10:14:20 UTC, Dmitry Olshansky wrote:
> On Sunday, 15 April 2018 at 06:39:43 UTC, Jonathan M Davis wrote:
>> On Sunday, April 15, 2018 07:26:54 Shachar Shemesh via Digitalmars-d wrote:
>>> [...]
>>
>> It's extremely common for range-based functions to copy front. Even foreach does it. e.g.
>>
>> [...]
>
>
> Arguably it should “move” them.
>
> This would have worked if input ranges didn’t have to cache front to support multiple ‘front’ calls before popFront. Which is a painful misfeature really as all algorithms would optimize for touching front once anyway.
>
> Input range should “produce” elements not keep cache of them, forward ranges may cache current item alright.

On a related note this also destroys idea of “lock-free” or “concurrent” input ranges since front/empty/popFront sequence of calls can’t be cheaply synchronized in as a whole in presence of multiple consumers.


>
>>
>> [...]


April 15, 2018
On Sunday, April 15, 2018 10:14:20 Dmitry Olshansky via Digitalmars-d wrote:
> On Sunday, 15 April 2018 at 06:39:43 UTC, Jonathan M Davis wrote:
> > On Sunday, April 15, 2018 07:26:54 Shachar Shemesh via
> >
> > Digitalmars-d wrote:
> >> [...]
> >
> > It's extremely common for range-based functions to copy front. Even foreach does it. e.g.
> >
> > [...]
>
> Arguably it should “move” them.
>
> This would have worked if input ranges didn’t have to cache front to support multiple ‘front’ calls before popFront. Which is a painful misfeature really as all algorithms would optimize for touching front once anyway.
>
> Input range should “produce” elements not keep cache of them, forward ranges may cache current item alright.

If that were the route that we were going to go, then front and popFront wouldn't make any sense at all. Rather, you'd treat it more like in iterator in Java and have next() return the next element and pop it off at the same time. But even if we did that, it wouldn't have been appropriate in the general case to move the element unless we put serious restrictions on what could be an input range (e.g. that would make no sense whatsoever with dynamic arrays or any kind of container).

In some respects it would be nice if input ranges were completely separate beasts from forward ranges, since ideally, forward ranges would all be value types, and input ranges would all be reference types (in which case, save wouldn't be a thing), but that would cause it's own set of problems, because then it wouldn't really work to have input ranges and forward ranges use the same code, which could get really annoying. Input ranges are kind of an abomination IMHO , because they're so painfully limited, but unfortunately, it simply doesn't work to have all ranges be forward ranges if you don't want to have to do stuff like buffer data. So, figuring out what we would ideally do with input ranges and forward ranges gets to be a bit of an interesting question.

Regardless, even if we could all agree on how ranges would ideally be redesigned if we were doing this stuff from scratch, we're not doing it from scratch, and for better or worse, a lot of existing code depends on the current design. So, while I'd love to have the opportunity to sit down and redesign the range API, I don't see how we really can - not and actually have it implemented as part of the language or Phobos anyway.

- Jonathan M Davis


April 15, 2018
On Sunday, 15 April 2018 at 10:39:32 UTC, Jonathan M Davis wrote:
> On Sunday, April 15, 2018 10:14:20 Dmitry Olshansky via Digitalmars-d wrote:
>> On Sunday, 15 April 2018 at 06:39:43 UTC, Jonathan M Davis wrote:
>> > On Sunday, April 15, 2018 07:26:54 Shachar Shemesh via
>> >
>> > Digitalmars-d wrote:
>> >> [...]
>> >
>> > It's extremely common for range-based functions to copy front. Even foreach does it. e.g.
>> >
>> > [...]
>>
>> Arguably it should “move” them.
>>
>> This would have worked if input ranges didn’t have to cache front to support multiple ‘front’ calls before popFront. Which is a painful misfeature really as all algorithms would optimize for touching front once anyway.
>>
>> Input range should “produce” elements not keep cache of them, forward ranges may cache current item alright.
>
> If that were the route that we were going to go, then front and popFront wouldn't make any sense at all. Rather, you'd treat it more like in iterator in Java and have next() return the next element and pop it off at the same time.

Surprisingly, they might have got that right.

> But even if we did that, it wouldn't have been appropriate in the general case to move the element unless we put serious restrictions on what could be an input range (e.g. that would make no sense whatsoever with dynamic arrays or any kind of container).

There it would copy. You get rvalue either way. Another option is ref scope T. I do not see incompatibility you seem to imply.


> So, while I'd love to have the opportunity to sit down and redesign the range API, I don't see how we really can - not and actually have it implemented as part of the language or Phobos anyway.

Yeah, I think as surely as we can redesign postblit. Let’s not focus on learned helplessnes and “it’s just the way things are”. Recent opMove shows there is great flexibility in what can be done.

Seriously, it’s just fixing a library design issue even if its use proved viral.

>
> - Jonathan M Davis


April 15, 2018
On 15/04/18 09:39, Jonathan M Davis wrote:
> 
> It's extremely common for range-based functions to copy front. Even foreach
> does it. e.g.
> 

Not necessarily:

foreach(ref e; [S(0), S(1), S(2)]){}

While that would work with foreach, it will not work with anything that expects an inputRange (and since D defines a forward range as an extension, this is also not a forward range).

> If non-copyable types were allowed, then no function which accepted a range
> would be allowed to copy front without first checking that front was
> copyable (something that most code simply isn't going to do)

No, that's not correct.

If non-copyable types were allowed, then functions that accept a range would fail to compile on non-copyable types if they do a copy.

But the flip side is that if non-copyable would be allowed, a lot of functions that current copy would not. There are far too many unnecessary copies in the code that serve no purpose, but they are invisible, so they are not fixed.

> 
> Remember also that a large percentage of ranges that don't wrap dynamic
> arrays put their elements on the stack, which means that whenever the range
> is copied, the elements are copied.

If I am guaranteed to be able to copy an input range, what's the difference between it and a forward range?

Strike that, I have a better one: If I am allowed to copy an input range, what does the "save" function in forward range even mean?


> I'm not sure why it ends up printing each of them 3 times rather than 2,

Because some code somewhere does an unnecessary copy?

> the fact that foreach copies any range that it's given means that in order
> to have ranges allow non-copyable types, we'd have to change how foreach
> worked, which would break a lot of code. Right now,

Like I said above, foreach(ref) works with "input ranges" that define ref return from front to uncopyable objects. Foreach without ref doesn't, but that's mandatory from the interface: You cannot iterate copies of a range returning uncopyable objects.

As for ranges that store the values inside them: you REALLY don't want to copy those around. They may get quite big (not to mention that I don't see how you can define one over a non-copyable type).

> 
> foreach(e; range)
> {
> }
> 
> is lowered to something like
> 
> for(auto __range = range; !__range.empty; __range.popFront())
> {
>      auto e = __range.front;
> }

And that indeed generates a lot of confusion regarding what running two foreaches in a row does. This is a bug that already affects more than just uncopyable objects.

> 
> That requires both copying the range and copying front. We could
> theoretically change it so that everywhere that e was used, it would be
> replaced with __range.front

Like I said, already solved for foreach(ref)

> Now, generic range-based code really shouldn't ever use a range after it's
> been copied, since whether mutating the copy affects the original depends on
> the implementation of the range (and thus generic code should assume that
> foreach may have consumed the range and call save if it doesn't want that to
> happen), but lots of code does it anyway,

And allowing ranges with uncopyable members would turn this potential deadly run-time bugs into easy to find compile time errors. Sound like a great reason to allow it, to me.

> and if the code isn't generic, it
> can work just fine, since then the code can depend on the semantics of that
> specific range type

Such as it being copyable? Non-generic code is of no relevance to this discussion, as far as I can tell.

> Also, in order for a range to work with a non-copyable type, either front
> would have to return by ref or it would have to construct a new object to
> return so that it could be moved rather than copied.

Correct

> I expect that quite a
> few ranges would have problems with such a restriction,

Then those ranges will not work with non-copyable objects. That's no reason to make it impossible for *any* range to work with non-copyable.

> In addition, it's quite possible that forcing functions to not copy front
> would hurt performance.

I'm going to stop responding now, because, frankly, I think you and I are having very different discussions.

You seem to be advocating against making *all* ranges support non-copyable types, while I'm merely asking that *any* range be allowed to support such types.

Current situation is that a range with uncopyable types is not considered an input range, and cannot use any phobos range functions, including some that it should be able to use without a problem.

There is no reason to block such ranges from using "map" or "any", except the fact they require that the type answer "isInputRange" with true.

> IIRC, when this has come up previously, Andrei ruled that supporting
> non-copyable types simply wasn't worth the extra complication, though a
> quick search doesn't turn up anything where he talked about it. So, I can't
> verify that at the moment.

I will trust Andrei to weigh in on this point if he thinks he should. Until he does, please feel free to only bring up points you are willing to argue yourself.

Shachar
April 15, 2018
On Sunday, April 15, 2018 15:32:37 Shachar Shemesh via Digitalmars-d wrote:
> On 15/04/18 09:39, Jonathan M Davis wrote:
> > It's extremely common for range-based functions to copy front. Even foreach does it. e.g.
>
> Not necessarily:
>
> foreach(ref e; [S(0), S(1), S(2)]){}
>
> While that would work with foreach, it will not work with anything that expects an inputRange (and since D defines a forward range as an extension, this is also not a forward range).

The fact that foreach copies the range that it's given makes using ref with anything other than arrays a bit of an iffy proposition. It will compile regardless of whether front returns by ref or not (which arguably is a bug), but it will only affect the original elements if copying the range doesn't copy the elements - which in the case of a basic input range rather than a forward range, is generally true, otherwise, it would be a forward range. But as it stands, using ref with foreach in generic code is not going to go well. If anything, it's actually a code smell.

For it to work in general, we'd either have to have foreach stop copying ranges (which would break a _lot_ of code, including code that just uses dynamic arrays), or we'd have to have some way to mark or detect a range as working correctly with ref when it's copied - which would mean a new range trait specifically for that, be that one that tests for a feature (such as hasLength) or one that tests for a whole new range type. And I suspect that we would have to use an enum or somesuch to tag the range as having the right semantics, since I don't think that that can be determined just by looking at the types of its member variables.

> > If non-copyable types were allowed, then no function which accepted a range would be allowed to copy front without first checking that front was copyable (something that most code simply isn't going to do)
>
> No, that's not correct.
>
> If non-copyable types were allowed, then functions that accept a range would fail to compile on non-copyable types if they do a copy.
>
> But the flip side is that if non-copyable would be allowed, a lot of functions that current copy would not. There are far too many unnecessary copies in the code that serve no purpose, but they are invisible, so they are not fixed.

And if a templated function fails to compile if it's given a particular argument, and that failure is not due to the template constraint, that usually means that the template constraint is incomplete. And that means that any and all templated functions which copy front would then have to test whether front was copyable. Yes, in some cases, that would mean that a range-based function would be rewritten to not copy front, but either way, it would mean that _every_ range-based function then has to worry about non-copyable elements - either by testing for them in its template constraint or by being written specifically in a way that works with them and then adding that particular use case to their unit tests. So, allowing non-copyable types complicates range-based code across the board.

And the reality of the matter is that in many cases, copying front is actually more efficient. Not all algorithms can call front just once without copying it, and many ranges (e.g. map) calculate front on each call, meaning that repeated calls to front can be more expensive than just calling it once and copying the result. Dealing with that in the most efficient way possible would probably mean having generic code test whether front returns by ref so that it can copy it if it doesn't and just keep referring to front directly if it does, but that would tend to make for some really messy code in any generic function where front needs to be accessed multiple times. As it stands, because of functions like map, it's almost certainly better for any function which accesses front multiple times to copy it.

> > Remember also that a large percentage of ranges that don't wrap dynamic arrays put their elements on the stack, which means that whenever the range is copied, the elements are copied.
>
> If I am guaranteed to be able to copy an input range, what's the difference between it and a forward range?
>
> Strike that, I have a better one: If I am allowed to copy an input range, what does the "save" function in forward range even mean?

You can copy input ranges as much as you like. You just don't get a guaranteed independent copy that can be iterated separately. The classic example would be that a struct copies its members, whereas a class is just a reference. So with structs, you generally (but not always) have a value type, whereas with classes, you have a reference type, and copies of each act completely differently. One is independent, whereas the other is the same as the original aside from assignment. To make matters worse, a range could be a pseudo-reference type in that copying it doesn't provide a completely independent copy, and it doesn't act like a reference type either.

Dynamic arrays are one example of that. In their case, the result is a range that's independent for iteration purposes but where mutating any elements affects the original. Similarly, any time that a range is defined with multiple members and they're not all value types, you get a pseudo-reference type. At that point, how mutating the copy affects the original depends on what each of the members does. It could act like a dynamic array, or you could end up in a situation where mutating the copy only partially affects the state of the iteration in the original, in which case, trying to use the original after the copy has been made would be buggy regardless of whether the code is generic or not. e.g. there's a real risk of a situation like that when you do something like have a range wrapping a socket - especially if it does something like put the bytes it grabs in a buffer in the range when it reads them. If the state of the iteration is shared between value types and reference types, you can get all kinds of weird behaviors if you try to use the original after mutating the copy.

The result of all of this is that in generic code, you have no clue whatsoever what the semantics are of copying a range around, because it depends entirely on the copying semantics of whatever range that code is currently being instantiated with. As such, generic code basically has to assume that once you copy a range, you can't mutate the original (since it may or may not be affected by mutating the copy and may or may not be cleanly affected by mutating the copy), and as such, if you want an independent copy to iterate over, you need a specific way to get one other than just copying. That's what save does.

save was introduced to deal with ranges that were classes (though since it generally requires allocating a new object with every call to save, it's not necessrily a great idea to implement such a range). However, the issues related to copying ranges in general weren't as clear at that point in time. It was just clear that if you wanted to be able to generically get an independent copy, a solution was needed.

But of course, since _most_ ranges act just like save when they're copied, save gets frequently forgotten, making it a bit of a thorn in our side. And the fact that generic code can't ever use a range after copying it (even after passing it to foreach, since that does a copy) is something that's frequently missed, causing subtle bugs. So, the fact that ranges weren't better thought through in the beginning with regards to copying means that we have something that works quite well in typical use cases but gets fairly thorny when you're writing generic code (especially if you're then distributing it to other folks to use with ranges that you've never seen or heard of).

A lot of the problems that we have with ranges stem from the fact that they're basically designed around being an abstraction of dynamic arrays, and any time that a range doesn't act like a dynamic array does, we start ending up with cracks in how things work. In general, we've gotten it to work fairly well with just front, popFront, and empty if you can assume that ranges copy like dynamic arrays do, but of course, many don't, and there, things start to fall apart. Adding save helped, but it didn't really fix the problem, since it can so often be ignored and still have code work, and it leaves the problem of basic input ranges, which by definition, can't be copied like dynamic arrays are. If a range can't have save, then it's basically because there's something about its iteration state that either can't be saved or can't be saved efficiently.

Ideally, we'd be able to do something like require that basic input ranges be reference types (since they obviously can't be value types, or they'd be forward ranges) and require that forward ranges act like value types (at least insofar as copying them copies the iteration state and thus acts like save). And in that case, if we could have a clean way to differentiate between input ranges and forward ranges, we could drop the whole save thing and even rely on how ranges behave when they're copied, but it would make certain types of ranges that work right now not work (e.g. classes couldn't forward be ranges without being wrapped in a struct with a postblit constructor, and basic input ranges would either have to be on the heap or involve a pointer to the stack). It would probably also require that we treat input ranges and forward ranges as incompatible, which would be great in the case where we want to deal with a forward range but not so great in the case where we want to be able to just iterate without ever having to look at any part of the range multiple times, since then we'd basically have to have separate code for input ranges and forward ranges.

So, some redesign of ranges would be great. However, they're highly integrated into Phobos and heavily used in existing code, making it very difficult to have any kind of clean deprecation path so that we could transition from the current state of things to wherever we want to be. We could do it if we were willing to pretty much outright fork Phobos, but I'm not sure that it can be done otherwise. And there's still the issue of foreach in that if we want to change its semantics, that would be a breaking change in the language itself, meaning that forking the standard library wouldn't fix that one. If we want to change foreach at all, we'd have to figure out how to do so in place with a clean deprecation path.

> > foreach(e; range)
> > {
> > }
> >
> > is lowered to something like
> >
> > for(auto __range = range; !__range.empty; __range.popFront())
> > {
> >
> >      auto e = __range.front;
> >
> > }
>
> And that indeed generates a lot of confusion regarding what running two foreaches in a row does. This is a bug that already affects more than just uncopyable objects.

It was the behavior that that was required in order to have ranges behave the same as dynamic arrays with foreach without changing how dynamic arrays work with foreach. Yes, that becomes a problem once you start having ranges that are reference types or have any range type where copying it is not equivalent to calling save, but given that ranges were added on top of the existing behavior, nothing else would have made sense.

Maybe foreach should have been changed for dynamic arrays, but I don't think that the implications of copying ranges in generic code had been particularly thought through at the point in time that ranges were added to foreach, and I don't know how easy it would have been to talk Walter into changing the behavior of foreach for dynamic arrays at the time. At this point in time, it would require a way to cleanly transition from the old behavior to the new without immediately breaking existing code. If someone can come up with that, and the arguments for changing foreach are compelling enough, then maybe it can be changed, but as it stands, it's just one of those annoyances that stems from the fact that ranges tried to emulate dynamic arrays without requiring that they act like dynamic arrays.

> > Now, generic range-based code really shouldn't ever use a range after it's been copied, since whether mutating the copy affects the original depends on the implementation of the range (and thus generic code should assume that foreach may have consumed the range and call save if it doesn't want that to happen), but lots of code does it anyway,
>
> And allowing ranges with uncopyable members would turn this potential deadly run-time bugs into easy to find compile time errors. Sound like a great reason to allow it, to me.

Ranges have to be copyable. If they're not, you have to use ref all over the place, and it becomes impossible to create a function that returns a new range without putting it on the heap, completely destroying function chaining unless we want to start allocating a whole bunch of ranges on the heap, which would destroy performance. It would be great if we could clean up the semantics of copying ranges, but making it so that you can't rely on ranges being copied would destroy our ability to use ranges as we do now.

> I'm going to stop responding now, because, frankly, I think you and I are having very different discussions.
>
> You seem to be advocating against making *all* ranges support non-copyable types, while I'm merely asking that *any* range be allowed to support such types.
>
> Current situation is that a range with uncopyable types is not considered an input range, and cannot use any phobos range functions, including some that it should be able to use without a problem.
>
> There is no reason to block such ranges from using "map" or "any", except the fact they require that the type answer "isInputRange" with true.

If isInputRange accepted non-copyable element types, then all range-based code would have to deal with non-copyable element types. Yes, in many cases, that would mean adding a check to their template constraints which made them require that the element type be copyable rather than refactoring them to actually work with non-copyable element types, but all well-written, range-based code would then have to take non-copyable types into consideration. And any range-based code that was going to have to work with non-copyable types would then have to be very particular about what it did. We'd be complicating all range-based code to handle yet another corner case. We have too many problems already because of ranges trying to be too many things at once and not locking down their semantics in ways that either require things that are not currently required or disallow things that are currently allowed.

- Jonathan M Davis

April 15, 2018
On Sunday, April 15, 2018 12:29:17 Dmitry Olshansky via Digitalmars-d wrote:
> On Sunday, 15 April 2018 at 10:39:32 UTC, Jonathan M Davis wrote:
> > So, while I'd love to have the opportunity to sit down and redesign the range API, I don't see how we really can - not and actually have it implemented as part of the language or Phobos anyway.
>
> Yeah, I think as surely as we can redesign postblit. Let’s not
> focus on learned helplessnes and “it’s just the way things are”.
> Recent opMove shows there is great flexibility in what can be
> done.
>
> Seriously, it’s just fixing a library design issue even if its use proved viral.

Replacing postblit is trivial in comparison to redesigning ranges. The worst that's going to happen with postblit is that all postblit constructors are going to be deprecated in favor of some other kind of constructor, which would result in a deprecation warning for types with postblit constructors until they've replaced one constructor with the other. As far as breaking changes go, it shouldn't be much more complicated than renaming a function and requiring everyone to update their code. It shouldn't even affect anyone using types with postblit constructors except maybe if they're using __traits(getMembers, ...) to get at __postblit or __xpostblit. It's just the types themselves that need to be updated. In that sense, it's actually less disruptive to replace the postblit constructor than it would be to rename a function, because all that has to be changed is the function itself, not every place that it gets called.

And if we get some kind of opMove solution, that wouldn't break any existing code. It would be invisible to anything but new code that was written to use it. That's the easiest kind of improvement.

Ranges, on the other hand, are integral to how a large percentage of Phobos works, and a lot of existing code depends on them heavily. And unless you're talking about simply renaming popFront to something else, changing them gets pretty hairy. Changing ranges in any serious way risks having to essentially fork Phobos (or at least large portions of it). Any major redesign would almost certainly require renaming most or all of the range traits. All functions accepting ranges would have to be rewritten. Any function returning a range would have to either be renamed or somehow work with both the old and new range API (and since containers use an operator rather than a function name to provide ranges, that could make updating containers rather interesting unless we're willing to give up on slicing being how we get a range from a container).

We may be able to make small changes here and there, but I don't see how anything major is possible unless we're willing to break a ton of existing code. Now, maybe someone can come up with something clever to make some decent improvements to what we have without doing a major redesign, but I think that a major redesign requires effectively forking the range API, which means either ending up with two different range APIs that everyone has to deal with, or requiring all range-based code to eventually be updated from one range API to another, which is an enormous change and far more than simply swapping one function out for another.

So, if we decide that we're willing to make changes that effectively require that almost every D program make changes that are potentially quite invasive, then sure, we can redesign the range API. And it would be great to be able to do that. But I have a hard time buying that such a large change would be acceptable at this point.

- Jonathan M Davis


April 16, 2018
On 16/04/18 01:28, Jonathan M Davis wrote:
> The fact that foreach copies the range that it's given makes using ref with
> anything other than arrays a bit of an iffy proposition. It will compile
> regardless of whether front returns by ref or not (which arguably is a bug),
> but it will only affect the original elements if copying the range doesn't
> copy the elements

You keep referring to ranges that contain the data, but I am racking my mind and failing to find examples.

You can have ranges over arrays,linked lists, trees (though you'd probably use opApply there), and any other container, as well as external data sources (files, sockets, pipes). In none of those cases would your range contain actual data. I need to see the use case for ranges *with* embedded data.

Plus, as I've said before, if ranges are expected to contain data, copying them seems like an incredibly bad idea. The whole idea behind casually copying the range about is that it's cheap to do.

> which in the case of a basic input range rather than a
> forward range

I don't see how that statement is true.

> is generally true, otherwise, it would be a forward range.

A forward range is an input range.

> But as it stands, using ref with foreach in generic code is not going to go
> well. If anything, it's actually a code smell.

You are conflating ref with owning data, which I find unsubstantiated.

>> If non-copyable types were allowed, then functions that accept a range
>> would fail to compile on non-copyable types if they do a copy.
>>
>> But the flip side is that if non-copyable would be allowed, a lot of
>> functions that current copy would not. There are far too many
>> unnecessary copies in the code that serve no purpose, but they are
>> invisible, so they are not fixed.
> 
> And if a templated function fails to compile if it's given a particular
> argument, and that failure is not due to the template constraint, that
> usually means that the template constraint is incomplete.

I do not accept this argument outright. In fact, our experience at Weka has drove us to the exact opposite direction.

We deliberately set up functions to include invalid cases and then either let them fail because of lacking constraints, or even set up a static assert to fail the function generation rather than fail matching.

We found that the error messages produced by the alternative are simply too difficult to parse out and figure what went wrong.

With that said, there is some merit to setting up the function constraints so that it matches what the function actually need. It allows overloading the missing combinations by another module. I don't think it is a good argument in this particular case, however.

I think we should set up to document which phobos functions require copying. It will flesh out bugs and unnecessary copying in phobos, as well as force phobos developers to think about this scenario when they write code. It shouldn't even take that much time to do this run: just go over all UTs and add instantiation over a non-copyable type. The UTs that fail need work.

> it would mean that _every_ range-based function then has to worry about
> non-copyable elements

No, just those in libraries. The same way they have to worry about big O complexity, type safety, @nogc and a bunch of other things.

When you write a library, you have to think about your users. This doesn't change anything about that.

> So, allowing
> non-copyable types complicates range-based code across the board.

Like I said above, it is more likely to find bugs than to introduce needless complications. If you code cannot be easily implemented over non-copyable types, exclude them. Don't shut me out of the entire game.

> Not all algorithms can call front just once without
> copying it

Huh? Where did that come from?

Phobos already assumes that calling front needs to be an O(1) operation. You can call front as many times as you like. Why would you assume otherwise?

> and many ranges (e.g. map) calculate front on each call, meaning
> that repeated calls to front can be more expensive than just calling it once
> and copying the result.

Not by a lot, and definitely not in any way the generic code has any right to assume. I reject this argument outright.

> The result of all of this is that in generic code, you have no clue
> whatsoever what the semantics are of copying a range around, because it
> depends entirely on the copying semantics of whatever range that code is
> currently being instantiated with. As such, generic code basically has to
> assume that once you copy a range, you can't mutate the original (since it
> may or may not be affected by mutating the copy and may or may not be
> cleanly affected by mutating the copy),

Jonathan, I'm sorry, but what you are describing is called "a bug".

Input ranges provide no guarantees about copying the range itself. If you do it and expect *anything* (which it seems you do), you have a bug. If you need to copy the range itself, you absolutely need to require a forward range.

A forward range with ref front absolutely promises that both iterations reference the same elements.

I find myself sincerely hoping that you are making these examples up in order to win this discussion turned into an argument for some reason, instead of the way phobos is actually written, because if you're right phobos is *broken*.

The good news is that if Phobos is broken, this is a great way to flush out the places it's broken. Let's add UTs calling all Phobos functions accepting input ranges (as opposed to forward ranges) with non-copyable ranges (i.e. - the range itself is non-copyable) and fix the compilation errors. This shouldn't take more than a few days to do.

Hell, let's write UTs to pass forward ranges that are non-copyable (but return a copy via "save") and flush out those bugs as well.

The point of "development" is to make our code better, not defend the status-quo. Especially since we've just identified a fairly simple way to find the places that need work and easily work on them.

> But of course, since _most_ ranges act just like save when they're copied,
> save gets frequently forgotten, making it a bit of a thorn in our side.

Then rejoice: I just suggested a way to *easily* find and fix those places.

> require that forward ranges act like value types (at
> least insofar as copying them copies the iteration state and thus acts like
> save).

The brackets are important. Just because I'm returning a reference to elements does not mean I don't support the range itself restarting a scan.

> So, some redesign of ranges would be great.

Yes, but please note that I don't think it is *necessary*. Yes, save is annoying, but it *is* workable as is, and since we can harness the compiler to locate where we're deviating from the righteous way, not that expensively if we could just be bothered to do it.

>> And allowing ranges with uncopyable members would turn this potential
>> deadly run-time bugs into easy to find compile time errors. Sound like a
>> great reason to allow it, to me.
> 
> Ranges have to be copyable.

And why would a range iterating over uncopyable objects not be copyable? How are the two related?

Shachar
« First   ‹ Prev
1 2