November 02, 2012
On Friday, November 02, 2012 10:01:55 H. S. Teoh wrote:
> Ah, I see. That makes sense. So basically it's not the source (or any intermediate step) that decides whether to use the optimization, but the final consumer.
> 
> Though now we have the issue that all intermediate ranges must propagate .fast, which is troublesome if every range has to do it manually. Can this be handled automatically by UFCS?

It's no different form propogating slicing or random access or whatnot. Wrapper ranges have to look at the capabilities of the ranges that they're wrapping and create wrappers for each of the range functions where appropriate. If we added isTransient or fastRange or whatever, wrapper ranges would then have to take it into account, or the wrapper wouldn't have it.

- Jonathan M Davis
November 03, 2012
On Fri, Nov 02, 2012 at 04:17:10PM -0400, Jonathan M Davis wrote:
> On Friday, November 02, 2012 10:01:55 H. S. Teoh wrote:
> > Ah, I see. That makes sense. So basically it's not the source (or any intermediate step) that decides whether to use the optimization, but the final consumer.
> > 
> > Though now we have the issue that all intermediate ranges must propagate .fast, which is troublesome if every range has to do it manually. Can this be handled automatically by UFCS?
> 
> It's no different form propogating slicing or random access or whatnot. Wrapper ranges have to look at the capabilities of the ranges that they're wrapping and create wrappers for each of the range functions where appropriate. If we added isTransient or fastRange or whatever, wrapper ranges would then have to take it into account, or the wrapper wouldn't have it.
[...]

I wish Andrei would give some input as to how we should proceed with this. I do consider this a major issue with ranges, because for efficiency reasons I often write ranges that have transient .front values, and they can lead to subtle bugs with the current implementation of std.algorithm. It would be good to settle on this issue one way or another. I'd be happy even if the decision is to say that transient ranges are invalid and shouldn't be considered "real" ranges. Anything is better than the current nebulous state of things which only leads to subtle bugs for the unwary.


T

-- 
Real Programmers use "cat > a.out".
November 04, 2012
On Saturday, November 03, 2012 16:21:19 H. S. Teoh wrote:
> I wish Andrei would give some input as to how we should proceed with this. I do consider this a major issue with ranges, because for efficiency reasons I often write ranges that have transient .front values, and they can lead to subtle bugs with the current implementation of std.algorithm. It would be good to settle on this issue one way or another. I'd be happy even if the decision is to say that transient ranges are invalid and shouldn't be considered "real" ranges. Anything is better than the current nebulous state of things which only leads to subtle bugs for the unwary.

I think that you're the only person that I've ever heard of who created a front that was transient outside of std.stdio. But who knows what people outside of this newsgroup are doing.

Andrei seems to think that algorithms should consider input ranges' fronts to be transient, and that forward ranges and greater shouldn't have transient fronts. And I don't think that he's said much about it after that (probably mostly because he was out of town and unable to communicate on the newsgroup much for a while). However, I think that he's the only one in this thread who had come to that conclusion. Most of the rest of us agree that transience isn't necessarily related to the type of range at all. And I think that most everyone has considered ranges like ByLine and ByChunk to be bizarre and abnormal and not something that normal algorithms need to worry about (and if anything that ByLine and ByChunk should be changed to be non-transient or to not be ranges at all).

If we're going to support transient fronts, then we need a solid plan for doing so. Even if it's the case that we're ultimately going to go with input ranges' fronts being considered transient, and all other ranges' fronts being considered non-transient, we need that to be clear and be made explicitly clear in the docs. It's definitely not how things are treated right now.

- Jonathan M Davis
November 04, 2012
On Saturday, 3 November 2012 at 23:19:11 UTC, H. S. Teoh wrote:
> I wish Andrei would give some input as to how we should proceed with this. I do consider this a major issue with ranges, because for efficiency reasons I often write ranges that have transient .front values, and they can lead to subtle bugs with the current implementation of std.algorithm. It would be good to settle on this issue one way or another. I'd be happy even if the decision is to say that transient ranges are invalid and shouldn't be considered "real" ranges. Anything is better than the current nebulous state of things which only leads to subtle bugs for the unwary.

 From watching and gleaming from what I have so far, I can only think that transient should NOT be the default way that ranges work (as it causes too many problems); However transient should be allowed (and available) when possible.

 I can only think that perhaps using a template to determine if it's transient should be present, that way it's a simple flag to enable/disable and should propagate anywhere that that range was used.

struct Range(bool isTransient=false) {
  static if (isTransient) {
    //front and transient
  } else {
    //front and non-transient
  }
}

 Unless someone thinks this is a bad approach?
November 04, 2012
On Sunday, November 04, 2012 02:30:49 Era Scarecrow wrote:
>   From watching and gleaming from what I have so far, I can only
> think that transient should NOT be the default way that ranges
> work (as it causes too many problems); However transient should
> be allowed (and available) when possible.
> 
>   I can only think that perhaps using a template to determine if
> it's transient should be present, that way it's a simple flag to
> enable/disable and should propagate anywhere that that range was
> used.
> 
> struct Range(bool isTransient=false) {
>    static if (isTransient) {
>      //front and transient
>    } else {
>      //front and non-transient
>    }
> }
> 
>   Unless someone thinks this is a bad approach?

That's just trying to be able to tell a range whether it should be transient or not and doesn't really solve the problem. The problem is that algorithms in general assume that front is _not_ transient, and we need to either decide that algorithms are free to continue to assume that front is non-transient (meaning that you make front transient on your range at your own risk and no guarantees that any range-based functions will work with it), or we need to provide a way for range-based functions to know whether a particular range has a transient front or not. As such, I think that it comes down primarily to either

1. Just decide that all input ranges can have transient fronts and that no ranges greater than input ranges can have transient fronts. Then algorithms can infer transience from the type of range. This is how Andrei thinks that it's supposed to work but pretty much no one else does (if nothing else, because that idea was never made clear anywhere, and no one else inferred that from the definitions of input ranges).

2. Make it so that any range can have a transient front but provide a template for checking for it like we do with stuff like hasSlicing or length. Presumably that template would check that for something like like the existence of R.isTransient, and ranges with transient fronts would just declare an enum with that name. Then every range-based function which relied on front not being transient would have to check whether the range that it was given was transient or not or fail to compile if it is, and every wrapper range would have to propogate the isTransient member. Failure to do so in either case (which _would_ happen at least some of the time, at least outside of Phobos), would result in transient ranges silently doing bizarre things.

3. Make it so that ranges which can be transient are non-transient by default but provide a function to get at a transient version for speed (which was the fastRange proposal in this thread). The main problem here is that when the fast range gets wrapped, it's transient, and so anything using the wrapped range will be forced to use the transient version rather than using the non- transient version and only using the transient version when it's asked for. So, I don't think that this is particularly viable.

4. Just decide that range-based algorithms can assume that front isn't ever transient. Any range types which make it transient do so at their own risk.

Honestly, I'd really like to go with #4 and make ByLine and ByChunk use opApply. I think that transience complicates things too much. Even just having input ranges have transient fronts really screws with things. Something as simple as

auto app = appender!E();
foreach(e; range)
    app.put(e);

is totally screwed by transience, and it's only operating on input ranges. I seriously question that a function like std.array.array can be implemented with a transient front. Without a way to explicitly copy front, you can't have front be transient and keep it like std.array.array does. And even if there _were_ a way to explictly copy the result of front, that would be horribly inefficient in the cases where front isn't transient, because it would force you to copy when it was completely unnecessary.

I honestly don't think that transience really works outside of some very specific use cases, and the cost of trying to support it will not be small. And given the issues that std.array.array has with it and the fact that std.array.array really shouldn't have to operate on forward ranges, I don't think that the idea of input ranges having transient fronts is really going to work, which would mean using something like isTransient, which has the same sort of problems as save tends to and is likely to be forgotten even more than save is. So, I really question that transience and ranges is really going to work at all. If we _do_ support it though, I think that we need to go the isTransient route.

- Jonathan M Davis
November 04, 2012
On 11/3/12 7:21 PM, H. S. Teoh wrote:
> On Fri, Nov 02, 2012 at 04:17:10PM -0400, Jonathan M Davis wrote:
>> On Friday, November 02, 2012 10:01:55 H. S. Teoh wrote:
>>> Ah, I see. That makes sense. So basically it's not the source (or
>>> any intermediate step) that decides whether to use the optimization,
>>> but the final consumer.
>>>
>>> Though now we have the issue that all intermediate ranges must
>>> propagate .fast, which is troublesome if every range has to do it
>>> manually. Can this be handled automatically by UFCS?
>>
>> It's no different form propogating slicing or random access or
>> whatnot. Wrapper ranges have to look at the capabilities of the ranges
>> that they're wrapping and create wrappers for each of the range
>> functions where appropriate. If we added isTransient or fastRange or
>> whatever, wrapper ranges would then have to take it into account, or
>> the wrapper wouldn't have it.
> [...]
>
> I wish Andrei would give some input as to how we should proceed with
> this. I do consider this a major issue with ranges, because for
> efficiency reasons I often write ranges that have transient .front
> values, and they can lead to subtle bugs with the current implementation
> of std.algorithm. It would be good to settle on this issue one way or
> another. I'd be happy even if the decision is to say that transient
> ranges are invalid and shouldn't be considered "real" ranges. Anything
> is better than the current nebulous state of things which only leads to
> subtle bugs for the unwary.

I think at this point we should streamline and simplify ranges while addressing fundamental concepts (such as unbuffered ranges). Delving into defining increasingly niche attributes for ranges is important work, but to the extent we can make things simpler that would be a gain.

In my view, transiency of .front has a lot to do with input ranges. An input range means that .front is evanescent upon calling .popBack; it has no lasting presence in memory, and invoking popBack means the next item will replace the current one.

Fetching lines should be solved by using types; trafficking in char[] does not offer guarantees about the preservation of the content. In contrast, trafficking in string formalizes a binding agreement between the range and its user that the content is immutable.


Andrei
November 04, 2012
On 11/3/12 9:02 PM, Jonathan M Davis wrote:
> Andrei seems to think that algorithms should consider input ranges' fronts to
> be transient, and that forward ranges and greater shouldn't have transient
> fronts. And I don't think that he's said much about it after that (probably
> mostly because he was out of town and unable to communicate on the newsgroup
> much for a while). However, I think that he's the only one in this thread who
> had come to that conclusion. Most of the rest of us agree that transience
> isn't necessarily related to the type of range at all. And I think that most
> everyone has considered ranges like ByLine and ByChunk to be bizarre and
> abnormal and not something that normal algorithms need to worry about (and if
> anything that ByLine and ByChunk should be changed to be non-transient or to
> not be ranges at all).

(Let's drop the argumentum ad populum part and focus on the technical argument.)

These ranges are the only ones dealing with streaming input - the only veritable input ranges, so qualifying them as bizarre would pretty much raise the point that input ranges are bizarre.

An input range means that you're fetching an item, looking at it, and discard it. Transience is, in my opinion, quite intrinsic to input ranges. It is also a well understood fact that copying a char[] does not guarantee its contents are copied, in any API. So just by seeing as ByLine is an input range of char[], its properties are self-evident. If it were an input range of string, it would have other properties, equally self-evident. This destroys the argument that transience is unrelated to the range type and kind.

> If we're going to support transient fronts, then we need a solid plan for
> doing so. Even if it's the case that we're ultimately going to go with input
> ranges' fronts being considered transient, and all other ranges' fronts being
> considered non-transient, we need that to be clear and be made explicitly
> clear in the docs. It's definitely not how things are treated right now.

I think indeed input ranges should offer no more guarantees about the transiency of their .front as any other APIs offering the same type. Invoking .popFront for an input range means the current element buffered is discarded and replaced with the next one.

Requiring anything more would put undue pressure on any actual input range implementations and would essentially make the entire range infrastructure unusable efficiently with streaming.

For ByLine we should add another range that automatically produces strings - either ByLineStrings or ByLine!string.


Andrei
November 04, 2012
Le 02/11/2012 21:17, Jonathan M Davis a écrit :
> On Friday, November 02, 2012 10:01:55 H. S. Teoh wrote:
>> Ah, I see. That makes sense. So basically it's not the source (or any
>> intermediate step) that decides whether to use the optimization, but the
>> final consumer.
>>
>> Though now we have the issue that all intermediate ranges must propagate
>> .fast, which is troublesome if every range has to do it manually. Can
>> this be handled automatically by UFCS?
>
> It's no different form propogating slicing or random access or whatnot. Wrapper
> ranges have to look at the capabilities of the ranges that they're wrapping
> and create wrappers for each of the range functions where appropriate. If we
> added isTransient or fastRange or whatever, wrapper ranges would then have to
> take it into account, or the wrapper wouldn't have it.
>
> - Jonathan M Davis

The whole point of my proposal is to avoid adding isTransient or something similar.

Let's put back relevant elements of the solution I propose :
1/ range preserve .front by default .
2/ Ranges that would get performance boost from invalidating .front can propose a property (we called it .fast in the thread) that return a new range that invalidate .front .
3/ Range that don't support that feature can be made compatible with UFCS, .fast being a NOP in such case.
4/ Ranges that work as a transofmration on existing ranges can propose the property too if they don't require .front to be persistant. It is implemented by bubbling the property call to the source range.
5/ It is up to the ultimate consumer to call that property or not.
November 04, 2012
Le 04/11/2012 03:26, Jonathan M Davis a écrit :
> 3. Make it so that ranges which can be transient are non-transient by default
> but provide a function to get at a transient version for speed (which was the
> fastRange proposal in this thread). The main problem here is that when the
> fast range gets wrapped, it's transient, and so anything using the wrapped
> range will be forced to use the transient version rather than using the non-
> transient version and only using the transient version when it's asked for.
> So, I don't think that this is particularly viable.
>

Can you elaborate on that. I definitively don't see a problem that big here.
November 04, 2012
Le 04/11/2012 14:55, Andrei Alexandrescu a écrit :
> On 11/3/12 9:02 PM, Jonathan M Davis wrote:
>> Andrei seems to think that algorithms should consider input ranges'
>> fronts to
>> be transient, and that forward ranges and greater shouldn't have
>> transient
>> fronts. And I don't think that he's said much about it after that
>> (probably
>> mostly because he was out of town and unable to communicate on the
>> newsgroup
>> much for a while). However, I think that he's the only one in this
>> thread who
>> had come to that conclusion. Most of the rest of us agree that transience
>> isn't necessarily related to the type of range at all. And I think
>> that most
>> everyone has considered ranges like ByLine and ByChunk to be bizarre and
>> abnormal and not something that normal algorithms need to worry about
>> (and if
>> anything that ByLine and ByChunk should be changed to be non-transient
>> or to
>> not be ranges at all).
>
> (Let's drop the argumentum ad populum part and focus on the technical
> argument.)
>
> These ranges are the only ones dealing with streaming input - the only
> veritable input ranges, so qualifying them as bizarre would pretty much
> raise the point that input ranges are bizarre.
>
> An input range means that you're fetching an item, looking at it, and
> discard it. Transience is, in my opinion, quite intrinsic to input
> ranges. It is also a well understood fact that copying a char[] does not
> guarantee its contents are copied, in any API. So just by seeing as
> ByLine is an input range of char[], its properties are self-evident. If
> it were an input range of string, it would have other properties,
> equally self-evident. This destroys the argument that transience is
> unrelated to the range type and kind.
>

You are explaining here that is reasonable to expect so, not that this is the most obvious choice, the most simple, or whatever.

The conclusion exceed the argument here.

I'd also argue that if people that wrote phobos get that wrong (I think it is fair to consider people writing phobos code as D specialists), it is safe to assume that most D programmer will get it wrong.

>> If we're going to support transient fronts, then we need a solid plan for
>> doing so. Even if it's the case that we're ultimately going to go with
>> input
>> ranges' fronts being considered transient, and all other ranges'
>> fronts being
>> considered non-transient, we need that to be clear and be made explicitly
>> clear in the docs. It's definitely not how things are treated right now.
>
> I think indeed input ranges should offer no more guarantees about the
> transiency of their .front as any other APIs offering the same type.
> Invoking .popFront for an input range means the current element buffered
> is discarded and replaced with the next one.
>
> Requiring anything more would put undue pressure on any actual input
> range implementations and would essentially make the entire range
> infrastructure unusable efficiently with streaming.
>

This is already doomed to phobos implementation of range stuffs. That may sound nice on paper, but practice have shown this isn't a realistic approach.

> For ByLine we should add another range that automatically produces
> strings - either ByLineStrings or ByLine!string.
>

If we are modifying things like that, better proposal have been made in this thread and previous.