July 03, 2012
On Tuesday, July 03, 2012 19:29:44 Tobias Pankrath wrote:
> consumeFront?

That seems like a good suggestion.

- Jonathan M Davis
July 03, 2012
On Tuesday, 3 July 2012 at 17:40:24 UTC, Jonathan M Davis wrote:
> On Tuesday, July 03, 2012 17:31:21 Christophe Travert wrote:
>> takeFront implementation is dangerous for ranges which invalidates their
>> front value when popFront is called, for instance, File.byLine. Thus
>> takeFront will have to be used with care: any range implement takeFront
>> (because of the template and USFC), but it may not be valid. That makes
>> the range interface more complicated: There is a takeFront property, but
>> you have to check it is safe to use... how do you check that by the way?
>
> Hmm. I hadn't thought of that. That could be a good reason not to do this. I'm
> not quite sure how to get around that. Hmmm... It would arguably be a bit
> ugly, but a range which couldn't safely be used with takeFront could have an
> enum on it which indicated that, and takeFront would fail to compile if used
> with such a range. Of course, that would mean that you'd have to special case
> such ranges in any range-based function which used takeFront so that there was
> a branch which didn't use takeFront for ranges which couldn't use it.

Just an idea (feels dirty, but still): such range could redefine takeFront, and maintain state. takeFront would defer a call to popFront until the next operation. All of front, popFront, takeFront and empty would check state (whether popFront has been deferred), and execute popFront first. Side effect would be that a call to empty would invalidate result of takeFront, but that could be changed also: empty instead of calling popFront would check whether there is anything else after the front element.

This is much more complicated, but would only be necessary for such containers.

July 03, 2012
On Tuesday, July 03, 2012 10:40:07 Jonathan M Davis wrote:
> On Tuesday, July 03, 2012 17:31:21 Christophe Travert wrote:
> > takeFront implementation is dangerous for ranges which invalidates their front value when popFront is called, for instance, File.byLine. Thus takeFront will have to be used with care: any range implement takeFront (because of the template and USFC), but it may not be valid. That makes the range interface more complicated: There is a takeFront property, but you have to check it is safe to use... how do you check that by the way?
> 
> Hmm. I hadn't thought of that. That could be a good reason not to do this. I'm not quite sure how to get around that. Hmmm... It would arguably be a bit ugly, but a range which couldn't safely be used with takeFront could have an enum on it which indicated that, and takeFront would fail to compile if used with such a range. Of course, that would mean that you'd have to special case such ranges in any range-based function which used takeFront so that there was a branch which didn't use takeFront for ranges which couldn't use it.
> 
> I don't know. It's certainly something to think about. We may need a different solution.

An alternative would be to make it so that takeFront (or consumeFront, as someone suggested, since that's a better name) would be defined only for ranges which it helps. So, similar to hasSlicing, you get something like hasConsume, and then a range-based function which wants to use consumeFront or consumeBack checks hasConsume!R and uses it on that particular static if branch if it does and uses another branch with front and popFront if it doesn't. std.range.consumeFront then works only with strings (and maybe arrays).

However, it seemed to me that a major upside of consumeFront as I proposed it was that you could always just use it, and it would do the most efficient thing, so you _didn't_ have to special case for it, whereas with hasConsume, you would. But if you can't use consumeFront safely on all ranges, then that doesn't really work. It's still worth something but not as much.

You can already special case strings (and in many cases need to). What consumeFront does is make it so that functions operating on wrappers around strings (i.e. the results of functions like filter or map) can operate on them more efficiently as well. It also makes it so that user-defined ranges can get that benefit from algorithms that know nothing about those ranges and therefore don't special case them (e.g. std.algorithm functions can then be more efficient for user-defined ranges which can define a more efficient consumeFront). So, even if we had hasConsume and functions had to special case in order to use consumeFront or consumeBack, it would still be beneficial. However, it would be nice to have a solution which _didn't_ require special casing.

- Jonathan M Davis
July 03, 2012
On Tuesday, 3 July 2012 at 17:40:24 UTC, Jonathan M Davis wrote:
> On Tuesday, July 03, 2012 17:31:21 Christophe Travert wrote:
>> takeFront implementation is dangerous for ranges which invalidates their
>> front value when popFront is called, for instance, File.byLine. Thus
>> takeFront will have to be used with care: any range implement takeFront
>> (because of the template and USFC), but it may not be valid. That makes
>> the range interface more complicated: There is a takeFront property, but
>> you have to check it is safe to use... how do you check that by the way?
>
> Hmm. I hadn't thought of that. That could be a good reason not to do this. I'm
> not quite sure how to get around that. Hmmm... It would arguably be a bit
> ugly, but a range which couldn't safely be used with takeFront could have an
> enum on it which indicated that, and takeFront would fail to compile if used
> with such a range. Of course, that would mean that you'd have to special case
> such ranges in any range-based function which used takeFront so that there was
> a branch which didn't use takeFront for ranges which couldn't use it.
>
> I don't know. It's certainly something to think about. We may need a different
> solution.

An alternative is not to define a free function. So if consumeFront is defined, it may be used by the algorithm, otherwise simply call to front and popFront in the algorithm. If branching is necessary, then the is no value in a free function (it wouldn't optimize for speed any way).
July 03, 2012
On Tuesday, 3 July 2012 at 18:00:55 UTC, Jonathan M Davis wrote:
> On Tuesday, July 03, 2012 10:40:07 Jonathan M Davis wrote:
>> On Tuesday, July 03, 2012 17:31:21 Christophe Travert wrote:
>> > takeFront implementation is dangerous for ranges which invalidates their
>> > front value when popFront is called, for instance, File.byLine. Thus
>> > takeFront will have to be used with care: any range implement takeFront
>> > (because of the template and USFC), but it may not be valid. That makes
>> > the range interface more complicated: There is a takeFront property, but
>> > you have to check it is safe to use... how do you check that by the way?
>> 
>> Hmm. I hadn't thought of that. That could be a good reason not to do this.
>> I'm not quite sure how to get around that. Hmmm... It would arguably be a
>> bit ugly, but a range which couldn't safely be used with takeFront could
>> have an enum on it which indicated that, and takeFront would fail to
>> compile if used with such a range. Of course, that would mean that you'd
>> have to special case such ranges in any range-based function which used
>> takeFront so that there was a branch which didn't use takeFront for ranges
>> which couldn't use it.
>> 
>> I don't know. It's certainly something to think about. We may need a
>> different solution.
>
> An alternative would be to make it so that takeFront (or consumeFront, as
> someone suggested, since that's a better name) would be defined only for ranges
> which it helps. So, similar to hasSlicing, you get something like hasConsume,
> and then a range-based function which wants to use consumeFront or consumeBack
> checks hasConsume!R and uses it on that particular static if branch if it does
> and uses another branch with front and popFront if it doesn't.
> std.range.consumeFront then works only with strings (and maybe arrays).
>
> However, it seemed to me that a major upside of consumeFront as I proposed it
> was that you could always just use it, and it would do the most efficient thing,
> so you _didn't_ have to special case for it, whereas with hasConsume, you
> would. But if you can't use consumeFront safely on all ranges, then that
> doesn't really work. It's still worth something but not as much.
>
> You can already special case strings (and in many cases need to). What
> consumeFront does is make it so that functions operating on wrappers around
> strings (i.e. the results of functions like filter or map) can operate on them
> more efficiently as well. It also makes it so that user-defined ranges can get
> that benefit from algorithms that know nothing about those ranges and therefore
> don't special case them (e.g. std.algorithm functions can then be more efficient
> for user-defined ranges which can define a more efficient consumeFront). So, even
> if we had hasConsume and functions had to special case in order to use
> consumeFront or consumeBack, it would still be beneficial. However, it would be
> nice to have a solution which _didn't_ require special casing.
>
> - Jonathan M Davis

You bet me by 19 seconds and a better explanation :)

So far we have two options: cleaner code because there would be no need to hack containers which invalidate their front value on a call to popFront, or a hack which I proposed above, but localized in such containers. By the way, some of such containers could potentially defer invalidating front till the next call to front.
July 03, 2012
Range have been designed with the idea that front is valid until next call to popFront. If popFront was to be called right after front, then it would just be a popFront that returns a value, and maybe a justPop or something if you don't want to copy the value.

It's delicate to come now and decide that if a previously returned front value is no longer valid after a call to popFront, it should be documented by an enum. Who is going to review all libraries (written and to come) to make sure the enum is placed when it should be? The reverse should be done instead: if you want you range to be optimized by calling takeFront, define something (for example... takeFront). Then use algorithms that are specialized for takeFront.
July 03, 2012
On Tuesday, 3 July 2012 at 18:12:54 UTC, travert@phare.normalesup.org (Christophe Travert) wrote:
>
> Range have been designed with the idea that front is valid until next
> call to popFront. If popFront was to be called right after front, then
> it would just be a popFront that returns a value, and maybe a justPop or
> something if you don't want to copy the value.
>
> It's delicate to come now and decide that if a previously returned front
> value is no longer valid after a call to popFront, it should be
> documented by an enum. Who is going to review all libraries (written and
> to come) to make sure the enum is placed when it should be? The reverse
> should be done instead: if you want you range to be optimized by calling
> takeFront, define something (for example... takeFront). Then use
> algorithms that are specialized for takeFront.

takeFront -> consumeFront

hasConsume has been proposed by Jonathan already.

But not having to modify all existing ranges which invalidate value returned by front might be a good argument against my hack also.

However I'm not sure there are really many of such ranges. Those must return something with reference semantics in order to be able to invalidate it. Most common case would be when front value is always stored in the same memory (a buffer).

The benefit of my hacking idea is that clients would need no special casing.
July 03, 2012
Still, ranges which invalidate front on popFront could defer that (store a flag that popFront or consumeFront has been called and invalidate value previously returned from front in the next call to front). That would be intrusive with respect to such ranges, and a breaking change. But it would simplify client code significantly.

(This doesn't mean that I'm insisting, or strongly prefer some of my two ideas, just wanted to clearly restate the second one.)
July 03, 2012
On 03-Jul-12 20:37, Jonathan M Davis wrote:
> This seems like it probably merits a bit of discussion, so I'm bringing it up
> here rather than simply opening a pull request.
>
> At present, for some ranges (variably-lengthed ranges such as strings in
> particular), calling front incurs a cost which popFront at least partially
> duplicates. So, the range primitives are inherently inefficient in that they
> force you to incur that extra cost as you iterate over the range. Ideally,
> there would be a way to get front and pop it off at the same time so that you
> incur the cost only once (either that or have front cache its result in a way
> that lets it avoid the extra cost in popFront when popFront is called - though
> that wouldn't work with strings, since for them, the range primitives are free
> functions). So, I'm proposing takeFront and takeBack:
>

I was about to propose fetchFront/fetchBack with similar semantics. Thanks for pushing this proposal it looks good.

My initial intent however was to add Variable Length Range as a concept. ... Now I think binary predicate hasFetch a-la hasSlicing is better.

> https://github.com/jmdavis/phobos/commit/5bfa8126fa14a539fee67807821ec0a10503f27b
>
> For most ranges, takeFront does this:
>
> auto takeFront(R)(ref R range)
>      if(isInputRange!R && !isNarrowString!R)
> {
>      assert(!range.empty);
>      auto retval = range.front;
>      range.popFront();
>      return retval;
> }
>

Aye. Yet there is indeed a problem with pseudo-ranges that reuse 'slot' for front on each popFront. Well they always been brittle.

>
> So, for strings, it'll be more efficient to use takeFront than calling front and
> popFront separately. The idea then is that any user-defined range which can
> implement takeFront more efficiently than the default will define it. Then range-
> based functions use takeFront - e.g. range.takeFront() - and if the user-
> defined range implements a more efficient version, that one is used and they gain
> the extra efficiency, or if they don't, then the free function is used with
> UFCS, and they incur the same cost that they'd incur calling front and
> popFront separately. So, it's invisible to range-based functions whether a
> range actually implements takeFront itself. takeBack does the same thing as
> takeFront but it does it with back and popBack for bidirectional ranges.
>
> I _think_ that this is a fairly clean solution to the problem, but someone
> else might be able to point out why this is a bad idea, or they might have a
> better idea. And this will have a definite impact on how ranges are normally
> used if we add this, so I'm bringing it up here for discussion. Opinions?
> Thoughts? Insights?
>
> Oh, and if we go with this, ideally, the compiler would be updated to use
> takeFront for foreach instead of front and popFront if a range implements it
> (but still do it the current way if it doesn't). So, if typeof(range)
> implements takeFront,

Right. In fact as we've seen above with e.g. stdin.byLine not every range can do fetchFront correctly so it's a strict subset. That was the reason for current trio of basic operations BTW.


-- 
Dmitry Olshansky


July 03, 2012
monarch_dodra:

> As far as I can recall, I've always been taught that pop does NOT (should not) return a value. Rationale being it makes you pay for a read/copy you may not have asked for.

I think it's also a matter of exception safety.


> That's the way C++ does it, and is what I've come to expect from any language.

I expect a language to have a pop() in its collections, that returns the first item and removes it from the collection, as in Python. In D sometimes I put the front and popfront on the same line of code, because I think of them almost as a single operation :-)

Bye,
bearophile