Jump to page: 1 24  
Page
Thread overview
Proposal: takeFront and takeBack
Jul 03, 2012
Jonathan M Davis
Jul 03, 2012
Roman D. Boiko
Jul 03, 2012
Jonathan M Davis
Jul 03, 2012
Wouter Verhelst
Jul 03, 2012
monarch_dodra
Jul 03, 2012
bearophile
Jul 04, 2012
deadalnix
Jul 04, 2012
Tobias Pankrath
Jul 04, 2012
Jonathan M Davis
Jul 04, 2012
Tobias Pankrath
Jul 04, 2012
Jonathan M Davis
Jul 04, 2012
Tobias Pankrath
Jul 04, 2012
Jonathan M Davis
Jul 04, 2012
deadalnix
Jul 04, 2012
deadalnix
Jul 04, 2012
Timon Gehr
Jul 04, 2012
Jonathan M Davis
Jul 05, 2012
Christophe Travert
Jul 05, 2012
David Piepgrass
Jul 06, 2012
Jonathan M Davis
Jul 03, 2012
monarch_dodra
Jul 03, 2012
Tobias Pankrath
Jul 03, 2012
Roman D. Boiko
Jul 03, 2012
Jonathan M Davis
Jul 03, 2012
Christophe Travert
Jul 03, 2012
Jonathan M Davis
Jul 03, 2012
Roman D. Boiko
Jul 03, 2012
Christophe Travert
Jul 03, 2012
Roman D. Boiko
Jul 03, 2012
Roman D. Boiko
Jul 03, 2012
Roman D. Boiko
Jul 03, 2012
Jonathan M Davis
Jul 03, 2012
Roman D. Boiko
Jul 03, 2012
Dmitry Olshansky
July 03, 2012
This seems like it probably merits a bit of discussion, so I'm bringing it up here rather than simply opening a pull request.

At present, for some ranges (variably-lengthed ranges such as strings in particular), calling front incurs a cost which popFront at least partially duplicates. So, the range primitives are inherently inefficient in that they force you to incur that extra cost as you iterate over the range. Ideally, there would be a way to get front and pop it off at the same time so that you incur the cost only once (either that or have front cache its result in a way that lets it avoid the extra cost in popFront when popFront is called - though that wouldn't work with strings, since for them, the range primitives are free functions). So, I'm proposing takeFront and takeBack:

https://github.com/jmdavis/phobos/commit/5bfa8126fa14a539fee67807821ec0a10503f27b

For most ranges, takeFront does this:

auto takeFront(R)(ref R range)
    if(isInputRange!R && !isNarrowString!R)
{
    assert(!range.empty);
    auto retval = range.front;
    range.popFront();
    return retval;
}

So, it's pretty much the same cost as using front and popFront separately (whether it costs more or less probably depends on the exact code and optimizations, but it should be comparable). But for strings, it looks like this

auto takeFront(R)(ref R range)
    if(isNarrowString!R)
{
    import std.utf;
    assert(!range.empty);
    size_t index = 0;
    auto retval = decode(range, index);
    range = range[index .. $];
    return retval;
}

So, for strings, it'll be more efficient to use takeFront than calling front and
popFront separately. The idea then is that any user-defined range which can
implement takeFront more efficiently than the default will define it. Then range-
based functions use takeFront - e.g. range.takeFront() - and if the user-
defined range implements a more efficient version, that one is used and they gain
the extra efficiency, or if they don't, then the free function is used with
UFCS, and they incur the same cost that they'd incur calling front and
popFront separately. So, it's invisible to range-based functions whether a
range actually implements takeFront itself. takeBack does the same thing as
takeFront but it does it with back and popBack for bidirectional ranges.

I _think_ that this is a fairly clean solution to the problem, but someone else might be able to point out why this is a bad idea, or they might have a better idea. And this will have a definite impact on how ranges are normally used if we add this, so I'm bringing it up here for discussion. Opinions? Thoughts? Insights?

Oh, and if we go with this, ideally, the compiler would be updated to use takeFront for foreach instead of front and popFront if a range implements it (but still do it the current way if it doesn't). So, if typeof(range) implements takeFront,

foreach(e; range) {...}

would then become

for(auto _range = range; !_range.empty;)
{
    auto e = _range.takeFront();
    ...
}

instead of

for(auto _range = range; !_range.empty; _range.popFront())
{
    auto e = _range.front();
    ...
}

but that's an optimization which could be added later.

- Jonathan M Davis
July 03, 2012
The name takeFront is too similar to take, which has very different semantics.
July 03, 2012
Jonathan M Davis <jmdavisProg@gmx.com> writes:

> This seems like it probably merits a bit of discussion, so I'm bringing it up here rather than simply opening a pull request.
>
> At present, for some ranges (variably-lengthed ranges such as strings in particular), calling front incurs a cost which popFront at least partially duplicates. So, the range primitives are inherently inefficient in that they force you to incur that extra cost as you iterate over the range. Ideally, there would be a way to get front and pop it off at the same time so that you incur the cost only once (either that or have front cache its result in a way that lets it avoid the extra cost in popFront when popFront is called - though that wouldn't work with strings, since for them, the range primitives are free functions). So, I'm proposing takeFront and takeBack:
>
> https://github.com/jmdavis/phobos/commit/5bfa8126fa14a539fee67807821ec0a10503f27b
>
> For most ranges, takeFront does this:
>
> auto takeFront(R)(ref R range)
>     if(isInputRange!R && !isNarrowString!R)
> {
>     assert(!range.empty);
>     auto retval = range.front;
>     range.popFront();
>     return retval;
> }

Couldn't you just overload popFront?

have a void popFront which throws off the first element without returning anything, and an auto popFront which does return data.

I'd always been taught that "pop" means "read a bit and chop it off",
which means that having to first read the front and then pop it off
(i.e., in two separate methods) feels rather counterintuitive to
me. That's in addition to the fact that yes, there's a performance
issue.

But hey, I've only been doing this D thing for a few weeks, so feel free to ignore me if I'm not making any sense :-)

-- 
The volume of a pizza of thickness a and radius z can be described by the following formula:

pi zz a
July 03, 2012
On Tuesday, 3 July 2012 at 16:37:20 UTC, Jonathan M Davis wrote:
> ...
> Oh, and if we go with this, ideally, the compiler would be updated to use
> takeFront for foreach instead of front and popFront if a range implements it
> (but still do it the current way if it doesn't). So, if typeof(range)
> implements takeFront,
>
> foreach(e; range) {...}
>
> would then become
>
> for(auto _range = range; !_range.empty;)
> {
>     auto e = _range.takeFront();
>     ...
> }
>
> instead of
>
> for(auto _range = range; !_range.empty; _range.popFront())
> {
>     auto e = _range.front();
>     ...
> }
>
> but that's an optimization which could be added later.
>
> - Jonathan M Davis

Great job! I think takeFront, is a really good idea. It would also finally fix the whole "should pop return a value" debate.

I'd only be afraid the name might clash (in spirit) with the existing "take", "takeOne" and "takeNone". Not so long ago, I asked for a way to get the last elements of a subrange, and proposed a method called "takeBack"...

Maybe "popMoveFront" would be more addequate/Accurate? This would make it more in line with the existing "moveFront" defined inside range, As well as the "popFront" which would be inside the range implementation: It would make the trio "front, popFront, popMoveFront".

I think wiring it into foreach is also a good idea, but would require compiler support of course. Personally though, regarding foreach, I'd appreciate it more if we could first get foreach with ref to work with "front assign" ( http://forum.dlang.org/thread/ceftaiklanejfhodbpix@forum.dlang.org ) That said, your proposal is probably less costly to implement.
July 03, 2012
On Tuesday, 3 July 2012 at 17:22:17 UTC, Wouter Verhelst wrote:
> Jonathan M Davis <jmdavisProg@gmx.com> writes:
>
> Couldn't you just overload popFront?
>
> have a void popFront which throws off the first element without
> returning anything, and an auto popFront which does return data.
>
> I'd always been taught that "pop" means "read a bit and chop it off",
> which means that having to first read the front and then pop it off
> (i.e., in two separate methods) feels rather counterintuitive to
> me. That's in addition to the fact that yes, there's a performance
> issue.
>
> But hey, I've only been doing this D thing for a few weeks, so feel free
> to ignore me if I'm not making any sense :-)

You can't overload by return value, so that is not possible.

As far as I can recall, I've always been taught that pop does NOT (should not) return a value. Rationale being it makes you pay for a read/copy you may not have asked for. That's the way C++ does it, and is what I've come to expect from any language.
July 03, 2012
consumeFront?
July 03, 2012
On Tuesday, 3 July 2012 at 17:29:45 UTC, Tobias Pankrath wrote:
> consumeFront?
+1
July 03, 2012
takeFront implementation is dangerous for ranges which invalidates their front value when popFront is called, for instance, File.byLine. Thus takeFront will have to be used with care: any range implement takeFront (because of the template and USFC), but it may not be valid. That makes the range interface more complicated: There is a takeFront property, but you have to check it is safe to use... how do you check that by the way?

Are there so many ranges that duplicate efforts in front and popFront, where there is no easy workarround? Would it be such an improvement in performance to add takeFront? strings being immutable, my guess would be that it is not that hard for the compiler to optimize a foreach loop (it should be easy to profile this...). Even if there is an efficiency issue, you could easily solve it by implementing a range wrapper or an opApply method to make foreach faster.

-- 
Christophe
July 03, 2012
On Tuesday, July 03, 2012 19:17:40 Roman D. Boiko wrote:
> The name takeFront is too similar to take, which has very different semantics.

I have similar concerns, but takeFront and takeBack were the best that I could come up with. frontPopFront or retPopFront and the like would be horrible. So, suggestions are welcome. The concept is more important than the name, and if someone can come up with a better name, then I'm all for it.

- Jonathan M Davis
July 03, 2012
On Tuesday, July 03, 2012 17:31:21 Christophe Travert wrote:
> takeFront implementation is dangerous for ranges which invalidates their front value when popFront is called, for instance, File.byLine. Thus takeFront will have to be used with care: any range implement takeFront (because of the template and USFC), but it may not be valid. That makes the range interface more complicated: There is a takeFront property, but you have to check it is safe to use... how do you check that by the way?

Hmm. I hadn't thought of that. That could be a good reason not to do this. I'm not quite sure how to get around that. Hmmm... It would arguably be a bit ugly, but a range which couldn't safely be used with takeFront could have an enum on it which indicated that, and takeFront would fail to compile if used with such a range. Of course, that would mean that you'd have to special case such ranges in any range-based function which used takeFront so that there was a branch which didn't use takeFront for ranges which couldn't use it.

I don't know. It's certainly something to think about. We may need a different solution.

> Are there so many ranges that duplicate efforts in front and popFront, where there is no easy workarround? Would it be such an improvement in performance to add takeFront? strings being immutable, my guess would be that it is not that hard for the compiler to optimize a foreach loop (it should be easy to profile this...). Even if there is an efficiency issue, you could easily solve it by implementing a range wrapper or an opApply method to make foreach faster.

foreach isn't the problem. strings don't even use the range API for foreach. It's operating on them outside of foreach that's the problem.

- Jonathan M Davis
« First   ‹ Prev
1 2 3 4