View mode: basic / threaded / horizontal-split · Log in · Help
July 03, 2012
Proposal: takeFront and takeBack
This seems like it probably merits a bit of discussion, so I'm bringing it up 
here rather than simply opening a pull request.

At present, for some ranges (variably-lengthed ranges such as strings in 
particular), calling front incurs a cost which popFront at least partially 
duplicates. So, the range primitives are inherently inefficient in that they 
force you to incur that extra cost as you iterate over the range. Ideally, 
there would be a way to get front and pop it off at the same time so that you 
incur the cost only once (either that or have front cache its result in a way 
that lets it avoid the extra cost in popFront when popFront is called - though 
that wouldn't work with strings, since for them, the range primitives are free 
functions). So, I'm proposing takeFront and takeBack:

https://github.com/jmdavis/phobos/commit/5bfa8126fa14a539fee67807821ec0a10503f27b

For most ranges, takeFront does this:

auto takeFront(R)(ref R range)
   if(isInputRange!R && !isNarrowString!R)
{
   assert(!range.empty);
   auto retval = range.front;
   range.popFront();
   return retval;
}

So, it's pretty much the same cost as using front and popFront separately 
(whether it costs more or less probably depends on the exact code and 
optimizations, but it should be comparable). But for strings, it looks like 
this

auto takeFront(R)(ref R range)
   if(isNarrowString!R)
{
   import std.utf;
   assert(!range.empty);
   size_t index = 0;
   auto retval = decode(range, index);
   range = range[index .. $];
   return retval;
}

So, for strings, it'll be more efficient to use takeFront than calling front and 
popFront separately. The idea then is that any user-defined range which can 
implement takeFront more efficiently than the default will define it. Then range-
based functions use takeFront - e.g. range.takeFront() - and if the user-
defined range implements a more efficient version, that one is used and they gain 
the extra efficiency, or if they don't, then the free function is used with 
UFCS, and they incur the same cost that they'd incur calling front and 
popFront separately. So, it's invisible to range-based functions whether a 
range actually implements takeFront itself. takeBack does the same thing as 
takeFront but it does it with back and popBack for bidirectional ranges.

I _think_ that this is a fairly clean solution to the problem, but someone 
else might be able to point out why this is a bad idea, or they might have a 
better idea. And this will have a definite impact on how ranges are normally 
used if we add this, so I'm bringing it up here for discussion. Opinions? 
Thoughts? Insights?

Oh, and if we go with this, ideally, the compiler would be updated to use 
takeFront for foreach instead of front and popFront if a range implements it 
(but still do it the current way if it doesn't). So, if typeof(range) 
implements takeFront,

foreach(e; range) {...}

would then become

for(auto _range = range; !_range.empty;)
{
   auto e = _range.takeFront();
   ...
}

instead of

for(auto _range = range; !_range.empty; _range.popFront())
{
   auto e = _range.front();
   ...
}

but that's an optimization which could be added later.

- Jonathan M Davis
July 03, 2012
Re: Proposal: takeFront and takeBack
The name takeFront is too similar to take, which has very 
different semantics.
July 03, 2012
Re: Proposal: takeFront and takeBack
Jonathan M Davis <jmdavisProg@gmx.com> writes:

> This seems like it probably merits a bit of discussion, so I'm bringing it up 
> here rather than simply opening a pull request.
>
> At present, for some ranges (variably-lengthed ranges such as strings in 
> particular), calling front incurs a cost which popFront at least partially 
> duplicates. So, the range primitives are inherently inefficient in that they 
> force you to incur that extra cost as you iterate over the range. Ideally, 
> there would be a way to get front and pop it off at the same time so that you 
> incur the cost only once (either that or have front cache its result in a way 
> that lets it avoid the extra cost in popFront when popFront is called - though 
> that wouldn't work with strings, since for them, the range primitives are free 
> functions). So, I'm proposing takeFront and takeBack:
>
> https://github.com/jmdavis/phobos/commit/5bfa8126fa14a539fee67807821ec0a10503f27b
>
> For most ranges, takeFront does this:
>
> auto takeFront(R)(ref R range)
>     if(isInputRange!R && !isNarrowString!R)
> {
>     assert(!range.empty);
>     auto retval = range.front;
>     range.popFront();
>     return retval;
> }

Couldn't you just overload popFront?

have a void popFront which throws off the first element without
returning anything, and an auto popFront which does return data.

I'd always been taught that "pop" means "read a bit and chop it off",
which means that having to first read the front and then pop it off
(i.e., in two separate methods) feels rather counterintuitive to
me. That's in addition to the fact that yes, there's a performance
issue.

But hey, I've only been doing this D thing for a few weeks, so feel free
to ignore me if I'm not making any sense :-)

-- 
The volume of a pizza of thickness a and radius z can be described by
the following formula:

pi zz a
July 03, 2012
Re: Proposal: takeFront and takeBack
On Tuesday, 3 July 2012 at 16:37:20 UTC, Jonathan M Davis wrote:
> ...
> Oh, and if we go with this, ideally, the compiler would be 
> updated to use
> takeFront for foreach instead of front and popFront if a range 
> implements it
> (but still do it the current way if it doesn't). So, if 
> typeof(range)
> implements takeFront,
>
> foreach(e; range) {...}
>
> would then become
>
> for(auto _range = range; !_range.empty;)
> {
>     auto e = _range.takeFront();
>     ...
> }
>
> instead of
>
> for(auto _range = range; !_range.empty; _range.popFront())
> {
>     auto e = _range.front();
>     ...
> }
>
> but that's an optimization which could be added later.
>
> - Jonathan M Davis

Great job! I think takeFront, is a really good idea. It would 
also finally fix the whole "should pop return a value" debate.

I'd only be afraid the name might clash (in spirit) with the 
existing "take", "takeOne" and "takeNone". Not so long ago, I 
asked for a way to get the last elements of a subrange, and 
proposed a method called "takeBack"...

Maybe "popMoveFront" would be more addequate/Accurate? This would 
make it more in line with the existing "moveFront" defined inside 
range, As well as the "popFront" which would be inside the range 
implementation: It would make the trio "front, popFront, 
popMoveFront".

I think wiring it into foreach is also a good idea, but would 
require compiler support of course. Personally though, regarding 
foreach, I'd appreciate it more if we could first get foreach 
with ref to work with "front assign" ( 
http://forum.dlang.org/thread/ceftaiklanejfhodbpix@forum.dlang.org 
) That said, your proposal is probably less costly to implement.
July 03, 2012
Re: Proposal: takeFront and takeBack
On Tuesday, 3 July 2012 at 17:22:17 UTC, Wouter Verhelst wrote:
> Jonathan M Davis <jmdavisProg@gmx.com> writes:
>
> Couldn't you just overload popFront?
>
> have a void popFront which throws off the first element without
> returning anything, and an auto popFront which does return data.
>
> I'd always been taught that "pop" means "read a bit and chop it 
> off",
> which means that having to first read the front and then pop it 
> off
> (i.e., in two separate methods) feels rather counterintuitive to
> me. That's in addition to the fact that yes, there's a 
> performance
> issue.
>
> But hey, I've only been doing this D thing for a few weeks, so 
> feel free
> to ignore me if I'm not making any sense :-)

You can't overload by return value, so that is not possible.

As far as I can recall, I've always been taught that pop does NOT 
(should not) return a value. Rationale being it makes you pay for 
a read/copy you may not have asked for. That's the way C++ does 
it, and is what I've come to expect from any language.
July 03, 2012
Re: Proposal: takeFront and takeBack
consumeFront?
July 03, 2012
Re: Proposal: takeFront and takeBack
On Tuesday, 3 July 2012 at 17:29:45 UTC, Tobias Pankrath wrote:
> consumeFront?
+1
July 03, 2012
Re: Proposal: takeFront and takeBack
takeFront implementation is dangerous for ranges which invalidates their 
front value when popFront is called, for instance, File.byLine. Thus 
takeFront will have to be used with care: any range implement takeFront 
(because of the template and USFC), but it may not be valid. That makes 
the range interface more complicated: There is a takeFront property, but 
you have to check it is safe to use... how do you check that by the way?

Are there so many ranges that duplicate efforts in front and popFront, 
where there is no easy workarround? Would it be such an improvement in 
performance to add takeFront? strings being immutable, my guess would be 
that it is not that hard for the compiler to optimize a foreach loop (it 
should be easy to profile this...). Even if there is an efficiency 
issue, you could easily solve it by implementing a range wrapper or an 
opApply method to make foreach faster.

-- 
Christophe
July 03, 2012
Re: Proposal: takeFront and takeBack
On Tuesday, July 03, 2012 19:17:40 Roman D. Boiko wrote:
> The name takeFront is too similar to take, which has very
> different semantics.

I have similar concerns, but takeFront and takeBack were the best that I could 
come up with. frontPopFront or retPopFront and the like would be horrible. So, 
suggestions are welcome. The concept is more important than the name, and if 
someone can come up with a better name, then I'm all for it.

- Jonathan M Davis
July 03, 2012
Re: Proposal: takeFront and takeBack
On Tuesday, July 03, 2012 17:31:21 Christophe Travert wrote:
> takeFront implementation is dangerous for ranges which invalidates their
> front value when popFront is called, for instance, File.byLine. Thus
> takeFront will have to be used with care: any range implement takeFront
> (because of the template and USFC), but it may not be valid. That makes
> the range interface more complicated: There is a takeFront property, but
> you have to check it is safe to use... how do you check that by the way?

Hmm. I hadn't thought of that. That could be a good reason not to do this. I'm 
not quite sure how to get around that. Hmmm... It would arguably be a bit 
ugly, but a range which couldn't safely be used with takeFront could have an 
enum on it which indicated that, and takeFront would fail to compile if used 
with such a range. Of course, that would mean that you'd have to special case 
such ranges in any range-based function which used takeFront so that there was 
a branch which didn't use takeFront for ranges which couldn't use it.

I don't know. It's certainly something to think about. We may need a different 
solution.

> Are there so many ranges that duplicate efforts in front and popFront,
> where there is no easy workarround? Would it be such an improvement in
> performance to add takeFront? strings being immutable, my guess would be
> that it is not that hard for the compiler to optimize a foreach loop (it
> should be easy to profile this...). Even if there is an efficiency
> issue, you could easily solve it by implementing a range wrapper or an
> opApply method to make foreach faster. 

foreach isn't the problem. strings don't even use the range API for foreach. 
It's operating on them outside of foreach that's the problem.

- Jonathan M Davis
« First   ‹ Prev
1 2 3 4
Top | Discussion index | About this forum | D home