Lots of low hanging fruit in Phobos (page 4)

March 08, 2014

Re: Lots of low hanging fruit in Phobos

Posted by Nick Sabalausky
in reply to Adam D. Ruppe

Permalink

Nick Sabalausky

Posted in reply to Adam D. Ruppe

Permalink

On 3/7/2014 9:15 PM, Adam D. Ruppe wrote:
> On Saturday, 8 March 2014 at 01:10:38 UTC, H. S. Teoh wrote:
>> Having a way to auto-generate input range boilerplate, though, would
>> be really, *really* nice.
>
> Eh, I don't think it is a big deal and would be fairly limited compared
> to the current setup. If you use a fiber or state variable or something
> for the yield this yield that trick, how do you go backward? Random access?
>

Oh it would definitely be for nothing more advanced than a forward range. It's a coroutine, ie a generator function, so the elements are determined by execution flow. Since execution can't go in reverse or random-access, anything beyond forward range is ruled out. But you knew that :)

Forward range should be possible though (at least for pure-ish coroutines anyway), since all you'd have to do is clone the coroutine's internal state structure (which it must already have anyway, since you can't have coroutines without it).

While it couldn't be used for bidirectional or random-access ranges, it would still be a big help for generators - ie the whole point of coroutines in the first place. Other languages have coroutines - we'd have coroutines that can be used as ranges :)

> I think the best the yield stuff can do is maybe forward range and maybe
> infinite (probably with an annotation of some sort, since otherwise, the
> infiniteness wouldn't be obvious at compile time).
>
>
> So the best we're looking to automate is input or perhaps forward
> ranges. And how hard are these really to write?
>
> yield query(string q) {
>     auto result = c_query(toStringz(q));
>     while(!HasRow(result))
>        yield GetNextRow(result);
> }
>
> OK, that is kinda nice, but, is the status quo so bad? (BTW the reason I
> went with some kind of C database api here is everything else I could
> think of are actually pretty short when using std.algorithm functions to
> help define them.)
>
> struct query {
>      private Result result;
>      this(string q) {
>           result = c_query(toStringz(q));
>           if(!empty) popFront();
>      }
>
>      Row front;
>      @property bool empty() { return HasRow(result); }
>      void popFront() in { assert(!empty); } body {
>           front = GetNextRow(result);
>      }
> }
>
>
> It is certainly a bit longer, but it isn't that bad, and is easily
> extended to other range capabilities.
>
>
> Translating recursive iteration to a range does take a bit more, you
> need to track your local variables and put them in a stack of your own,
> but even that isn't too hard (though a bit wordier).
>
>
> I guess the whole yield thing can be kinda nice, I'm just not sure it is
> that big of a win given its other limitations compared to full ranges.

I dunno, what you wrote sounds to me like a pretty convincing argument in favor of coroutine ranges. ;)

Keep in mind, making *all* ranges easier to write isn't the charter here, and it doesn't need to be: Bidirectional and random-access ranges ARE more complicated than generators, so I think they're already just about as easy to write as we *can* make them.

Instead, the problem we're looking at solving here is exactly what you've described above: Making generators in D is more complicated and boilerplate-y than it really needs to be (unless you want to give up on ranges and use opApply - which still isn't particularly great without that mixin you suggested a couple years ago to cover up the return value ugliness).

Generators may be a subset of ranges, but they're an important subset. Unfortunately, languages like C# make us look bad when it comes to generators.

What really itches at me is that we're so tantalizingly close with opApply. The only real problem with it (aside from the return value system being kinda awkward compared to a "yield" statement) is that it can't be used as a range. And it can't be converted to a range without using Phobos fibers which imposes too much of an overhead to be a one-size-fits-all technique for generators.

> (now Python 3 has yield that is usable for coroutines too, and it has recently added another optimization). The optimization for Python yield is present in F# too: http://theburningmonk.com/2011/09/fsharp-yield-vs-yield/ And I have another paper about F# to show in this thread. Bye, bearophile

> And I have another paper about F# to show in this thread. I have not yet found the paper, but I think this was the topic: http://blogs.msdn.com/b/dsyme/archive/2007/09/22/some-details-on-f-computation-expressions-aka-monadic-or-workflow-syntax.aspx Bye, bearophile

> I have not yet found the paper, Here it is: http://tomasp.net/academic/papers/computation-zoo/ More on the same: http://en.wikibooks.org/wiki/F_Sharp_Programming/Computation_Expressions http://msdn.microsoft.com/en-us/library/dd233182.aspx Bye, bearophile

On 3/8/2014 5:56 AM, Nick Sabalausky wrote: > > What really itches at me is that we're so tantalizingly close with > opApply. The only real problem with it (aside from the return value > system being kinda awkward compared to a "yield" statement) is that it > can't be used as a range. And it can't be converted to a range without > using Phobos fibers which imposes too much of an overhead to be a > one-size-fits-all technique for generators. > The troublesome thing about that is anyone who writes a generator in D is forced to choose between [mostly] straightforward implementation logic (ie opApply) *or* range compatibility. I think we can do better than to force an ugly choice like that.

On Sat, Mar 08, 2014 at 10:38:50AM +0100, Timon Gehr wrote: > On 03/08/2014 02:09 AM, H. S. Teoh wrote: > >On Sat, Mar 08, 2014 at 01:14:29AM +0100, Timon Gehr wrote: > >>On 03/07/2014 02:37 AM, H. S. Teoh wrote: > >>>Unfortunately, input ranges are somewhat tedious to write -- nice foreach loops have to be broken up into .empty, .front, .popFront, which is a lot of boilerplate code and not so nice in inner loops (though I suppose the idea is to improve compiler inlining to handle these cases). > >> > >>I think that the separation of front and popFront causes much of this. > > > >I'm on the fence about that one. On the one hand, some ranges that do complex calculations in .front are better off with .popAndReturnFront. But on the other hand, there are also algorithms for which you don't want to advance the range just to pick off its front element. Since the separated semantics can't be achieved with a non-separated range API, I lean slightly towards favoring the current split approach. ... > > I agree that there are trade-offs involved. > > >Having a way to auto-generate input range boilerplate, though, would > >be really, *really* nice. Coroutine-style code would be ideal. > >... > > The drawback is that without the compiler being really clever, ranges thus defined will be just input ranges. We can achieve at least forward ranges with coroutines, if they're pure. T -- Your inconsistency is the only consistent thing about you! -- KD

W dniu 2014-03-08 02:09, H. S. Teoh pisze: > Having a way to auto-generate input range boilerplate, though, would be > really, *really* nice. Coroutine-style code would be ideal. https://github.com/pszturmaj/dgenerators

On 3/8/2014 5:58 PM, Piotr Szturmaj wrote: > W dniu 2014-03-08 02:09, H. S. Teoh pisze: >> Having a way to auto-generate input range boilerplate, though, would be >> really, *really* nice. Coroutine-style code would be ideal. > > https://github.com/pszturmaj/dgenerators Yea, there is that approach, which is nice in certain ways. I did the same kinda thing a couple years ago[1], but your API appears much, much nicer. The unfortunate downside, though, is that as jerro demonstrated[2] there's a high overhead when using fibers for iteration. It likely wouldn't be an issue for some things, like IO, but for other things it could be a problem. So what we end up with is an unfortunate three-way choice anytime someone needs a generator in D: 1. Give up on range compatibility (opApply) 2. Give up on straightforward implementation (input/forward range) 3. Give up on maximum performance (fiber-based coroutine range) D's just dancing all around the ideal solution for generators without ever actually hitting the mark. Which I find frustrating because I know that's a target D's capable of nailing. [1] http://semitwist.com/articles/article/view/combine-coroutines-and-input-ranges-for-dead-simple-d-iteration [2] http://forum.dlang.org/thread/jno6o5$qtb$1@digitalmars.com

On Saturday, 8 March 2014 at 23:23:21 UTC, Nick Sabalausky wrote: > 3. Give up on maximum performance (fiber-based coroutine range) I think that's what I would go for. yield with co-routines could be a quick and dirty way to create an InputRange at some performance cost. Then writing the InputRange by hand would then be an optimisation. I think I tend to write things in such a way where I have room to optimise in such a way, but I almost always reach for the easier to write, less performant option first. (It's a bonus when the easier way is also the more performant way.)

w0rp: >> 3. Give up on maximum performance (fiber-based coroutine range) > > I think that's what I would go for. Yet there's no need for that. You can have your pie and eat it too, with a small cost. D can rewrite code that contains a yield in an efficient finite state machine (this is what ShedSkin compiler for Python does, and perhaps the C# compiler does the same). Bye, bearophile

Forums