March 08, 2014
On 03/07/2014 02:37 AM, H. S. Teoh wrote:
> Unfortunately, input ranges are somewhat tedious to write -- nice
> foreach loops have to be broken up into .empty, .front, .popFront, which
> is a lot of boilerplate code and not so nice in inner loops (though I
> suppose the idea is to improve compiler inlining to handle these cases).

I think that the separation of front and popFront causes much of this.
March 08, 2014
On Sat, Mar 08, 2014 at 01:14:29AM +0100, Timon Gehr wrote:
> On 03/07/2014 02:37 AM, H. S. Teoh wrote:
> >Unfortunately, input ranges are somewhat tedious to write -- nice foreach loops have to be broken up into .empty, .front, .popFront, which is a lot of boilerplate code and not so nice in inner loops (though I suppose the idea is to improve compiler inlining to handle these cases).
> 
> I think that the separation of front and popFront causes much of this.

I'm on the fence about that one. On the one hand, some ranges that do complex calculations in .front are better off with .popAndReturnFront. But on the other hand, there are also algorithms for which you don't want to advance the range just to pick off its front element. Since the separated semantics can't be achieved with a non-separated range API, I lean slightly towards favoring the current split approach.

Having a way to auto-generate input range boilerplate, though, would be really, *really* nice. Coroutine-style code would be ideal.


T

-- 
Bomb technician: If I'm running, try to keep up.
March 08, 2014
On 3/7/2014 5:09 PM, H. S. Teoh wrote:
> Having a way to auto-generate input range boilerplate, though, would be
> really, *really* nice. Coroutine-style code would be ideal.

I usually start writing an InputRange by pasting in a boilerplate version, then modifying it.

March 08, 2014
On Saturday, 8 March 2014 at 01:10:38 UTC, H. S. Teoh wrote:
> Having a way to auto-generate input range boilerplate, though, would be really, *really* nice.

Eh, I don't think it is a big deal and would be fairly limited compared to the current setup. If you use a fiber or state variable or something for the yield this yield that trick, how do you go backward? Random access?

I think the best the yield stuff can do is maybe forward range and maybe infinite (probably with an annotation of some sort, since otherwise, the infiniteness wouldn't be obvious at compile time).


So the best we're looking to automate is input or perhaps forward ranges. And how hard are these really to write?

yield query(string q) {
   auto result = c_query(toStringz(q));
   while(!HasRow(result))
      yield GetNextRow(result);
}

OK, that is kinda nice, but, is the status quo so bad? (BTW the reason I went with some kind of C database api here is everything else I could think of are actually pretty short when using std.algorithm functions to help define them.)

struct query {
    private Result result;
    this(string q) {
         result = c_query(toStringz(q));
         if(!empty) popFront();
    }

    Row front;
    @property bool empty() { return HasRow(result); }
    void popFront() in { assert(!empty); } body {
         front = GetNextRow(result);
    }
}


It is certainly a bit longer, but it isn't that bad, and is easily extended to other range capabilities.


Translating recursive iteration to a range does take a bit more, you need to track your local variables and put them in a stack of your own, but even that isn't too hard (though a bit wordier).


I guess the whole yield thing can be kinda nice, I'm just not sure it is that big of a win given its other limitations compared to full ranges.
March 08, 2014
Adam D. Ruppe:

> struct query {
>     private Result result;
>     this(string q) {
>          result = c_query(toStringz(q));
>          if(!empty) popFront();
>     }
>
>     Row front;
>     @property bool empty() { return HasRow(result); }
>     void popFront() in { assert(!empty); } body {
>          front = GetNextRow(result);
>     }
> }
>
>
> It is certainly a bit longer, but it isn't that bad, and is easily extended to other range capabilities.

Your code is badly formatted.


> Translating recursive iteration to a range does take a bit more, you need to track your local variables and put them in a stack of your own, but even that isn't too hard (though a bit wordier).

In generally this is rather bad, unless you are a C programmer used to use tweezers and needles to implement your own hash set every other day :-(


> I guess the whole yield thing can be kinda nice, I'm just not sure it is that big of a win given its other limitations compared to full ranges.

yield is limited, but in a large amount of cases it's enough, especially if it's well implemented (now Python 3 has yield that is usable for coroutines too, and it has recently added another optimization). For the other cases you can use normal range protocols as you have done.

Bye,
bearophile
March 08, 2014
On Saturday, 8 March 2014 at 02:15:19 UTC, Adam D. Ruppe wrote:
> I guess the whole yield thing can be kinda nice, I'm just not sure it is that big of a win given its other limitations compared to full ranges.

I slapped this together to see if we can do it with a mixin template:

http://arsdnet.net/dcode/next.d

struct ShortRange {
	int next() {
		switch(callNumber) {
			case 0: return 50;
			case 1: return 60;
			case 2: return 70;
			case 3: return finished();
			default: assert(0);
		}
	}

	mixin MakeShortRange!();
}

Wordier than
yield ShortRange() { yield 50; yield 60; yield 70; }

but the same idea. (In fact, i'm sure the whole yield thing should pretty much just rewrite into something like this anyway).

My hypothetical from the previous post would be written:


struct ShortRange {
        Result result;
        this(string s) {
            result = c_query(s);
        }
	Row next() {
             if(!HasRow(result))
                 return finished();
             return GetNextRow(result);
	}

	mixin MakeShortRange!();
}

Which is a bit shorter and perhaps nicer to use than the long form range.
March 08, 2014
Adam D. Ruppe:

> Wordier than
> yield ShortRange() { yield 50; yield 60; yield 70; }
>
> but the same idea.

In general a typed syntax could be:

yield(int) ShortRange() { yield 50; yield 60; yield 70; }

Or even:

yield(auto) ShortRange() { yield 50; yield 60; yield 70; }

So yield(auto) could be sugar for yield(auto):

yield ShortRange() { yield 50; yield 60; yield 70; }

Bye,
bearophile
March 08, 2014
On Sat, Mar 08, 2014 at 03:27:15AM +0000, bearophile wrote:
> Adam D. Ruppe:
> 
> >Wordier than
> >yield ShortRange() { yield 50; yield 60; yield 70; }
> >
> >but the same idea.
> 
> In general a typed syntax could be:
> 
> yield(int) ShortRange() { yield 50; yield 60; yield 70; }
> 
> Or even:
> 
> yield(auto) ShortRange() { yield 50; yield 60; yield 70; }
> 
> So yield(auto) could be sugar for yield(auto):
> 
> yield ShortRange() { yield 50; yield 60; yield 70; }
[...]

Yield syntax is generally not a big deal with simple ranges, but once you start needing recursion and branching logic, things become significantly hairier without yield. For example:

	yield ComplicatedRange(int arg) {
		if (arg == 0) {
			foreach (i; 0 .. 10)
				yield i;
		} else if (arg >= 1 && arg <= 5) {
			yield 10;
			yield 20;
			yield 30;
		} else if (arg >= 6) {
			foreach (i; 10 .. 20) {
				if (i >= 5 && i < 9) {
					yield 40;
					yield 50;
				} else {
					yield i*2;
				}
			}
		}
	}

With yield, the branching logic is clear and relatively easy to follow. Without yield, this will turn into a nasty mess of nested state variables, due to the different numbers of yielded items within nested conditional blocks.

(Of course, you could argue that if things get this complicated, one should be splitting it up into multiple composed ranges instead, but not everything is readily decomposible.)


T

-- 
Bare foot: (n.) A device for locating thumb tacks on the floor.
March 08, 2014
On 03/08/2014 03:15 AM, Adam D. Ruppe wrote:
>
> So the best we're looking to automate is input or perhaps forward
> ranges. And how hard are these really to write?
>
> yield query(string q) {
>     auto result = c_query(toStringz(q));
>     while(!HasRow(result))
>        yield GetNextRow(result);
> }
>
> OK, that is kinda nice, but, is the status quo so bad?
> (BTW the reason I
> went with some kind of C database api here is everything else I could
> think of are actually pretty short when using std.algorithm functions to
> help define them.)
>
> struct query {
>      private Result result;
>      this(string q) {
>           result = c_query(toStringz(q));
>           if(!empty) popFront();
>      }
>
>      Row front;
>      @property bool empty() { return HasRow(result); }
>      void popFront() in { assert(!empty); } body {
>           front = GetNextRow(result);
>      }
> }
>
>
> It is certainly a bit longer, but it isn't that bad, and is easily
> extended to other range capabilities.

This does not do the same thing. It computes the first row even if it is never requested. std.algorithm.filter suffers from the same problem. This is part of the reason why I think the separation of front and popFront is suboptimal design.
March 08, 2014
On 03/08/2014 02:09 AM, H. S. Teoh wrote:
> On Sat, Mar 08, 2014 at 01:14:29AM +0100, Timon Gehr wrote:
>> On 03/07/2014 02:37 AM, H. S. Teoh wrote:
>>> Unfortunately, input ranges are somewhat tedious to write -- nice
>>> foreach loops have to be broken up into .empty, .front, .popFront,
>>> which is a lot of boilerplate code and not so nice in inner loops
>>> (though I suppose the idea is to improve compiler inlining to handle
>>> these cases).
>>
>> I think that the separation of front and popFront causes much of this.
>
> I'm on the fence about that one. On the one hand, some ranges that do
> complex calculations in .front are better off with .popAndReturnFront.
> But on the other hand, there are also algorithms for which you don't
> want to advance the range just to pick off its front element. Since the
> separated semantics can't be achieved with a non-separated range API, I
> lean slightly towards favoring the current split approach.
> ...

I agree that there are trade-offs involved.

> Having a way to auto-generate input range boilerplate, though, would be
> really, *really* nice. Coroutine-style code would be ideal.
> ...

The drawback is that without the compiler being really clever, ranges thus defined will be just input ranges.