View mode: basic / threaded / horizontal-split · Log in · Help
January 15, 2013
The rfind challenge
(Warning: this assumes a fair bit of background with ranges and the STL.)

We've had a long-standing weakness of ranges when compared with STL 
iterators: in a find() call we get to access the balance of the range 
from the found position (if any) through the end of the initial range. 
In contrast, STL's find returns an iterator that can be combined with 
the beginning of the initial range or the end of the initial range, thus 
offering two ranges instead of one.

D offers the findSplit family which partially deals with this, but in an 
arguably imperfect way because the range from the beginning of the 
initial range to the found element has a different type than the initial 
range.

In spite of that, things have worked out well.

Nowadays I've been thinking (inspired by an exchange on a C=+ mailing 
list) that STL's rfind (http://goo.gl/Btg8s) is a real challenge for D's 
ranges.

In D, r.rfind(e) should return a range starting with the last occurrence 
of e in r and spanning through the rest of r.

If r is a forward range but not better, this is rather simple:

R rfind(R, E)(R range, E element)
{
    for (;;)
    {
        auto ahead = range.save.find(element);
        if (ahead.empty) return range;
        range = ahead;
    }
}

If the range is random-access, the algorithm is easy to implement 
efficiently by scanning from the end and using indexing.

The tricky case is with bidirectional range. I was unable to define an 
algorithm using the current primitives that: (a) scans the range from 
the end, and (b) returns a range from the found position through the end.

Looks like rfind() would need one extra primitive for bidirectional 
ranges. What would be a good one? I have a few starters, but don't want 
to bias anyone. Please chime in!


Thanks,

Andrei
January 15, 2013
Re: The rfind challenge
On 1/15/13 12:51 AM, Andrei Alexandrescu wrote:
> Nowadays I've been thinking (inspired by an exchange on a C=+ mailing

s/C=+/C++/

Andrei
January 15, 2013
Re: The rfind challenge
On Tue, Jan 15, 2013 at 12:51:16AM -0500, Andrei Alexandrescu wrote:
[...]
> Nowadays I've been thinking (inspired by an exchange on a C=+
> mailing list) that STL's rfind (http://goo.gl/Btg8s) is a real
> challenge for D's ranges.
> 
> In D, r.rfind(e) should return a range starting with the last
> occurrence of e in r and spanning through the rest of r.
> 
> If r is a forward range but not better, this is rather simple:
[...]
> If the range is random-access, the algorithm is easy to implement
> efficiently by scanning from the end and using indexing.
> 
> The tricky case is with bidirectional range. I was unable to define
> an algorithm using the current primitives that: (a) scans the range
> from the end, and (b) returns a range from the found position
> through the end.

Hmm. What about introducing a new primitive takeUntil, that returns the
initial segment of the range until a given predicate becomes true? Then
you could implement rfind thus:

	auto rfind(alias pred, R)(R range)
		if (isBidirectionalRange!R)
	{
		return range.retro()
			.takeUntil!pred()
			.retro();
	}


T

-- 
Answer: Because it breaks the logical sequence of discussion. /
Question: Why is top posting bad?
January 15, 2013
Re: The rfind challenge
On Tuesday, 15 January 2013 at 05:51:16 UTC, Andrei Alexandrescu 
wrote:
> (Warning: this assumes a fair bit of background with ranges and 
> the STL.)
>
> We've had a long-standing weakness of ranges when compared with 
> STL iterators: in a find() call we get to access the balance of 
> the range from the found position (if any) through the end of 
> the initial range. In contrast, STL's find returns an iterator 
> that can be combined with the beginning of the initial range or 
> the end of the initial range, thus offering two ranges instead 
> of one.
>
> D offers the findSplit family which partially deals with this, 
> but in an arguably imperfect way because the range from the 
> beginning of the initial range to the found element has a 
> different type than the initial range.
>
> In spite of that, things have worked out well.
>
> Nowadays I've been thinking (inspired by an exchange on a C=+ 
> mailing list) that STL's rfind (http://goo.gl/Btg8s) is a real 
> challenge for D's ranges.
>
> In D, r.rfind(e) should return a range starting with the last 
> occurrence of e in r and spanning through the rest of r.
>
> If r is a forward range but not better, this is rather simple:
>
> R rfind(R, E)(R range, E element)
> {
>     for (;;)
>     {
>         auto ahead = range.save.find(element);
>         if (ahead.empty) return range;
>         range = ahead;
>     }
> }

I'd hardly call that an ideal solution >:( The point of rfind is 
to not start the search from the start of the string. What if 
your input was:
//----
string mobyDick = ...:
auto r = rfind(mobyDick, ishmael); This may take a while...
//----

That, and it would fail with rfind("A_bc_bc_bc_A", "bc_bc"): It 
would find "bc_bc_bc_A" instead of "bc_bc_A".

The root cause is that a bidirectional range just can't be 
"cut"/"sliced" at a given point, whereas iterators allow this.

As much as I love ranges, I think this is a fundamental flaw for 
anything less that hasSlicing.

So far, the problem has stayed under the radar, thanks (because?) 
of take, but the problems it brings are clear:
-Bias towards front iteration.
-Type system warping.
-Loss of bidirectionality when taken.

Algorithms suffer because of this. Containers suffer because of 
this.

I don't have any real solution to offer (there'd be a real 
complexity cost to any proposal), but I think there *needs* to be 
a way to provide "inter" slicing of ranges. Something along the 
lines of:
//----
bidir a = [1, 2, 3, 4, 5, 6];
bidir b = a.drop(2).dropBack(2); //[3, 4];
auto l = cutBefore(a, b); //Gets the part of a that is before b
auto r = cutAfter(a, b); //Gets the part of a that is after b
static assert(typeof(a) == typeof(l));
static assert(typeof(a) == typeof(r));
assert(equal(l), [1, 2]);
assert(equal(r), [5, 6]);
//----

The advantage here is that:
1) consistent typing.
2) no direction bias.

If we had this, then find could just return the slice containing 
the found element, and user could work his way from there.

Problem is bloating the interface ... That, and we'd also need 
the versions that keep the cut element...
//----
bidir a = [1, 2, 3, 4, 5, 6];
bidir b = a.drop(2).dropBack(2); //[3, 4];
auto l = mergeBefore(a, b); //Gets the part of a that is before 
b, with b
auto r = mergeAfter(a, b); //Gets the part of a that is after b, 
with b
assert(equal(l), [1, 2, 3, 4]);
assert(equal(r), [3, 4, 5, 6]);
//----

It is a complicated problem, but one that is worth solving, IMO.
January 15, 2013
Re: The rfind challenge
On 2013-01-15 06:51, Andrei Alexandrescu wrote:
> If r is a forward range but not better, this is rather simple:
>
> R rfind(R, E)(R range, E element)
> {
>      for (;;)
>      {
>          auto ahead = range.save.find(element);
>          if (ahead.empty) return range;
>          range = ahead;
>      }
> }

That example creates an infinite loop. Needs a popFront:

    R rfind(R, E)(R range, E element)
    {
        if (range.empty) return range;
        for (;;)
        {
            auto ahead = range.save;
            ahead.popFront();
            ahead = ahead.find(element);
            if (ahead.empty) return range;
            range = ahead;
        }
    }

Another thing: if the element is not found, I think an empty range should be 
returned and not the whole range.
January 15, 2013
Re: The rfind challenge
On 1/15/13 1:31 AM, H. S. Teoh wrote:
> Hmm. What about introducing a new primitive takeUntil, that returns the
> initial segment of the range until a given predicate becomes true? Then
> you could implement rfind thus:
>
> 	auto rfind(alias pred, R)(R range)
> 		if (isBidirectionalRange!R)
> 	{
> 		return range.retro()
> 			.takeUntil!pred()
> 			.retro();
> 	}

That works, but returns a different type. Ideally rfind would return the 
same type as the initial range, R.

Andrei
January 15, 2013
Re: The rfind challenge
On 1/15/13 2:20 AM, monarch_dodra wrote:
> On Tuesday, 15 January 2013 at 05:51:16 UTC, Andrei Alexandrescu wrote:
>> R rfind(R, E)(R range, E element)
>> {
>> for (;;)
>> {
>> auto ahead = range.save.find(element);
>> if (ahead.empty) return range;
>> range = ahead;
>> }
>> }
>
> I'd hardly call that an ideal solution >:( The point of rfind is to not
> start the search from the start of the string.

I may reply to the rest, but let me destroy this quickly: this is the 
implementation for a forward range that's not bidirectional. The 
challenge is indeed implementing rfind for bidirectional ranges that are 
not random (e.g. strings).


Andrei
January 15, 2013
Re: The rfind challenge
On 1/15/13 6:53 AM, FG wrote:
> On 2013-01-15 06:51, Andrei Alexandrescu wrote:
>> If r is a forward range but not better, this is rather simple:
>>
>> R rfind(R, E)(R range, E element)
>> {
>> for (;;)
>> {
>> auto ahead = range.save.find(element);
>> if (ahead.empty) return range;
>> range = ahead;
>> }
>> }
>
> That example creates an infinite loop. Needs a popFront:
>
> R rfind(R, E)(R range, E element)
> {
> if (range.empty) return range;
> for (;;)
> {
> auto ahead = range.save;
> ahead.popFront();
> ahead = ahead.find(element);
> if (ahead.empty) return range;
> range = ahead;
> }
> }
>
> Another thing: if the element is not found, I think an empty range
> should be returned and not the whole range.

Thanks for the fixes!

Andrei
January 15, 2013
Re: The rfind challenge
On Tuesday, 15 January 2013 at 12:56:43 UTC, Andrei Alexandrescu 
wrote:
> On 1/15/13 2:20 AM, monarch_dodra wrote:
>> On Tuesday, 15 January 2013 at 05:51:16 UTC, Andrei 
>> Alexandrescu wrote:
>>> R rfind(R, E)(R range, E element)
>>> {
>>> for (;;)
>>> {
>>> auto ahead = range.save.find(element);
>>> if (ahead.empty) return range;
>>> range = ahead;
>>> }
>>> }
>>
>> I'd hardly call that an ideal solution >:( The point of rfind 
>> is to not
>> start the search from the start of the string.
>
> I may reply to the rest, but let me destroy this quickly: this 
> is the implementation for a forward range that's not 
> bidirectional.

I'd argue against providing rfind at all if R is not 
bidirectional. But that's not the thread's main issue, so let's 
not focus on this.

> The challenge is indeed implementing rfind for bidirectional 
> ranges that are not random (e.g. strings).
>
> Andrei

Agreed.

On a side note, it would be very easy if we agreed to return the 
subrange starting at the beginning of range, and ending after the 
last element.

But that's really just avoiding the problem.
January 15, 2013
Re: The rfind challenge
On 01/15/2013 12:53 PM, FG wrote:
> On 2013-01-15 06:51, Andrei Alexandrescu wrote:
>> If r is a forward range but not better, this is rather simple:
>>
>> R rfind(R, E)(R range, E element)
>> {
>>      for (;;)
>>      {
>>          auto ahead = range.save.find(element);
>>          if (ahead.empty) return range;
>>          range = ahead;
>>      }
>> }
>
> That example creates an infinite loop. Needs a popFront:
>
>      R rfind(R, E)(R range, E element)
>      {
>          if (range.empty) return range;
>          for (;;)
>          {
>              auto ahead = range.save;
>              ahead.popFront();
>              ahead = ahead.find(element);
>              if (ahead.empty) return range;
>              range = ahead;
>          }
>      }
>

That is buggy too.

> Another thing: if the element is not found, I think an empty range
> should be returned and not the whole range.

Yup.

R rfind(R, E)(R range,E element){
    auto r=range.find(element);
    if(r.empty) return r;
    for(;;){
        auto ahead=r.save;
        ahead.popFront();
        ahead=ahead.find(element);
        if(ahead.empty) return r;
        r=ahead;
    }
}


Full proof (assuming purity of the range primitives):

/+pure+/ R tail(R)(R r)out(result){
    assert(result=={
        auto a=r;
        a.popFront();
        return a;
    }());
}body{
    auto a=r;
    a.popFront();
    return a;
}

/+pure+/ R rfind(R, E)(R range,E element)out(result){
	assert(range.find(element).empty&&result.empty||!range.find(element).empty&&!result.empty&&result.front==element&&result.tail.find(element).empty);
}body{
    auto r=range.find(element);
    assert(r==range.find(element)&&(r.empty||r.front==element));
    if(r.empty){
        assert(r==range.find(element)&&r.empty);
        return r;
        /+assert(result==range.find(element)&&result.empty);
        assert(range.find(element).empty&&result.empty);+/
    }
    assert(r==range.find(element)&&!r.empty&&r.front==element);
    assert(!range.find(element).empty&&!r.empty&&r.front==element);
    for(;;){

assert(true&&!range.find(element).empty&&!r.empty&&r.front==element);

assert(!range.find(element).empty&&!r.empty&&r.front==element&&!r.empty);
        auto ahead=r.save;

assert(!range.find(element).empty&&!r.empty&&r.front==element&&!ahead.empty&&ahead==r.save);
        ahead.popFront();

assert(!range.find(element).empty&&!r.empty&&r.front==element&&ahead=={
            auto ahead=r.save;
            ahead.popFront();
            return ahead;
        }());

assert(!range.find(element).empty&&!r.empty&&r.front==element&&ahead==r.tail);
        ahead=ahead.find(element);

assert(!range.find(element).empty&&!r.empty&&r.front==element&&(ahead.empty||ahead.front==element)&&ahead==r.tail.find(element));
        if(ahead.empty){

assert(!range.find(element).empty&&ahead.empty&&!r.empty&&r.front==element&&ahead==r.tail.find(element));
            return r;

/+assert(!range.find(element).empty&&ahead.empty&&!result.empty&&result.front==element&&ahead==result.tail.find(element));

assert(!range.find(element).empty&&!result.empty&&result.front==element&&result.tail.find(element).empty);+/
        }

assert(!range.find(element).empty&&!ahead.empty&&ahead.front==element);
        r=ahead;
        assert(!range.find(element).empty&&!r.empty&&r.front==element);
    }
    assert(false);
}
« First   ‹ Prev
1 2 3 4 5
Top | Discussion index | About this forum | D home