Jump to page: 1 25  
Page
Thread overview
The rfind challenge
Jan 16, 2013
Mehrdad
Jan 15, 2013
H. S. Teoh
Jan 15, 2013
H. S. Teoh
Jan 16, 2013
deadalnix
Jan 15, 2013
monarch_dodra
Jan 15, 2013
monarch_dodra
Jan 15, 2013
monarch_dodra
Jan 15, 2013
FG
Jan 15, 2013
Timon Gehr
Jan 15, 2013
Timon Gehr
Jan 15, 2013
FG
Jan 15, 2013
Phil Lavoie
Jan 15, 2013
Phil Lavoie
Jan 15, 2013
Phil Lavoie
Jan 15, 2013
FG
Jan 15, 2013
monarch_dodra
Jan 15, 2013
FG
Jan 15, 2013
Phil Lavoie
Jan 15, 2013
Phil Lavoie
Jan 15, 2013
Phil Lavoie
Jan 15, 2013
H. S. Teoh
Jan 15, 2013
Phil Lavoie
Jan 15, 2013
Phil Lavoie
Jan 15, 2013
Phil Lavoie
Jan 15, 2013
Phil Lavoie
Jan 15, 2013
Phil Lavoie
Jan 15, 2013
H. S. Teoh
Jan 15, 2013
monarch_dodra
Jan 15, 2013
monarch_dodra
Jan 16, 2013
deadalnix
Jan 15, 2013
FG
January 15, 2013
(Warning: this assumes a fair bit of background with ranges and the STL.)

We've had a long-standing weakness of ranges when compared with STL iterators: in a find() call we get to access the balance of the range from the found position (if any) through the end of the initial range. In contrast, STL's find returns an iterator that can be combined with the beginning of the initial range or the end of the initial range, thus offering two ranges instead of one.

D offers the findSplit family which partially deals with this, but in an arguably imperfect way because the range from the beginning of the initial range to the found element has a different type than the initial range.

In spite of that, things have worked out well.

Nowadays I've been thinking (inspired by an exchange on a C=+ mailing list) that STL's rfind (http://goo.gl/Btg8s) is a real challenge for D's ranges.

In D, r.rfind(e) should return a range starting with the last occurrence of e in r and spanning through the rest of r.

If r is a forward range but not better, this is rather simple:

R rfind(R, E)(R range, E element)
{
    for (;;)
    {
        auto ahead = range.save.find(element);
        if (ahead.empty) return range;
        range = ahead;
    }
}

If the range is random-access, the algorithm is easy to implement efficiently by scanning from the end and using indexing.

The tricky case is with bidirectional range. I was unable to define an algorithm using the current primitives that: (a) scans the range from the end, and (b) returns a range from the found position through the end.

Looks like rfind() would need one extra primitive for bidirectional ranges. What would be a good one? I have a few starters, but don't want to bias anyone. Please chime in!


Thanks,

Andrei
January 15, 2013
On 1/15/13 12:51 AM, Andrei Alexandrescu wrote:
> Nowadays I've been thinking (inspired by an exchange on a C=+ mailing

s/C=+/C++/

Andrei
January 15, 2013
On Tue, Jan 15, 2013 at 12:51:16AM -0500, Andrei Alexandrescu wrote: [...]
> Nowadays I've been thinking (inspired by an exchange on a C=+
> mailing list) that STL's rfind (http://goo.gl/Btg8s) is a real
> challenge for D's ranges.
> 
> In D, r.rfind(e) should return a range starting with the last occurrence of e in r and spanning through the rest of r.
> 
> If r is a forward range but not better, this is rather simple:
[...]
> If the range is random-access, the algorithm is easy to implement efficiently by scanning from the end and using indexing.
> 
> The tricky case is with bidirectional range. I was unable to define an algorithm using the current primitives that: (a) scans the range from the end, and (b) returns a range from the found position through the end.

Hmm. What about introducing a new primitive takeUntil, that returns the initial segment of the range until a given predicate becomes true? Then you could implement rfind thus:

	auto rfind(alias pred, R)(R range)
		if (isBidirectionalRange!R)
	{
		return range.retro()
			.takeUntil!pred()
			.retro();
	}


T

-- 
Answer: Because it breaks the logical sequence of discussion. / Question: Why is top posting bad?
January 15, 2013
On Tuesday, 15 January 2013 at 05:51:16 UTC, Andrei Alexandrescu wrote:
> (Warning: this assumes a fair bit of background with ranges and the STL.)
>
> We've had a long-standing weakness of ranges when compared with STL iterators: in a find() call we get to access the balance of the range from the found position (if any) through the end of the initial range. In contrast, STL's find returns an iterator that can be combined with the beginning of the initial range or the end of the initial range, thus offering two ranges instead of one.
>
> D offers the findSplit family which partially deals with this, but in an arguably imperfect way because the range from the beginning of the initial range to the found element has a different type than the initial range.
>
> In spite of that, things have worked out well.
>
> Nowadays I've been thinking (inspired by an exchange on a C=+ mailing list) that STL's rfind (http://goo.gl/Btg8s) is a real challenge for D's ranges.
>
> In D, r.rfind(e) should return a range starting with the last occurrence of e in r and spanning through the rest of r.
>
> If r is a forward range but not better, this is rather simple:
>
> R rfind(R, E)(R range, E element)
> {
>     for (;;)
>     {
>         auto ahead = range.save.find(element);
>         if (ahead.empty) return range;
>         range = ahead;
>     }
> }

I'd hardly call that an ideal solution >:( The point of rfind is to not start the search from the start of the string. What if your input was:
//----
string mobyDick = ...:
auto r = rfind(mobyDick, ishmael); This may take a while...
//----

That, and it would fail with rfind("A_bc_bc_bc_A", "bc_bc"): It would find "bc_bc_bc_A" instead of "bc_bc_A".

The root cause is that a bidirectional range just can't be "cut"/"sliced" at a given point, whereas iterators allow this.

As much as I love ranges, I think this is a fundamental flaw for anything less that hasSlicing.

So far, the problem has stayed under the radar, thanks (because?) of take, but the problems it brings are clear:
-Bias towards front iteration.
-Type system warping.
-Loss of bidirectionality when taken.

Algorithms suffer because of this. Containers suffer because of this.

I don't have any real solution to offer (there'd be a real complexity cost to any proposal), but I think there *needs* to be a way to provide "inter" slicing of ranges. Something along the lines of:
//----
bidir a = [1, 2, 3, 4, 5, 6];
bidir b = a.drop(2).dropBack(2); //[3, 4];
auto l = cutBefore(a, b); //Gets the part of a that is before b
auto r = cutAfter(a, b); //Gets the part of a that is after b
static assert(typeof(a) == typeof(l));
static assert(typeof(a) == typeof(r));
assert(equal(l), [1, 2]);
assert(equal(r), [5, 6]);
//----

The advantage here is that:
1) consistent typing.
2) no direction bias.

If we had this, then find could just return the slice containing the found element, and user could work his way from there.

Problem is bloating the interface ... That, and we'd also need the versions that keep the cut element...
//----
bidir a = [1, 2, 3, 4, 5, 6];
bidir b = a.drop(2).dropBack(2); //[3, 4];
auto l = mergeBefore(a, b); //Gets the part of a that is before b, with b
auto r = mergeAfter(a, b); //Gets the part of a that is after b, with b
assert(equal(l), [1, 2, 3, 4]);
assert(equal(r), [3, 4, 5, 6]);
//----

It is a complicated problem, but one that is worth solving, IMO.
January 15, 2013
On 2013-01-15 06:51, Andrei Alexandrescu wrote:
> If r is a forward range but not better, this is rather simple:
>
> R rfind(R, E)(R range, E element)
> {
>      for (;;)
>      {
>          auto ahead = range.save.find(element);
>          if (ahead.empty) return range;
>          range = ahead;
>      }
> }

That example creates an infinite loop. Needs a popFront:

    R rfind(R, E)(R range, E element)
    {
        if (range.empty) return range;
        for (;;)
        {
            auto ahead = range.save;
            ahead.popFront();
            ahead = ahead.find(element);
            if (ahead.empty) return range;
            range = ahead;
        }
    }

Another thing: if the element is not found, I think an empty range should be returned and not the whole range.
January 15, 2013
On 1/15/13 1:31 AM, H. S. Teoh wrote:
> Hmm. What about introducing a new primitive takeUntil, that returns the
> initial segment of the range until a given predicate becomes true? Then
> you could implement rfind thus:
>
> 	auto rfind(alias pred, R)(R range)
> 		if (isBidirectionalRange!R)
> 	{
> 		return range.retro()
> 			.takeUntil!pred()
> 			.retro();
> 	}

That works, but returns a different type. Ideally rfind would return the same type as the initial range, R.

Andrei
January 15, 2013
On 1/15/13 2:20 AM, monarch_dodra wrote:
> On Tuesday, 15 January 2013 at 05:51:16 UTC, Andrei Alexandrescu wrote:
>> R rfind(R, E)(R range, E element)
>> {
>> for (;;)
>> {
>> auto ahead = range.save.find(element);
>> if (ahead.empty) return range;
>> range = ahead;
>> }
>> }
>
> I'd hardly call that an ideal solution >:( The point of rfind is to not
> start the search from the start of the string.

I may reply to the rest, but let me destroy this quickly: this is the implementation for a forward range that's not bidirectional. The challenge is indeed implementing rfind for bidirectional ranges that are not random (e.g. strings).


Andrei
January 15, 2013
On 1/15/13 6:53 AM, FG wrote:
> On 2013-01-15 06:51, Andrei Alexandrescu wrote:
>> If r is a forward range but not better, this is rather simple:
>>
>> R rfind(R, E)(R range, E element)
>> {
>> for (;;)
>> {
>> auto ahead = range.save.find(element);
>> if (ahead.empty) return range;
>> range = ahead;
>> }
>> }
>
> That example creates an infinite loop. Needs a popFront:
>
> R rfind(R, E)(R range, E element)
> {
> if (range.empty) return range;
> for (;;)
> {
> auto ahead = range.save;
> ahead.popFront();
> ahead = ahead.find(element);
> if (ahead.empty) return range;
> range = ahead;
> }
> }
>
> Another thing: if the element is not found, I think an empty range
> should be returned and not the whole range.

Thanks for the fixes!

Andrei
January 15, 2013
On Tuesday, 15 January 2013 at 12:56:43 UTC, Andrei Alexandrescu wrote:
> On 1/15/13 2:20 AM, monarch_dodra wrote:
>> On Tuesday, 15 January 2013 at 05:51:16 UTC, Andrei Alexandrescu wrote:
>>> R rfind(R, E)(R range, E element)
>>> {
>>> for (;;)
>>> {
>>> auto ahead = range.save.find(element);
>>> if (ahead.empty) return range;
>>> range = ahead;
>>> }
>>> }
>>
>> I'd hardly call that an ideal solution >:( The point of rfind is to not
>> start the search from the start of the string.
>
> I may reply to the rest, but let me destroy this quickly: this is the implementation for a forward range that's not bidirectional.

I'd argue against providing rfind at all if R is not bidirectional. But that's not the thread's main issue, so let's not focus on this.

> The challenge is indeed implementing rfind for bidirectional ranges that are not random (e.g. strings).
>
> Andrei

Agreed.

On a side note, it would be very easy if we agreed to return the subrange starting at the beginning of range, and ending after the last element.

But that's really just avoiding the problem.
January 15, 2013
On 01/15/2013 12:53 PM, FG wrote:
> On 2013-01-15 06:51, Andrei Alexandrescu wrote:
>> If r is a forward range but not better, this is rather simple:
>>
>> R rfind(R, E)(R range, E element)
>> {
>>      for (;;)
>>      {
>>          auto ahead = range.save.find(element);
>>          if (ahead.empty) return range;
>>          range = ahead;
>>      }
>> }
>
> That example creates an infinite loop. Needs a popFront:
>
>      R rfind(R, E)(R range, E element)
>      {
>          if (range.empty) return range;
>          for (;;)
>          {
>              auto ahead = range.save;
>              ahead.popFront();
>              ahead = ahead.find(element);
>              if (ahead.empty) return range;
>              range = ahead;
>          }
>      }
>

That is buggy too.

> Another thing: if the element is not found, I think an empty range
> should be returned and not the whole range.

Yup.

R rfind(R, E)(R range,E element){
    auto r=range.find(element);
    if(r.empty) return r;
    for(;;){
        auto ahead=r.save;
        ahead.popFront();
        ahead=ahead.find(element);
        if(ahead.empty) return r;
        r=ahead;
    }
}


Full proof (assuming purity of the range primitives):

/+pure+/ R tail(R)(R r)out(result){
    assert(result=={
        auto a=r;
        a.popFront();
        return a;
    }());
}body{
    auto a=r;
    a.popFront();
    return a;
}

/+pure+/ R rfind(R, E)(R range,E element)out(result){
	assert(range.find(element).empty&&result.empty||!range.find(element).empty&&!result.empty&&result.front==element&&result.tail.find(element).empty);
}body{
    auto r=range.find(element);
    assert(r==range.find(element)&&(r.empty||r.front==element));
    if(r.empty){
        assert(r==range.find(element)&&r.empty);
        return r;
        /+assert(result==range.find(element)&&result.empty);
        assert(range.find(element).empty&&result.empty);+/
    }
    assert(r==range.find(element)&&!r.empty&&r.front==element);
    assert(!range.find(element).empty&&!r.empty&&r.front==element);
    for(;;){

assert(true&&!range.find(element).empty&&!r.empty&&r.front==element);

assert(!range.find(element).empty&&!r.empty&&r.front==element&&!r.empty);
        auto ahead=r.save;

assert(!range.find(element).empty&&!r.empty&&r.front==element&&!ahead.empty&&ahead==r.save);
        ahead.popFront();

assert(!range.find(element).empty&&!r.empty&&r.front==element&&ahead=={
            auto ahead=r.save;
            ahead.popFront();
            return ahead;
        }());

assert(!range.find(element).empty&&!r.empty&&r.front==element&&ahead==r.tail);
        ahead=ahead.find(element);

assert(!range.find(element).empty&&!r.empty&&r.front==element&&(ahead.empty||ahead.front==element)&&ahead==r.tail.find(element));
        if(ahead.empty){

assert(!range.find(element).empty&&ahead.empty&&!r.empty&&r.front==element&&ahead==r.tail.find(element));
            return r;

/+assert(!range.find(element).empty&&ahead.empty&&!result.empty&&result.front==element&&ahead==result.tail.find(element));

assert(!range.find(element).empty&&!result.empty&&result.front==element&&result.tail.find(element).empty);+/
        }

assert(!range.find(element).empty&&!ahead.empty&&ahead.front==element);
        r=ahead;
        assert(!range.find(element).empty&&!r.empty&&r.front==element);
    }
    assert(false);
}



« First   ‹ Prev
1 2 3 4 5