The rfind challenge - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » General » The rfind challenge

Thread overview

The rfind challenge
Jan 15, 2013 Andrei Alexandrescu
Jan 15, 2013 Andrei Alexandrescu
Jan 16, 2013 Mehrdad
Jan 15, 2013 H. S. Teoh
Jan 15, 2013 Andrei Alexandrescu
Jan 15, 2013 H. S. Teoh
Jan 16, 2013 deadalnix
Jan 16, 2013 Andrei Alexandrescu
Jan 15, 2013 monarch_dodra
Jan 15, 2013 Andrei Alexandrescu
Jan 15, 2013 monarch_dodra
Jan 15, 2013 Andrei Alexandrescu
Jan 15, 2013 monarch_dodra
Jan 15, 2013 FG
Jan 15, 2013 Andrei Alexandrescu
Jan 15, 2013 Timon Gehr
Jan 15, 2013 Timon Gehr
Jan 15, 2013 FG
Jan 15, 2013 Phil Lavoie
Jan 15, 2013 Andrei Alexandrescu
Jan 15, 2013 Phil Lavoie
Jan 15, 2013 Phil Lavoie
Jan 15, 2013 Andrei Alexandrescu
Jan 15, 2013 FG
Jan 15, 2013 monarch_dodra
Jan 15, 2013 FG
Jan 15, 2013 Phil Lavoie
Jan 15, 2013 Phil Lavoie
Jan 15, 2013 Phil Lavoie
Jan 15, 2013 H. S. Teoh
Jan 15, 2013 Phil Lavoie
Jan 15, 2013 Phil Lavoie
Jan 15, 2013 Phil Lavoie
Jan 15, 2013 Phil Lavoie
Jan 15, 2013 Phil Lavoie
Jan 15, 2013 Andrei Alexandrescu
Jan 15, 2013 H. S. Teoh
Jan 15, 2013 Andrei Alexandrescu
Jan 15, 2013 monarch_dodra
Jan 15, 2013 monarch_dodra
Jan 15, 2013 Andrei Alexandrescu
Jan 16, 2013 deadalnix
Jan 15, 2013 FG

January 15, 2013

The rfind challenge

Posted by Andrei Alexandrescu

Andrei Alexandrescu

(Warning: this assumes a fair bit of background with ranges and the STL.)

We've had a long-standing weakness of ranges when compared with STL iterators: in a find() call we get to access the balance of the range from the found position (if any) through the end of the initial range. In contrast, STL's find returns an iterator that can be combined with the beginning of the initial range or the end of the initial range, thus offering two ranges instead of one.

D offers the findSplit family which partially deals with this, but in an arguably imperfect way because the range from the beginning of the initial range to the found element has a different type than the initial range.

In spite of that, things have worked out well.

Nowadays I've been thinking (inspired by an exchange on a C=+ mailing list) that STL's rfind (http://goo.gl/Btg8s) is a real challenge for D's ranges.

In D, r.rfind(e) should return a range starting with the last occurrence of e in r and spanning through the rest of r.

If r is a forward range but not better, this is rather simple:

R rfind(R, E)(R range, E element)
{
    for (;;)
    {
        auto ahead = range.save.find(element);
        if (ahead.empty) return range;
        range = ahead;
    }
}

If the range is random-access, the algorithm is easy to implement efficiently by scanning from the end and using indexing.

The tricky case is with bidirectional range. I was unable to define an algorithm using the current primitives that: (a) scans the range from the end, and (b) returns a range from the found position through the end.

Looks like rfind() would need one extra primitive for bidirectional ranges. What would be a good one? I have a few starters, but don't want to bias anyone. Please chime in!


Thanks,

Andrei

January 15, 2013

Re: The rfind challenge

Posted by Andrei Alexandrescu
in reply to Andrei Alexandrescu

Andrei Alexandrescu

Posted in reply to Andrei Alexandrescu

On 1/15/13 12:51 AM, Andrei Alexandrescu wrote:
> Nowadays I've been thinking (inspired by an exchange on a C=+ mailing

s/C=+/C++/

Andrei

January 15, 2013

Re: The rfind challenge

Posted by H. S. Teoh
in reply to Andrei Alexandrescu

H. S. Teoh

Posted in reply to Andrei Alexandrescu

On Tue, Jan 15, 2013 at 12:51:16AM -0500, Andrei Alexandrescu wrote: [...]
> Nowadays I've been thinking (inspired by an exchange on a C=+
> mailing list) that STL's rfind (http://goo.gl/Btg8s) is a real
> challenge for D's ranges.
> 
> In D, r.rfind(e) should return a range starting with the last occurrence of e in r and spanning through the rest of r.
> 
> If r is a forward range but not better, this is rather simple:
[...]
> If the range is random-access, the algorithm is easy to implement efficiently by scanning from the end and using indexing.
> 
> The tricky case is with bidirectional range. I was unable to define an algorithm using the current primitives that: (a) scans the range from the end, and (b) returns a range from the found position through the end.

Hmm. What about introducing a new primitive takeUntil, that returns the initial segment of the range until a given predicate becomes true? Then you could implement rfind thus:

	auto rfind(alias pred, R)(R range)
		if (isBidirectionalRange!R)
	{
		return range.retro()
			.takeUntil!pred()
			.retro();
	}


T

-- 
Answer: Because it breaks the logical sequence of discussion. / Question: Why is top posting bad?

January 15, 2013

Re: The rfind challenge

Posted by monarch_dodra
in reply to Andrei Alexandrescu

monarch_dodra

Posted in reply to Andrei Alexandrescu

On Tuesday, 15 January 2013 at 05:51:16 UTC, Andrei Alexandrescu wrote:
> (Warning: this assumes a fair bit of background with ranges and the STL.)
>
> We've had a long-standing weakness of ranges when compared with STL iterators: in a find() call we get to access the balance of the range from the found position (if any) through the end of the initial range. In contrast, STL's find returns an iterator that can be combined with the beginning of the initial range or the end of the initial range, thus offering two ranges instead of one.
>
> D offers the findSplit family which partially deals with this, but in an arguably imperfect way because the range from the beginning of the initial range to the found element has a different type than the initial range.
>
> In spite of that, things have worked out well.
>
> Nowadays I've been thinking (inspired by an exchange on a C=+ mailing list) that STL's rfind (http://goo.gl/Btg8s) is a real challenge for D's ranges.
>
> In D, r.rfind(e) should return a range starting with the last occurrence of e in r and spanning through the rest of r.
>
> If r is a forward range but not better, this is rather simple:
>
> R rfind(R, E)(R range, E element)
> {
>     for (;;)
>     {
>         auto ahead = range.save.find(element);
>         if (ahead.empty) return range;
>         range = ahead;
>     }
> }

I'd hardly call that an ideal solution >:( The point of rfind is to not start the search from the start of the string. What if your input was:
//----
string mobyDick = ...:
auto r = rfind(mobyDick, ishmael); This may take a while...
//----

That, and it would fail with rfind("A_bc_bc_bc_A", "bc_bc"): It would find "bc_bc_bc_A" instead of "bc_bc_A".

The root cause is that a bidirectional range just can't be "cut"/"sliced" at a given point, whereas iterators allow this.

As much as I love ranges, I think this is a fundamental flaw for anything less that hasSlicing.

So far, the problem has stayed under the radar, thanks (because?) of take, but the problems it brings are clear:
-Bias towards front iteration.
-Type system warping.
-Loss of bidirectionality when taken.

Algorithms suffer because of this. Containers suffer because of this.

I don't have any real solution to offer (there'd be a real complexity cost to any proposal), but I think there *needs* to be a way to provide "inter" slicing of ranges. Something along the lines of:
//----
bidir a = [1, 2, 3, 4, 5, 6];
bidir b = a.drop(2).dropBack(2); //[3, 4];
auto l = cutBefore(a, b); //Gets the part of a that is before b
auto r = cutAfter(a, b); //Gets the part of a that is after b
static assert(typeof(a) == typeof(l));
static assert(typeof(a) == typeof(r));
assert(equal(l), [1, 2]);
assert(equal(r), [5, 6]);
//----

The advantage here is that:
1) consistent typing.
2) no direction bias.

If we had this, then find could just return the slice containing the found element, and user could work his way from there.

Problem is bloating the interface ... That, and we'd also need the versions that keep the cut element...
//----
bidir a = [1, 2, 3, 4, 5, 6];
bidir b = a.drop(2).dropBack(2); //[3, 4];
auto l = mergeBefore(a, b); //Gets the part of a that is before b, with b
auto r = mergeAfter(a, b); //Gets the part of a that is after b, with b
assert(equal(l), [1, 2, 3, 4]);
assert(equal(r), [3, 4, 5, 6]);
//----

It is a complicated problem, but one that is worth solving, IMO.

January 15, 2013

Re: The rfind challenge

Posted by FG
in reply to Andrei Alexandrescu

FG

Posted in reply to Andrei Alexandrescu

On 2013-01-15 06:51, Andrei Alexandrescu wrote:
> If r is a forward range but not better, this is rather simple:
>
> R rfind(R, E)(R range, E element)
> {
>      for (;;)
>      {
>          auto ahead = range.save.find(element);
>          if (ahead.empty) return range;
>          range = ahead;
>      }
> }

That example creates an infinite loop. Needs a popFront:

    R rfind(R, E)(R range, E element)
    {
        if (range.empty) return range;
        for (;;)
        {
            auto ahead = range.save;
            ahead.popFront();
            ahead = ahead.find(element);
            if (ahead.empty) return range;
            range = ahead;
        }
    }

Another thing: if the element is not found, I think an empty range should be returned and not the whole range.

January 15, 2013

Re: The rfind challenge

Posted by Andrei Alexandrescu
in reply to H. S. Teoh

Andrei Alexandrescu

Posted in reply to H. S. Teoh

On 1/15/13 1:31 AM, H. S. Teoh wrote:
> Hmm. What about introducing a new primitive takeUntil, that returns the
> initial segment of the range until a given predicate becomes true? Then
> you could implement rfind thus:
>
> 	auto rfind(alias pred, R)(R range)
> 		if (isBidirectionalRange!R)
> 	{
> 		return range.retro()
> 			.takeUntil!pred()
> 			.retro();
> 	}

That works, but returns a different type. Ideally rfind would return the same type as the initial range, R.

Andrei

January 15, 2013

Re: The rfind challenge

Posted by Andrei Alexandrescu
in reply to monarch_dodra

Andrei Alexandrescu

Posted in reply to monarch_dodra

On 1/15/13 2:20 AM, monarch_dodra wrote:
> On Tuesday, 15 January 2013 at 05:51:16 UTC, Andrei Alexandrescu wrote:
>> R rfind(R, E)(R range, E element)
>> {
>> for (;;)
>> {
>> auto ahead = range.save.find(element);
>> if (ahead.empty) return range;
>> range = ahead;
>> }
>> }
>
> I'd hardly call that an ideal solution >:( The point of rfind is to not
> start the search from the start of the string.

I may reply to the rest, but let me destroy this quickly: this is the implementation for a forward range that's not bidirectional. The challenge is indeed implementing rfind for bidirectional ranges that are not random (e.g. strings).

Andrei

January 15, 2013

Re: The rfind challenge

Posted by Andrei Alexandrescu
in reply to FG

Andrei Alexandrescu

Posted in reply to FG

On 1/15/13 6:53 AM, FG wrote:
> On 2013-01-15 06:51, Andrei Alexandrescu wrote:
>> If r is a forward range but not better, this is rather simple:
>>
>> R rfind(R, E)(R range, E element)
>> {
>> for (;;)
>> {
>> auto ahead = range.save.find(element);
>> if (ahead.empty) return range;
>> range = ahead;
>> }
>> }
>
> That example creates an infinite loop. Needs a popFront:
>
> R rfind(R, E)(R range, E element)
> {
> if (range.empty) return range;
> for (;;)
> {
> auto ahead = range.save;
> ahead.popFront();
> ahead = ahead.find(element);
> if (ahead.empty) return range;
> range = ahead;
> }
> }
>
> Another thing: if the element is not found, I think an empty range
> should be returned and not the whole range.

Thanks for the fixes!

Andrei

January 15, 2013

Re: The rfind challenge

Posted by monarch_dodra
in reply to Andrei Alexandrescu

monarch_dodra

Posted in reply to Andrei Alexandrescu

On Tuesday, 15 January 2013 at 12:56:43 UTC, Andrei Alexandrescu wrote:
> On 1/15/13 2:20 AM, monarch_dodra wrote:
>> On Tuesday, 15 January 2013 at 05:51:16 UTC, Andrei Alexandrescu wrote:
>>> R rfind(R, E)(R range, E element)
>>> {
>>> for (;;)
>>> {
>>> auto ahead = range.save.find(element);
>>> if (ahead.empty) return range;
>>> range = ahead;
>>> }
>>> }
>>
>> I'd hardly call that an ideal solution >:( The point of rfind is to not
>> start the search from the start of the string.
>
> I may reply to the rest, but let me destroy this quickly: this is the implementation for a forward range that's not bidirectional.

I'd argue against providing rfind at all if R is not bidirectional. But that's not the thread's main issue, so let's not focus on this.

> The challenge is indeed implementing rfind for bidirectional ranges that are not random (e.g. strings).
>
> Andrei

Agreed.

On a side note, it would be very easy if we agreed to return the subrange starting at the beginning of range, and ending after the last element.

But that's really just avoiding the problem.

January 15, 2013

Re: The rfind challenge

Posted by Timon Gehr
in reply to FG

Timon Gehr

Posted in reply to FG

On 01/15/2013 12:53 PM, FG wrote:
> On 2013-01-15 06:51, Andrei Alexandrescu wrote:
>> If r is a forward range but not better, this is rather simple:
>>
>> R rfind(R, E)(R range, E element)
>> {
>>      for (;;)
>>      {
>>          auto ahead = range.save.find(element);
>>          if (ahead.empty) return range;
>>          range = ahead;
>>      }
>> }
>
> That example creates an infinite loop. Needs a popFront:
>
>      R rfind(R, E)(R range, E element)
>      {
>          if (range.empty) return range;
>          for (;;)
>          {
>              auto ahead = range.save;
>              ahead.popFront();
>              ahead = ahead.find(element);
>              if (ahead.empty) return range;
>              range = ahead;
>          }
>      }
>

That is buggy too.

> Another thing: if the element is not found, I think an empty range
> should be returned and not the whole range.

Yup.

R rfind(R, E)(R range,E element){
    auto r=range.find(element);
    if(r.empty) return r;
    for(;;){
        auto ahead=r.save;
        ahead.popFront();
        ahead=ahead.find(element);
        if(ahead.empty) return r;
        r=ahead;
    }
}


Full proof (assuming purity of the range primitives):

/+pure+/ R tail(R)(R r)out(result){
    assert(result=={
        auto a=r;
        a.popFront();
        return a;
    }());
}body{
    auto a=r;
    a.popFront();
    return a;
}

/+pure+/ R rfind(R, E)(R range,E element)out(result){
	assert(range.find(element).empty&&result.empty||!range.find(element).empty&&!result.empty&&result.front==element&&result.tail.find(element).empty);
}body{
    auto r=range.find(element);
    assert(r==range.find(element)&&(r.empty||r.front==element));
    if(r.empty){
        assert(r==range.find(element)&&r.empty);
        return r;
        /+assert(result==range.find(element)&&result.empty);
        assert(range.find(element).empty&&result.empty);+/
    }
    assert(r==range.find(element)&&!r.empty&&r.front==element);
    assert(!range.find(element).empty&&!r.empty&&r.front==element);
    for(;;){

assert(true&&!range.find(element).empty&&!r.empty&&r.front==element);

assert(!range.find(element).empty&&!r.empty&&r.front==element&&!r.empty);
        auto ahead=r.save;

assert(!range.find(element).empty&&!r.empty&&r.front==element&&!ahead.empty&&ahead==r.save);
        ahead.popFront();

assert(!range.find(element).empty&&!r.empty&&r.front==element&&ahead=={
            auto ahead=r.save;
            ahead.popFront();
            return ahead;
        }());

assert(!range.find(element).empty&&!r.empty&&r.front==element&&ahead==r.tail);
        ahead=ahead.find(element);

assert(!range.find(element).empty&&!r.empty&&r.front==element&&(ahead.empty||ahead.front==element)&&ahead==r.tail.find(element));
        if(ahead.empty){

assert(!range.find(element).empty&&ahead.empty&&!r.empty&&r.front==element&&ahead==r.tail.find(element));
            return r;

/+assert(!range.find(element).empty&&ahead.empty&&!result.empty&&result.front==element&&ahead==result.tail.find(element));

assert(!range.find(element).empty&&!result.empty&&result.front==element&&result.tail.find(element).empty);+/
        }

assert(!range.find(element).empty&&!ahead.empty&&ahead.front==element);
        r=ahead;
        assert(!range.find(element).empty&&!r.empty&&r.front==element);
    }
    assert(false);
}

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation