Jump to page: 1 2 3
Thread overview
std.partition is fucked
May 13, 2009
superdan
May 13, 2009
Sean Kelly
May 13, 2009
Sean Kelly
May 13, 2009
Sean Kelly
May 13, 2009
Sean Kelly
May 13, 2009
Sean Kelly
May 13, 2009
Lutger
May 13, 2009
Sean Kelly
May 13, 2009
Lutger
May 13, 2009
Denis Koroskin
May 13, 2009
Sean Kelly
May 13, 2009
Matti Niemenmaa
May 13, 2009
Sean Kelly
May 13, 2009
Lutger
May 13, 2009
dis must be related to bug 2966 sayin' std.sort is fucked. problem must be with std.partition. i tested and unstable partition is 10 times slower than stable. should be faster akshully. so looked into tat and found in da loop for std.partition unstable and found da range r1 is fucked.

        for (;;)
        {
            for (;;)
            {
                if (r.empty) return r;
                if (!pred(r.front)) break;
                r.popFront;
            }
            // found the left bound
            assert(!r.empty);
            auto r1 = r;
            for (;;)
            {
                if (pred(r1.back)) break;
                r1.popBack;
                if (r1.empty) return r;
            }
            // found the right bound, swap & make progress
            swap(r.front, r1.back);
            r.popFront;
            r1.popBack;
        }

r1 is popbacked a few times but then all tat is forgotten the next time around da loop coz r1 is local to da loop. so da loop is quadratic methinks. what u need iz save r outside loop & then popfront & popback from da same range 'n' shit.
May 13, 2009
superdan wrote:
> dis must be related to bug 2966 sayin' std.sort is fucked. problem
> must be with std.partition. i tested and unstable partition is 10
> times slower than stable. should be faster akshully. so looked into
> tat and found in da loop for std.partition unstable and found da
> range r1 is fucked.

That is correct. Where were you yesterday when I was looking for this? :o) The fixed loop is:

        // Inspired from www.stepanovpapers.com/PAM3-partition_notes.pdf,
        // section "Bidirectional Partition Algorithm (Hoare)"
        auto result = r;
        for (;;)
        {
            for (;;)
            {
                if (r.empty) return result;
                if (!pred(r.front)) break;
                r.popFront;
                result.popFront;
            }
            // found the left bound
            assert(!r.empty);
            for (;;)
            {
                if (pred(r.back)) break;
                r.popBack;
                if (r.empty) return result;
            }
            // found the right bound, swap & make progress
            swap(r.front, r.back);
            r.popFront;
            result.popFront;
            r.popBack;
        }

Note how the left edge of result follows the left edge of r, but the right edge stays put because partition() returns the right-hand-side range. r shrinks from both ends to exhaustion.

The performance of std.sort is back to normal - still slower than the built-in sort (but not by orders of magnitude), probably because bumping ranges is at a disadvantage.

Thanks David and superdan.


Andrei
May 13, 2009
Andrei Alexandrescu wrote:
> 
> Note how the left edge of result follows the left edge of r, but the right edge stays put because partition() returns the right-hand-side range. r shrinks from both ends to exhaustion.

So all the elements that satisfy the predicate end up at the end of the original range instead of the beginning?  Was that an arbitrary choice, or is there a reason for it?
May 13, 2009
Andrei Alexandrescu wrote:
> 
> The performance of std.sort is back to normal - still slower than the built-in sort (but not by orders of magnitude), probably because bumping ranges is at a disadvantage.

The built-in sort uses an internal stack rather than recursion, which makes its performance on a best-case dataset hard to beat.  I finally satisfied myself by having a constant time overhead vs. the built-in sort for best-case and beat it by polynomial time for worst-case (the built-in sort doesn't adapt to worst-case datasets very well).
May 13, 2009
Sean Kelly wrote:
> Andrei Alexandrescu wrote:
>>
>> The performance of std.sort is back to normal - still slower than the built-in sort (but not by orders of magnitude), probably because bumping ranges is at a disadvantage.
> 
> The built-in sort uses an internal stack rather than recursion, which makes its performance on a best-case dataset hard to beat.  I finally satisfied myself by having a constant time overhead vs. the built-in sort for best-case and beat it by polynomial time for worst-case (the built-in sort doesn't adapt to worst-case datasets very well).

Yeah, I saw that fixed-size stack. I'm not sure whether that's responsible for much of its performance, one of these days I need to measure. My speculation is that the depth of the stack is small enough to not really contribute much to the running time.

Andrei
May 13, 2009
Sean Kelly wrote:
> Andrei Alexandrescu wrote:
>>
>> Note how the left edge of result follows the left edge of r, but the right edge stays put because partition() returns the right-hand-side range. r shrinks from both ends to exhaustion.
> 
> So all the elements that satisfy the predicate end up at the end of the original range instead of the beginning?  Was that an arbitrary choice, or is there a reason for it?

The elements satisfying the predicate are at the beginning, see e.g. the
unittests. The range returned is (as always) the right-hand side, i.e.
the range containing the elements that don't satisfy the predicate.

Andrei
May 13, 2009
Andrei Alexandrescu wrote:

> Sean Kelly wrote:
>> Andrei Alexandrescu wrote:
>>>
>>> The performance of std.sort is back to normal - still slower than the built-in sort (but not by orders of magnitude), probably because bumping ranges is at a disadvantage.
>> 
>> The built-in sort uses an internal stack rather than recursion, which makes its performance on a best-case dataset hard to beat.  I finally satisfied myself by having a constant time overhead vs. the built-in sort for best-case and beat it by polynomial time for worst-case (the built-in sort doesn't adapt to worst-case datasets very well).
> 
> Yeah, I saw that fixed-size stack. I'm not sure whether that's responsible for much of its performance, one of these days I need to measure. My speculation is that the depth of the stack is small enough to not really contribute much to the running time.
> 
> Andrei

Some time ago I reinvented these wheels for study purpose. A custom stack was a little faster but not that much. std.swap can't be inlined because it uses ref params, that cost also a bit since it's called many times. Switching to another sort if a certain recursion depth is reached helped, but mostly in degenerate cases.

I still have the thing somewhere, it is about twice as fast as builtin sort. It's not a lot of work but I could dig it up and provide some timings if you want.









May 13, 2009
Lutger wrote:
> Andrei Alexandrescu wrote:
> 
>> Sean Kelly wrote:
>>> Andrei Alexandrescu wrote:
>>>> The performance of std.sort is back to normal - still slower than the built-in sort (but not by orders of magnitude), probably because bumping ranges is at a disadvantage.
>>> The built-in sort uses an internal stack rather than recursion, which makes its performance on a best-case dataset hard to beat.  I finally satisfied myself by having a constant time overhead vs. the built-in sort for best-case and beat it by polynomial time for worst-case (the built-in sort doesn't adapt to worst-case datasets very well).
>> Yeah, I saw that fixed-size stack. I'm not sure whether that's responsible for much of its performance, one of these days I need to measure. My speculation is that the depth of the stack is small enough to not really contribute much to the running time.
>>
>> Andrei
> 
> Some time ago I reinvented these wheels for study purpose. A custom stack was a little faster but not that much. std.swap can't be inlined because it uses ref params, that cost also a bit since it's called many times. Switching to another sort if a certain recursion depth is reached helped, but mostly in degenerate cases. 
> 
> I still have the thing somewhere, it is about twice as fast as builtin sort. It's not a lot of work but I could dig it up and provide some timings if you want. 

Would be interesting if it's not too much trouble.

If swap is not inlined that would be serious. Sort does a lot of swapping.


Andrei
May 13, 2009
On Wed, 13 May 2009 22:24:51 +0400, Lutger <lutger.blijdestijn@gmail.com> wrote:

> std.swap can't be inlined because it uses ref params
>

Can you elaborate on that one? I'm not so sure that it can't be inlined.
May 13, 2009
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail@erdani.org)'s article
> Sean Kelly wrote:
> > Andrei Alexandrescu wrote:
> >>
> >> Note how the left edge of result follows the left edge of r, but the right edge stays put because partition() returns the right-hand-side range. r shrinks from both ends to exhaustion.
> >
> > So all the elements that satisfy the predicate end up at the end of the original range instead of the beginning?  Was that an arbitrary choice, or is there a reason for it?
> The elements satisfying the predicate are at the beginning, see e.g. the unittests. The range returned is (as always) the right-hand side, i.e. the range containing the elements that don't satisfy the predicate.

Weird.  I'd always thought that the standard behavior was the opposite. That way partition could be passed a lessThan predicate and be used for sorting.  Though I guess you can just use a greaterThan predicate instead.
« First   ‹ Prev
1 2 3