View mode: basic / threaded / horizontal-split · Log in · Help
January 27, 2012
simultaneous multiple key sorting algorithm
Here's a multikey sorting problem that's bothering me; I think it's well 
researched but I can't find the right keywords.

Say we have a collection of people with height and hair length 
information. We want to get people who are "tall and long-haired". More 
precisely, we want to order people by rank(p.height) + 
rank(p.hairlength), where rank(p.k) is the function that returns the 
rank of person p if ordered by key k.

The brute force approach would essentially compute the two ranks and 
then sort by their sum. That's three sorts and at least one temporary 
array. But I think things can be done considerably better. Any ideas?


Thanks,

Andrei
January 27, 2012
Re: simultaneous multiple key sorting algorithm
On 01/27/2012 08:36 PM, Andrei Alexandrescu wrote:
> Here's a multikey sorting problem that's bothering me; I think it's well
> researched but I can't find the right keywords.
>
> Say we have a collection of people with height and hair length
> information. We want to get people who are "tall and long-haired". More
> precisely, we want to order people by rank(p.height) +
> rank(p.hairlength), where rank(p.k) is the function that returns the
> rank of person p if ordered by key k.
>
> The brute force approach would essentially compute the two ranks and
> then sort by their sum. That's three sorts and at least one temporary
> array. But I think things can be done considerably better. Any ideas?
>
>
> Thanks,
>
> Andrei


http://en.wikipedia.org/wiki/K-d_tree


Sclytrack
January 27, 2012
Re: simultaneous multiple key sorting algorithm
On 1/27/12 3:07 PM, sclytrack wrote:
> On 01/27/2012 08:36 PM, Andrei Alexandrescu wrote:
>> Here's a multikey sorting problem that's bothering me; I think it's well
>> researched but I can't find the right keywords.
>>
>> Say we have a collection of people with height and hair length
>> information. We want to get people who are "tall and long-haired". More
>> precisely, we want to order people by rank(p.height) +
>> rank(p.hairlength), where rank(p.k) is the function that returns the
>> rank of person p if ordered by key k.
>>
>> The brute force approach would essentially compute the two ranks and
>> then sort by their sum. That's three sorts and at least one temporary
>> array. But I think things can be done considerably better. Any ideas?
>>
>>
>> Thanks,
>>
>> Andrei
>
>
> http://en.wikipedia.org/wiki/K-d_tree

Thanks. I think k-d trees are good for general operations (insertion, 
deletion, and nearest neighbor query) whereas here we're interested in 
sorting.

Andrei
January 27, 2012
Re: simultaneous multiple key sorting algorithm
On Sat, 28 Jan 2012 06:36:49 +1100, Andrei Alexandrescu  
<SeeWebsiteForEmail@erdani.org> wrote:

> Here's a multikey sorting problem that's bothering me; I think it's well  
> researched but I can't find the right keywords.
>
> Say we have a collection of people with height and hair length  
> information. We want to get people who are "tall and long-haired". More  
> precisely, we want to order people by rank(p.height) +  
> rank(p.hairlength), where rank(p.k) is the function that returns the  
> rank of person p if ordered by key k.
>
> The brute force approach would essentially compute the two ranks and  
> then sort by their sum. That's three sorts and at least one temporary  
> array. But I think things can be done considerably better. Any ideas?

Just a further clarification ...

If a person is tall and has short hair, would they be included in the  
sort? In other words, are you ONLY looking for tall AND long-haired people  
or are you looking for relative ranking of tallness and hairiness?

-- 
Derek Parnell
Melbourne, Australia
January 27, 2012
Re: simultaneous multiple key sorting algorithm
On 1/27/12 5:27 PM, Derek wrote:
> On Sat, 28 Jan 2012 06:36:49 +1100, Andrei Alexandrescu
> <SeeWebsiteForEmail@erdani.org> wrote:
>
>> Here's a multikey sorting problem that's bothering me; I think it's
>> well researched but I can't find the right keywords.
>>
>> Say we have a collection of people with height and hair length
>> information. We want to get people who are "tall and long-haired".
>> More precisely, we want to order people by rank(p.height) +
>> rank(p.hairlength), where rank(p.k) is the function that returns the
>> rank of person p if ordered by key k.
>>
>> The brute force approach would essentially compute the two ranks and
>> then sort by their sum. That's three sorts and at least one temporary
>> array. But I think things can be done considerably better. Any ideas?
>
> Just a further clarification ...
>
> If a person is tall and has short hair, would they be included in the
> sort? In other words, are you ONLY looking for tall AND long-haired
> people or are you looking for relative ranking of tallness and hairiness?

Everybody is included in the sort: the size of the array before and 
after the sort is the same. Then, of course, the top K problem is very 
interesting too.

Andrei
January 27, 2012
Re: simultaneous multiple key sorting algorithm
>>> The brute force approach would essentially compute the two ranks and
>>> then sort by their sum. That's three sorts and at least one temporary

I think, you are mixing two problem levels:

Level one (theory):
How are the two properties weighted against each other
and can be combined to a total weight (a weight-function).

Level two (programming):
how can this be expressed in an optimal algorithmical way,
given D2.

Your question suggests, the the (naive) sum-function is
already the solution to level one.
But I suspect, that the solution for the problem requires
rather to find a proper weight function than a
combinatation-algorithm.

Christof
January 27, 2012
Re: simultaneous multiple key sorting algorithm
On 1/27/12 5:35 PM, Christof Schardt wrote:
>>>> The brute force approach would essentially compute the two ranks and
>>>> then sort by their sum. That's three sorts and at least one temporary
>
> I think, you are mixing two problem levels:
>
> Level one (theory):
> How are the two properties weighted against each other
> and can be combined to a total weight (a weight-function).

There's no weighing. The sorting order is rank(k1) + rank(k2).

> Level two (programming):
> how can this be expressed in an optimal algorithmical way,
> given D2.
>
> Your question suggests, the the (naive) sum-function is
> already the solution to level one.

This is not naive, because the quantities being summed are ranks, not 
key values.

> But I suspect, that the solution for the problem requires
> rather to find a proper weight function than a
> combinatation-algorithm.

No, I think this is a confusion.


Andrei
January 27, 2012
Re: simultaneous multiple key sorting algorithm
On Friday, 27 January 2012 at 19:36:49 UTC, Andrei Alexandrescu 
wrote:
> Here's a multikey sorting problem that's bothering me; I think 
> it's well researched but I can't find the right keywords.
>
> Say we have a collection of people with height and hair length 
> information. We want to get people who are "tall and 
> long-haired". More precisely, we want to order people by 
> rank(p.height) + rank(p.hairlength), where rank(p.k) is the 
> function that returns the rank of person p if ordered by key k.
>
> The brute force approach would essentially compute the two 
> ranks and then sort by their sum. That's three sorts and at 
> least one temporary array. But I think things can be done 
> considerably better. Any ideas?

I don't think you can get much better than this.

Knowing the rank (for one key) for one element requires looking 
at all the other elements. Thus, knowing the rank of all elements 
isn't possible in linear time or faster.

Knowing the ranks of only one key doesn't get you much 
information regarding the final result. Even the item ranked last 
on one key can tie for first place in combination with the other.

These observations seem to impose a strict dependency on the 
order the data is calculated. I don't see any shortcuts, but I'd 
love to be proven wrong.

You may want to ask on http://cstheory.stackexchange.com/ .
January 27, 2012
Re: simultaneous multiple key sorting algorithm
A k-d tree isn't directly applicable here - the problem is that 
the input data isn't normalized, so you can't just drop people 
into a 2d grid consisting of one axis being height, and the other 
axis being hair length. You need to 'normalize' the heights and 
lengths into ranks first, *then* insert them, and normalizing the 
lengths -> ranks is an NlogN process to begin with. Constructing 
a k-d tree isn't free either - you either have to pre-sort your 
inputs, or sort-as-you-go, which works out to NlogN as well.

On Friday, 27 January 2012 at 20:07:44 UTC, sclytrack wrote:
> On 01/27/2012 08:36 PM, Andrei Alexandrescu wrote:
>> Here's a multikey sorting problem that's bothering me; I think 
>> it's well
>> researched but I can't find the right keywords.
>>
>> Say we have a collection of people with height and hair length
>> information. We want to get people who are "tall and 
>> long-haired". More
>> precisely, we want to order people by rank(p.height) +
>> rank(p.hairlength), where rank(p.k) is the function that 
>> returns the
>> rank of person p if ordered by key k.
>>
>> The brute force approach would essentially compute the two 
>> ranks and
>> then sort by their sum. That's three sorts and at least one 
>> temporary
>> array. But I think things can be done considerably better. Any 
>> ideas?
>>
>>
>> Thanks,
>>
>> Andrei
>
>
> http://en.wikipedia.org/wiki/K-d_tree
>
>
> Sclytrack
January 28, 2012
Re: simultaneous multiple key sorting algorithm
On Fri, 27 Jan 2012 16:45:41 -0600, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> On 1/27/12 5:35 PM, Christof Schardt wrote:
>>>>> The brute force approach would essentially compute the two ranks and
>>>>> then sort by their sum. That's three sorts and at least one temporary
>>
>> I think, you are mixing two problem levels:
>>
>> Level one (theory):
>> How are the two properties weighted against each other
>> and can be combined to a total weight (a weight-function).
>
> There's no weighing. The sorting order is rank(k1) + rank(k2).

Weighting doesn't imply a simple linear constant. The rank function is an example of a non-linear weighting scheme. Non-linear weighting functions are common in complex sorting and classification problems.
« First   ‹ Prev
1 2 3
Top | Discussion index | About this forum | D home