Search
simultaneous multiple key sorting algorithm
Jan 27, 2012
sclytrack
Jan 27, 2012
Christof Schardt
Jan 28, 2012
Robert Jacques
Jan 27, 2012
Andrew Krieger
Jan 27, 2012
Derek
Jan 27, 2012
Jan 28, 2012
Manfred Nowak
Jan 29, 2012
Manfred Nowak
Jan 28, 2012
Peter Alexander
Jan 28, 2012
H. S. Teoh
Jan 28, 2012
H. S. Teoh
Jan 28, 2012
Manfred Nowak
Jan 28, 2012
Manfred Nowak
Jan 28, 2012
Manfred Nowak
```Here's a multikey sorting problem that's bothering me; I think it's well researched but I can't find the right keywords.

Say we have a collection of people with height and hair length information. We want to get people who are "tall and long-haired". More precisely, we want to order people by rank(p.height) + rank(p.hairlength), where rank(p.k) is the function that returns the rank of person p if ordered by key k.

The brute force approach would essentially compute the two ranks and then sort by their sum. That's three sorts and at least one temporary array. But I think things can be done considerably better. Any ideas?

Thanks,

Andrei
```
```On 01/27/2012 08:36 PM, Andrei Alexandrescu wrote:
> Here's a multikey sorting problem that's bothering me; I think it's well
> researched but I can't find the right keywords.
>
> Say we have a collection of people with height and hair length
> information. We want to get people who are "tall and long-haired". More
> precisely, we want to order people by rank(p.height) +
> rank(p.hairlength), where rank(p.k) is the function that returns the
> rank of person p if ordered by key k.
>
> The brute force approach would essentially compute the two ranks and
> then sort by their sum. That's three sorts and at least one temporary
> array. But I think things can be done considerably better. Any ideas?
>
>
> Thanks,
>
> Andrei

http://en.wikipedia.org/wiki/K-d_tree

Sclytrack

```
```On 1/27/12 3:07 PM, sclytrack wrote:
> On 01/27/2012 08:36 PM, Andrei Alexandrescu wrote:
>> Here's a multikey sorting problem that's bothering me; I think it's well
>> researched but I can't find the right keywords.
>>
>> Say we have a collection of people with height and hair length
>> information. We want to get people who are "tall and long-haired". More
>> precisely, we want to order people by rank(p.height) +
>> rank(p.hairlength), where rank(p.k) is the function that returns the
>> rank of person p if ordered by key k.
>>
>> The brute force approach would essentially compute the two ranks and
>> then sort by their sum. That's three sorts and at least one temporary
>> array. But I think things can be done considerably better. Any ideas?
>>
>>
>> Thanks,
>>
>> Andrei
>
>
> http://en.wikipedia.org/wiki/K-d_tree

Thanks. I think k-d trees are good for general operations (insertion, deletion, and nearest neighbor query) whereas here we're interested in sorting.

Andrei

```
```On Sat, 28 Jan 2012 06:36:49 +1100, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> Here's a multikey sorting problem that's bothering me; I think it's well researched but I can't find the right keywords.
>
> Say we have a collection of people with height and hair length information. We want to get people who are "tall and long-haired". More precisely, we want to order people by rank(p.height) + rank(p.hairlength), where rank(p.k) is the function that returns the rank of person p if ordered by key k.
>
> The brute force approach would essentially compute the two ranks and then sort by their sum. That's three sorts and at least one temporary array. But I think things can be done considerably better. Any ideas?

Just a further clarification ...

If a person is tall and has short hair, would they be included in the sort? In other words, are you ONLY looking for tall AND long-haired people or are you looking for relative ranking of tallness and hairiness?

--
Derek Parnell
Melbourne, Australia
```
```On 1/27/12 5:27 PM, Derek wrote:
> On Sat, 28 Jan 2012 06:36:49 +1100, Andrei Alexandrescu
> <SeeWebsiteForEmail@erdani.org> wrote:
>
>> Here's a multikey sorting problem that's bothering me; I think it's
>> well researched but I can't find the right keywords.
>>
>> Say we have a collection of people with height and hair length
>> information. We want to get people who are "tall and long-haired".
>> More precisely, we want to order people by rank(p.height) +
>> rank(p.hairlength), where rank(p.k) is the function that returns the
>> rank of person p if ordered by key k.
>>
>> The brute force approach would essentially compute the two ranks and
>> then sort by their sum. That's three sorts and at least one temporary
>> array. But I think things can be done considerably better. Any ideas?
>
> Just a further clarification ...
>
> If a person is tall and has short hair, would they be included in the
> sort? In other words, are you ONLY looking for tall AND long-haired
> people or are you looking for relative ranking of tallness and hairiness?

Everybody is included in the sort: the size of the array before and after the sort is the same. Then, of course, the top K problem is very interesting too.

Andrei

```
```>>> The brute force approach would essentially compute the two ranks and then sort by their sum. That's three sorts and at least one temporary

I think, you are mixing two problem levels:

Level one (theory):
How are the two properties weighted against each other
and can be combined to a total weight (a weight-function).

Level two (programming):
how can this be expressed in an optimal algorithmical way,
given D2.

Your question suggests, the the (naive) sum-function is
already the solution to level one.
But I suspect, that the solution for the problem requires
rather to find a proper weight function than a
combinatation-algorithm.

Christof

```
```On 1/27/12 5:35 PM, Christof Schardt wrote:
>>>> The brute force approach would essentially compute the two ranks and
>>>> then sort by their sum. That's three sorts and at least one temporary
>
> I think, you are mixing two problem levels:
>
> Level one (theory):
> How are the two properties weighted against each other
> and can be combined to a total weight (a weight-function).

There's no weighing. The sorting order is rank(k1) + rank(k2).

> Level two (programming):
> how can this be expressed in an optimal algorithmical way,
> given D2.
>
> Your question suggests, the the (naive) sum-function is
> already the solution to level one.

This is not naive, because the quantities being summed are ranks, not key values.

> But I suspect, that the solution for the problem requires
> rather to find a proper weight function than a
> combinatation-algorithm.

No, I think this is a confusion.

Andrei
```
```On Friday, 27 January 2012 at 19:36:49 UTC, Andrei Alexandrescu wrote:
> Here's a multikey sorting problem that's bothering me; I think it's well researched but I can't find the right keywords.
>
> Say we have a collection of people with height and hair length information. We want to get people who are "tall and long-haired". More precisely, we want to order people by rank(p.height) + rank(p.hairlength), where rank(p.k) is the function that returns the rank of person p if ordered by key k.
>
> The brute force approach would essentially compute the two ranks and then sort by their sum. That's three sorts and at least one temporary array. But I think things can be done considerably better. Any ideas?

I don't think you can get much better than this.

Knowing the rank (for one key) for one element requires looking at all the other elements. Thus, knowing the rank of all elements isn't possible in linear time or faster.

Knowing the ranks of only one key doesn't get you much information regarding the final result. Even the item ranked last on one key can tie for first place in combination with the other.

These observations seem to impose a strict dependency on the order the data is calculated. I don't see any shortcuts, but I'd love to be proven wrong.

You may want to ask on http://cstheory.stackexchange.com/ .
```
```A k-d tree isn't directly applicable here - the problem is that the input data isn't normalized, so you can't just drop people into a 2d grid consisting of one axis being height, and the other axis being hair length. You need to 'normalize' the heights and lengths into ranks first, *then* insert them, and normalizing the lengths -> ranks is an NlogN process to begin with. Constructing a k-d tree isn't free either - you either have to pre-sort your inputs, or sort-as-you-go, which works out to NlogN as well.

On Friday, 27 January 2012 at 20:07:44 UTC, sclytrack wrote:
> On 01/27/2012 08:36 PM, Andrei Alexandrescu wrote:
>> Here's a multikey sorting problem that's bothering me; I think it's well
>> researched but I can't find the right keywords.
>>
>> Say we have a collection of people with height and hair length
>> information. We want to get people who are "tall and long-haired". More
>> precisely, we want to order people by rank(p.height) +
>> rank(p.hairlength), where rank(p.k) is the function that returns the
>> rank of person p if ordered by key k.
>>
>> The brute force approach would essentially compute the two ranks and
>> then sort by their sum. That's three sorts and at least one temporary
>> array. But I think things can be done considerably better. Any ideas?
>>
>>
>> Thanks,
>>
>> Andrei
>
>
> http://en.wikipedia.org/wiki/K-d_tree
>
>
> Sclytrack

```
```On Fri, 27 Jan 2012 16:45:41 -0600, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> On 1/27/12 5:35 PM, Christof Schardt wrote:
>>>>> The brute force approach would essentially compute the two ranks and
>>>>> then sort by their sum. That's three sorts and at least one temporary
>>
>> I think, you are mixing two problem levels:
>>
>> Level one (theory):
>> How are the two properties weighted against each other
>> and can be combined to a total weight (a weight-function).
>
> There's no weighing. The sorting order is rank(k1) + rank(k2).

Weighting doesn't imply a simple linear constant. The rank function is an example of a non-linear weighting scheme. Non-linear weighting functions are common in complex sorting and classification problems.
```
« First   ‹ Prev
1 2 3