Jump to page: 1 2
Thread overview
Ranges/algorithms for aggregation
Mar 21, 2014
Luís Marques
Mar 21, 2014
bearophile
Mar 21, 2014
Luís Marques
Mar 21, 2014
bearophile
Mar 21, 2014
Luís Marques
Mar 21, 2014
Justin Whear
Mar 21, 2014
H. S. Teoh
Mar 21, 2014
Luís Marques
Mar 22, 2014
Luís Marques
Mar 22, 2014
bearophile
Mar 22, 2014
Luís Marques
Mar 22, 2014
bearophile
March 21, 2014
Is there a neat way to do this transformation with ranges and std.algorithms?

    Input:
    -------
    B foo
    B bar
    C ble
    B big
    A begga

    Output: (aggregated and sorted on length)
    -------
    B -> [foo, bar, big]
    C -> [ble]
    A -> [begga]

The most obvious way (to me) to do this without standard algorithms is with an AA to the aggregation. The most obvious way (to me) to do this with std.algorithms is:

    B foo
    B bar
    C ble
    B big
    A begga

    =>

    [B, foo]
    [B, bar]
    [C, ble]
    [B, big]
    [A, begga]

    =>

    [B, foo]
    [B, bar]
    [B, big]
    [C, ble]
    [A, begga]

    =>

    B -> [foo, bar, big]
    C -> [ble]
    A -> [begga]

But this seems wasteful on memory. Is there a better way to do this in a more algorithmic way?
March 21, 2014
Luís Marques:

> Is there a neat way to do this transformation with ranges and std.algorithms?
>
>     Input:
>     -------
>     B foo
>     B bar
>     C ble
>     B big
>     A begga
>
>     Output: (aggregated and sorted on length)
>     -------
>     B -> [foo, bar, big]
>     C -> [ble]
>     A -> [begga]

What is the desired output data structure? An associative array of dynamic arrays? Or is a dynamic arrays of dynamic arrays of 2-tuples enough?

There are various ways to solve your problem.

Related:
https://d.puremagic.com/issues/show_bug.cgi?id=5968
https://d.puremagic.com/issues/show_bug.cgi?id=9842

Bye,
bearophile
March 21, 2014
On Friday, 21 March 2014 at 15:38:23 UTC, bearophile wrote:
>>    Output: (aggregated and sorted on length)
>>    -------
>>    B -> [foo, bar, big]
>>    C -> [ble]
>>    A -> [begga]
>
> What is the desired output data structure? An associative array of dynamic arrays? Or is a dynamic arrays of dynamic arrays of 2-tuples enough?

I'm doing this for a D newbie, to teach him the range/algorithmic approach. The function he wrote to output the result of that transformation takes as an input an array of arrays (the latter), but he builds that input iteratively using an AA of arrays (the former). I asked him about that  mismatch and at the time he told me that "for now" he only needed the latter, suggesting he had other future plans where he might need the AA, but I'm not sure. So let's just say that the client is unclear on his requirements, which does happen in the real world anyway :-).

In any case, I think the hashGroupBy is what I was asking about :-). Neat. (did anyone actually implement it?)

I'm not sure how if "a dynamic arrays of dynamic arrays of 2-tuples" sufficed that would help with the intermediate step, if we wanted to avoid the sorting step. Did you have anything in particular in mind there?
March 21, 2014
Luís Marques:

> So let's just say that the client is unclear on his requirements, which does happen in the real world anyway :-).

Yes, it happens, But it's a problem, because often if you know what you need you can produce the results more efficiently :-)


> (did anyone actually implement it?)

Not yet.


> I'm not sure how if "a dynamic arrays of dynamic arrays of 2-tuples" sufficed that would help with the intermediate step, if we wanted to avoid the sorting step. Did you have anything in particular in mind there?

I think this problem needs a sorting or a hash. One possible solution, if you don't need an associative array as output, is to use a multiSort followed by a building of groups using slicing. It could be efficient enough. Later you search the keys with a some kind of binary search.

Bye,
bearophile
March 21, 2014
On Friday, 21 March 2014 at 16:04:45 UTC, bearophile wrote:
> I think this problem needs a sorting or a hash. One possible solution, if you don't need an associative array as output, is to use a multiSort followed by a building of groups using slicing. It could be efficient enough. Later you search the keys with a some kind of binary search.

The number of keys is large and unbounded (not just three as in my example), so I guess this multiSort approach would not be practical, right? I think we really need the hashGroupBy.
March 21, 2014
On Fri, 21 Mar 2014 15:22:13 +0000, Luís Marques wrote:

> Is there a neat way to do this transformation with ranges and std.algorithms?
> 
>      Input:
>      -------
>      B foo B bar C ble B big A begga
> 
>      Output: (aggregated and sorted on length)
>      -------
>      B -> [foo, bar, big]
>      C -> [ble]
>      A -> [begga]
> 
> The most obvious way (to me) to do this without standard algorithms is
> with an AA to the aggregation. The most obvious way (to me) to do this
> with std.algorithms is:
> 
>      B foo B bar C ble B big A begga
> 
>      =>
> 
>      [B, foo]
>      [B, bar]
>      [C, ble]
>      [B, big]
>      [A, begga]
> 
>      =>
> 
>      [B, foo]
>      [B, bar]
>      [B, big]
>      [C, ble]
>      [A, begga]
> 
>      =>
> 
>      B -> [foo, bar, big]
>      C -> [ble]
>      A -> [begga]
> 
> But this seems wasteful on memory. Is there a better way to do this in a more algorithmic way?

This pull request[1] for groupBy has been hanging around for a year now, driving me to copy-and-paste the implementation into a couple of my projects.  Using it, you could do this:

auto tuples = ... // get your list of (B, foo), (B, bar), etc.
auto output = tuples.sort!`a[0] < b[0]`
                    .groupBy!`a[0] == b[0]`;
// output is a range of:
//    [
//     [(A, begga)],
//     [(B, foo), (B, bar), (B, big)],
//     [(C, ble)]
//    ]

The advantage being that output isn't an array at all but a lazy range of lazy ranges.

1 https://github.com/D-Programming-Language/phobos/pull/1186
March 21, 2014
On Fri, Mar 21, 2014 at 04:10:12PM +0000, Justin Whear wrote: [...]
> This pull request[1] for groupBy has been hanging around for a year now, driving me to copy-and-paste the implementation into a couple of my projects.  Using it, you could do this:
> 
> auto tuples = ... // get your list of (B, foo), (B, bar), etc.
> auto output = tuples.sort!`a[0] < b[0]`
>                     .groupBy!`a[0] == b[0]`;
> // output is a range of:
> //    [
> //     [(A, begga)],
> //     [(B, foo), (B, bar), (B, big)],
> //     [(C, ble)]
> //    ]
> 
> The advantage being that output isn't an array at all but a lazy range of lazy ranges.
> 
> 1 https://github.com/D-Programming-Language/phobos/pull/1186

Be aware, though, that groupBy only compares *adjacent* elements for equivalence; it does not sort the input. So if your input has equivalent elements interspersed with non-equivalent elements, you will have the equivalent elements split into multiple runs in the output.

Example:
	auto data = [
		tuple(1, "a")
		tuple(1, "b")
		tuple(2, "c")
		tuple(1, "d")
	];
	writeln(data.groupBy!((a,b) => a[0] == b[0]));

Will output:
	[[tuple(1, "a"), tuple(1, "b")], [tuple(2, "c")], [tuple(1, "d")]]

Which may not be what the OP wants.


T

-- 
Unix was not designed to stop people from doing stupid things, because that would also stop them from doing clever things. -- Doug Gwyn
March 21, 2014
On Friday, 21 March 2014 at 16:53:46 UTC, H. S. Teoh wrote:
> Be aware, though, that groupBy only compares *adjacent* elements for
> equivalence; it does not sort the input. So if your input has equivalent
> elements interspersed with non-equivalent elements, you will have the
> equivalent elements split into multiple runs in the output.

I think that's why Justin used sort. The hashGroupBy proposed by bearophile would avoid the sort and the additional memory usage though, so that would be even better.
March 22, 2014
On Friday, 21 March 2014 at 17:20:38 UTC, Luís Marques wrote:
> I think that's why Justin used sort. The hashGroupBy proposed by bearophile would avoid the sort and the additional memory usage though, so that would be even better.

I was thinking, we don't even need the full power of sort. Is there a standard algorithm that makes elements with equal keys be in sequence, but that otherwise is less expensive than sort?
March 22, 2014
Luís Marques:

> I was thinking, we don't even need the full power of sort. Is there a standard algorithm that makes elements with equal keys be in sequence, but that otherwise is less expensive than sort?

I don't know any, how is it supposed to know where to group items? Usually you build an associative array for that.

Bye,
bearophile
« First   ‹ Prev
1 2