[WORK] groupBy is in! Next: aggregate (page 2)

On 1/23/15 12:38 PM, bearophile wrote: > Ary Borenszweig: > >> In most languages group by yields a tuple of {group key, group values}. > > I'm saying this since some years... (and those languages probably don't > use sorting to perform the aggregation). At some point in the coming years the pain is bound to become unbearable so you'll give the gift of a pull request. -- Andrei

On Friday, 23 January 2015 at 20:19:31 UTC, Ary Borenszweig wrote: > On 1/23/15 3:08 PM, Andrei Alexandrescu wrote: >> So H.S. Teoh awesomely took >> https://github.com/D-Programming-Language/phobos/pull/2878 to >> completion. We now have a working and fast relational "group by" facility. >> >> See it at work! >> >> ---- >> #!/usr/bin/rdmd >> >> void main() >> { >> import std.algorithm, std.stdio; >> [293, 453, 600, 929, 339, 812, 222, 680, 529, 768] >> .groupBy!(a => a & 1) >> .writeln; >> } >> ---- >> >> [[293, 453], [600], [929, 339], [812, 222, 680], [529], [768]] >> >> The next step is to define an aggregate() function, which is a lot >> similar to reduce() but works on ranges of ranges and aggregates a >> function over each group. Continuing the previous example: >> >> [293, 453, 600, 929, 339, 812, 222, 680, 529, 768] >> .groupBy!(a => a & 1) >> .aggregate!max >> .writeln; >> >> should print: >> >> [453, 600, 929, 812, 529, 768] >> >> The aggregate function should support aggregating several functions at >> once, e.g. aggregate!(min, max) etc. >> >> Takers? >> >> >> Andrei > > In most languages group by yields a tuple of {group key, group values}. > > For example (Ruby or Crystal): > > a = [1, 4, 2, 4, 5, 2, 3, 7, 9] > groups = a.group_by { |x| x % 3 } > puts groups #=> {1 => [1, 4, 4, 7], 2 => [2, 5, 2], 0 => [3, 9]} > > In C# it's also called group by: http://www.dotnetperls.com/groupby > > Java: http://docs.oracle.com/javase/8/docs/api/java/util/stream/Collectors.html#groupingBy-java.util.function.Function- > > SQL: http://www.w3schools.com/sql/sql_groupby.asp > > So I'm not sure "groupBy" is a good name for this. You are talking about two different functions here. group by and partition by. The function that has been implemented is often called partition by. The best example I know of: https://clojuredocs.org/clojure.core/group-by https://clojuredocs.org/clojure.core/partition-by

On Fri, Jan 23, 2015 at 12:40:01PM -0800, Andrei Alexandrescu via Digitalmars-d wrote: > On 1/23/15 12:38 PM, bearophile wrote: > >Ary Borenszweig: > > > >>In most languages group by yields a tuple of {group key, group values}. > > > >I'm saying this since some years... (and those languages probably don't use sorting to perform the aggregation). > > At some point in the coming years the pain is bound to become unbearable so you'll give the gift of a pull request. -- Andrei ^^^^ (!!!) *cowers and hides from the rotten tomatoes...* T -- It won't be covered in the book. The source code has to be useful for something, after all. -- Larry Wall

On Fri, Jan 23, 2015 at 08:44:05PM +0000, via Digitalmars-d wrote: [...] > You are talking about two different functions here. group by and partition by. The function that has been implemented is often called partition by. [...] It's not too late to rename it, since we haven't released it yet. We still have a little window of time to make this change if necessary. Andrei? Returning each group as a tuple sounds like a distinct, albeit related, function. It can probably be added separately. T -- Why waste time reinventing the wheel, when you could be reinventing the engine? -- Damian Conway

On Friday, 23 January 2015 at 18:08:30 UTC, Andrei Alexandrescu wrote: > So H.S. Teoh awesomely took https://github.com/D-Programming-Language/phobos/pull/2878 to completion. We now have a working and fast relational "group by" facility. > > See it at work! > > ---- > #!/usr/bin/rdmd > > void main() > { > import std.algorithm, std.stdio; > [293, 453, 600, 929, 339, 812, 222, 680, 529, 768] > .groupBy!(a => a & 1) > .writeln; > } > ---- > > [[293, 453], [600], [929, 339], [812, 222, 680], [529], [768]] Sorry if this a dumb question, but since you're grouping an array according some rule, this shouldn't be: [293, 453, 929, 339, 529][600, 812, 222, 680, 768] ? Because then you have the array of "trues" and "falses" according the condition (a & 1). Matheus.

On 1/23/15 1:34 PM, H. S. Teoh via Digitalmars-d wrote: > On Fri, Jan 23, 2015 at 12:40:01PM -0800, Andrei Alexandrescu via Digitalmars-d wrote: >> On 1/23/15 12:38 PM, bearophile wrote: >>> Ary Borenszweig: >>> >>>> In most languages group by yields a tuple of {group key, group >>>> values}. >>> >>> I'm saying this since some years... (and those languages probably >>> don't use sorting to perform the aggregation). >> >> At some point in the coming years the pain is bound to become >> unbearable so you'll give the gift of a pull request. -- Andrei > ^^^^ > (!!!) I knew that pun won't go unremarked! :o) -- Andrei

January 23, 2015

Re: [WORK] groupBy is in! Next: aggregate

Posted by H. S. Teoh
in reply to MattCoder

Permalink

H. S. Teoh

Posted in reply to MattCoder

Permalink

On Fri, Jan 23, 2015 at 09:56:11PM +0000, MattCoder via Digitalmars-d wrote:
> On Friday, 23 January 2015 at 18:08:30 UTC, Andrei Alexandrescu wrote:
> >So H.S. Teoh awesomely took https://github.com/D-Programming-Language/phobos/pull/2878 to completion.  We now have a working and fast relational "group by" facility.
> >
> >See it at work!
> >
> >----
> >#!/usr/bin/rdmd
> >
> >void main()
> >{
> >    import std.algorithm, std.stdio;
> >    [293, 453, 600, 929, 339, 812, 222, 680, 529, 768]
> >        .groupBy!(a => a & 1)
> >        .writeln;
> >}
> >----
> >
> >[[293, 453], [600], [929, 339], [812, 222, 680], [529], [768]]
> 
> Sorry if this a dumb question, but since you're grouping an array according some rule, this shouldn't be:
> 
> [293, 453, 929, 339, 529][600, 812, 222, 680, 768]
> 
> ?
> 
> Because then you have the array of "trues" and "falses" according the
> condition (a & 1).
[...]

It's kind of a misnomer, because it actually only considers *consecutive runs* of equivalent elements; it doesn't look at the whole range before deciding what goes in which group. So technically it should be called consecutiveGroupBy or consecutivePartitionBy, but that's too big a mouthful to be a usable name.

What you describe could be an interesting candidate to add, though. It could iterate over distinct values of the predicate, and traverse the forward range (input ranges obviously can't work unless you allocate, which makes it no longer lazy) each time. This, however, has O(n*k) complexity where k is the number of distinct predicate values. If it's anything bigger than bool or a relatively small enum, it would be impractical. Moreover, the requirement to traverse the range multiple times kinda sux; you might as well just call filter() with different expected values yourself, in a loop. In fact, you could ostensibly implement it something along these lines:

	auto globalPartition(alias pred, R)(R range) {
		alias Values = typeof(pred(range.front, range.front));

		return iota(Values.min, Values.max)
			.map!(v => range.save.filter!(pred(e) == v));
	}

T

-- 
Who told you to swim in Crocodile Lake without life insurance??

H. S. Teoh: > What you describe could be an interesting candidate to add, though. It > could iterate over distinct values of the predicate, and traverse the > forward range (input ranges obviously can't work unless you allocate, > which makes it no longer lazy) each time. This, however, has O(n*k) > complexity where k is the number of distinct predicate values. Let's allocate, creating an associative array inside the grouping function :-) Bye, bearophile

On Friday, 23 January 2015 at 22:20:19 UTC, H. S. Teoh wrote: > It's kind of a misnomer, because it actually only considers *consecutive > runs* of equivalent elements; it doesn't look at the whole range before > deciding what goes in which group. So technically it should be called > consecutiveGroupBy or consecutivePartitionBy, but that's too big a > mouthful to be a usable name. > > What you describe could be an interesting candidate to add, though. It > could iterate over distinct values of the predicate, and traverse the > forward range (input ranges obviously can't work unless you allocate, > which makes it no longer lazy) each time. This, however, has O(n*k) > complexity where k is the number of distinct predicate values. If it's > anything bigger than bool or a relatively small enum, it would be > impractical. Moreover, the requirement to traverse the range multiple > times kinda sux; you might as well just call filter() with different > expected values yourself, in a loop. In fact, you could ostensibly > implement it something along these lines: > > auto globalPartition(alias pred, R)(R range) { > alias Values = typeof(pred(range.front, range.front)); > > return iota(Values.min, Values.max) > .map!(v => range.save.filter!(pred(e) == v)); > } >... Alright and thanks for the whole explanation. Matheus.

Forums