Thread overview
[WORK] groupBy is in! Next: aggregate
Jan 23, 2015
Justin Whear
Jan 23, 2015
H. S. Teoh
Jan 23, 2015
H. S. Teoh
Jan 23, 2015
H. S. Teoh
Jan 23, 2015
Nordlöw
Jan 23, 2015
Ary Borenszweig
proper groupBy
Jan 24, 2015
Laeeth Isharc
Jan 23, 2015
bearophile
Jan 23, 2015
H. S. Teoh
Jan 23, 2015
Ulrich Küttler
Jan 23, 2015
H. S. Teoh
Jan 25, 2015
Ary Borenszweig
Jan 25, 2015
H. S. Teoh
Jan 25, 2015
Olivier Grant
Jan 23, 2015
MattCoder
Jan 23, 2015
H. S. Teoh
Jan 23, 2015
bearophile
Jan 25, 2015
Ary Borenszweig
Jan 23, 2015
MattCoder
Jan 26, 2015
Russel Winder
Jan 26, 2015
bearophile
Jan 26, 2015
H. S. Teoh
Jan 26, 2015
Ulrich Küttler
Jan 26, 2015
Ary Borenszweig
Jan 26, 2015
zeljkog
Jan 26, 2015
Phil
Jan 26, 2015
H. S. Teoh
Jan 26, 2015
Ulrich Küttler
Pandas example of groupby
Jan 26, 2015
Laeeth Isharc
Jan 26, 2015
H. S. Teoh
Jan 26, 2015
Dicebot
Jan 26, 2015
Ulrich Küttler
Jan 26, 2015
Laeeth Isharc
Jan 26, 2015
Laeeth Isharc
Jan 26, 2015
Russel Winder
January 23, 2015
So H.S. Teoh awesomely took https://github.com/D-Programming-Language/phobos/pull/2878 to completion. We now have a working and fast relational "group by" facility.

See it at work!

----
#!/usr/bin/rdmd

void main()
{
    import std.algorithm, std.stdio;
    [293, 453, 600, 929, 339, 812, 222, 680, 529, 768]
        .groupBy!(a => a & 1)
        .writeln;
}
----

[[293, 453], [600], [929, 339], [812, 222, 680], [529], [768]]

The next step is to define an aggregate() function, which is a lot like reduce() but works on ranges of ranges and aggregates a function over each group. Continuing the previous example:

    [293, 453, 600, 929, 339, 812, 222, 680, 529, 768]
        .groupBy!(a => a & 1)
        .aggregate!max
        .writeln;

should print:

[453, 600, 929, 812, 529, 768]

The aggregate function should support aggregating several functions at once, e.g. aggregate!(min, max) etc.

Takers?


Andrei
January 23, 2015
On Fri, 23 Jan 2015 10:08:30 -0800, Andrei Alexandrescu wrote:

> So H.S. Teoh awesomely took https://github.com/D-Programming-Language/phobos/pull/2878 to completion. We now have a working and fast relational "group by" facility.
> 

This is great news.  It seems like every time I make use of component programming, I need groupBy at least once.  I have a D file with an old copy of a groupBy implementation (I think it's Andrei's original stab at it) and it gets copied around to the various projects.
January 23, 2015
On Fri, Jan 23, 2015 at 10:08:30AM -0800, Andrei Alexandrescu via Digitalmars-d wrote:
> So H.S. Teoh awesomely took https://github.com/D-Programming-Language/phobos/pull/2878 to completion. We now have a working and fast relational "group by" facility.

Unfortunately it doesn't work in pure/@safe/nothrow code because of limitations in the current RefCounted implementation.


[...]
> The next step is to define an aggregate() function, which is a lot
> similar to reduce() but works on ranges of ranges and aggregates a
> function over each group. Continuing the previous example:
> 
>     [293, 453, 600, 929, 339, 812, 222, 680, 529, 768]
>         .groupBy!(a => a & 1)
>         .aggregate!max
>         .writeln;
> 
> should print:
> 
> [453, 600, 929, 812, 529, 768]
> 
> The aggregate function should support aggregating several functions at once, e.g. aggregate!(min, max) etc.
> 
> Takers?
[...]

Isn't that just a simple matter of defining aggregate() in terms of
map() and reduce()?  Working example:

	import std.algorithm.comparison : max;
	import std.algorithm.iteration;
	import std.stdio;

	auto aggregate(alias func, RoR)(RoR ror) {
		return ror.map!(reduce!func);
	}

	void main() {
	    [293, 453, 600, 929, 339, 812, 222, 680, 529, 768]
	         .groupBy!(a => a & 1)
	         .aggregate!max
	         .writeln;
	}

Output is as expected.


T

-- 
Verbing weirds language. -- Calvin (& Hobbes)
January 23, 2015
On Fri, Jan 23, 2015 at 10:29:13AM -0800, H. S. Teoh via Digitalmars-d wrote:
> On Fri, Jan 23, 2015 at 10:08:30AM -0800, Andrei Alexandrescu via Digitalmars-d wrote:
[...]
> > The next step is to define an aggregate() function, which is a lot
> > similar to reduce() but works on ranges of ranges and aggregates a
> > function over each group. Continuing the previous example:
> > 
> >     [293, 453, 600, 929, 339, 812, 222, 680, 529, 768]
> >         .groupBy!(a => a & 1)
> >         .aggregate!max
> >         .writeln;
> > 
> > should print:
> > 
> > [453, 600, 929, 812, 529, 768]
> > 
> > The aggregate function should support aggregating several functions at once, e.g. aggregate!(min, max) etc.
[...]

Here's a working variadic implementation:

	import std.algorithm.comparison : max, min;
	import std.algorithm.iteration;
	import std.stdio;

	template aggregate(funcs...) {
		auto aggregate(RoR)(RoR ror) {
			return ror.map!(reduce!funcs);
		}
	}

	void main() {
	    [293, 453, 600, 929, 339, 812, 222, 680, 529, 768]
	         .groupBy!(a => a & 1)
	         .aggregate!(max,min)
	         .writeln;
	}

Output (kinda ugly, but it works):

	[Tuple!(int, int)(453, 293), Tuple!(int, int)(600, 600), Tuple!(int, int)(929, 339), Tuple!(int, int)(812, 222), Tuple!(int, int)(529, 529), Tuple!(int, int)(768, 768)]


Of course, it will require a little more polish before merging into Phobos, but the core implementation is nowhere near the complexity of groupBy.


T

-- 
The best compiler is between your ears. -- Michael Abrash
January 23, 2015
On 1/23/15 10:29 AM, H. S. Teoh via Digitalmars-d wrote:
> On Fri, Jan 23, 2015 at 10:08:30AM -0800, Andrei Alexandrescu via Digitalmars-d wrote:
>> So H.S. Teoh awesomely took
>> https://github.com/D-Programming-Language/phobos/pull/2878 to
>> completion. We now have a working and fast relational "group by"
>> facility.
>
> Unfortunately it doesn't work in pure/@safe/nothrow code because of
> limitations in the current RefCounted implementation.
>
>
> [...]
>> The next step is to define an aggregate() function, which is a lot
>> similar to reduce() but works on ranges of ranges and aggregates a
>> function over each group. Continuing the previous example:
>>
>>      [293, 453, 600, 929, 339, 812, 222, 680, 529, 768]
>>          .groupBy!(a => a & 1)
>>          .aggregate!max
>>          .writeln;
>>
>> should print:
>>
>> [453, 600, 929, 812, 529, 768]
>>
>> The aggregate function should support aggregating several functions at
>> once, e.g. aggregate!(min, max) etc.
>>
>> Takers?
> [...]
>
> Isn't that just a simple matter of defining aggregate() in terms of
> map() and reduce()?  Working example:
>
> 	import std.algorithm.comparison : max;
> 	import std.algorithm.iteration;
> 	import std.stdio;
> 	
> 	auto aggregate(alias func, RoR)(RoR ror) {
> 		return ror.map!(reduce!func);
> 	}
> 	
> 	void main() {
> 	    [293, 453, 600, 929, 339, 812, 222, 680, 529, 768]
> 	         .groupBy!(a => a & 1)
> 	         .aggregate!max
> 	         .writeln;
> 	}
>
> Output is as expected.

Clever! Or, conversely, I'm not that bright! Yes, this is awesome. Probably the actual name "aggregate" should be defined even with that trivial implementation to help folks like me :o). -- Andrei

January 23, 2015
On 1/23/15 10:34 AM, H. S. Teoh via Digitalmars-d wrote:
> Of course, it will require a little more polish before merging into
> Phobos, but the core implementation is nowhere near the complexity of
> groupBy.

open https://github.com/D-Programming-Language/phobos/pulls

[F5]... [F5]... [F5]...


Andrei

January 23, 2015
On Fri, Jan 23, 2015 at 10:47:28AM -0800, Andrei Alexandrescu via Digitalmars-d wrote:
> On 1/23/15 10:34 AM, H. S. Teoh via Digitalmars-d wrote:
> >Of course, it will require a little more polish before merging into Phobos, but the core implementation is nowhere near the complexity of groupBy.
> 
> open https://github.com/D-Programming-Language/phobos/pulls
> 
> [F5]... [F5]... [F5]...
[...]

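void main() {
	import std.range : iota;
	import std.stdio : writeln;

	enum F5sPerSecond = 1; // assumed refresh rate; not defined in the original
	foreach (_; iota(0, 60 * 60 * F5sPerSecond))
		writeln("[F5]...");

	// Delimited string: the closing ENDMSG must start its own line.
	writeln(q"ENDMSG

	https://github.com/D-Programming-Language/phobos/pull/2899

ENDMSG");
}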

;-)


T

-- 
No! I'm not in denial!
January 23, 2015
On 1/23/15 3:08 PM, Andrei Alexandrescu wrote:
> So H.S. Teoh awesomely took
> https://github.com/D-Programming-Language/phobos/pull/2878 to
> completion. We now have a working and fast relational "group by" facility.
>
> See it at work!
>
> ----
> #!/usr/bin/rdmd
>
> void main()
> {
>      import std.algorithm, std.stdio;
>      [293, 453, 600, 929, 339, 812, 222, 680, 529, 768]
>          .groupBy!(a => a & 1)
>          .writeln;
> }
> ----
>
> [[293, 453], [600], [929, 339], [812, 222, 680], [529], [768]]
>
> The next step is to define an aggregate() function, which is a lot
> similar to reduce() but works on ranges of ranges and aggregates a
> function over each group. Continuing the previous example:
>
>      [293, 453, 600, 929, 339, 812, 222, 680, 529, 768]
>          .groupBy!(a => a & 1)
>          .aggregate!max
>          .writeln;
>
> should print:
>
> [453, 600, 929, 812, 529, 768]
>
> The aggregate function should support aggregating several functions at
> once, e.g. aggregate!(min, max) etc.
>
> Takers?
>
>
> Andrei

In most languages group by yields a tuple of {group key, group values}.

For example (Ruby or Crystal):

a = [1, 4, 2, 4, 5, 2, 3, 7, 9]
groups = a.group_by { |x| x % 3 }
puts groups #=> {1 => [1, 4, 4, 7], 2 => [2, 5, 2], 0 => [3, 9]}

In C# it's also called group by: http://www.dotnetperls.com/groupby

Java: http://docs.oracle.com/javase/8/docs/api/java/util/stream/Collectors.html#groupingBy-java.util.function.Function-

SQL: http://www.w3schools.com/sql/sql_groupby.asp

So I'm not sure "groupBy" is a good name for this.
January 23, 2015
On 1/23/15 12:19 PM, Ary Borenszweig wrote:
> In most languages group by yields a tuple of {group key, group values}.

Interesting, thanks. Looks like we're at a net loss of information with our current approach.

@quickfur, do you think you could expose a tuple with "key" and "values"? The former would be the function value, the latter would be what we offer right now.

That would apply only to the unary version of groupBy.


Andrei
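
A minimal sketch of what a "key" plus "values" result could look like for the running example. The names `Group` and `keyedRuns` are hypothetical and chosen only for illustration; this is not the groupBy implementation from the pull request. It is eager and specialized to `int[]`, whereas groupBy operates on ranges.

```d
import std.stdio : writeln;
import std.typecons : Tuple;

// Hypothetical type for illustration: one run of consecutive elements
// together with the key value they share.
alias Group = Tuple!(int, "key", int[], "values");

// Hypothetical helper, not a Phobos API: eagerly splits `items` into runs
// of consecutive elements for which keyOf returns the same value.
Group[] keyedRuns(alias keyOf)(int[] items)
{
    Group[] result;
    foreach (x; items)
    {
        immutable k = keyOf(x);
        if (result.length == 0 || result[$ - 1].key != k)
            result ~= Group(k, [x]);     // key changed: start a new run
        else
            result[$ - 1].values ~= x;   // same key: extend the current run
    }
    return result;
}

void main()
{
    auto runs = keyedRuns!(a => a & 1)(
        [293, 453, 600, 929, 339, 812, 222, 680, 529, 768]);
    foreach (g; runs)
        writeln(g.key, " -> ", g.values);
    // Prints:
    // 1 -> [293, 453]
    // 0 -> [600]
    // 1 -> [929, 339]
    // 0 -> [812, 222, 680]
    // 1 -> [529]
    // 0 -> [768]
}
```

The key is computed once per element and stored with its run, so a caller no longer has to re-apply the function to recover it.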

January 23, 2015
Ary Borenszweig:

> In most languages group by yields a tuple of {group key, group values}.

I've been saying this for some years... (and those languages probably don't use sorting to perform the aggregation).

Bye,
bearophile
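
For comparison, grouping that collects all elements with equal keys regardless of position, as in the Ruby example earlier in the thread, can be sketched with an associative array and no sorting. `groupAll` is a hypothetical helper written for illustration, not a Phobos function, and it is specialized to `int[]` to keep the sketch short.

```d
import std.stdio : writeln;

// Hypothetical helper, not a Phobos API: maps each key to the list of
// elements that produce it, preserving the input order within each list.
int[][int] groupAll(alias keyOf)(int[] items)
{
    int[][int] groups;               // key -> elements sharing that key
    foreach (x; items)
        groups[keyOf(x)] ~= x;       // appending creates the entry if missing
    return groups;
}

void main()
{
    auto groups = groupAll!(x => x % 3)([1, 4, 2, 4, 5, 2, 3, 7, 9]);

    // Same grouping as the Ruby group_by example.
    assert(groups[1] == [1, 4, 4, 7]);
    assert(groups[2] == [2, 5, 2]);
    assert(groups[0] == [3, 9]);

    // Iteration order of an associative array is unspecified.
    foreach (key, values; groups)
        writeln(key, " => ", values);
}
```

The trade-off is that this version is eager and allocates the whole table up front, while grouping only consecutive runs can be done in a single pass over the input.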