[WORK] groupBy is in! Next: aggregate (page 3)

On 1/23/15 1:36 PM, H. S. Teoh via Digitalmars-d wrote:
> On Fri, Jan 23, 2015 at 08:44:05PM +0000, via Digitalmars-d wrote:
> [...]
>> You are talking about two different functions here. group by and
>> partition by. The function that has been implemented is often called
>> partition by.
> [...]
>
> It's not too late to rename it, since we haven't released it yet. We
> still have a little window of time to make this change if necessary.
> Andrei?
>
> Returning each group as a tuple sounds like a distinct, albeit related,
> function. It can probably be added separately.

We already have partition() functions that actually partition a range into two subranges, so adding partitionBy with a different meaning may be confusing. -- Andrei

On 1/23/15 1:56 PM, MattCoder wrote:
> On Friday, 23 January 2015 at 18:08:30 UTC, Andrei Alexandrescu wrote:
>> So H.S. Teoh awesomely took
>> https://github.com/D-Programming-Language/phobos/pull/2878 to
>> completion. We now have a working and fast relational "group by"
>> facility.
>>
>> See it at work!
>>
>> ----
>> #!/usr/bin/rdmd
>>
>> void main()
>> {
>>     import std.algorithm, std.stdio;
>>     [293, 453, 600, 929, 339, 812, 222, 680, 529, 768]
>>         .groupBy!(a => a & 1)
>>         .writeln;
>> }
>> ----
>>
>> [[293, 453], [600], [929, 339], [812, 222, 680], [529], [768]]
>
> Sorry if this a dumb question, but since you're grouping an array
> according some rule, this shouldn't be:
>
> [293, 453, 929, 339, 529][600, 812, 222, 680, 768]
>
> ?
>
> Because then you have the array of "trues" and "falses" according the
> condition (a & 1).

Yah, that would be partition(). -- Andrei
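For comparison, a minimal sketch of the partition() behaviour Andrei refers to, producing the two halves MattCoder expected. The helper name `oddsThenEvens` is hypothetical and only there for illustration; `SwapStrategy.stable` is an added assumption so that the original order is kept inside each half (the default strategy is unstable).

```d
import std.algorithm : partition, SwapStrategy;
import std.stdio : writeln;

// Hypothetical helper for illustration: returns [odds, evens].
int[][] oddsThenEvens(int[] data)
{
    // partition() reorders `data` in place so that elements satisfying the
    // predicate come first, and returns the tail that does NOT satisfy it.
    // SwapStrategy.stable preserves the relative order within each half.
    auto evens = partition!(a => (a & 1) != 0, SwapStrategy.stable)(data);
    auto odds = data[0 .. $ - evens.length];
    return [odds, evens];
}

void main()
{
    auto halves = oddsThenEvens([293, 453, 600, 929, 339, 812, 222, 680, 529, 768]);
    writeln(halves[0]); // [293, 453, 929, 339, 529]
    writeln(halves[1]); // [600, 812, 222, 680, 768]
}
```

Unlike groupBy, this looks at the whole input and always yields exactly two subranges, which is why the two operations give different answers for the same predicate.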

January 24, 2015

proper groupBy

Posted by Laeeth Isharc
in reply to Andrei Alexandrescu

On Friday, 23 January 2015 at 20:28:32 UTC, Andrei Alexandrescu wrote:
> On 1/23/15 12:19 PM, Ary Borenszweig wrote:
>> In most languages group by yields a tuple of {group key, group values}.
>
> Interesting, thanks. Looks like we're at a net loss of information with our current approach.
>
> @quickfur, do you think you could expose a tuple with "key" and "values"? The former would be the function value, the latter would be what we offer right now.
>
> That would apply only to the unary version of groupBy.
>
>
> Andrei

A groupBy hack is below. I haven't yet read the source code and don't feel I understand ranges deeply enough to know whether this will work in the general case, but it at least works for the example (I think).


Laeeth.

#!/usr/bin/rdmd


void main()
{
    import std.algorithm, std.stdio, std.range;
    auto index = [293, 453, 600, 929, 339, 812, 222, 680, 529, 768];
    auto vals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
    auto zippy = zip(index, vals);

    zippy.groupBy!(a=> a[0] & 1)
        .writeln;
}

[root@fedorabox test]# ./groupby
[[Tuple!(int, int)(293, 1), Tuple!(int, int)(453, 2)], [Tuple!(int, int)(600, 3)], [Tuple!(int, int)(929, 4), Tuple!(int, int)(339, 5)], [Tuple!(int, int)(812, 6), Tuple!(int, int)(222, 7), Tuple!(int, int)(680, 8)], [Tuple!(int, int)(529, 9)], [Tuple!(int, int)(768, 10)]]
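The "key" plus "values" shape requested in the quoted message can be sketched roughly as follows. This assumes a Phobos release in which the facility is exposed as `chunkBy` (the name floated in this thread) and in which the unary form yields a tuple of the predicate value and the run; the helper name `keyedRuns` is hypothetical.

```d
import std.algorithm : chunkBy, map;
import std.array : array;
import std.stdio : writeln;
import std.typecons : Tuple, tuple;

// Hypothetical helper: materialise each consecutive run as (key, values).
Tuple!(int, int[])[] keyedRuns(int[] data)
{
    return data
        .chunkBy!(a => a & 1)               // assumed: unary form yields (key, run)
        .map!(g => tuple(g[0], g[1].array)) // copy each lazy run into an array
        .array;
}

void main()
{
    foreach (run; keyedRuns([293, 453, 600, 929, 339, 812, 222, 680, 529, 768]))
        writeln(run[0], ": ", run[1]); // first line printed: 1: [293, 453]
}
```

The key is computed once per run, so no separate zipped index is needed to recover it.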

On 1/23/15 8:54 PM, Andrei Alexandrescu wrote:
> On 1/23/15 1:36 PM, H. S. Teoh via Digitalmars-d wrote:
>> On Fri, Jan 23, 2015 at 08:44:05PM +0000, via Digitalmars-d wrote:
>> [...]
>>> You are talking about two different functions here. group by and
>>> partition by. The function that has been implemented is often called
>>> partition by.
>> [...]
>>
>> It's not too late to rename it, since we haven't released it yet. We
>> still have a little window of time to make this change if necessary.
>> Andrei?
>>
>> Returning each group as a tuple sounds like a distinct, albeit related,
>> function. It can probably be added separately.
>
> We already have partition() functions that actually partition a range
> into two subranges, so adding partitionBy with a different meaning may
> be confusing. -- Andrei

Another name might be chunkBy: it returns chunks that are grouped by some logic.

On 1/23/15 7:30 PM, bearophile wrote:
> H. S. Teoh:
>
>> What you describe could be an interesting candidate to add, though. It
>> could iterate over distinct values of the predicate, and traverse the
>> forward range (input ranges obviously can't work unless you allocate,
>> which makes it no longer lazy) each time. This, however, has O(n*k)
>> complexity where k is the number of distinct predicate values.
>
> Let's allocate, creating an associative array inside the grouping
> function :-)
>
> Bye,
> bearophile

All languages I know do this for `group by` (because of the complexity involved), and I think it's ok to do so.
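The allocating variant bearophile suggests might look something like this sketch. `groupEagerly` is a hypothetical name, not a Phobos function; it gives up laziness in exchange for a single pass over the input instead of the O(n*k) traversal discussed in the quote.

```d
import std.stdio : writeln;

// Hypothetical helper (not in Phobos): eager "group by" that collects
// every element under its key in an associative array.
auto groupEagerly(alias keyOf, T)(T[] values)
{
    alias K = typeof(keyOf(values[0]));
    T[][K] groups;
    foreach (v; values)
        groups[keyOf(v)] ~= v; // a missing key starts out as an empty array
    return groups;
}

void main()
{
    auto groups = groupEagerly!(a => a & 1)(
        [293, 453, 600, 929, 339, 812, 222, 680, 529, 768]);
    // Associative arrays have no defined iteration order, so index by key.
    writeln(groups[1]); // [293, 453, 929, 339, 529]
    writeln(groups[0]); // [600, 812, 222, 680, 768]
}
```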

On Sun, Jan 25, 2015 at 01:39:59AM -0300, Ary Borenszweig via Digitalmars-d wrote:
> On 1/23/15 8:54 PM, Andrei Alexandrescu wrote:
> >On 1/23/15 1:36 PM, H. S. Teoh via Digitalmars-d wrote:
> >>On Fri, Jan 23, 2015 at 08:44:05PM +0000, via Digitalmars-d wrote: [...]
> >>>You are talking about two different functions here. group by and partition by. The function that has been implemented is often called partition by.
> >>[...]
> >>
> >>It's not too late to rename it, since we haven't released it yet. We still have a little window of time to make this change if necessary. Andrei?
> >>
> >>Returning each group as a tuple sounds like a distinct, albeit related, function. It can probably be added separately.
> >
> >We already have partition() functions that actually partition a range into two subranges, so adding partitionBy with a different meaning may be confusing. -- Andrei
>
> Another name might be chunkBy: it returns chunks that are grouped by some logic.

Incidentally, that was the original name I implemented it under.


T

-- 
What do you get if you drop a piano down a mineshaft? A flat minor.

> We already have partition() functions that actually partition a range into two subranges, so adding partitionBy with a different meaning may be confusing. -- Andrei

In ruby, the closest to D's currently-named groupBy method is a set of three methods:

slice_before
slice_after
slice_when

http://ruby-doc.org/core-2.2.0/Enumerable.html#method-i-slice_when

Your example in ruby would be:

2.2.0 > [293, 453, 600, 929, 339, 812, 222, 680, 529, 768].slice_when { |x,y| x & 1 != y & 1 }.to_a
=> [[293, 453], [600], [929, 339], [812, 222, 680], [529], [768]]

O.
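A rough D counterpart to the slice_when call above, as a sketch only. It assumes a Phobos release where the function carries the name `chunkBy`; `runsByParity` is a hypothetical helper. Note the inverted sense of the predicate: slice_when says where to cut, while the binary form here says when two adjacent elements belong to the same run.

```d
import std.algorithm : chunkBy, map;
import std.array : array;
import std.stdio : writeln;

// Hypothetical helper: consecutive runs of elements with equal parity.
int[][] runsByParity(int[] data)
{
    return data
        .chunkBy!((a, b) => (a & 1) == (b & 1)) // true means "same run"
        .map!(run => run.array)                 // copy each lazy run
        .array;
}

void main()
{
    writeln(runsByParity([293, 453, 600, 929, 339, 812, 222, 680, 529, 768]));
    // [[293, 453], [600], [929, 339], [812, 222, 680], [529], [768]]
}
```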

On Fri, 2015-01-23 at 10:08 -0800, Andrei Alexandrescu via Digitalmars-d wrote:
[…]
> #!/usr/bin/rdmd
>
> void main()
> {
>     import std.algorithm, std.stdio;
>     [293, 453, 600, 929, 339, 812, 222, 680, 529, 768]
>         .groupBy!(a => a & 1)
>         .writeln;
> }
> ----
>
> [[293, 453], [600], [929, 339], [812, 222, 680], [529], [768]]
>
[…]

I think I must be missing something: for me the result of a groupBy operation on the above input data should be:

[1:[293, 453, 929, 339, 529], 0:[600, 812, 222, 680, 768]]

i.e. a map with keys being the cases and values being the values that meet the case. In this example a & 1 asks for cases "lowest bit 0 or 1" aka "odd or even".

There is nothing wrong with the semantics of the result above, but is its name "group by" as understood by the rest of the world?

-- 
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder@ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel@winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder

On Mon, Jan 26, 2015 at 11:26:04AM +0000, bearophile via Digitalmars-d wrote:
> Russel Winder:
>
> >but is it's name "group by" as understood by the rest of the world?
>
> Nope...
[...]

I proposed to rename it but it got shot down. *shrug*

We still have a short window of time to sort this out, before 2.067 is released...


T

-- 
Don't drink and derive. Alcohol and algebra don't mix.
