January 23, 2015
On 1/23/15 1:36 PM, H. S. Teoh via Digitalmars-d wrote:
> On Fri, Jan 23, 2015 at 08:44:05PM +0000, via Digitalmars-d wrote:
> [...]
>> You are talking about two different functions here. group by and
>> partition by. The function that has been implemented is often called
>> partition by.
> [...]
>
> It's not too late to rename it, since we haven't released it yet. We
> still have a little window of time to make this change if necessary.
> Andrei?
>
> Returning each group as a tuple sounds like a distinct, albeit related,
> function. It can probably be added separately.

We already have partition() functions that actually partition a range into two subranges, so adding partitionBy with a different meaning may be confusing. -- Andrei
January 23, 2015
On 1/23/15 1:56 PM, MattCoder wrote:
> On Friday, 23 January 2015 at 18:08:30 UTC, Andrei Alexandrescu wrote:
>> So H.S. Teoh awesomely took
>> https://github.com/D-Programming-Language/phobos/pull/2878 to
>> completion. We now have a working and fast relational "group by"
>> facility.
>>
>> See it at work!
>>
>> ----
>> #!/usr/bin/rdmd
>>
>> void main()
>> {
>>     import std.algorithm, std.stdio;
>>     [293, 453, 600, 929, 339, 812, 222, 680, 529, 768]
>>         .groupBy!(a => a & 1)
>>         .writeln;
>> }
>> ----
>>
>> [[293, 453], [600], [929, 339], [812, 222, 680], [529], [768]]
>
> Sorry if this is a dumb question, but since you're grouping an array
> according to some rule, shouldn't this be:
>
> [293, 453, 929, 339, 529][600, 812, 222, 680, 768]
>
> ?
>
> Because then you have the array of "trues" and "falses" according to the
> condition (a & 1).

Yah, that would be partition(). -- Andrei
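
To make the distinction concrete, here is a minimal sketch of how partition() could produce the two arrays MattCoder expected. The helper name `splitByParity` is hypothetical, and the sketch assumes `SwapStrategy.stable` is used so that the original order within each half is preserved (the default unstable strategy does not promise that).

```d
import std.algorithm : partition, SwapStrategy;
import std.stdio : writeln;

// Hypothetical helper: returns [odds, evens], each in original order.
int[][] splitByParity(const int[] data)
{
    auto work = data.dup; // partition() reorders in place, so work on a copy
    // partition() moves elements satisfying the predicate to the front and
    // returns the remaining tail (elements for which the predicate is false).
    auto evens = partition!(a => (a & 1) != 0, SwapStrategy.stable)(work);
    auto odds = work[0 .. $ - evens.length];
    return [odds, evens];
}

void main()
{
    auto parts = splitByParity([293, 453, 600, 929, 339, 812, 222, 680, 529, 768]);
    writeln(parts[0]); // the odd values
    writeln(parts[1]); // the even values
}
```

Unlike groupBy, this looks at the whole range rather than at runs of adjacent elements, which is why it yields exactly two subranges.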

January 24, 2015
On Friday, 23 January 2015 at 20:28:32 UTC, Andrei Alexandrescu wrote:
> On 1/23/15 12:19 PM, Ary Borenszweig wrote:
>> In most languages group by yields a tuple of {group key, group values}.
>
> Interesting, thanks. Looks like we're at a net loss of information with our current approach.
>
> @quickfur, do you think you could expose a tuple with "key" and "values"? The former would be the function value, the latter would be what we offer right now.
>
> That would apply only to the unary version of groupBy.
>
>
> Andrei

A groupBy hack is below. I haven't yet read the source code, and I don't feel I understand ranges deeply enough to know whether this will work in the general case, but it at least works for the example (I think).


Laeeth.

#!/usr/bin/rdmd


void main()
{
    import std.algorithm, std.stdio, std.range;
    auto index = [293, 453, 600, 929, 339, 812, 222, 680, 529, 768];
    auto vals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
    auto zippy = zip(index, vals);

    zippy.groupBy!(a => a[0] & 1)
        .writeln;
}

[root@fedorabox test]# ./groupby
[[Tuple!(int, int)(293, 1), Tuple!(int, int)(453, 2)], [Tuple!(int, int)(600, 3)], [Tuple!(int, int)(929, 4), Tuple!(int, int)(339, 5)], [Tuple!(int, int)(812, 6), Tuple!(int, int)(222, 7), Tuple!(int, int)(680, 8)], [Tuple!(int, int)(529, 9)], [Tuple!(int, int)(768, 10)]]
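
For comparison, here is a rough sketch of the "key" plus "values" shape Andrei asks about above. It assumes a Phobos release in which this facility is available under the name `chunkBy` (the name floated elsewhere in this thread) and in which the unary form yields (key, group) tuples; the helper name `keyedRuns` is hypothetical and not part of Phobos.

```d
import std.algorithm : chunkBy, map;
import std.array : array;
import std.stdio : writeln;
import std.typecons : Tuple, tuple;

// Hypothetical helper: pairs each run of adjacent elements with its key.
// Assumes unary chunkBy yields tuples of (predicate value, run).
Tuple!(int, int[])[] keyedRuns(int[] data)
{
    return data
        .chunkBy!(a => a & 1)
        // chunk[0] is the key, chunk[1] the lazy run of matching elements
        .map!(chunk => tuple(chunk[0], chunk[1].array))
        .array;
}

void main()
{
    foreach (run; keyedRuns([293, 453, 600, 929, 339, 812, 222, 680, 529, 768]))
        writeln(run[0], ": ", run[1]);
}
```

This still groups only adjacent elements, so the same key can appear more than once in the result.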
January 25, 2015
On 1/23/15 8:54 PM, Andrei Alexandrescu wrote:
> On 1/23/15 1:36 PM, H. S. Teoh via Digitalmars-d wrote:
>> On Fri, Jan 23, 2015 at 08:44:05PM +0000, via Digitalmars-d wrote:
>> [...]
>>> You are talking about two different functions here. group by and
>>> partition by. The function that has been implemented is often called
>>> partition by.
>> [...]
>>
>> It's not too late to rename it, since we haven't released it yet. We
>> still have a little window of time to make this change if necessary.
>> Andrei?
>>
>> Returning each group as a tuple sounds like a distinct, albeit related,
>> function. It can probably be added separately.
>
> We already have partition() functions that actually partition a range
> into two subranges, so adding partitionBy with a different meaning may
> be confusing. -- Andrei

Another name might be chunkBy: it returns chunks that are grouped by some logic.
January 25, 2015
On 1/23/15 7:30 PM, bearophile wrote:
> H. S. Teoh:
>
>> What you describe could be an interesting candidate to add, though. It
>> could iterate over distinct values of the predicate, and traverse the
>> forward range (input ranges obviously can't work unless you allocate,
>> which makes it no longer lazy) each time. This, however, has O(n*k)
>> complexity where k is the number of distinct predicate values.
>
> Let's allocate, creating an associative array inside the grouping
> function :-)
>
> Bye,
> bearophile

Every language I know of allocates like this for `group by` (because of the complexity otherwise involved), and I think it's OK to do so.
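
A minimal sketch of that allocating approach, using a built-in associative array keyed by the predicate value. The function name `groupByParity` is hypothetical; this is only an illustration of the idea, not a proposed Phobos API.

```d
import std.stdio : writeln;

// Hypothetical helper: eager "group by" that collects every element under
// its key, regardless of whether equal keys are adjacent in the input.
int[][int] groupByParity(const int[] data)
{
    int[][int] groups;
    foreach (x; data)
        groups[x & 1] ~= x; // appending through a missing key creates the entry
    return groups;
}

void main()
{
    auto groups = groupByParity([293, 453, 600, 929, 339, 812, 222, 680, 529, 768]);
    writeln(groups[1]); // all odd values, in input order
    writeln(groups[0]); // all even values, in input order
}
```

The cost is one pass over the input plus the allocations for the table, and the result is no longer lazy.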
January 25, 2015
On Sun, Jan 25, 2015 at 01:39:59AM -0300, Ary Borenszweig via Digitalmars-d wrote:
> On 1/23/15 8:54 PM, Andrei Alexandrescu wrote:
> >On 1/23/15 1:36 PM, H. S. Teoh via Digitalmars-d wrote:
> >>On Fri, Jan 23, 2015 at 08:44:05PM +0000, via Digitalmars-d wrote: [...]
> >>>You are talking about two different functions here. group by and partition by. The function that has been implemented is often called partition by.
> >>[...]
> >>
> >>It's not too late to rename it, since we haven't released it yet. We still have a little window of time to make this change if necessary. Andrei?
> >>
> >>Returning each group as a tuple sounds like a distinct, albeit related, function. It can probably be added separately.
> >
> >We already have partition() functions that actually partition a range into two subranges, so adding partitionBy with a different meaning may be confusing. -- Andrei
> 
> Another name might be chunkBy: it returns chunks that are grouped by some logic.

Incidentally, that was the original name I implemented it under.


T

-- 
What do you get if you drop a piano down a mineshaft? A flat minor.
January 25, 2015
> We already have partition() functions that actually partition a range into two subranges, so adding partitionBy with a different meaning may be confusing. -- Andrei

In Ruby, the closest equivalent to D's currently named groupBy is a set of three methods:

slice_before
slice_after
slice_when

http://ruby-doc.org/core-2.2.0/Enumerable.html#method-i-slice_when

Your example in Ruby would be:

2.2.0 > [293, 453, 600, 929, 339, 812, 222, 680, 529, 768].slice_when { |x,y| x & 1 != y & 1 }.to_a
=> [[293, 453], [600], [929, 339], [812, 222, 680], [529], [768]]

O.
January 26, 2015
On Fri, 2015-01-23 at 10:08 -0800, Andrei Alexandrescu via Digitalmars-d
wrote:
[…]
> #!/usr/bin/rdmd
> 
> void main()
> {
>      import std.algorithm, std.stdio;
>      [293, 453, 600, 929, 339, 812, 222, 680, 529, 768]
>          .groupBy!(a => a & 1)
>          .writeln;
> }
> ----
> 
> [[293, 453], [600], [929, 339], [812, 222, 680], [529], [768]]
> 
[…]

I think I must be missing something: for me, the result of a groupBy operation on the above input data should be:

[1:[293, 453, 929, 339, 529], 0:[600, 812, 222, 680, 768]]

i.e. a map with keys being the cases and values being the values that meet each case. In this example a & 1 asks for the cases "lowest bit 0 or 1", aka "even or odd".

There is nothing wrong with the semantics of the result above, but is its name "group by" as understood by the rest of the world?


-- 
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder@ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel@winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder

January 26, 2015
Russel Winder:

> but is its name "group by" as understood by the rest of the world?

Nope...

Bye,
bearophile

January 26, 2015
On Mon, Jan 26, 2015 at 11:26:04AM +0000, bearophile via Digitalmars-d wrote:
> Russel Winder:
> 
> >but is its name "group by" as understood by the rest of the world?
> 
> Nope...
[...]

I proposed renaming it, but that got shot down. *shrug*

We still have a short window of time to sort this out, before 2.067 is released...


T

-- 
Don't drink and derive. Alcohol and algebra don't mix.