[WORK] groupBy is in! Next: aggregate (page 4) - D Programming Language Discussion Forum

January 26, 2015

Re: [WORK] groupBy is in! Next: aggregate

Posted by Russel Winder

On Mon, 2015-01-26 at 08:11 -0800, H. S. Teoh via Digitalmars-d wrote:
> On Mon, Jan 26, 2015 at 11:26:04AM +0000, bearophile via Digitalmars-d wrote:
> > Russel Winder:
> > 
> > >but is it's name "group by" as understood by the rest of the world?
> > 
> > Nope...
> [...]
> 
> I proposed to rename it but it got shot down. *shrug*

What name do you think works for this operation?

> We still have a short window of time to sort this out, before 2.067 is released...

To be honest, now that it is confirmed that the semantics of this operation differ from those of the operation other languages call groupBy, 2.067 should not go out with this operation named groupBy.

-- 
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder@ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel@winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder

January 26, 2015

Posted by Ulrich Küttler
in reply to H. S. Teoh

On Monday, 26 January 2015 at 16:13:40 UTC, H. S. Teoh wrote:
> On Mon, Jan 26, 2015 at 11:26:04AM +0000, bearophile via Digitalmars-d wrote:
>> Russel Winder:
>> 
>> >but is it's name "group by" as understood by the rest of the world?
>> 
>> Nope...
> [...]
>
> I proposed to rename it but it got shot down. *shrug*

Andrei had a point about `partition` being used already. I liked Oliver's suggestion to go with slice-something. `sliceBy` might be worth considering. It even hints at the (efficient) implementation.

January 26, 2015

Posted by Andrei Alexandrescu
in reply to H. S. Teoh

On 1/26/15 8:11 AM, H. S. Teoh via Digitalmars-d wrote:
> On Mon, Jan 26, 2015 at 11:26:04AM +0000, bearophile via Digitalmars-d wrote:
>> Russel Winder:
>>
>>> but is it's name "group by" as understood by the rest of the world?
>>
>> Nope...
> [...]
>
> I proposed to rename it but it got shot down. *shrug*
>
> We still have a short window of time to sort this out, before 2.067 is
> released...

My suggestion was to keep the name but change the code of your groupBy implementation to return tuple(key, lazyValues) instead of just lazyValues. That needs to happen only for binary predicates; unary predicates will all have alternating true/false keys.

Seems that would please everyone.

Andrei

January 26, 2015

Posted by Ola Fosheim Grøstad
in reply to Ulrich Küttler

On Monday, 26 January 2015 at 16:44:20 UTC, Ulrich Küttler wrote:
> Andrei had a point about `partition` being used already. I liked Oliver's suggestion to go with slice-something. `sliceBy` might be worth considering. It even hints at the (efficient) implementation.

Does it return slices?

If not, pick a different verb, e.g. "split".

January 26, 2015

Posted by Ary Borenszweig
in reply to Andrei Alexandrescu

On 1/26/15 2:34 PM, Andrei Alexandrescu wrote:
> On 1/26/15 8:11 AM, H. S. Teoh via Digitalmars-d wrote:
>> On Mon, Jan 26, 2015 at 11:26:04AM +0000, bearophile via Digitalmars-d
>> wrote:
>>> Russel Winder:
>>>
>>>> but is it's name "group by" as understood by the rest of the world?
>>>
>>> Nope...
>> [...]
>>
>> I proposed to rename it but it got shot down. *shrug*
>>
>> We still have a short window of time to sort this out, before 2.067 is
>> released...
>
> My suggestion was to keep the name but change the code of your groupBy
> implementation to return tuple(key, lazyValues) instead of just
> lazyValues. That needs to happen only for binary predicates; unary
> predicates will all have alternating true/false keys.
>
> Seems that would please everyone.
>
>
> Andrei
>

That's much harder to implement than what it does right now. I don't know how you'll manage to do the lazyValues thing: you'd need to make multiple passes over the range.

Again, other languages return an associative array in this case.

January 26, 2015

Posted by Andrei Alexandrescu
in reply to Ary Borenszweig

On 1/26/15 9:50 AM, Ary Borenszweig wrote:
> On 1/26/15 2:34 PM, Andrei Alexandrescu wrote:
>> On 1/26/15 8:11 AM, H. S. Teoh via Digitalmars-d wrote:
>>> On Mon, Jan 26, 2015 at 11:26:04AM +0000, bearophile via Digitalmars-d
>>> wrote:
>>>> Russel Winder:
>>>>
>>>>> but is it's name "group by" as understood by the rest of the world?
>>>>
>>>> Nope...
>>> [...]
>>>
>>> I proposed to rename it but it got shot down. *shrug*
>>>
>>> We still have a short window of time to sort this out, before 2.067 is
>>> released...
>>
>> My suggestion was to keep the name but change the code of your groupBy
>> implementation to return tuple(key, lazyValues) instead of just
>> lazyValues. That needs to happen only for binary predicates; unary
>> predicates will all have alternating true/false keys.
>>
>> Seems that would please everyone.
>>
>>
>> Andrei
>>
>
> That's much more harder to implement than what it does right now. I
> don't know how you'll manage to do the lazyValues thing: you'd need to
> make multiple passes in the range.

The implementation right now is quite interesting but not complicated, and achieves lazy grouping in a single pass.

> Again, other languages return an associative array in this case.

I think our approach is superior.


Andrei

January 26, 2015

Posted by H. S. Teoh
in reply to Ary Borenszweig

On Mon, Jan 26, 2015 at 02:50:16PM -0300, Ary Borenszweig via Digitalmars-d wrote:
> On 1/26/15 2:34 PM, Andrei Alexandrescu wrote:
> >>On Mon, Jan 26, 2015 at 11:26:04AM +0000, bearophile via Digitalmars-d wrote:
> >>>Russel Winder:
> >>>
> >>>>but is it's name "group by" as understood by the rest of the world?
> >>>
> >>>Nope...
[...]
> >My suggestion was to keep the name but change the code of your groupBy implementation to return tuple(key, lazyValues) instead of just lazyValues. That needs to happen only for binary predicates; unary predicates will all have alternating true/false keys.

Huh, what? I think there's some misunderstanding here. The unary version of the current groupBy translates to a binary predicate:

	groupBy!(a => a.field)

is equivalent to:

	groupBy!((a, b) => a.field == b.field)

I don't see how this has anything to do with alternating keys.

[...]
> That's much more harder to implement than what it does right now. I don't know how you'll manage to do the lazyValues thing: you'd need to make multiple passes in the range.
> 
> Again, other languages return an associative array in this case.

I think we're talking past each other here. What groupBy currently does is to group elements by evaluating the predicate on *consecutive runs* of elements. What some people seem to demand is a function that groups elements by *global evaluation* of the predicate over all elements. These two are similar but divergent functions, and conflating them is not helping this discussion in any way.

If "group by" in other languages refers to the latter function, then that means "groupBy" is poorly-named and we need to come up with a better name for it. Changing it to return tuples and what-not seems to be beating around the bush to me.
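
To make the distinction concrete, here is a minimal sketch of the two behaviours. The helpers `groupRuns` and `groupGlobal` are hypothetical names made up for illustration; they are eager, handle only `int[]`, and are not the Phobos implementation.

```d
import std.stdio : writeln;

// Hypothetical helper: group *consecutive runs*.
// A new group starts whenever the key of the next element differs
// from the key of the last element seen.
int[][] groupRuns(alias key)(int[] input)
{
    int[][] groups;
    foreach (x; input)
    {
        if (groups.length == 0 || key(groups[$ - 1][$ - 1]) != key(x))
            groups ~= [x];          // key changed: open a new group
        else
            groups[$ - 1] ~= x;     // same key: extend the current run
    }
    return groups;
}

// Hypothetical helper: group by *global evaluation*.
// Every element with the same key lands in the same bucket,
// no matter where it appears in the input.
int[][int] groupGlobal(alias key)(int[] input)
{
    int[][int] table;
    foreach (x; input)
        table[key(x)] ~= x;
    return table;
}

void main()
{
    auto data = [1, 3, 2, 4, 5, 7];

    // Three runs: the odd numbers are split because 2 and 4 sit between them.
    writeln(groupRuns!(a => a % 2)(data));

    // Two buckets, keyed by 1 and 0 (associative array order is unspecified).
    writeln(groupGlobal!(a => a % 2)(data));
}
```

The run-based version needs only a single pass and no table of keys, which is why it can be made lazy; the global version has to see the whole input before any group is complete.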

T

-- 
Computers are like a jungle: they have monitor lizards, rams, mice, c-moss, binary trees... and bugs.

January 26, 2015

Posted by zeljkog
in reply to Andrei Alexandrescu

On 26.01.15 19:05, Andrei Alexandrescu wrote:
> On 1/26/15 9:50 AM, Ary Borenszweig wrote:
>> On 1/26/15 2:34 PM, Andrei Alexandrescu wrote:
>>> On 1/26/15 8:11 AM, H. S. Teoh via Digitalmars-d wrote:
>>>> On Mon, Jan 26, 2015 at 11:26:04AM +0000, bearophile via Digitalmars-d wrote:
>>>>> Russel Winder:
>>>>>
>>>>>> but is it's name "group by" as understood by the rest of the world?
>>>>>
>>>>> Nope...
>>>> [...]
>>>>
>>>> I proposed to rename it but it got shot down. *shrug*
>>>>
>>>> We still have a short window of time to sort this out, before 2.067 is released...
>>>
>>> My suggestion was to keep the name but change the code of your groupBy implementation to return tuple(key, lazyValues) instead of just lazyValues. That needs to happen only for binary predicates; unary predicates will all have alternating true/false keys.
>>>
>>> Seems that would please everyone.
>>>
>>>
>>> Andrei
>>>
>>
>> That's much more harder to implement than what it does right now. I don't know how you'll manage to do the lazyValues thing: you'd need to make multiple passes in the range.
> 
> The implementation right now is quite interesting but not complicated, and achieves lazy grouping in a single pass.
> 
>> Again, other languages return an associative array in this case.
> 
> I think our approach is superior.
> 
> 
> Andrei

I think std.experimental.algorithm.groupBy is one option, to postpone things.

January 26, 2015

Posted by Phil
in reply to zeljkog

I also found the behaviour confusing given the name. I like ChunkBy.

January 26, 2015

Posted by Andrei Alexandrescu
in reply to H. S. Teoh

On 1/26/15 10:11 AM, H. S. Teoh via Digitalmars-d wrote:
> On Mon, Jan 26, 2015 at 02:50:16PM -0300, Ary Borenszweig via Digitalmars-d wrote:
>> On 1/26/15 2:34 PM, Andrei Alexandrescu wrote:
>>>> On Mon, Jan 26, 2015 at 11:26:04AM +0000, bearophile via Digitalmars-d
>>>> wrote:
>>>>> Russel Winder:
>>>>>
>>>>>> but is it's name "group by" as understood by the rest of the world?
>>>>>
>>>>> Nope...
> [...]
>>> My suggestion was to keep the name but change the code of your
>>> groupBy implementation to return tuple(key, lazyValues) instead of
>>> just lazyValues. That needs to happen only for binary predicates;
>>> unary predicates will all have alternating true/false keys.
>
> Huh, what? I think there's some misunderstanding here. The unary version
> of the current groupBy translates to a binary predicate:
>
> 	groupBy!(a => a.field)
>
> is equivalent to:
>
> 	groupBy!((a, b) => a.field == b.field)
>
> I don't see how this has anything to do with alternating keys.

Here's how. Basically the binary-predicate version has only Boolean keys that may be false or true. They will alternate because it's the change that triggers creation of a new group. In this example:

[293, 453, 600, 929, 339, 812, 222, 680, 529, 768]
    .groupBy!((a, b) => (a & 3) == (b & 3))

the groupBy function has no information about the result of a & 3. All it "sees" is the result of the predicate: true, false, true, false...

HOWEVER, if you write it like this:

[293, 453, 600, 929, 339, 812, 222, 680, 529, 768]
    .groupBy!(a => (a & 3))

then groupBy sees the actual value of the function and can emit the proper key.

So the key (ahem) here is to make groupBy with unary predicate different from groupBy with binary predicate. The former returns the tuple, the latter is unchanged. Makes sense?
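
As a rough sketch of what the unary form could hand back, assuming an eager helper rather than the lazy range the real function would return: `keyedRuns` below is a hypothetical name, works only on `int[]` with `int` keys, and is not the actual groupBy code.

```d
import std.stdio : writeln;
import std.typecons : Tuple, tuple;

// Hypothetical eager helper: splits the input into consecutive runs of
// equal keys and pairs each run with the key that produced it.
Tuple!(int, int[])[] keyedRuns(alias key)(int[] input)
{
    Tuple!(int, int[])[] result;
    foreach (x; input)
    {
        int k = key(x);
        if (result.length == 0 || result[$ - 1][0] != k)
            result ~= tuple(k, [x]);    // key changed: start a new (key, values) pair
        else
            result[$ - 1][1] ~= x;      // same key: extend the current run
    }
    return result;
}

void main()
{
    auto data = [293, 453, 600, 929, 339, 812, 222, 680, 529, 768];

    // Each group carries the value of (a & 3) that its elements share.
    foreach (group; keyedRuns!(a => a & 3)(data))
        writeln(group[0], " -> ", group[1]);
}
```

With the unary function the key is simply whatever the function returned for the run. A binary predicate yields only true or false for a pair of neighbours, so there is no comparable value to report.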

> [...]
>> That's much more harder to implement than what it does right now. I
>> don't know how you'll manage to do the lazyValues thing: you'd need to
>> make multiple passes in the range.
>>
>> Again, other languages return an associative array in this case.
>
> I think we're talking past each other here. What groupBy currently does
> is to group elements by evaluating the predicate on *consecutive runs*
> of elements. What some people seem to demand is a function that groups
> elements by *global evaluation* of the predicate over all elements.
> These two are similar but divergent functions, and conflating them is
> not helping this discussion in any way.

Agreed.

> If "group by" in other languages refers to the latter function, then
> that means "groupBy" is poorly-named and we need to come up with a
> better name for it. Changing it to return tuples and what-not seems to
> be beating around the bush to me.

I like our notion of groupBy the same way I like the notion that something must be a random-access range in order to be sorted. (Other languages give the illusion they sort streams by internally converting them to arrays.) D offers better control, better flexibility, and richer semantics.
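
The sorting analogy can be illustrated with a small sketch. The function name `oddsDescending` is made up for this example; the point is only that a lazy, non-random-access range has to be copied into an array explicitly before `sort` will accept it.

```d
import std.algorithm : filter, sort;
import std.array : array;
import std.range : iota;
import std.range.primitives : isRandomAccessRange;
import std.stdio : writeln;

// Hypothetical example function: returns the odd numbers below n, largest first.
int[] oddsDescending(int n)
{
    // A lazy filtered range: elements are produced on demand.
    auto lazyOdds = iota(n).filter!(a => a % 2 == 1);

    // The filtered range is not random-access, so it cannot be sorted directly.
    static assert(!isRandomAccessRange!(typeof(lazyOdds)));

    // The caller decides to pay for the copy; nothing is hidden.
    auto buffer = lazyOdds.array;
    buffer.sort!((a, b) => a > b)();
    return buffer;
}

void main()
{
    writeln(oddsDescending(10));
}
```

The allocation happens at the visible `.array` call rather than inside the sorting routine.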

Andrei


Copyright © 1999-2021 by the D Language Foundation