January 26, 2015
On Mon, 2015-01-26 at 08:11 -0800, H. S. Teoh via Digitalmars-d wrote:
> On Mon, Jan 26, 2015 at 11:26:04AM +0000, bearophile via Digitalmars-d wrote:
> > Russel Winder:
> > 
> > >but is its name "group by" as understood by the rest of the world?
> > 
> > Nope...
> [...]
> 
> I proposed to rename it but it got shot down. *shrug*

What name do you think works for this operation?

> We still have a short window of time to sort this out, before 2.067 is released...

To be honest, given the confirmation that the semantics of this operation differ from those of the operation other languages call groupBy, 2.067 should not go out with this operation named groupBy.
> 

-- 
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder@ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel@winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder


January 26, 2015
On Monday, 26 January 2015 at 16:13:40 UTC, H. S. Teoh wrote:
> On Mon, Jan 26, 2015 at 11:26:04AM +0000, bearophile via Digitalmars-d wrote:
>> Russel Winder:
>> 
>> >but is its name "group by" as understood by the rest of the world?
>> 
>> Nope...
> [...]
>
> I proposed to rename it but it got shot down. *shrug*

Andrei had a point about `partition` being used already. I liked Oliver's suggestion to go with slice-something. `sliceBy` might be worth considering. It even hints at the (efficient) implementation.
January 26, 2015
On 1/26/15 8:11 AM, H. S. Teoh via Digitalmars-d wrote:
> On Mon, Jan 26, 2015 at 11:26:04AM +0000, bearophile via Digitalmars-d wrote:
>> Russel Winder:
>>
>>> but is its name "group by" as understood by the rest of the world?
>>
>> Nope...
> [...]
>
> I proposed to rename it but it got shot down. *shrug*
>
> We still have a short window of time to sort this out, before 2.067 is
> released...

My suggestion was to keep the name but change the code of your groupBy implementation to return tuple(key, lazyValues) instead of just lazyValues. That needs to happen only for binary predicates; unary predicates will all have alternating true/false keys.

Seems that would please everyone.


Andrei

January 26, 2015
On Monday, 26 January 2015 at 16:44:20 UTC, Ulrich Küttler wrote:
> Andrei had a point about `partition` being used already. I liked Oliver's suggestion to go with slice-something. `sliceBy` might be worth considering. It even hints at the (efficient) implementation.

Does it return slices?

If not, pick a different verb, e.g. "split".

January 26, 2015
On 1/26/15 2:34 PM, Andrei Alexandrescu wrote:
> On 1/26/15 8:11 AM, H. S. Teoh via Digitalmars-d wrote:
>> On Mon, Jan 26, 2015 at 11:26:04AM +0000, bearophile via Digitalmars-d
>> wrote:
>>> Russel Winder:
>>>
>>>> but is its name "group by" as understood by the rest of the world?
>>>
>>> Nope...
>> [...]
>>
>> I proposed to rename it but it got shot down. *shrug*
>>
>> We still have a short window of time to sort this out, before 2.067 is
>> released...
>
> My suggestion was to keep the name but change the code of your groupBy
> implementation to return tuple(key, lazyValues) instead of just
> lazyValues. That needs to happen only for binary predicates; unary
> predicates will all have alternating true/false keys.
>
> Seems that would please everyone.
>
>
> Andrei
>

That's much harder to implement than what it does right now. I don't know how you'll manage to do the lazyValues thing: you'd need to make multiple passes in the range.

Again, other languages return an associative array in this case.
January 26, 2015
On 1/26/15 9:50 AM, Ary Borenszweig wrote:
> On 1/26/15 2:34 PM, Andrei Alexandrescu wrote:
>> On 1/26/15 8:11 AM, H. S. Teoh via Digitalmars-d wrote:
>>> On Mon, Jan 26, 2015 at 11:26:04AM +0000, bearophile via Digitalmars-d
>>> wrote:
>>>> Russel Winder:
>>>>
>>>>> but is its name "group by" as understood by the rest of the world?
>>>>
>>>> Nope...
>>> [...]
>>>
>>> I proposed to rename it but it got shot down. *shrug*
>>>
>>> We still have a short window of time to sort this out, before 2.067 is
>>> released...
>>
>> My suggestion was to keep the name but change the code of your groupBy
>> implementation to return tuple(key, lazyValues) instead of just
>> lazyValues. That needs to happen only for binary predicates; unary
>> predicates will all have alternating true/false keys.
>>
>> Seems that would please everyone.
>>
>>
>> Andrei
>>
>
> That's much harder to implement than what it does right now. I
> don't know how you'll manage to do the lazyValues thing: you'd need to
> make multiple passes in the range.

The implementation right now is quite interesting but not complicated, and achieves lazy grouping in a single pass.
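A minimal sketch of how adjacent-run grouping can be both lazy and single-pass over an array, assuming a unary key function. This is not the Phobos implementation; `Runs` and `runs` are hypothetical names. Each `popFront` scans forward only to the end of the next run, and every group is a slice of the input, so nothing is copied and the input is traversed once, front to back.

```d
import std.stdio : writeln;

// Hypothetical lazy range over an array: each element of the range is a
// slice covering one run of adjacent elements with equal key(x).
struct Runs(alias key, T)
{
    private T[] rest;      // unconsumed part of the input
    private size_t runLen; // length of the run at the front of `rest`

    this(T[] xs)
    {
        rest = xs;
        findRunEnd();
    }

    // Scan forward from the start of `rest` to the end of the current run.
    private void findRunEnd()
    {
        runLen = 0;
        if (rest.length == 0)
            return;
        auto k = key(rest[0]);
        runLen = 1;
        while (runLen < rest.length && key(rest[runLen]) == k)
            ++runLen;
    }

    @property bool empty() const { return rest.length == 0; }
    @property T[] front() { return rest[0 .. runLen]; }

    void popFront()
    {
        rest = rest[runLen .. $];
        findRunEnd();
    }
}

// Hypothetical convenience wrapper that infers T from the argument.
auto runs(alias key, T)(T[] xs)
{
    return Runs!(key, T)(xs);
}

void main()
{
    // Keys (x / 10) are 0, 0, 1, 1, 0, 3, so four runs are produced.
    foreach (run; runs!(x => x / 10)([1, 5, 12, 17, 3, 31]))
        writeln(run);
}
```

Groups are only computed as the loop asks for them, which is the property Ary's multiple-passes concern is about.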

> Again, other languages return an associative array in this case.

I think our approach is superior.


Andrei
January 26, 2015
On Mon, Jan 26, 2015 at 02:50:16PM -0300, Ary Borenszweig via Digitalmars-d wrote:
> On 1/26/15 2:34 PM, Andrei Alexandrescu wrote:
> >>On Mon, Jan 26, 2015 at 11:26:04AM +0000, bearophile via Digitalmars-d wrote:
> >>>Russel Winder:
> >>>
> >>>>but is its name "group by" as understood by the rest of the world?
> >>>
> >>>Nope...
[...]
> >My suggestion was to keep the name but change the code of your groupBy implementation to return tuple(key, lazyValues) instead of just lazyValues. That needs to happen only for binary predicates; unary predicates will all have alternating true/false keys.

Huh, what? I think there's some misunderstanding here. The unary version of the current groupBy translates to a binary predicate:

	groupBy!(a => a.field)

is equivalent to:

	groupBy!((a, b) => a.field == b.field)

I don't see how this has anything to do with alternating keys.
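To make the equivalence concrete, here is an eager sketch (the real groupBy is lazy). `groupRuns` and `groupRunsByKey` are hypothetical helpers: `groupRuns` starts a new group whenever the binary predicate is false for two neighbours, and the unary form is defined purely by comparing the keys of neighbours.

```d
import std.stdio : writeln;

// Hypothetical helper: split xs into runs of adjacent elements,
// starting a new run whenever pred(previous, current) is false.
T[][] groupRuns(alias pred, T)(T[] xs)
{
    T[][] result;
    size_t start = 0;
    foreach (i; 1 .. xs.length)
    {
        if (!pred(xs[i - 1], xs[i]))
        {
            result ~= xs[start .. i];
            start = i;
        }
    }
    if (xs.length > 0)
        result ~= xs[start .. $];
    return result;
}

// Hypothetical unary form: a and b are equivalent when key(a) == key(b).
T[][] groupRunsByKey(alias key, T)(T[] xs)
{
    return groupRuns!((a, b) => key(a) == key(b))(xs);
}

void main()
{
    int[] xs = [1, 4, 7, 2, 5, 3];
    auto byKey = groupRunsByKey!(a => a % 3)(xs);
    auto byPred = groupRuns!((a, b) => a % 3 == b % 3)(xs);
    assert(byKey == byPred);
    writeln(byKey); // [[1, 4, 7], [2, 5], [3]]
}
```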


[...]
> That's much harder to implement than what it does right now. I don't know how you'll manage to do the lazyValues thing: you'd need to make multiple passes in the range.
> 
> Again, other languages return an associative array in this case.

I think we're talking past each other here. What groupBy currently does is to group elements by evaluating the predicate on *consecutive runs* of elements. What some people seem to demand is a function that groups elements by *global evaluation* of the predicate over all elements. These two are similar but divergent functions, and conflating them is not helping this discussion in any way.
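A small sketch of the two behaviours on the same input, grouping by parity. Both functions are hypothetical and eager; the associative array returned by `globalBuckets` stands in for the kind of result other languages' groupBy produces.

```d
import std.stdio : writeln;

// Consecutive-run grouping: a new group starts whenever the key
// (here x % 2) differs from the key of the previous element.
int[][] consecutiveRuns(int[] xs)
{
    int[][] groups;
    size_t start = 0;
    foreach (i; 1 .. xs.length)
    {
        if (xs[i] % 2 != xs[i - 1] % 2)
        {
            groups ~= xs[start .. i];
            start = i;
        }
    }
    if (xs.length > 0)
        groups ~= xs[start .. $];
    return groups;
}

// Global grouping: all elements with the same key share one bucket,
// wherever they appear in the input.
int[][int] globalBuckets(int[] xs)
{
    int[][int] buckets;
    foreach (x; xs)
        buckets[x % 2] ~= x;
    return buckets;
}

void main()
{
    int[] xs = [1, 3, 2, 4, 5];
    writeln(consecutiveRuns(xs)); // [[1, 3], [2, 4], [5]]
    auto buckets = globalBuckets(xs);
    writeln(buckets[1], " ", buckets[0]); // [1, 3, 5] [2, 4]
}
```

The trailing 5 forms its own run in the first result but joins 1 and 3 in the second, which is exactly where the two operations diverge.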

If "group by" in other languages refers to the latter function, then that means "groupBy" is poorly-named and we need to come up with a better name for it. Changing it to return tuples and what-not seems to be beating around the bush to me.


T

-- 
Computers are like a jungle: they have monitor lizards, rams, mice, c-moss, binary trees... and bugs.
January 26, 2015
On 26.01.15 19:05, Andrei Alexandrescu wrote:
> On 1/26/15 9:50 AM, Ary Borenszweig wrote:
>> On 1/26/15 2:34 PM, Andrei Alexandrescu wrote:
>>> On 1/26/15 8:11 AM, H. S. Teoh via Digitalmars-d wrote:
>>>> On Mon, Jan 26, 2015 at 11:26:04AM +0000, bearophile via Digitalmars-d wrote:
>>>>> Russel Winder:
>>>>>
>>>>>> but is its name "group by" as understood by the rest of the world?
>>>>>
>>>>> Nope...
>>>> [...]
>>>>
>>>> I proposed to rename it but it got shot down. *shrug*
>>>>
>>>> We still have a short window of time to sort this out, before 2.067 is released...
>>>
>>> My suggestion was to keep the name but change the code of your groupBy implementation to return tuple(key, lazyValues) instead of just lazyValues. That needs to happen only for binary predicates; unary predicates will all have alternating true/false keys.
>>>
>>> Seems that would please everyone.
>>>
>>>
>>> Andrei
>>>
>>
>> That's much harder to implement than what it does right now. I don't know how you'll manage to do the lazyValues thing: you'd need to make multiple passes in the range.
> 
> The implementation right now is quite interesting but not complicated, and achieves lazy grouping in a single pass.
> 
>> Again, other languages return an associative array in this case.
> 
> I think our approach is superior.
> 
> 
> Andrei

I think std.experimental.algorithm.groupBy is one option, as a way to postpone the decision.

January 26, 2015
I also found the behaviour confusing given the name. I like chunkBy.
January 26, 2015
On 1/26/15 10:11 AM, H. S. Teoh via Digitalmars-d wrote:
> On Mon, Jan 26, 2015 at 02:50:16PM -0300, Ary Borenszweig via Digitalmars-d wrote:
>> On 1/26/15 2:34 PM, Andrei Alexandrescu wrote:
>>>> On Mon, Jan 26, 2015 at 11:26:04AM +0000, bearophile via Digitalmars-d
>>>> wrote:
>>>>> Russel Winder:
>>>>>
>>>>>> but is its name "group by" as understood by the rest of the world?
>>>>>
>>>>> Nope...
> [...]
>>> My suggestion was to keep the name but change the code of your
>>> groupBy implementation to return tuple(key, lazyValues) instead of
>>> just lazyValues. That needs to happen only for binary predicates;
>>> unary predicates will all have alternating true/false keys.
>
> Huh, what? I think there's some misunderstanding here. The unary version
> of the current groupBy translates to a binary predicate:
>
> 	groupBy!(a => a.field)
>
> is equivalent to:
>
> 	groupBy!((a, b) => a.field == b.field)
>
> I don't see how this has anything to do with alternating keys.

Here's how. Basically the binary-predicate version has only Boolean keys that may be false or true. They will alternate because it's the change that triggers creation of a new group. In this example:

[293, 453, 600, 929, 339, 812, 222, 680, 529, 768]
    .groupBy!((a, b) => (a & 3) == (b & 3))

the groupBy function has no information about the result of a & 3. All it "sees" is the result of the predicate: true, false, true, false...

HOWEVER, if you write it like this:

[293, 453, 600, 929, 339, 812, 222, 680, 529, 768]
    .groupBy!(a => (a & 3))

then groupBy sees the actual value of the function and can emit the proper key.

So the key (ahem) here is to make groupBy with unary predicate different from groupBy with binary predicate. The former returns the tuple, the latter is unchanged. Makes sense?
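A sketch of the proposed unary behaviour, assuming an eager helper with the hypothetical name `runsWithKey` (a real version would stay lazy, like the current groupBy). Because it calls the key function directly, it can pair each run with the value of `a & 3`; a binary predicate such as `(a, b) => (a & 3) == (b & 3)` only ever yields `bool`, so no such key is available to it.

```d
import std.stdio : writeln;
import std.typecons : Tuple, tuple;

// Hypothetical eager sketch: with a unary key function, each run of
// adjacent elements is returned together with the key that produced it.
auto runsWithKey(alias key, T)(T[] xs)
{
    alias K = typeof(key(xs[0])); // typeof does not evaluate, safe for empty xs
    Tuple!(K, T[])[] result;
    size_t start = 0;
    foreach (i; 1 .. xs.length)
    {
        if (key(xs[i]) != key(xs[i - 1]))
        {
            result ~= tuple(key(xs[start]), xs[start .. i]);
            start = i;
        }
    }
    if (xs.length > 0)
        result ~= tuple(key(xs[start]), xs[start .. $]);
    return result;
}

void main()
{
    int[] xs = [293, 453, 600, 929, 339, 812, 222, 680, 529, 768];
    // First line printed: 1: [293, 453]
    foreach (g; runsWithKey!(a => a & 3)(xs))
        writeln(g[0], ": ", g[1]);
}
```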

> [...]
>> That's much harder to implement than what it does right now. I
>> don't know how you'll manage to do the lazyValues thing: you'd need to
>> make multiple passes in the range.
>>
>> Again, other languages return an associative array in this case.
>
> I think we're talking past each other here. What groupBy currently does
> is to group elements by evaluating the predicate on *consecutive runs*
> of elements. What some people seem to demand is a function that groups
> elements by *global evaluation* of the predicate over all elements.
> These two are similar but divergent functions, and conflating them is
> not helping this discussion in any way.

Agreed.

> If "group by" in other languages refers to the latter function, then
> that means "groupBy" is poorly-named and we need to come up with a
> better name for it. Changing it to return tuples and what-not seems to
> be beating around the bush to me.

I like our notion of groupBy the same way I like the notion that something must be a random-access range in order to be sorted. (Other languages give the illusion they sort streams by internally converting them to arrays.) D offers better control, better flexibility, and richer semantics.
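A short illustration of the sorting analogy, assuming a Phobos release with the `std.algorithm.iteration` / `std.algorithm.sorting` split: `filter` yields a range without random access, so `sort` rejects it at compile time, and buffering into an array is an explicit step in user code.

```d
import std.algorithm.iteration : filter;
import std.algorithm.sorting : sort;
import std.array : array;
import std.range.primitives : isRandomAccessRange;
import std.stdio : writeln;

void main()
{
    int[] xs = [5, 2, 8, 3, 6];

    // filter produces a lazy range with no random access...
    auto evens = xs.filter!(x => x % 2 == 0);
    static assert(!isRandomAccessRange!(typeof(evens)));
    // ...so sort does not accept it rather than silently buffering it.
    static assert(!__traits(compiles, evens.sort()));

    // Converting to an array is visible in the code; release() returns
    // the sorted underlying array from the SortedRange wrapper.
    int[] sorted = evens.array.sort().release;
    writeln(sorted); // [2, 6, 8]
}
```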


Andrei