January 09, 2015
https://issues.dlang.org/show_bug.cgi?id=13936

--- Comment #10 from Andrei Alexandrescu <andrei@erdani.com> ---
I'd say we reached the point where some code would help a lot more than additional discourse. A few thoughts:

1. I think we shouldn't care about non-equivalence relations. Most tasks people do are relational-inspired operations using equivalence relations. All APIs I know of only support equivalence relations. It would take a lot of evidence to default the other way. For the time being I am explicitly opposed to supporting non-relational predicates if that complicates anything in any way.

2. Yes I am thinking reference counting would be do the trick. One allocation and deallocation per big range use should be amortized nicely.

3. I am somewhat immune to arguments based on "it's complicated" or "it runs into compiler bugs". Sorry. It seems to me this is exactly the kinds of stuff we need to explore more.

4. Random-access ranges should be a lot easier to handle, so I guess a specialization would be in order.

5. I need to think about .save on the outer range, thanks.

Overall I think we shouldn't continue to find reasons for which it can't be done, but instead explore and see how it can be done. As of now I consider groupBy in it current form unacceptable and a blocker for the next release.

--
January 09, 2015
https://issues.dlang.org/show_bug.cgi?id=13936

--- Comment #11 from bearophile_hugs@eml.cc ---
(In reply to Andrei Alexandrescu from comment #10)

> 1. I think we shouldn't care about non-equivalence relations. Most tasks people do are relational-inspired operations using equivalence relations. All APIs I know of only support equivalence relations. It would take a lot of evidence to default the other way. For the time being I am explicitly opposed to supporting non-relational predicates if that complicates anything in any way.

This was discussed in the pull request. I suggest you to take a look at those discussions. You can't ignore those discussions.

--
January 09, 2015
https://issues.dlang.org/show_bug.cgi?id=13936

--- Comment #12 from hsteoh@quickfur.ath.cx ---
I don't see what's the big problem with supporting non-equivalence relations, all it means is that you have to evaluate the predicate only between adjacent elements rather than between an element and the head of its subrange (which was what was done in the initial implementation). But whatever, I don't actually have any current code that relies on groupBy supporting non-equivalence relations, so I don't really care. I just don't understand why this has suddenly become such a major issue.

By "it's complicated" I do not mean "it's complicated to write code for" -- that's a non-issue. What I mean is "it introduces complicated semantics, which inevitably results in obscure corner cases that have unexpected/pathological behaviour". This includes effects of similar calibre as transient ranges (they technically conform 100% to the range API but have unusual behaviour that cannot be easily dealt with). One such effect is the semantics of .save.

The compiler bugs bit, I admit, is a bit a of red herring, though. :-P

Don't forget that random-access ranges are still forward ranges, so you still have to deal with the semantics of .save.

And I wasn't trying to find reasons it can't be done; I was just pointing out obvious flaws in your proposed solution that would introduce more problems than it solves.

--
January 09, 2015
https://issues.dlang.org/show_bug.cgi?id=13936

--- Comment #13 from Andrei Alexandrescu <andrei@erdani.com> ---
It goes without saying I read through the comments. Your examples are nice but are unlikely to yield a bulk of the use cases.

For non-equivalence relations, I think groupByAdjacent is a fine notion and name (especially since we have findAdjacent already). Would that make sense?

--
January 09, 2015
https://issues.dlang.org/show_bug.cgi?id=13936

--- Comment #14 from Andrei Alexandrescu <andrei@erdani.com> ---
(In reply to hsteoh from comment #12)
> By "it's complicated" I do not mean "it's complicated to write code for" -- that's a non-issue. What I mean is "it introduces complicated semantics, which inevitably results in obscure corner cases that have unexpected/pathological behaviour". This includes effects of similar calibre as transient ranges (they technically conform 100% to the range API but have unusual behaviour that cannot be easily dealt with). One such effect is the semantics of .save.

Again: let's explore and if the semantic complications are too big we're not nailed to the floor. I'll come up with code during the weekend.

--
January 09, 2015
https://issues.dlang.org/show_bug.cgi?id=13936

--- Comment #15 from hsteoh@quickfur.ath.cx ---
I'm OK with breaking up the current somewhat messy code to make a separate groupByAdjacent for the non-equivalence relation implementation. In fact, I had actually implemented that, but reverted it because people objected. *shrug*

--
January 09, 2015
https://issues.dlang.org/show_bug.cgi?id=13936

Dicebot <public@dicebot.lv> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |public@dicebot.lv

--- Comment #16 from Dicebot <public@dicebot.lv> ---
I will need some time to read (and understand :)) all comments here and being on vacation does not really help. Can any action wait one more week please? ^_^

--
January 10, 2015
https://issues.dlang.org/show_bug.cgi?id=13936

--- Comment #17 from Andrei Alexandrescu <andrei@erdani.com> ---
I recreated GroupBy as I envision it from scratch in http://dpaste.dzfl.pl/93a13ee08cc1. So:

* Each group has a group counter that "stamps" it to identify it from the other groups.

* Within a group iteration is fast: there's no additional indirection and one iteration evaluates one empty, one predicate, and one popBack.

* At the end of each group, the next group for the mothership is primed. That way the repeated O(groupLength) iteration is saved.

* Each mothership allocates memory for one payload, which will be then freed when the mothership and all of its groups go away.

* Copying one mothership will work but groups originating from one given mothership only help that particular mothership.

The code is only a proof of concept - it doesn't account for input vs. forward ranges, has scant testing etc.

Does my vision make sense?

--
January 11, 2015
https://issues.dlang.org/show_bug.cgi?id=13936

--- Comment #18 from Dicebot <public@dicebot.lv> ---
Some quick notes from a very quick look:

> I think we shouldn't care about non-equivalence relations

This is something I simply can't accept. groupBy is not a specialized linear algebra thing - it is a quite common utility that will be widely used and widely misused.

In fact the reason why non-equivalence implementation has become default in second PR was exactly that some of first attempts of bearophile to experiment with predicate quickly triggered the infinite empty range behavior. I don't thing it is an acceptable for default implementation at all.

Some of proposed implementation ideas make lot of sense but I feel very strong about two points:

1) non-equivalence relation support must be default groupBy behavior
2) more efficient versions should be enabled by CT argument as opposed to new
function name - it is very hard to invent short names that make sense here (it
can still forward to separate private implementation)

--
January 11, 2015
https://issues.dlang.org/show_bug.cgi?id=13936

--- Comment #19 from Dicebot <public@dicebot.lv> ---
See also original bug report https://issues.dlang.org/show_bug.cgi?id=13595

--