[Issue 13595] A possible problem with std.algorithm.groupBy

[Issue 13595] Extend std.algorithm.groupBy to support non-equivalence relations

Oct 27, 2014

hsteoh@quickfur.ath.cx

Nov 04, 2014

hsteoh@quickfur.ath.cx

Nov 04, 2014

hsteoh@quickfur.ath.cx

Nov 04, 2014

bearophile_hugs@eml.cc

Dec 01, 2014

github-bugzilla@puremagic.com

Dec 01, 2014

hsteoh@quickfur.ath.cx

Dec 01, 2014

bearophile_hugs@eml.cc

Dec 01, 2014

hsteoh@quickfur.ath.cx

Dec 01, 2014

bearophile_hugs@eml.cc

Dec 01, 2014

hsteoh@quickfur.ath.cx

Dec 01, 2014

hsteoh@quickfur.ath.cx

Feb 19, 2015

github-bugzilla@puremagic.com

Mar 18, 2021

Dlang Bot

https://issues.dlang.org/show_bug.cgi?id=13595 hsteoh@quickfur.ath.cx changed: What |Removed |Added ---------------------------------------------------------------------------- Severity|normal |enhancement --- Comment #2 from hsteoh@quickfur.ath.cx --- Actually, I take that back. This is not a bug (w.r.t. the current documentation) because the predicate is expected to be an equivalence relation, but (x,y)=>((x*y)%3)==0 is not a reflexive relation. So the issue is more, should groupBy be extended to support non-equivalence relations? I have considered this before but decided against it because it would be more inefficient to implement. However, allowing non-equivalence relations between adjacent elements *does* allow for much more interesting groupings (e.g., you could implement word breaks with it), so arguably it should be supported somehow. Marking this as as enhancement request. --

https://issues.dlang.org/show_bug.cgi?id=13595 hsteoh@quickfur.ath.cx changed: What |Removed |Added ---------------------------------------------------------------------------- Keywords| |pull --- Comment #3 from hsteoh@quickfur.ath.cx --- https://github.com/D-Programming-Language/phobos/pull/2654 Note that the Haskell example is actually wrong, because the Haskell groupBy function also assumes that the predicate is an equivalence relation. It just so happens that Haskell's groupBy only assumes transitivity, whereas the original implementation of std.algorithm.groupBy also assumes reflexivity, which causes an infinite loop when that assumption fails to hold. In the above referenced PR, groupBy now defaults to assuming a non-equivalence relation, meaning that the predicate is evaluated for every pair of adjacent elements, so the OP's predicate (x,y) => (x*y)%3 == 0 will produce the grouping [[1], [2,3,4], [5,6,7], [8,9]], because both 2*3 and 3*4 are divisible by 3, as are 5*6 and 6*7. (Notice that Haskell's groupBy evaluates the predicate against the first element of the subrange, thus it checks 2*3 and 2*4 instead of 2*3 and 3*4 for the second subrange.) --

https://issues.dlang.org/show_bug.cgi?id=13595 --- Comment #4 from hsteoh@quickfur.ath.cx --- P.S. Due to the extra performance overhead in evaluating the predicate for every adjacent pair of elements (and also additional logic to ensure correctness), groupBy now takes a compile-time parameter that can be used to get the original performance when the predicate is an equivalence relation. This way we have the best of both worlds. :-) --

https://issues.dlang.org/show_bug.cgi?id=13595 --- Comment #5 from bearophile_hugs@eml.cc --- (In reply to hsteoh from comment #4) > P.S. Due to the extra performance overhead in evaluating the predicate for every adjacent pair of elements (and also additional logic to ensure correctness), groupBy now takes a compile-time parameter that can be used to get the original performance when the predicate is an equivalence relation. This way we have the best of both worlds. :-) Thank you. But there's a risk of "Boolean Blindness" (http://existentialtype.wordpress.com/2011/03/15/boolean-blindness/ ), so perhaps it's better to use std.typecons.Flag instead of a bool. Regarding groupBy, I hope someday we'll also have a hashGroupBy or something similar (Issue 9842 ). --

https://issues.dlang.org/show_bug.cgi?id=13595 --- Comment #6 from github-bugzilla@puremagic.com --- Commit pushed to master at https://github.com/D-Programming-Language/phobos https://github.com/D-Programming-Language/phobos/commit/5986e740873d6af793610381a942df78ab41f137 Merge pull request #2654 from quickfur/issue13595b Issue 13595: Extend groupBy to handle non-equivalence relations. --

December 01, 2014

[Issue 13595] Extend std.algorithm.groupBy to support non-equivalence relations

Posted by bearophile_hugs@eml.cc

Permalink

bearophile_hugs@eml.cc

Permalink

https://issues.dlang.org/show_bug.cgi?id=13595

--- Comment #7 from bearophile_hugs@eml.cc ---
As second try I've used groupBy to find the longest anagrams in a dictionary of words. "words.txt" is a text file that contains one different word each line.


void main() {
    import std.stdio, std.algorithm, std.string, std.file, std.array;

    auto anags =
         "words.txt"
         .readText
         .split
         .schwartzSort!((string w) => w.dup.representation.sort().release)
         .groupBy!((w1, w2) =>
w1.dup.representation.sort().equal(w2.dup.representation.sort()))
         .map!array
         .array
         .sort!((g1, g2) => g1.length > g2.length)
         .release
         .groupBy!((g1, g2) => g1.length == g2.length)
         .front;
    writefln("%(%s\n%)", anags);
}



If I replace "(string w)" with "w" in the first schwartzSort I receive the
errors:

...\dmd2\src\phobos\std\algorithm.d(4659,5): Error: field _prev must be
initialized in constructor, because it is nested struct
...\dmd2\src\phobos\std\algorithm.d(4770,51): Error: template instance
anagrams1c.main.groupBy!(__lambda2, cast(Flag)false, SortedRange!(string[], (a,
b) => binaryFun!less(unaryFun!transform(a), unaryFun!transform(b)))) error
instantiating
anagrams1c.d(9,10):        instantiated from here: groupBy!((w1, w2) =>
w1.dup.representation.sort().equal(w2.dup.representation.sort()),
SortedRange!(string[], (a, b) => binaryFun!less(unaryFun!transform(a),
unaryFun!transform(b))))


If I remove the last ".release" (at line 13) I receive the similar errors:

...\dmd2\src\phobos\std\algorithm.d(4659,5): Error: field _prev must be
initialized in constructor, because it is nested struct
...\dmd2\src\phobos\std\algorithm.d(4770,51): Error: template instance
anagrams1c.main.groupBy!(__lambda4, cast(Flag)false, SortedRange!(string[][],
__lambda3)) error instantiating
anagrams1c.d(14,10):        instantiated from here: groupBy!((g1, g2) =>
g1.length == g2.length, SortedRange!(string[][], __lambda3))


Is this a problem of groupBy?

--

Forums