Jump to page: 1 2
Thread overview
[Issue 13595] A possible problem with std.algorithm.groupBy
[Issue 13595] Extend std.algorithm.groupBy to support non-equivalence relations
Mar 18, 2021
Dlang Bot
October 24, 2014
https://issues.dlang.org/show_bug.cgi?id=13595

hsteoh@quickfur.ath.cx changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hsteoh@quickfur.ath.cx

--
October 24, 2014
https://issues.dlang.org/show_bug.cgi?id=13595

hsteoh@quickfur.ath.cx changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 OS|Windows                     |All

--- Comment #1 from hsteoh@quickfur.ath.cx ---
Definitely a bug, the returned subranges should not have empty elements.

--
October 24, 2014
https://issues.dlang.org/show_bug.cgi?id=13595

hsteoh@quickfur.ath.cx changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement

--- Comment #2 from hsteoh@quickfur.ath.cx ---
Actually, I take that back. This is not a bug (w.r.t. the current
documentation) because the predicate is expected to be an equivalence relation,
but (x,y)=>((x*y)%3)==0 is not a reflexive relation.

So the issue is more, should groupBy be extended to support non-equivalence relations? I have considered this before but decided against it because it would be more inefficient to implement. However, allowing non-equivalence relations between adjacent elements *does* allow for much more interesting groupings (e.g., you could implement word breaks with it), so arguably it should be supported somehow.

Marking this as as enhancement request.

--
October 27, 2014
https://issues.dlang.org/show_bug.cgi?id=13595

hsteoh@quickfur.ath.cx changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|A possible problem with     |Extend
                   |std.algorithm.groupBy       |std.algorithm.groupBy to
                   |                            |support non-equivalence
                   |                            |relations

--
November 04, 2014
https://issues.dlang.org/show_bug.cgi?id=13595

hsteoh@quickfur.ath.cx changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |pull

--- Comment #3 from hsteoh@quickfur.ath.cx ---
https://github.com/D-Programming-Language/phobos/pull/2654

Note that the Haskell example is actually wrong, because the Haskell groupBy function also assumes that the predicate is an equivalence relation. It just so happens that Haskell's groupBy only assumes transitivity, whereas the original implementation of std.algorithm.groupBy also assumes reflexivity, which causes an infinite loop when that assumption fails to hold.

In the above referenced PR, groupBy now defaults to assuming a non-equivalence relation, meaning that the predicate is evaluated for every pair of adjacent elements, so the OP's predicate (x,y) => (x*y)%3 == 0 will produce the grouping [[1], [2,3,4], [5,6,7], [8,9]], because both 2*3 and 3*4 are divisible by 3, as are 5*6 and 6*7. (Notice that Haskell's groupBy evaluates the predicate against the first element of the subrange, thus it checks 2*3 and 2*4 instead of 2*3 and 3*4 for the second subrange.)

--
November 04, 2014
https://issues.dlang.org/show_bug.cgi?id=13595

--- Comment #4 from hsteoh@quickfur.ath.cx ---
P.S. Due to the extra performance overhead in evaluating the predicate for every adjacent pair of elements (and also additional logic to ensure correctness), groupBy now takes a compile-time parameter that can be used to get the original performance when the predicate is an equivalence relation. This way we have the best of both worlds. :-)

--
November 04, 2014
https://issues.dlang.org/show_bug.cgi?id=13595

--- Comment #5 from bearophile_hugs@eml.cc ---
(In reply to hsteoh from comment #4)
> P.S. Due to the extra performance overhead in evaluating the predicate for every adjacent pair of elements (and also additional logic to ensure correctness), groupBy now takes a compile-time parameter that can be used to get the original performance when the predicate is an equivalence relation. This way we have the best of both worlds. :-)

Thank you. But there's a risk of "Boolean Blindness" (http://existentialtype.wordpress.com/2011/03/15/boolean-blindness/ ), so perhaps it's better to use std.typecons.Flag instead of a bool.

Regarding groupBy, I hope someday we'll also have a hashGroupBy or something
similar (Issue 9842 ).

--
December 01, 2014
https://issues.dlang.org/show_bug.cgi?id=13595

--- Comment #6 from github-bugzilla@puremagic.com ---
Commit pushed to master at https://github.com/D-Programming-Language/phobos

https://github.com/D-Programming-Language/phobos/commit/5986e740873d6af793610381a942df78ab41f137 Merge pull request #2654 from quickfur/issue13595b

Issue 13595: Extend groupBy to handle non-equivalence relations.

--
December 01, 2014
https://issues.dlang.org/show_bug.cgi?id=13595

hsteoh@quickfur.ath.cx changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--
December 01, 2014
https://issues.dlang.org/show_bug.cgi?id=13595

--- Comment #7 from bearophile_hugs@eml.cc ---
As second try I've used groupBy to find the longest anagrams in a dictionary of words. "words.txt" is a text file that contains one different word each line.


void main() {
    import std.stdio, std.algorithm, std.string, std.file, std.array;

    auto anags =
         "words.txt"
         .readText
         .split
         .schwartzSort!((string w) => w.dup.representation.sort().release)
         .groupBy!((w1, w2) =>
w1.dup.representation.sort().equal(w2.dup.representation.sort()))
         .map!array
         .array
         .sort!((g1, g2) => g1.length > g2.length)
         .release
         .groupBy!((g1, g2) => g1.length == g2.length)
         .front;
    writefln("%(%s\n%)", anags);
}



If I replace "(string w)" with "w" in the first schwartzSort I receive the
errors:

...\dmd2\src\phobos\std\algorithm.d(4659,5): Error: field _prev must be
initialized in constructor, because it is nested struct
...\dmd2\src\phobos\std\algorithm.d(4770,51): Error: template instance
anagrams1c.main.groupBy!(__lambda2, cast(Flag)false, SortedRange!(string[], (a,
b) => binaryFun!less(unaryFun!transform(a), unaryFun!transform(b)))) error
instantiating
anagrams1c.d(9,10):        instantiated from here: groupBy!((w1, w2) =>
w1.dup.representation.sort().equal(w2.dup.representation.sort()),
SortedRange!(string[], (a, b) => binaryFun!less(unaryFun!transform(a),
unaryFun!transform(b))))


If I remove the last ".release" (at line 13) I receive the similar errors:

...\dmd2\src\phobos\std\algorithm.d(4659,5): Error: field _prev must be
initialized in constructor, because it is nested struct
...\dmd2\src\phobos\std\algorithm.d(4770,51): Error: template instance
anagrams1c.main.groupBy!(__lambda4, cast(Flag)false, SortedRange!(string[][],
__lambda3)) error instantiating
anagrams1c.d(14,10):        instantiated from here: groupBy!((g1, g2) =>
g1.length == g2.length, SortedRange!(string[][], __lambda3))


Is this a problem of groupBy?

--
« First   ‹ Prev
1 2