Jump to page: 1 2
Thread overview
[Issue 4705] New: Redesign of std.algorithm.max()/min() + mins()/maxs()
Dec 24, 2010
Denis Derman
Dec 24, 2010
Denis Derman
August 21, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4705

           Summary: Redesign of std.algorithm.max()/min() + mins()/maxs()
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Phobos
        AssignedTo: nobody@puremagic.com
        ReportedBy: bearophile_hugs@eml.cc


--- Comment #0 from bearophile_hugs@eml.cc 2010-08-21 11:14:01 PDT ---
Here I propose a redesign of the std.algorithm.max()/min() functions, and the
creation of two related functions, that may be named mins()/maxs().

Usually the contents of std.algorithm.max() are not necessary to program in D2, their main purpose is to allow to shorten programs (and avoid some bugs), especially in the large number of situations where maximum performance is not necessary.

So the contents of std.algorithm must be flexible and handy to use. So the contents of this module need to be chosen according to the most common pattern found in normal programs. To show few examples I use Python code (because Python design shows a very good choice of the most common patterns you find in your code).


The task is to collect all items that have the max length:

>>> maxlen = max(len(item) for item in data)
>>> [item for item in data if len(item) == maxlen]
['hello', 'roger']


This task is so common that I use an utility module (as you see in Python max()
accepts an optional key mapping function):

>>> from util import maxs
>>> maxs(data, key=len)
['hello', 'roger']


A subtask is to find the length of the longer string of 'data', here I show two possible solutions:

>>> data = ["red", "hello", "yes", "no", "roger", "bud"]
>>> len(max(data, key=len))
5
>>> max(len(item) for item in data)
5


This code shows a possible D2 solution of the subtask and task:

import std.stdio: writeln;
import std.algorithm: reduce, map, filter, max;
import std.range: walkLength;
void main() {
    string[] data = ["red", "hello", "yes", "no", "roger", "bud"];
    int lmax = reduce!max(map!walkLength(data));
    writeln(lmax);
    auto maxs = filter!((a){ return a.length == lmax; })(data);
    writeln(maxs);
}


But "reduce!max(map!walkLength(data))" is too much complex for such simple and
common task (I have found that code only after many tries), so I think the
current max() design is not good enough.

My experience shows that the most common&handy usages of max()/min() are four
(for each):

max(item1, item2)
max(item1, item2, item3)
max(collection)
max!(callable)(collection)

min(item1, item2)
min(item1, item2, item3)
min(collection)
min!(callable)(collection)

They are: to find the max among two items, among tree items, among a collection of items, and among a collection of items using a mapping callable.

The callable is similar to the 'transform' function of schwartzSort().

So I suggest a redesign of min() and max() according to those four most common
usages.

I also suggest to create two new functions, that I have just called maxs() and mins() similar to max and min, that find all the max or min items of a given collection. They don't need the case with two or three items:

maxs(collection)
maxs!(callable)(collection)

mins(collection)
mins!(callable)(collection)


(If this proposal gets accepted I may be able to create a patch with this changes to max/min and the mins/maxs. I have written the same four functions for D1, that don't use ranges.)

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
October 11, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4705


bearophile_hugs@eml.cc changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|nobody@puremagic.com        |andrei@metalanguage.com


--- Comment #1 from bearophile_hugs@eml.cc 2010-10-11 13:45:17 PDT ---
I have assigned this to Andrei, but I don't know if that's the right thing to do. I am not asking Andrei to write the D2 implementation of the max/min/maxs/mins I discuss here, that's something I may do :-) But in Bugzilla I have some Phobos2 enhancement requests like this one where I'd like to receive a comment.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
December 24, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4705


Denis Derman <denis.spir@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |denis.spir@gmail.com


--- Comment #2 from Denis Derman <denis.spir@gmail.com> 2010-12-24 11:12:14 PST ---
(In reply to comment #0)

> I also suggest to create two new functions, that I have just called maxs() and mins() similar to max and min, that find all the max or min items of a given collection. They don't need the case with two or three items:
> 
> maxs(collection)
> maxs!(callable)(collection)
> 
> mins(collection)
> mins!(callable)(collection)

Why I understand the utility of maxs/mins, I wonder about them in std lib. Aren't they "niche domain" tools?

Also, I wonder about maxs(collection) without any trnasform func: if you don't
map, then there is logically a single max value (even if can occur several
times in collection). maxS/minS seems useful only in presence of a transform
func on which results comparison is done: the compared value is (in your case
string len) unique, but there may be several original values in coll that map
to it.
To say it deifferently, in your case mins(collection) would return possibly
multiple specimens of the same string. And mins(lenghts) would return [5,5].
Unless I misunderstand your point.

Denis

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
December 24, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4705



--- Comment #3 from Denis Derman <denis.spir@gmail.com> 2010-12-24 11:17:33 PST ---
Aside the posted comment, I rather support this proposal.

Have you ever called an external func for min(a,b) or sum(a,b)? (lol ;-)
So, according to me, what we need is a simple and efficient implementation of
min/max (and sum, etc...) for what is the common need: over a collection.
An variant with transform func would also be nice, indeed.

Denis

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
December 25, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4705



--- Comment #4 from bearophile_hugs@eml.cc 2010-12-24 22:53:27 PST ---
(In reply to comment #2)

> Aren't they "niche domain" tools?

The opposite is true. Functions and HOFs like max, min, sum, maxs, map, filter, reduce, and so on are basic building blocks that are useful to build most other programs. So their place is in the standard library, or sometimes even in the language itself.


> Also, I wonder about maxs(collection) without any trnasform func: if you don't
> map, then there is logically a single max value (even if can occur several
> times in collection). maxS/minS seems useful only in presence of a transform
> func on which results comparison is done: the compared value is (in your case
> string len) unique, but there may be several original values in coll that map
> to it.
> To say it deifferently, in your case mins(collection) would return possibly
> multiple specimens of the same string. And mins(lenghts) would return [5,5].
> Unless I misunderstand your point.

maxs with no transform function is useful even with integers, to count how many equal max values are present:

maxs([5, 1, 5, 3]).length == 2

And if your collection contains objects, they may compare as equal despite being distinct.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
January 01, 2011
http://d.puremagic.com/issues/show_bug.cgi?id=4705



--- Comment #5 from bearophile_hugs@eml.cc 2011-01-01 06:33:02 PST ---
Another example to show why a better max() is useful. The task is ot show how
the number less than 1000 which has the longest hailstone sequence. For the
hailstone see:
http://en.wikipedia.org/wiki/Collatz_conjecture

The code, DMD 2.051:

import std.stdio, std.algorithm, std.range, std.typecons;

auto hailstone(int n) {
    auto result = [n];
    while (n != 1) {
        n = n & 1 ? n*3 + 1 : n/2;
        result ~= n;
    }
    return result;
}

void main() {
    auto r = reduce!max(map!((i){return tuple(hailstone(i).length,
i);})(iota(1, 1000)))[1];
    writeln(r);
}


With a max that supports a key function and iterables the line of code in the
main() becomes just:

auto r = max!hailstone(iota(1, 1000));

With the enhancement request 5395 it becomes even more readable:

auto r = max!hailstone(1 .. 1000);

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
February 13, 2011
http://d.puremagic.com/issues/show_bug.cgi?id=4705



--- Comment #6 from bearophile_hugs@eml.cc 2011-02-13 05:51:55 PST ---
Two more usage examples of the improved max. This program finds the longest common subsequence with a recursive algorithm:


import std.stdio;
T[] lcs(T)(T[] a, T[] b) {
    T[] longest(T)(T[] s, T[] t) {
        return s.length > t.length ? s : t;
    }
    if (!a.length || !b.length)
        return null;
    if (a[0] == b[0])
        return a[0] ~ lcs(a[1..$], b[1..$]);
    return longest(lcs(a, b[1..$]), lcs(a[1..$], b));
}
void main() {
    writeln(lcs("thisisatest", "testing123testing"));
}


With a key-mapping max() you are able to remove the inner function and simplify
the code (untested):

import std.stdio;
T[] lcs(T)(T[] a, T[] b) {
    if (!a.length || !b.length)
        return null;
    if (a[0] == b[0])
        return a[0] ~ lcs(a[1..$], b[1..$]);
    return max!q{a.length}([lcs(a, b[1..$]), lcs(a[1..$], b)]);
}
void main() {
    writeln(lcs("thisisatest", "testing123testing"));
}


--------------

This program shows the most common anagrams from a dictionary file of different words:


import std.stdio, std.algorithm;
void main() {
    string[][string] anags;
    foreach (w; File("unixdict.txt").byLine())
        anags[w.sort.idup] ~= w.idup;
    int m = reduce!max(map!q{a.length}(anags.values));
    writeln(filter!((wl){ return wl.length == m; })(anags.values));
}


A key-mapping max() makes the code shorter and more readable (untested):


import std.stdio, std.algorithm;
void main() {
    string[][string] anags;
    foreach (w; File("unixdict.txt").byLine())
        anags[w.sort.idup] ~= w.idup;
    int m = max!q{a.length}(anags.values);
    writeln(filter!((wl){ return wl.length == m; })(anags.values));
}


maxs() simplifies the code even more (untested):

import std.stdio, std.algorithm;
void main() {
    string[][string] anags;
    foreach (w; File("unixdict.txt").byLine())
        anags[w.sort.idup] ~= w.idup;
    writeln(maxs!q{a.length}(anags.values));
}

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
March 24, 2011
http://d.puremagic.com/issues/show_bug.cgi?id=4705



--- Comment #7 from bearophile_hugs@eml.cc 2011-03-24 15:07:40 PDT ---
The examples hopefully show how much useful are the new max/min. This is part of the "pivoting" part of a LU decomposition algorithm:

T imax = mat[j][j];
int nrow = j;
foreach (i; j .. N) {
    if (mat[i][j] > imax) {
        imax = mat[i][j];
        nrow = i;
    }
}


With the improved max() it becomes:

int nrow = max!((int i){ return mat[i][j]; })(iota(j, N));


That is similar to this Python code:

nrow = max(xrange(j, N), key=lambda i: mat[i][j])

Python designers have recognized this is a common pattern in code.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
March 25, 2011
http://d.puremagic.com/issues/show_bug.cgi?id=4705



--- Comment #8 from bearophile_hugs@eml.cc 2011-03-25 10:35:06 PDT ---
A max/min with a comparing function is present in the Haskel standard library too, named "maximumBy":


import Data.List
import Data.Ord

longestWord words = maximumBy (comparing length) words
listx = ["x", "yyy", "zz"]
main = print $ longestWord listx


Prints:
"yyy"

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
May 23, 2011
http://d.puremagic.com/issues/show_bug.cgi?id=4705



--- Comment #9 from bearophile_hugs@eml.cc 2011-05-23 05:12:44 PDT ---
Another example, compared to using minPos():


import std.stdio, std.algorithm, std.array;

void main() {
    string[] data = ["red", "hello", "yes", "no", "roger", "bud"];

    // works in DMD 2.053
    string r1 = minPos!q{ walkLength(a) > walkLength(b) }(data).front();
    writeln(r1);

    // proposed
    string r2 = max!q{ walkLength(a) }(items);
    writeln(r2);
}


The second version is shorter and more readable, and just like schwartzSort()
the modified max() is more efficient than using minPos when the mapping
function becomes costly to compute.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
« First   ‹ Prev
1 2