[Issue 10670] New: std.algorithm.reduce: no-seed initialization wrong design

Jul 19, 2013

monarchdodra@gmail.com

Jul 19, 2013

Peter Neubauer

Jul 19, 2013

hsteoh@quickfur.ath.cx

Jul 22, 2013

monarchdodra@gmail.com

Jul 22, 2013

bearophile_hugs@eml.cc

Jul 23, 2013

monarchdodra@gmail.com

Jul 23, 2013

bearophile_hugs@eml.cc

July 19, 2013

[Issue 10670] New: std.algorithm.reduce: no-seed initialization wrong design

Posted by monarchdodra@gmail.com

Permalink

monarchdodra@gmail.com

Permalink

http://d.puremagic.com/issues/show_bug.cgi?id=10670

           Summary: std.algorithm.reduce: no-seed initialization wrong
                    design
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: monarchdodra@gmail.com
        ReportedBy: monarchdodra@gmail.com


--- Comment #0 from monarchdodra@gmail.com 2013-07-19 03:07:10 PDT ---
reduce has a "no-seed" implementation provided: "The one-argument version $(D
reduce!fun(range)) works similarly, but it uses the first element of the range
as the seed (the range must be non-empty)."

I question this design as invalid: The first element is never passed through the transform function. There is no reason it be given any "special treatment". Here is a trivial example of why the design is (IMO) wrong:

//----
import std.stdio, std.algorithm, std.range;

void main(string[] args)
{
    auto arr = [1, 1];
    auto sumDoubles1 = reduce!"a + 2*b"(0, arr);
    auto sumDoubles2 = reduce!"a + 2*b"(arr);
    writeln(sumDoubles1); //4
    writeln(sumDoubles2); //3 (!)
}
//----

I really can't see any universe where you could justify that return value...

The correct seed value should be "Seed seed = fun(Seed.init, r.front)": This means the first element is correctly "processed". Of course, doing this really just boils down to seed being Seed.init.

An added "bonus" is that it makes simple "isIterable" types much more efficient, as they don't have to check "isInitialized" on every iteration.

--------
Unsure if this is "ER", "wrong-code" or what. Not sure if changing the behavior is acceptable (IMO, it should be).

I was re-writing reduce already, so please provide your feedback on the issue so that I can take it into account :)

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

http://d.puremagic.com/issues/show_bug.cgi?id=10670 Peter Neubauer <peterneubauer2@gmail.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |peterneubauer2@gmail.com --- Comment #1 from Peter Neubauer <peterneubauer2@gmail.com> 2013-07-19 07:34:43 PDT --- float.init is nan, so this change would require all (float[]).reduce!"a+b" calls to be rewritten with an explicit seed. I imagine that this is a relatively common use case, while I've never had to reduce!"a+2b" anything. Users of the latter case should get the burden of the explicit seed. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------

http://d.puremagic.com/issues/show_bug.cgi?id=10670 hsteoh@quickfur.ath.cx changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |hsteoh@quickfur.ath.cx --- Comment #2 from hsteoh@quickfur.ath.cx 2013-07-19 08:07:58 PDT --- I've run into this issue before. I agree that if no seed value is provided, it should use ElementType!range.init, NOT range.front. In the case of floats, well... I'd argue that using the seedless variety of reduce on a float range is already a bug, so it should be fixed anyway. Alternatively, since float.init is always NaN, it makes no sense to use the seedless variety of reduce, so we could just static assert(false) if isFloatingPointType!(ElementType!range). This will break existing code but at least it will point out why the code is breaking, instead of silently changing whatever the current behaviour is. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------

http://d.puremagic.com/issues/show_bug.cgi?id=10670 --- Comment #3 from monarchdodra@gmail.com 2013-07-22 03:21:52 PDT --- I've done the dev. Here is my preliminary experience with it: For starters, there *is* some convenience in using "front as seed" for calls such as: //---- double[] a = [ 3, 4, 7, 11, 3, 2, 5 ]; // Compute minimum and maximum in one pass auto r = reduce!(min, max)(a); //---- Here, even a straight up "0.0" is a bad seed. The code needs to be fixed as: //---- double[] a = [ 3, 4, 7, 11, 3, 2, 5 ]; // Compute minimum and maximum in one pass auto r = reduce!(min, max)(tuple(double.max, double.min), a); //---- Not very convenient. But still, nothing surmountable. But at the same time, if you look at the existing unittests, you can see some strange things such as: //---- // two funs auto r2 = reduce!("a + b", "a - b")(tuple(0.0, 0.0), a); writeln(r2); assert(r2[0] == 7 && r2[1] == -7); auto r3 = reduce!("a + b", "a - b")(a); assert(r3[0] == 7 && r3[1] == -1); //---- I don't know what that was trying to prove, but it shows that no-seed can do some really strange things. The "assert not float (or char for that matter)" was very efficient in catching float seeds, and catched almost all of the "wrong usage" cases. The only one that it did not catch was the above case. But such code doesn't make sense to me. I think "silent breakage" would really be minimal. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------

http://d.puremagic.com/issues/show_bug.cgi?id=10670 bearophile_hugs@eml.cc changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |bearophile_hugs@eml.cc --- Comment #4 from bearophile_hugs@eml.cc 2013-07-22 04:55:04 PDT --- (In reply to comment #2) > In the case of floats, well... I'd argue that using the seedless variety of reduce on a float range is already a bug, I don't think it's a bug, this is Python: >>> a = [2.0, 3.0, 4.0] >>> reduce(lambda x, y: x * y, a) 24.0 I have a ton of D code that relies on such behavour of D reduce. I suggest to just swap the seed and sequence arguments of reduce, to support UFCS chains, and leave the rest of reduce as it is. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------

http://d.puremagic.com/issues/show_bug.cgi?id=10670 --- Comment #5 from monarchdodra@gmail.com 2013-07-22 22:59:37 PDT --- (In reply to comment #4) > (In reply to comment #2) > > > In the case of floats, well... I'd argue that using the seedless variety of reduce on a float range is already a bug, > > I don't think it's a bug, this is Python: > > >>> a = [2.0, 3.0, 4.0] > >>> reduce(lambda x, y: x * y, a) > 24.0 > > I have a ton of D code that relies on such behavour of D reduce. You could be right, and I've seen use cases where it makes a lot of sense, eg: reduce!`a ~ " " ~ b`(["hello", "world"]); > I suggest to just swap the seed and sequence arguments of reduce, to support UFCS chains, and leave the rest of reduce as it is. I've tried in http://d.puremagic.com/issues/show_bug.cgi?id=8755, but the conclusion is that it is not possible without breaking code. And when I say "breaking code", I mean "still compiles but silently generates different results". That, and no migration plan was found. Short of finding a new name for a new function, I have not been able to achieve this goad. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------

http://d.puremagic.com/issues/show_bug.cgi?id=10670 --- Comment #6 from bearophile_hugs@eml.cc 2013-07-22 23:25:14 PDT --- (In reply to comment #5) > I've tried in http://d.puremagic.com/issues/show_bug.cgi?id=8755, but the conclusion is that it is not possible without breaking code. And when I say "breaking code", I mean "still compiles but silently generates different results". That, and no migration plan was found. Short of finding a new name for a new function, I have not been able to achieve this goad. I understand. I suggest to introduce a new function named like your "fold" and slowly deprecate "reduce". -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------

Forums