Thread overview
[Issue 10670] New: std.algorithm.reduce: no-seed initialization wrong design
Jul 19, 2013
Peter Neubauer
July 19, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=10670

           Summary: std.algorithm.reduce: no-seed initialization wrong
                    design
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: monarchdodra@gmail.com
        ReportedBy: monarchdodra@gmail.com


--- Comment #0 from monarchdodra@gmail.com 2013-07-19 03:07:10 PDT ---
reduce has a "no-seed" implementation provided: "The one-argument version $(D
reduce!fun(range)) works similarly, but it uses the first element of the range
as the seed (the range must be non-empty)."

I question this design as invalid: The first element is never passed through the transform function. There is no reason it be given any "special treatment". Here is a trivial example of why the design is (IMO) wrong:

//----
import std.stdio, std.algorithm, std.range;

void main(string[] args)
{
    auto arr = [1, 1];
    auto sumDoubles1 = reduce!"a + 2*b"(0, arr);
    auto sumDoubles2 = reduce!"a + 2*b"(arr);
    writeln(sumDoubles1); //4
    writeln(sumDoubles2); //3 (!)
}
//----

I really can't see any universe where you could justify that return value...

The correct seed value should be "Seed seed = fun(Seed.init, r.front)": This means the first element is correctly "processed". Of course, doing this really just boils down to seed being Seed.init.

An added "bonus" is that it makes simple "isIterable" types much more efficient, as they don't have to check "isInitialized" on every iteration.

--------
Unsure if this is "ER", "wrong-code" or what. Not sure if changing the behavior is acceptable (IMO, it should be).

I was re-writing reduce already, so please provide your feedback on the issue so that I can take it into account :)

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
July 19, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=10670


Peter Neubauer <peterneubauer2@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |peterneubauer2@gmail.com


--- Comment #1 from Peter Neubauer <peterneubauer2@gmail.com> 2013-07-19 07:34:43 PDT ---
float.init is nan, so this change would require all (float[]).reduce!"a+b" calls to be rewritten with an explicit seed. I imagine that this is a relatively common use case, while I've never had to reduce!"a+2b" anything. Users of the latter case should get the burden of the explicit seed.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
July 19, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=10670


hsteoh@quickfur.ath.cx changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hsteoh@quickfur.ath.cx


--- Comment #2 from hsteoh@quickfur.ath.cx 2013-07-19 08:07:58 PDT ---
I've run into this issue before. I agree that if no seed value is provided, it should use ElementType!range.init, NOT range.front.

In the case of floats, well... I'd argue that using the seedless variety of reduce on a float range is already a bug, so it should be fixed anyway. Alternatively, since float.init is always NaN, it makes no sense to use the seedless variety of reduce, so we could just static assert(false) if isFloatingPointType!(ElementType!range). This will break existing code but at least it will point out why the code is breaking, instead of silently changing whatever the current behaviour is.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
July 22, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=10670



--- Comment #3 from monarchdodra@gmail.com 2013-07-22 03:21:52 PDT ---
I've done the dev. Here is my preliminary experience with it:
For starters, there *is* some convenience in using "front as seed" for calls
such as:

//----
double[] a = [ 3, 4, 7, 11, 3, 2, 5 ];
// Compute minimum and maximum in one pass
auto r = reduce!(min, max)(a);
//----

Here, even a straight up "0.0" is a bad seed. The code needs to be fixed as:
//----
double[] a = [ 3, 4, 7, 11, 3, 2, 5 ];
// Compute minimum and maximum in one pass
auto r = reduce!(min, max)(tuple(double.max, double.min), a);
//----

Not very convenient. But still, nothing surmountable.

But at the same time, if you look at the existing unittests, you can see some
strange things such as:
//----
    // two funs
    auto r2 = reduce!("a + b", "a - b")(tuple(0.0, 0.0), a);
    writeln(r2);
    assert(r2[0] == 7 && r2[1] == -7);
    auto r3 = reduce!("a + b", "a - b")(a);
    assert(r3[0] == 7 && r3[1] == -1);
//----
I don't know what that was trying to prove, but it shows that no-seed can do
some really strange things.

The "assert not float (or char for that matter)" was very efficient in catching float seeds, and catched almost all of the "wrong usage" cases. The only one that it did not catch was the above case. But such code doesn't make sense to me. I think "silent breakage" would really be minimal.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
July 22, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=10670


bearophile_hugs@eml.cc changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bearophile_hugs@eml.cc


--- Comment #4 from bearophile_hugs@eml.cc 2013-07-22 04:55:04 PDT ---
(In reply to comment #2)

> In the case of floats, well... I'd argue that using the seedless variety of reduce on a float range is already a bug,

I don't think it's a bug, this is Python:

>>> a = [2.0, 3.0, 4.0]
>>> reduce(lambda x, y: x * y, a)
24.0

I have a ton of D code that relies on such behavour of D reduce.

I suggest to just swap the seed and sequence arguments of reduce, to support UFCS chains, and leave the rest of reduce as it is.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
July 23, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=10670



--- Comment #5 from monarchdodra@gmail.com 2013-07-22 22:59:37 PDT ---
(In reply to comment #4)
> (In reply to comment #2)
> 
> > In the case of floats, well... I'd argue that using the seedless variety of reduce on a float range is already a bug,
> 
> I don't think it's a bug, this is Python:
> 
> >>> a = [2.0, 3.0, 4.0]
> >>> reduce(lambda x, y: x * y, a)
> 24.0
> 
> I have a ton of D code that relies on such behavour of D reduce.

You could be right, and I've seen use cases where it makes a lot of sense, eg: reduce!`a ~ " " ~ b`(["hello", "world"]);

> I suggest to just swap the seed and sequence arguments of reduce, to support UFCS chains, and leave the rest of reduce as it is.

I've tried in http://d.puremagic.com/issues/show_bug.cgi?id=8755, but the conclusion is that it is not possible without breaking code. And when I say "breaking code", I mean "still compiles but silently generates different results". That, and no migration plan was found. Short of finding a new name for a new function, I have not been able to achieve this goad.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
July 23, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=10670



--- Comment #6 from bearophile_hugs@eml.cc 2013-07-22 23:25:14 PDT ---
(In reply to comment #5)

> I've tried in http://d.puremagic.com/issues/show_bug.cgi?id=8755, but the conclusion is that it is not possible without breaking code. And when I say "breaking code", I mean "still compiles but silently generates different results". That, and no migration plan was found. Short of finding a new name for a new function, I have not been able to achieve this goad.

I understand. I suggest to introduce a new function named like your "fold" and slowly deprecate "reduce".

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------