Thread overview
Functional vs simple code
Oct 02, 2012  ixid
Oct 02, 2012  ixid
Oct 02, 2012  Timon Gehr
Oct 02, 2012  Timon Gehr
Oct 02, 2012  ixid
Oct 03, 2012  ixid
Oct 03, 2012  Tommi
Oct 03, 2012  jerro
Oct 03, 2012  ixid
Oct 03, 2012  ixid
October 02, 2012
Without optimization, the range and algorithm method takes about 10 times as long as the simple code below it; with optimization and no array bounds checking it takes six times as long. Why is the difference so huge? I'd expect a moderate overhead, but that's a big difference.

module main;
import std.stdio, std.algorithm, std.range, std.datetime;

enum MAX = 10_000_000UL;

void main() {
	StopWatch sw;
	sw.start;

	auto sum1 = MAX.iota.map!(x => x * x).reduce!"a + b";

	sw.peek.msecs.writeln("msecs");
	sum1.writeln;
	sw.start;

	ulong sum2 = 0;
	foreach(i;0..MAX)
		sum2 += i * i;

	sw.peek.msecs.writeln("msecs");

	sum2.writeln;
}
October 02, 2012
On Tuesday, 2 October 2012 at 20:48:31 UTC, ixid wrote:
> Without optimization, the range and algorithm method takes about 10 times as long as the simple code below it; with optimization and no array bounds checking it takes six times as long. Why is the difference so huge? I'd expect a moderate overhead, but that's a big difference.
>
> module main;
> import std.stdio, std.algorithm, std.range, std.datetime;
>
> enum MAX = 10_000_000UL;
>
> void main() {
> 	StopWatch sw;
> 	sw.start;
>
> 	auto sum1 = MAX.iota.map!(x => x * x).reduce!"a + b";
>
> 	sw.peek.msecs.writeln("msecs");
> 	sum1.writeln;
> 	sw.start;
>
> 	ulong sum2 = 0;
> 	foreach(i;0..MAX)
> 		sum2 += i * i;
>
> 	sw.peek.msecs.writeln("msecs");
>
> 	sum2.writeln;
> }

I realised I was making the functional version more complicated than necessary. This version is faster but still four times slower than the simple version:

MAX.iota.reduce!"a + b * b".writeln;
October 02, 2012
On 10/02/2012 10:48 PM, ixid wrote:
> Without optimization, the range and algorithm method takes about 10 times
> as long as the simple code below it; with optimization and no array bounds
> checking it takes six times as long. Why is the difference so huge?
> I'd expect a moderate overhead, but that's a big difference.
>
> module main;
> import std.stdio, std.algorithm, std.range, std.datetime;
>
> enum MAX = 10_000_000UL;
>
> void main() {
>      StopWatch sw;
>      sw.start;
>
>      auto sum1 = MAX.iota.map!(x => x * x).reduce!"a + b";
>
>      sw.peek.msecs.writeln("msecs");
>      sum1.writeln;
>      sw.start;
>
>      ulong sum2 = 0;
>      foreach(i;0..MAX)
>          sum2 += i * i;
>
>      sw.peek.msecs.writeln("msecs");
>
>      sum2.writeln;
> }


$ cat ixidbench.d
module main;
import std.stdio, std.algorithm, std.range, std.datetime;

enum MAX = 10_000_000_000UL;

void main() {
	StopWatch sw;
	sw.start;

	auto sum1 = MAX.iota.map!(x => x * x).reduce!"a + b";

	sw.stop;
	sw.peek.msecs.writeln("msecs");
	sum1.writeln;
	sw.reset;
	sw.start;

	ulong sum2 = 0;
	foreach(i;0..MAX)
		sum2 += i * i;

	sw.stop;
	sw.peek.msecs.writeln("msecs");

	sum2.writeln;
}
$ ldmd2 -O -release -inline ixidbench.d -ofixidbench
$ ./ixidbench
6528msecs
7032546979563742720
7518msecs
7032546979563742720
October 02, 2012
On 10/03/2012 12:11 AM, Timon Gehr wrote:
> ...
>
> $ cat ixidbench.d
> module main;
> import std.stdio, std.algorithm, std.range, std.datetime;
>
> enum MAX = 10_000_000_000UL;
>
> void main() {
>      StopWatch sw;
>      sw.start;
>
>      auto sum1 = MAX.iota.map!(x => x * x).reduce!"a + b";
>
>      sw.stop;
>      sw.peek.msecs.writeln("msecs");
>      sum1.writeln;
>      sw.reset;
>      sw.start;
>
>      ulong sum2 = 0;
>      foreach(i;0..MAX)
>          sum2 += i * i;
>
>      sw.stop;
>      sw.peek.msecs.writeln("msecs");
>
>      sum2.writeln;
> }
> $ ldmd2 -O -release -inline ixidbench.d -ofixidbench
> $ ./ixidbench
> 6528msecs
> 7032546979563742720
> 7518msecs
> 7032546979563742720

$ gdmd -O -release -inline ixidbench.d -ofixidbench
$ ./ixidbench
11250msecs
7032546979563742720
11233msecs
7032546979563742720



October 02, 2012
On Tuesday, 2 October 2012 at 22:13:10 UTC, Timon Gehr wrote:
> On 10/03/2012 12:11 AM, Timon Gehr wrote:
>> ...
>>
>> $ cat ixidbench.d
>> module main;
>> import std.stdio, std.algorithm, std.range, std.datetime;
>>
>> enum MAX = 10_000_000_000UL;
>>
>> void main() {
>>     StopWatch sw;
>>     sw.start;
>>
>>     auto sum1 = MAX.iota.map!(x => x * x).reduce!"a + b";
>>
>>     sw.stop;
>>     sw.peek.msecs.writeln("msecs");
>>     sum1.writeln;
>>     sw.reset;
>>     sw.start;
>>
>>     ulong sum2 = 0;
>>     foreach(i;0..MAX)
>>         sum2 += i * i;
>>
>>     sw.stop;
>>     sw.peek.msecs.writeln("msecs");
>>
>>     sum2.writeln;
>> }
>> $ ldmd2 -O -release -inline ixidbench.d -ofixidbench
>> $ ./ixidbench
>> 6528msecs
>> 7032546979563742720
>> 7518msecs
>> 7032546979563742720
>
> $ gdmd -O -release -inline ixidbench.d -ofixidbench
> $ ./ixidbench
> 11250msecs
> 7032546979563742720
> 11233msecs
> 7032546979563742720

Yes, I think it was just the compiler and compiler options rather than anything inherent in the method; DMD produced much slower code than GDC for the functional version. It's interesting that with LDC the functional style is actually significantly faster than the simple loop.
October 03, 2012
On Tuesday, 2 October 2012 at 22:13:10 UTC, Timon Gehr wrote:
> On 10/03/2012 12:11 AM, Timon Gehr wrote:
>> ...
>>
>> $ cat ixidbench.d
>> module main;
>> import std.stdio, std.algorithm, std.range, std.datetime;
>>
>> enum MAX = 10_000_000_000UL;
>>
>> void main() {
>>     StopWatch sw;
>>     sw.start;
>>
>>     auto sum1 = MAX.iota.map!(x => x * x).reduce!"a + b";
>>
>>     sw.stop;
>>     sw.peek.msecs.writeln("msecs");
>>     sum1.writeln;
>>     sw.reset;
>>     sw.start;
>>
>>     ulong sum2 = 0;
>>     foreach(i;0..MAX)
>>         sum2 += i * i;
>>
>>     sw.stop;
>>     sw.peek.msecs.writeln("msecs");
>>
>>     sum2.writeln;
>> }
>> $ ldmd2 -O -release -inline ixidbench.d -ofixidbench
>> $ ./ixidbench
>> 6528msecs
>> 7032546979563742720
>> 7518msecs
>> 7032546979563742720
>
> $ gdmd -O -release -inline ixidbench.d -ofixidbench
> $ ./ixidbench
> 11250msecs
> 7032546979563742720
> 11233msecs
> 7032546979563742720

While fiddling with this I came across something that seems odd in the behaviour of reduce and wondered if it's intended. It rather limits the usefulness of reduce with UFCS to "a + b" and "a - b".

Reduce works correctly when provided with a seed argument:

reduce!"a + b + 2"(0, [1,1,1]).writeln; // == 9 as expected

With UFCS I see no elegant way to reduce a chain with a more sophisticated reduce argument than the two simple ones listed above, because the operation is not carried out on the first array element, which is used as the seed. This means:

[1,1,1].reduce!"a + b + 2".writeln; // == 7

which is useless and could easily trip people up because the intuitive expectation is that it will act like the seed example. If I am wrong in that expectation, what is a use case for the seedless behaviour? Is there a way of writing the operation to give the same answer as the seed answer?

The element in position 0 should surely have +2 added to itself; it's only being used as the seed out of convenience, isn't it? Instead we get 1 + (1 + 2) + (1 + 2). The order of reduce's arguments, (seed, range), prevents it from being used with UFCS properly. If it were (range, seed) then there would be no problem:

[1,1,1].reduce!"a + b + 2"(0).writeln; // == 9
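For illustration, here is a minimal sketch of a wrapper with that argument order, using a hypothetical reduceWith helper (not part of Phobos) that simply forwards to the existing reduce with its arguments swapped:

import std.algorithm, std.stdio;

// Hypothetical helper: take the range first and the seed second,
// then forward to reduce, which expects (seed, range).
auto reduceWith(alias fun, Range, Seed)(Range r, Seed seed) {
	return reduce!fun(seed, r);
}

void main() {
	[1,1,1].reduceWith!"a + b + 2"(0).writeln; // == 9, same as reduce!"a + b + 2"(0, [1,1,1])
}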
October 03, 2012
On Wednesday, 3 October 2012 at 01:21:38 UTC, ixid wrote:
> If it were (range, seed) then there would be no problem:
>
> [1,1,1].reduce!"a + b + 2"(0).writeln; // == 9

My thoughts exactly.
October 03, 2012
> While fiddling with this I came across something that seems odd in the behaviour of reduce and wondered if it's intended. It rather limits the usefulness of reduce with UFCS to "a + b" and "a - b".
>
> Reduce works correctly when provided with a seed argument:
>
> reduce!"a + b + 2"(0, [1,1,1]).writeln; // == 9 as expected
>
> With UFCS I see no elegant way to reduce a chain with a more sophisticated reduce argument than the two simple ones listed above, because the operation is not carried out on the first array element, which is used as the seed. This means:
>
> [1,1,1].reduce!"a + b + 2".writeln; // == 7
>
> which is useless and could easily trip people up because the intuitive expectation is that it will act like the seed example. If I am wrong in that expectation, what is a use case for the seedless behaviour?

The fact that it does something different from the seeded version does not make the seedless version useless; in fact, that's what makes it useful. A typical use case is finding the maximum of a range (there is an example of this in the documentation at http://dlang.org/phobos/std_algorithm.html#reduce). If you don't know the highest possible value of the elements, you can't come up with an appropriate seed for that case.
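For instance, the seedless form reads naturally with UFCS here:

import std.algorithm, std.stdio;

void main() {
	// No seed needed: the first element starts as the running maximum.
	[3, 1, 4, 1, 5].reduce!max.writeln; // 5
}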

I agree that the difference between the two versions of reduce can be a bit confusing. Maybe it would be better if they were named differently.

> Is there a way of writing the operation to give the same answer as the seed answer?

In general, no (apart from the silly reduce!f(chain([seed], range))). If you want the behavior of the seeded version, use the seeded version.
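For reference, that workaround does give the seeded result, since the seed becomes the front element of the range:

import std.algorithm, std.range, std.stdio;

void main() {
	chain([0], [1,1,1]).reduce!"a + b + 2".writeln; // == 9
}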

> The element in position 0 should surely have +2 added to itself, it's only being the seed out of convenience isn't it?

No, reduce!f(r) is not merely syntactic sugar for reduce!f(0, r) (you could say it's syntactic sugar for reduce!f(r.front, r.drop(1)), though). Making reduce!f(r) the same as reduce!f(0, r) doesn't really make any sense: would you expect [1, 2, 3].reduce!"a * b" to return 0? I wouldn't.
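That equivalence is easy to check:

import std.algorithm, std.range, std.stdio;

void main() {
	auto r = [1, 1, 1];
	r.reduce!"a + b + 2".writeln;                   // 7
	reduce!"a + b + 2"(r.front, r.drop(1)).writeln; // 7 as well
}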

> Instead we get 1 + (1 + 2) + (1 + 2). The order of reduce's arguments, (seed, range), prevents it from being used with UFCS properly. If it were (range, seed) then there would be no problem:
>
> [1,1,1].reduce!"a + b + 2"(0).writeln; // == 9

This would be better, yes. Reduce was written before there was UFCS, so they couldn't take this into account. I don't know if it is possible for this to be changed now, as it would definitely break some code.
October 03, 2012
> A typical use case is finding the maximum of a range (there is an example of this in the documentation at http://dlang.org/phobos/std_algorithm.html#reduce). If you don't know the highest possible value of the elements, you can't come up with an appropriate seed for that case.

Fixing unseeded reduce would not remove that use case, though; it just happens to be a case that isn't broken by the way unseeded reduce works. Why couldn't unseeded reduce compare a (the first datum) to itself? That would discover that a isn't greater than a, so it would leave a in position 0 and compare it with all of the other elements. It would also avoid breaking uses like "a + b + 2" and "a + b * 2", as well as a raft of more complex uses that reduce should be able to handle but which currently require a map followed by a reduce.

Walter Bright has a subtle bug in his code (the combined sum and sum-of-squares reduce) in his latest Dr. Dobb's article due to the behaviour of unseeded reduce. I'd call that more broken than unintuitive.

int[] arr = [1,2,3,4,5];
auto r = arr.reduce!((a,b) => a + b, (a,b) => a + b * b);
writefln("sum = %s, sum of squares = %s", r[0], r[1]);

If you change the value of the first number to 2, the sum of squares comes out as an incorrect 56 when it should be 58 (4 + 4 + 9 + 16 + 25).
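To spell the trace out (the same snippet with the first number changed to 2, and the arithmetic in the comments):

import std.algorithm, std.stdio;

void main() {
	int[] arr = [2,2,3,4,5];
	// Seedless reduce seeds both accumulators with arr[0] = 2, so the
	// square sum starts at 2 instead of 2 * 2 = 4:
	// 2 + 2*2 + 3*3 + 4*4 + 5*5 = 56, not 4 + 4 + 9 + 16 + 25 = 58.
	auto r = arr.reduce!((a,b) => a + b, (a,b) => a + b * b);
	writefln("sum = %s, sum of squares = %s", r[0], r[1]); // sum = 16, sum of squares = 56
}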
October 03, 2012
>> Stuff

Actually, now I realise what the conflict is between the two: a + b * b would give the wrong answer when applied to the whole array in the manner I was thinking, because it would do a + a * a for the first value instead of just a * a.