Thread overview
Re: Compile-time optimization
Jul 25, 2013
H. S. Teoh
Jul 25, 2013
H. S. Teoh
Jul 26, 2013
JS
Jul 26, 2013
JS
Jul 26, 2013
JS
July 25, 2013
On Thu, Jul 25, 2013 at 03:47:23PM +0200, JS wrote:
[...]
> Cool, this is definitely on the right track! I like how you are able to generate the expansion reduction without string mixins(the naive way I would try and do it).

String mixins are sorta like nuclear warheads, a last resort when
absolutely nothing else will work. They're usually overkill for most
tasks. Generally, other less drastic alternatives should be tried first.
:)


> The main things I was after are achieved: Loop unrolling, mixed-time constant folding, and variable arguments.

While it has been a fun exercise, I still think loop unrolling is in the province of the compiler's optimizer (e.g. gdc -O3). Unless you're writing performance-critical inner loops in a 3D game engine or a hard real-time application, it seems like premature optimization to me.

As for constant folding, I wonder if expression templates may be a better approach for the general case. Tuple reduction works well when the reduction can be done pairwise, but in more complicated cases, like optimizing matrix expressions, say, you probably need something more sophisticated.

Still, I think it would be cleaner if the language provided a way for the compiler to support such optimizations, so that we can take advantage of its built-in optimizer instead of reinventing the wheel in user code. I've often thought about how one might go about specifying expression reduction rules for user-defined types, like associativity for matrix expressions, etc., that tell the compiler what kind of rules the custom operators obey, that the optimizer can make use of.


> While I think I understand the overall idea you are using it will take me a bit to grasp exactly how you were able to get it all done correctly.

Maybe you should take the time to study the code, then next time you can do it yourself. ;-)


> I have two requests if you don't mind and have the time...
> 
> Allow for array reduction and delim/indexing. When using join we typically want to join an array of strings while also inserting a delimiter between them... usually with the last one removed.

We already have std.algorithm.join... yes, yes I know you want compile-time optimization, so that can be an interesting exercise for your learning. ;-)

But for starters, one way to achieve this is to preprocess your tuple to insert delimiters between elements, then hand it over to tupleReduce to optimize it. Something like this might do it:

	// Warning: untested code
	template insertDelim(string delim, args...) {
		static if (args.length > 1) {
			alias insertDelim = tuple!(args[0], delim,
				insertDelim!(args[1..$]));
		} else {
			alias insertDelim = args;
		}
	}

	// This should do the trick. Use insertDelim to insert the
	// delimiters, then tupleReduce it to concatenate all
	// compile-time-known arguments, then hand over the rest to the
	// runtime function as before.
	template join(string delim, args...) {
		string join() {
			return joinImpl(tupleReduce!(
				(string x, string y) => x~y,
				insertDelim!(delim, args)));
		}
	}

If I didn't make a mistake in the above code, it should be able to optimize:

	join!(",", "a", "b", x, "c", "d");

into:

	"a,b," ~ x ~ ",c,d"


> Sometimes we want to know the index and maximum array size to do specialized joins.
[...]

That's trivial, tuples can be indexed just like arrays at compile-time, and have a .length property just like arrays.


T

-- 
"Real programmers can write assembly code in any language. :-)" -- Larry Wall
July 25, 2013
On Thu, Jul 25, 2013 at 07:57:30AM -0700, H. S. Teoh wrote:
> On Thu, Jul 25, 2013 at 03:47:23PM +0200, JS wrote:
[...]
> > Allow for array reduction and delim/indexing. When using join we typically want to join an array of strings while also inserting a delimiter between them... usually with the last one removed.
[...]

Sigh... I could never resist a challenge. Here's a full, working, compilable, runnable implementation:

	import std.stdio;

	template tuple(args...) {
		alias tuple = args;
	}

	/**
	 * Given a binary function and a tuple, returns a tuple in which all adjacent
	 * compile-time readable items are recursively substituted with the return
	 * value of the binary function applied to them.
	 */
	template tupleReduce(alias func, args...)
	{
		static if (args.length > 1)
		{
			static if (__traits(compiles, { enum x = args[0]; }))
			{
				static if (__traits(compiles, { enum x = args[1]; }) &&
					is(typeof(func(args[0], args[1]))))
				{
					enum result = func(args[0], args[1]);
					alias tupleReduce = tupleReduce!(func, result,
						args[2..$]);
				}
				else
				{
					alias tupleReduce = tuple!(args[0], args[1],
						tupleReduce!(func, args[2..$]));
				}
			}
			else
			{
				alias tupleReduce = tuple!(args[0],
					tupleReduce!(func, args[1..$]));
			}
		}
		else
		{
			alias tupleReduce = args;
		}
	}

	auto reduceImpl(alias fun, T...)(T args)
	{
		import std.functional;
		typeof(unaryFun!fun(args[0],args[0])) result;

		foreach (arg; args) {
			result = unaryFun!fun(result, arg);
		}
		return result;
	}

	template reduce(alias fun, args...)
	{
		string reduce() {
			return reduceImpl!fun(tupleReduce!(fun, args));
		}
	}

	template join(args...)
	{
		string join() {
			return reduce!((string x, string y) => x~y, args);
		}
	}

	template insertDelim(string delim, args...) {
		static if (args.length > 1)
			alias insertDelim = tuple!(args[0], delim,
				insertDelim!(delim, args[1..$]));
		else
			alias insertDelim = args;
	}

	template joinWithDelim(string delim, args...)
	{
		alias joinWithDelim = join!(insertDelim!(delim, args));
	}

	void joinDelimTest() {
		// Example of joinWithDelim
		string x = "runtime1";
		string y = "runtime2";
		string z = "runtime3";

		writeln(joinWithDelim!(":", "a", "b", x, "c", y, z, "d"));

		// Here's the proof that the delimiters have been concatenated with
		// adjacent items at compile-time where possible:
		writeln([ tupleReduce!((string x, string y) => x~y,
				insertDelim!(":", "a", "b", x, "c", y, z, "d")) ]);
	}

	void main() {
		joinDelimTest();
	}


The key points of note are the joinWithDelim template, which implements a fully-working join with specifiable delimiter, and the code in joinDelimTest() that illustrates how to use it and what it does. The program produces this output:

	a:b:runtime1:c:runtime2:runtime3:d
	["a:b:", "runtime1", ":c:", "runtime2", ":", "runtime3", ":d"]

The last line indicates that the delimiters have been concatenated with all adjacent compile-time readable arguments, where possible. So basically, we start with:

	"a" "b" x "c" y z "d"

then delimiters are inserted:

	"a" ":" "b" ":" x ":" "c" ":" y ":" z ":" "d"

then this is optimized at compile-time into:

	"a:b:" x ":c:" y ":" z ":d"

By capturing this stage in an array, we get to see what the optimized tuple looks like (2nd line of output), before joinWithDelim, at runtime, finally interpolates x, y, and z into the result (1st line of output).

Looking at the disassembly of the program, compiled with gdc -O3, the call to joinWithDelim is inside its own function (reasonable, since it's rather large), in which there are 7 array concatenations, as expected. (If the delimiters weren't optimized at compile-time, we'd have 13 concatenations instead.) That is, it's as if we wrote:

	"a:b" ~ x ~ ":c:" ~ y ~ ":" ~ z ~ ":d"

(The first concatenation happens on the empty array of the return value; that's why there are 7 rather than 6.)

Exercise for the reader: ;-)

	Large numbers of array concatenations are not efficient at
	runtime.  Ideally, if the number of elements to be concatenated
	is > N, for some threshold N, we should use std.array.appender
	instead of the built-in ~.

	Rewrite the code to do this switchover to std.array.appender
	given some value of N, say 5.

	(Hint: since the length of tuples is conveniently available at
	compile-time as .length, we can just check if .length is greater
	than 5, and if so, instead of using ~ we call a different
	template function that declares a string appender, then use .put
	on it in a foreach over the tuple -- remember that foreach over
	tuples are expanded at compile-time.)

	(Unfortunately, this means that we can't use reduce!() anymore,
	since appender can only be used at runtime; at compile-time we
	still have to use ~. So we can't use the same lambda at both
	compile-time and runtime.)


T

-- 
Why is it that all of the instruments seeking intelligent life in the universe are pointed away from Earth? -- Michael Beibl
July 26, 2013
Thanks.. I am working on getting the code to work with unrolling an array:


module main;

import std.stdio, std.traits;

template tuple(args...)
{
    alias tuple = args;
}

template tUnrollArray(args...)
{
	static if (isArray!(typeof(args)))
	{
		pragma(msg, "isarray");
		static if (args.length > 1)
		{
			pragma(msg, "unrolling");
			alias tUnrollArray = tuple!(arg[0], args[1..$]);
		}
		else
		{
			static if (args[0].length > 1)
			{
				pragma(msg, "1");
				alias tUnrollArray = tuple!(args[0][0], tUnrollArray!(args[0][1..$]));
			}
			else
			{
				pragma(msg, "2");
				alias tUnrollArray = tuple!(args[0][0]);
			}
		}
	}
	else
		alias tUnrollArray = args;
}

void main(string[] argv)
{
	auto z = ["a", "b"];
        writeln(tUnrollArray!(["a", "b"]));
	writeln(tUnrollArray!(z));    // fails
	readln();
}


which seems to follow along the lines of your tuple reduce.

The problem is that I would like for it to unroll compile time known variables. I think we talked about that already and you said it wasn't a good idea because of potential infinite loops for some types of templates(which, I still think in most cases it should be ok, such as above.

Making z immutable does work but I don't get why making it static doesn't.

In any case, if the code looks correct I'll try and add the ability to have tupleReduce join elements that are arrays.

The alias stuff is starting to make some sense but having to do recursion for simple iterations isn't very fun ;/
July 26, 2013
The following code works better. Forgot to allow for multiple arguments. Runtime variables are not unrolled though but I'll work on understanding how you were able to unify the two as one(I think in your reduceImpl).

module main;

import std.stdio, std.traits;

template tuple(args...)
{
    alias tuple = args;
}

template tUnrollArgs(args...)
{
	static if (isArray!(typeof(args[0])))
	{
		pragma(msg, "isarray");
		static if (args.length > 1)
		{
			pragma(msg, "unrolling");
			alias tUnrollArgs = tuple!(tUnrollArgs!(args[0]), args[1..$]);
		}
		else
		{
			static if (args[0].length > 1)
			{
				pragma(msg, "1");
				alias tUnrollArgs = tuple!(args[0][0], tUnrollArgs!(args[0][1..$]));
			}
			else
			{
				pragma(msg, "2");
				alias tUnrollArgs = tuple!(args[0][0]);
			}
		}
	}
	else
		static if (args[0].length > 1)
			alias tUnrollArgs = tuple!(args[0], tUnrollArgs!(args[0][1..$]));
		else
			alias tUnrollArgs = args;
}

void main(string[] argv)
{
	auto z = ["1", "2"];
    writeln(tUnrollArgs!(["a", "b"], z, "c"));
	readln();
}


July 26, 2013
oops, that was a mess... I guess I need to learn how to program. This one is not much better but at least works. I'll have to rewrite the code.


module main;

import std.stdio, std.traits;

template tuple(args...)
{
    alias tuple = args;
}

template tUnrollArgs(args...)
{
	pragma(msg, ">"~args.stringof);
	static if (args.length == 0)
		alias tUnrollArgs = tuple!();
	else
	static if (isArray!(typeof(args[0])))
	{
		pragma(msg, "isarray");
		static if (args[0].length > 1)
		{
			pragma(msg, "unrolling");
			static if (args[0].length > 1)
			{
				pragma(msg, "length > 1");
				alias tUnrollArgs = tuple!(tUnrollArgs!(args[0][0]), tUnrollArgs!(args[0][1..$],args[1..$]));
			}
			else
			{
				pragma(msg, "length == 0 || 1");
				alias tUnrollArgs = args[0];
			}
		}
		else
		{
			static if (args[0].length > 1)
			{
				pragma(msg, "1");
				alias tUnrollArgs = tuple!(args[0][0], tUnrollArgs!(args[0][1..$], args[1..$]));
			}
			else
			{
				pragma(msg, "2");
				alias tUnrollArgs = tuple!(args[0][0], tUnrollArgs!(args[1..$]));
			}
		}
	}
	else
		//static if (args[0].length > 1)
			//alias tUnrollArgs = tuple!(args[0], tUnrollArgs!(args[0][1..$]));
		//else
			alias tUnrollArgs = args;
}

void main(string[] argv)
{
	immutable auto z = ["1", "2"];
    writeln(tUnrollArgs!(["a", "b"], z, "c"));
	readln();
}