Compile-time optimization - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » General » Compile-time optimization

Thread overview

Compile-time optimization
Jul 23, 2013 JS
Jul 23, 2013 Ali Çehreli
Jul 24, 2013 monarch_dodra
Jul 24, 2013 monarch_dodra
Jul 24, 2013 JS
Jul 24, 2013 Ali Çehreli
Jul 24, 2013 H. S. Teoh
Jul 24, 2013 JS
Jul 24, 2013 JS
Jul 24, 2013 monarch_dodra
Jul 23, 2013 H. S. Teoh
Jul 23, 2013 Jesse Phillips
Jul 23, 2013 Jesse Phillips
Jul 23, 2013 H. S. Teoh
Jul 23, 2013 H. S. Teoh
Jul 23, 2013 JS
Jul 23, 2013 Yota
Jul 23, 2013 Yota
Jul 23, 2013 JS
Jul 24, 2013 H. S. Teoh
Jul 24, 2013 JS
Jul 23, 2013 H. S. Teoh
Jul 23, 2013 H. S. Teoh

July 23, 2013

Compile-time optimization

Posted by JS

JS

There seems to be a lot of improvement for programming languages to optimize compile time aspects that are not taken into account. With ctfe I think such optimizations are more and more relevant in D.

I'll give a simple example:

Take a standard string joining function:

string join(string[] s)
{
    string ss;
    foreach(sss; s) ss ~= sss;
    return ss
}

when this string function is called at run-time with literal arguments it is extremely inefficient:

join(["a", "b", "c"]);

because the result can obviously be computed at compile-time by the compiler.

using ctfe can solve this problem but only when used in const results.

But there should be no reason why join(["a", "b", "c"]) can't be optimized at compile time so that the function call is replaced with "abc".

Now, think of join as defined like

string join(T...)(T t);

and we call it like

join("a", s, "b", "c", ss);

where s is string that is compile time evaluable and ss is a run-time string array.

the optimal join code would be something like

r = "a"~s~"bc"
foreach(sss; ss) r ~= sss;

(and which would change depending on the argument types and if they are compile time evaluable)


To generate such "optimal" code(one that gets the compiler to do as much work as it can at compile time) is somewhat convoluted in D. We can use string mixins to solve the problem but not easily because we can't pass variadic arguments to templates.

e.g.,

join(T...)(T s)
{
    mixin(tJoinH!(s))
}

fails because s is not known at compile time... even if s is partially known.

It would be nice if we could pass the arguments to a template even if they are not completely known at compile time.

e.g., join("a", s),

tJoinH can receive "a" and s but s's value, if not compile-time known, is undefined.

Hence, tJoinH can optimize the arguments by attempting to join the compile-time arguments and insert optimal code for the run-time arguments.

That is, if we call a function with an argument list, we should be able to pass that to a template without any errors and be able to determine the value of each argument... if the argument is undefined, then that becomes it's value.

Another way to see it work is with sum:

sum(1,2,x);

int sum(T...)(T i)
{
    // return 1 + 2 + i[3]; for the call above, it is optimal and our goal to generate
    // return mixin(tSumH!(i)); // hypothetically generates 1 + 2 + i[3] for the call above
    // return mixin(tSumH2!(i)); // generates i[1] + i[2] + i[3]. Current possible way

    // naive approach(assumes all arguments are known only at run-time):
    int y;
    foreach(ii; i) y += ii;
    return y;
}


I'm able to achieve this partially by passing T, rather than i, to the template function and the variable name, i. This works but may not produce optimal results(referencing the stack instead of the value of a literal... which prevents it from being further optimized).

e.g., from above, instead of return 1 + 2 + i[3]; I would generate return i[1] + i[2] + i[3]; Not optimal because i[1] and i[2] are treated as run-time entities even though in the above case they are not.


If D had some model to create functions that can optimize the way they deal with their compile-time arguments and run-time arguments then it would be pretty easy to create such functions.

e.g.,

string join(T...)(T t)    // Note join has a variadic parameter
{
    ctfe {
         // called when join is used in ctfe, t can be used to get the values directly if they exist at compile time, else their value is undefined.
         // can call join here with run-time entities

    }

    // runtime version goes here
}

we might have something like this

string join(T...)(T t) // easy notation to constrain T on char, string, string[]
{
    ctfe
    {
        string code = "string s;";
        foreach(k, val; t)
        {
            if (isDefined!val)
                code ~= `s ~= "`~join(val)~`";`; // calls join as a ctfe on val.
            else
            {
                if (type == string[])
                    code ~= `foreach(ss; t[`~k~`]) s~= sss;`;
                } else code ~= `s ~= t[`~k~`];`;
            }
        }
        code ~= "return s;";
        mixin code;
    }

    // concatenates all arguments of t as if they were run-time arguments.
    string s;
    foreach(k, ss; t)
        static if (T[k].type == string[])
            foreach(sss; ss)
                s ~= sss;
        else
            s ~= ss;
    return s;
}


I'm not saying this works but the idea is that the ctfe block is executed at compile time WHEN there are non-compile time arguments(ok, maybe ctfe isn't the best keyword). The block then figures out how to best deal with the non-ct arguments.

In the above code note how join is called in the ctfe block to handle known compile time types. This is just standard ctfe execution. The real "magic" happens when it is dealing with non-ct arguments, in which case it builds up meta code which is used as a mixin(special statement at end of ctfe block) that is inserted.

So for join("a", s); the code would execute something like

at compile time(in ctfe block)
    code = `string s;`;
    "a" is defined so
    code ~= `s ~= "~join("a")~`";`;
        join("a") is called, since "a" is known, the non-ctfe block is called, so we actually have
    code ~= `s ~= "a";`;
 next for s(which is undefined at compile time)
    code ~= `s ~= t[1];`;
 end
    code ~= "return s;";
so the string mixin code looks like:

    string s;
    s ~= "a";
    s ~= t[1];
    return s;

or simplified,

   return "a"~t[1];

which is an optimal join code for the join call join("a", s);

(it would make more sense to call a template in the ctfe block so we don't have form code strings of code strings)


Note also that having such a ctfe block does not break existing code structures and pre existing functions can easily be upgraded for optimal behavior.


Having the compiler do as much optimization as possible should produce a significant speedup in most applications.

When a function is called with any known compile time entities, it is possible for an optimization to take place. I think if D incorporates some mechanism in it's language to do so it will go a long.

In fact, all this simply demonstrates that ctfe's are not as powerful as they can be. Calling a ctfe'able function with just one run-time variable will prevent it from being ctfe'ed and optimized completely... even though there is still some possibility for optimization.


Hopefully some will read this acutely and attempt to ponder its potential.  I expect the immediate naysayers who can't see past their own noses... and those that get hung up on non-essential details(I said that the block doesn't have to be called ctfe... oh hell, you do realize we don't even need to use a block?).

July 23, 2013

Re: Compile-time optimization

Posted by Ali Çehreli
in reply to JS

Ali Çehreli

Posted in reply to JS

On 07/23/2013 09:21 AM, JS wrote:

> string join(T...)(T t)    // Note join has a variadic parameter
> {
>      ctfe {

Are you aware of version(ctfe)?

> Hopefully some will read this acutely

Not me. :(

> and attempt to ponder its potential.

I understood that much though...

> I expect the immediate naysayers who can't see past their own noses... and
> those that get hung up on non-essential details(I said that the block
> doesn't have to be called ctfe... oh hell, you do realize we don't even need
> to use a block?).

That's me! :)

Ali

July 23, 2013

Re: Compile-time optimization

Posted by H. S. Teoh
in reply to JS

H. S. Teoh

Posted in reply to JS

On Tue, Jul 23, 2013 at 06:21:08PM +0200, JS wrote:
> There seems to be a lot of improvement for programming languages to optimize compile time aspects that are not taken into account. With ctfe I think such optimizations are more and more relevant in D.
[...]
> But there should be no reason why join(["a", "b", "c"]) can't be optimized at compile time so that the function call is replaced with "abc".
[...]
> To generate such "optimal" code(one that gets the compiler to do as much work as it can at compile time) is somewhat convoluted in D. We can use string mixins to solve the problem but not easily because we can't pass variadic arguments to templates.
> 
> e.g.,
> 
> join(T...)(T s)
> {
>     mixin(tJoinH!(s))
> }
> 
> fails because s is not known at compile time... even if s is partially known.
[...]

Your CTFE-fu isn't good enough. ;-) Check this out:

	import std.stdio;

	template tuple(args...) {
		alias tuple = args;
	}

	/**
	 * Given a tuple of strings, returns a tuple in which all
	 * adjacent compile-time readable strings are concatenated.
	 */
	template tupleReduce(args...)
	{
		static if (args.length > 1)
		{
			static if (is(typeof(args[0])==string) &&
				__traits(compiles, { enum x = args[0]; }))
			{
				static if (is(typeof(args[1])==string) &&
					__traits(compiles, { enum x = args[1]; }))
				{
					alias tupleReduce = tupleReduce!(args[0] ~
								args[1], args[2..$]);
				}
				else
				{
					alias tupleReduce = tuple!(args[0], args[1],
								tupleReduce!(args[2..$]));
				}
			}
			else
			{
				alias tupleReduce = tuple!(args[0],
							tupleReduce!(args[1..$]));
			}
		}
		else
		{
			alias tupleReduce = args;
		}
	}

	void main() {
		string x = "runtime1";
		string y = "runtime2";
		auto arr = [ tupleReduce!("a", "b", x, "c", "d", y, "e", "f") ];
		writeln(arr);
	}

The output is:

	["ab", "runtime1", "cd", "runtime2", "ef"]

which proves that all adjacent compile-time readable strings have been concatenated at compile-time.

Using tupleReduce, you can optimize calls to join by writing:

	join(tupleReduce!("a", "b", x, "c", "d"));

and it will be compiled as though you've written:

	join("ab", x, "cd");

Yeah, so tupleReduce looks a bit on the ugly side. But the point is that it works. So don't be so quick to say something is not possible in D, 'cos even with its current warts, D is one powerful beast.  :-)


T

-- 
One disk to rule them all, One disk to find them. One disk to bring them all and in the darkness grind them. In the Land of Redmond where the shadows lie. -- The Silicon Valley Tarot

July 23, 2013

Re: Compile-time optimization

Posted by Jesse Phillips
in reply to JS

Jesse Phillips

Posted in reply to JS

On Tuesday, 23 July 2013 at 16:21:10 UTC, JS wrote:
> Now, think of join as defined like
>
> string join(T...)(T t);
>
> and we call it like
>
> join("a", s, "b", "c", ss);

To summarize to the best of my knowledge, the request is to provide

string join(alias T...)() {}

With the call changed to:

join!("a", s, "b", "c", ss);

I could be wrong here.

I think think this is a reasonable request, but haven't looked at any implications of adding it.

July 23, 2013

Re: Compile-time optimization

Posted by H. S. Teoh

H. S. Teoh

On Tue, Jul 23, 2013 at 11:09:24AM -0700, H. S. Teoh wrote: [...]
> Using tupleReduce, you can optimize calls to join by writing:
> 
> 	join(tupleReduce!("a", "b", x, "c", "d"));
> 
> and it will be compiled as though you've written:
> 
> 	join("ab", x, "cd");
[...]

I've generalized tupleReduce to handle reduction of tuples by any CTFE-able binary function:

	import std.stdio;

	template tuple(args...) {
		alias tuple = args;
	}

	/**
	 * Given a binary function and a tuple, returns a tuple in which all adjacent
	 * compile-time readable items are recursively substituted with the return
	 * value of the binary function applied to them.
	 */
	template tupleReduce(alias func, args...)
	{
		static if (args.length > 1)
		{
			static if (__traits(compiles, { enum x = args[0]; }))
			{
				static if (__traits(compiles, { enum x = args[1]; }) &&
					is(typeof(func(args[0], args[1]))))
				{
					enum result = func(args[0], args[1]);
					alias tupleReduce = tupleReduce!(func, result,
						args[2..$]);
				}
				else
				{
					alias tupleReduce = tuple!(args[0], args[1],
						tupleReduce!(func, args[2..$]));
				}
			}
			else
			{
				alias tupleReduce = tuple!(args[0],
					tupleReduce!(func, args[1..$]));
			}
		}
		else
		{
			alias tupleReduce = args;
		}
	}

	/**
	 * Given a tuple of strings, returns a tuple in which all adjacent compile-time
	 * readable strings are concatenated.
	 */
	template tupleStringReduce(args...)
	{
		alias tupleStringReduce = tupleReduce!(
			(string x, string y) => x~y, args
		);
	}

	void main() {
		string x = "runtime1";
		string y = "runtime2";
		auto arr = [ tupleStringReduce!("a", "b", x, "c", "d", y, "e", "f", "g", x) ];
		writeln(arr);

		int i1=100, i2=200;
		auto brr = [ tupleReduce!((int x, int y) => x+y, 1, 2, i1, 3, 4, i2, 5, 6, 7) ];
		writeln(brr);
	}

The output is:

	["ab", "runtime1", "cd", "runtime2", "efg", "runtime1"]
	[3, 100, 7, 200, 18]

For the string case, to make user code more convenient, I've written an alias template tupleStringReduce() that basically supplies the binary function for concatenating adjacent strings.

The second case shows how this can be explicitly specified via a function literal. The example here reduces adjacent compile-time readable ints via addition; you could substitute that with any binary function you want, say (x,y)=>x*y, or (x,y)=>x*2-y, or whatever. The only limit is that the function must be CTFE-able.

With the help of the new tupleReduce, you can do all sorts of compile-time optimizations.

D gives you power -- awesome power -- but you do have to learn how to tame the beast first. ;-)


T

-- 
Beware of bugs in the above code; I have only proved it correct, not tried it. -- Donald Knuth

July 23, 2013

Re: Compile-time optimization

Posted by H. S. Teoh

H. S. Teoh

On Tue, Jul 23, 2013 at 12:42:18PM -0700, H. S. Teoh wrote: [...]
> With the help of the new tupleReduce, you can do all sorts of compile-time optimizations.
[...]

Here's an example of how to implement compile-time optimization of join() with the help of tupleReduce:

	// Just for the sake of example, the following function uses ","
	// to join the strings in the arguments.
	string join(string[] args...) {
		string result;
		string delim;
		foreach (arg; args) {
			result ~= delim ~ arg;
			delim = ",";	// just for example
		}
		return result;
	}

	// This is the wrapper that uses tupleReduce (see my previous
	// post) to reduce the arguments at compile-time. For
	// illustration purposes, I omitted the "," from the reduction
	// function, so that you can tell from the output what has been
	// reduced at compile-time, and what is reduced at runtime.
	template optimizedJoin(args...)
	{
		string optimizedJoin() {
			return join(tupleReduce!((string x, string y) => x~y, args));
		}
	}

	void main() {
		writeln(optimizedJoin!(x, "a", "bc", y, "de", "f"));
	}

The output is:

	runtime1,abc,runtime2,def

As you can see, "a" and "bc" have been concatenated at compile-time, and so have "de" and "f". Since x and y can't be read at compile-time, they remain as variable arguments. The optimizedJoin template then collects the elements of the reduced tuple and hands it off to join() to do the rest of the work at runtime.

Notice that join() uses "delim ~ arg" to do the concatenation at
runtime; we could ostensibly replace this with the same function literal
we pass to tupleReduce in optimizedJoin(). Then you'll have a full
implementation of join() that performs compile-time optimization of its
parameters.

I rest my case. :-)


T

-- 
In order to understand recursion you must first understand recursion.

July 23, 2013

Re: Compile-time optimization

Posted by H. S. Teoh

H. S. Teoh

On Tue, Jul 23, 2013 at 12:58:11PM -0700, H. S. Teoh wrote: [...]
> 	void main() {

Oops, forgot to include the definition of x and y in main():

	string x = "runtime1";
	string y = "runtime2";

> 		writeln(optimizedJoin!(x, "a", "bc", y, "de", "f"));
> 	}


T

-- 
People say I'm indecisive, but I'm not sure about that. -- YHL, CONLANG

July 23, 2013

Re: Compile-time optimization

Posted by Jesse Phillips
in reply to H. S. Teoh

Jesse Phillips

Posted in reply to H. S. Teoh

On Tuesday, 23 July 2013 at 18:10:54 UTC, H. S. Teoh wrote:
> 	void main() {
> 		string x = "runtime1";
> 		string y = "runtime2";
> 		auto arr = [ tupleReduce!("a", "b", x, "c", "d", y, "e", "f") ];
> 		writeln(arr);
> 	}

Heh, missed this. Forgot about recursive aliasing too.

My suggestion also is bogus since literals can't be aliased.

July 23, 2013

Re: Compile-time optimization

Posted by JS
in reply to H. S. Teoh

JS

Posted in reply to H. S. Teoh

On Tuesday, 23 July 2013 at 19:59:46 UTC, H. S. Teoh wrote:
> On Tue, Jul 23, 2013 at 12:42:18PM -0700, H. S. Teoh wrote:
> [...]
>> With the help of the new tupleReduce, you can do all sorts of
>> compile-time optimizations.
> [...]
>
> Here's an example of how to implement compile-time optimization of
> join() with the help of tupleReduce:
>
> 	// Just for the sake of example, the following function uses ","
> 	// to join the strings in the arguments.
> 	string join(string[] args...) {
> 		string result;
> 		string delim;
> 		foreach (arg; args) {
> 			result ~= delim ~ arg;
> 			delim = ",";	// just for example
> 		}
> 		return result;
> 	}
>
> 	// This is the wrapper that uses tupleReduce (see my previous
> 	// post) to reduce the arguments at compile-time. For
> 	// illustration purposes, I omitted the "," from the reduction
> 	// function, so that you can tell from the output what has been
> 	// reduced at compile-time, and what is reduced at runtime.
> 	template optimizedJoin(args...)
> 	{
> 		string optimizedJoin() {
> 			return join(tupleReduce!((string x, string y) => x~y, args));
> 		}
> 	}
>
> 	void main() {
> 		writeln(optimizedJoin!(x, "a", "bc", y, "de", "f"));
> 	}
>
> The output is:
>
> 	runtime1,abc,runtime2,def
>
> As you can see, "a" and "bc" have been concatenated at compile-time, and
> so have "de" and "f". Since x and y can't be read at compile-time, they
> remain as variable arguments. The optimizedJoin template then collects
> the elements of the reduced tuple and hands it off to join() to do the
> rest of the work at runtime.
>
> Notice that join() uses "delim ~ arg" to do the concatenation at
> runtime; we could ostensibly replace this with the same function literal
> we pass to tupleReduce in optimizedJoin(). Then you'll have a full
> implementation of join() that performs compile-time optimization of its
> parameters.
>
> I rest my case. :-)
>
>
> T


I must study your powerful ctfe-fu master! Great fu you have!

This is great that D can do this and from just a quick glance it looks like you solve the problem.

Your code doesn't exactly do what I wanted but probably because I wasn't clear... I think it is modifiable and solves the problem I was having(again, assuming your code is correct).

Note though that something like

string x = "asd"
join(x,...); // x is compile time known and should be joined at compile time. Not sure if D is smart enough to get this one?

When I get some time later I'll check your code out, what I want to do is get something like

string x = join(a, b, "a", "b"); to produce the same code as
string x = a~b~"ab";

(no looping using for each except for string arrays)

Thanks!

July 23, 2013

Re: Compile-time optimization

Posted by Yota
in reply to JS

Yota

Posted in reply to JS

On Tuesday, 23 July 2013 at 20:11:21 UTC, JS wrote:
> Your code doesn't exactly do what I wanted but probably because I wasn't clear... I think it is modifiable and solves the problem I was having(again, assuming your code is correct).
>
> Note though that something like
>
> string x = "asd"
> join(x,...); // x is compile time known and should be joined at compile time. Not sure if D is smart enough to get this one?
>
> When I get some time later I'll check your code out, what I want to do is get something like
>
> string x = join(a, b, "a", "b"); to produce the same code as
> string x = a~b~"ab";
>
> (no looping using for each except for string arrays)
>
> Thanks!

Don't forget that tuples can store more than just types!  See this for instance.

  import std.stdio;

  typeof(Xs[0]) sum(Xs...)()
  {
      pragma(msg, Xs); // tuple(1, 2, 3)
      typeof(Xs[0]) y;
      foreach(x; Xs) y ~= x;
      return y;
  }

  int main() {
      auto n = sum!(1, 2, 3);
      enum m = sum!(1, 2, 3);

      pragma(msg, typeof(n)); // int
      writeln(n); // 6
      writeln(m); // 6

      return 0;
  }

Works great.  When I comment out the 'sum!("1", "2", "3")' and replace it with a literal '6', the compiled EXE is byte for byte exactly the same.  You can exchange the '+=' for '~=' and it will concatenate strings instead.  Of course, now all arguments MUST be known at compile time, but that's no surprise.

PS: OK, just noticed that you were hoping to mix run time and compile time arguments.  Oops....
Oh well, posting anyway because I think it's neat. lol

As for a~"B"~"C" compiling into a~"BC", I think we just need to work on constant folding in the compiler. (If it doesn't already do this anyway.)

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation