February 16, 2007
Walter Bright wrote:
> janderson wrote:
>> Walter Bright wrote:
>>> Right now, the compiler will fail if the compile time execution results in infinite recursion or an infinite loop. I'll have to figure out some way to eventually deal with this.
>>
>> Maybe you could allow the user to specify stack size and maximum iterations per loop/recursive function to the compiler as flags (with some defaults).  This way the user can raise the limits if they really need to.  This would make it a platform thing, so a D compiler could still be made for less powerful systems.
> 
> Whether you tell it to fail at a smaller limit, or it fails by itself at a smaller limit, doesn't make any difference as to whether it runs on a less powerful system or not <g>.
> 
> The C standard has these "minimum translation limits" for all kinds of things - number of lines, chars in a string, expression nesting level, etc. It's all kind of bogus, hearkening back to primitive compilers that actually used fixed array sizes internally (Brand X, who-shall-not-be-named, was notorious for exceeding internal table limits, much to the delight of Zortech's sales staff). The right way to build a compiler is it either runs out of stack or memory, and that's the only limit.
> 
> If your system is too primitive to run the compiler, you use a cross compiler running on a more powerful machine.
> 
> I have thought of just putting a timer in the interpreter - if it runs for more than a minute, assume things have gone terribly awry and quit with a message.

Please don't.  All sorts of things can affect performance, which means there is a remote possibility that it will fail one time out of ten.  This is particularly true for build machines, which may be doing lots of things at once.  If a build suddenly fails because of virtual-memory thrashing or something, the programmer just gets an annoying "build failed" message.  If it works once, it needs to work every time.

A counter or stack-depth limit of some sort would be much better, even if it isn't specifiable by the programmer.

One way to use a timer, though, would be to display how long each bit took.  That way the programmer would be able to figure out how to improve compile-time performance.

-Joel
February 16, 2007
Andrei Alexandrescu (See Website For Email) wrote:
> Walter Bright wrote:

> That could be achieved with a watchdog process without changing the compiler, and it's more flexible.
> 
> I think you just let the compiler go and crunch at it. Since you essentially have partial evaluation anyway, the execution process can be seen as extended to compile time. If you have a non-terminating program, that non-termination will naturally manifest itself during compilation=partial evaluation.

It would be nice, though, if the compiler could trap SIGINT or something and spit out an error message about which part of the code it was trying to compile when you killed it.

Otherwise debugging accidental infinite loops in compile time code becomes...interesting.

--bb
February 16, 2007
Walter Bright wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
>> This is by far the least interesting application of this stuff. I don't even count it when I think of the feature. "Oh, yeah, I could compile square root at compile time. How quaint."
> 
> I agree. I need a better example. Any ideas?

(Sorry that this got so long -- it kind of turned into a duffel bag of
things I've been thinking about.)

I think the most common case is like the plot for a TV show.  Code that
is semi-interesting, semi-predictable, and semi-repetitive.  If the code
is too interesting, you would need to write it all yourself.  If it were
too repetitive, you could just use a standard D template (boilerplate).

Tasks in between -- non-boilerplate, but fairly 'formulaic', where the details are not interesting -- are the candidates for this.

To me there are a couple special reasons that stick out.

1. You're building complex types and need to base them on standard
   definitions that might change.  I see this with ASN.1 definitions
   at work.  We have a program that builds C++ code from ASN.1 or
   XML schemas.

   A. Some of our programs need to stream 100s of MB of data so this
      code needs to be as fast as possible.

   B. If a field is added to the definition it has to appear in all
      the code objects.

   C. There is additional logic -- i.e. if a 'mandatory' field is
      not assigned, the serializer has to throw an exception.

2. You're building the inner loop in some performance critical
   application and the rules (expressions and conditional logic)
   used there need or benefit from ultra-optimization.

   This is what (in my view) compile time regex is for, I would
   normally use a runtime regex for ordinary things like parsing
   configuration files.

3. You need a lot of code duplication (e.g. to provide stub functions
   or something similar) and don't want to repeat yourself to get it.
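
As a hedged illustration of point 3, in today's D syntax (`string` in place of `char[]`; the names `makeStubs`, `start`, etc. are invented for this sketch): a compile-time function can build the stub declarations once, and a mixin splices them in.

```d
import std.stdio;

// Illustrative sketch: build stub declarations from a list of names
// at compile time, so they are never written out by hand.
string makeStubs(string[] names)
{
    string code = "";
    foreach (n; names)
        code ~= "void " ~ n ~ "() { writefln(\"stub: " ~ n ~ "\"); }\n";
    return code;
}

// The generated declarations are spliced in at module scope.
mixin(makeStubs(["start", "stop", "reset"]));

void main()
{
    start();  // prints "stub: start"
    stop();
    reset();
}
```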

---

This is how I imagine it:

Some of these ideas have been kicking around in my head, but I'm not sure how practical they are.  When I use the word 'templates' here, I mean any kind of code generation.

Starting scenario: Let's say I'm writing a program to solve some
mathematical task.

1. I create a top level class and add some members to it.

2. I add some sub-classes, a dozen or so, mostly just POD stuff.

3. Some of these have associative arrays, user-defined tree stuff,
   regular arrays, hand coded linked lists, etc.

4. I put in a bunch of graph theory code and number crunching stuff.


Uses of metaprogramming:

1. Now let's say that this application is taking a while to run, so I decide to run it in steps and checkpoint the results to disk.

- I write a simple template that can take an arbitrary class and
  write the pointer value and the class's data to disk.  (Actual data
  is just strings and integers, so one template should cover all of
  these classes.)

- For each internal container it can run over the members and do
  the same to every object with a distinct memory address.  (One
  more template is needed for each container concept, like AA,
  list or array -- say 4 or 5 more templates.)  It only writes each
  object once by tracking pointers in a set.

- Another template that can read this stuff back in, and fix the
  pointers so that they link up correctly.

(** All of this is much easier than normal, because I can generate
types using typelists as a starting point.  I think in C++ this is
a bit tricky because it convolutes the structure definition --
recursively nested structs and all that; but with code generation, the
"struct builder" can take a list of strings and pump out a struct whose
definition looks exactly like a hand-coded struct would look, but
maybe with more utilitarian functionality since it's cheaper to add
the automated stuff. **)

Three or four templates later, I have a system for checkpointing any
data structure (with a few exceptions like sockets etc.), to a
string or stream.
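
A minimal sketch of that first template, in today's D, assuming a struct whose fields are just integers and strings (the names `checkpoint` and `Node` are invented for illustration; a real version would target a stream and handle pointers, containers, and cycles as described above):

```d
import std.stdio;

// Illustrative sketch: walk an aggregate's fields via .tupleof,
// which the compiler unrolls at compile time, and write out each
// name/value pair. A real checkpointer would write to a stream and
// track already-seen pointers in a set.
void checkpoint(T)(T obj)
{
    foreach (i, field; obj.tupleof)
    {
        writefln("%s = %s", __traits(identifier, T.tupleof[i]), field);
    }
}

struct Node
{
    int id;
    string label;
}

void main()
{
    checkpoint(Node(1, "root"));
    // prints:
    // id = 1
    // label = root
}
```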


2. I want to display this stuff to the user.

I bang together another couple of templates that can show these kinds
of code objects in a simple viewer.  It works just like the last one,
finding variable names and values and doing the writefln() types of
tricks to give the user the details.  Some kind of browser lets me
examine the process starting at the top.  Maybe it looks a little like
a flow chart and a little like a debugger's print of a structure.

- I can define hooks in the important kinds of objects so they can
override their own displays, but simple data can work without much
help.


3. I want to build a distributed compute farm for this numerical task.

- I just need to change the serialization to stream the data objects
over the web or sockets, or queue the objects in SQL tables.  Some load
balancing, etc.  Another application that has the same class definitions
can pull in the XML or ASN.1 or home-made serialization format.

The trick here is that we need to be able to build templates that can
inspect the objects and trees of objects in complex ways: does this
class contain a field named "password"?  Is this other field a computed
value that can be thrown away?  Does this other class override a method named 'optimizeForTransport'?

Adding arbitrary attributes and arbitrary bits of code and annotation to the classes is not too hard to do, because my original code generation functions used typelists and had hooks for specifying special behavior.


4. I decide to allow my ten closest friends to help with the application by rewriting important subroutines.

- Each person adds code for the application to an SQL database.  A simple script can now pull code from the database and dump it to text files.  This code can be imported into classes and run.

- I can generate ten different versions of a critical loop and select which one to run at random.  The timing results are stored in a text file.  Later compiles of the code do "arc-profiling" of entire algorithms or modules.

Kevin


February 16, 2007
On Thu, 15 Feb 2007 11:36:54 -0800
Gregor Richards <Richards@codu.org> wrote:

> I see that I can't do this:
> 
> char[] someCompileTimeFunction()
> {
>      return "writefln(\"Wowza!\");";
> }
> 
> int main()
> {
>      mixin(someCompileTimeFunction());
>      return 0;
> }
> 
> 
> Any chance of compile-time code generation via this mechanism? Or are they simply handled in different, incompatible steps of compilation?

It seems to be a bug.

> 
>   - Gregor Richards
> 
> PS: Yes, I realize this is a terrible idea ^^

No, it isn't.  It is the most exciting use of this feature.
I have been waiting for this for a long time, and can't
wait to use it for really cool things. :)
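
For the record, here is roughly what the combination should look like once it works (sketched in later-D syntax, where `string` replaces `char[]`):

```d
import std.stdio;

// The intended combination: a function evaluated at compile time
// whose returned string is then compiled in via mixin().
string someCompileTimeFunction()
{
    return `writefln("Wowza!");`;
}

void main()
{
    mixin(someCompileTimeFunction()); // prints "Wowza!"
}
```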

-- 
Witold Baryluk
MAIL: baryluk@smp.if.uj.edu.pl, baryluk@mpi.int.pl
JID: movax@jabber.autocom.pl
February 16, 2007
Derek Parnell wrote:
> On Thu, 15 Feb 2007 17:52:38 -0700, Russell Lewis wrote:
>> if it took 7.5 million years to run on a supercomputer, how long is it going to take to run on your compiler?
> 
>   int CalculateTheAnswerToLifeUniverseEverything() { return 42; }

No, the function name says *calculate*, not *return* :P.

Also, nitpick: wasn't it "The Answer To _The Ultimate Question_ Of Life, the Universe and Everything"? (i.e. you missed a bit ;) )
February 16, 2007
Frits van Bommel wrote:
> Derek Parnell wrote:
>> On Thu, 15 Feb 2007 17:52:38 -0700, Russell Lewis wrote:
>>> if it took 7.5 million years to run on a supercomputer, how long is it going to take to run on your compiler?
>>
>>   int CalculateTheAnswerToLifeUniverseEverything() { return 42; }
> 
> No, the function name says *calculate*, not *return* :P.
> 

I think that it was just a great example of brain-time constant folding :D

Regards
Marcin Kuszczak


February 16, 2007
Bill Baxter wrote:
> I'd like to write this:
> 
> char[] NReps(char[] x, int n)
> {
>     char[] ret = "";
>     for(int i=0; i<n; i++) { ret ~= x; }
>     return ret;
> }
> 
> But that doesn't work.

It does when I try it:
---------------------
import std.stdio;

char[] NReps(char[] x, int n)
{
    char[] ret = "";
    for(int i=0; i<n; i++) { ret ~= x; }
    return ret;
}

void main()
{
   static x = NReps("3", 6);
   writefln(x);
}
-----------------------
prints:

333333
February 16, 2007
Walter Bright wrote:
> Whether you tell it to fail at a smaller limit, or it fails by itself at a smaller limit, doesn't make any difference as to whether it runs on a less powerful system or not <g>.

I'd definitely prefer a way to make it fail early. When I was first trying out v1.006 I modified the sqrt() example to an infinite loop, and made the fatal mistake of switching focus away from my terminal window. My system ground to a halt as DMD tried to allocate pretty much all free memory + swap. (That's over 2 GB!) It got so bad it took a few minutes to switch to a virtual console and 'killall dmd'. And even after that it was slow for a while until the running programs got swapped back in...
Surely it would have been possible to detect something having gone horribly awry before it got this bad?

> If your system is too primitive to run the compiler, you use a cross compiler running on a more powerful machine.

My system is an AMD Sempron 3200+ with 1GB of RAM...

> I have thought of just putting a timer in the interpreter - if it runs for more than a minute, assume things have gone terribly awry and quit with a message.

This might be a good idea, but perhaps make the time depend on a command-line switch, and maybe add something for memory usage as well?
February 16, 2007
Andrei Alexandrescu (See Website For Email) wrote:
> Walter Bright wrote:
>> Andrei Alexandrescu (See Website For Email) wrote:
>>> This is by far the least interesting application of this stuff. I don't even count it when I think of the feature. "Oh, yeah, I could compile square root at compile time. How quaint."
>>
>> I agree. I need a better example. Any ideas?
> 
> Well we talked about:
> 
> int a = foo();
> char[] b = bar();
> print("a is $a and b is $b, dammit\n");
> 
> The interesting part is that this will also require you to screw in a couple of extra nuts & bolts (that were needed anyway).

Would this mean a type of function whose return value is automatically mixed in? This is getting awfully close to LISP macros... :)

> Smart enums (that know printing & parsing) are another example. But the print() example is simple, of immediate clear benefit, and suggestive of more powerful stuff.
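
One hedged sketch of how the $-substitution could be bolted together from pieces that already exist -- a compile-time function rewrites the format string into a write() call, and a mixin splices it in at the call site. The name `interp` and the whole scheme are purely illustrative, not a proposed API; it assumes each `$` is followed by a lowercase identifier and ignores escaping of quotes and backslashes.

```d
import std.stdio;

// Illustrative: turn "a is $a" into the code `write("a is ",a);`
// at compile time, so mixin() can splice it in where the variables
// are in scope.
string interp(string fmt)
{
    string args = "";
    string lit = "";
    size_t i = 0;
    while (i < fmt.length)
    {
        if (fmt[i] == '$')
        {
            // flush the literal chunk, then splice in the identifier
            args ~= `"` ~ lit ~ `",`;
            lit = "";
            i++;
            while (i < fmt.length && fmt[i] >= 'a' && fmt[i] <= 'z')
            {
                args ~= fmt[i];
                i++;
            }
            args ~= ",";
        }
        else
        {
            lit ~= fmt[i];
            i++;
        }
    }
    args ~= `"` ~ lit ~ `"`;
    return "write(" ~ args ~ ");";
}

void main()
{
    int a = 42;
    string b = "hi";
    mixin(interp("a is $a and b is $b\n"));
    // prints: a is 42 and b is hi
}
```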
February 16, 2007
Andrei Alexandrescu (See Website For Email) wrote:
> Walter Bright wrote:
>> Andrei Alexandrescu (See Website For Email) wrote:
>>> This is by far the least interesting application of this stuff. I don't even count it when I think of the feature. "Oh, yeah, I could compile square root at compile time. How quaint."
>>
>> I agree. I need a better example. Any ideas?
> 
> Well we talked about:
> 
> int a = foo();
> char[] b = bar();
> print("a is $a and b is $b, dammit\n");
> 
> The interesting part is that this will also require you to screw in a couple of extra nuts & bolts (that were needed anyway).


But add a "!" to the print, and it's already possible? What extra is needed, and is that just to get rid of the "!"?

L.