February 22, 2006
"Oskar Linde" <oskar.lindeREM@OVEgmail.com> wrote in message news:dthe8b$1jg1$1@digitaldaemon.com...
> Lionello Lunesu skrev:
>> Interesting indeed.
>>
>> Is there no way to "fold constants" in this kind of code too? If you know the inputs to a function are all constant, can't you simply replace the inputs + function call with the function's output?
>
> Disclaimer: I don't know much about this. Most is pure speculation.
>
> I guess it is theoretically possible, but the compiler has to know that the function is pure. That is:
>
> a) The function can not have any side effects.

Good point. Completely forgot about that.

> b) The result has to be deterministic and only depend on the arguments.

Yeah, imagine de compiler calling rand(), taking a void (very constant), returning 123 or so.. assuming it's constant! :-)

> a) Hard for the compiler to tell if a function is pure. In many cases it is not even possible (The halting problem has an example of such an undecidable function).

Let's see. If the function only uses the inputs, without even unreferencing them, then it's pretty clear I suppose. But you're right, it's complex.

> b) The compiler needs a way to evaluate the function at compile time.

That's easy, by just calling it.

> c) The compiler has no way of knowing the function space and time complexity.

How is this different from a) ?

> It would be interesting if there was a way to flag functions as being pure. The compiler could then try to evaluate the function at compile time or reduce the number of calls to the function at run time similar to what a common sub-expression removal optimization would do.

Indeed. Something like C++ "const", but then for real, and not removable by a cast. A "pure" function would simply have a number of restrictions, I suppose something like: not allowed to reference any data outside the function (globals, class members, etc).

Lio.


February 22, 2006
Lionello Lunesu skrev:
> "Oskar Linde" <oskar.lindeREM@OVEgmail.com> wrote in message news:dthe8b$1jg1$1@digitaldaemon.com...
>> a) Hard for the compiler to tell if a function is pure. In many cases it is not even possible (The halting problem has an example of such an undecidable function).
> 
> Let's see. If the function only uses the inputs, without even unreferencing them, then it's pretty clear I suppose. But you're right, it's complex.

The function has to halt also. An infinite loop can be impossible for the compiler to detect. One would not want the compilation to hang.

>> c) The compiler has no way of knowing the function space and time complexity.
> 
> How is this different from a) ?

This is similar to a), but since a) is provable undecidable, probably not harder. :)

If the function call takes five hours to complete, the compilation would take five hours times the number of times the function got called with different arguments. Also, if the function uses 2 gb of stack space, the compiler might run out of memory...
The compiler would have to execute the function for a certain amount of time, and break it if it doesn't return.

/ Oskar
February 22, 2006
Lionello Lunesu wrote:
> Interesting indeed.
> 
> Is there no way to "fold constants" in this kind of code too? If you know the inputs to a function are all constant, can't you simply replace the inputs + function call with the function's output?

If the function can be inlined and the operations it contains are also subject to constant folding then the optimizer should already do this. Otherwise, while it's possible in some cases I don't know of a compiler that does this.  I believe this has been talked about on the C++ forums as "atomic functions."


Sean
February 23, 2006
Walter Bright wrote:
> "Georg Wrede" <georg.wrede@nospam.org> wrote
>> Walter Bright wrote:
>> 
>>> If the compiler is to constant fold regular expressions, then it needs to build in to the compiler exactly what would happen if
>>> the regex code was evaluated at runtime.
>> 
>> Yes. IMHO in essence, the binary machine code, which the runtime
>> also would build. What I have a hard time seeing is, how this
>> differs from building a normal function at compile time?
> 
> Consider the strlen() function. Compiling a strlen() function and
> generating machine code for it is a very different thing from the
> compiler knowing what strlen is and replacing:
> 
> strlen("abc")
> 
> with:
> 
> 3

Either I'm getting too old for this business, or you're only giving pseudo answers.

(1) If we were to stop the compiler dead in its tracks, and I compiled the function "manually" and returned it to the compiler, would we still have a problem here?

(2) {-- and this I've so far avoided to bring up, out of courtesy --},
if Don can do it with templates, what's so impossible doing it the regular way??

-------------------

Just a cross-check: [I think] we're talking about compiling a single regular expression.

My definition: "a compiled regular expression" is any piece of machine code that takes *one string* as the argument, and returns (depending on which of the 2 kinds it is) either a boolean (as in found or not), or an integer denoting position of First Match.

Such a piece of machine code is a function that complies to one of the following signatures:

bool foo(char[]);      // match

int bar(char[]);       // search
February 23, 2006
Georg Wrede wrote:
> Walter Bright wrote:
>> "Georg Wrede" <georg.wrede@nospam.org> wrote
>>> Walter Bright wrote:
>>>
>>>> If the compiler is to constant fold regular expressions, then it needs to build in to the compiler exactly what would happen if
>>>> the regex code was evaluated at runtime.
>>>
>>> Yes. IMHO in essence, the binary machine code, which the runtime
>>> also would build. What I have a hard time seeing is, how this
>>> differs from building a normal function at compile time?
>>
>> Consider the strlen() function. Compiling a strlen() function and
>> generating machine code for it is a very different thing from the
>> compiler knowing what strlen is and replacing:
>>
>> strlen("abc")
>>
>> with:
>>
>> 3
> 
> Either I'm getting too old for this business, or you're only giving pseudo answers.
> 
> (1) If we were to stop the compiler dead in its tracks, and I compiled the function "manually" and returned it to the compiler, would we still have a problem here?

That would be OK. The issue is that the compiler is a tool for converting text to machine code. It has no mechanism for executing the machine code.

> (2) {-- and this I've so far avoided to bring up, out of courtesy --},
> if Don can do it with templates, what's so impossible doing it the regular way??

The compiler does have a mechanism for executing the "template language" at compile time, which is what my code is using. But, the template language (which I'll call Double D (DD) :-) ) is fundamentally different to the ordinary D language (eg, it has no variables). Conceivably, a compiler could convert a D function into a DD metafunction, provided that it doesn't write to any variables except at initialisation, and doesn't use any control structures other than "if-else" and "return",
and all of its parameters are compile-time constants. But that would be so restricted as to be almost useless.

Of course the compiler itself could have the DD code built into it, but DD is a horribly inefficient language, and it would be hideous to program from inside the compiler.

What could perhaps be done is to allow functions with all-constant parameters to be converted into overloads.

eg we have the DD metafunction

int strlenT!(char [] s)

Then, if we could define some kind of syntax like
const alias int strlen(char [] s) strlenT!(s);

as an overload of strlen, so that if all parameters are compile-time constants, then the reference to strlen becomes a template instantiation.

More generally, if the lookup mechanism for functions was changed to be:
If the first n parameters of a functions are all compile-time constants, C1, C2, ... with the remainder being variables or constants, V1, V2, ...
try to find a matching template.
eg, given
  func(C1, C2, C3, V1, V2, C4)
the following functions are looked for, in this order:
func!(C1, C2, C3)(V1, V2, C4);
func!(C1, C2)(C3, V1, V2, C4);
func!(C1)(C2, C3, V1, V2, C4);
func(C1, C2, C3, V1, V2, C4);

Note that as soon as a template is found, the search stops.
eg if there is a
func(C1, C2, C3) which doesn't have a (V1, V2, V3) member function, compilation will fail even if a function func(p1, p2, p3, p4, p5, p6) exists.

This is superficially akin to overloading 'const' parameters in C++, but unlike C++ "const" would actually mean "constant" and not just "I'm not _supposed_ to change it".
February 23, 2006
Don Clugston wrote:
> Georg Wrede wrote:
>> Walter Bright wrote:
>>> Georg Wrede wrote:
>>>> Walter Bright wrote:
>>>>
>>>>> If the compiler is to constant fold regular expressions, then it needs to build in to the compiler exactly what would happen if
>>>>> the regex code was evaluated at runtime.
>>>>
>>>> Yes. IMHO in essence, the binary machine code, which the runtime
>>>> also would build. What I have a hard time seeing is, how this
>>>> differs from building a normal function at compile time?
>>>
>>> Consider the strlen() function. Compiling a strlen() function and
>>> generating machine code for it is a very different thing from the
>>> compiler knowing what strlen is and replacing:
>>>
>>> strlen("abc")
>>>
>>> with:
>>>
>>> 3
>>
>> Either I'm getting too old for this business, or you're only giving pseudo answers.
>>
>> (1) If we were to stop the compiler dead in its tracks, and I compiled the function "manually" and returned it to the compiler, would we still have a problem here?
> 
> 
> That would be OK. The issue is that the compiler is a tool for converting text to machine code. It has no mechanism for executing the machine code.

Aaaaaah... heureka.

So there's a wavelength problem here!

What I've been talking all along, is 'a regexp compiled into a function, but _not_run_ at compile time.

** So, Don's regexps can be both "compiled" and "run" at compile time, whereas what I've been wishing all along is a "compile-time compiled but not compile-time run" regexp!

In other words, a profoundly normal function, just that it happens to be written in RegexpLanguage instead of vanilla D (Or C, or asm).

(Gees, I hope this same wavelength problem wasn't the reason for last winter's unsuccessful regexp discussions.) :-(
February 23, 2006
--- this post of mine is more or less academic ---

Don Clugston wrote (in D.announce:2783):


....

> That would be OK. The issue is that the compiler is a tool for converting text to machine code. It has no mechanism for executing the machine code.

....

> The compiler does have a mechanism for executing the "template language" at compile time, which is what my code is using. But, the template language (which I'll call Double D (DD) :-) ) is fundamentally different to the ordinary D language (eg, it has no variables). Conceivably, a compiler could convert a D function into a DD metafunction, provided that it doesn't write to any variables except at initialisation, and doesn't use any control structures other than "if-else" and "return",
> and all of its parameters are compile-time constants. But that would be so restricted as to be almost useless.
> 
> Of course the compiler itself could have the DD code built into it, but DD is a horribly inefficient language, and it would be hideous to program from inside the compiler.

This brings us squarely back to a year ago, when I was discussing metaprogramming at length. (In several D newsgroups here.)

My original idea was that we should have 3 levels of language in "D".

 - asm
 - "normal D"
 - the metalanguage

AND that all of the meta stuff would be in a language with its own syntax and semantics, distinct from "normal D".

The idea was, that on all of these 3 levels of programming, the required  (as well as the most common) idioms are very different. Thus meta level concepts will be hard and murky to express in "normal D", or "an outgrowth of normal D".

I got the impression (perhaps mistakenly?) that Walter was strongly opposed to having a meta-level-specific "language" on top of D, while I was against the "D-outgrowth" metalanguage. Well, seems we've got right there now.

(--------------

> What could perhaps be done is to allow functions with all-constant parameters to be converted into overloads.

This is definitely a worthy idea.

-------------)

Back to the issue at hand:

What I'm still saying, is that (even though Walter and Don have made completely kick-ass progress here), there's a glass ceiling waiting for us at this rate. By the time we have evolved the D meta business into something that has cleared all major obstacles, it will be a hard to use, cumbersome, and unobvious metalanguage. (( but still a hell of a lot nicer and of course much more powerful than C++ templates ever ))

Thus, it would behoove us to start examining the features and properties of a from-ground-up designed meta level language for D. (Definitely not ever even intended to anything earlier than D2.0!)

A year ago this was a pipe dream. But now, with the experience we gain from the existing meta language, and (foremost) since I'm not alone anymore in thinking that such an undertaking is actually within reach, we really could start outlining it now!

---

So I suggest that we:

 - continue developing the current D metalanguage
 - continue to have both Don's and Phobos' regexps
 - in this NG start the development of requirements for KBDM

KBDM being Kick-Butt D Metalanguage. :-)
February 23, 2006
(I put stuff in D.dtl.)

georg
February 23, 2006
Georg Wrede wrote:
> Don Clugston wrote:
>> Georg Wrede wrote:
>>> Walter Bright wrote:
>>>> Georg Wrede wrote:
>>>>> Walter Bright wrote:
>>>>>
>>>>>> If the compiler is to constant fold regular expressions, then it needs to build in to the compiler exactly what would happen if
>>>>>> the regex code was evaluated at runtime.
>>>>>
>>>>> Yes. IMHO in essence, the binary machine code, which the runtime
>>>>> also would build. What I have a hard time seeing is, how this
>>>>> differs from building a normal function at compile time?
>>>>
>>>> Consider the strlen() function. Compiling a strlen() function and
>>>> generating machine code for it is a very different thing from the
>>>> compiler knowing what strlen is and replacing:
>>>>
>>>> strlen("abc")
>>>>
>>>> with:
>>>>
>>>> 3
>>>
>>> Either I'm getting too old for this business, or you're only giving pseudo answers.
>>>
>>> (1) If we were to stop the compiler dead in its tracks, and I compiled the function "manually" and returned it to the compiler, would we still have a problem here?
>>
>>
>> That would be OK. The issue is that the compiler is a tool for converting text to machine code. It has no mechanism for executing the machine code.
> 
> Aaaaaah... heureka.
> 
> So there's a wavelength problem here!
> 
> What I've been talking all along, is 'a regexp compiled into a function, but _not_run_ at compile time.

Oh dear, I think I've just confused you. I was only referring to strlen, not to regexps. I was trying to explain Walter's statement about why it's difficult for a compiler writer.

> ** So, Don's regexps can be both "compiled" and "run" at compile time, whereas what I've been wishing all along is a "compile-time compiled but not compile-time run" regexp!

No, you were right the first time. At compile time, the regexp pattern string is compiled into an ordinary function.

Example: the trivial case

bool b = test!("abc")(str);

compiles to something like:

int test_a(char [] str)
{
  return str.length>=3 && str[0..3]=="abc";
}

bool b = test_a(str);

It doesn't actually call the test_a function at compile time.

It's only something like strlen!("abc"), where all of the parameters are known at run time, which is "run" at compile time. In the regexp case, it's the "make a regexp engine" code which is run at compile time. The engine itself is only run at runtime.

> In other words, a profoundly normal function, just that it happens to be written in RegexpLanguage instead of vanilla D (Or C, or asm).

Exactly.
February 23, 2006
Keeping all this at an academic level, what do you have in mind for the syntax of this meta-language?

-Craig