March 23, 2007
Dan wrote:
...
> What should then happen is this:  The compiler should recognize that we
> only accept values such that x > y.  Knowing that, it should then verify
> that in all cases where this function can be called, that the input to the
> function must have this condition.  In cases where this *can* fail, it
> should raise an error - such as variables that are affected by user input
> or outside parameters that aren't already bounds checked or adjusted...
> 
> Theoretically, one could take any given int (between int.min and int.max)
> and adjust it such that x > y will be true for all cases.  This can be
> proven and verified by a proof by induction.
> 
> If you don't isolate the in/out tests from the body, a proof by induction
> couldn't know what needs to be proved to prove the code safe.

I think there are very few cases where this kind of induction could reasonably be done, unless x and y are literals.  I suspect that in most cases where the compiler had enough info to do this kind of induction checking, the code for the function could nearly as easily be "constant folded" and optimized away entirely.  Prove me wrong, but I suspect that this kind of induction is too hard (and in the cases where it works, too slow) to be practical.

If x and y are literals, though, you can do something like this now:

Original source:

int foo(int x, int y, int z)
{
    assert( x > y ); // runtime
...
}

int z = foo(10, 20, 30);

Checking at compile time:

int foo(int x, int y)(int z)
{
    static assert( x > y ); // compile time
...
}

int z = foo!(10, 20)(30);

It's a very simple transformation --- take any compile-time known values and make them template parameters.  In this case, the calling syntax even looks nearly the same!


For most of the practical cases where something is induction testable at compile time, this technique should get what you are suggesting.  You are marking the path of x and y, and the compiler will do the actual induction via constant folding.


If you don't have a constant 10 and 20 for x and y, then propagate the "templatization" of x and y up the call stack (making them template parameters at every level) until you get to the point where x and y are created or computed.

*Note: if you find a place where you can't compute x and y at compile time, induction would also have failed there.

Actually ... it would be interesting to fiddle with a compiler that does this "templatization" for all compile time known parameters, as an optimization --- it would have to be a global optimization of course.


I heard many years ago about an idea that you could take some of the input for a program and use it to optimize the entire program.  For example, you could take a Python program and a Python interpreter, and insert the Python program into the "main" of the interpreter.  The idea was that you could then "optimize" the interpreter + script until together they were an optimized 'C' program equivalent to the script (this is essentially the first Futamura projection).  In theory, this works -- in real life, probably not very well.

Kevin

March 23, 2007
Kevin Bealer wrote:
> Dan wrote:
> ...
>> What should then happen is this:  The compiler should recognize that we
>> only accept values such that x > y.  Knowing that, it should then verify
>> that in all cases where this function can be called, that the input to the
>> function must have this condition.  In cases where this *can* fail, it
>> should raise an error - such as variables that are affected by user input
>> or outside parameters that aren't already bounds checked or adjusted...
>>
>> Theoretically, one could take any given int (between int.min and int.max)
>> and adjust it such that x > y will be true for all cases.  This can be
>> proven and verified by a proof by induction.
>>
>> If you don't isolate the in/out tests from the body, a proof by induction
>> couldn't know what needs to be proved to prove the code safe.
> 
> I think there are very few cases where this kind of induction could reasonably be done, unless x and y are literals.  I suspect that in most cases where the compiler had enough info to do this kind of induction checking, the code for the function could nearly as easily be "constant folded" and optimized away entirely.  Prove me wrong, but I suspect that this kind of induction is too hard (and in the cases where it works, too slow) to be practical.
> 
> If x and y are literals, though, you can do something like this now:
> 
> Original source:
> 
> int foo(int x, int y, int z)
> {
>     assert( x > y ); // runtime
> ...
> }
> 
> int z = foo(10, 20, 30);
> 
> Checking at compile time:
> 
> int foo(int x, int y)(int z)
> {
>     static assert( x > y ); // compile time
> ...
> }
> 
> int z = foo!(10, 20)(30);
> 
> It's a very simple transformation --- take any compile-time known values and make them template parameters.  In this case, the calling syntax even looks nearly the same!

The need to cater for cases where only part of the information is
statically available is being taken seriously. The general scenario
is called "partial evaluation": you call a function with some
statically-known values and some runtime values. The current
compile-time evaluation mechanism can't help with that. So a considered
addition to D is to accommodate static parameters (pushed by yours truly,
using a syntax suggested by Dave Held):

int foo(int x, int y, int z);
int foo(static int x, int y, int z); // x is known during compilation
int foo(int x, static int y, int z); // y is known during compilation
int foo(static int x, static int y, int z); // x and y are... you know

Seeing that, the compiler does a little trick: it turns each
"static" parameter into a template parameter. For example, the second
overload becomes, under the covers:

int foo(int x)(int y, int z)
{
  ... you can use e.g. static if (x > 0) ...
}

Then the foo overloads will resolve without you doing anything special:

final a = to!(int)(readln), b = to!(int)(readln);
foo(a, b, b);  // 1st overload
foo(1, a + b, b); // 2nd
foo(a + b, a - b, b);  // 3rd
foo(42, 7, b); // 4th

Such partial evaluation scenarios are very useful for a host of
functions (writef/readf are the first to come to mind). Consider a
simple example: the function:

uint match_one_of(char[] set, char[] candidate);

Returns the position in candidate up to which candidate's characters can
be found inside set. For example, match_one_of("0123456789", "23a4") returns 2.

We have quite a few different evaluation scenarios:

1. set and candidate aren't known until runtime. Then match_one_of will
have to do whatever it takes, dynamically, to get the task done.

2. set is known statically, and candidate is dynamic. Well, this is a
very interesting case. The function can decide for different strategies
depending on set's size. For small sizes, it just does a series of
comparisons. For moderate sizes, linear search. For large sizes, hash. Cool!

3. candidate is known statically, set is dynamic. Probably this is an
infrequent case. Anyhow, each distinct character in candidate need only
be looked up in set once, again saving time.

4. Both are known statically. The case becomes again uninteresting as
D's constant folding can just invoke (1) during compilation. Done.


Andrei
March 23, 2007
Dan wrote:
> What should then happen is this:  The compiler should recognize that we only accept values such that x > y.  Knowing that, it should then verify that in all cases where this function can be called, that the input to the function must have this condition.  
Perhaps I'm misunderstanding what you say, but it seems like you are saying that the compiler could *always* analyse programs to determine whether the assert will ever be triggered. That is impossible in the general case, since it would amount to solving the halting problem (any exit condition can be reduced to a comparison such as x > y).

> int doubleit(int x, int y)
> in {
>   // note: you say if y is negative, make it positive twice and test <-y?
>   // you then tested the original proposition ALSO!
>   assert(x > y);
> }
> body {
>   return x+x;
> }
> out(result) {
>   assert( result == x*2, format("doubleit(%d): Weird Math Error", x) );
> }
...
> If you don't isolate the in/out tests from the body, a proof by induction couldn't know what needs to be proved to prove the code safe.
I agree with how you would code it here, although I wouldn't expect the compiler to verify all my conditions at compile time. To me, the point of the 'in' contracts is that they only ever trigger on a bug. You hope not to have any of those when you release, so you are spending unnecessary processing time checking things which should always be true anyway.

If there is a bug, then either way you get a crash for the end user; an assert failure isn't really so much more pleasant than a segmentation violation for an end user.

> Walter, I'm not sure, but I think you were hoping to (or did) make the Abstract Syntax Tree accessible?  If so, it may be possible to start a dsource project to accomplish a proof-by-induction engine based on the current implementation of contract programming.
Such a program would surely be very useful, but I'm not convinced it should be done through compile-time programming (automated theorem proving takes a long time: you wouldn't want to run it *every time* you compile, and you also wouldn't want to run it through CTFE, which is slow).

If anyone is interested, though, Sage (http://sage.soe.ucsc.edu/) and Spec# (http://research.microsoft.com/specsharp/) are research languages which attempt to statically verify as much in the contracts as possible.

Cheers,

Reiner
March 23, 2007
Andrei Alexandrescu (See Website For Email) wrote:
> [...]
> int foo(int x, int y, int z);
> int foo(static int x, int y, int z); // x is known during compilation
> int foo(int x, static int y, int z); // y is known during compilation
> int foo(static int x, static int y, int z); // x and y are... you know
> [...]
> final a = to!(int)(readln), b = to!(int)(readln);
> foo(a, b, b);  // firsts overload
> foo(1, a + b, b); // 2nd
> foo(a + b, a - b, b);  // 3rd
> foo(42, 7, b); // 4th
> [...]

Great sir, could you impart upon us mere mortals the wisdom by which you are able to ascertain, at compile time, the difference between two values known only at runtime (a la the "3rd" overload)?  We are in great need of such Oracle-like behavior.

Dave
March 23, 2007
David B. Held wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
>> [...]
>> int foo(int x, int y, int z);
>> int foo(static int x, int y, int z); // x is known during compilation
>> int foo(int x, static int y, int z); // y is known during compilation
>> int foo(static int x, static int y, int z); // x and y are... you know
>> [...]
>> final a = to!(int)(readln), b = to!(int)(readln);
>> foo(a, b, b);  // firsts overload
>> foo(1, a + b, b); // 2nd
>> foo(a + b, a - b, b);  // 3rd
>> foo(42, 7, b); // 4th
>> [...]
> 
> Great sir, could you impart upon us mere mortals the wisdom by which you are able to ascertain, at compile time, the difference between two values known only at runtime (a la the "3rd" overload)?  We are in great need of such Oracle-like behavior.

Errare humanum est, perseverare diabolicum :o).

Andrei
March 23, 2007
David B. Held wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
>> [...]
>> int foo(int x, int y, int z);
>> int foo(static int x, int y, int z); // x is known during compilation
>> int foo(int x, static int y, int z); // y is known during compilation
>> int foo(static int x, static int y, int z); // x and y are... you know
>> [...]
>> final a = to!(int)(readln), b = to!(int)(readln);
>> foo(a, b, b);  // firsts overload
>> foo(1, a + b, b); // 2nd
>> foo(a + b, a - b, b);  // 3rd
>> foo(42, 7, b); // 4th
>> [...]
> 
> Great sir, could you impart upon us mere mortals the wisdom by which you are able to ascertain, at compile time, the difference between two values known only at runtime (a la the "3rd" overload)?  We are in great need of such Oracle-like behavior.

I also notice you're still posting from the future. You must make good money daytrading. Future postulae commotus est. :o)

Andrei
March 23, 2007
Andrei Alexandrescu (See Website For Email) wrote:
> Kevin Bealer wrote:
>> Dan wrote:
>> ...
>>> What should then happen is this:  The compiler should recognize that we
>>> only accept values such that x > y.  Knowing that, it should then verify
>>> that in all cases where this function can be called, that the input to the
>>> function must have this condition.  In cases where this *can* fail, it
>>> should raise an error - such as variables that are affected by user input
>>> or outside parameters that aren't already bounds checked or adjusted...
>>>
>>> Theoretically, one could take any given int (between int.min and int.max)
>>> and adjust it such that x > y will be true for all cases.  This can be
>>> proven and verified by a proof by induction.
>>>
>>> If you don't isolate the in/out tests from the body, a proof by induction
>>> couldn't know what needs to be proved to prove the code safe.
>>
>> I think there are very few cases where this kind of induction could reasonably be done, unless x and y are literals.  I suspect that in most cases where the compiler had enough info to do this kind of induction checking, the code for the function could nearly as easily be "constant folded" and optimized away entirely.  Prove me wrong, but I suspect that this kind of induction is too hard (and in the cases where it works, too slow) to be practical.
>>
>> If x and y are literals, though, you can do something like this now:
>>
>> Original source:
>>
>> int foo(int x, int y, int z)
>> {
>>     assert( x > y ); // runtime
>> ...
>> }
>>
>> int z = foo(10, 20, 30);
>>
>> Checking at compile time:
>>
>> int foo(int x, int y)(int z)
>> {
>>     static assert( x > y ); // compile time
>> ...
>> }
>>
>> int z = foo!(10, 20)(30);
>>
>> It's a very simple transformation --- take any compile-time known values and make them template parameters.  In this case, the calling syntax even looks nearly the same!
> 
> The need to cater for cases with partial statically-available
> information is seriously taken into consideration. The general scenario
> is called "partial evaluation": you call a function with a few
> statically-known values and a few runtime values. The current
> compile-time evaluation mechanism can't help with that. So a considered
> addition to D is to accommodate static parameters (pushed by yours truly
> and using a syntax suggested by Dave Held):
> 
> int foo(int x, int y, int z);
> int foo(static int x, int y, int z); // x is known during compilation
> int foo(int x, static int y, int z); // y is known during compilation
> int foo(static int x, static int y, int z); // x and y are... you know
> 
> Seeing that, the compiler does a little trick - it transforms each
> "static" parameter into a template. For example, the second overload
> becomes under the covers:
Is this in contrast to the

    int foo(const int x)(const int x, int y, int z);

syntax you mentioned earlier? If so, then I am very grateful that you have switched to this much cleaner syntax.

However, while I don't dispute that this is useful, aren't there also times when you want the compiler to do that implicitly? (Maybe the compiler already does, or there are plans to do so in the future, in which case you can ignore the below. If the compiler already does, though, I think this should be documented)


Often, the compiler can optimize the function just through const-folding, without needing the coder to write all the overloads. However, const-folding across function calls doesn't seem to be done in DMD unless the function is inlined.

Consider the example everyone gives with partial eval:

double pow(long exponent, double base)
{
    if (exponent == 0) return 1; // base case: without it the recursion never terminates
    if (exponent < 0) return 1 / pow(-exponent, base); // negative exponent means reciprocal
    if (exponent%2 == 0)
    {
        auto val = pow(exponent/2, base);
        return val * val;
    }
    else
    {
        return base * pow(exponent-1, base);
    }
}

Suppose this is too big to be inlined by DMD. Do you then have to write an identical overload for a statically-known exponent?

double pow(static long exponent, double base)
{
    ... // A duplicate of the above code
}

I suppose you could allow templating by staticness, but even that is still wrong. If the compiler sees a function call with *some* statically-known parameters, shouldn't it automatically create a template and see how much it can const-fold? If nothing folds, it could scrap the template and go back to the old version.

Having to actually write the overload seems like an optimization as stupid as having to write 'inline' or 'register'.

Cheers,

Reiner
March 23, 2007
Reiner Pope wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
[snip]
> Is this in contrast to the
> 
>     int foo(const int x)(const int x, int y, int z);
> 
> syntax you mentioned earlier? If so, then I am very grateful you now support the much cleaner syntax you now do.

Yes. The cleaned-up syntax was Dave's idea.

> However, while I don't dispute that this is useful, aren't there also times when you want the compiler to do that implicitly? (Maybe the compiler already does, or there are plans to do so in the future, in which case you can ignore the below. If the compiler already does, though, I think this should be documented)

I don't know. Right now I don't think so. The whole point of overloading is to write "intelligent" custom code that implements the given semantics in various ways. I don't think the compiler can generally do that with today's technology.

> Often, the compiler can optimize the function just through const-folding, without needing the coder to write all the overloads. However, const-folding across function calls doesn't seem to be done in DMD unless the function is inlined.

And I think that's good. You don't want to send the compiler on a wild goose chase so that it runs 100 times slower.

> Consider the example everyone gives with partial eval:
> 
> double pow(long exponent, double base)
> {
>     if (exponent == 0) return 1; // base case: without it the recursion never terminates
>     if (exponent < 0) return 1 / pow(-exponent, base); // negative exponent means reciprocal
>     if (exponent%2 == 0)
>     {
>         auto val = pow(exponent/2, base);
>         return val * val;
>     }
>     else
>     {
>         return base * pow(exponent-1, base);
>     }
> }
> 
> Suppose this is too big to be inlined by DMD. Do you then have to write an identical overload for a statically-known exponent?
> 
> double pow(static long exponent, double base)
> {
>     ... // A duplicate of the above code
> }

I'm afraid so. The compiler can't read your mind and figure that it would be good to fold. If such code is frequent enough, probably we'll provide a way to avoid source duplication. (Btw you got the parameter order backwards.)

> I suppose you could allow templating by staticness, but even that is still wrong. If the compiler sees a function call with *some* statically-known parameters, shouldn't it automatically create a template and see how much it can const-fold. If nothing, scrap the template and go back to the old version.
> 
> Having to actually write the overload seems like an optimization as stupid as having to write 'inline' or 'register'.

That's for the future. One step at a time. The technology is not there yet, and compile times still matter. Don't forget that in 1970 "register" and in 1980 "inline" were ideas of much practical value.


Andrei
March 23, 2007
Andrei Alexandrescu (See Website For Email) wrote:
> David B. Held wrote:
>> Andrei Alexandrescu (See Website For Email) wrote:
>>> [...]
>>> int foo(int x, int y, int z);
>>> int foo(static int x, int y, int z); // x is known during compilation
>>> int foo(int x, static int y, int z); // y is known during compilation
>>> int foo(static int x, static int y, int z); // x and y are... you know
>>> [...]
>>> final a = to!(int)(readln), b = to!(int)(readln);
>>> foo(a, b, b);  // firsts overload
>>> foo(1, a + b, b); // 2nd
>>> foo(a + b, a - b, b);  // 3rd
>>> foo(42, 7, b); // 4th
>>> [...]
>>
>> Great sir, could you impart upon us mere mortals the wisdom by which you are able to ascertain, at compile time, the difference between two values known only at runtime (a la the "3rd" overload)?  We are in great need of such Oracle-like behavior.
> 
> I also notice you're still posting from the future. You must make good money daytrading. Future postulae commotus est. :o)

Hmm...and I'm equally sure you're posting from the past.  My watch currently says 11:45 PM, my last post shows 10:48 PM (which I think is accurate), and your follow-up shows 10:02 PM (which was just a few impossibly short moments ago, by my reckoning).  The only way I can explain this discrepancy is that we have an extreme relative velocity (perhaps on the order of 0.99999998c?).  The other possibility is that your OS is not patched for DST.

Dave
March 23, 2007

David B. Held wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
>> David B. Held wrote:
>>> Andrei Alexandrescu (See Website For Email) wrote:
>>>> [...]
>>>> int foo(int x, int y, int z);
>>>> int foo(static int x, int y, int z); // x is known during compilation
>>>> int foo(int x, static int y, int z); // y is known during compilation
>>>> int foo(static int x, static int y, int z); // x and y are... you know
>>>> [...]
>>>> final a = to!(int)(readln), b = to!(int)(readln);
>>>> foo(a, b, b);  // firsts overload
>>>> foo(1, a + b, b); // 2nd
>>>> foo(a + b, a - b, b);  // 3rd
>>>> foo(42, 7, b); // 4th
>>>> [...]
>>>
>>> Great sir, could you impart upon us mere mortals the wisdom by which you are able to ascertain, at compile time, the difference between two values known only at runtime (a la the "3rd" overload)?  We are in great need of such Oracle-like behavior.
>>
>> I also notice you're still posting from the future. You must make good money daytrading. Future postulae commotus est. :o)
> 
> Hmm...and I'm equally sure you're posting from the past.  My watch currently says 11:45 PM, my last post shows 10:48 PM (which I think is accurate), and your follow-up shows 10:02 PM (which was just a few impossibly short moments ago, by my reckoning).  The only way I can explain this discrepancy is that we have an extreme relative velocity (perhaps on the order of 0.99999998c?).  The other possibility is that your OS is not patched for DST.
> 
> Dave

Andrei posts at 4:32 PM, you post at 5:48 PM, Andrei replies at 5:02 PM, and then you reply to him at 6:46 PM.

It's currently 6:04 PM :)

	-- Daniel

-- 
int getRandomNumber()
{
    return 4; // chosen by fair dice roll.
              // guaranteed to be random.
}

http://xkcd.com/

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP  http://hackerkey.com/