Algorithms, term rewriting and compile time reflection
October 22, 2014
Generic programming without high-level analytical capabilities is problematic, since you cannot optimize everything at a low level.

For instance, some algorithms run better if the input is sorted, and although you can cover one property with typing (e.g. a sorted range), the combinatorial explosion is too big and the code becomes very messy.

The better approach is to let the compiler deduce properties of the input/output based on stronger assertions than postconditions. And it should work for FFI as well as for D. That way D can obtain meta-information about C functions.

Let's call it input-output-assumptions and take a look at:

y = f(x)

After calling f on x, f could provide propositions about y such as:

- y is null iff x is null
- length(y) == length(x) if x is not null
- y is sorted for all x if x is not null
- y contains the same elements as x
- y is alias free (copy by value)

It should also state whether the semantic side effects have been completely described.
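
A minimal sketch of how such propositions could be spelled with today's user-defined attributes. Every tag name here is hypothetical and nothing in the compiler acts on them; the point is only the shape of the vocabulary. Note that the same annotations could sit on an extern(C) declaration:

import std.algorithm.sorting : sort;
import std.traits : hasUDA;

// Hypothetical proposition tags; no compiler support exists for these.
struct nullIffInputNull {}
struct preservesLength {}
struct returnsSorted {}
struct sameElements {}
struct aliasFree {}

@nullIffInputNull @preservesLength @returnsSorted @sameElements @aliasFree
int[] f(int[] x)
{
    auto y = x.dup; // copy by value: y is alias-free
    sort(y);        // same elements, now sorted
    return y;
}

// The same vocabulary could give D meta-information about a C function
// (c_sorted_copy is a made-up name for illustration):
@returnsSorted extern (C) int* c_sorted_copy(const(int)* p, size_t n);

// Tools (or a rewriting pass) can already query the tags by reflection:
static assert(hasUDA!(f, returnsSorted));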

Then you need statements for a function that say something about how it affects the properties of its input. E.g.:

z = g( f(x) )

For g we could say that:
- it preserves element order
- it preserves element values
- z is an alias-free copy of the input

Thus we cannot assume length(z) == length(x), but we can assume that z is sorted and that z's elements are a subset of x's. That means the compiler can optimize expressions. E.g.:

y = f(x);
z = g(y);

inline_sort(z);  ==>  ;                        // z is already sorted
x = union(z, y);  ==>  x = y;                  // z's elements are a subset of y's
canFind(y, v) || canFind(z, v)  ==>  canFind(y, v);
linear_find(x, v)  ==>  binary_search(y, v);   // y is sorted and holds x's elements
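
As an aside, Phobos already hardcodes one instance of this idea in its vocabulary: std.range.SortedRange carries sortedness in the type, and its contains() does a binary search. This is the real API, nothing hypothetical:

import std.algorithm.sorting : sort;
import std.range : assumeSorted;

unittest
{
    int[] x = [3, 1, 4, 1, 5];
    auto y = x.sort();        // SortedRange!(int[]): sortedness lives in the type
    assert(y.contains(4));    // binary search, justified by the type

    // assumeSorted is exactly a vetted, unchecked assumption:
    auto z = assumeSorted(x); // x was sorted in place above
    assert(z.contains(5));
}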

With term rewriting and compile time reflection we can now perform optimizations by querying properties of expressions, looking up dependent results of an expression, and doing a search over various ways of ordering expressions.

f(sort(g(x))) can be optimized into f(g(x)) if the knowledge of the effects of sort() is complete, and if the total effects of f, g and x completely cover what sort() does.
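
As a sketch of that rewrite done by hand in today's D (ensureSorted is a hypothetical helper, not a Phobos function):

import std.algorithm.sorting : sort;
import std.range : SortedRange;

// Sort only when the type does not already prove sortedness;
// for a SortedRange the "sort" rewrites away to nothing.
auto ensureSorted(R)(R r)
{
    static if (is(R : SortedRange!Args, Args...))
        return r;            // knowledge complete: the sort disappears
    else
        return sort(r);
}

unittest
{
    auto s = sort([3, 1, 2]);         // already a SortedRange
    auto t = ensureSorted(s);         // no re-sort
    auto u = ensureSorted([9, 8, 7]); // falls back to sorting
}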

These kinds of optimizations are very difficult to achieve in a low level backend, but you really need them in order to do generic programming properly.

A simple start would be to not provide term rewriting as a language feature, but rather define a vocabulary that is useful for Phobos and hardcode term rewriting for that vocabulary.

I think this is feasible.
October 22, 2014
On Wednesday, 22 October 2014 at 11:10:35 UTC, Ola Fosheim Grøstad wrote:
> [snip]
>
> These kinds of optimizations are very difficult to achieve in a low level backend, but you really need them in order to do generic programming properly.
>
> A simple start would be to not provide term rewriting as a language feature, but rather define a vocabulary that is useful for Phobos and hardcode term rewriting for that vocabulary.
>
> I think this is feasible.

Term rewriting is very interesting, and I believe some work has been done on this in Haskell. I don't believe anything has been done with the propositions-and-inference approach you describe. I see a number of problems:

First, annotating this, in the presence of templates, is very hard. Consider:

auto f(alias g, T)(T x) { return g(x); }

We cannot possibly annotate this function with any of the propositions you described, because we know nothing about g or T. Like purity and nothrow, we'd have to deduce these properties, but most would escape deduction in all but the most trivial cases.
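
For reference, D already does this kind of instantiation-time inference for the built-in attributes, so the instance can end up pure/nothrow even though the template declaration cannot be annotated up front. A minimal example:

auto f(alias g, T)(T x) { return g(x); }

int twice(int x) pure nothrow @safe { return 2 * x; }

void caller() pure nothrow @safe
{
    auto y = f!twice(21); // this instance is inferred pure nothrow @safe
    assert(y == 42);
}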

Suppose we could deduce a large subset of useful propositions: how does the programmer know what has been deduced? How can I tell what has been deduced and applied without having to disassemble the code to see what's actually going on?

And even if everything is deduced correctly, and I know what's deduced, what if it does a transformation that's undesirable? For example, you changed linear_find to binary_search. What if I knew the element was likely to be at the front and would prefer linear_find to binary_search?

If you have any, I'd love to see some papers on this kind of work.
October 22, 2014
On Wednesday, 22 October 2014 at 21:34:51 UTC, Peter Alexander wrote:
> Term rewriting is very interesting, and I believe some work has been done on this in Haskell. I don't believe anything has been done with the propositions-and-inference approach you describe.

I would always assume plenty of work has been done on proving properties about programs, since this is the holy grail of CS!

I think it is related to so-called dependent types?
http://en.wikipedia.org/wiki/Dependent_type

I'd like to take a look at Xanadu some time…
http://www.cs.bu.edu/~hwxi/Xanadu/Xanadu.html

And Agda, which also guarantees termination!
http://en.wikipedia.org/wiki/Agda_(programming_language)


> I see a number of problems:
>
> First, annotating this, in the presence of templates, is very hard. Consider:
>
> auto f(alias g, T)(T x) { return g(x); }
>
> We cannot possibly annotate this function with any of the propositions you described, because we know nothing about g or T.

Why not? You know the moment you instantiate.

If you had pattern matching you could break the signature into multiple distinguishing ones, but the equivalent is to put conditionals into the postcondition.
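
A minimal sketch of such conditionals resolved at instantiation time, assuming the hypothetical @returnsSorted tag from above, queried with std.traits.hasUDA; the instance forwards the proposition only when g provides it:

import std.traits : hasUDA;

struct returnsSorted {} // hypothetical proposition tag

@returnsSorted int[] sortedSource(int[] x) { return x; }

template f(alias g)
{
    static if (hasUDA!(g, returnsSorted))
        @returnsSorted auto f(T)(T x) { return g(x); }
    else
        auto f(T)(T x) { return g(x); }
}

unittest
{
    auto y = f!sortedSource([1, 2, 3]); // this instance carries @returnsSorted
}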

> Like purity and nothrow, we'd have to deduce these properties, but most would escape deduction in all but the most trivial cases.

I think you can solve this with logic/constraint programming and simple dataflow analysis.

// a is @notnull and @aliasfree
b = sort(a)
// b is @notnull, @aliasfree, @sorted_ascending(somekey)

// assume f is @sortpreserving, @pure
c = f(b)
// c is @notnull, @aliasfree, @sorted_ascending(somekey)

// assume reverse is @sortpreserving, @pure
d = reverse(b)
// d is @notnull, @aliasfree, @sorted_descending(somekey)

e = unknownfunction(b)
// e is @typeinfobased

> Suppose we could deduce a large subset of useful propositions: how does the programmer know what has been deduced?

He doesn't have to. The point is to express common patterns in the most terse and readable way. Developers analyze this and come up with the appropriate heuristics. Just like with peephole optimization?

> How can I tell what has been deduced and applied without having to disassemble the code to see what's actually going on?

If you are that performance-oriented then you will not use templates at all! Generic programming is not suitable for performance, because performance comes from changing the data to fit the hardware, e.g. the SIMD instruction set, caches, etc.

> And even if everything is deduced correctly, and I know what's deduced, what if it does a transformation that's undesirable? For example, you changed linear_find to binary_search. What if I knew the element was likely to be at the front and would prefer linear_find to binary_search?

That's a good point, but then you should be able to guide the search by adding an optimization hint that states that the sought element is probably at the front, or that constrains the search to a linear_find.

But that does not have to be part of the semantics of the program. So you can keep correctness and optimization separate.
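
A tiny sketch of what such a hint could look like; the attribute name is hypothetical and nothing consumes it, the point being that stripping it can never change what the program computes:

import std.algorithm.searching : canFind;

struct hintLinear {} // hypothetical: "keep this a linear search"

@hintLinear bool frontHeavyFind(int[] xs, int v)
{
    return canFind(xs, v); // a rewriter would be told not to touch this call
}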

IMO it is always an advantage to keep the code that has to do with correctness short, simple and readable.
October 22, 2014
On Wednesday, 22 October 2014 at 23:09:17 UTC, Ola Fosheim Grøstad wrote:
> I think it is related to so-called dependent types?
> http://en.wikipedia.org/wiki/Dependent_type

And:

http://en.wikipedia.org/wiki/Refinement_(computing)#Refinement_types

And perhaps:

http://en.wikipedia.org/wiki/Liskov_substitution_principle
October 22, 2014
On a related note, semantic slicing is an interesting concept:

http://en.wikipedia.org/wiki/FermaT_Transformation_System
October 23, 2014
On Wednesday, 22 October 2014 at 23:09:17 UTC, Ola Fosheim Grøstad wrote:

> I would always assume plenty of work has been done on proving properties about programs, since this is the holy grail of CS!
>
> I think it is related to so-called dependent types?
> http://en.wikipedia.org/wiki/Dependent_type

Yes, dependent types allow expressing properties like the ones you describe. However
a) it's not easy at all even for simple data structures, often requiring defining many additional types and lemmas,
b) checking them requires turning your compiler into a proof-checker,
c) what works in a "clean room" (like a high-level total functional language) is hardly feasible in a "dirty" language like D, where you can go as unsafe as you want.

> I'd like to take a look at Xanadu some time…
> http://www.cs.bu.edu/~hwxi/Xanadu/Xanadu.html

That's obsolete, I guess; better to take a look at the ATS language from the same author. It's really, really close to what you're thinking of here, minus the rewriting.

> And Agda, which also guarantees termination!
> http://en.wikipedia.org/wiki/Agda_(programming_language)

Yep, all dependently typed languages like Agda, ATS or Idris require the functions you use in types to be terminating (otherwise your proofs become unsound and worthless) and they all include termination checking.

To scare you well, here, for example, is my Smoothsort implementation in ATS
http://stuff.thedeemon.com/lj/smooth_dats.html
that includes proofs that the array really gets sorted and that the Leonardo heaps used in the process have the proper form and properties. Writing it took me a few weeks. You don't want to turn D into this mess. ;)
October 23, 2014
On Thursday, 23 October 2014 at 09:41:04 UTC, thedeemon wrote:
> Yes, dependent types allow expressing properties like the ones you describe. However
> a) it's not easy at all even for simple data structures, often requiring defining many additional types and lemmas,

Which is why I only suggest language-built-in propositions at the library level.

> b) checking them requires turning your compiler into a proof-checker,

I don't propose checking, as in an assert(). I propose to assume() them and then run regular dataflow analysis over the assumptions that have been vetted by the library author.

Pretty close to what Walter wanted in the "assert as assume" thread, but safer.
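
The nearest existing mechanism is a checked postcondition: an out contract runs an assert at run time (in non-release builds). The proposal differs in that the vetted facts would be assumed and fed to dataflow analysis instead of executed. For contrast, the checked version in today's D:

import std.algorithm.sorting : isSorted, sort;

int[] sortedCopy(int[] a)
out (r) { assert(r.isSorted); } // checked, not assumed
do
{
    auto r = a.dup;
    sort(r);
    return r;
}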

> c) what works in a "clean room" (like a high-level total functional language) is hardly feasible in a "dirty" language like D, where you can go as unsafe as you want.

That is true, which is why you have to do this at the high level, and assume the worst.

> To scare you well, here, for example, is my Smoothsort implementation in ATS
> http://stuff.thedeemon.com/lj/smooth_dats.html

Thanks! I'll look at it later, but note that I am not proposing something like Coq, ATS or Agda. No theorem prover.

What I am proposing is that libraries can provide assumed facts about the result, and then those facts are propagated through the dataflow until they are invalidated.

> properties. Writing it took me a few weeks. You don't want to turn D into this mess. ;)

No, I want to turn it into a more pragmatic mess. ;-)
October 23, 2014
How about a function returning a T', which is implicitly convertible to T, where T' has some enum "tags" attached to it?
October 23, 2014
On Thursday, 23 October 2014 at 13:57:03 UTC, Low Functioning wrote:
> How about a function returning a T', which is implicitly convertible to T, where T' has some enum "tags" attached to it?

Why is implicit conversion a problem? To the compiler it would just be another function call?
October 23, 2014
On Thursday, 23 October 2014 at 15:18:09 UTC, Ola Fosheim Grøstad wrote:
> On Thursday, 23 October 2014 at 13:57:03 UTC, Low Functioning wrote:
>> How about a function returning a T', which is implicitly convertible to T, where T' has some enum "tags" attached to it?
>
> Why is implicit conversion a problem? To the compiler it would just be another function call?

Not everything is generic, for one reason or another.

// The tagged type without implicit conversion: the tag is kept, but the
// value no longer converts back to the base type.
struct notimplicit(T) {
	T _x;
	enum fubared;   // opaque enum used as a compile-time "tag"
}

// The tagged type with implicit conversion to T via alias this.
struct foo(T) {
	T _x;
	alias _x this;  // makes foo!T implicitly convertible to T
	enum fubared;   // the tag, detectable via __traits(hasMember, ...)
}

unittest {
	notimplicit!int a;
	//int _a = a; //error: no implicit conversion

	foo!int b;
	int _b = b;     // OK, but the tag is lost on _b
}

While it wouldn't matter for a fully generic pipeline, and you'd lose the fubared tag if you turned it back to the base type, it might be handy to propagate the fubared type while remaining compatible with the base.