Thread overview
Thoughts about "Compile-time types" talk
May 10, 2019
H. S. Teoh
May 10, 2019
JS
May 16, 2019
Alex
May 13, 2019
Luís Marques
May 09, 2019
Skimmed over Luís Marques' slides on "compile-time types", and felt compelled to say the following in response:  I think we're still approaching things from the wrong level of abstraction.  This whole divide between compile-time and runtime is IMO an artificial one, and one that we must transcend in order to break new ground.

I haven't fully thought through this yet, but the basic concept is that there should be *no need* to distinguish between compile-time and runtime in the general case, barring a small number of situations where a decision has to be made.  Ideally, a function should have just a single parameter list -- which Luís has already hit upon -- but I'd go even further and propose that there should be NO distinction made between runtime and compile-time parameters, at least at the declaration level.  It should be the *caller* that decides whether an argument is compile-time or runtime. And that decision in the caller does not have to be made within the caller itself, but could be dictated by *its* caller, and so on. As far as *most* code is concerned, you're just passing arguments from one function to another -- whether it's CT or RT is immaterial. Only at the very top level do you need to make a decision whether something is compile-time or runtime.

The result should be that you can make a single top-level change and a whole cascade of function arguments down the call chain become instantiated at compile-time, or resp. runtime.  Most of the code in between *does not have to change* at all; the compiler *infers* whether something is available at CT or has to be deferred to RT, based on the caller's arguments.  Some things, of course, like alias parameters and what-not will force CT (though in theory one could implement RT counterparts for them), but that should be *transparent* to the surrounding code.

Just like in OO, changing a class's implementation ought not to require changing every piece of code that uses the class (encapsulation and Liskov substitution principle), so the distinction between CT and RT should be transparent to *most* code.  Only code that actually cares about the difference should need to choose between one or the other. The rest of the code should remain agnostic, and thus remain symmetric under any decision to switch between RT/CT.

For example, we could have first class types in the language: a function can take a type argument and do stuff to it, and depending on whether this type is known at CT, the compiler could resolve it at compile-time or implement it at runtime using the equivalent of typeid() -- and the function *doesn't have to care which*.  You could sort a list of types, use range algorithms on them, etc., and if you call it at compile-time, it will produce a compile-time result; if you call it at runtime, it will produce a runtime result.  Which one it will be doesn't have to be a decision that's arbitrarily decided within the function itself; it can be delegated to higher-level code that has the wider context with which to make the best decision.

Another example, you could have general matrix multiplication code with matrix size as a parameter, and at higher-level decide you only need, say, 4D matrices, so the size can be made a CT argument for one project, allowing the optimizer to generate the best code for the 4D-specific case, but an RT argument for another project that wants to operate on general n*n matrices, without needing to implement matrix multiplication twice. The latter would have the size as an RT parameter, so you trade off optimization for flexibility -- *at the decision of higher-level code*, rather than arbitrarily tossing a coin or guessing what your callers might want.


T

-- 
MASM = Mana Ada Sistem, Man!
May 10, 2019
On Friday, 10 May 2019 at 00:33:04 UTC, H. S. Teoh wrote:
> Skimmed over Luís Marques' slides on "compile-time types", and felt compelled to say the following in response:  I think we're still approaching things from the wrong level of abstraction.  This whole divide between compile-time and runtime is IMO an artificial one, and one that we must transcend in order to break new ground.
>
> I haven't fully thought through this yet, but the basic concept is that there should be *no need* to distinguish between compile-time and runtime in the general case, barring a small number of situations where a decision has to be made.  Ideally, a function should have just a single parameter list -- which Luís has already hit upon -- but I'd go even further and propose that there should be NO distinction made between runtime and compile-time parameters, at least at the declaration level.  It should be the *caller* that decides whether an argument is compile-time or runtime. And that decision in the caller does not have to be made within the caller itself, but could be dictated by *its* caller, and so on. As far as *most* code is concerned, you're just passing arguments from one function to another -- whether it's CT or RT is immaterial. Only at the very top level do you need to make a decision whether something is compile-time or runtime.
>
> The result should be that you can make a single top-level change and a whole cascade of function arguments down the call chain become instantiated at compile-time, or resp. runtime.  Most of the code in between *does not have to change* at all; the compiler *infers* whether something is available at CT or has to be deferred to RT, based on the caller's arguments.  Some things, of course, like alias parameters and what-not will force CT (though in theory one could implement RT counterparts for them), but that should be *transparent* to the surrounding code.
>
> Just like in OO, changing a class's implementation ought not to require changing every piece of code that uses the class (encapsulation and Liskov substitution principle), so the distinction between CT and RT should be transparent to *most* code.  Only code that actually cares about the difference should need to choose between one or the other. The rest of the code should remain agnostic, and thus remain symmetric under any decision to switch between RT/CT.
>
> For example, we could have first class types in the language: a function can take a type argument and do stuff to it, and depending on whether this type is known at CT, the compiler could resolve it at compile-time or implement it at runtime using the equivalent of typeid() -- and the function *doesn't have to care which*.  You could sort a list of types, use range algorithms on them, etc., and if you call it at compile-time, it will produce a compile-time result; if you call it at runtime, it will produce a runtime result.  Which one it will be doesn't have to be a decision that's arbitrarily decided within the function itself; it can be delegated to higher-level code that has the wider context with which to make the best decision.
>
> Another example, you could have general matrix multiplication code with matrix size as a parameter, and at higher-level decide you only need, say, 4D matrices, so the size can be made a CT argument for one project, allowing the optimizer to generate the best code for the 4D-specific case, but an RT argument for another project that wants to operate on general n*n matrices, without needing to implement matrix multiplication twice. The latter would have the size as an RT parameter, so you trade off optimization for flexibility -- *at the decision of higher-level code*, rather than arbitrarily tossing a coin or guessing what your callers might want.
>
>
> T

I have actually come up with this idea several months before. You are basically spot on. The old paradigm of programming needs to change. My idea is a big more elaborate though:

The type system is hierarchical and recursive. Essentially all programming is based on levels which is essentially a parameterized type system on the level.

One works at various levels and uses the same syntax at all levels. The compiler the simply tries to "collapse" all the levels to the lowest known level. The lowest level is runtime which uses unknown inputs from the user and so cannot be collapsed any more.

I've been trying to think about how to represent this well in programming but haven't worked on it much.

What I'm saying may not make much sense but think about it like D's CT and RT... they are just two levels. But we can have higher levels that compile code in to CT code for a higher level of abstraction. All higher levels of abstraction must be known so the code can be reduced to the lowest level, which is RT. RT code would be anything that is not known at compile time and hence the program must be run to determine the information. Essentially RT is just the file compilation step.


Such code though is unified at all levels. Same semantics, same syntax, same type system, same everything.

Essentially every statement is a statement that exists at some lowest level and that level depends on what the statement "see's" and what "see's" it.

My initial idea was to have levels specified by the user as what level of abstraction they are working in. The compiler then reduces the highest levels of abstraction filling out the inputs for the level right below it... which then should allow it to be reduced and everything cascades down to the final level of RT, which uses things that cannot be reduced. The compiler realizes it can only be reduced by "executing" the program. The program itself sort of is the compiler then, filling out the rest of the information to finally "compile" the entire program.


I've been trying to come up with a universal syntax but haven't really thought about it much as it's a long term project. I'm trying to get away from C's syntax but my initial conceptual idea was to attach levels to statements or blocks. One then just works in the level they want and to build code for lower levels.

e.g.,

int x[4] = 34;

is an int at level 4. It is known to all lower levels of code who will, if the value didn't change at any higher level, be 34 to all the lower levels.

e.g.,

int x[3] = x[4]*3; // compiler can reduce this to a constant in level 3 after level 4 has been reduced


The main idea, which we are both hitting on here, is that information that is known is known and information that isn't requires "running" the program. But really these are, in some sense, the same. The runtime information is known, but only by the user... as if the user "completes" the program.

e.g.,

readln()

The reason the statement above cannot reduce is because it is not known at "compile time"... but that is relatively arbitrary if one thinks of runtime as just the "completion of compile time".

If one abstracts out that the entire process, including the user input is "compilation" and it's all just reducing to a known state(which for user input is delayed).


The idea with the levels above is all the code syntax is the same at every level. In D, RT and CT are not unified. CTFE bridges it but static if and if are not the same syntax. They distinguish between the two levels.

To get all of it to work may be complicated. I'm only in the initial investigations of it and I don't plan on putting a lot of time in to it. I'm just recording my ideas as I come up with them. In any case all I have worked on is creating the TYPE. The TYPE is simply any type... and everything is a type. It's so general as to include anything(functions, statements, other types, etc). It's what unifies everything. The compiler then just attempts to reduce any types to "known" values. At the end of "compilation" anything that is not known is either an error or runtime(the type cannot be reduced without user input).

Typing of course is categorical and one then has to create abstract subtyping but it unifies the type system, sorta like object does with oop.

I believe though that not just this level of abstraction is required but a new IDE that supports the level of abstract along with a higher level of text/graphical entry.




May 13, 2019
On Friday, 10 May 2019 at 00:33:04 UTC, H. S. Teoh wrote:
[...]

> I haven't fully thought through this yet, but the basic concept is that there should be *no need* to distinguish between compile-time and runtime in the general case, barring a small number of situations where a decision has to be made.
[...]
I was thinking and working in the same direction,
when using regEx you can use the runtime (RT) or the compile time (CT) version,
but why not let the compiler make the decisions?

The solution was from someone else, in the forum, but it resulted in the following code:

auto reg(alias var)()
{
       static if (__traits(compiles, {enum ctfeFmt = var;}) )
         {
                // "Promotion" to compile time value
                enum ctfeReg =  var ;
                pragma(msg, "ctRegex used");
                return(ctRegex!ctfeReg);

          }else{
                return(regex(var));
                pragma(msg,"regex used");
          }
}

The trick is the alias var, which then can be checked,
with
      __traits(compiles, {enum ctfeFmt = var;}

is it an immutable value present at compiletime or not.


Now you may use

auto myregex = reg!(Regex_expression);

At every place in your code.

And depending on the question, does it compile to the ctfeReg or not the version is selected.

So there is away to promote a ctVariable to runtime, if this would be possible with a runtime value, too, you would get rid of the '!' -syntax.

Regards mt.


May 13, 2019
On Friday, 10 May 2019 at 00:33:04 UTC, H. S. Teoh wrote:
> Skimmed over Luís Marques' slides on "compile-time types", and felt compelled to say the following in response:  I think we're still approaching things from the wrong level of abstraction.  This whole divide between compile-time and runtime is IMO an artificial one, and one that we must transcend in order to break new ground.

Thanks for the feedback. I think your counter-proposal makes some sense. Whether it is the right way to attack the problem I don't know. My presentation was fairly high-level, but as part of my prototyping efforts I've gone over a lot of the details, the problems they would create, how they could be solved and so on. When I read your counter-proposal my knee jerk reaction was that it would address some deficiencies with my approach but also introduce other difficult practical problems. I'll think more about it, but in the end I believe the only way to adequately gauge the pros and cons of each approach is to actually experiment with them. That's something I'll be doing, albeit at a fairly slow pace.

BTW, thank you also for the Wiki article, I think it's a little gem.
May 15, 2019
On 5/9/19 8:33 PM, H. S. Teoh wrote:
> [...]

+1, pretty much the same conclusion I'd came to when mulling it over before.

I haven't looked at the talk or slides yet though, and I'm def. interested in seeing that take on it, too.
May 16, 2019
On Monday, 13 May 2019 at 08:35:39 UTC, Martin Tschierschke wrote:
> On Friday, 10 May 2019 at 00:33:04 UTC, H. S. Teoh wrote:
> [...]
>
>> I haven't fully thought through this yet, but the basic concept is that there should be *no need* to distinguish between compile-time and runtime in the general case, barring a small number of situations where a decision has to be made.
> [...]
> I was thinking and working in the same direction,
> when using regEx you can use the runtime (RT) or the compile time (CT) version,
> but why not let the compiler make the decisions?
>
> The solution was from someone else, in the forum, but it resulted in the following code:
>
> auto reg(alias var)()
> {
>        static if (__traits(compiles, {enum ctfeFmt = var;}) )
>          {
>                 // "Promotion" to compile time value
>                 enum ctfeReg =  var ;
>                 pragma(msg, "ctRegex used");
>                 return(ctRegex!ctfeReg);
>
>           }else{
>                 return(regex(var));
>                 pragma(msg,"regex used");
>           }
> }
>
> The trick is the alias var, which then can be checked,
> with
>       __traits(compiles, {enum ctfeFmt = var;}
>
> is it an immutable value present at compiletime or not.
>
>
> Now you may use
>
> auto myregex = reg!(Regex_expression);
>
> At every place in your code.
>
> And depending on the question, does it compile to the ctfeReg or not the version is selected.
>
> So there is away to promote a ctVariable to runtime, if this would be possible with a runtime value, too, you would get rid of the '!' -syntax.
>
> Regards mt.

The compiler should do this all internally. All it has to do is keep track of simple "Known" state of a variable.

Any expression or line that is compiled uses other things. The compiler has to known these things and it keeps track of them in the AST. But it should also know if those things are known.

That is what CTFE basically does and also how things get optimized...

But somewhere it breaks down. It most likely does this because of the way things are specified in D as being RT.

e.g., ctRegex and regex.... two different things that break the unity.

The compiler isn't going to know they are related and hence why you had to build a relation.

This is a problem of the language itself.

All programming should start as if it were CT with RT being an after effect. One would even reason about obvious RT things like readln as if they were compile time. One should even think of program execution as just finishing the compilation process. We should even have compilers written with this mentality and languages designed with it. It puts the proper emphasis on writing code.

One could say that programming language design got it backwards.. we started with RT programming when it should have been about CT. [Of course when dealing with punch cards it's hard to get it right]

By starting with CT and assuming everything is CT until it is proven not to be, the compiler knows it can optimize things(until it can't). When one starts with RT and assumes everything is RT there is no optimization that can take place. Obviously most languages and compilers use some blend.

Functional languages tend to focus more in the CT side and then have special cases like IO and state that deal with the RT.


A prototypical example would be readln.

Why is that RT? Does it have to be? What if you want to mock up a program, then the compiler could optimize it out! I don't mean.

//string s = readln();
s = "mock".

But
string s = readln();

and some point inside readln it reads a mock file at compile time because of a flag and returns mock. (and we don't have to modify source code to switch from RT to CT)

As one digs down in to readln there will eventually be some code that cannot be compiled down in to a known state at compile time. The compiler should be able to figure this out... e.g., it there is an OS call and all OS calls are marked as "RT". (but we could replace those calls with a mocking set that then is known at compile time and then it could be reduced)

All such things though require the compiler to be aware... they are far more aware than ever before, but only will get better over time as people make improvements, but it does require people to realize what can be improved.