Thread overview
Whole source-tree statefull preprocessing, notion of a whole program
Apr 08, 2017
Boris-Barboris
Apr 08, 2017
Vladimir Panteleev
Apr 08, 2017
Boris-Barboris
Apr 08, 2017
Vladimir Panteleev
Apr 08, 2017
Boris-Barboris
April 08, 2017
Hello! It's a bit of a long one, I guess, but I'd like to have some discussion of this topic. I'll start with a concrete use case:

For the sake of entertainment, I tried to write a generic configuration management class. I was inspired by the oslo_config Python package that I have to deal with at work. I started with:

// module config

class Config
{
    this(string filename) { ... }
    abstract void save();
    abstract void load();
    // Some implementation, json for example
}

abstract class ConfigGroup(string group_name_par)
{
    static immutable group_name = group_name_par;
    protected Config CONF;
    this(Config config) { CONF = config; }

    // json_type (not shown): a compile-time helper that picks the JSON accessor for T
    mixin template ConfigField(T, string opt_name)
    {
        mixin("@property " ~ T.stringof ~ " " ~ opt_name ~
              "() { return CONF.root[\"" ~ group_name ~ "\"][\"" ~ opt_name ~
              "\"]." ~ json_type(T.stringof) ~ "; }");
        mixin("@property " ~ T.stringof ~ " " ~ opt_name ~
              "(" ~ T.stringof ~ " value) { return CONF.root[\"" ~ group_name
              ~ "\"][\"" ~ opt_name ~ "\"]." ~ json_type(T.stringof) ~ "(value); }");
    }
}

// module testconfig

private class TestConfigGroup: ConfigGroup!("testGroup")
{
    this(Config config) { super(config); }

    mixin ConfigField!(string, "some_string_option");
    mixin ConfigField!(double, "some_double_option");
}


... and I stopped. Here are the blockers I saw:

1). I had to save the template parameter group_name_par into group_name. Looks like template mixins don't support closures. A minor inconvenience, I would say, and it's not what I would like to talk about.
2). After preprocessing I wish to have a fully typed, safe and fast Config class that contains all the groups I defined for it in its body, and not as references. I don't want a pointer lookup at run time to get some field. This is actually quite a problem for D:
    2.1). It looks like mixin is the only tool for extending a class body. The obvious solution would be a loop that mixes in definitions from some compile-time known array, maybe even a string array. And the prettiest way of all: such an array would contain the module and class names of all ConfigGroup derivatives defined in the whole program (absolute madman). Said array could be appended to at compile time by every derivative of ConfigGroup.
    2.2) The sweet dream of 2.1 is met with the absence of tools to create and manipulate state during preprocessing. For example:

    immutable string[] primordial = [];  // maybe some special qualifier instead
                                         // of immutable would be better. Even
                                         // better if it shifts to immutable
                                         // at run time
    premixin template (string toAdd) { primordial ~= toAdd; } // for example
    mixin template Populate()
    {
        static foreach (s; primordial)
            mixin("int " ~ s ~ ";");    // create some int field
    }
    class Populated { mixin Populate; }
    // another module
    premixin("field1");  // evaluated by the preprocessor in
    premixin("field2");  // order of definition
    Populated p = new Populated;
    p.field1 = 3;

        By "premixin" I mean that all such operations are performed in a special preprocessing stage that completes before the mixins we already have start doing their job.

    2.3) There is strong C ancestry in D. The part where compilation is performed per translation unit (.d source file) is, in my opinion, quite devastating. I don't know about you guys, but in 2017 I compile programs. I don't care about individual object files and linker shenanigans; for me it's the whole program that matters, and object files are just the way C does its thing. You definitely must respect that when interfacing with C, but that's about it. Correct me if I'm wrong, but departing from C's compilation model (I believe the CLI is not the reason here) allowed C# to incorporate "partial" classes, which are a wonderful concept: a class can only be extended voluntarily (as in D, where we must deliberately write a mixin to change a definition), and source code changes stay local. When project functionality is extended, the old code base can sometimes remain completely untouched (this is huge for very big projects IMO). I won't deny, however, that the readability of such code suffers. As a counter-argument, the relationships between a portion of the class and other code are usually local, in the sense that this class part's fields are used by the source code in this folder and basically nowhere else.
        But what's done is done, I understand. However, I believe there is still hope for the preprocessor: it could be generalized to the whole source tree without throwing the old toolchain out of the window, in a way that would make the "primordial" string array from the example above the same for all translation units once preprocessing is done.
    2.4) The original configuration management example would also require the ability to import definitions cyclically. Module A, containing the ConfigGroupConcrete instantiation, imports module B where Config is defined, which will require B to import A in order to access the ConfigGroupConcrete definition. Yet another stone in C's garden, yes. You could, for example, pass the whole ConfigGroupConcrete body as a string and mix it in there, but then you would need to build that string automatically, and at that point you're better off with some kind of templating language. And templating languages make things even less readable IMO, while simply being crutches that replace language preprocessors that don't meet industry needs. I do believe such a case is out of reach until preprocessing is done on the whole program as a unit.


To conclude, I'll summarize my questions:
1). Is there a compiled language that is capable of the above-mentioned tricks without resorting to external templating meta-languages?
2). How deep does the rabbit hole go in terms of the complexity of the required preprocessor modifications? And for DMD in general?
3). How are cyclic module imports currently handled in D?
4). Is there hope that this is possible to do in, say, a year? I don't mind trying to implement it myself, but I don't want to invest time in something so conceptually out of place that it would simply be too destructive for the current compiler environment.
April 08, 2017
On Saturday, 8 April 2017 at 10:11:11 UTC, Boris-Barboris wrote:
> 1). I had to save the template parameter group_name_par into group_name. Looks like template mixins don't support closures. A minor inconvenience, I would say, and it's not what I would like to talk about.

Template mixins' scope is the expansion scope, not the declaration scope. (This is useful in some situations, but recently I have been mostly avoiding template mixins.)
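A minimal sketch of that lookup rule (the names `label`, `WhoAmI` and `Expanded` are made up for illustration): the function body inside the mixin template resolves `label` in the struct it is mixed into, not at module level where the template is declared. The same rule would explain blocker 1, since `group_name_par` is a template parameter of `ConfigGroup` and is not visible from the derived class where `ConfigField` gets expanded, while the inherited static `group_name` is.

```d
import std.stdio : writeln;

// Module-level name that a declaration-scope lookup would find.
enum label = "declaration scope";

// Template mixin: its body is evaluated where it is mixed in.
mixin template WhoAmI()
{
    string who() { return label; }
}

struct Expanded
{
    // Name in the expansion scope; this is the one `who` sees.
    enum label = "expansion scope";
    mixin WhoAmI;
}

void main()
{
    writeln(Expanded().who()); // prints: expansion scope
}
```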

> 2). After preprocessing I wish to have a fully typed, safe and fast Config class that contains all the groups I defined for it in its body, and not as references. I don't want a pointer lookup at run time to get some field.

Looks like your current implementation does not go in that direction, seeing as it uses properties for field access.

For such tasks, I would suggest splitting the representations (native data, DOM tree, JSON strings etc.) from the transformations (serialization and parsing/emitting JSON). E.g. your example could be represented as:

struct Config
{
    struct TestGroup
    {
        string some_string_option;
        double some_double_option;
    }
    TestGroup testGroup;
}

Then, separate code for serializing/deserializing this to/from a DOM or directly to/from JSON.
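One way that separate code could look, as a rough sketch rather than a finished library: two hypothetical functions, `serialize` and `deserialize`, walk any plain struct via `.tupleof` at compile time and map it to and from Phobos' `std.json`. Only `string` and `double` leaves are handled here; other field types would need their own branches.

```d
import std.json : JSONType, JSONValue, parseJSON;

struct Config
{
    struct TestGroup
    {
        string some_string_option;
        double some_double_option;
    }
    TestGroup testGroup;
}

// Convert a value into a JSONValue; structs become JSON objects keyed by field name.
JSONValue serialize(T)(T value)
{
    static if (is(T == struct))
    {
        JSONValue[string] obj;
        foreach (i, field; value.tupleof)   // unrolled at compile time
            obj[__traits(identifier, T.tupleof[i])] = serialize(field);
        return JSONValue(obj);
    }
    else
        return JSONValue(value);
}

// Rebuild a value from a JSONValue produced by serialize (or parsed from text).
T deserialize(T)(JSONValue json)
{
    static if (is(T == struct))
    {
        T result;
        foreach (i, ref field; result.tupleof)
            field = deserialize!(typeof(field))(json[__traits(identifier, T.tupleof[i])]);
        return result;
    }
    else static if (is(T == string))
        return json.str;
    else static if (is(T == double))
        // Accept both 2.5 and 2 in the JSON text.
        return json.type == JSONType.integer ? cast(double) json.integer : json.floating;
    else
        static assert(0, "unsupported field type: " ~ T.stringof);
}

void main()
{
    Config c;
    c.testGroup.some_string_option = "hello";
    c.testGroup.some_double_option = 2.5;

    string text = serialize(c).toString();
    Config back = deserialize!Config(parseJSON(text));
    assert(back == c);
}
```

Because the field list comes from the struct definition itself, adding a field to `TestGroup` needs no change to the (de)serialization code.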

Individual components' configuration can be delegated to their components; their modules could contain public struct definitions that you can add to the global Config struct, which describes the configuration of the entire application. I've used this pattern successfully in some projects, incl. Digger: https://github.com/CyberShadow/Digger/blob/master/config.d#L31-L36

>     2.2) The sweet dream of 2.1 is met with the absence of tools to create and manipulate state during preprocessing. For example:

I understand that you seem to be looking for a way to change types (definitions in general) inside modules you import. This is problematic from several aspects, such as other modules depending on that module may find that the definitions "change under their feet". In D, once a type is declared and its final curly brace is closed, you will know that its definition will remain the same from anywhere in the program.

D's answer to partial classes is UFCS, however this does not allow "adding" fields, only methods.
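A small illustration of that UFCS point, with made-up names: `describe` is an ordinary free function, yet it can be called as if it were a member of `Config`. It can only work with what `Config` already has; it cannot add storage.

```d
import std.stdio : writeln;

struct Config
{
    string name;
}

// Could live in any module that imports Config; it adds behaviour,
// but no new fields, to Config.
string describe(Config c)
{
    return "Config(" ~ c.name ~ ")";
}

void main()
{
    auto c = Config("app");
    writeln(c.describe()); // same as describe(c); prints: Config(app)
}
```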

>     2.4) The original configuration management example would also require the ability to import definitions cyclically. Module A, containing the ConfigGroupConcrete instantiation, imports module B where Config is defined, which will require B to import A in order to access the ConfigGroupConcrete definition.

I don't really understand what you mean here, but D does allow cyclic module imports. It is only forbidden when more than one module inside any cycle has static constructors, because then it is not possible to determine the correct initialization order.

> To conclude, I'll summarize my questions:
> 1). Is there a compiled language that is capable of the above-mentioned tricks without resorting to external templating meta-languages?

I don't know of any. For D, I suggest trying different approaches / paradigms.

> 2). How deep does the rabbit hole go in terms of the complexity of the required preprocessor modifications? And for DMD in general?

So far, I have not heard of any D project that requires preprocessing of D code. I think D's metaprogramming has enough solutions to choose from for the vast majority of conceivable situations where other languages would call for a preprocessor.

> 4). Is there hope that this is possible to do in, say, a year? I don't mind trying to implement it myself, but I don't want to invest time in something so conceptually out of place that it would simply be too destructive for the current compiler environment.

I suggest that you examine how established D projects deal with similar situations.
April 08, 2017
On Saturday, 8 April 2017 at 13:09:59 UTC, Vladimir Panteleev wrote:
> On Saturday, 8 April 2017 at 10:11:11 UTC, Boris-Barboris wrote:

>> 2). After preprocessing I wish to have a fully typed, safe and fast Config class that contains all the groups I defined for it in its body, and not as references. I don't want a pointer lookup at run time to get some field.
>
> Looks like your current implementation does not go in that direction, seeing as it uses properties for field access.

Am I mistaken in assuming that such a simple getter property will be optimized to direct field access? Anyway, that's a minor detail.

> For such tasks, I would suggest splitting the representations (native data, DOM tree, JSON strings etc.) from the transformations (serialization and parsing/emitting JSON). E.g. your example could be represented as:
>
> struct Config
> {
>     struct TestGroup
>     {
>         string some_string_option;
>         double some_double_option;
>     }
>     TestGroup testGroup;
> }
>
> Then, separate code for serializing/deserializing this to/from a DOM or directly to/from JSON.
>
> Individual components' configuration can be delegated to their components; their modules could contain public struct definitions that you can add to the global Config struct, which describes the configuration of the entire application. I've used this pattern successfully in some projects, incl. Digger: https://github.com/CyberShadow/Digger/blob/master/config.d#L31-L36

OK, that's nice, but it still requires manually including such a field in the global config struct. Some "compile-time callback" system would still scale better, in my opinion.

>>     2.2) The sweet dream of 2.1 is met with the absence of tools to create and manipulate state during preprocessing. For example:
>
> I understand that you seem to be looking for a way to change types (definitions in general) inside modules you import. This is problematic from several aspects, such as other modules depending on that module may find that the definitions "change under their feet".

As expected, since a class that allows itself to be modified at compile time always does so explicitly via mixin. Most of the time such manipulation is used to extend functionality (add a field, plugin, or method) without removing or modifying existing ones. And if the names conflict, we get a nice compile-time error anyway.

> In D, once a type is declared and its final curly brace is closed, you will know that its definition will remain the same from anywhere in the program.

That's kind of my point: the definition needs to stay the same because it's built by the compiler as many times as there are translation units, thanks to evil old grandpa C.

> D's answer to partial classes is UFCS, however this does not allow "adding" fields, only methods.

Adding fields, or, more generally, objects / collections of objects, is the main use case. In my experience, adding methods is a rare scenario.

>>     2.4) The original configuration management example would also require the ability to import definitions cyclically. Module A, containing the ConfigGroupConcrete instantiation, imports module B where Config is defined, which will require B to import A in order to access the ConfigGroupConcrete definition.
>
> I don't really understand what you mean here, but D does allow cyclic module imports. It is only forbidden when more than one module inside any cycle has static constructors, because then it is not possible to determine the correct initialization order.

Exactly what I wanted to know, thank you.

>> To conclude, I'll summarize my questions:
>> 1). Is there a compiled language that is capable of the above-mentioned tricks without resorting to external templating meta-languages?
>
> I don't know of any. For D, I suggest trying different approaches / paradigms.
>
>> 2). How deep does the rabbit hole go in terms of the complexity of the required preprocessor modifications? And for DMD in general?
>
> So far, I have not heard of any D project that requires preprocessing of D code. I think D's metaprogramming has enough solutions to choose from for the vast majority of conceivable situations where other languages would call for a preprocessor.
>
>> 4). Is there hope that this is possible to do in, say, a year? I don't mind trying to implement it myself, but I don't want to invest time in something so conceptually out of place that it would simply be too destructive for the current compiler environment.
>
> I suggest that you examine how established D projects deal with similar situations.

Thank you.
April 08, 2017
On Saturday, 8 April 2017 at 14:20:49 UTC, Boris-Barboris wrote:
>> Looks like your current implementation does not go in that direction, seeing as it uses properties for field access.
>
> Am I mistaken in assuming that such a simple getter property will be optimized to direct field access? Anyway, that's a minor detail.

I don't know the type of CONF.root, but from the usage syntax in your example, it looks like an associative array. Associative array lookup will be slower than simply accessing a variable.
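A sketch of the difference, assuming `CONF.root` is a `std.json` `JSONValue` (the original class only hints at JSON). `std.json` stores objects in an associative array, so each read through the tree hashes its keys at run time, whereas a plain struct field is read at an offset the compiler already knows.

```d
import std.json : parseJSON;

struct TestGroup
{
    double some_double_option;
}

void main()
{
    auto root = parseJSON(`{"testGroup": {"some_double_option": 2.5}}`);
    // Two associative-array lookups plus a type check on every access.
    double dynamicValue = root["testGroup"]["some_double_option"].floating;

    auto group = TestGroup(2.5);
    // Direct field access: no lookup at all.
    double staticValue = group.some_double_option;

    assert(dynamicValue == staticValue);
}
```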

>> Individual components' configuration can be delegated to their components; their modules could contain public struct definitions that you can add to the global Config struct, which describes the configuration of the entire application. I've used this pattern successfully in some projects, incl. Digger: https://github.com/CyberShadow/Digger/blob/master/config.d#L31-L36
>
> OK, that's nice, but it still requires manually including such a field in the global config struct.

Yes; in my opinion, that's desirable because it is aligned with the unidirectional flow of information from higher-level components to lower-level ones, and does not impose a particular configuration framework onto the lower-level components (they only need to declare their configuration in terms of a POD type).

> Some "compile-time callback" system would still scale better, in my opinion.

A similar effect can be achieved by allowing components to register themselves in a static constructor (not at compile-time, but at program start-up).
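A minimal sketch of start-up registration; `registeredGroups`, `NetworkConfig` and `StorageConfig` are hypothetical names, and in a real program each component would sit in its own module with its own static constructor. `shared static this()` runs once per process before `main`, which is why the registry is `__gshared` here rather than thread-local.

```d
import std.stdio : writeln;

// Hypothetical process-wide registry, filled in before main() runs.
__gshared string[] registeredGroups;

// Each component registers itself from its own static constructor.
// In a real program these structs would live in separate modules.
struct NetworkConfig
{
    int port = 8080;
    shared static this() { registeredGroups ~= "network"; }
}

struct StorageConfig
{
    string path = "/tmp";
    shared static this() { registeredGroups ~= "storage"; }
}

void main()
{
    // By now every shared static constructor has already run.
    writeln(registeredGroups);
}
```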

>> I understand that you seem to be looking for a way to change types (definitions in general) inside modules you import. This is problematic from several aspects, such as other modules depending on that module may find that the definitions "change under their feet".
>
> As expected, since a class that allows itself to be modified at compile time always does so explicitly via mixin. Most of the time such manipulation is used to extend functionality (add a field, plugin, or method) without removing or modifying existing ones. And if the names conflict, we get a nice compile-time error anyway.

Then you have problems such as the instance size of a class changing depending on whether the code that requires the instance size is seen by the compiler before the code that modifies the instance size. I think it would cause complicated design problems that limit the scalability of the language. Even without such features, DMD had to go through a number of bugs to iron out the correct semantics of evaluating types (e.g. with "typeof(this).sizeof" inside a struct declaration, or recursive struct template instantiations).

>> In D, once a type is declared and its final curly brace is closed, you will know that its definition will remain the same from anywhere in the program.
>
> That's kind of my point: the definition needs to stay the same because it's built by the compiler as many times as there are translation units, thanks to evil old grandpa C.

I think this is not about technical limitations, but intentional design choices. Allowing types to be modified post-declaration invalidates many contracts and assumptions that code may have, and makes it harder to reason about the program as a whole. Compare with e.g. INTERCAL's COMEFROM instruction.

>> D's answer to partial classes is UFCS, however this does not allow "adding" fields, only methods.
>
> Adding fields, or, more generally, objects / collections of objects, is the main use case. In my experience, adding methods is a rare scenario.

UFCS is widely used in D for component programming:

http://www.drdobbs.com/architecture-and-design/component-programming-in-d/240008321
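A short sketch of that style using Phobos ranges (the pipeline itself is just an example): `filter`, `map` and `array` are free functions, chained with member syntax through UFCS.

```d
import std.algorithm : filter, map;
import std.array : array;
import std.range : iota;
import std.stdio : writeln;

void main()
{
    auto squaresOfEvens = iota(1, 11)      // 1, 2, ..., 10
        .filter!(n => n % 2 == 0)          // 2, 4, 6, 8, 10
        .map!(n => n * n)                  // 4, 16, 36, 64, 100
        .array;
    writeln(squaresOfEvens); // [4, 16, 36, 64, 100]
}
```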

April 08, 2017
On Saturday, 8 April 2017 at 17:57:11 UTC, Vladimir Panteleev wrote:

> Yes; in my opinion, that's desirable because it is aligned with the unidirectional flow of information from higher-level components to lower-level ones, and does not impose a particular configuration framework onto the lower-level components (they only need to declare their configuration in terms of a POD type).
...
> A similar effect can be achieved by allowing components to register themselves in a static constructor (not at compile-time, but at program start-up).

  That is definitely possible and, I would say, trivial, and it is the most popular way. However, any run-time registration implies run-time collections to iterate over, with obvious performance drawbacks (minor ones in this case). We are not using the information we have a priori, at compile time, and make the CPU pay for it instead, either because we are too lazy (too busy) to update the sources of higher-level components (while making a mess of them), or simply because our language lacks expressiveness, which is my point.
  I side with another set of virtues. Source code consists of files. Files contain related data, concepts, functionality, whatever. Relations between those entities need not be unidirectional. What direction can you impose on the concept "For every configurable entity, be it a package, module, or class, I need fields in the global configuration singleton"?
  IMO a program has good architecture when, during extensive development, it takes an arbitrary group of programmers little time and effort to make useful changes. This is achieved by extensive research of the problem domain, using abstraction to fight complexity, yada yada... Part of it is making sure that you can extend functionality easily. Adding a new subclass in one file and registering it in two others is not hard. But there is no fundamental reason for it not to be easier: just add the subclass and slap some fancy attribute on it, or add some preprocessor-related field or function in its body.
  One-directional flow is a consequence, not a principle. We are used to it because languages were made this way. When a high-level concept or idea deliberately implies feedback from its users, there is little reason to forbid it, especially when it actually improves development iteration times, lowers the risk of merge conflicts, etc.

  Look at this mess:
https://github.com/Boris-Barboris/AtmosphereAutopilot/blob/master/AtmosphereAutopilot/GUI/AutoGui.cs#L190
  It's caching code for a C# reflection-based GUI I wrote some time ago, which defines an attribute for marking class fields so they are drawn in a pretty little debug window. Why do I have to do this? I've got all the information right in the source. All classes that will be drawn by this GUI module are there, in text, accessible to the build system. Why can't I just write clean code that doesn't involve double or triple associative-array dispatch over a runtime-reflected list of subclasses and attribute-marked fields? The answer is simple: the language lacks expressiveness. I provide drawing functionality in my module. It is generic, it is virtuous, it is concentrated in one file, it speeds up development. However, it needs to see its client, its beneficiary, in order to draw it. And it just doesn't, because C#, and you need to do runtime reflection. Yet again, we throw away information we already have and make the CPU reconstruct it at run time, over and over.
  Yes, everything I describe can be done efficiently by writing a lot of boilerplate code or using some text-templating magic. I just don't see why languages can't have that functionality built in.
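For comparison, a sketch of how D can handle the field-marking part of this at compile time when the drawing code can see the type it is given; `Debuggable`, `Autopilot` and `debugLines` are hypothetical names. It does not address discovering all client types across the program, which is the harder part of the request.

```d
import std.conv : to;
import std.stdio : writeln;
import std.traits : hasUDA;

// Marker attribute: "show this field in the debug window".
struct Debuggable {}

struct Autopilot
{
    @Debuggable double targetPitch = 5.5;
    @Debuggable int mode = 2;
    double internalCache = 0;   // unmarked, so it is skipped
}

// The static if is resolved per field at compile time, so the generated
// code touches only marked fields, with no run-time reflection or lookups.
string[] debugLines(T)(T obj)
{
    string[] lines;
    foreach (i, field; obj.tupleof)
    {
        static if (hasUDA!(T.tupleof[i], Debuggable))
            lines ~= __traits(identifier, T.tupleof[i]) ~ " = " ~ field.to!string;
    }
    return lines;
}

void main()
{
    foreach (line; debugLines(Autopilot()))
        writeln(line);
}
```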

> Then you have problems such as the instance size of a class changing depending on whether the code that requires the instance size is seen by the compiler before the code that modifies the instance size. I think it would cause complicated design problems that limit the scalability of the language. Even without such features, DMD had to go through a number of bugs to iron out the correct semantics of evaluating types (e.g. with "typeof(this).sizeof" inside a struct declaration, or recursive struct template instantiations).

I agree. I think such mechanisms must be applied very early, hence "premixin".

> I think this is not about technical limitations, but intentional design choices. Allowing types to be modified post-declaration invalidates many contracts and assumptions that code may have, and makes it harder to reason about the program as a whole. Compare with e.g. INTERCAL's COMEFROM instruction.

  I still don't see the problem. The declaration will contain constructs indicating that it will be changed, for example a "premixin" that iterates over an array and adds fields. I have no doubt a human can read this all right.
  Indeed, the question of ordering is important.
  Well, we have goto, and the sky hasn't fallen on us. COMEFROM breaks the logical flow of time. I only want two-stage preprocessing: in the first stage I can create and manipulate some simple state visible to the preprocessor, even if it consists only of immutable basic types and arrays of those, and then use that state as immutable variables in the later stages we already have. We can already populate a class with fields from an immutable string array. All I want is the ability to populate this array using preprocessor directives from across the whole compiled program, before all the other complex stuff starts. I think that would be beautiful.
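The part that D already supports, sketched with illustrative names: fields generated by `static foreach` from a compile-time string array. The array here is defined in one place; appending to it from other modules is exactly what is not possible today.

```d
// Compile-time list of field names, defined in a single place.
enum string[] fieldNames = ["field1", "field2"];

class Populated
{
    static foreach (name; fieldNames)
        mixin("int " ~ name ~ ";");   // declares: int field1; int field2;
}

void main()
{
    auto p = new Populated;
    p.field1 = 3;
    p.field2 = 4;
    assert(p.field1 + p.field2 == 7);
}
```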

>
> UFCS is widely used in D for component programming:
>
> http://www.drdobbs.com/architecture-and-design/component-programming-in-d/240008321

I'm not claiming the opposite, just sharing what I've encountered at work or while programming for fun: I mostly needed to add fields. Others may feel differently, but I don't see this concept harming them in any way.