Self-modifying code! The real kind!

April 05, 2017

Posted by Jethro

I think it would be pretty novel to have the concept of self-modifying code.

I have several use cases in D where I have to repeat a process over and over such as compile, change some line, then recompile to get the effect.

The main one has to do with mixins.

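//version = compiledMixins;

version(compiledMixins)
{
    // Second build: import the file generated by the first build.
    import fooMixedInFile;
} else
{
    // First build (hypothetical API): write the mixin string foo to fooMixedInFile.d ...
    WriteFile(foo, fooMixedInFile);
    // ... then uncomment the version line at the top of this file.
    ModFile(this, "//version = compiledMixins" => "version = compiledMixins");
}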

This hypothetical code, when compiled, behaves like this:


1. Evaluates the code string represented by foo and writes it to the file fooMixedInFile(.d).

2. Modifies the current file and uncomments the version = compiledMixins line.

3. (hit compile again, or automate somehow)

4. On the next build, imports fooMixedInFile instead.


Instead of exposing mixin(foo) directly, this writes the mixin's code to a file and imports that file on the next build, so D can parse it and report errors properly.

This works well in practice, except that WriteFile and ModFile actually have to be run at runtime, so step 3 must also include a "dummy" run, e.g.,

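version(compiledMixins)
{
    // Second build: the generated file exists, so import it.
    import fooMixedInFile;
} else {
    // First build: the "dummy" run generates the file and flips the version line.
    void main()
    {
        WriteFile(foo, fooMixedInFile);
        ModFile(this, "//version = compiledMixins" => "version = compiledMixins");
    }
}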

Essentially, this method allows debugging mixins as ordinary code, with the only requirement being one extra build/dummy run.
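A minimal sketch of what the dummy run could look like with real Phobos calls standing in for the hypothetical WriteFile and ModFile. The names writeMixinFile, enableVersion, and generatedAnswer are assumptions made up for illustration, and the sketch only rewrites a sample string rather than its own source file:

```d
import std.array : replaceFirst;
import std.file : readText, write;
import std.stdio : writeln;

// Hypothetical mixin string; in real code this would be built by CTFE.
enum foo = "int generatedAnswer() { return 42; }";

/// Writes the mixin string to `path` as an importable module.
void writeMixinFile(string path, string moduleName, string code)
{
    write(path, "module " ~ moduleName ~ ";\n\n" ~ code ~ "\n");
}

/// Uncomments the first commented-out version line found in `source`.
/// Plain text replacement: it cannot tell code from strings or comments.
string enableVersion(string source)
{
    return source.replaceFirst("//version = compiledMixins;",
                               "version = compiledMixins;");
}

void main()
{
    // Step 1: dump the mixin string into a file the next build can import.
    writeMixinFile("fooMixedInFile.d", "fooMixedInFile", foo);
    writeln(readText("fooMixedInFile.d"));

    // Step 2: flip the version switch (shown here on a sample string).
    writeln(enableVersion("//version = compiledMixins;\nint x;"));
}
```

Because enableVersion is a plain text replacement, it would also match the marker inside a string literal or comment, which is why a robust version probably needs a real parser.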

The main problem is that I have to create hacks to do it. It would be nice to have a general-purpose solution in a nice package. Many would benefit from being able to debug mixins as if they were code, which the process above allows. Not only that, with a little bit of work, one could match the output of a mixin to the line of code that generated it, to get an accurate way to find bugs in the mixin code vs its output.

I am only talking about string mixins here, of course, and the import would have to be a valid way to run them (which may not work for certain types of string mixins, but works in the majority of cases).

The self-modifying code (the ModFile line) is interesting, but probably requires a good D parser to be robust.

I have a feeling that the compiler could do the job internally much better and completely encapsulate all the work.

Essentially,

1. Evaluate the string mixin (without actually inserting it yet).
2. Compute its hash.
3. Match the hash against the mixin's backing file. If it does not match, or the file does not exist, write the string to the file.
4. Import the file instead of the mixin code (or, if you want, mixin(import(file)), but one would need to fix up the debugging a little).

This has 3 advantages:

1. Can be done completely by the compiler when it encounters a mixin statement and doesn't change anything for the user.
2. Allows both the string generating code to be debugged (compiler will catch those errors first) and the output of the mixin (caught on the second compile).
3. No recompilation required.
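A rough run-time sketch of steps 2 and 3 of the compiler-side proposal, since the compiler itself cannot be changed from user code. The helper names mixinHash and syncMixinFile are hypothetical, and hashing the backing file's contents (rather than storing a hash somewhere) is an assumption of this sketch:

```d
import std.digest : toHexString;
import std.digest.sha : sha256Of;
import std.file : exists, readText, write;
import std.stdio : writeln;

/// Returns the SHA-256 of `code` as a hex string.
string mixinHash(string code)
{
    auto digest = sha256Of(code);   // ubyte[32]
    auto hex = toHexString(digest); // char[64]
    return hex[].idup;
}

/// Writes `code` to `path` unless the file already holds text with the
/// same hash. Returns true when the file was (re)written.
bool syncMixinFile(string path, string code)
{
    if (exists(path) && mixinHash(readText(path)) == mixinHash(code))
        return false; // backing file is up to date
    write(path, code);
    return true;
}

void main()
{
    // Hypothetical mixin string standing in for the evaluated mixin.
    enum foo = "int generatedAnswer() { return 42; }";

    writeln(syncMixinFile("fooMixedInFile.d", foo));
    // The second call finds a matching hash and leaves the file alone.
    writeln(syncMixinFile("fooMixedInFile.d", foo));
}
```

Skipping the write when the hash matches keeps the file's timestamp stable, so a build tool watching the file would not see a spurious change.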

Thoughts?

Self-modifying might be the answer to all sorts of performance problems due to branching. Only problem is security I guess. Don't they disable writes to code segment anyway?

On Wednesday, 5 April 2017 at 22:21:23 UTC, Jethro wrote:
> I think it would be pretty novel to have the concept of self modifying code.
>
> [...]

On Thu, Apr 06, 2017 at 05:36:52AM +0000, Swoorup Joshi via Digitalmars-d wrote:
> Self-modifying might be the answer to all sorts of performance problems due to branching. Only problem is security I guess. Don't they disable writes to code segment anyway?
[...]

I don't think the OP was talking about self-modifying code in that sense. I think he was talking about a program that modifies its own *source code*, which is a different thing than a program that modifies its own machine code while that machine code is running.

T

--
Give a man a fish, and he eats once. Teach a man to fish, and he will sit forever.

On Thursday, 6 April 2017 at 05:36:52 UTC, Swoorup Joshi wrote:
> Self-modifying might be the answer to all sorts of performance problems due to branching.

No it's not! You are throwing away your i-cache AND mess up the branch prediction.

On Friday, 7 April 2017 at 18:54:10 UTC, H. S. Teoh wrote:
> On Thu, Apr 06, 2017 at 05:36:52AM +0000, Swoorup Joshi via Digitalmars-d wrote:
>> Self-modifying might be the answer to all sorts of performance problems due to branching. Only problem is security I guess. Don't they disable writes to code segment anyway?
> [...]
>
> I don't think the OP was talking about self-modifying code in that sense. I think he was talking about a program that modifies its own *source code*, which is a different thing than a program that modifies its own machine code while that machine code is running.
>
>
> T

Yeah, that's what I mean. Basically D's meta programming accomplishes the same effect for the most part but it is somewhat limited. Mainly since one can't write to files for "security" reasons (I'd like to know of any real world security issues that this has caused!).

On Friday, 7 April 2017 at 20:43:52 UTC, Stefan Koch wrote:
> On Thursday, 6 April 2017 at 05:36:52 UTC, Swoorup Joshi wrote:
>> Self-modifying might be the answer to all sorts of performance problems due to branching.
>
> No it's not! You are throwing away your i-cache AND mess up the branch prediction.

From the opening statement it looks and sounds more like loading and unloading DLL files, rather than self-modifying code.

Self-modifying code isn't really that practical anymore. The best working example is compressed executables (UPX and similar), but those only expand optimized code from a compressed cache and then change the block to executable; they don't really modify the code at all.

Perhaps an actual use case for self-modifying code would be to give you a quick & dirty compile of a function, then work on optimizing it, then switch the calls over to the new function once it's optimized. That is more useful for, say, JIT circumstances and emulation, and less so for statically known source code.
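The "switch the calls over once it's optimized" idea can be approximated without touching machine code by routing calls through a function pointer. This is only a sketch of that indirection, not real self-modifying code or a JIT; the functions sumSlow and sumFast and the pointer sum are hypothetical names:

```d
import std.stdio : writeln;

/// Quick & dirty version: sums the integers 0 .. n (exclusive) in a loop.
ulong sumSlow(ulong n)
{
    ulong total = 0;
    foreach (i; 0 .. n)
        total += i;
    return total;
}

/// "Optimized" replacement: closed form for the same sum.
ulong sumFast(ulong n)
{
    return n == 0 ? 0 : n * (n - 1) / 2;
}

// Callers go through this pointer, so the implementation can be swapped
// at run time without changing any call site.
ulong function(ulong) sum = &sumSlow;

void main()
{
    writeln(sum(10)); // served by sumSlow
    sum = &sumFast;   // the optimized version is ready: switch over
    writeln(sum(10)); // same result, now served by sumFast
}
```

Both calls print the same value; only the implementation behind the pointer changes.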
