July 27, 2017
I would like to encode code in such a way that each compilation produces "random" code as compared to what the previous compilation produced, but ultimately the same code is ran each time(same effect).

Basically we can code a function that does a job X in many different ways. Each way looks different in binary but does the same job(Same effect). I'd like a way to sort of randomly sample/generate the different functions that do the same job.


The easiest way to wrap your head around this is to realize that certain instructions and groups of instructions can be rearranged, producing a binary that is different but the effect is the same. Probably, ultimately, that is all that can be done(certain other tricks could possibly be added to increase the sampling coverage such as nop like instructions, dummy instructions, etc).

The main issue is how to take an actual D function and transform it in to a new D function, which, when ran, ultimately does the same thing as the original but is not the same "binary".

Encrypting is a subset of this problem as we can take any string and use it to encode the code then decrypt it. And this may be usable, but then the encryption and decryption instructions must somehow be randomizable, else we are back at square one. It might be easier though, to use the encryption method to randomize the original function since the encryption routine is known while the original function is not(as it could be any function).

I'm not looking for a mathematical solution, just something that works well in practice. i.e., The most skilled human reading the disassembly would find it very difficult to interpret what is going on. He might be able to figure out one encryption routine, say, but when he sees the "same code"(same effect) he will have to start from scratch to understand it because its been "randomized".

The best way I can see how to do this is to have a list of well known encoding routines that take an arbitrary function, encrypt it. Each routine can be "randomized" by using various techniques to disguise it such as those mentioned earlier. This expands the list of functions tremendously. If there are N functions and M different ways to alter each of those functions then there are N*M total functions that we can use to encrypt the original function. If we further allow function composition of these functions, then we can get several orders of magnitude of complexity with such as a few N.

The goal though, is to do this efficiently and effectively in a way that can be amended. It will be useful in copy protection and used with other techniques to make it much more effective. Ultimately the weak point with the encryption techniques is decryption the functions but by composing encryption routines makes it stronger.

Any ideas how to achieve this in D nicely?