How to map machine instructions in memory and execute them? (Aka, how to create a loader)
June 06, 2022

I tried to find working code for this, but all I could find was one answer on Stack Overflow. There is a lot of theory out there but no practical code that works. What I want to do is allocate memory (mapped as executable), write the machine instructions into it, then allocate another memory block for the data and, finally, execute the block of memory that contains the code. So something like what the OS loader does when reading an executable. I have come up with the following code:

import core.stdc.stdio;
import core.stdc.string;
import core.stdc.stdlib;
import core.sys.linux.sys.mman;

extern (C) void main() {
  char* data = cast(char*)mmap(null, cast(ulong)15, PROT_READ|PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);
  memset(data, 0x0, 15); // Default value

  *data = 'H';
  data[1] = 'e';
  data[2] = 'l';
  data[3] = 'l';
  data[4] = 'o';
  data[5] = ' ';

  data[6] = 'w';
  data[7] = 'o';
  data[8] = 'r';
  data[9] = 'l';
  data[10] = 'd';
  data[11] = '!';

  void* code = mmap(null, cast(ulong)500, PROT_READ | PROT_WRITE | PROT_EXEC, MAP_PRIVATE | MAP_ANON, -1, 0);
  memset(code, 0xc3, 500); // Default value

  /* Call the "write" and "exit" system calls*/
  // mov rax, 0x04
  *cast(char*)code = 0x48;
  *cast(char*)(code + 1) = 0xC7;
  *cast(char*)(code + 2) = 0xC0;
  *cast(char*)(code + 3) = 0x04;
  *cast(char*)(code + 4) = 0x00;
  *cast(char*)(code + 5) = 0x00;
  *cast(char*)(code + 6) = 0x00;

  // mov rbx, 0x01
  *cast(char*)(code + 7)  = 0x48;
  *cast(char*)(code + 8)  = 0xC7;
  *cast(char*)(code + 9)  = 0xC3;
  *cast(char*)(code + 10) = 0x01;
  *cast(char*)(code + 11) = 0x00;
  *cast(char*)(code + 12) = 0x00;
  *cast(char*)(code + 13) = 0x00;

  // mov rdx, <wordLen>
  *cast(char*)(code + 14) = 0x48;
  *cast(char*)(code + 15) = 0xC7;
  *cast(char*)(code + 16) = 0xC2;
  *cast(char*)(code + 17) = 12;
  *cast(char*)(code + 18) = 0x00;
  *cast(char*)(code + 19) = 0x00;
  *cast(char*)(code + 20) = 0x00;

  // mov rdx, <location where data are allocated>
  *cast(char*)(code + 21) = 0x48;
  *cast(char*)(code + 22) = 0xC7;
  *cast(char*)(code + 23) = 0xC1;
  *cast(long*)(code + 24) = cast(long)data;
  *cast(char*)(code + 32) = 0x00;

  // int 0x80
  *cast(char*)(code + 33) = 0xcd;
  *cast(char*)(code + 34) = 0x80;

  /* Execute the code */
  (cast(void* function())&code)();
}

I'm 100% sure that the instructions work, as I have tested them in another example that creates an ELF executable file, and it executed correctly. So unless I copy-pasted them wrong, the instructions are not the problem. The only thing that may be wrong is how I get the location of the "data" "segment". As I see it, this uses 8 bytes for the memory address (I'm on a 64-bit machine) and takes the address that the "data" variable holds, so I would expect it to work...

Any ideas?

June 06, 2022

Note, it is also possible to do inline assembly with asm{...} or __asm(T) {...}.
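
For reference, a minimal sketch of the DMD-style asm block, assuming an x86-64 target and a compiler that defines D_InlineAsm_X86_64 (as far as I know, DMD and LDC do on that target). The function name addOne is made up for illustration, and the else branch is only there so the snippet still builds elsewhere.

```d
import std.stdio : writeln;

// Hypothetical example function: returns x + 1, using inline assembly where available.
int addOne(int x)
{
    int result = x; // a local variable that the asm block can address by name

    version (D_InlineAsm_X86_64)
    {
        asm
        {
            mov EAX, result; // load the local into EAX
            inc EAX;         // add one
            mov result, EAX; // store it back
        }
    }
    else
    {
        result += 1; // plain D fallback for other targets or compilers
    }
    return result;
}

void main()
{
    writeln(addOne(41));
}
```

This lets the compiler do the encoding, so it does not show how the bytes end up in memory, which is what the question is really about.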

June 06, 2022

On Monday, 6 June 2022 at 15:27:12 UTC, Alain De Vos wrote:

> Note, it is also possible to do inline assembly with asm{...} or __asm(T) {...}.

Thank you for the info! I am aware of that, but I don't want to do this in practice. I just want to learn how it works. It will be useful when I build my own OS.

June 06, 2022
On Monday, 6 June 2022 at 15:13:45 UTC, rempas wrote:
>   void* code = mmap(null, cast(ulong)500, PROT_READ | PROT_WRITE | PROT_EXEC, MAP_PRIVATE | MAP_ANON, -1, 0);

On a lot of systems, it can't be executable and writable at the same time, it is a security measure.

see https://en.wikipedia.org/wiki/W%5EX


so you might have to mprotect it to remove the write permission before trying to execute it.

idk though
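
A minimal sketch of that idea, assuming Linux (or another POSIX system) on x86-64: map the block read+write, copy the bytes in, then use mprotect to switch it to read+execute before calling it. mmap, mprotect and munmap come from core.sys.posix.sys.mman; the names JitFn, mapExecutable and runAnswer are made up for this example. The six bytes encode `mov eax, 42; ret`.

```d
import core.stdc.string : memcpy;
import core.sys.posix.sys.mman;
import std.stdio : writeln;

// Hypothetical alias: a C-ABI function taking no arguments and returning int.
alias JitFn = extern (C) int function();

// Hypothetical helper: copy `bytes` into a fresh mapping, then flip it from
// read+write to read+execute so it is never writable and executable at once.
void* mapExecutable(const(ubyte)[] bytes)
{
    void* mem = mmap(null, bytes.length, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANON, -1, 0);
    if (mem == MAP_FAILED)
        return null;

    memcpy(mem, bytes.ptr, bytes.length);

    if (mprotect(mem, bytes.length, PROT_READ | PROT_EXEC) != 0)
    {
        munmap(mem, bytes.length);
        return null;
    }
    return mem;
}

// Runs `mov eax, 42; ret` from the mapping; returns -1 on failure or on a non-x86-64 target.
int runAnswer()
{
    version (X86_64)
    {
        static immutable ubyte[6] code = [0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3];
        void* mem = mapExecutable(code[]);
        if (mem is null)
            return -1;

        auto fn = cast(JitFn) mem; // cast the mapped address itself, not the address of the variable
        int result = fn();
        munmap(mem, code.length);
        return result;
    }
    else
    {
        return -1;
    }
}

void main()
{
    writeln(runAnswer());
}
```

Because the generated code ends in `ret`, control comes back to the caller, which makes it easier to check that the mapping and the call work before moving on to system calls.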
June 06, 2022

On Monday, 6 June 2022 at 15:13:45 UTC, rempas wrote:

> Any ideas?

See:
https://github.com/GhostRain0/xbyak
https://github.com/MrSmith33/vox/blob/master/source/vox/utils/mem.d

June 06, 2022
On Monday, 6 June 2022 at 16:08:28 UTC, Adam D Ruppe wrote:
>
> On a lot of systems, it can't be executable and writable at the same time, it is a security measure.
>
> see https://en.wikipedia.org/wiki/W%5EX
>
>
> so you might have to mprotect it to remove the write permission before trying to execute it.
>
> idk though

Thank you! This was very helpful, and I can see why it is a clever idea not to allow it (and I love that OpenBSD was the first to introduce it!!) and I love security stuff ;)

However, even with "mprotect", or if I just use "PROT_READ" and "PROT_EXEC", it still doesn't work, so there must be something else I'm doing wrong...
June 06, 2022

On Monday, 6 June 2022 at 16:24:58 UTC, Guillaume Piolat wrote:

> See:
> https://github.com/GhostRain0/xbyak
> https://github.com/MrSmith33/vox/blob/master/source/vox/utils/mem.d

Thank you! And I just noticed that the second source is from Vox!!!!

June 06, 2022

On Monday, 6 June 2022 at 15:13:45 UTC, rempas wrote:

>   // mov rdx, <wordLen>
>   *cast(char*)(code + 14) = 0x48;
>   *cast(char*)(code + 15) = 0xC7;
>   *cast(char*)(code + 16) = 0xC2;
>   *cast(char*)(code + 17) = 12;
>   *cast(char*)(code + 18) = 0x00;
>   *cast(char*)(code + 19) = 0x00;
>   *cast(char*)(code + 20) = 0x00;
>
>   // mov rdx, <location where data are allocated>
>   *cast(char*)(code + 21) = 0x48;
>   *cast(char*)(code + 22) = 0xC7;
>   *cast(char*)(code + 23) = 0xC1;
>   *cast(long*)(code + 24) = cast(long)data;
>   *cast(char*)(code + 32) = 0x00;

This instruction is wrong. Note that you are writing twice to RDX, but also that you are using mov sign_extend imm32, reg64 instead of mov imm64, reg64 (0x48 0xBA?). Third, why append an extra zero (*cast(char*)(code + 32) = 0x00;)? That must be a bug too.

cheers,
Johan
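
To make the difference between the two encodings concrete, here is a small sketch that only builds the bytes and does not execute anything. `48 C7 /0` takes a 4-byte immediate that the CPU sign-extends, so an 8-byte address written after it spills into the following instruction; `48 B8+r` takes the full 8 bytes. The names Reg, movImm32 and movImm64 are made up for this example, and only the eight legacy registers are covered (r8 to r15 would need a different REX prefix).

```d
import std.stdio : writefln;

// Register numbers as they appear in the low three bits of the ModRM byte or opcode.
enum Reg : ubyte { rax = 0, rcx = 1, rdx = 2, rbx = 3, rsp = 4, rbp = 5, rsi = 6, rdi = 7 }

// mov r/m64, imm32 (REX.W C7 /0 id): 7 bytes, immediate is sign-extended to 64 bits.
ubyte[] movImm32(Reg reg, int imm)
{
    ubyte[] code = [0x48, 0xC7];
    code ~= cast(ubyte)(0xC0 | reg);         // ModRM: register-direct, /0
    foreach (i; 0 .. 4)
        code ~= cast(ubyte)(imm >> (8 * i)); // little-endian imm32
    return code;
}

// mov r64, imm64 (REX.W B8+rd io, shown as "movabs" by some assemblers): 10 bytes.
ubyte[] movImm64(Reg reg, ulong imm)
{
    ubyte[] code = [0x48];
    code ~= cast(ubyte)(0xB8 | reg);         // opcode with the register in its low bits
    foreach (i; 0 .. 8)
        code ~= cast(ubyte)(imm >> (8 * i)); // little-endian imm64
    return code;
}

void main()
{
    // Print both encodings as hex bytes.
    writefln("%(%02X %)", movImm32(Reg.rcx, 12));
    writefln("%(%02X %)", movImm64(Reg.rsi, 0x1122334455667788));
}
```

With this numbering, `48 BA` is the rdx form of the 64-bit move; rcx would be `48 B9` and rsi `48 BE`.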

June 06, 2022

On Monday, 6 June 2022 at 18:05:23 UTC, Johan wrote:

> This instruction is wrong. Note that you are writing twice to RDX, but also that you are using mov sign_extend imm32, reg64 instead of mov imm64, reg64 (0x48 0xBA?). Third, why append an extra zero (*cast(char*)(code + 32) = 0x00;)? That must be a bug too.
>
> cheers,
> Johan

Thanks! It seems there is a typo in the original source I got the code from. The hex values are different, however, so the mistake is only in the comment; the code works in the example repository (and I made a D version that works too). The padding at the end seems to be necessary, otherwise the example doesn't compile (I don't know why; I'm a SUPER n00b when it comes to machine language and know almost nothing about it!). I'm also not sure what the encoding for mov imm64, reg64 would be, as I tried typing what you wrote in the parentheses and it doesn't seem to work.

June 08, 2022

On Monday, 6 June 2022 at 15:13:45 UTC, rempas wrote:

In case someone is wondering, I found an answer in another forum. The code is the following:

import core.stdc.stdio;
import core.stdc.string;
import core.stdc.stdlib;
import core.sys.posix.sys.mman;

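// Parse a string of space-separated hex bytes and append each one at *code,
// advancing the write position as it goes.
void putbytes(char **code, const char *bytes) {
  uint bt;
  for (int i = 0, n; sscanf(bytes + i, "%x%n", &bt, &n) == 1; i += n)
    { *(*code)++ = cast(char)bt; }
}

// Append the pointer value stored in *data (8 bytes on a 64-bit machine) at *code;
// it becomes the 64-bit immediate of the movabs opcode written just before it.
void putdata(char **code, char** data) {
  memcpy(*code, data, (*data).sizeof);
  *code += (*data).sizeof;
}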

extern (C) void main() {
  char *data = cast(char*)mmap(null, cast(ulong)15, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);
  strcpy(data, "Hello world!\n");

  char *code = cast(char*)mmap(null, cast(ulong)500, PROT_READ | PROT_WRITE | PROT_EXEC, MAP_PRIVATE | MAP_ANON, -1, 0);
  char *pos = code;

  // Call the "write" and "exit" system calls
  putbytes(&pos, "48 C7 C0 1 0 0 0");   // mov rax, 0x01     (write syscall)
  putbytes(&pos, "48 C7 C7 1 0 0 0");   // mov rdi, 0x01     (stdout)
  putbytes(&pos, "48 C7 C2 D 0 0 0");   // mov rdx, 13       (string length)
  putbytes(&pos, "48 BE");              // movabs rsi, data  (string address)
  putdata(&pos, &data);
  putbytes(&pos, "0F 05");              // syscall
  putbytes(&pos, "48 C7 C0 3C 0 0 0");  // mov rax, 0x3C     (exit syscall)
  putbytes(&pos, "0F 05");              // syscall

  // Execute the code
  (cast(void* function())code)();
}