July 04, 2015
https://issues.dlang.org/show_bug.cgi?id=14770

          Issue ID: 14770
           Summary: std.process should use lightweight forks where
                    available
           Product: D
           Version: D2
          Hardware: x86_64
                OS: All
            Status: NEW
          Severity: normal
          Priority: P1
         Component: phobos
          Assignee: nobody@puremagic.com
          Reporter: rsw0x@rsw0x.me

fork() can be extremely slow for processes with large memory footprints, even with copy-on-write pages, because it must copy the page table, and the small default page size of most OSes makes that table large.

For example, a program using 400 MiB of memory with 4 KiB pages requires copying 102,400 page structs. On Linux, a page struct in the kernel is at least 72 bytes, so this requires a copy of about 7 MiB (102,400 x 72 = 7,372,800 bytes).
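
The arithmetic in this example can be checked with a small C sketch. The helper names page_count and bookkeeping_bytes are made up for illustration, and the 72-byte figure is simply the number quoted in this report, not something the code measures.

```c
#include <assert.h>
#include <stdint.h>

/* Pages needed to map `bytes` of memory, rounding up to whole pages. */
uint64_t page_count(uint64_t bytes, uint64_t page_size) {
    assert(page_size != 0);
    return (bytes + page_size - 1) / page_size;
}

/* Bytes of per-page bookkeeping, assuming `per_page` bytes per page
   (72 is the figure quoted in the report, not a measured value). */
uint64_t bookkeeping_bytes(uint64_t bytes, uint64_t page_size,
                           uint64_t per_page) {
    return page_count(bytes, page_size) * per_page;
}
```

Calling page_count(400ULL * 1024 * 1024, 4096) gives 102,400 pages, and bookkeeping_bytes with per_page = 72 gives 7,372,800 bytes, which is about 7.03 MiB.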

This scales linearly with heap size. In some personal testing on Linux, fork took ~60 microseconds with almost nothing allocated, ~3 milliseconds with 100 MiB allocated, and 30-45 milliseconds with 1 GiB allocated. vfork took a constant 20 microseconds regardless of heap size.
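
One way such a measurement might be taken is sketched below in C. This is not the benchmark used for the numbers above; the function name time_fork_ns, the 4096-byte touch stride, and the choice to time only the fork/vfork call as seen by the parent are all assumptions made for illustration.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: allocate and touch `heap_bytes`, then time one
   fork() or vfork() call from the parent's point of view.
   Returns elapsed nanoseconds, or -1 on failure. */
long long time_fork_ns(size_t heap_bytes, int use_vfork) {
    char *heap = NULL;
    if (heap_bytes > 0) {
        heap = malloc(heap_bytes);
        if (heap == NULL) return -1;
        /* Touch each page (assumed 4 KiB) so it is actually mapped;
           volatile keeps the compiler from dropping the writes. */
        volatile char *p = heap;
        for (size_t i = 0; i < heap_bytes; i += 4096) p[i] = 1;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    pid_t pid = use_vfork ? vfork() : fork();
    if (pid == 0) _exit(0);  /* child: exit at once, as vfork requires */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    assert(t1.tv_sec >= t0.tv_sec);  /* monotonic clock does not go back */

    long long ns = -1;
    if (pid > 0) {
        waitpid(pid, NULL, 0);  /* reap the child */
        ns = (t1.tv_sec - t0.tv_sec) * 1000000000LL
           + (t1.tv_nsec - t0.tv_nsec);
    }
    free(heap);
    return ns;
}
```

Note that with vfork the parent is suspended until the child exits, so that timing also includes the child's _exit. Results will vary by kernel, hardware, and settings such as transparent huge pages, so the absolute numbers should be treated as indicative only.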

Solution: where available, use posix_spawn, vfork, etc. These do not require copying page tables. std.process does not need a copy of the page tables, since the child process immediately replaces itself with another program.
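
As a rough sketch of what the posix_spawn route looks like at the C level (this is not std.process's actual implementation), the following spawns a program found on PATH and waits for it. The helper name spawn_and_wait is hypothetical, and whether a given libc implements posix_spawnp without copying page tables is implementation-dependent.

```c
#include <assert.h>
#include <errno.h>
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: run argv[0] (looked up on PATH) with the current
   environment and wait for it. Returns the child's exit status, or -1 if
   spawning or waiting failed or the child did not exit normally. */
int spawn_and_wait(char *const argv[]) {
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid;
    /* NULL file actions and attributes: inherit descriptors and defaults. */
    int err = posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ);
    if (err != 0) return -1;  /* posix_spawnp returns an errno value */

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) return -1;  /* retry only if interrupted */
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, passing the argument vector {"sh", "-c", "exit 3", NULL} should return 3 on a system where sh is on PATH. Redirecting the child's stdin/stdout/stderr, which std.process supports, would go through posix_spawn_file_actions_t instead of the NULL used here.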

I marked this as all OSes, but really it applies to all POSIX OSes, I guess.

--