DMC = Digital Mars Compiler? Does MinGW/GDC use that? I think that both g++ and GDC compiled binaries use the MinGW runtime, but I'm not sure.

You're right: only DMD uses the DMC environment; GDC uses MinGW's.

And I don't think it is I/O bound. It runs at only around 10 MB/s, whereas my HD can do ~100 MB/s. Furthermore, on more compressible files, where the speed was higher, the difference between D and C++ was higher too. And if it is in fact I/O bound, then D is MORE than 50% slower than C++.

To minimize system load and I/O impact, run the same file in a loop, say 5-10 times; after the first run it will be served from the kernel cache.
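
Something like the following is what I mean. It is only a rough sketch: `read_whole_file` and `time_runs` are names I made up for illustration, and in a real test you would pass your own compress/decompress call as the work to be timed.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: reads the whole file through C stdio and returns the
// number of bytes read, or -1 if the file cannot be opened. Calling it once
// before timing pulls the file into the OS cache.
long long read_whole_file(const std::string& path) {
    std::FILE* f = std::fopen(path.c_str(), "rb");
    if (!f) return -1;
    std::vector<char> buf(1 << 16);
    long long total = 0;
    std::size_t n;
    while ((n = std::fread(buf.data(), 1, buf.size(), f)) > 0)
        total += static_cast<long long>(n);
    std::fclose(f);
    return total;
}

// Hypothetical helper: runs `work` `runs` times and returns the wall-clock
// time of each run in seconds, so a cold first run can be seen and discarded.
template <typename F>
std::vector<double> time_runs(int runs, F work) {
    assert(runs > 0);
    std::vector<double> secs;
    for (int i = 0; i < runs; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        work();
        auto t1 = std::chrono::steady_clock::now();
        secs.push_back(std::chrono::duration<double>(t1 - t0).count());
    }
    return secs;
}
```

Call `read_whole_file("a.doc")` once to warm the cache, then time the real work with `time_runs(5, ...)` and compare the per-run numbers. If the first run is noticeably slower than the rest, that gap is the disk and not the compiler.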

The difference is likely because of differences in external C libraries.
Both the D and C++ versions use C's stdio library. What is the difference?

Probably because the GDC backend does a worse job of optimizing the D AST than the C++ AST. I got the following results:

C:\D>echo off 
C++ compress a.doc
a.doc (2694428 bytes) -> a.doc.cmp (1459227 bytes) in 1.36 s.
a.doc (2694428 bytes) -> a.doc.cmp (1459227 bytes) in 1.36 s.
a.doc (2694428 bytes) -> a.doc.cmp (1459227 bytes) in 1.33 s.
a.doc (2694428 bytes) -> a.doc.cmp (1459227 bytes) in 1.34 s.
a.doc (2694428 bytes) -> a.doc.cmp (1459227 bytes) in 1.34 s.
"C++ decompress"
a.doc.cmp (1459227 bytes) -> a.doc.cmp.or (2694428 bytes) in 1.50 s.
a.doc.cmp (1459227 bytes) -> a.doc.cmp.or (2694428 bytes) in 1.51 s.
a.doc.cmp (1459227 bytes) -> a.doc.cmp.or (2694428 bytes) in 1.51 s.
a.doc.cmp (1459227 bytes) -> a.doc.cmp.or (2694428 bytes) in 1.50 s.
a.doc.cmp (1459227 bytes) -> a.doc.cmp.or (2694428 bytes) in 1.50 s.
"D compress"
a.doc (2694428 bytes) -> a.doc.dmp (1459227 bytes) in 1.11 s.
a.doc (2694428 bytes) -> a.doc.dmp (1459227 bytes) in 1.09 s.
a.doc (2694428 bytes) -> a.doc.dmp (1459227 bytes) in 1.08 s.
a.doc (2694428 bytes) -> a.doc.dmp (1459227 bytes) in 1.09 s.
a.doc (2694428 bytes) -> a.doc.dmp (1459227 bytes) in 1.08 s.
"D decompress"
a.doc.dmp (1459227 bytes) -> a.doc.dmp.or (2694428 bytes) in 1.17 s.
a.doc.dmp (1459227 bytes) -> a.doc.dmp.or (2694428 bytes) in 1.19 s.
a.doc.dmp (1459227 bytes) -> a.doc.dmp.or (2694428 bytes) in 1.19 s.
a.doc.dmp (1459227 bytes) -> a.doc.dmp.or (2694428 bytes) in 1.22 s.
a.doc.dmp (1459227 bytes) -> a.doc.dmp.or (2694428 bytes) in 1.25 s.

So, what's up? I used the same backend for both too, but DMC (-o -6) vs. DMD (-release -inline -O -noboundscheck).
I don't know DMC's optimization flags well, so its results could probably be better.

Let's try compiling with MS CL (-Ox):

C:\D>echo off 
C++ compress a.doc
a.doc (2694428 bytes) -> a.doc.cmp (1459227 bytes) in 1.03 s.
a.doc (2694428 bytes) -> a.doc.cmp (1459227 bytes) in 1.02 s.
a.doc (2694428 bytes) -> a.doc.cmp (1459227 bytes) in 1.04 s.
a.doc (2694428 bytes) -> a.doc.cmp (1459227 bytes) in 1.03 s.
a.doc (2694428 bytes) -> a.doc.cmp (1459227 bytes) in 1.01 s.
"C++ decompress"
a.doc.cmp (1459227 bytes) -> a.doc.cmp.or (2694428 bytes) in 1.08 s.
a.doc.cmp (1459227 bytes) -> a.doc.cmp.or (2694428 bytes) in 1.06 s.
a.doc.cmp (1459227 bytes) -> a.doc.cmp.or (2694428 bytes) in 1.07 s.
a.doc.cmp (1459227 bytes) -> a.doc.cmp.or (2694428 bytes) in 1.07 s.
a.doc.cmp (1459227 bytes) -> a.doc.cmp.or (2694428 bytes) in 1.07 s.
"D compress"
a.doc (2694428 bytes) -> a.doc.dmp (1459227 bytes) in 1.08 s.
a.doc (2694428 bytes) -> a.doc.dmp (1459227 bytes) in 1.09 s.
a.doc (2694428 bytes) -> a.doc.dmp (1459227 bytes) in 1.09 s.
a.doc (2694428 bytes) -> a.doc.dmp (1459227 bytes) in 1.09 s.
a.doc (2694428 bytes) -> a.doc.dmp (1459227 bytes) in 1.09 s.
"D decompress"
a.doc.dmp (1459227 bytes) -> a.doc.dmp.or (2694428 bytes) in 1.15 s.
a.doc.dmp (1459227 bytes) -> a.doc.dmp.or (2694428 bytes) in 1.17 s.
a.doc.dmp (1459227 bytes) -> a.doc.dmp.or (2694428 bytes) in 1.19 s.
a.doc.dmp (1459227 bytes) -> a.doc.dmp.or (2694428 bytes) in 1.17 s.
a.doc.dmp (1459227 bytes) -> a.doc.dmp.or (2694428 bytes) in 1.17 s.

Much better for C++, but D is not much worse: its run times are about 1.1 times those of C++.
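
To put a number on that, here is a quick sketch that averages the five runs from the MS CL table above and divides the D mean by the C++ mean. The timings are copied straight from the log; `mean` and `slowdown` are just illustrative names.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Arithmetic mean of a non-empty list of timings (seconds).
double mean(const std::vector<double>& v) {
    assert(!v.empty());
    return std::accumulate(v.begin(), v.end(), 0.0) / v.size();
}

// Ratio of mean run times: how many times longer `d` takes than `cpp`.
double slowdown(const std::vector<double>& d, const std::vector<double>& cpp) {
    return mean(d) / mean(cpp);
}

// Timings copied from the MS CL (-Ox) vs. DMD log above.
const std::vector<double> cl_compress    = {1.03, 1.02, 1.04, 1.03, 1.01};
const std::vector<double> dmd_compress   = {1.08, 1.09, 1.09, 1.09, 1.09};
const std::vector<double> cl_decompress  = {1.08, 1.06, 1.07, 1.07, 1.07};
const std::vector<double> dmd_decompress = {1.15, 1.17, 1.19, 1.17, 1.17};
```

`slowdown(dmd_compress, cl_compress)` comes out at roughly 1.06 and `slowdown(dmd_decompress, cl_decompress)` at roughly 1.09, so D takes about 6-9% longer on this one file.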

What we see: different compiler, different story. We should compare compilers for performance, not languages!
There are so many differences between compilers and environments. C++ is definitely much more mature for performance right now, but D has its own benefits too (faster development/debugging and more reliable code).

P.S. BTW, this code can still be optimized quite a lot.