December 17

On Saturday, 16 December 2023 at 23:35:12 UTC, Siarhei Siamashka wrote:

>
| test | normal | cached |
| --- | --- | --- |
| `echo " " >> bench.go && time go run bench.go` | 0.15s | 0.13s |

This is not a correct test for go. You should remove all cached artifacts in ${HOME}/go too.
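For reference, a hedged sketch of what "remove all cached artifacts" could look like for Go (it assumes the default GOPATH of `$HOME/go`; `go clean -cache` and `go clean -modcache` are the standard flags for the build and module caches):

```shell
# Sketch: a fully cold `go run` timing requires clearing Go's caches first.
# (Assumes the default GOPATH of $HOME/go; adjust if GOPATH is set elsewhere.)
clear_go_caches() {
    command -v go >/dev/null 2>&1 || { echo "go not installed; skipping"; return 0; }
    go clean -cache      # build cache (GOCACHE)
    go clean -modcache   # module cache under $HOME/go/pkg/mod
}

cold_go_run() {
    clear_go_caches
    time go run "$1"
}
```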

December 17

On Friday, 8 December 2023 at 10:07:48 UTC, Sergey wrote:

>

On Thursday, 7 December 2023 at 16:32:32 UTC, Witold Baryluk wrote:

>

I do not use dmd -run too often, but recently was checking

Interesting project. Can you put it in a repo? Maybe others will send PRs for other implementations and versions of compilers.
It could be an interesting metric. A repo with a similar idea (compilation time only) can be found here: https://github.com/nordlow/compiler-benchmark

I think a better option than comparing to other languages (too many variables) would be to track the performance of the compiler on a public dashboard.

Compile a few variants, and track compile time to object code, object code size, linking time, final executable size, final executable runtime, and peak memory usage of both the compiler and the executable:

A few variants:

  • No Phobos, just some constructs, plus maybe extern(C) printf for IO. Maybe a few files (one with no templates, one with some templates, another with some mixins and CTFE).
  • A few minor things imported from Phobos (i.e. std.stdio, std.range, and maybe 1 or 2 more things), with some representative functions used from there.
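The metrics listed above could be scraped by a small script. A rough sketch (the dmd invocations and GNU `time -v` usage are illustrative, not a finished harness):

```shell
# Sketch: collect dashboard metrics for one benchmark variant.
# Uses GNU time (/usr/bin/time -v) to capture wall clock and peak RSS.
collect_metrics() {
    command -v dmd >/dev/null 2>&1 || { echo "dmd not installed; skipping"; return 0; }
    [ -x /usr/bin/time ] || { echo "GNU time not installed; skipping"; return 0; }
    src="$1"
    /usr/bin/time -v dmd -c "$src" -ofbench.o 2> compile_stats.txt   # compile time + compiler peak RSS
    wc -c < bench.o                                                  # object code size
    /usr/bin/time -v dmd bench.o -ofbench_bin 2> link_stats.txt      # link time
    wc -c < bench_bin                                                # final executable size
    /usr/bin/time -v ./bench_bin 2> run_stats.txt                    # runtime + program peak RSS
    # The "Maximum resident set size" lines in *_stats.txt give peak memory.
}
```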

Something like what Mozilla had in the past: https://arewefastyet.com/win10/benchmarks/overview?numDays=60 https://awsy.netlify.app/win10/memory/overview?numDays=60

Or similar to https://fast.vlang.io/ for the V programming language (as you can see there, they can compile the entire compiler in about a second, and compile and link a hello world in 90ms, which is actually faster than when they started the project).

December 17

On Sunday, 17 December 2023 at 06:59:58 UTC, Witold Baryluk wrote:

>

On Saturday, 16 December 2023 at 23:35:12 UTC, Siarhei Siamashka wrote:

>

| echo " " >> bench.go && time go run bench.go | 0.15s | 0.13s |

This is not a correct test for go. You should remove all cached artifacts in ${HOME}/go too.

This properly simulates real usage (fast edit + compile + run cycles), whereas removing all cached artifacts in ${HOME}/go does not simulate real usage.

Running touch was not enough to prevent Nim from reusing the cached version. Appending a single space character to the source code on each test iteration resolved this problem.
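A minimal sketch of that invalidation trick (the file name and build command are placeholders): appending a byte changes the file's content, which defeats content-based caches, while `touch` only bumps the mtime and may be ignored:

```shell
# Appending a space changes the source file's content (and content hash),
# which forces caching build tools to recompile. `touch` only updates the
# mtime, which content-based caches (like Nim's) may ignore.
invalidate_and_build() {
    src="$1"; shift
    printf ' ' >> "$src"   # real content change, not just a new mtime
    "$@" "$src"            # e.g. `nim c -r` or `go run`
}
```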

December 17

On Sunday, 17 December 2023 at 06:40:33 UTC, Witold Baryluk wrote:

>

On Saturday, 16 December 2023 at 03:31:05 UTC, Walter Bright wrote:

>

It would be illuminating to compare using printf rather than writefln.

Ok, added `extern(C) int printf(const char *format, ...);`, and measured various variants.

I don't think that there's much practical value in testing printf, because it's not compatible with @safe and can't be recommended for developing normal D applications. This scenario loses touch with reality and becomes way too artificial.

>
  • mold 2.3.3 and mold 2.4.0 segfault when used with dmd or ldc2.

https://github.com/dlang/dmd/pull/15915 improves dmd's compatibility with mold 2.4.0 or maybe even fixes all problems if we are optimistic.

>

[...]

stat("/usr/include/dmd/druntime/import/core/internal/array/comparison.di", 0x7ffecfc2a0f0) = -1 ENOENT (No such file or directory)
stat("/usr/include/dmd/druntime/import/core/internal/array/comparison.d", {st_mode=S_IFREG|0644, st_size=7333, ...}) = 0
stat("/usr/include/dmd/druntime/import/core/internal/array/comparison.d", {st_mode=S_IFREG|0644, st_size=7333, ...}) = 0
openat(AT_FDCWD, "/usr/include/dmd/druntime/import/core/internal/array/comparison.d", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=7333, ...}) = 0
read(3, "/**\n * This module contains comp"..., 7333) = 7333
close(3)

Each of these syscalls takes about 15μs on my system when stracing (probably a little less in a real run without strace overhead).

There should be a way to cut this roughly in half with smarter sequencing (i.e. do the open first, instead of stat + open + fstat).
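A sketch of the open-first idea in C (illustrative only, not dmd's actual lookup code): a failed open() already reports ENOENT, so the separate stat() existence probe disappears, and one fstat() on the open fd provides the size:

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sketch: load a file with a minimal syscall sequence.
 * Instead of stat() to check existence, then open() + fstat(),
 * just open() directly: a failed open() with ENOENT already answers
 * the "does it exist?" question, and a single fstat() on the open fd
 * gives the size. This roughly halves the syscalls on the found-file path. */
static char *load_file(const char *path, size_t *size_out)
{
    int fd = open(path, O_RDONLY);   /* replaces the leading stat() probe */
    if (fd < 0)
        return NULL;                 /* ENOENT etc.: no such usable file */

    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return NULL; }

    char *buf = malloc((size_t)st.st_size + 1);
    ssize_t n = buf ? read(fd, buf, (size_t)st.st_size) : -1;
    close(fd);
    if (n != st.st_size) { free(buf); return NULL; }

    buf[n] = '\0';
    *size_out = (size_t)n;
    return buf;
}
```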

That's an interesting discovery. Now it's necessary to implement a proof of concept patch for dmd to check how much this can actually help in reality. Can you try this?

December 17

On Saturday, 16 December 2023 at 23:35:12 UTC, Siarhei Siamashka wrote:

>

That's what any normal user would see if they compare compilers out of the box in the most straightforward manner without tweaking anything.

Now if we start tweaking things, then it makes sense to use dub or dmd -run instead of rdmd, because rdmd just wastes precious milliseconds for nothing. Then there's shared vs. static Phobos and the possibility to use a faster linker (mold). Here's another comparison table (with a "printf" variant from https://forum.dlang.org/post/pfmgiokvucafwbuldjaj@forum.dlang.org added too), all timings are for running the program immediately after editing its source:

| test | static | shared | static+mold | shared+mold |
| --- | --- | --- | --- | --- |
| rdmd bench_writefln.d | 1.21s | 0.98s | 1.02s | 0.93s |
| dub bench_writefln.d | 0.84s | 0.60s | 0.63s | 0.55s |
| dmd -run bench_writefln.d | 0.80s | 0.55s | 0.60s | 0.51s |
| rdmd bench_writeln.d | 0.60s | 0.38s | 0.43s | 0.34s |
| dub bench_writeln.d | 0.50s | 0.27s | 0.32s | 0.23s |
| dmd -run bench_writeln.d | 0.47s | 0.23s | 0.28s | 0.19s |
| rdmd bench_printf.d | 0.33s | 0.13s | 0.18s | 0.09s |
| dub bench_printf.d | 0.34s | 0.14s | 0.19s | 0.10s |
| dmd -run bench_printf.d | 0.31s | 0.10s | 0.15s | 0.06s |

The top left corner represents the current out of the box experience (rdmd and the static Phobos library linked by bfd). The bottom right corner represents the potential for improvement after tweaking both the code and the compiler setup (dmd -run and the shared Phobos library linked by mold). I still don't think that the printf variant represents typical D code, but the other writefln/writeln variants are legit. Compare this with the Go result (0.15s) from https://forum.dlang.org/post/dcggscrhrtxkyqmkljpm@forum.dlang.org

For this test I rebuilt DMD 2.106.0 from source (make -f posix.mak HOST_DMD=ldmd2 ENABLE_RELEASE=1 ENABLE_LTO=1) with the mold fix applied, and used the LDC 1.32.0 binary release to compile it. This was done to match the configuration of the DMD 2.106.0 binary release as closely as possible. If anyone wants to reproduce this test, please don't forget to recompile Phobos & druntime, because just replacing the DMD binary alone is not enough to make mold work.

December 17

On Sunday, 17 December 2023 at 09:14:31 UTC, Siarhei Siamashka wrote:

>

On Sunday, 17 December 2023 at 06:40:33 UTC, Witold Baryluk wrote:

>

On Saturday, 16 December 2023 at 03:31:05 UTC, Walter Bright wrote:

>

It would be illuminating to compare using printf rather than writefln.

Ok, added extern(C) int printf(const char *format, ...);, and measured various variants.

I don't think that there's much practical value in testing printf, because it's not compatible with @safe and can't be recommended for developing normal D applications. This scenario loses touch with reality and becomes way too artificial.

That is your opinion. It is completely invalid, but your opinion.

December 17

On Sunday, 17 December 2023 at 08:17:18 UTC, Siarhei Siamashka wrote:

>

Running touch was not enough to prevent Nim from reusing the cached version. Appending a single space character to the source code on each test iteration resolved this problem.

This is not how I ran my Nim tests. I cleaned the entire Nim cache instead.

December 17

On Sunday, 17 December 2023 at 12:52:56 UTC, Siarhei Siamashka wrote:

>

On Saturday, 16 December 2023 at 23:35:12 UTC, Siarhei Siamashka wrote:

>

That's what any normal user would see if they compare compilers out of the box in the most straightforward manner without tweaking anything.

Now if we start tweaking things, then it makes sense to use dub or dmd -run instead of rdmd, because rdmd just wastes precious milliseconds for nothing. Then there's shared vs. static Phobos and the possibility to use a faster linker (mold). Here's another comparison table (with a "printf" variant from https://forum.dlang.org/post/pfmgiokvucafwbuldjaj@forum.dlang.org added too), all timings are for running the program immediately after editing its source:

| test | static | shared | static+mold | shared+mold |
| --- | --- | --- | --- | --- |
| rdmd bench_writefln.d | 1.21s | 0.98s | 1.02s | 0.93s |
| dub bench_writefln.d | 0.84s | 0.60s | 0.63s | 0.55s |
| dmd -run bench_writefln.d | 0.80s | 0.55s | 0.60s | 0.51s |
| rdmd bench_writeln.d | 0.60s | 0.38s | 0.43s | 0.34s |
| dub bench_writeln.d | 0.50s | 0.27s | 0.32s | 0.23s |
| dmd -run bench_writeln.d | 0.47s | 0.23s | 0.28s | 0.19s |
| rdmd bench_printf.d | 0.33s | 0.13s | 0.18s | 0.09s |
| dub bench_printf.d | 0.34s | 0.14s | 0.19s | 0.10s |
| dmd -run bench_printf.d | 0.31s | 0.10s | 0.15s | 0.06s |

Thank you for your tests. Quite interesting.

I do recommend running each test many times (way more than a few) and taking the minimum. A tool called hyperfine (actually packaged in many Linux distros) is a good option (do not take the average, take the minimum). If you do not take other precautions (like an idle system, controlling CPU boost frequency, and setting the performance governor), the numbers could be close to meaningless.
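For example, a sketch of such a run with hyperfine (the `--prepare` hook re-dirties the source before every timed run, matching the cache-invalidation trick used earlier in the thread; the benchmarked command is a placeholder):

```shell
# Benchmark an edit+compile+run cycle with hyperfine, invalidating
# content-based caches before each timed run. hyperfine reports the
# mean as well as min/max; per the advice above, read the minimum.
bench_cmd() {
    command -v hyperfine >/dev/null 2>&1 || { echo "hyperfine not installed; skipping"; return 0; }
    src="$1"; shift
    hyperfine \
        --warmup 3 \
        --min-runs 20 \
        --prepare "printf ' ' >> $src" \
        "$*"
}

# Usage (placeholder command):
#   bench_cmd bench_writeln.d dmd -run bench_writeln.d
```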

December 18

On Sunday, 17 December 2023 at 17:37:34 UTC, Witold Baryluk wrote:

>

If you do not take other precautions (like an idle system, controlling CPU boost frequency, and setting the performance governor), the numbers could be close to meaningless.

The numbers are fairly accurate, with ~0.01s precision, which is good enough to see the differences. My computer runs at a constant CPU clock frequency, without turbo boost or cpufreq scaling. And here's possibly the final table, which additionally measures the impact of taking advantage of PGO (the PGO build instructions are at https://forum.dlang.org/post/vbrxpsqqtfelfpcbclpk@forum.dlang.org):

| test | static | shared | shared+pgo | shared+pgo+mold |
| --- | --- | --- | --- | --- |
| rdmd bench_writefln.d | 1.21s | 0.98s | 0.84s | 0.78s |
| dub bench_writefln.d | 0.84s | 0.60s | 0.53s | 0.47s |
| dmd -run bench_writefln.d | 0.80s | 0.55s | 0.49s | 0.43s |
| rdmd bench_writeln.d | 0.60s | 0.38s | 0.35s | 0.30s |
| dub bench_writeln.d | 0.50s | 0.27s | 0.25s | 0.21s |
| dmd -run bench_writeln.d | 0.47s | 0.23s | 0.21s | 0.17s |
| rdmd bench_printf.d | 0.33s | 0.13s | 0.13s | 0.08s |
| dub bench_printf.d | 0.34s | 0.14s | 0.14s | 0.09s |
| dmd -run bench_printf.d | 0.31s | 0.10s | 0.10s | 0.06s |

PGO can potentially be tuned further by training it on a different set of input data (for example, the Phobos code instead of the DMD testsuite). As an extra experiment, I also tried replacing LDC with GDC for compiling DMD, but the resulting compiler was slow. Changing -O2 to -O3 didn't help. And trying to enable LTO when compiling the DMD compiler crashed GDC.
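For anyone wanting to experiment, the generic LDC PGO workflow looks roughly like this (a sketch with placeholder file names, not the exact recipe from the linked post): build instrumented, run a training workload, merge the raw profile, then rebuild with the profile.

```shell
# Generic LDC profile-guided optimization workflow (sketch).
pgo_build() {
    command -v ldc2 >/dev/null 2>&1 || { echo "ldc2 not installed; skipping"; return 0; }
    command -v ldc-profdata >/dev/null 2>&1 || { echo "ldc-profdata not installed; skipping"; return 0; }
    src="$1"
    # 1. Build an instrumented binary that writes a raw profile when run.
    ldc2 -O2 -fprofile-instr-generate=profile.raw "$src" -of=app_instr
    # 2. Training run: exercise a representative workload
    #    (e.g. the DMD testsuite, or Phobos, as discussed above).
    ./app_instr
    # 3. Merge raw profiles into the format the compiler consumes.
    ldc-profdata merge -output=profile.data profile.raw
    # 4. Rebuild using the collected profile.
    ldc2 -O2 -fprofile-instr-use=profile.data "$src" -of=app_pgo
}
```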

December 19

On Friday, 8 December 2023 at 10:07:48 UTC, Sergey wrote:

>

A repo with a similar idea (compilation time only) can be found here: https://github.com/nordlow/compiler-benchmark

I took a look at it, and it's an interesting project, albeit somewhat unpolished. It wasn't exactly clear what the difference between "Check Time", "Compile Time" and "Build Time" in their readme was until I checked the sources. Also, the table is unsorted ("Sort table primarily by build time and then check time" is still in their TODO list). The difference is that they generate a large source file with a lot of functions and calls between them to test how fast a compiler can process it, while here in this thread we are primarily looking at a different use case: a smallish program which imports somewhat largish standard libraries.

Still, enabling PGO for the downloadable DMD compiler binary releases is also going to improve nordlow's benchmark results. And using mold as a faster linker would help them too.