August 14, 2021

On Saturday, 14 August 2021 at 14:14:08 UTC, Guillaume Piolat wrote:

> On Saturday, 14 August 2021 at 02:19:02 UTC, Ki Rill wrote:
>
> > It's a simple benchmark examining:
> >
> >   • execution time (sec)
> >   • memory consumption (KB)
> >   • binary size (KB)
> >   • conciseness of a programming language (lines of code)
> >
> > Link
>
> Using the intel-intrinsics package you can do 4x exp or log operations at once.
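(For readers curious what Guillaume's suggestion looks like in code, here is a minimal sketch. `_mm_setr_ps` and `_mm_storeu_ps` are standard SSE intrinsics that intel-intrinsics provides; the `_mm_exp_ps` name follows Intel's SVML convention, and its exact name and module in the package is an assumption here, so check the package docs.)

```d
// dub dependency: "intel-intrinsics"
import std.stdio : writeln;
import inteli.xmmintrin; // SSE intrinsics (__m128, _mm_setr_ps, _mm_storeu_ps)
// _mm_exp_ps follows Intel's SVML naming; where intel-intrinsics exposes it
// is an assumption in this sketch, so consult the package documentation.

void main()
{
    __m128 x = _mm_setr_ps(0.0f, 1.0f, 2.0f, 3.0f); // four inputs in one register
    __m128 e = _mm_exp_ps(x);                       // 4x exp computed at once
    float[4] result;
    _mm_storeu_ps(result.ptr, e);
    writeln(result); // approximately [1, 2.71828, 7.38906, 20.0855]
}
```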

I know both D and C can theoretically reach the same level of performance, but why does C always lead by a few milliseconds? What is it that we aren't doing? Is it the implementation's fault? The optimizer? What can we do for those precious few milliseconds?

It's so frustrating to see C/C++ always being the winners in the absolute sense, and we always end up making the argument about how much more painstaking it is to actually create a complete program in those languages only for negligibly better performance.

Do these benchmarks even matter if it's all about the quality of implementation?

Sorry if I'm sounding a little bitter.

August 14, 2021

On Saturday, 14 August 2021 at 14:29:16 UTC, max haughton wrote:

> On Saturday, 14 August 2021 at 02:19:02 UTC, Ki Rill wrote:
>
> > [...]
>
> If anyone is wondering why the GDC results look a bit weird: it's because GDC doesn't actually inline unless you compile with LTO or enable whole-program optimization (the rationale is the interaction of linking with templates).
>
> https://godbolt.org/z/Gj8hMjEch (play with removing the '-fwhole-program' flag on that link)

A little more: I got the performance down to be less awful by using LTO (and found an LTO ICE in the process...), but as far as I can tell the limiting factor for GDC is that its standard library doesn't seem to be compiled with either inlining or LTO support enabled by default, so cycles are wasted on (say) calling isNaN, sadly.

I also note that x87 code is generated in Phobos, which could hypothetically be required for the necessary precision on a generic target, but is probably quite slow.
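(To reproduce the inlining behaviour outside Godbolt, a minimal sketch; `toy.d` is a hypothetical file name, while `-fwhole-program` is the real GDC/GCC flag discussed above.)

```d
// toy.d (hypothetical example file)
// Per the post above:
//   gdc -O2 toy.d                  -> the call to f stays an actual call
//   gdc -O2 -fwhole-program toy.d  -> f can be inlined into main
double f(double x) { return x * x + 1.0; }

void main()
{
    import std.stdio : writeln;
    double s = 0;
    foreach (i; 0 .. 1_000_000)
        s += f(i);
    writeln(s);
}
```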

August 14, 2021

On Saturday, 14 August 2021 at 16:20:21 UTC, Tejas wrote:

> It's so frustrating to see C/C++ always being the winners in the absolute sense, and we always end up making the argument about how much more painstaking it is to actually create a complete program in those languages only for negligibly better performance.
>
> Do these benchmarks even matter if it's all about the quality of implementation?

If you pay me, I can produce a faster D version of whatever small program you want.
But the reality is that no one really needs those benchmark programs, and thinking about optimizing them is an adequate punishment for writing them.

The only thing I can think of where C++ could win a bit against D was that the ICC compiler could auto-vectorize transcendentals, like logf in a loop, which LLVM doesn't do.
But the ICC compiler has been moving to LLVM recently.

When your compiler sees the same IR from different front-end languages, in the end it is the same codegen.
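(To make the logf point concrete, here is the shape of the loop in question as a D sketch; with LDC/LLVM each element costs a scalar libc call, which is what classic ICC could replace with SVML vector calls.)

```d
import core.stdc.math : logf;
import std.stdio : writeln;

// LLVM's auto-vectorizer treats logf as an opaque libc call, so this
// stays a scalar loop; classic ICC could rewrite it as 4-/8-wide SVML calls.
void logAll(float[] xs)
{
    foreach (ref x; xs)
        x = logf(x);
}

void main()
{
    float[] data = [1.0f, 2.0f, 4.0f, 8.0f];
    logAll(data);
    writeln(data);
}
```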

August 14, 2021

On Saturday, 14 August 2021 at 19:28:42 UTC, Guillaume Piolat wrote:

> [...]
>
> But the ICC compiler has been moving to LLVM recently.

ICC has moved to LLVM. Past tense now, sadly.

August 15, 2021

On Saturday, 14 August 2021 at 10:26:52 UTC, John Colvin wrote:

> On Saturday, 14 August 2021 at 02:19:02 UTC, Ki Rill wrote:
>
> > [...]
>
> Lots of things to improve there.
>
> https://github.com/rillki/humble-benchmarks/pull/4
>
> A nice quick morning exercise :)

I have added the proposed changes. The performance of D increased to almost that of C, with a ~1-2 second difference when using LDC!

The betterC version is still slightly faster, though.

To sum up:

Clang C            9.1 s
Clang C++          9.4 s
LDC D as betterC  10.3 s
LDC D, libC math  12.2 s
Rust              13 s

Thank you, John, for your invaluable help! I didn't know that Phobos math is twice as slow as libC math.
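(The kind of switch that makes the difference, as a side-by-side sketch; the renamed imports `phobosExp` and `cExp` are just for illustration.)

```d
import std.stdio : writeln;
import std.math : phobosExp = exp;   // Phobos overloads (float/double/real)
import core.stdc.math : cExp = exp;  // plain C double exp, what the C benchmark calls

void main()
{
    double x = 1.5;
    writeln(phobosExp(x)); // may route through higher-precision code paths
    writeln(cExp(x));      // the libc routine
}
```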

August 15, 2021

On Sunday, 15 August 2021 at 09:20:56 UTC, Ki Rill wrote:

> [...]
>
> Thank you, John, for your invaluable help! I didn't know that Phobos math is twice as slow as libC math.

I could be wrong, but I think our routines internally use the max precision when they can, so they are slower, but they are also more precise in the internals (where allowed by the platform). You could probably test this by running these benchmarks on ARM or similar.
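(A quick sketch to check what "max precision" means on the platform at hand; on x86 `real` is the 80-bit x87 format, while other targets map it to plain double or 128-bit quad.)

```d
import std.stdio : writeln;

void main()
{
    // real: 80-bit x87 on x86; plain double or 128-bit quad on various ARM targets
    writeln("real:   ", real.mant_dig, " mantissa bits, ", real.dig, " decimal digits");
    writeln("double: ", double.mant_dig, " mantissa bits, ", double.dig, " decimal digits");
}
```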

August 23, 2021

On Saturday, 14 August 2021 at 02:19:02 UTC, Ki Rill wrote:

> It's a simple benchmark examining: [...]

Interesting. I know people say benchmarks aren't important, but I disagree. I think it's healthy to compare from time to time 👍

August 23, 2021

On Saturday, 14 August 2021 at 14:08:05 UTC, Ki Rill wrote:

> > If you really want performance, you can determine which case applies to your code and then make the underlying .Call yourself. If you don't do that, you're comparing Fisher's exact test against a routine that does a lot more than Fisher's exact test. In any event, you're not comparing against an R implementation of this test.
>
> That is the point of this benchmark: to test it against the Python/R implementations irrespective of what they do additionally. And to test compiled languages in general.

That might have been the point of your benchmark, but that doesn't mean the benchmark is meaningful, in this case for at least three reasons:

  1. You're measuring the performance of completely different tasks in R and C, where the R task is much bigger.
  2. What you've done is only one way to use R. Anyone who wanted performance would use .Call rather than what you're doing.
  3. R has a JIT compiler, and you're likely not making use of it.

The comparison against R is not what you're after anyway. If you don't want to do it in a way that's meaningful - and that's perfectly understandable - it's best to delete it.

August 23, 2021

On Monday, 23 August 2021 at 13:12:21 UTC, bachmeier wrote:

> [...]
>
> R has a JIT compiler, and you're likely not making use of it.
>
> [...]

JIT isn't something you want if you need fast execution time.

And nobody is going to warm up the JIT 1,000,000 times before calling a task; you want the result immediately.

That's why languages like R/C#/Java suck and cheat at benchmarks: they are only reliable when the program calls the same code 100,000,000 times, which never happens, except under heavy load, which also almost never happens for most use cases other than webdev; and even then you get crappy execution time because of cold startup.

This benchmark even mentions it:

> It's a simple benchmark examining:
>
>   • execution time (sec)
>   • memory consumption (KB)
>   • binary size (KB)
>   • conciseness of a programming language (lines of code)

August 23, 2021

On Monday, 23 August 2021 at 07:52:01 UTC, Imperatorn wrote:

> [...]
>
> Interesting. I know people say benchmarks aren't important, but I disagree. I think it's healthy to compare from time to time 👍

I agree