August 14, 2021

On Saturday, 14 August 2021 at 14:14:08 UTC, Guillaume Piolat wrote:

> On Saturday, 14 August 2021 at 02:19:02 UTC, Ki Rill wrote:
>
> > It's a simple benchmark examining:
> >
> >   • execution time (sec)
> >   • memory consumption (KB)
> >   • binary size (KB)
> >   • conciseness of a programming language (lines of code)
> >
> > Link
>
> Using the intel-intrinsics package you can do 4x exp or log operations at once.
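(For readers curious what Guillaume's suggestion looks like in code, here is a minimal sketch. `_mm_setr_ps` and `_mm_storeu_ps` are standard SSE intrinsics that intel-intrinsics provides; the `_mm_exp_ps` name follows Intel's SVML convention, and its exact name and module in the package is an assumption here, so check the package docs.)

```d
// dub dependency: "intel-intrinsics"
import std.stdio : writeln;
import inteli.xmmintrin; // SSE intrinsics (__m128, _mm_setr_ps, _mm_storeu_ps)
// _mm_exp_ps follows Intel's SVML naming; where intel-intrinsics exposes it
// is an assumption in this sketch, so consult the package documentation.

void main()
{
    __m128 x = _mm_setr_ps(0.0f, 1.0f, 2.0f, 3.0f); // four inputs in one register
    __m128 e = _mm_exp_ps(x);                       // 4x exp computed at once
    float[4] result;
    _mm_storeu_ps(result.ptr, e);
    writeln(result); // approximately [1, 2.71828, 7.38906, 20.0855]
}
```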

I know both D and C can theoretically reach the same level of performance, but why does C always lead by a few milliseconds? What is it that we aren't doing? Is it the implementation's fault? The optimizer? What can we do for those precious few milliseconds?

It's so frustrating to see C/C++ always being the winners in the absolute sense, and we always end up making the argument about how much more painstaking it is to actually create a complete program in those languages only for negligibly better performance.

Do these benchmarks even matter if it's all about the quality of implementation?

Sorry if I'm sounding a little bitter.

August 14, 2021

On Saturday, 14 August 2021 at 14:29:16 UTC, max haughton wrote:

> On Saturday, 14 August 2021 at 02:19:02 UTC, Ki Rill wrote:
>
> > [...]
>
> If anyone is wondering why the GDC results look a bit weird: it's because GDC doesn't actually inline unless you compile with LTO or enable whole-program optimization (the rationale is the interaction of linking with templates).
>
> https://godbolt.org/z/Gj8hMjEch (play with removing the '-fwhole-program' flag on that link)

A little more: I got the performance down to be less awful by using LTO (and found an LTO ICE in the process...), but as far as I can tell the limiting factor for GDC is that its standard library doesn't seem to be compiled with either inlining or LTO support enabled by default, so cycles are wasted on (say) calling isNaN, sadly.

I also note that x87 code is generated in Phobos, which could hypothetically be required for the necessary precision on a generic target, but is probably quite slow.
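(To reproduce the inlining behaviour outside Godbolt, a minimal sketch; `toy.d` is a hypothetical file name, while `-fwhole-program` is the real GDC/GCC flag discussed above.)

```d
// toy.d (hypothetical example file)
// Per the post above:
//   gdc -O2 toy.d                  -> the call to f stays an actual call
//   gdc -O2 -fwhole-program toy.d  -> f can be inlined into main
double f(double x) { return x * x + 1.0; }

void main()
{
    import std.stdio : writeln;
    double s = 0;
    foreach (i; 0 .. 1_000_000)
        s += f(i);
    writeln(s);
}
```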

August 14, 2021

On Saturday, 14 August 2021 at 16:20:21 UTC, Tejas wrote:

> It's so frustrating to see C/C++ always being the winners in the absolute sense, and we always end up making the argument about how much more painstaking it is to actually create a complete program in those languages only for negligibly better performance.
>
> Do these benchmarks even matter if it's all about the quality of implementation?

If you pay me, I can produce a faster D version of whatever small program you want.
But the reality is that no one really needs those benchmark programs, and thinking about optimizing them is an adequate punishment for writing them.

The only thing I can think of where C++ could win a bit against D was that the ICC compiler could auto-vectorize transcendentals, like logf in a loop, which LLVM doesn't do.
But the ICC compiler has been moving to LLVM recently.

When your compiler sees the same IR from different front-end languages, in the end it is the same codegen.
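(To make the logf point concrete, here is the shape of the loop in question as a D sketch; with LDC/LLVM each element costs a scalar libc call, which is what classic ICC could replace with SVML vector calls.)

```d
import core.stdc.math : logf;
import std.stdio : writeln;

// LLVM's auto-vectorizer treats logf as an opaque libc call, so this
// stays a scalar loop; classic ICC could rewrite it as 4-/8-wide SVML calls.
void logAll(float[] xs)
{
    foreach (ref x; xs)
        x = logf(x);
}

void main()
{
    float[] data = [1.0f, 2.0f, 4.0f, 8.0f];
    logAll(data);
    writeln(data);
}
```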

August 14, 2021

On Saturday, 14 August 2021 at 19:28:42 UTC, Guillaume Piolat wrote:

> [...]
>
> But the ICC compiler has been moving to LLVM recently.

ICC has moved to LLVM. Past tense now, sadly.

August 15, 2021

On Saturday, 14 August 2021 at 10:26:52 UTC, John Colvin wrote:

> On Saturday, 14 August 2021 at 02:19:02 UTC, Ki Rill wrote:
>
> > [...]
>
> Lots of things to improve there.
>
> https://github.com/rillki/humble-benchmarks/pull/4
>
> A nice quick morning exercise :)

I have added the proposed changes. The performance of D increased to almost that of C, with a ~1-2 second difference when using LDC!

The betterC version is still slightly faster, though.

To sum up:

Clang C            9.1 s
Clang C++          9.4 s
LDC D as betterC  10.3 s
LDC D, libC math  12.2 s
Rust              13 s

Thank you, John, for your invaluable help! I didn't know that Phobos math is twice as slow as libC math.
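(The kind of switch that makes the difference, as a side-by-side sketch; the renamed imports `phobosExp` and `cExp` are just for illustration.)

```d
import std.stdio : writeln;
import std.math : phobosExp = exp;   // Phobos overloads (float/double/real)
import core.stdc.math : cExp = exp;  // plain C double exp, what the C benchmark calls

void main()
{
    double x = 1.5;
    writeln(phobosExp(x)); // may route through higher-precision code paths
    writeln(cExp(x));      // the libc routine
}
```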

August 15, 2021

On Sunday, 15 August 2021 at 09:20:56 UTC, Ki Rill wrote:

> [...]
>
> Thank you, John, for your invaluable help! I didn't know that Phobos math is twice as slow as libC math.

I could be wrong, but I think our routines internally use the max precision when they can, so they are slower, but they are also more precise in the internals (where allowed by the platform). You could probably test this by running these benchmarks on ARM or similar.
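(A quick sketch to check what "max precision" means on the platform at hand; on x86 `real` is the 80-bit x87 format, while other targets map it to plain double or 128-bit quad.)

```d
import std.stdio : writeln;

void main()
{
    // real: 80-bit x87 on x86; plain double or 128-bit quad on various ARM targets
    writeln("real:   ", real.mant_dig, " mantissa bits, ", real.dig, " decimal digits");
    writeln("double: ", double.mant_dig, " mantissa bits, ", double.dig, " decimal digits");
}
```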

August 23, 2021

On Saturday, 14 August 2021 at 02:19:02 UTC, Ki Rill wrote:

> It's a simple benchmark examining: [...]

Interesting. I know people say benchmarks aren't important, but I disagree. I think it's healthy to compare from time to time 👍

August 23, 2021

On Saturday, 14 August 2021 at 14:08:05 UTC, Ki Rill wrote:

> > If you really want performance, you can determine which case applies to your code and then make the underlying .Call yourself. If you don't do that, you're comparing Fisher's exact test against a routine that does a lot more than Fisher's exact test. In any event, you're not comparing against an R implementation of this test.
>
> That is the point of this benchmark: to test it against the Python/R implementations irrespective of what they do additionally. And to test compiled languages in general.

That might have been the point of your benchmark, but that doesn't mean the benchmark is meaningful, in this case for at least three reasons:

  1. You're measuring the performance of completely different tasks in R and C, where the R task is much bigger.
  2. What you've done is only one way to use R. Anyone who wanted performance would use .Call rather than what you're doing.
  3. R has a JIT compiler, and you're likely not making use of it.

The comparison against R is not what you're after anyway. If you don't want to do it in a way that's meaningful - and that's perfectly understandable - it's best to delete it.

August 23, 2021

On Monday, 23 August 2021 at 13:12:21 UTC, bachmeier wrote:

> [...]
>
> R has a JIT compiler, and you're likely not making use of it.
>
> [...]

JIT isn't something you want if you need fast execution time.

And nobody is going to warm up the JIT 1,000,000 times before calling a task; you want the result immediately.

That's why languages like R/C#/Java suck and cheat at benchmarks: they are only reliable when the program calls the same code 100,000,000 times, which never happens, except under heavy load, which also almost never happens for most use cases other than webdev; and even then you get crappy execution time because of cold startup.

This benchmark even mentions it:

> It's a simple benchmark examining:
>
>   • execution time (sec)
>   • memory consumption (KB)
>   • binary size (KB)
>   • conciseness of a programming language (lines of code)

August 23, 2021

On Monday, 23 August 2021 at 07:52:01 UTC, Imperatorn wrote:

> [...]
>
> Interesting. I know people say benchmarks aren't important, but I disagree. I think it's healthy to compare from time to time 👍

I agree