August 23, 2021
On Mon, Aug 23, 2021 at 05:37:01PM +0000, russhy via Digitalmars-d wrote:
> On Monday, 23 August 2021 at 07:52:01 UTC, Imperatorn wrote:
[...]
> > Interesting. I know people say benchmarks aren't important, but I disagree. I think it's healthy to compare from time to time 👍
> 
> I agree

I wouldn't say benchmarks aren't *important*; I think the issue is how you interpret the results.  Sometimes people read too much into them, i.e., over-generalize them far beyond the scope of the actual test, forgetting that a benchmark is just that: a benchmark. It's a rough estimate of performance in a contrived use case that may or may not reflect real-world usage.

Many of the simpler benchmarks consist of repeating a task 100000... times, which can be useful to measure, but hardly represents real-world usage.  You can optimize the heck out of memcpy (substitute your favorite function to measure here) until it beats everybody else in the benchmark, but that does not necessarily mean it will actually make your real-life program run faster. It may, it may not.  By about the 100th iteration, all the code/data you're using has been pulled into cache, so it runs much faster than the first few iterations.  But in a real program you usually don't call memcpy (etc.) 100000 times in a row. Usually you do something else in between, and that something else may cause the memcpy-related data to be evicted from cache, change the state of the branch predictor, etc.  So your ultra-optimized code may not behave the same way in a real-life program as it does in the benchmark, and may turn out to be slower than expected.
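
To make the point concrete, here is a minimal sketch (in D, with an arbitrary buffer size and iteration count of my choosing) of the kind of micro-benchmark being described; after the first few iterations everything is hot in cache, so the average time it reports can look much better than what a real program would see:

    import core.stdc.string : memcpy;
    import std.datetime.stopwatch : AutoStart, StopWatch;
    import std.stdio : writefln;

    void main()
    {
        enum iterations = 100_000;           // repeat the task many times
        auto src = new ubyte[](4096);
        auto dst = new ubyte[](4096);

        auto sw = StopWatch(AutoStart.yes);
        foreach (i; 0 .. iterations)
            memcpy(dst.ptr, src.ptr, src.length);  // buffers stay cache-hot
        sw.stop();

        writefln("%s copies in %s usecs", iterations, sw.peek.total!"usecs");
    }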

Note that this does not mean optimizing for a benchmark is completely worthless; usually the first few iterations represent actual bottlenecks in the code, and improving those will improve performance in a real-life scenario too.  But only up to a certain point.  Try to optimize beyond that, and you risk optimizing for your specific benchmark instead of the general use case, and may end up pessimizing the code instead of optimizing it.

Real-life example: GNU wc, which counts the number of lines in a text file.  Some time ago I compared it against several different D implementations of the same program that I wrote, and discovered that because GNU wc uses glibc's memchr, which is ultra-optimized for scanning large buffers, wc runs faster when your text file contains long lines; but when the text file contains many short lines, a naïve File.byLine / foreach (ch; line) D implementation will actually beat wc. The reason is that glibc's memchr implementation is optimized for large buffers and has a rather expensive setup overhead before its main fast-scanning loop.  For long lines, the fast scan more than outweighs this overhead, but for short lines the overhead dominates the running time, so a naïve char-by-char implementation ends up faster.
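
For reference, a rough sketch of what such a naïve byLine / char-by-char implementation might look like (a reconstruction for illustration, not the actual program from that comparison), counting lines and words with a plain per-character scan and no memchr-style fast path:

    import std.ascii : isWhite;
    import std.stdio : File, writefln;

    void main(string[] args)
    {
        size_t lines, words;
        foreach (line; File(args[1]).byLine)    // byLine splits on '\n' for us
        {
            lines++;
            bool inWord = false;
            foreach (ch; line)                  // the naive char-by-char scan
            {
                if (isWhite(ch))
                    inWord = false;
                else if (!inWord)
                {
                    inWord = true;
                    words++;
                }
            }
        }
        writefln("%s lines, %s words", lines, words);
    }

There is no per-line setup cost here, which is why this kind of loop holds up well on inputs with many short lines.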

I don't know the history behind glibc's memchr implementation, but I'll bet that at some point somebody came up with a benchmark that tests scanning of large buffers and said: look, this complicated bit-hack fast-scanning loop improves the benchmark by X%!  So the code change was accepted, without anyone realizing that this fast scanning of large buffers comes at the cost of pessimizing the small-buffer case.

Past a certain point, optimization becomes a trade-off, and which route you take depends on your specific application, not on some general benchmark that does not represent real-world usage patterns.


T

-- 
Famous last words: I *think* this will work...
August 23, 2021

On Monday, 23 August 2021 at 17:35:59 UTC, russhy wrote:
> JIT isn't something you want if you need fast execution time

?

I suppose they spent all those hours writing their JIT compilers because they had nothing else to do with their time.

August 23, 2021

On Monday, 23 August 2021 at 22:06:39 UTC, bachmeier wrote:
> On Monday, 23 August 2021 at 17:35:59 UTC, russhy wrote:
>> JIT isn't something you want if you need fast execution time
>
> ?
>
> I suppose they spent all those hours writing their JIT compilers because they had nothing else to do with their time.

that's why they are now spending their time writing an AOT compiler, after Go started to eat their cake ;)

August 24, 2021

On Monday, 23 August 2021 at 22:27:02 UTC, russhy wrote:
> that's why they are now spending their time writing an AOT compiler, after Go started to eat their cake ;)

What does Go have to do with AOT development?

August 24, 2021

On Monday, 23 August 2021 at 22:27:02 UTC, russhy wrote:
> On Monday, 23 August 2021 at 22:06:39 UTC, bachmeier wrote:
>> On Monday, 23 August 2021 at 17:35:59 UTC, russhy wrote:
>>> JIT isn't something you want if you need fast execution time
>>
>> ?
>>
>> I suppose they spent all those hours writing their JIT compilers because they had nothing else to do with their time.
>
> that's why they are now spending their time writing an AOT compiler, after Go started to eat their cake ;)

AOT in C#/Java is only there to speed up startup times. It doesn't make anything faster; see https://benchmarksgame-team.pages.debian.net/benchmarksgame/fastest/csharpcore-csharpaot.html

August 24, 2021

On Monday, 23 August 2021 at 22:06:39 UTC, bachmeier wrote:
> On Monday, 23 August 2021 at 17:35:59 UTC, russhy wrote:
>> JIT isn't something you want if you need fast execution time
>
> ?
>
> I suppose they spent all those hours writing their JIT compilers because they had nothing else to do with their time.

JIT is a very good compromise for programming languages that are intended for application development as well as server-side development.

C# makes it very easy to call C functions, and the Java people are also working on making calls to C functions easy.

For application development, using a JIT and moving performance-intensive features into C code is an approach that had already been shown to work well some 20 years ago, back when Smalltalk was doing well commercially.

August 24, 2021

On Tuesday, 24 August 2021 at 09:29:03 UTC, Bienlein wrote:
> On Monday, 23 August 2021 at 22:27:02 UTC, russhy wrote:
>> On Monday, 23 August 2021 at 22:06:39 UTC, bachmeier wrote:
>>> On Monday, 23 August 2021 at 17:35:59 UTC, russhy wrote:
>>>> JIT isn't something you want if you need fast execution time
>>>
>>> ?
>>>
>>> I suppose they spent all those hours writing their JIT compilers because they had nothing else to do with their time.
>>
>> that's why they are now spending their time writing an AOT compiler, after Go started to eat their cake ;)
>
> AOT in C#/Java is only there to speed up startup times. It doesn't make anything faster; see https://benchmarksgame-team.pages.debian.net/benchmarksgame/fastest/csharpcore-csharpaot.html

R2R is not true AOT; it still ships the JIT and IL and recompiles code at runtime. That benchmarks game comparison is flawed and a pure lie.

For true AOT you need to use NativeAOT (the old CoreRT).

August 24, 2021

On Tuesday, 24 August 2021 at 09:29:03 UTC, Bienlein wrote:
> On Monday, 23 August 2021 at 22:27:02 UTC, russhy wrote:
>> On Monday, 23 August 2021 at 22:06:39 UTC, bachmeier wrote:
>>> On Monday, 23 August 2021 at 17:35:59 UTC, russhy wrote:
>>>> JIT isn't something you want if you need fast execution time
>>>
>>> ?
>>>
>>> I suppose they spent all those hours writing their JIT compilers because they had nothing else to do with their time.
>>
>> that's why they are now spending their time writing an AOT compiler, after Go started to eat their cake ;)
>
> AOT in C#/Java is only there to speed up startup times. It doesn't make anything faster; see https://benchmarksgame-team.pages.debian.net/benchmarksgame/fastest/csharpcore-csharpaot.html

That doesn't use all the AOT options available to C# and Java.

C# has Xamarin AOT (for Android and iOS), IL2CPP used by Unity and the Burst compiler, UWP .NET Native, Windows 8 Bartok, and several community projects.

Full AOT on regular .NET is coming in .NET 6, with final touches in .NET 7.

Regarding Java, like C and C++ there are plenty of implementations to choose from, including a couple of commercial JDKs with proper AOT.

Both OpenJDK and OpenJ9 also cache JIT output between runs, and that cache improves each time the application runs thanks to PGO profiles.

OpenJ9 and Azul go one step further by having AOT/JIT compiler daemons that generate native code with PGO data from the whole cluster.

Finally Android, despite not really being Java, uses a hand-written assembly interpreter for fast startup, then a JIT, and when the device is idle the JITed code gets AOT-compiled with the PGO data gathered during each run. Starting with Android 10, the PGO profiles are shared across devices via the Play Store, so AOT compilation can be done right away, skipping the whole interpreter/JIT step.

Really, the benchmarks game is a joke, because they only use the basic FOSS tooling available to them.

And after 20-25 years, apparently plenty of people still don't know the Java and .NET ecosystems as well as they should.

August 24, 2021

On Tuesday, 24 August 2021 at 20:00:06 UTC, Paulo Pinto wrote:
> On Tuesday, 24 August 2021 at 09:29:03 UTC, Bienlein wrote:
>> On Monday, 23 August 2021 at 22:27:02 UTC, russhy wrote:
>>> On Monday, 23 August 2021 at 22:06:39 UTC, bachmeier wrote:
>>>> On Monday, 23 August 2021 at 17:35:59 UTC, russhy wrote:
>>>>> JIT isn't something you want if you need fast execution time
>>>>
>>>> ?
>>>>
>>>> I suppose they spent all those hours writing their JIT compilers because they had nothing else to do with their time.
>>>
>>> that's why they are now spending their time writing an AOT compiler, after Go started to eat their cake ;)
>>
>> AOT in C#/Java is only there to speed up startup times. It doesn't make anything faster; see https://benchmarksgame-team.pages.debian.net/benchmarksgame/fastest/csharpcore-csharpaot.html
>
> That doesn't use all the AOT options available to C# and Java.
>
> C# has Xamarin AOT (for Android and iOS), IL2CPP used by Unity and the Burst compiler, UWP .NET Native, Windows 8 Bartok, and several community projects.
>
> Full AOT on regular .NET is coming in .NET 6, with final touches in .NET 7.
>
> Regarding Java, like C and C++ there are plenty of implementations to choose from, including a couple of commercial JDKs with proper AOT.
>
> Both OpenJDK and OpenJ9 also cache JIT output between runs, and that cache improves each time the application runs thanks to PGO profiles.
>
> OpenJ9 and Azul go one step further by having AOT/JIT compiler daemons that generate native code with PGO data from the whole cluster.
>
> Finally Android, despite not really being Java, uses a hand-written assembly interpreter for fast startup, then a JIT, and when the device is idle the JITed code gets AOT-compiled with the PGO data gathered during each run. Starting with Android 10, the PGO profiles are shared across devices via the Play Store, so AOT compilation can be done right away, skipping the whole interpreter/JIT step.
>
> Really, the benchmarks game is a joke, because they only use the basic FOSS tooling available to them.
>
> And after 20-25 years, apparently plenty of people still don't know the Java and .NET ecosystems as well as they should.

Android is the worst of all.

A compiler on a supposedly resource-constrained device running on a limited power source (a battery). Good on Apple for forbidding JIT in their store; one of their best moves.
