December 22, 2010 Re: Why is D slower than LuaJIT? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Steven Schveighoffer | Steven Schveighoffer wrote:
> I would guess this has something to do with
> the lack of inlining for algorithmic functions.
Yeah, this is almost certainly the problem. I rewrote the code using a traditional C style loop, no external functions, and I'm getting roughly equal performance.
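For reference, here is a minimal sketch of the two styles being compared. The range version follows the benchmark from the original post; the loop version is only an assumption about what a "traditional C style loop" rewrite might look like, since the rewritten code was not posted. The names `sumRanges` and `sumLoop` are made up for illustration.

```d
import std.algorithm : map, reduce;
import std.range : iota;
import std.stdio : writeln;

// Range-based version, following the benchmark in the original post.
// Assumes n >= 1: reduce without a seed needs a non-empty range.
double sumRanges(double n)
{
    auto L  = iota(0.0, n);
    auto L2 = map!"a / 2"(L);
    auto L3 = map!"a + 2"(L2);
    return reduce!"a + b"(L3);
}

// Hypothetical hand-written equivalent: one loop, no library calls,
// so nothing depends on the compiler inlining the range primitives.
double sumLoop(double n)
{
    double v = 0.0;
    for (double a = 0.0; a < n; a += 1.0)
        v += a / 2 + 2;
    return v;
}

void main()
{
    writeln(sumRanges(10_000_000.0));
    writeln(sumLoop(10_000_000.0));
}
```

Both functions compute the same sum, so any timing difference between them comes from code generation rather than from the algorithm. With dmd, building with `-O -release -inline` is the usual starting point for such a comparison.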
|
December 22, 2010 Re: Why is D slower than LuaJIT? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Adam D. Ruppe | Adam D. Ruppe Wrote:
> Steven Schveighoffer wrote:
> > I would guess this has something to do with
> > the lack of inlining for algorithmic functions.
>
> Yeah, this is almost certainly the problem. I rewrote the code using a traditional C style loop, no external functions, and I'm getting roughly equal performance.
So is it justified enough to throw my W's incompetence card on the table at this point? How else it is possible that a simple scripting language with simple JIT optimization heuristics can outperform a performance oriented systems programming language. It seems most D design decisions are based on the perceived performance value (not as aggressively as in C++ groups). I'd like to see how this theory doesn't hold water now?
|
December 22, 2010 Re: Why is D slower than LuaJIT? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Gary Whatmore | Gary Whatmore Wrote:
> Andreas Mayer Wrote:
>
> > To see what performance advantage D would give me over using a scripting language, I made a small benchmark. It consists of this code:
> >
> > > auto L = iota(0.0, 10000000.0);
> > > auto L2 = map!"a / 2"(L);
> > > auto L3 = map!"a + 2"(L2);
> > > auto V = reduce!"a + b"(L3);
>
> First note: this is a synthetic toy benchmark. Take it with a grain of salt. It represent in no way the true state of D.
True enough. Yet it doesn't make me very optimistic what the final performance tradeoff would be as long as you use high level abstractions. Sure, with D you can always go on the C or even assembly level.
> Your mp3 player or file system was doing stuff while executing the benchmark. You probably don't know how to run the test many times and use the average/minimum result for both languages. For example D does not have JIT startup cost so take the minimum result for D, JIT has varying startup speed so take the average or slowest result for Luajit. Compare these. More fair for native code D.
Both benchmarks were run under the same conditions. Once the executables were inside the disk cache, the run times didn't vary much. Plus this benchmark already is unfair against LuaJIT: the startup time and the time needed for optimization and code generation are included in the times I gave. The D example on the other hand doesn't include the time needed for compilation. The D compiler needs 360 ms to compile this example. If the comparison were fair and included compilation time in the D timings, D would lose even more.
> My guesses are:
>
> 1) you didn't even test this and didn't use optimizations. -> User error
I enabled all dmd optimizations I was aware of. Maybe I forgot some?
> 2) whenever doing benchmarks you must compare the competing stuff against all D compilers, cut and paste the object code of different compilers and manually build the fastest executable.
That seems like an unreasonable task. Writing the code in assembler would be simpler. But I'm using a high level language because I want to use high level abstractions. Like map and reduce, instead of writing assembler.
> 3) you didn't use inline assembler or profiler for D
See 2).
> 4) you were using unstable Phobos functions. There is no doubt the final Phobos 2.0 will beat Luajit. D *is* a compiler statical language, Luajit just a joke.
I used the latest dmd release (and that is very new). As you can see, LuaJIT beats D by far. I wouldn't call it a joke. If a joke beats D, then what is D? This way of argumentation doesn't sound very advantageous for you.
> 5) you were using old d runtime garbage collector. One fellow here made a precise state of the art GC which beats even Java's 20 year old GC and C#. Patch your dmd to use this instead.
There shouldn't be any GC activity. Ranges work lazily. They don't allocate arrays for the data they are working on. You can post a package with "bleeding edge" dmd and Phobos sources with updated GC and so on. Then I could try that.
>
> Not intending to start a religious war but if your native code runs slower than *JIT* code, you're doing something wrong. D will always beat JIT. Lua is also a joke language, D is for high performance servers and operating systems. In the worst case, disassemble the luajit program, steal its codes and write it using inline assembler in D. D must win these performance battles. It's technically superior.
But D didn't win. Not here. And what was I doing wrong? Please point out. I posted this because I was surprised myself and I thought "that can't be".
|
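The "run it several times and take the minimum" methodology discussed above could be sketched roughly as follows. This assumes a recent Phobos, where `std.datetime.stopwatch` is available (it did not exist in this form in 2010), and the names `pipeline` and `bestOfMsecs` are hypothetical. It measures only the time spent inside the already-running D process, so it says nothing about the startup or compilation costs mentioned in the post.

```d
import std.algorithm : map, min, reduce;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.range : iota;
import std.stdio : writefln;

// The benchmark pipeline from the thread, wrapped in a function.
// Assumes n >= 1 because reduce is called without a seed.
double pipeline(double n)
{
    return reduce!"a + b"(map!"a + 2"(map!"a / 2"(iota(0.0, n))));
}

// Runs the pipeline `runs` times (assumed >= 1) and returns the fastest
// wall-clock time in milliseconds. The computed sum is handed back through
// `result` so the work cannot be discarded as unused.
long bestOfMsecs(int runs, double n, out double result)
{
    long best = long.max;
    foreach (i; 0 .. runs)
    {
        auto sw = StopWatch(AutoStart.yes);
        result = pipeline(n);
        sw.stop();
        best = min(best, sw.peek.total!"msecs");
    }
    return best;
}

void main()
{
    double v;
    immutable ms = bestOfMsecs(5, 10_000_000.0, v);
    writefln("best of 5 runs: %s ms (result %s)", ms, v);
}
```

Taking the minimum filters out interference from other processes; whether the minimum or the average is the fairer number for a JIT-compiled competitor is exactly the point under dispute in this exchange.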
December 22, 2010 Re: Why is D slower than LuaJIT? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andreas Mayer | Andreas Mayer wrote:
> Or is D, unlike I thought, not suitable for high performance computing? What
> should I do?
I notice you are using doubles in D. dmd currently uses the x87 to evaluate doubles, and on some processors the x87 is slow relative to using the XMM instructions. Also, dmd's back end doesn't align the doubles on 16 byte boundaries, which can also slow down the floating point on some processors.
Both of these code gen issues with dmd are well known, and I'd like to solve them after we address higher priority issues.
If it's not clear, I'd like to emphasize that these are compiler issues, not D language issues.
|
December 22, 2010 Re: Why is D slower than LuaJIT? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | Walter Bright Wrote:
> I notice you are using doubles in D. dmd currently uses the x87 to evaluate doubles, and on some processors the x87 is slow relative to using the XMM instructions. Also, dmd's back end doesn't align the doubles on 16 byte boundaries, which can also slow down the floating point on some processors.
Using long instead of double, it is still slower than LuaJIT (223 ms on my machine). Even with int it still takes 101 ms and is at least 3x slower than LuaJIT.
> Both of these code gen issues with dmd are well known, and I'd like to solve them after we address higher priority issues.
>
> If it's not clear, I'd like to emphasize that these are compiler issues, not D language issues.
I shouldn't use D now? How long until it is ready?
|
December 23, 2010 Re: Why is D slower than LuaJIT? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | Walter Bright wrote:
> Andreas Mayer wrote:
>> Or is D, unlike I thought, not suitable for high performance computing? What
>> should I do?
I forgot to mention. In the D version, use integers as a loop counter, not doubles.
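One possible reading of this advice, applied to the range version of the benchmark, is sketched below. It is an illustration only, not code from the thread: the name `sumIntCounter` is made up, and the division is written as `a / 2.0` so that the int elements are promoted to double instead of being truncated by integer division.

```d
import std.algorithm : map, reduce;
import std.range : iota;
import std.stdio : writeln;

// Same sum as the original benchmark, but the counter produced by iota
// is an int; only the arithmetic on each element is done in double.
double sumIntCounter(int n)
{
    auto halves  = map!"a / 2.0"(iota(0, n)); // int / double -> double
    auto shifted = map!"a + 2"(halves);
    // A double seed keeps the accumulator in floating point and also
    // makes the call valid for an empty range.
    return reduce!"a + b"(0.0, shifted);
}

void main()
{
    writeln(sumIntCounter(10_000_000));
}
```

With an integer counter, the loop test and increment no longer go through the floating-point unit; the per-element conversion to double remains.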
|
December 23, 2010 Re: Why is D slower than LuaJIT? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andreas Mayer | Andreas Mayer:
> To see what performance advantage D would give me over using a scripting language, I made a small benchmark. It consists of this code:
I have done (and I am doing) many benchmarks with D, and I too have seen similar results. I have discussed this topic two times in past, this was one time:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=110419
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=110420
For Floating-point-heavy code Lua-JIT is often faster than D compiled with DMD. I have found that on the SciMark2 benchmark too Lua-JIT is faster than D code compiled with DMD. On the other hand if I use LDC I am often able to beat LuaJIT 2.0.0-beta3 (we are now at beta5) (if the D code doesn't ask for too much inlining).
The Lua-JIT is written by a very smart person, maybe a kind of genius that has recently given ideas to designers of V8 and Firefox JS Engine. The LuaJIT uses very well SSE registers and being a JIT it has more runtime information about the code, so it is able to optimize it better. It unrolls dynamically, inlines dynamic things, etc.
DMD doesn't perform enough optimizations. Keep in mind that the main purpose of DMD is now to finish implementing D (and sometimes to find what to implement! Because there are some unfinished corners in D design). Performance tuning is mostly for later.
-------------------------
Walter Bright:
>If it's not clear, I'd like to emphasize that these are compiler issues, not D language issues.<
Surely Lua looks like a far worse language regarding optimization opportunities. But people around here (like you) must start to realize that JIT compilation is not what it used to be.
Today the JIT compilation done by the JavaVM is able to perform de-virtualization, dynamic loop unrolling, inlining across "compilation units", and some other optimizations that despite are "not language issues" are not done or not done enough by static compilers like LDC, GCC, DMD. The result is that SciMark2 benchmark is about as fast in Java and C, and for some sub-benchmarks it is faster :-)
Bye,
bearophile
|
December 23, 2010 Re: Why is D slower than LuaJIT? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Iain Buclaw | Iain Buclaw Wrote:
> Another may be simply that there is a lot
> more going on behind the scenes than what you give credit for in D.
What else does it do? I want to add it to the Lua version.
|
December 23, 2010 Re: Why is D slower than LuaJIT? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andreas Mayer | Andreas Mayer wrote:
> I shouldn't use D now? How long until it is ready?
It depends on what you want to do. A lot of people are successfully using D.
|
December 23, 2010 Re: Why is D slower than LuaJIT? | ||||
---|---|---|---|---|
| ||||
Posted in reply to loser | loser:
> So is it justified enough to throw my W's incompetence card on the table at this point? How else it is possible that a simple scripting language with simple JIT optimization heuristics can outperform a performance oriented systems programming language.
It's not wise to prematurely improve the inlining a lot right now when there is no 64 bit version yet, and there are holes or missing parts in several corners of the language. Performance tuning has a lower priority.
Designing a good language and performance-tuning its implementation ask for different skills. The very good author of Lua-JIT is probably not good at designing a C++-class language :-) What's needed now is to smooth the rough corners of the D language, not to squeeze out every bit of performance.
Bye,
bearophile
|