December 23, 2010 Re: Why is D slower than LuaJIT? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andreas Mayer | On 12/22/10 11:06 PM, Andreas Mayer wrote:
> Andrei Alexandrescu Wrote:
>
>> Andreas, any chance you could run this on your machine and compare it with Lua? (I don't have Lua installed.) Thanks!
>
> Your version: 40 ms (iota and baseline give the same timings)
> LuaJIT with map calls removed: 21 ms
>
> Interesting results.
Cool, thanks. I also tested against this C++ baseline:
    #include <stdio.h>

    int main() {
        const double limit = 10000000.0;
        double result = 0.0;
        for (double i = 0; i != limit; ++i) {
            result += i;
        }
        printf("%f\n", result);
    }
The baseline (compiled with -O3) runs in 21 ms on my machine, which means (if my and Andreas' machines are similar in performance) that Lua has essentially native performance for this loop and D has an issue in code generation that makes it 2x slower. I think this could be filed as a performance bug for dmd.
I'm thinking what to do about iota, which has good features but exacts too much cost on tight loop performance. One solution would be to define iota to be the simple, forward range that I defined as Iota2 in my previous post. Then, we need a different name for the full-fledged iota (random-access, has known length, iterates through the same numbers forward and backward etc). Ideas?
Andrei
|
December 23, 2010 Re: Why is D slower than LuaJIT? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Wed, 22 Dec 2010, Andrei Alexandrescu wrote:
> On 12/22/10 11:06 PM, Andreas Mayer wrote:
> > Andrei Alexandrescu Wrote:
> >
> > > Andreas, any chance you could run this on your machine and compare it with Lua? (I don't have Lua installed.) Thanks!
> >
> > Your version: 40 ms (iota and baseline give the same timings)
> > LuaJIT with map calls removed: 21 ms
> >
> > Interesting results.
>
> Cool, thanks. I also tested against this C++ baseline:
>
> #include <stdio.h>
>
> int main() {
> const double limit = 10000000.0;
> double result = 0.0;
> for (double i = 0; i != limit; ++i) {
> result += i;
> }
> printf("%f\n", result);
> }
>
> The baseline (compiled with -O3) runs in 21 ms on my machine, which means (if my and Andreas' machines are similar in performance) that Lua has essentially native performance for this loop and D has an issue in code generation that makes it 2x slower. I think this could be filed as a performance bug for dmd.
>
> I'm thinking what to do about iota, which has good features but exacts too much cost on tight loop performance. One solution would be to define iota to be the simple, forward range that I defined as Iota2 in my previous post. Then, we need a different name for the full-fledged iota (random-access, has known length, iterates through the same numbers forward and backward etc). Ideas?
>
>
> Andrei
Since the timing code isn't here, I'm assuming you guys are doing the testing around the whole app. While that might be interesting, it's hiding an awfully large and important difference, application startup time.
C has very little, D quite a bit more, and I don't know what Lua looks like there. If the goal is to test this math code, you'll need to separate the two.
At this point, I highly suspect you're really measuring the runtime costs.
|
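One way to separate the two, as a minimal sketch: start the clock inside main, after the runtime has initialized, and time only the loop. This assumes a current Phobos with std.datetime.stopwatch; the helper name sumBelow is made up for illustration and is not from the thread.

```d
import std.datetime.stopwatch : StopWatch, AutoStart;
import std.stdio : writefln;

// Hypothetical helper: the same loop as the C++ baseline quoted above.
double sumBelow(double limit)
{
    double result = 0.0;
    for (double i = 0; i != limit; ++i)
        result += i;
    return result;
}

void main()
{
    // The stopwatch starts here, so process and runtime startup are excluded.
    auto sw = StopWatch(AutoStart.yes);
    immutable result = sumBelow(10_000_000.0);
    sw.stop();

    // Printing the result keeps the loop from being discarded as dead code.
    writefln("result = %f", result);
    writefln("loop only: %s ms", sw.peek.total!"msecs");
}
```

Comparing this figure with the wall-clock time of the whole process gives a rough idea of how much of the measurement is startup rather than the loop.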
December 23, 2010 Re: Why is D slower than LuaJIT? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andreas Mayer | Andreas Mayer wrote:
> Walter Bright Wrote:
>
>> I notice you are using doubles in D. dmd currently uses the x87 to evaluate doubles, and on some processors the x87 is slow relative to using the XMM instructions. Also, dmd's back end doesn't align the doubles on 16 byte boundaries, which can also slow down the floating point on some processors.
>
> Using long instead of double, it is still slower than LuaJIT (223 ms on my machine). Even with int it still takes 101 ms and is at least 3x slower than LuaJIT.
>
>> Both of these code gen issues with dmd are well known, and I'd like to solve them after we address higher priority issues.
>>
>> If it's not clear, I'd like to emphasize that these are compiler issues, not D language issues.
>
> I shouldn't use D now? How long until it is ready?
You may want to explore the great language shootout before drawing that conclusion: http://shootout.alioth.debian.org/
LuaJIT ranks high there, but still a bit below the fastest compiled languages (and the fastest Java). D is not included anymore, but it once was and these benchmarks can still be found: http://shootout.alioth.debian.org/debian/performance.php
LuaJIT performance is impressive, far above any 'scripting' language. Just look at some numbers in the shootout comparing it to Ruby or Python.
|
December 23, 2010 Re: Why is D slower than LuaJIT? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Lutger Blijdestijn | I meant to link this; it includes all benchmarks and ranks gdc in 5th place and dmd in 8th (from 2008): http://shootout.alioth.debian.org/debian/benchmark.php?test=all&lang=all |
December 23, 2010 Re: Why is D slower than LuaJIT? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | Andrei:
> I'm thinking what to do about iota, which has good features but exacts too much cost on tight loop performance. One solution would be to define iota to be the simple, forward range that I defined as Iota2 in my previous post. Then, we need a different name for the full-fledged iota (random-access, has known length, iterates through the same numbers forward and backward etc). Ideas?
Is improving the compiler instead an option?
Bye,
bearophile
|
December 23, 2010 Re: Why is D slower than LuaJIT? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andreas Mayer | On 12/22/2010 11:04 PM, Andreas Mayer wrote:
> To see what performance advantage D would give me over using a scripting language, I made a small benchmark. It consists of this code:
>
>> auto L = iota(0.0, 10000000.0);
>> auto L2 = map!"a / 2"(L);
>> auto L3 = map!"a + 2"(L2);
>> auto V = reduce!"a + b"(L3);
>
> It runs in 281 ms on my computer.
>
> The same code in Lua (using LuaJIT) runs in 23 ms.
>
> That's about 10 times faster. I would have expected D to be faster. Did I do something wrong?
>
> The first Lua version uses a simplified design. I thought maybe that is unfair to ranges, which are more complicated. You could argue ranges have more features and do more work. To make it fair, I made a second Lua version of the above benchmark that emulates ranges. It is still 29 ms fast.
>
> The full D version is here: http://pastebin.com/R5AGHyPx
> The Lua version: http://pastebin.com/Sa7rp6uz
> Lua version that emulates ranges: http://pastebin.com/eAKMSWyr
>
> Could someone help me solving this mystery?
>
> Or is D, unlike I thought, not suitable for high performance computing? What should I do?
>
I changed the code to this:
auto L = iota(0, 10000000);
auto L2 = map!"a / 2.0"(L);
auto L3 = map!"a + 2"(L2);
auto V = reduce!"a + b"(L3);
and ripped the caching out of std.algorithm.map. :-)
This made it go from about 1.4 seconds to about 0.4 seconds on my machine. Note that I did no rigorous or scientific testing.
Also, if you really really need the performance you can change it all to lower level code, should you want to.
|
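For reference, a hand-written loop equivalent to the modified pipeline above might look like the sketch below. The function name pipelineSum is invented for illustration, and no timing claim is made for it.

```d
import std.stdio : writeln;

// Hypothetical hand-written equivalent of
// reduce!"a + b"(map!"a + 2"(map!"a / 2.0"(iota(0, n)))).
double pipelineSum(int n)
{
    double sum = 0.0;
    foreach (i; 0 .. n)
        sum += i / 2.0 + 2;   // both map steps folded into one expression
    return sum;
}

void main()
{
    writeln(pipelineSum(10_000_000));
}
```

Everything happens in a single pass with no range objects, so the result does not depend on how well the compiler inlines front, popFront and empty.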
December 23, 2010 Re: Why is D slower than LuaJIT? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Wed, 22 Dec 2010 20:16:45 -0600 Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> Thanks for posting the numbers. That's a long time, particularly considering that the two map instances don't do anything. So the bulk of the computation is:
>
> auto L = iota(0.0, 10000000.0);
> auto V = reduce!"a + b"(L3);
>
> There is one inherent problem that affects the speed of iota: in iota, the value at position i is computed as 0.0 + i * step, where step is computed from the limits. That's one addition and a multiplication for each pass through iota. Given that the actual workload of the loop is only one addition, we are doing a lot more work. I suspect that that's the main issue there.
>
> The reason for which iota does that instead of the simpler increment is that iota must iterate the same values forward and backward. Using ++ may interact with floating-point vagaries, so the code is currently conservative.
There is a point I don't understand here: Iota is a range-struct template, with
    void popFront()
    {
        current += step;
    }
So, how does the computation of an arbitrary element at a given index affect looping speed? For mappings (and any kind of traversal, indeed), there should be one addition per element. Else, why define a range interface at all? What am I missing?
Denis
-- -- -- -- -- -- --
vit esse estrany ☣
spir.wikidot.com
|
December 23, 2010 Re: Why is D slower than LuaJIT? | ||||
---|---|---|---|---|
| ||||
Posted in reply to spir | spir <denis.spir@gmail.com> wrote:
> There is a point I don't understand here: Iota is a range-struct template, with
> void popFront()
> {
> current += step;
> }
> So, how does the computation of an arbitrary element at a given index affect looping speed? For mappings (and any kind of traversal, indeed), there should be an addition per element. Else, why define a range interface at all? What do I miss?
With floating-point numbers, the above solution does not always work. If step == 1, increasing current by step will stop having any effect once current is large enough, and the range then grinds to a halt. If instead one multiplies step by the number of steps taken so far, and adds that to the origin, this problem disappears.
As an example of when this problem shows up, try this code:
    float f = 16_777_216;
    auto f2 = f + 1;
    assert( f == f2 );
The assert passes.
--
Simen
|
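The two update strategies can be put side by side in a small sketch. The helper name nthValue is made up for illustration, and the behaviour described assumes results are rounded to float when stored, as on typical x86-64 builds.

```d
import std.stdio : writeln;

// Hypothetical helper: index-based computation of the i-th value.
float nthValue(float origin, float step, uint i)
{
    return origin + i * step;
}

void main()
{
    float origin = 16_777_216.0f;   // 2^24: above this, float cannot hold every integer
    float step = 1.0f;

    // Incremental update: origin + 1 is not representable, so it rounds back to origin.
    float current = origin;
    current += step;
    writeln("increment stalled: ", current == origin);

    // Index-based update: origin + 2 * step is representable, so the range moves on.
    writeln("index-based advanced: ", nthValue(origin, step, 2) > origin);
}
```

The index-based form still cannot produce values that float cannot represent, but it does not get stuck on one of them, because each value is computed from the origin rather than from the previous value.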
December 23, 2010 Re: Why is D slower than LuaJIT? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Simen kjaeraas | Simen kjaeraas:
> With floating-point numbers, the above solution does not always work.
The type is known at compile time, so you can split the algorithm in two with a "static if", and do something else if it's an integral type.
Bye,
bearophile
|
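A minimal sketch of that split, assuming a positive step. SimpleIota is a made-up name, not the Phobos implementation, and it is only a forward range.

```d
import std.stdio : writeln;
import std.traits : isIntegral;

// Hypothetical forward range; not the Phobos iota.
struct SimpleIota(T)
{
    private T origin, limit, step, current;

    static if (!isIntegral!T)
        private size_t index;   // only the floating-point path counts steps

    this(T origin, T limit, T step)
    {
        this.origin = origin;
        this.limit = limit;
        this.step = step;
        this.current = origin;
    }

    @property bool empty() const { return current >= limit; }
    @property T front() const { return current; }

    void popFront()
    {
        static if (isIntegral!T)
        {
            current += step;                  // exact for integers: one addition
        }
        else
        {
            ++index;
            current = origin + index * step;  // does not accumulate rounding error
        }
    }
}

void main()
{
    int isum = 0;
    foreach (x; SimpleIota!int(0, 5, 1))      // 0 + 1 + 2 + 3 + 4
        isum += x;
    writeln(isum);

    double dsum = 0;
    foreach (x; SimpleIota!double(0.0, 1.0, 0.25))   // 0 + 0.25 + 0.5 + 0.75
        dsum += x;
    writeln(dsum);
}
```

The branch is resolved at compile time, so the integral instantiation contains only the plain increment and carries no extra field.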
December 23, 2010 Re: Why is D slower than LuaJIT? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Wed, 22 Dec 2010 22:14:34 -0600 Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> I then replaced iota's implementation with a simpler one that's a forward range. Then the performance became exactly the same as for the simple loop.
After looking at Iota's very general implementation, I tried precisely the same change, with an even simpler range that requires a single element type for (first, last, step). For some reason, this alternative is slightly slower for me than using Iota (don't cry at the absolute times, my computer is old and slow ;-). Sample code below; typical results are:
    1.1 3.3 5.5 7.7
    Interval time: 1149
    Iota time: 1066
Note: adding an assert to ensure front or popFront is not wrongly called past the end adds ~20% time.
Note: I think this demonstrates that using Iota does not perform undue computations (getting the Nth element with a multiplication plus an addition), or do I misinterpret?
Anyway, what is wrong in my code? Why doesn't it perform better?
    import std.stdio : writef, writeln, writefln;
    import std.algorithm : map, filter, reduce;
    import std.range : iota;

    struct Interval (T) {
        alias T Element;
        Element first, last, step;
        private Element element;
        this (Element first, Element last, Element step=1) {
            this.first = first;
            this.last = last;
            this.step = step;
            this.element = first;
        }
        // Advance by one step; unlike iota, the interval includes `last`.
        void popFront () { this.element += this.step; }
        @property bool empty () { return (this.element > this.last); }
        @property Element front () { return this.element; }
    }

    void main () {
        auto nums = Interval!float(1.1,8.8, 2.2);
        foreach(n ; nums) writef("%s ", n);
        writeln();

        // time() is not defined in the post; it is assumed to be a helper
        // returning a timestamp in milliseconds.
        auto t1 = time();
        auto nums1 = Interval!int(0, 10_000_000);
        auto halves1 = map!"a/2"(nums1);
        auto incs1 = map!"a+2"(halves1);
        auto result1 = reduce!"a+b"(incs1);
        writefln("Interval time: %s", time() - t1);

        auto t2 = time();
        auto nums2 = iota(0, 10_000_000);
        auto halves2 = map!"a/2"(nums2);
        auto incs2 = map!"a+2"(halves2);
        auto result2 = reduce!"a+b"(incs2);
        writefln("Iota time: %s", time() - t2);
    }
Denis
-- -- -- -- -- -- --
vit esse estrany ☣
spir.wikidot.com
|