December 23, 2010 Re: Why is D slower than LuaJIT? | ||||
---|---|---|---|---|
| ||||
Posted in reply to bearophile | On 12/23/10 6:57 AM, bearophile wrote:
> Simen kjaeraas:
>
>> With floating-point numbers, the above solution does not always work.
>
> The type is known at compile time, so you can split the algorithm in two with a "static if", and do something else if it's an integral type.
That's what the code currently does.
Andrei
|
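The compile-time split described above can be sketched as follows. This is only an illustration of `static if` dispatch on the element type, not the code iota actually uses; `stepCount` is a hypothetical helper introduced for this example.

```d
import std.stdio;
import std.traits : isFloatingPoint, isIntegral;

/// Hypothetical helper: number of elements in [begin, end) for a given step.
/// The branch is selected at compile time from the type T.
size_t stepCount(T)(T begin, T end, T step)
if (isIntegral!T || isFloatingPoint!T)
{
    assert(step > 0 && begin <= end);
    static if (isIntegral!T)
    {
        // Integral types: exact ceiling division, no rounding concerns.
        return cast(size_t) ((end - begin + step - 1) / step);
    }
    else
    {
        // Floating-point types: a separate code path that rounds up.
        import std.math : ceil;
        return cast(size_t) ceil((end - begin) / step);
    }
}

void main()
{
    writeln(stepCount(0, 10, 3));        // integral branch
    writeln(stepCount(0.0, 1.0, 0.25));  // floating-point branch
}
```

Only the branch matching `T` is compiled into each instantiation, so the split costs nothing at run time.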
December 23, 2010 Re: Why is D slower than LuaJIT? | ||||
---|---|---|---|---|
| ||||
Posted in reply to spir | On 12/23/10 7:04 AM, spir wrote:
> On Wed, 22 Dec 2010 22:14:34 -0600
> Andrei Alexandrescu<SeeWebsiteForEmail@erdani.org> wrote:
>
>> I then replaced iota's implementation with a simpler one that's a
>> forward range. Then the performance became exactly the same as for the
>> simple loop.
>
>
> After looking at Iota's very general implementation, I tried precisely the same change, in fact with an even simpler range that requires a single element type for (first, last, step). For some reason, this alternative is slightly slower for me than using Iota (don't cry at the absolute times; my computer is old and slow ;-). Sample code below; typical results are:
>
> 1.1 3.3 5.5 7.7
> Interval time: 1149
> Iota time: 1066
>
> Note: adding an assert to ensure that front or popFront is not called past the end adds about 20% to the run time.
I cut my losses reading here :o). No performance test is meaningful without all optimizations turned on.
Andrei
|
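A minimal sketch of a timing harness for this kind of test, assuming a current Phobos with `std.datetime.stopwatch`. The function name `baseline` and the file name in the build comment are placeholders chosen for this example; the loop body is the handwritten baseline from this thread.

```d
// Build with optimizations on, for example:
//   dmd -O -release -inline bench.d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio;

/// Handwritten loop: sums (i / 2) + 2 for i in [0, limit).
double baseline(double limit)
{
    double result = 0.0;
    for (double i = 0; i < limit; ++i)
        result += (i / 2) + 2;
    return result;
}

void main()
{
    auto sw = StopWatch(AutoStart.yes);
    immutable result = baseline(10_000_000.0);
    sw.stop();

    // Printing the result keeps the optimizer from discarding the loop.
    writefln("%f", result);
    writefln("elapsed: %s ms", sw.peek.total!"msecs");
}
```

Timings from a build without `-O -release -inline` mostly measure range primitives that were never inlined, which is why unoptimized numbers are not comparable.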
December 23, 2010 Re: Why is D slower than LuaJIT? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andreas Mayer | On 12/22/10 4:04 PM, Andreas Mayer wrote:
> To see what performance advantage D would give me over using a scripting language, I made a small benchmark. It consists of this code:
>
>> auto L = iota(0.0, 10000000.0);
>> auto L2 = map!"a / 2"(L);
>> auto L3 = map!"a + 2"(L2);
>> auto V = reduce!"a + b"(L3);
>
> It runs in 281 ms on my computer.
>
> The same code in Lua (using LuaJIT) runs in 23 ms.
>
> That's about 10 times faster. I would have expected D to be faster. Did I do something wrong?
>
> The first Lua version uses a simplified design. I thought maybe that is unfair to ranges, which are more complicated. You could argue ranges have more features and do more work. To make it fair, I made a second Lua version of the above benchmark that emulates ranges. It is still 29 ms fast.
>
> The full D version is here: http://pastebin.com/R5AGHyPx
> The Lua version: http://pastebin.com/Sa7rp6uz
> Lua version that emulates ranges: http://pastebin.com/eAKMSWyr
>
> Could someone help me solving this mystery?
>
> Or is D, unlike I thought, not suitable for high performance computing? What should I do?
I wrote a new test bench and got 41 ms for the baseline and 220 ms for the code based on map and iota. (Surprisingly, the extra work didn't affect the run time, which suggests the loop is dominated by the counter increment and test.) Then I took out the cache in map and got 136 ms. Finally, I replaced the use of iota with iota2 and got performance equal to that of handwritten code. Code below.

I decided to check in the map cache removal. We discussed it a fair amount among Phobos devs. I have no doubts caching might help in certain cases, but it does lead to surprising performance loss for simple cases like the one tested here. See http://www.dsource.org/projects/phobos/changeset/2231

If the other Phobos folks approve, I'll also specialize iota for floating point numbers to be a forward range and defer the decision on defining a "randomAccessIota" for floating point numbers to later. That would complete the library improvements pointed to by this test, leaving further optimization to compiler improvements.

Thanks Andreas for starting this.

Andrei

import std.algorithm;
import std.stdio;
import std.range;
import std.traits;

struct Iota2(N, S) if (isFloatingPoint!N && isNumeric!S) {
    private N start, end, current;
    private S step;
    this(N start, N end, S step)
    {
        this.start = start;
        this.end = end;
        this.step = step;
        current = start;
    }
    /// Range primitives
    @property bool empty() const { return current >= end; }
    /// Ditto
    @property N front() { return current; }
    /// Ditto
    alias front moveFront;
    /// Ditto
    void popFront()
    {
        assert(!empty);
        current += step;
    }
    @property Iota2 save() { return this; }
}

auto iota2(B, E, S)(B begin, E end, S step)
if (is(typeof((E.init - B.init) + 1 * S.init)))
{
    return Iota2!(CommonType!(Unqual!B, Unqual!E), S)(begin, end, step);
}

void main(string args[])
{
    double result;
    auto limit = 10_000_000.0;
    if (args.length > 1)
    {
        writeln("iota");
        auto L = iota2(0.0, limit, 1.0);
        auto L2 = map!"a / 2"(L);
        auto L3 = map!"a + 2"(L2);
        result = reduce!"a + b"(L3);
    }
    else
    {
        writeln("baseline");
        result = 0.0;
        for (double i = 0; i != limit; ++i)
        {
            result += (i / 2) + 2;
        }
    }
    writefln("%f", result);
}
|
December 23, 2010 Re: Why is D slower than LuaJIT? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> I decided to check in the map cache removal. We discussed it a fair amount among Phobos devs. I have no doubts caching might help in certain cases, but it does lead to surprising performance loss for simple cases like the one tested here. See http://www.dsource.org/projects/phobos/changeset/2231
It seems to me that having a Cached range might be a better, more general solution in any case.
--
Simen
|
December 23, 2010 Re: Why is D slower than LuaJIT? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> http://www.dsource.org/projects/phobos/changeset/2231
BTW, shouldn't range constructors call .save for forward ranges? This one certainly doesn't.
--
Simen
|
December 23, 2010 Re: Why is D slower than LuaJIT? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Simen kjaeraas | On 12/23/10 10:09 AM, Simen kjaeraas wrote:
> Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
>
>> I decided to check in the map cache removal. We discussed it a fair
>> amount among Phobos devs. I have no doubts caching might help in
>> certain cases, but it does lead to surprising performance loss for
>> simple cases like the one tested here. See
>> http://www.dsource.org/projects/phobos/changeset/2231
>
> It seems to me that having a Cached range might be a better, more general
> solution in any case.
Agreed.
Andrei
|
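One way the Cached range idea could look, as a rough sketch only. `Cached`, `cached`, `expensive`, and the `calls` counter are hypothetical names for this example and are not taken from the changeset. Later Phobos releases offer `std.algorithm.iteration.cache` in a similar spirit.

```d
import std.algorithm : map;
import std.range : ElementType, iota, isInputRange;
import std.stdio;

/// Hypothetical opt-in wrapper: reads the source's front once per element
/// and hands out the stored copy on every later access.
struct Cached(R) if (isInputRange!R)
{
    private R source;
    private ElementType!R value;   // stored copy of source.front

    this(R source)
    {
        this.source = source;
        if (!this.source.empty) value = this.source.front;
    }

    @property bool empty() { return source.empty; }
    @property ElementType!R front() { return value; }

    void popFront()
    {
        source.popFront();
        if (!source.empty) value = source.front;
    }
}

/// Convenience constructor for Cached.
auto cached(R)(R source) { return Cached!R(source); }

int calls;                                          // counts evaluations
int expensive(int a) { ++calls; return a * 10; }    // stand-in for costly work

void main()
{
    auto r = cached(map!expensive(iota(3)));
    int sum;
    for (; !r.empty; r.popFront())
        sum += r.front + r.front;   // front is read twice, computed once
    writeln(sum, " ", calls);
}
```

With this split, map itself stays cache-free and cheap for simple pipelines, while callers who read front repeatedly over a costly function can ask for caching explicitly.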
December 23, 2010 Re: Why is D slower than LuaJIT? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Simen kjaeraas | On 12/23/10 10:14 AM, Simen kjaeraas wrote:
> Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
>
>> http://www.dsource.org/projects/phobos/changeset/2231
>
> BTW, shouldn't range constructors call .save for forward ranges? This
> one certainly doesn't.
Currently higher-order ranges assume that the range passed-in is good to take ownership of. A range or algorithm should call save only in case extra copies need to be created.
Andrei
|
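The ownership convention described above can be illustrated with a small sketch. The functions `total` and `variance` are hypothetical examples, not Phobos code: the first consumes the range it was handed and never calls save, while the second needs a second pass and therefore saves one extra copy.

```d
import std.range;   // isInputRange, isForwardRange, ElementType, save
import std.stdio;

/// Single pass: takes ownership of r and consumes it, so no save is needed.
auto total(R)(R r) if (isInputRange!R)
{
    ElementType!R sum = 0;
    for (; !r.empty; r.popFront())
        sum += r.front;
    return sum;
}

/// Two passes: an extra copy is required, so save is called exactly once.
double variance(R)(R r) if (isForwardRange!R)
{
    auto checkpoint = r.save;       // independent copy for the second pass
    double sum = 0;
    size_t n;
    for (; !r.empty; r.popFront())
    {
        sum += r.front;
        ++n;
    }
    if (n == 0) return 0;
    immutable mean = sum / n;

    double squares = 0;
    for (; !checkpoint.empty; checkpoint.popFront())
        squares += (checkpoint.front - mean) ^^ 2;
    return squares / n;
}

void main()
{
    writeln(total([1, 2, 3]));
    writeln(variance([0.0, 1.0, 2.0, 3.0]));
}
```

A caller who wants to keep iterating the original range after handing it to a consuming function is the one who should pass `r.save`.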
December 24, 2010 Re: Why is D slower than LuaJIT? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | I hope that in the future more implementations in D can be compared for performance against their equivalent Lua translations. It seems that LuaJIT is a super speedy dynamic language, and it is specifically designed to break into the performance ranges of optimized static languages, which makes it a formidable competitor.
|
June 02, 2013 Re: Why is D slower than LuaJIT? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andreas Mayer | On Wed, 22 Dec 2010 17:04:21 -0500, Andreas Mayer <spam@bacon.eggs> wrote:
> To see what performance advantage D would give me over using a scripting language, I made a small benchmark. It consists of this code:
>
> > auto L = iota(0.0, 10000000.0);
> > auto L2 = map!"a / 2"(L);
> > auto L3 = map!"a + 2"(L2);
> > auto V = reduce!"a + b"(L3);
>
> It runs in 281 ms on my computer.
>
> The same code in Lua (using LuaJIT) runs in 23 ms.
>
> That's about 10 times faster. I would have expected D to be faster. Did I do something wrong?
Actually "D" is 1.5 times faster on my computer*:

LDC**            ========                                   18 ms
GDC***           ===========                                25 ms
LuaJIT 2.0.0 b7  ============                               27 ms
DMD              =========================================  93 ms

All compilers based on DMD 2.062 front-end.
*   64-bit Linux, 2.0 GHz Mobile Core 2 Duo.
**  based on LLVM 3.2
*** based on GCC 4.7.2

I modified the iota template to more closely reflect the one used in the original Lua code:

---------------------
import std.algorithm;
import std.stdio;
import std.traits;

auto iota(B, E)(B begin, E end)
if (isFloatingPoint!(CommonType!(B, E)))
{
    alias CommonType!(B, E) Value;
    static struct Result
    {
        private Value start, end;
        @property bool empty() const { return start >= end; }
        @property Value front() const { return start; }
        void popFront() { start++; }
    }
    return Result(begin, end);
}

void main()
{
    auto L  = iota(0.0, 10000000.0),
         L2 = map!(a => a / 2)(L),
         L3 = map!(a => a + 2)(L2),
         V  = reduce!((a, b) => a + b)(L3);
    writefln("%f", V);
}
--
Marco
|
Copyright © 1999-2021 by the D Language Foundation