Parallel Rogue-like benchmark (page 7)

Okay, I've updated it to 83. The other entries didn't include comments, so I didn't bother checking to remove comments from the linecount.

On Friday, 8 November 2013 at 13:57:31 UTC, bearophile wrote:
> Your site counts 90 SLOC for the D entry, that comes from 83 lines of code plus 7 comment lines. I think you shouldn't count the lines of comments, from all the entries.
>
> If you want to count the comments too, then if you want I'll submit a 83 lines long D version without comments for your site, as in the Scala entry, for a little more fair comparison.
>
> The Scala entry has lines of code like:
>
> case (numLvls, threadNum) => {val rnd = new Xorshift32(rand.nextInt); if(!silent) println(s"Thread number $threadNum has seed " + rnd.seed); numLvls -> rnd}
>
> Bye,
> bearophile

logicchains:
> Okay, I've updated it to 83. The other entries didn't include comments, so I didn't bother checking to remove comments from the linecount.

Thank you :-) I think a few comments help the code look more natural :-)

Bye,
bearophile

On 08/11/13 04:13, logicchains wrote:
> Benchmark author here. I left the ldmd2 entry there to represent the performance
> of the D implementation from the time of the benchmark, to highlight that the
> current D implementation is much newer than the others, and that there have been
> no attempts to optimise the C and C++ versions similarly to how the latest D
> version was optimised. If you feel it creates needless confusion I can remove
> it, however, or put a note next to it stating the above.

Seems fine to me to compare two different code implementations, but displaying things as you have suggests this is a compiler difference. For proper comparison, you should probably compile both codes with ldc2 and the same optimizations, and see how they compare.

On 07/11/13 14:12, bearophile wrote:
> Very nice. I have made a more idiomatic version (in D global constants don't
> need to be IN_UPPERCASE), I have added a few missing immutable annotations, and
> given the benchmark also counts line numbers, I have made the code a little more
> compact (particularly the struct definitions, but I have moved the then branch
> of some if on a new line, increasing the line count to make the code a little
> more readable, so it's not an unnaturally compact D code):
>
> http://dpaste.dzfl.pl/d37ba995

How does the speed of that code change if instead of the Random struct, you use std.random.Xorshift32 ... ?

Joseph Rushton Wakeling:
> How does the speed of that code change if instead of the Random struct, you use std.random.Xorshift32 ... ?

That change of yours was well studied in the first blog post (the serial one) and the performance loss of using Xorshift32 was significant, even with LDC2. I don't know why. Sometimes even moving things (like the Xorshift struct) in another module changes the code performance. Performance optimization is a bit of an art still. In theory such cases should be studied, and the performance loss should be understood and fixed :-)

Bye,
bearophile

09-Nov-2013 16:23, bearophile wrote:
> Joseph Rushton Wakeling:
>
>> How does the speed of that code change if instead of the Random
>> struct, you use std.random.Xorshift32 ... ?
>
> That change of yours was well studied in the first blog post (the serial
> one) and the performance loss of using Xorshift32 was significant, even
> with LDC2. I don't know why.
>

Lack of inlining most likely.
https://d.puremagic.com/issues/show_bug.cgi?id=10985

Since Xorshift32 is fully specified there is nothing to instantiate in the calling code, hence it may just be linked from Phobos. Anyhow studying the disassembly of the binary will get the answer.

--
Dmitry Olshansky
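
For readers following the exchange above, here is a minimal sketch of what drawing numbers from std.random.Xorshift32 looks like. It is not the benchmark's code: the helper name `nextValue` and the seed are assumptions made up for illustration. It only shows the range interface (`front` / `popFront`) that the calling code would go through. Whether those calls get inlined depends on the compiler and flags, which is the point raised above.

```d
import std.random : Xorshift32;
import std.stdio : writeln;

// Hypothetical helper (not from the benchmark): draws one value through
// the input-range interface that std.random generators expose.
uint nextValue(ref Xorshift32 rng)
{
    immutable value = rng.front; // read the current value
    rng.popFront();              // advance the generator state
    return value;
}

void main()
{
    // Arbitrary nonzero seed, chosen only for illustration.
    auto rng = Xorshift32(12_345);
    foreach (i; 0 .. 3)
        writeln(nextValue(rng));
}
```

Comparing the disassembly of a loop calling `nextValue` against one using a locally defined struct would be one way to check the inlining hypothesis.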

November 10, 2013

Re: Parallel Rogue-like benchmark

Posted by logicchains
in reply to bearophile

I imagine (although I haven't checked) that std.random.Xorshift32 uses the algorithm:

        seed ^= seed << 13;
        seed ^= seed >> 17;
        seed ^= seed << 5;
        return seed;

while the levgen benchmarks use the algorithm:

        seed += seed;
        seed ^= (seed > int.max) ? 0x88888eee : 1;
        return seed;

The former produces better random numbers, but it's possible that it may be slower.
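
To make the two update steps easier to compare side by side, here is a small self-contained D sketch. The function names are hypothetical, and the xorshift constants are the ones quoted in this post, not verified against what std.random.Xorshift32 actually uses.

```d
import std.stdio : writeln;

// Xorshift step as quoted in the post (shifts 13, 17, 5). Assumption:
// std.random.Xorshift32 may use different constants.
uint xorshiftNext(ref uint seed) pure nothrow @nogc @safe
{
    seed ^= seed << 13;
    seed ^= seed >> 17;
    seed ^= seed << 5;
    return seed;
}

// The levgen benchmark step as quoted in the post: double the state,
// then xor with a constant that depends on whether the top bit is set.
uint levgenNext(ref uint seed) pure nothrow @nogc @safe
{
    seed += seed;
    seed ^= (seed > int.max) ? 0x88888eee : 1;
    return seed;
}

void main()
{
    uint a = 1, b = 1;
    foreach (i; 0 .. 3)
        writeln(xorshiftNext(a), " ", levgenNext(b));
}
```

The levgen step does one addition, one comparison and one xor per number, against three shift-and-xor pairs for the xorshift step, so the per-call arithmetic differs only slightly.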

Lack of inlining would definitely make a huge difference. I wrote an assembly function for Go that was an exact copy of the assembly generated by LLVM at O3, and it was no faster than the native Go function, even though the assembly was much better (assembly functions aren't inlined in Go). Changing the assembly function to generate and return two random numbers, however, increased the overall program speed by around 10%, highlighting the overhead of function calls and lack of inlining.
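
The Go assembly itself is not shown in the thread, but the batching idea can be sketched in D. The function below is a hypothetical illustration, not the benchmark's code: it advances the levgen-style generator twice per call and returns both values, so that when the call is not inlined its overhead is paid once per two numbers.

```d
import std.stdio : writeln;

// Hypothetical helper: advances the levgen-style generator twice per call
// and returns both values, so one call yields two random numbers.
uint[2] levgenNextPair(ref uint seed) pure nothrow @nogc @safe
{
    uint[2] pair;
    foreach (ref value; pair)
    {
        seed += seed;
        seed ^= (seed > int.max) ? 0x88888eee : 1;
        value = seed;
    }
    return pair;
}

void main()
{
    uint seed = 1;
    immutable pair = levgenNextPair(seed);
    writeln(pair[0], " ", pair[1]);
}
```

With an inlining compiler the batched and single-value versions would likely perform about the same; the gain described above applies when the call cannot be inlined.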

On Saturday, 9 November 2013 at 12:23:25 UTC, bearophile wrote:
> Joseph Rushton Wakeling:
>
>> How does the speed of that code change if instead of the Random struct, you use std.random.Xorshift32 ... ?
>
> That change of yours was well studied in the first blog post (the serial one) and the performance loss of using Xorshift32 was significant, even with LDC2. I don't know why.

On 10/11/13 05:31, logicchains wrote:
> The former produces better random numbers, but it's possible that it may be slower.

Ahh, makes sense. Where did you get the particular RNG you used? I don't recognize it.

I got it from here: https://code.google.com/p/go/source/detail?r=3bf9ffdcca1f9585f28dcf0e4ca1c75ea29e18be. Apparently it's a linear feedback shift register, and was used in Newsqueak.

On Sunday, 10 November 2013 at 09:42:30 UTC, Joseph Rushton Wakeling wrote:
> On 10/11/13 05:31, logicchains wrote:
>> The former produces better random numbers, but it's possible that it may be slower.
>
> Ahh, makes sense. Where did you get the particular RNG you used? I don't recognize it.
