Thread overview
D performance guideline (request)
Apr 12, 2008
bobef
Apr 12, 2008
Sean Kelly
Apr 12, 2008
janderson
Apr 12, 2008
janderson
Apr 12, 2008
downs
Apr 12, 2008
Walter Bright
Apr 13, 2008
Robert Fraser
Apr 13, 2008
Robert Fraser
Apr 13, 2008
Georg Wrede
Apr 13, 2008
Robert Fraser
Apr 13, 2008
Craig Black
Apr 13, 2008
downs
Apr 13, 2008
Georg Wrede
Apr 25, 2008
Bruno Medeiros
April 12, 2008
Hi,

I would like to request a guideline on how to squeeze every last nanosecond out of D. I am writing a scripting language, FlowerScript, and I really need to take every spare bit... But the problem is... I am a noob. D performs really well, of course, but lame people like me could make even D slow :) I know there are a lot of people here who are very good programmers, with broad knowledge of how things work and a lot of experience, so I would like to ask you to share your experience: what to do when every tick matters. For example, one could use pointers for indexing arrays to bypass the bounds checking. Yes, it is not that pretty, but it gives a big boost. I am trying to experiment with things, but sometimes the results are really strange. For example, I have a few ifs. I rearrange them a bit. Sometimes I move one if inside another and it becomes slower; sometimes I simply add curly braces and it becomes faster. Or I wrap a block of code in a try {} and it becomes faster. Why is that? If I wrap every line of code in try, will I make it fly ? :D

Thanks,
bobef
April 12, 2008
== Quote from bobef (bobef@abv-nospam.bg)'s article
> Hi,
> I would like to request a guideline how to squeeze every last nanosecond out of D. I am writing this
> scripting language - FlowerScript, and I really need to take every spare bit... But the problem is... I am a
> noob. D performs really well, of course, but lame people like me could make even D slow :) I know here
> are a lot of people that are very good programmers with broad knowledge how things work and a lot of
> experience, so I would like to ask you to share your experience - what to do when every tick matters. For
> example one could use pointers for indexing arrays, to bypass the bounds checking. Yes, it is not that
> pretty but gives a big boost. I am trying to experiment with things, but sometimes it is really strange.
> For example I have few ifs. I rearrange them a bit. Sometimes I move one if inside other and it becomes
> slower, sometimes I simply add curly braces and it becomes faster. Or I wrap a block of code in a try {}
> and it becomes faster. Why is that? If I wrap every line of code in try, will I make it fly ? :D

The code generator used by DMD is really weird.  Similar things cropped up during performance tests of the Tango XML parser--sometimes just rearranging a set of declarations would have a considerable effect on performance.  I think a lot of it had to do with alignment and such.  There's generally no point in this level of optimization, because each compiler will react differently to the changes.  GDC, for example, tended to react in the opposite manner to DMD, though it was more consistent overall.

As for performance tuning itself--don't worry about pointers vs. indexing.  Using the -release flag will turn off bounds checking anyway.  Typically, the most notable effect you'll see is from the overarching algorithms you use.  However, if you're really counting cycles, then you can do things like minimize taken branches by placing the code for the case you expect most often directly inside the "if" statement, i.e.:

    void func() {
        if( x ) {
            // A
            return;
        }
        // B
    }

With the above, A will be executed without any jumps, while execution will have to jump to B to continue.  If you expect x to almost always be true and the performance of func() is crucial, consider writing it like the above instead of:

    void func() {
        if( !x ) {
            // B
            return;
        }
        // A
    }

However, in general I suggest favoring readability over a nanosecond or two of performance gain.  There are other suggestions as well, but in general the 80-20 rule dictates that such things should really only occur once your app is done and working, and then be tuned from profiler results.


Sean
April 12, 2008
bobef wrote:
> Hi,
> 
> [...]
> 
> Thanks,
> bobef


I imagine you understand this; however, I'll point it out anyway.  The best optimization programmers in the world, for any given language, know that only a small amount of the code actually matters to performance.  They always profile to see where that is.  If they tried to optimize the entire program, they would be spending less time on the parts that really matter.

Bounds checking can be turned off in release builds, so I wouldn't worry about that.

Having said that, beware of functions like getchar(); read blocks of data at a time instead.  Think about high-level optimizations first; these will normally give you the biggest bang.  For instance, the fastest code is the code that is never run.

Try to minimize memory allocation by allocating in bigger blocks.  For instance, with arrays you can do this to reserve space:

    array.length = 100;
    array.length = 0;
    // array now has 100 spaces reserved

Don't manually flush the GC until you have idle time.  It's a very slow function.


When you do find a function that really is causing trouble, here are a few tips:

1) Reduce branching.  Branching is bad for these reasons:
  - It can cause a cache miss
  - The CPU prefetcher and program optimizer will stall


2) Avoid using linked lists.  If you absolutely do need one, allocate each node from an array.  99% of the time you will be spending more time traversing the list than inserting and deleting.  Inserting into or deleting from an array can be faster than with a linked list if it's at the end (and the memory is already reserved).  The exception is generally when the linked list is a memory allocator and that's its only responsibility.

3) Use foreach instead of for.  The compiler can sometimes optimize these better.

4) Try to do more stuff at compile time using templates and compile-time functions; however, don't fall victim to making the program's memory footprint so big that it actually runs slower.

5) In your innermost loops look at the generated asm code to figure out what it is doing (last resort).

6) Avoid using virtual functions in your innermost loops if they cause a problem.  Note that if the virtual function body contains more than 10 instructions, you *may* be wasting your time removing the virtual.  The time spent processing those 10 instructions can be much larger than the cost of the virtual call itself, although, like branching, virtual calls can cause a stall.  The reasons:
  - The compiler can't inline virtuals
  - Cachemiss is almost certain because the Vtable is located far away from the
  - The CPU prefetcher and program optimizer will stall
  - Extra pointer lookups.

7) Try to push things around as blocks of memory rather than individual pieces.

8) Profile, Profile, Profile and let us know your results.
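
A minimal sketch of tip 3, assuming a D2 compiler. The function names `sumFor` and `sumForeach` are made up for illustration. Both loops compute the same result; whether the foreach form is actually faster depends on the compiler and flags, so it is worth measuring rather than assuming.

```d
import std.stdio : writeln;

/// Index-based loop: the compiler has to reason about `i` and `data.length`.
long sumFor(const(int)[] data)
{
    long total = 0;
    for (size_t i = 0; i < data.length; ++i)
        total += data[i];
    return total;
}

/// foreach loop: the iteration pattern is stated directly, with no index
/// arithmetic in user code.
long sumForeach(const(int)[] data)
{
    long total = 0;
    foreach (x; data)
        total += x;
    return total;
}

void main()
{
    auto data = [1, 2, 3, 4];
    writeln(sumFor(data), " ", sumForeach(data));
}
```

The foreach version is also the more readable of the two, so it costs nothing to prefer it by default.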

I hope that was helpful.
-Joel
April 12, 2008
Here's another one.  Plug in nedmalloc.  There's a D port somewhere...

http://www.nedprod.com/programs/portable/nedmalloc/index.html

April 12, 2008
There's plenty to be said, and surely that will be said.  When it comes to performance, I'm more of a server-oriented guy and I worry more about concurrency and performance per request... I'm sure you'll get more relevant help from others here.

I just want to suggest that, when checking performance, you make sure you're dealing with reality.  A few things that can help with that:

1. Do any file I/O before you start benchmarking... for example, load a file entirely into memory, then bench/profile parsing it, executing it, etc.

2. When benchmarking, compile with -O -inline -release.  You don't really care if bounds checking, lack of inlining, etc. are slowing you down.  You care about reality.

3. In general, it's hard to get anywhere without concentrating on a specific area of code.  Find something that is slow (through profiling) and fix it.  Worry about the rest later.

4. If you destroy maintainability by optimization, you'll just pay later.  Consider how heavy the price is before you make such a move.
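
A rough sketch of points 1 and 2, assuming a D2 compiler with Phobos (`std.datetime.stopwatch`). `countLines` is a hypothetical stand-in for whatever code is being measured, and the input is generated in memory here only to keep the example self-contained; in a real run it would be loaded from disk before the timer starts.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

/// Hypothetical stand-in for the code under test (a parser, an interpreter loop, ...).
size_t countLines(const(char)[] source)
{
    size_t n = 0;
    foreach (c; source)
        if (c == '\n')
            ++n;
    return n;
}

void main()
{
    // Prepare the input BEFORE timing. A real benchmark would read the
    // file into memory here instead of generating a buffer.
    auto source = new char[1_000_000];
    source[] = 'x';
    for (size_t i = 79; i < source.length; i += 80)
        source[i] = '\n';

    // Time only the work we care about.
    auto sw = StopWatch(AutoStart.yes);
    auto lines = countLines(source);
    sw.stop();

    writefln("%s lines in %s usecs", lines, sw.peek.total!"usecs");
}
```

Building this with -O -inline -release, as suggested above, is what makes the number meaningful; a single run is still noisy, so repeating the measurement a few times is a reasonable next step.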

-[Unknown]


bobef wrote:
> Hi,
> 
> [...]
> 
> Thanks,
> bobef
April 12, 2008
I can only think of a few, sadly.

Use ~ as little as possible.
Try to determine the whole size beforehand, _then_ allocate an array, _then_ copy into it. Remember: Allocations are slow. Shit slow.
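
A minimal sketch of that size-then-allocate-then-copy pattern, assuming a D2 compiler. `joinPreallocated` is a made-up helper name, not a library function.

```d
import std.stdio : writeln;

/// Joins the pieces into one string with a single allocation,
/// instead of growing the result with repeated `~`.
string joinPreallocated(const(string)[] parts)
{
    // Step 1: determine the whole size beforehand.
    size_t total = 0;
    foreach (p; parts)
        total += p.length;

    // Step 2: allocate once.
    auto buf = new char[total];

    // Step 3: copy each piece into place with slice assignment.
    size_t pos = 0;
    foreach (p; parts)
    {
        buf[pos .. pos + p.length] = p[];
        pos += p.length;
    }

    // The buffer is freshly allocated and not referenced anywhere else,
    // so treating it as immutable is safe here.
    return cast(string) buf;
}

void main()
{
    writeln(joinPreallocated(["Flower", "Script", "!"]));
}
```

The trade-off is a second pass over the input, which is usually much cheaper than the reallocations it avoids.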

If you're creating multithreaded code that needs synchronized access, try to see if you can use TLS instead of synchronization.

Use the final keyword :)

Try both GDC and DMD. There might be significant performance differences.

 --downs
April 12, 2008
bobef wrote:
> I would like to request a guideline how to squeeze every last nanosecond out of D.

The first thing I'd suggest is get very comfortable with D's built-in profiler. It'll be invaluable in helping focus your energies on exactly where it matters.
April 13, 2008
bobef wrote:
> Hi,
> 
> [...]
> 
> Thanks,
> bobef

I'm no expert but (as others have pointed out):

- Algorithmic efficiency is more important than saving cycles.
- Memory/cache efficiency is more important than saving cycles
- Try using memory-efficient data structures like Judy (
  http://judy.sourceforge.net/ ).
- Compile with both DMD and GDC
- Reduce heap allocations as much as possible. For example, when
  possible, use scope for your classes (since this means they will be
  allocated on the stack).
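
A small sketch of the `scope` suggestion, assuming a compiler that accepts `scope` on a local class instance. `Accumulator` and `sumTo` are made-up names for illustration.

```d
import std.stdio : writeln;

/// Hypothetical helper class used only inside one function.
class Accumulator
{
    int total;
    void add(int x) { total += x; }
}

int sumTo(int n)
{
    // `scope` lets the compiler place the instance on the stack and destroy
    // it when the function exits, avoiding a GC heap allocation.
    // The reference must not escape this function.
    scope acc = new Accumulator();
    foreach (i; 1 .. n + 1)
        acc.add(i);
    return acc.total;
}

void main()
{
    writeln(sumTo(100));
}
```

This only helps when the object's lifetime really is limited to the enclosing block; returning or storing the reference elsewhere would leave a dangling reference.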

So, basically, what everyone else said.
April 13, 2008
In case it wasn't already mentioned, use Tango instead of Phobos.  It has a faster GC.

-Craig 

April 13, 2008
Robert Fraser wrote:
> - Try using memory-efficient data structures like Judy (
>   http://judy.sourceforge.net/ ).

Or maybe not:

http://www.nothings.org/computer/judy/