On 13 April 2012 17:25, Kagamin <spam@here.lot> wrote:
> Once you've prefetched the function, it will remain in the icache and be reused from there the next time.

It all depends on how much you love object orientation. If you go by the C++ book and make loads of classes for everything, you'll thrash the hell out of it. If you only have a couple of different object types, maybe they'll coexist in the icache.
The GC is a bit of a special case, though, because it runs in a tight loop. That said, the pipeline hazards still exist regardless of the state of the icache.
Conventional virtual calls are worse, since regular code doesn't usually execute in such a tight loop.
(Note: I was answering the earlier question about virtual functions in general, not as applied to the specific case of a GC scan.)

The latest 'turbo' ARM chips (Cortex, etc.) may store a branch target table, and they are alleged to have improved prediction, but I haven't checked.
Earlier chips (standard ARM9 and below; note that most non-top-end Android devices fall into this category, as do all games consoles with ARM cores in them) don't technically have branch prediction at all. ARM instead has condition bits on its instructions, so the core can skip individual opcodes based on the state of the flags register. This is a neat design for simple binary branches or 'select' operations, but it can't do anything to help an indirect jump.
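To make the distinction concrete, here is a small C sketch (the names `max_select`, `op_table` and `apply_op` are illustrative only). The first function is a 'select' that a compiler can typically lower to a compare plus a conditionally executed move (on classic ARM, something like CMP followed by MOVLE), with no branch at all. The second dispatches through a function-pointer table, where the target address is only known at run time, so condition bits have nothing to work with.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A 'select': both candidate values are already available, so the
   compiler can usually emit this without any branch. */
static int32_t max_select(int32_t a, int32_t b)
{
    return (a > b) ? a : b;
}

/* An indirect call: the destination is a run-time value, so predication
   cannot help; the core must fetch from wherever the pointer points. */
typedef int32_t (*binop_fn)(int32_t, int32_t);

static int32_t op_add(int32_t a, int32_t b) { return a + b; }
static int32_t op_sub(int32_t a, int32_t b) { return a - b; }
static int32_t op_max(int32_t a, int32_t b) { return max_select(a, b); }

static const binop_fn op_table[] = { op_add, op_sub, op_max };
#define N_OPS (sizeof op_table / sizeof op_table[0])

static int32_t apply_op(unsigned op, int32_t a, int32_t b)
{
    assert(op < N_OPS);          /* bounds check on the table index */
    return op_table[op](a, b);   /* indirect jump through a pointer */
}
```

For example, `apply_op(2, -4, -9)` goes through the table to `op_max` and yields -4, while calling `max_select(-4, -9)` directly gives the same answer with no indirect jump involved.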

Point is, the GC is the most fundamental performance hazard in D, and I think it's important to make sure the design is such that it's possible to write a GC loop that can, where possible, do its job using generated data tables instead of requiring generated marker functions.
It seems to me that carefully crafted data tables could technically perform the same function as marker functions, but without any function calls... and the performance characteristics may be better or worse on different architectures.
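A minimal C sketch of the two designs being compared, assuming a toy object layout in which each object's header points at a per-type descriptor. `TypeInfo`, `ptr_bitmap`, `mark_pair_node` and the other names are hypothetical and are not D runtime APIs. The table-driven loop reads a pointer bitmap and never calls type-specific code; the function-driven loop makes one indirect call per scanned object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_WORDS 4   /* toy limit on payload size */
#define STACK_CAP 64  /* toy limit on the mark stack */

struct Object;
typedef void (*MarkerFn)(struct Object *obj, struct Object **stack, size_t *top);

/* Hypothetical per-type descriptor. A table-driven GC only needs the
   data fields; a function-driven GC only needs `marker`. */
typedef struct TypeInfo {
    size_t   n_words;     /* payload size in words (<= MAX_WORDS)   */
    uint32_t ptr_bitmap;  /* bit i set => words[i] holds an Object* */
    MarkerFn marker;      /* generated per-type marker function     */
} TypeInfo;

typedef struct Object {
    const TypeInfo *type;
    int             marked;
    uintptr_t       words[MAX_WORDS];
} Object;

/* Shared helper: mark an object and queue it for scanning (once). */
static void push_child(Object *child, Object **stack, size_t *top)
{
    if (child != NULL && !child->marked) {
        assert(*top < STACK_CAP);
        child->marked = 1;
        stack[(*top)++] = child;
    }
}

/* Approach 1: one generic loop driven purely by TypeInfo data.
   No per-type call; each word costs a bit test and a load. */
static size_t mark_table_driven(Object *root)
{
    Object *stack[STACK_CAP];
    size_t top = 0, n_marked = 0;
    push_child(root, stack, &top);
    while (top > 0) {
        Object *obj = stack[--top];
        const TypeInfo *t = obj->type;
        n_marked++;
        for (size_t i = 0; i < t->n_words; i++)
            if (t->ptr_bitmap & (1u << i))
                push_child((Object *)obj->words[i], stack, &top);
    }
    return n_marked;
}

/* Approach 2: an indirect call to a generated marker for every object. */
static size_t mark_with_functions(Object *root)
{
    Object *stack[STACK_CAP];
    size_t top = 0, n_marked = 0;
    push_child(root, stack, &top);
    while (top > 0) {
        Object *obj = stack[--top];
        n_marked++;
        obj->type->marker(obj, stack, &top);   /* indirect jump */
    }
    return n_marked;
}

/* What a generated marker might look like for a type whose words 0 and 1
   are pointers and word 2 is plain data. */
static void mark_pair_node(Object *obj, Object **stack, size_t *top)
{
    push_child((Object *)obj->words[0], stack, top);
    push_child((Object *)obj->words[1], stack, top);
}

static const TypeInfo PairNode = { 3, 0x3u, mark_pair_node };
```

Both loops mark the same set of objects; the difference is only in how the per-type knowledge is reached. A real collector would also have to deal with arrays, interior pointers and types too large for a 32-bit bitmap, none of which this sketch attempts.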