Thread overview
Feature Request: nontrivial functions and vtable optimizations
Aug 12, 2008
downs
Aug 12, 2008
davidl
Aug 12, 2008
Craig Black
Aug 12, 2008
davidl
Aug 12, 2008
Craig Black
Aug 13, 2008
Don
Aug 15, 2008
Craig Black
Aug 13, 2008
Jb
Aug 15, 2008
Craig Black
Aug 15, 2008
Jb
Aug 15, 2008
Craig Black
Aug 13, 2008
Christopher Wright
Aug 15, 2008
Craig Black
I was wrong.
Aug 14, 2008
downs
Aug 14, 2008
superdan
Aug 14, 2008
downs
Aug 14, 2008
superdan
Aug 14, 2008
downs
Aug 14, 2008
Jb
Aug 14, 2008
superdan
Aug 14, 2008
Jb
Aug 14, 2008
Yigal Chripun
Aug 14, 2008
superdan
Aug 15, 2008
Jb
Aug 14, 2008
Jb
August 12, 2008
The overhead of vtable calls is well known.

For this reason, it may be worthwhile to include an optimization that checks whether a certain class is inherited from only a few times (say, three or fewer) in the visible program, and if so, replaces vtable calls with a tree of classinfo pointer comparisons and direct function calls.

When I did this manually in some hobbyist path tracing code, I gained significant speed-ups, even though I didn't collect any hard numbers.

So, for instance, the equivalent D code would look like so:

class A { ... } class B : A { ... } class C : A { ... }

A a = genSomeObj();
// a.test();
if (a.classinfo == typeid(B)) (cast(B)cast(void*) a).test(); // call as if final
else if (a.classinfo == typeid(C)) (cast(C)cast(void*) a).test(); // ditto
else a.test(); // fallback

Ideas?
August 12, 2008
On Tue, 12 Aug 2008 16:45:08 +0800, downs <default_357-line@yahoo.de> wrote:

> The overhead of vtable calls is well known.
>
> For this reason, it may be worthwhile to include an optimization that essentially checks if a certain class is only inherited from a few times, like, <= three, in the visible program, and if so, replace vtable calls with a tree of classinfo pointer comparisons and direct function calls.
>
> When I did this manually in some hobbyist path tracing code, I gained significant speed-ups, even though I didn't collect any hard numbers.
>
> So, for instance, the equivalent D code would look like so:
>
> class A { ... } class B : A { ... } class C : A { ... }
>
> A a = genSomeObj();
> // a.test();
> if (a.classinfo == typeid(B)) (cast(B)cast(void*) a).test(); // call as if final
> else if (a.classinfo == typeid(C)) (cast(C)cast(void*) a).test(); // dito
> else a.test(); // fallback
>
> Ideas?

It would be better to show your benchmark examples, so people can look at them and discuss.

-- 
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
August 12, 2008
"downs" <default_357-line@yahoo.de> wrote in message news:g7ribh$1vii$1@digitalmars.com...
> The overhead of vtable calls is well known.
>
> For this reason, it may be worthwhile to include an optimization that essentially checks if a certain class is only inherited from a few times, like, <= three, in the visible program, and if so, replace vtable calls with a tree of classinfo pointer comparisons and direct function calls.
>
> When I did this manually in some hobbyist path tracing code, I gained significant speed-ups, even though I didn't collect any hard numbers.
>
> So, for instance, the equivalent D code would look like so:
>
> class A { ... } class B : A { ... } class C : A { ... }
>
> A a = genSomeObj();
> // a.test();
> if (a.classinfo == typeid(B)) (cast(B)cast(void*) a).test(); // call as if
> final
> else if (a.classinfo == typeid(C)) (cast(C)cast(void*) a).test(); // dito
> else a.test(); // fallback
>
> Ideas?

You are right.  Function pointer invocation tends to slow down processing because it doesn't cooperate well with pipelining and branch prediction.  I don't understand why this is so hard for some programmers to accept.

See Microsoft Visual C++'s profile-guided optimization (PGO) feature "Virtual Call Speculation".  With PGO, the most commonly called virtual functions are optimized so that they do not require a function pointer invocation.

Rather than relying on PGO, I would prefer the ability to assign priorities to virtual functions in the code.  This would indicate to the compiler which virtual functions will be called the most.
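
Something like this is what I have in mind.  The attribute below is purely made up to illustrate the idea; no compiler understands it today, it's just written as an ordinary user-defined attribute so that it is at least valid D:

// Hypothetical "dispatch priority" hint the compiler could act on.
// Nothing reads this attribute today; it only marks which override
// the compiler should speculate on first.
struct callPriority { int rank; }

class A { void draw() { } }

class B : A {
    @callPriority(1)   // hint: speculate on B's override first
    override void draw() { }
}

class C : A {
    @callPriority(2)   // then on C's
    override void draw() { }
}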

-Craig


August 12, 2008
On Tue, 12 Aug 2008 22:46:21 +0800, Craig Black <cblack@ara.com> wrote:

> "downs" <default_357-line@yahoo.de> wrote in message
> news:g7ribh$1vii$1@digitalmars.com...
>> The overhead of vtable calls is well known.
>>
>> For this reason, it may be worthwhile to include an optimization that
>> essentially checks if a certain class is only inherited from a few times,
>> like, <= three, in the visible program, and if so, replace vtable calls
>> with a tree of classinfo pointer comparisons and direct function calls.
>>
>> When I did this manually in some hobbyist path tracing code, I gained
>> significant speed-ups, even though I didn't collect any hard numbers.
>>
>> So, for instance, the equivalent D code would look like so:
>>
>> class A { ... } class B : A { ... } class C : A { ... }
>>
>> A a = genSomeObj();
>> // a.test();
>> if (a.classinfo == typeid(B)) (cast(B)cast(void*) a).test(); // call as if
>> final
>> else if (a.classinfo == typeid(C)) (cast(C)cast(void*) a).test(); // dito
>> else a.test(); // fallback
>>
>> Ideas?
>
> You are right.  Function pointer invokation tends to slow down processing
> because it doesn't cooperate well with pipelining and branch prediction.  I
> don't understand why this is so hard for some programmers to accept.
>
> See Microsoft Visual C++ profile guided optimizations (PGO) "Virtual Call
> Speculation". Using PGO, the most commonly called virtual functions are
> optimized so that they do not require a function pointer invokation.
>
> Rather than relying on PGO, I would prefer the ability to assign a
> priorities to virtual functions in the code.  This would indicate to the
> compiler what virtual functions will be called the most.
>
> -Craig
>
>

But that's such an unclean hack.

-- 
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
August 12, 2008
>> Rather than relying on PGO, I would prefer the ability to assign a priorities to virtual functions in the code.  This would indicate to the compiler what virtual functions will be called the most.
>>
>> -Craig
>>
>>
>
> But it's so anti-cleanness hack.

Are you referring to my suggestion that we should be able to specify virtual function priorities?  This is much cleaner than writing your own virtual function dispatch routine whenever you want a virtual call to be optimized.  Further, specifying a priority is completely optional.  You won't be forced to specify one if you don't want to.

-Craig


August 13, 2008
Craig Black wrote:
> "downs" <default_357-line@yahoo.de> wrote in message news:g7ribh$1vii$1@digitalmars.com...
>> The overhead of vtable calls is well known.
>>
>> For this reason, it may be worthwhile to include an optimization that essentially checks if a certain class is only inherited from a few times, like, <= three, in the visible program, and if so, replace vtable calls with a tree of classinfo pointer comparisons and direct function calls.
>>
>> When I did this manually in some hobbyist path tracing code, I gained significant speed-ups, even though I didn't collect any hard numbers.
>>
>> So, for instance, the equivalent D code would look like so:
>>
>> class A { ... } class B : A { ... } class C : A { ... }
>>
>> A a = genSomeObj();
>> // a.test();
>> if (a.classinfo == typeid(B)) (cast(B)cast(void*) a).test(); // call as if final
>> else if (a.classinfo == typeid(C)) (cast(C)cast(void*) a).test(); // dito
>> else a.test(); // fallback
>>
>> Ideas?
> 
> You are right.  Function pointer invokation tends to slow down processing because it doesn't cooperate well with pipelining and branch prediction.

I don't think this is true any more.
It was true in processors older than the Pentium M and AMD K10. On recent CPUs, indirect jumps and calls are predicted as well as conditional branches are. If you're seeing that kind of speed difference, it's something else (probably more code being executed).
August 13, 2008
"Craig Black" <cblack@ara.com> wrote in message news:g7s7o1$dpt$1@digitalmars.com...
>
>
> You are right.  Function pointer invokation tends to slow down processing because it doesn't cooperate well with pipelining and branch prediction. I don't understand why this is so hard for some programmers to accept.

That's not true these days. I know because I've profiled it on multiple CPUs: correctly predicted indirect jumps (through a pointer) cost no more than a normal call/ret pair. They get a slot in the BTB just like conditional jumps, and IIRC they are predicted to go to the same place they went last time.

(on x86 anyway).

Virtual methods are in such heavy use that Intel and AMD have put a lot of effort into optimizing them.

Replacing the function pointer with a conditional branch won't make it faster.



> See Microsoft Visual C++ profile guided optimizations (PGO) "Virtual Call Speculation". Using PGO, the most commonly called virtual functions are optimized so that they do not require a function pointer invokation.

The speedup you get from this comes from the inlining of the function, *not* from getting rid of the function pointer lookup. It still requires loading the classinfo and a conditional branch to check that the "inlined function" is the correct match for the speculated object.

That will still cost a whole pipeline's worth of cycles if wrongly predicted.
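
Written out by hand in D, the transformed call has roughly this shape. The class names and the stand-in for the inlined body are mine, just to show where the remaining compare and branch sit:

import std.stdio : writeln;

class A { void test() { writeln("A.test"); } }
class B : A { override void test() { writeln("B.test"); } }

// Roughly what "virtual call speculation" amounts to, done by hand:
// guess that `a` is really a B, put B's body inline behind a classinfo
// check, and keep the ordinary virtual call as the fallback.
void speculatedCall(A a)
{
    if (a.classinfo is typeid(B))
        writeln("B.test");   // stands in for the inlined body of B.test()
    else
        a.test();            // guess was wrong: normal vtable dispatch
}

void main()
{
    speculatedCall(new B);   // takes the speculated, inlined path
    speculatedCall(new A);   // falls back to the virtual call
}

The compare and the conditional branch are still there either way; the win, when there is one, comes from inlining the speculated body and from the branch usually being predicted correctly.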




August 13, 2008
Craig Black wrote:
> Rather than relying on PGO, I would prefer the ability to assign a priorities to virtual functions in the code.  This would indicate to the compiler what virtual functions will be called the most.
> 
> -Craig 

Or you could wait a bit[1] and have LLVM do that for you, assuming you're willing to run LLVM bytecode. And considering it's LLVM, you could probably hack it to periodically dump the optimized code. So you'd optimize your application by fooling around with it for a couple hours.

[1] "A bit" may be as long as a decade. I dunno.
August 14, 2008
To clear this up, I've been running a benchmark.

module test91;

import tools.time, std.stdio, tools.base, tools.mersenne;

class A { void test() { } }
class B : A { final override void test() { } }
class C : A { final override void test() { } }

A a, b, c;
static this() { a = new A; b = new B; c = new C; }

A gen() {
  if (randf() < 1f/3f) return a;
  else if (randf() < 0.5) return b;
  else return c;
}

void main() {
  const count = 1024*1024;
  for (int z = 0; z < 4; ++z) {
    writefln("Naive: ", time({
      for (int i = 0; i < count; ++i) gen().test();
    }()));
    writefln("Speculative for B: ", time({
      for (int i = 0; i < count; ++i) {
        auto t = gen();
        if (t.classinfo is typeid(B)) (cast(B)cast(void*)t).test();
        else t.test();
      }
    }()));
    writefln("Speculative for B/C: ", time({
      for (int i = 0; i < count; ++i) {
        auto t = gen();
        if (t.classinfo is typeid(B)) (cast(B)cast(void*)t).test();
        else if (t.classinfo is typeid(C)) (cast(C)cast(void*)t).test();
        else t.test();
      }
    }()));
  }
}


And as it turns out, virtual method calls were at least fast enough not to make any measurable difference in this test.

Here's the output of my little proggy in the last iteration:

Naive: 560958
Speculative for B: 574602
Speculative for B/C: 572429

If anything, naive is often a little faster.

This pretty much overturns my established knowledge on the matter. Looks like recent CPUs' branch prediction really is as good as people claim.

Sorry for the confusion.
August 14, 2008
downs Wrote:

> To clear this up, I've been running a benchmark.
> 
> module test91;
> 
> import tools.time, std.stdio, tools.base, tools.mersenne;
> 
> class A { void test() { } }
> class B : A { final override void test() { } }
> class C : A { final override void test() { } }
> 
> A a, b, c;
> static this() { a = new A; b = new B; c = new C; }
> 
> A gen() {
>   if (randf() < 1f/3f) return a;
>   else if (randf() < 0.5) return b;
>   else return c;
> }
> 
> void main() {
>   const count = 1024*1024;
>   for (int z = 0; z < 4; ++z) {
>     writefln("Naive: ", time({
>       for (int i = 0; i < count; ++i) gen().test();
>     }()));
>     writefln("Speculative for B: ", time({
>       for (int i = 0; i < count; ++i) {
>         auto t = gen();
>         if (t.classinfo is typeid(B)) (cast(B)cast(void*)t).test();
>         else t.test();
>       }
>     }()));
>     writefln("Speculative for B/C: ", time({
>       for (int i = 0; i < count; ++i) {
>         auto t = gen();
>         if (t.classinfo is typeid(B)) (cast(B)cast(void*)t).test();
>         else if (t.classinfo is typeid(C)) (cast(C)cast(void*)t).test();
>         else t.test();
>       }
>     }()));
>   }
> }
> 
> 
> And as it turns out, virtual method calls were at least fast enough to not make any sort of difference in my calls.
> 
> Here's the output of my little proggy in the last iteration:
> 
> Naive: 560958
> Speculative for B: 574602
> Speculative for B/C: 572429
> 
> If anything, naive is often a little faster.
> 
> This kind of completely confuses my established knowledge on the matter. Looks like recent CPUs' branch predictions really are as good as people claim.
> 
> Sorry for the confusion.

You're looking through binoculars at a coin a mile away and trying to figure out whether it's a quarter or a nickel. Never gonna work. Most likely your benchmark is buried in the randf timing.

Make iteration cost next to nothing: put the objects in a medium-sized vector, then iterate over it many times.
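
Something along these lines, say. This is only a sketch against current Phobos (std.datetime.stopwatch and std.random instead of the tools.* modules above), so the timed loops contain nothing but the dispatch being measured:

import std.datetime.stopwatch : AutoStart, StopWatch;
import std.random : uniform;
import std.stdio : writefln;

class A { void test() { } }
class B : A { final override void test() { } }
class C : A { final override void test() { } }

void main()
{
    // Generate the objects once, up front, so the random number generator
    // stays out of the measured loops entirely.
    enum size = 64 * 1024;
    enum passes = 128;
    auto objs = new A[size];
    foreach (ref o; objs)
    {
        auto r = uniform(0, 3);
        if (r == 0)      o = new A;
        else if (r == 1) o = new B;
        else             o = new C;
    }

    auto sw = StopWatch(AutoStart.yes);
    foreach (pass; 0 .. passes)
        foreach (o; objs)
            o.test();                              // plain vtable dispatch
    sw.stop();
    writefln("Naive:       %s ms", sw.peek.total!"msecs");

    sw.reset();
    sw.start();
    foreach (pass; 0 .. passes)
        foreach (o; objs)
        {
            // speculative devirtualization for B and C
            if (o.classinfo is typeid(B))      (cast(B) cast(void*) o).test();
            else if (o.classinfo is typeid(C)) (cast(C) cast(void*) o).test();
            else                               o.test();
        }
    sw.stop();
    writefln("Speculative: %s ms", sw.peek.total!"msecs");
}

With object generation hoisted out of the timed loops, whatever gap remains between the two variants is the cost of the dispatch itself.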