August 14, 2008
superdan wrote:
> downs Wrote:
> 
>> To clear this up, I've been running a benchmark.
>>
>> module test91;
>>
>> import tools.time, std.stdio, tools.base, tools.mersenne;
>>
>> class A { void test() { } }
>> class B : A { final override void test() { } }
>> class C : A { final override void test() { } }
>>
>> A a, b, c;
>> static this() { a = new A; b = new B; c = new C; }
>>
>> A gen() {
>>   if (randf() < 1f/3f) return a;
>>   else if (randf() < 0.5) return b;
>>   else return c;
>> }
>>
>> void main() {
>>   const count = 1024*1024;
>>   for (int z = 0; z < 4; ++z) {
>>     writefln("Naive: ", time({
>>       for (int i = 0; i < count; ++i) gen().test();
>>     }()));
>>     writefln("Speculative for B: ", time({
>>       for (int i = 0; i < count; ++i) {
>>         auto t = gen();
>>         if (t.classinfo is typeid(B)) (cast(B)cast(void*)t).test();
>>         else t.test();
>>       }
>>     }()));
>>     writefln("Speculative for B/C: ", time({
>>       for (int i = 0; i < count; ++i) {
>>         auto t = gen();
>>         if (t.classinfo is typeid(B)) (cast(B)cast(void*)t).test();
>>         else if (t.classinfo is typeid(C)) (cast(C)cast(void*)t).test();
>>         else t.test();
>>       }
>>     }()));
>>   }
>> }
>>
>>
>> And as it turns out, virtual method calls were at least fast enough to not make any sort of difference in my calls.
>>
>> Here's the output of my little proggy in the last iteration:
>>
>> Naive: 560958
>> Speculative for B: 574602
>> Speculative for B/C: 572429
>>
>> If anything, naive is often a little faster.
>>
>> This kind of completely confuses my established knowledge on the matter. Looks like recent CPUs' branch predictions really are as good as people claim.
>>
>> Sorry for the confusion.
> 
> you are looking with a binocular at a coin a mile away and tryin' to figure quarter or nickel. never gonna work. most likely ur benchmark is buried in randf timing.
> 
> make iteration cost next to nothing. put objects in a medium size vector. then iterate many times over it.

I know. But if it doesn't matter in that case, it most likely won't matter in practical situations.

Nonetheless, here are some better timings, including a faster (dirtier) randf() function, and a null pass that only generates the object.

Null: 729908
Naive: 1615314
Speculative for B: 1692860
Speculative for B/C: 1664040
August 14, 2008
downs Wrote:

> superdan wrote:
> > downs Wrote:
> > 
> >> To clear this up, I've been running a benchmark.
> >>
> >> module test91;
> >>
> >> import tools.time, std.stdio, tools.base, tools.mersenne;
> >>
> >> class A { void test() { } }
> >> class B : A { final override void test() { } }
> >> class C : A { final override void test() { } }
> >>
> >> A a, b, c;
> >> static this() { a = new A; b = new B; c = new C; }
> >>
> >> A gen() {
> >>   if (randf() < 1f/3f) return a;
> >>   else if (randf() < 0.5) return b;
> >>   else return c;
> >> }
> >>
> >> void main() {
> >>   const count = 1024*1024;
> >>   for (int z = 0; z < 4; ++z) {
> >>     writefln("Naive: ", time({
> >>       for (int i = 0; i < count; ++i) gen().test();
> >>     }()));
> >>     writefln("Speculative for B: ", time({
> >>       for (int i = 0; i < count; ++i) {
> >>         auto t = gen();
> >>         if (t.classinfo is typeid(B)) (cast(B)cast(void*)t).test();
> >>         else t.test();
> >>       }
> >>     }()));
> >>     writefln("Speculative for B/C: ", time({
> >>       for (int i = 0; i < count; ++i) {
> >>         auto t = gen();
> >>         if (t.classinfo is typeid(B)) (cast(B)cast(void*)t).test();
> >>         else if (t.classinfo is typeid(C)) (cast(C)cast(void*)t).test();
> >>         else t.test();
> >>       }
> >>     }()));
> >>   }
> >> }
> >>
> >>
> >> And as it turns out, virtual method calls were at least fast enough to not make any sort of difference in my calls.
> >>
> >> Here's the output of my little proggy in the last iteration:
> >>
> >> Naive: 560958
> >> Speculative for B: 574602
> >> Speculative for B/C: 572429
> >>
> >> If anything, naive is often a little faster.
> >>
> >> This kind of completely confuses my established knowledge on the matter. Looks like recent CPUs' branch predictions really are as good as people claim.
> >>
> >> Sorry for the confusion.
> > 
> > you are looking with a binocular at a coin a mile away and tryin' to figure quarter or nickel. never gonna work. most likely ur benchmark is buried in randf timing.
> > 
> > make iteration cost next to nothing. put objects in a medium size vector. then iterate many times over it.
> 
> I know. But if it doesn't matter in that case, it most likely won't matter in practical situations.
> 
> Nonetheless, here are some better timings, including a faster (dirtier) randf() function, and a null pass that only generates the object.
> 
> Null: 729908
> Naive: 1615314
> Speculative for B: 1692860
> Speculative for B/C: 1664040

the whole premise of speculation is it oils the common path. you have uniform probabilities. what you gain on speculating for B you lose in the extra test when misspeculating on others. so to really test speculation make B like 90% of cases and retry.
August 14, 2008
"downs" <default_357-line@yahoo.de> wrote in message news:g8193l$t88$1@digitalmars.com...
> To clear this up, I've been running a benchmark.
>
> module test91;
>
> import tools.time, std.stdio, tools.base, tools.mersenne;
>
> class A { void test() { } }
> class B : A { final override void test() { } }
> class C : A { final override void test() { } }
>
> A a, b, c;
> static this() { a = new A; b = new B; c = new C; }
>
> A gen() {
>  if (randf() < 1f/3f) return a;
>  else if (randf() < 0.5) return b;
>  else return c;
> }
>
> void main() {
>  const count = 1024*1024;
>  for (int z = 0; z < 4; ++z) {
>    writefln("Naive: ", time({
>      for (int i = 0; i < count; ++i) gen().test();
>    }()));
>    writefln("Speculative for B: ", time({
>      for (int i = 0; i < count; ++i) {
>        auto t = gen();
>        if (t.classinfo is typeid(B)) (cast(B)cast(void*)t).test();
>        else t.test();
>      }
>    }()));
>    writefln("Speculative for B/C: ", time({
>      for (int i = 0; i < count; ++i) {
>        auto t = gen();
>        if (t.classinfo is typeid(B)) (cast(B)cast(void*)t).test();
>        else if (t.classinfo is typeid(C)) (cast(C)cast(void*)t).test();
>        else t.test();
>      }
>    }()));
>  }
> }
>
>
> And as it turns out, virtual method calls were at least fast enough to not make any sort of difference in my calls.
>
> Here's the output of my little proggy in the last iteration:
>
> Naive: 560958
> Speculative for B: 574602
> Speculative for B/C: 572429
>
> If anything, naive is often a little faster.
>
> This kind of completely confuses my established knowledge on the matter. Looks like recent CPUs' branch predictions really are as good as people claim.

The point is that you can't get rid of the branch prediction. You either have the indirect jump or a local conditional branch; either way the CPU has to speculatively go in one direction or the other. And since both are dealt with in (near enough) exactly the same way, by the same prediction mechanism, you don't gain anything from replacing one with the other.

It's been that way since the Pentium 2 IIRC.

August 14, 2008
superdan wrote:
> downs Wrote:
> 
>> superdan wrote:
>>> downs Wrote:
>>>
>>>> To clear this up, I've been running a benchmark.
>>>>
>>>> module test91;
>>>>
>>>> import tools.time, std.stdio, tools.base, tools.mersenne;
>>>>
>>>> class A { void test() { } }
>>>> class B : A { final override void test() { } }
>>>> class C : A { final override void test() { } }
>>>>
>>>> A a, b, c;
>>>> static this() { a = new A; b = new B; c = new C; }
>>>>
>>>> A gen() {
>>>>   if (randf() < 1f/3f) return a;
>>>>   else if (randf() < 0.5) return b;
>>>>   else return c;
>>>> }
>>>>
>>>> void main() {
>>>>   const count = 1024*1024;
>>>>   for (int z = 0; z < 4; ++z) {
>>>>     writefln("Naive: ", time({
>>>>       for (int i = 0; i < count; ++i) gen().test();
>>>>     }()));
>>>>     writefln("Speculative for B: ", time({
>>>>       for (int i = 0; i < count; ++i) {
>>>>         auto t = gen();
>>>>         if (t.classinfo is typeid(B)) (cast(B)cast(void*)t).test();
>>>>         else t.test();
>>>>       }
>>>>     }()));
>>>>     writefln("Speculative for B/C: ", time({
>>>>       for (int i = 0; i < count; ++i) {
>>>>         auto t = gen();
>>>>         if (t.classinfo is typeid(B)) (cast(B)cast(void*)t).test();
>>>>         else if (t.classinfo is typeid(C)) (cast(C)cast(void*)t).test();
>>>>         else t.test();
>>>>       }
>>>>     }()));
>>>>   }
>>>> }
>>>>
>>>>
>>>> And as it turns out, virtual method calls were at least fast enough to not make any sort of difference in my calls.
>>>>
>>>> Here's the output of my little proggy in the last iteration:
>>>>
>>>> Naive: 560958
>>>> Speculative for B: 574602
>>>> Speculative for B/C: 572429
>>>>
>>>> If anything, naive is often a little faster.
>>>>
>>>> This kind of completely confuses my established knowledge on the matter. Looks like recent CPUs' branch predictions really are as good as people claim.
>>>>
>>>> Sorry for the confusion.
>>> you are looking with a binocular at a coin a mile away and tryin' to figure quarter or nickel. never gonna work. most likely ur benchmark is buried in randf timing.
>>>
>>> make iteration cost next to nothing. put objects in a medium size vector. then iterate many times over it.
>> I know. But if it doesn't matter in that case, it most likely won't matter in practical situations.
>>
>> Nonetheless, here are some better timings, including a faster (dirtier) randf() function, and a null pass that only generates the object.
>>
>> Null: 729908
>> Naive: 1615314
>> Speculative for B: 1692860
>> Speculative for B/C: 1664040
> 
> the whole premise of speculation is it oils the common path. you have uniform probabilities. what you gain on speculating for B you lose in the extra test when misspeculating on others. so to really test speculation make B like 90% of cases and retry.

As you wish.

Here's the timings for 10% A, then 81% for B, 9% for C.

Null: 629760
Naive: 1383764
Speculative for B: 1397784
Speculative for B/C: 1418163
August 14, 2008
"superdan" <super@dan.org> wrote in message news:g81e1l$18js$1@digitalmars.com...
> downs Wrote:

>> Null: 729908
>> Naive: 1615314
>> Speculative for B: 1692860
>> Speculative for B/C: 1664040
>
> the whole premise of speculation is it oils the common path. you have
> uniform probabilities.
> what you gain on speculating for B you lose in the extra test when
> misspeculating on others.
> so to really test speculation make B like 90% of cases and retry.

To speculate locally you still have to look up the class info, compare it to the speculated object type, and then conditionally branch. You can't avoid a conditional branch, and as it will share the same pattern as the indirect jump, and (in most cases) the same prediction mechanism, neither will be better predicted than the other.

So the point is that you can't make the common path any faster unless you actually *inline* the speculated method. In fact, if you don't inline you are most probably making the situation worse, because you are replacing this...

(ignoring parameter setup which will be identical)

MOV   EAX, [objptr.vtable+offset]
CALL  EAX

With this...

MOV   EAX, [objptr.classinfo]
CMP   EAX, expectedclasstype
JNE   @wrongtype
CALL  @expectedtypemethod
JMP   @done
@wrongtype:
// normal vtable code here
@done:

And there's no way that will be faster imo. Unless perhaps you have a 486 or P1. ;-)



August 14, 2008
Jb Wrote:

> 
> "superdan" <super@dan.org> wrote in message news:g81e1l$18js$1@digitalmars.com...
> > downs Wrote:
> 
> >> Null: 729908
> >> Naive: 1615314
> >> Speculative for B: 1692860
> >> Speculative for B/C: 1664040
> >
> > the whole premise of speculation is it oils the common path. you have
> > uniform probabilities.
> > what you gain on speculating for B you lose in the extra test when
> > misspeculating on others.
> > so to really test speculation make B like 90% of cases and retry.
> 
> To speculate locally you still have to look up the class info, compare it to the speculated object type, and then conditionally branch. You can't avoid a conditional branch, and as it will share the same pattern as the indirect jump, and (in most cases) the same prediction mechanism, neither will be better predicted than the other.

no that's wrong. conditional branch is better speculated than indirect jump. they share no other hardware than the instruction fetching pipe. the mechanism for conditional branch speculation is that stuff is executed in parallel on both branches inside the pipeline. the branch that makes it is committed. the other is retired without getting to write to the register file or memory. indirect jump is a whole different business altogether.

> So the point is that you can't make the common path any faster unless you actually *inline* the speculated method. In fact, if you don't inline you are most probably making the situation worse, because you are replacing this...

you are right here tho for the wrong reasons :) yeah speculation is good if there are a few good chunky instructions to eat while speculating. if it's a function call that in turn is just a ret... the results are nothing to write home about. as it's been shown.

processors today are so complicated there's no chance to have relevant synthetic tests. when checking for speed you always must use realistic loads. if method speculation as in here shows nothing on empty functions that's nothing. i've seen much weirder things. tests must be held on 5-6 real benchmarks to be any good.
August 14, 2008
"superdan" wrote
> Jb Wrote:
>
>>
>> "superdan" <super@dan.org> wrote in message news:g81e1l$18js$1@digitalmars.com...
>> > downs Wrote:
>>
>> >> Null: 729908
>> >> Naive: 1615314
>> >> Speculative for B: 1692860
>> >> Speculative for B/C: 1664040
>> >
>> > the whole premise of speculation is it oils the common path. you have
>> > uniform probabilities.
>> > what you gain on speculating for B you lose in the extra test when
>> > misspeculating on others.
>> > so to really test speculation make B like 90% of cases and retry.
>>
>> To speculate localy you still have to lookup the class info, compare it
>> to
>> the speculated object type, and then conditionaly branch. You cant avoid
>> a
>> conditional branch, and as it will share the same pattern as the indirect
>> jump, and (in most cases) the same prediction mechanism, neither will be
>> better predicted than the other.
>
> no that's wrong. conditional branch is better speculated than indirect jump. they share no other hardware than the instruction fetching pipe. the mechanism for conditional branch speculation is that stuff is executed in parallel on both branches inside the pipeline. the branch that makes it is committed. the other is retired without getting to write to the register file or memory. indirect jump is a whole different business altogether.
>
>> So the point is that you cant make the common path any faster unless you
>> actualy *inline* the speculated method. In fact if you dont inline you
>> are
>> most probably making the situation worse because you are replacing
>> this...
>
> you are right here tho for the wrong reasons :) yeah speculation is good if there are a few good chunky instructions to eat while speculating. if it's a function call that in turn is just a ret... the results are nothing to write home about. as it's been shown.
>
> processors today are so complicated there's no chance to have relevant synthetic tests. when checking for speed you always must use realistic loads. if method speculation as in here shows nothing on empty functions that's nothing. i've seen much weirder things. tests must be held on 5-6 real benchmarks to be any good.

Who the **** are you, and what did you do with superdan? ;)

-Steve


August 14, 2008
"superdan" <super@dan.org> wrote in message news:g81o61$21bs$1@digitalmars.com...
> Jb Wrote:
>
>>
>> "superdan" <super@dan.org> wrote in message news:g81e1l$18js$1@digitalmars.com...
>> > downs Wrote:
>>
>> >> Null: 729908
>> >> Naive: 1615314
>> >> Speculative for B: 1692860
>> >> Speculative for B/C: 1664040
>> >
>> > the whole premise of speculation is it oils the common path. you have
>> > uniform probabilities.
>> > what you gain on speculating for B you lose in the extra test when
>> > misspeculating on others.
>> > so to really test speculation make B like 90% of cases and retry.
>>
>> To speculate localy you still have to lookup the class info, compare it
>> to
>> the speculated object type, and then conditionaly branch. You cant avoid
>> a
>> conditional branch, and as it will share the same pattern as the indirect
>> jump, and (in most cases) the same prediction mechanism, neither will be
>> better predicted than the other.
>
> no that's wrong. conditional branch is better speculated than indirect
> jump.
> they share no other hardware than the instruction fetching pipe.

Sorry, but you don't know what you are talking about. Indirect jumps get a slot in the branch target buffer (BTB) the same as conditional branches. They both share the same target prediction hardware. It's not only a fact, it's also common sense that they would do so.

http://www.agner.org/optimize/microarchitecture.pdf

Read the chapter on branch prediction.

http://www.intel.com/design/processor/manuals/248966.pdf

Read Chapter 3.4.1.6


> the mechanism for conditional branch speculation is that stuff is executed
> in parallel
> on both branches inside the pipeline. the branch that makes it is
> committed. the other
>  is retired without getting to write to the register file or memory.

Actually, AFAIK no current x86 processor *executes* both branches. What does happen is that the instruction cache will *prefetch* both branches, so that when the BTB decides which way to go it has the instruction data ready.

But it will still cost you a whole pipeline flush if the prediction was wrong.


>> So the point is that you cant make the common path any faster unless you
>> actualy *inline* the speculated method. In fact if you dont inline you
>> are
>> most probably making the situation worse because you are replacing
>> this...
>
> you are right here tho for the wrong reasons :) yeah speculation is good
> if there are
> a few good chunky instructions to eat while speculating. if it's a
> function call that in
> turn is just a ret... the results are nothing to write home about. as it's
> been shown.

No.. erm NO... and NOOooOOOO!!!!

;-)

Speculation is good if it allows you to inline the method and chop out a whole bunch of register shuffling, stack setup, and entry / exit code.

It's got naff all to do with "having shit to do" while you're speculating. Just like it's got naff all to do with conditional branches being faster / better predicted than indirect calls.


August 14, 2008
Jb wrote:
> "superdan" <super@dan.org> wrote in message news:g81o61$21bs$1@digitalmars.com...
>> Jb Wrote:
>>
>>> "superdan" <super@dan.org> wrote in message news:g81e1l$18js$1@digitalmars.com...
>>>> downs Wrote:
>>>>> Null: 729908
>>>>> Naive: 1615314
>>>>> Speculative for B: 1692860
>>>>> Speculative for B/C: 1664040
>>>> the whole premise of speculation is it oils the common path. you have
>>>> uniform probabilities.
>>>> what you gain on speculating for B you lose in the extra test when
>>>> misspeculating on others.
>>>> so to really test speculation make B like 90% of cases and retry.
>>> To speculate localy you still have to lookup the class info, compare it
>>> to
>>> the speculated object type, and then conditionaly branch. You cant avoid
>>> a
>>> conditional branch, and as it will share the same pattern as the indirect
>>> jump, and (in most cases) the same prediction mechanism, neither will be
>>> better predicted than the other.
>> no that's wrong. conditional branch is better speculated than indirect
>> jump.
>> they share no other hardware than the instruction fetching pipe.
> 
> Sorry, but you don't know what you are talking about. Indirect jumps get a slot in the branch target buffer (BTB) the same as conditional branches. They both share the same target prediction hardware. It's not only a fact, it's also common sense that they would do so.
> 
> http://www.agner.org/optimize/microarchitecture.pdf
> 
> Read the chapter on branch prediction.
> 
> http://www.intel.com/design/processor/manuals/248966.pdf
> 
> Read Chapter 3.4.1.6
> 
> 
>> the mechanism for conditional branch speculation is that stuff is executed
>> in parallel
>> on both branches inside the pipeline. the branch that makes it is
>> committed. the other
>>  is retired without getting to write to the register file or memory.
> 
> Actually, AFAIK no current x86 processor *executes* both branches. What does happen is that the instruction cache will *prefetch* both branches, so that when the BTB decides which way to go it has the instruction data ready.
> 
> But it will still cost you a whole pipeline flush if the prediction was wrong.
> 
> 
>>> So the point is that you cant make the common path any faster unless you
>>> actualy *inline* the speculated method. In fact if you dont inline you
>>> are
>>> most probably making the situation worse because you are replacing
>>> this...
>> you are right here tho for the wrong reasons :) yeah speculation is good
>> if there are
>> a few good chunky instructions to eat while speculating. if it's a
>> function call that in
>> turn is just a ret... the results are nothing to write home about. as it's
>> been shown.
> 
> No.. erm NO... and NOOooOOOO!!!!
> 
> ;-)
> 
> Speculation is good if it allows you to inline the method and chop out a whole bunch of register shuffling, stack setup, and entry / exit code.
> 
> It's got naff all to do with "having shit to do" while you're speculating. Just like it's got naff all to do with conditional branches being faster / better predicted than indirect calls.
> 
> 

I think Superdan is just too old ;)
What he is talking about is probably the way things worked in the P4, and
that design was canceled for many good reasons. As Jb says above, no
current x86 processor *executes* both branches.
My Digital Systems professor works at Intel here in Israel where they
design the chips ;) and he taught us the same thing Jb said above.
August 14, 2008
Jb Wrote:

> 
> "superdan" <super@dan.org> wrote in message news:g81o61$21bs$1@digitalmars.com...
> > Jb Wrote:
> >
> >>
> >> "superdan" <super@dan.org> wrote in message news:g81e1l$18js$1@digitalmars.com...
> >> > downs Wrote:
> >>
> >> >> Null: 729908
> >> >> Naive: 1615314
> >> >> Speculative for B: 1692860
> >> >> Speculative for B/C: 1664040
> >> >
> >> > the whole premise of speculation is it oils the common path. you have
> >> > uniform probabilities.
> >> > what you gain on speculating for B you lose in the extra test when
> >> > misspeculating on others.
> >> > so to really test speculation make B like 90% of cases and retry.
> >>
> >> To speculate localy you still have to lookup the class info, compare it
> >> to
> >> the speculated object type, and then conditionaly branch. You cant avoid
> >> a
> >> conditional branch, and as it will share the same pattern as the indirect
> >> jump, and (in most cases) the same prediction mechanism, neither will be
> >> better predicted than the other.
> >
> > no that's wrong. conditional branch is better speculated than indirect
> > jump.
> > they share no other hardware than the instruction fetching pipe.
> 
> Sorry, but you don't know what you are talking about. Indirect jumps get a slot in the branch target buffer (BTB) the same as conditional branches. They both share the same target prediction hardware. It's not only a fact, it's also common sense that they would do so.
> 
> http://www.agner.org/optimize/microarchitecture.pdf
> 
> Read the chapter on branch prediction.
> 
> http://www.intel.com/design/processor/manuals/248966.pdf
> 
> Read Chapter 3.4.1.6

relax. i do know what i'm talking about eh. i didn't claim it was timely :) also it got real jumbled in my head. like i described predicated execution instead of straight branch prediction. itanium has the former and x86 has the latter. (bad decision in itanium if you ask me.) so i was talking outta my... i mean i was talking nonsense.

> > the mechanism for conditional branch speculation is that stuff is executed
> > in parallel
> > on both branches inside the pipeline. the branch that makes it is
> > committed. the other
> >  is retired without getting to write to the register file or memory.
> 
> Actually, AFAIK no current x86 processor *executes* both branches. What does happen is that the instruction cache will *prefetch* both branches, so that when the BTB decides which way to go it has the instruction data ready.

that's right eh. i stand corrected. what i said happens on itanium.

> But it will still cost you a whole pipeline flush if the prediction was wrong.
> 
> 
> >> So the point is that you cant make the common path any faster unless you
> >> actualy *inline* the speculated method. In fact if you dont inline you
> >> are
> >> most probably making the situation worse because you are replacing
> >> this...
> >
> > you are right here tho for the wrong reasons :) yeah speculation is good
> > if there are
> > a few good chunky instructions to eat while speculating. if it's a
> > function call that in
> > turn is just a ret... the results are nothing to write home about. as it's
> > been shown.
> 
> No.. erm NO... and NOOooOOOO!!!!
> 
> ;-)
> 
> Speculation is good if it allows you to inline the method and chop out a whole bunch of register shuffling, stack setup, and entry / exit code.
> 
> It's got naff all to do with "having shit to do" while you're speculating. Just like it's got naff all to do with conditional branches being faster / better predicted than indirect calls.

hey hey... watch your language :) otherwise i agree.