July 31, 2006
Re: int opEquals(Object), and other legacy ints (!)
Walter Bright wrote:
> Stewart Gordon wrote:
>> Walter Bright wrote:
>>> Stewart Gordon wrote:
>>>> xs0 wrote:
>>>> <snip>
>>>>> Well, I'm just guessing, but I think something like
>>>>>
>>>>>  > int opEquals(Foo foo)
>>>>>  > {
>>>>>  >     return this.bar == foo.bar;
>>>>>  > }
>>>>>
>>>>> is compiled to something like
>>>>>
>>>>>> return this.bar-foo.bar; // 1 instruction
>>>>>
>>>>> but if the return type is bool, it becomes
>>>>>
>>>>>> return this.bar-foo.bar?1:0; // 3 instructions
>>>>
>>>> If it does this, then there's a serious bug in the compiler.
>>>
>>> What instruction sequence do you expect to be generated for it?
>>
>> If anything resembling the above, then
>>
>>     return this.bar-foo.bar?0:1;
> 
> ? Let's look at an example:
> 
> class Foo
> {
>     int foo, bar;
> 
>     int Eq1(Foo foo)
>     {
>         return this.bar-foo.bar?0:1;
>     }
> 
>     int Eq2(Foo foo)
>     {
>         return this.bar-foo.bar;
>     }
> }
> 
> which generates:
> 
>     Eq1:
>                 mov     EDX,4[ESP]
>                 mov     ECX,0Ch[EAX]
>                 sub     ECX,0Ch[EDX]
>                 cmp     ECX,1
>                 sbb     EAX,EAX
>                 neg     EAX
>                 ret     4
>     Eq2:
>                 mov     ECX,4[ESP]
>                 mov     EAX,0Ch[EAX]
>                 sub     EAX,0Ch[ECX]
>                 ret     4
> 
> So we have 4 instructions generated rather than 1. If there's a trick to 
> generate only one instruction for Eq1, I'd like to know about it.
> 
>>> I can. (a == b), where a and b are ints, can be implemented as (a - 
>>> b), and the result is int 0 for equality, int !=0 for inequality.
>>
>> How is this (a == b) rather than (a != b)?
> 
> I don't understand your question.

As per the other posts, Eq2 actually takes 2 instructions:

Eq2:
	...
	sub     EAX,0Ch[ECX]
	not	EAX;


And uuuh.., I've checked gcc's generated code for a C++'s Eq1, and it 
was only 2 instructions too, CMP and SETE ! :

Eq1:
	...
	cmp     EAX,0Ch[ECX]
	sete	EAX;

(http://www.cs.tut.fi/~siponen/upros/intel/instr/sete_setz.html)
It seems to me perfectly valid, is there any problem here?

What does the original Eq1 even do? :

	sub     ECX,0Ch[EDX]
	cmp     ECX,1       // Huh?
	sbb     EAX,EAX
	neg     EAX


-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
July 31, 2006
Re: int opEquals(Object), and other legacy ints (!)
Bruno Medeiros wrote:
> Walter Bright wrote:
> 
>> Stewart Gordon wrote:
>>
>>> Walter Bright wrote:
>>>
>>>> Stewart Gordon wrote:
>>>>
>>>>> xs0 wrote:
>>>>> <snip>
>>>>>
>>>>>> Well, I'm just guessing, but I think something like
>>>>>>
>>>>>>  > int opEquals(Foo foo)
>>>>>>  > {
>>>>>>  >     return this.bar == foo.bar;
>>>>>>  > }
>>>>>>
>>>>>> is compiled to something like
>>>>>>
>>>>>>> return this.bar-foo.bar; // 1 instruction
>>>>>>
>>>>>>
>>>>>> but if the return type is bool, it becomes
>>>>>>
>>>>>>> return this.bar-foo.bar?1:0; // 3 instructions
>>>>>
>>>>>
>>>>> If it does this, then there's a serious bug in the compiler.
>>>>
>>>>
>>>> What instruction sequence do you expect to be generated for it?
>>>
>>>
>>> If anything resembling the above, then
>>>
>>>     return this.bar-foo.bar?0:1;
>>
>>
>> ? Let's look at an example:
>>
>> class Foo
>> {
>>     int foo, bar;
>>
>>     int Eq1(Foo foo)
>>     {
>>         return this.bar-foo.bar?0:1;
>>     }
>>
>>     int Eq2(Foo foo)
>>     {
>>         return this.bar-foo.bar;
>>     }
>> }
>>
>> which generates:
>>
>>     Eq1:
>>                 mov     EDX,4[ESP]
>>                 mov     ECX,0Ch[EAX]
>>                 sub     ECX,0Ch[EDX]
>>                 cmp     ECX,1
>>                 sbb     EAX,EAX
>>                 neg     EAX
>>                 ret     4
>>     Eq2:
>>                 mov     ECX,4[ESP]
>>                 mov     EAX,0Ch[EAX]
>>                 sub     EAX,0Ch[ECX]
>>                 ret     4
>>
>> So we have 4 instructions generated rather than 1. If there's a trick 
>> to generate only one instruction for Eq1, I'd like to know about it.
>>
>>>> I can. (a == b), where a and b are ints, can be implemented as (a - 
>>>> b), and the result is int 0 for equality, int !=0 for inequality.
>>>
>>>
>>> How is this (a == b) rather than (a != b)?
>>
>>
>> I don't understand your question.
> 
> 
> As per the other posts, Eq2 actually takes 2 instructions:
> 
> Eq2:
>     ...
>     sub     EAX,0Ch[ECX]
>     not    EAX;
> 
> 
> And uuuh.., I've checked gcc's generated code for a C++'s Eq1, and it 
> was only 2 instructions too, CMP and SETE ! :
> 
> Eq1:
>     ...
>     cmp     EAX,0Ch[ECX]
>     sete    EAX;
> 
> (http://www.cs.tut.fi/~siponen/upros/intel/instr/sete_setz.html)
> It seems to me perfectly valid, is there any problem here?


Yes indeed. Well spotted! On anything supporting the 386 instruction set 
(and D is targeted for 32-bit devices only), there's really no 
performance advantage in returning an int over returning a bool.

This should be addressed, so that some of the core APIs can be cleaned 
up appropriately?


> 
> What does the original Eq1 even do? :
> 
>     sub     ECX,0Ch[EDX]
>     cmp     ECX,1       // Huh?
>     sbb     EAX,EAX
>     neg     EAX
> 
> 

That's old-skool, pre-386 hacking :)
July 31, 2006
Re: int opEquals(Object), and other legacy ints (!)
Bruno Medeiros wrote:
> And uuuh.., I've checked gcc's generated code for a C++'s Eq1, and it 
> was only 2 instructions too, CMP and SETE ! :
> 
> Eq1:
>     ...
>     cmp     EAX,0Ch[ECX]
>     sete    EAX;
> 
> (http://www.cs.tut.fi/~siponen/upros/intel/instr/sete_setz.html)
> It seems to me perfectly valid, is there any problem here?

Interesting instruction. Seems to have the exact semantics needed for 
these situations. You'd almost think CPU designers care about what 
people want to do with their products :P.


> What does the original Eq1 even do? :
Step by step:
>     mov     ECX,0Ch[EAX]
(You skipped this one) Loads this.bar into ECX.
>     sub     ECX,0Ch[EDX]
Subtracts foo.bar from ECX.
>     cmp     ECX,1       // Huh?
Among other things, sets borrow (aka carry) flag if ECX == 0 (i.e. if 
foo.bar == this.bar), clears it otherwise.
>     sbb     EAX,EAX
Subtracts (EAX + borrow) from EAX, setting it to either -1 (if carry == 
1) or 0 (if carry == 0).
>     neg     EAX
Negates EAX.

A bit weird at first glance, but it works as advertised :).


But indeed, a cmp/sete combo seems to do the same in fewer instructions.
July 31, 2006
Re: int opEquals(Object), and other legacy ints (!)
Bruno Medeiros wrote:
> 
> What does the original Eq1 even do? :
> 
>     sub     ECX,0Ch[EDX]
>     cmp     ECX,1       // Huh?
>     sbb     EAX,EAX
>     neg     EAX
> 
> 

[PS: I read Frits's answer after writing this: ]

Ah I get it now... I wasn't understanding what borrow (the mathematical 
notion) was, since I'm not a native English speaker. Nothing a Wikipedia 
lookup didn't solve. So, correct me if I'm wrong:

(when I say EDX I mean 0Ch[EDX] or whatever)

// sets the carry flag if ECX is zero (cmp does not read the zero flag;
// it borrows only when ECX < 1), that is, if ECX == EDX before the sub
  cmp   ECX,1

// sets EAX as zero and also subtracts one if carry flag is set
// that is, EAX = -1 if ECX == EDX and EAX = 0 if ECX != EDX
  sbb	EAX,EAX

// two's complement negation of EAX, 0 becomes 0, -1 becomes 1
  neg EAX
// end result: EAX = 1 if ECX == EDX and EAX = 0 if ECX != EDX

So yeah, it seems these 3 instructions do the same as SETE ... ?



-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
July 31, 2006
Re: int opEquals(Object), and other legacy ints
Bruno Medeiros wrote:
> Well, let's think about the other way around then. Why should bool be 
> constrained to 0 or 1? Why not, same as kris said, 0 would be false, and 
> non zero would be true. Then we could have an opEquals or any function 
> returning a bool instead of int, without a performance penalty.
> 
> The only shortcoming I see is that it would be slower to compare two 
> bool /variables/:
>    (b1 == b2)
> that expression is currently just 1 instruction, a CMP, but without the 
> 0,1 restriction it would be more (3, I think, have to check that). 
> However, is that significantly worse? I think not. I think comparison 
> between two bool _variables_ is likely very rare, and when it happens it 
> is also probably not performance critical. (statistical references?)
> Note: this would not affect at all comparisons between a bool variable 
> and a bool literal. Like (b == true) or (b == false).

I think most programmers would find this to be very surprising behavior. 
I know I would.
July 31, 2006
Re: int opEquals(Object), and other legacy ints
Walter Bright wrote:
> P.S. Inevitably, some will ask "who cares" about these small 
> efficiencies. The trouble is, these kinds of things often appear in 
> tight loops, where small inefficiencies get multiplied by millions.

I consider this kind of stuff the compiler's job -- so if I write or 
maintain code that is slow, I know there is probably something I can do 
about it w/o having to drop into assembly.

Personally I've spent a huge amount of time tuning code and I can't tell 
you the positive effect that has on end-users. IMHO bad performance is 
often the "forgotten bug" (that's not to say the budget should be busted 
on that "last 20%" either though).

- Dave
August 02, 2006
Re: int opEquals(Object), and other legacy ints
Walter Bright wrote:
> Bruno Medeiros wrote:
>> Well, let's think about the other way around then. Why should bool be 
>> constrained to 0 or 1? Why not, same as kris said, 0 would be false, 
>> and non zero would be true. Then we could have an opEquals or any 
>> function returning a bool instead of int, without a performance penalty.
>>
>> The only shortcoming I see is that it would be slower to compare two 
>> bool /variables/:
>>    (b1 == b2)
>> that expression is currently just 1 instruction, a CMP, but without 
>> the 0,1 restriction it would be more (3, I think, have to check that). 
>> However, is that significantly worse? I think not. I think comparison 
>> between two bool _variables_ is likely very rare, and when it happens 
>> it is also probably not performance critical. (statistical references?)
>> Note: this would not affect at all comparisons between a bool variable 
>> and a bool literal. Like (b == true) or (b == false).
> 
> I think most programmers would find this to be very surprising behavior. 
> I know I would.

Surprising behavior? What surprising behavior? Those are all 
implementation details; they have no bearing on language/program 
behavior.

And how about the alternative of using the SETE instruction for the bool 
restriction? You haven't commented on that yet...

-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
August 07, 2006
Re: int opEquals(Object), and other legacy ints (!)
> But indeed, a cmp/sete combo seems to do the same in fewer instructions.

But is it faster? I've noticed that many of the higher-level assembly
instructions are actually slower than multiple lower-level ones. "loop" is
the best example of this (dec ecx/jne is faster), or "rep" (again, dec/jne
is faster).

L.
August 07, 2006
Re: int opEquals(Object), and other legacy ints (!)
Lionello Lunesu wrote:
>> But indeed, a cmp/sete combo seems to do the same in fewer instructions.
> 
> But is it faster? I've noticed that many of the higher-level assembly
> instructions are actually slower than multiple lower-level ones. "loop" is
> the best example of this (dec ecx/jne is faster), or "rep" (again, dec/jne
> is faster).

Heh... You may have noticed I didn't use any word related to speed :). 
The reason for that is that I don't know much about optimization for 
speed, especially where pipelines etc. are involved...

Hardware is weird.
August 07, 2006
Re: int opEquals(Object), and other legacy ints (!)
Lionello Lunesu wrote:
>>But indeed, a cmp/sete combo seems to do the same in fewer instructions.
> 
> 
> But is it faster? I've noticed that many of the higher-level assembly
> instructions are actually slower than multiple lower-level ones. "loop" is
> the best example of this (dec ecx/jne is faster), or "rep" (again, dec/jne
> is faster).
> 
> L. 
> 
> 

If you'd looked at the setne instruction linked previously, you'd have 
seen that it consumes 3 cycles. And no; there are no jumps, loops, or any 
other reason to cause pipeline bubbles. If you need a primer on what 
causes modern CPUs to stall (the silly P4 in particular) then you could 
do a lot worse than to read the articles by Jon Stokes at ArsTechnica.

Oh, and this is just daft. Why don't we all count the cycles for a 
call/return instead? Or, perhaps just exactly what it costs to compare 
the bytes of two strings until they start to look different? You'll find 
the cost of setne (and probably even the prior "extra" three 
instructions for boolean support) is relegated to background noise.

Let's face it: int is likely used instead of bool for historical 
reasons; probably just an artifact left over from pre-80386 days. Would 
be nice to get that codegen cleaned up ~ especially since it was W who 
claimed the reasons were performance related. Hacking the high-level 
code with int vs boolean, just to reflect some archaic machine 
instruction, is one of those things that come under the umbrella of 
"premature optimization".