July 31, 2006
Walter Bright wrote:
> Stewart Gordon wrote:
>> Walter Bright wrote:
>>> Stewart Gordon wrote:
>>>> xs0 wrote:
>>>> <snip>
>>>>> Well, I'm just guessing, but I think something like
>>>>>
>>>>>  > int opEquals(Foo foo)
>>>>>  > {
>>>>>  >     return this.bar == foo.bar;
>>>>>  > }
>>>>>
>>>>> is compiled to something like
>>>>>
>>>>>> return this.bar-foo.bar; // 1 instruction
>>>>>
>>>>> but if the return type is bool, it becomes
>>>>>
>>>>>> return this.bar-foo.bar?1:0; // 3 instructions
>>>>
>>>> If it does this, then there's a serious bug in the compiler.
>>>
>>> What instruction sequence do you expect to be generated for it?
>>
>> If anything resembling the above, then
>>
>>     return this.bar-foo.bar?0:1;
> 
> ? Let's look at an example:
> 
> class Foo
> {
>     int foo, bar;
> 
>     int Eq1(Foo foo)
>     {
>         return this.bar-foo.bar?0:1;
>     }
> 
>     int Eq2(Foo foo)
>     {
>         return this.bar-foo.bar;
>     }
> }
> 
> which generates:
> 
>     Eq1:
>                 mov     EDX,4[ESP]
>                 mov     ECX,0Ch[EAX]
>                 sub     ECX,0Ch[EDX]
>                 cmp     ECX,1
>                 sbb     EAX,EAX
>                 neg     EAX
>                 ret     4
>     Eq2:
>                 mov     ECX,4[ESP]
>                 mov     EAX,0Ch[EAX]
>                 sub     EAX,0Ch[ECX]
>                 ret     4
> 
> So we have 4 instructions generated rather than 1. If there's a trick to generate only one instruction for Eq1, I'd like to know about it.
> 
>>> I can. (a == b), where a and b are ints, can be implemented as (a - b), and the result is int 0 for equality, int !=0 for inequality.
>>
>> How is this (a == b) rather than (a != b)?
> 
> I don't understand your question.

As per the other posts, Eq2 actually takes 2 instructions:

Eq2:
	...
	sub     EAX,0Ch[ECX]
	not	EAX;
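
In C terms, the raw difference has the opposite truth value of an equality test, so it needs a logical negation before it can stand in for (a == b). A minimal sketch, not compiler output; `diff` and `eq_from_diff` are made-up names, and unsigned arithmetic is an assumption chosen to keep wraparound well defined:

```c
#include <stdint.h>

/* Hypothetical helper: the raw difference, as Eq2 returns it.
   Unsigned arithmetic is used so that wraparound is well defined in C. */
uint32_t diff(int32_t a, int32_t b)
{
    return (uint32_t)a - (uint32_t)b;   /* 0 when a == b, nonzero otherwise */
}

/* Hypothetical helper: the difference turned into a real equality result. */
int eq_from_diff(int32_t a, int32_t b)
{
    return !diff(a, b);                 /* 1 when a == b, 0 otherwise */
}
```

So `if (diff(a, b))` reads as "a != b"; the extra negation is what the second instruction pays for.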


And, uh, I checked gcc's generated code for a C++ version of Eq1, and it was only 2 instructions too, CMP and SETE:

Eq1:
	...
	cmp     EAX,0Ch[ECX]
	sete	EAX;

(http://www.cs.tut.fi/~siponen/upros/intel/instr/sete_setz.html)
It seems to me perfectly valid, is there any problem here?
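
For reference, a minimal C sketch of such an equality function. The names mirror the D class above, and this is an illustration, not the exact source that was compiled. With optimization enabled, gcc on x86 typically turns the comparison into a cmp followed by sete, though the exact output depends on compiler version and flags:

```c
#include <stdbool.h>

/* Mirrors the fields of the D class Foo quoted above. */
struct Foo {
    int foo, bar;
};

/* Equality returning a real boolean (always 0 or 1). */
bool foo_eq(const struct Foo *a, const struct Foo *b)
{
    return a->bar == b->bar;
}
```

Compiling this with `gcc -O2 -S` writes the generated assembly to a .s file for inspection.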

What does the original Eq1 even do? :

	sub     ECX,0Ch[EDX]
	cmp     ECX,1       // Huh?
	sbb     EAX,EAX
	neg     EAX


-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
July 31, 2006
Bruno Medeiros wrote:
> Walter Bright wrote:
> 
>> Stewart Gordon wrote:
>>
>>> Walter Bright wrote:
>>>
>>>> Stewart Gordon wrote:
>>>>
>>>>> xs0 wrote:
>>>>> <snip>
>>>>>
>>>>>> Well, I'm just guessing, but I think something like
>>>>>>
>>>>>>  > int opEquals(Foo foo)
>>>>>>  > {
>>>>>>  >     return this.bar == foo.bar;
>>>>>>  > }
>>>>>>
>>>>>> is compiled to something like
>>>>>>
>>>>>>> return this.bar-foo.bar; // 1 instruction
>>>>>>
>>>>>>
>>>>>> but if the return type is bool, it becomes
>>>>>>
>>>>>>> return this.bar-foo.bar?1:0; // 3 instructions
>>>>>
>>>>>
>>>>> If it does this, then there's a serious bug in the compiler.
>>>>
>>>>
>>>> What instruction sequence do you expect to be generated for it?
>>>
>>>
>>> If anything resembling the above, then
>>>
>>>     return this.bar-foo.bar?0:1;
>>
>>
>> ? Let's look at an example:
>>
>> class Foo
>> {
>>     int foo, bar;
>>
>>     int Eq1(Foo foo)
>>     {
>>         return this.bar-foo.bar?0:1;
>>     }
>>
>>     int Eq2(Foo foo)
>>     {
>>         return this.bar-foo.bar;
>>     }
>> }
>>
>> which generates:
>>
>>     Eq1:
>>                 mov     EDX,4[ESP]
>>                 mov     ECX,0Ch[EAX]
>>                 sub     ECX,0Ch[EDX]
>>                 cmp     ECX,1
>>                 sbb     EAX,EAX
>>                 neg     EAX
>>                 ret     4
>>     Eq2:
>>                 mov     ECX,4[ESP]
>>                 mov     EAX,0Ch[EAX]
>>                 sub     EAX,0Ch[ECX]
>>                 ret     4
>>
>> So we have 4 instructions generated rather than 1. If there's a trick to generate only one instruction for Eq1, I'd like to know about it.
>>
>>>> I can. (a == b), where a and b are ints, can be implemented as (a - b), and the result is int 0 for equality, int !=0 for inequality.
>>>
>>>
>>> How is this (a == b) rather than (a != b)?
>>
>>
>> I don't understand your question.
> 
> 
> As per the other posts, Eq2 actually takes 2 instructions:
> 
> Eq2:
>     ...
>     sub     EAX,0Ch[ECX]
>     not    EAX;
> 
> 
> And, uh, I checked gcc's generated code for a C++ version of Eq1, and it was only 2 instructions too, CMP and SETE:
> 
> Eq1:
>     ...
>     cmp     EAX,0Ch[ECX]
>     sete    EAX;
> 
> (http://www.cs.tut.fi/~siponen/upros/intel/instr/sete_setz.html)
> It seems to me perfectly valid, is there any problem here?


Yes indeed. Well spotted! On anything supporting the 386 instruction set (and D is targeted for 32-bit devices only), there's really no performance advantage in returning an int over returning a bool.

Shouldn't this be addressed, so that some of the core APIs can be cleaned up appropriately?


> 
> What does the original Eq1 even do? :
> 
>     sub     ECX,0Ch[EDX]
>     cmp     ECX,1       // Huh?
>     sbb     EAX,EAX
>     neg     EAX
> 
> 

That's old-skool, pre-386 hacking :)
July 31, 2006
Bruno Medeiros wrote:
> And, uh, I checked gcc's generated code for a C++ version of Eq1, and it was only 2 instructions too, CMP and SETE:
> 
> Eq1:
>     ...
>     cmp     EAX,0Ch[ECX]
>     sete    EAX;
> 
> (http://www.cs.tut.fi/~siponen/upros/intel/instr/sete_setz.html)
> It seems to me perfectly valid, is there any problem here?

Interesting instruction. Seems to have the exact semantics needed for these situations. You'd almost think CPU designers care about what people want to do with their products :P.


> What does the original Eq1 even do? :
Step by step:
>     mov     ECX,0Ch[EAX]
(You skipped this one) Loads this.bar into ECX.
>     sub     ECX,0Ch[EDX]
Subtracts foo.bar from ECX.
>     cmp     ECX,1       // Huh?
Among other things, sets borrow (aka carry) flag if ECX == 0 (i.e. if foo.bar == this.bar), clears it otherwise.
>     sbb     EAX,EAX
Subtracts (EAX + borrow) from EAX, setting it to either -1 (if carry == 1) or 0 (if carry == 0).
>     neg     EAX
Negates EAX.

A bit weird at first glance, but it works as advertised :).


But indeed, a cmp/sete combo seems to do the same in fewer instructions.
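
The steps above can be replayed in C to check the result. This is a sketch that emulates the flag arithmetic with plain unsigned integers; the function name and the local variable `carry` are my own, and nothing here is compiler output:

```c
#include <stdint.h>

/* Emulates: sub ECX,[mem] ; cmp ECX,1 ; sbb EAX,EAX ; neg EAX */
uint32_t eq1_emulated(uint32_t this_bar, uint32_t foo_bar)
{
    uint32_t ecx = this_bar - foo_bar;   /* sub: 0 only when the values match */
    uint32_t carry = (ecx < 1u);         /* cmp ECX,1: borrow is set only if ECX == 0 */
    uint32_t eax = 0u - carry;           /* sbb EAX,EAX: EAX - EAX - carry, so 0 or 0xFFFFFFFF */
    eax = 0u - eax;                      /* neg: 0 stays 0, 0xFFFFFFFF becomes 1 */
    return eax;
}
```

Calling `eq1_emulated(7, 7)` gives 1 and `eq1_emulated(7, 8)` gives 0, which matches what a cmp/sete pair would leave in the low byte.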
July 31, 2006
Bruno Medeiros wrote:
> 
> What does the original Eq1 even do? :
> 
>     sub     ECX,0Ch[EDX]
>     cmp     ECX,1       // Huh?
>     sbb     EAX,EAX
>     neg     EAX
> 
> 

[PS: I read Frits' answer after writing this.]

Ah, I get it now... I didn't understand what borrow (the mathematical notion) meant, since I'm not a native English speaker. Nothing a Wikipedia lookup couldn't solve. So, correct me if I'm wrong:

(when I say EDX I mean 0Ch[EDX] or whatever)

// sets the carry flag if ECX is zero (unsigned ECX < 1),
// that is, if ECX == EDX before the previous SUB instruction
  cmp   ECX,1

// sets EAX as zero and also subtracts one if carry flag is set
// that is, EAX = -1 if ECX == EDX and EAX = 0 if ECX != EDX
  sbb	EAX,EAX

// two's complement negation of EAX, 0 becomes 0, -1 becomes 1
  neg EAX
// end result: EAX = 1 if ECX == EDX and EAX = 0 if ECX != EDX

So yeah, it seems these 3 instructions do the same as SETE ... ?



-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
July 31, 2006
Bruno Medeiros wrote:
> Well, let's think about it the other way around then. Why should bool be constrained to 0 or 1? Why not, as kris said, let 0 be false and any nonzero value be true? Then we could have opEquals, or any function, return a bool instead of an int without a performance penalty.
> 
> The only shortcoming I see is that it would be slower to compare two bool /variables/:
>    (b1 == b2)
> that expression is currently just 1 instruction, a CMP, but without the 0,1 restriction it would be more (3, I think, have to check that). However, is that significantly worse? I think not. I think comparison between two bool _variables_ is likely very rare, and when it happens it is also probably not performance critical. (statistical references?)
> Note: this would not affect at all comparisons between a bool variable and a bool literal. Like (b == true) or (b == false).

I think most programmers would find this to be very surprising behavior. I know I would.
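
A small C sketch of the trade-off in the quoted proposal. The type and function names are hypothetical, and this is an illustration only, not how any D compiler represents bool. If any nonzero value counts as true, a single raw compare can report two true values as unequal, so both sides have to be normalized first:

```c
/* Hypothetical "loose" boolean: 0 is false, anything else is true. */
typedef int loose_bool;

/* Naive comparison: a single compare of the raw values. */
int loose_eq_raw(loose_bool a, loose_bool b)
{
    return a == b;
}

/* Correct comparison: normalize both sides to 0/1 first. */
int loose_eq_normalized(loose_bool a, loose_bool b)
{
    return !a == !b;
}
```

With a = 1 and b = 2 (both true), `loose_eq_raw` returns 0 while `loose_eq_normalized` returns 1; the normalization is where the extra instructions come from.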
July 31, 2006
Walter Bright wrote:
> P.S. Inevitably, some will ask "who cares" about these small efficiencies. The trouble is, these kinds of things often appear in tight loops, where small inefficiencies get multiplied by millions.

I consider this kind of stuff the compiler's job -- so if I write or maintain code that is slow, I know there is probably something I can do about it w/o having to drop into assembly.

Personally I've spent a huge amount of time tuning code and I can't tell you the positive effect that has on end-users. IMHO bad performance is often the "forgotten bug" (that's not to say the budget should be busted on that "last 20%" either though).

- Dave
August 02, 2006
Walter Bright wrote:
> Bruno Medeiros wrote:
>> Well, let's think about it the other way around then. Why should bool be constrained to 0 or 1? Why not, as kris said, let 0 be false and any nonzero value be true? Then we could have opEquals, or any function, return a bool instead of an int without a performance penalty.
>>
>> The only shortcoming I see is that it would be slower to compare two bool /variables/:
>>    (b1 == b2)
>> that expression is currently just 1 instruction, a CMP, but without the 0,1 restriction it would be more (3, I think, have to check that). However, is that significantly worse? I think not. I think comparison between two bool _variables_ is likely very rare, and when it happens it is also probably not performance critical. (statistical references?)
>> Note: this would not affect at all comparisons between a bool variable and a bool literal. Like (b == true) or (b == false).
> 
> I think most programmers would find this to be very surprising behavior. I know I would.

Surprising behavior? What surprising behavior? Those are all implementation details; they have no bearing on language/program behavior.

And how about the alternative of using the SETE instruction to enforce the bool restriction? You haven't commented on that yet...

-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
August 07, 2006
> But indeed, a cmp/sete combo seems to do the same in fewer instructions.

But is it faster? I've noticed that many of the higher-level assembly instructions are actually slower than multiple lower-level ones. "loop" is the best example of this (dec ecx/jne is faster), or "rep" (again, dec/jne is faster).

L.


August 07, 2006
Lionello Lunesu wrote:
>> But indeed, a cmp/sete combo seems to do the same in fewer instructions.
> 
> But is it faster? I've noticed that many of the higher-level assembly
> instructions are actually slower than multiple lower-level ones. "loop" is
> the best example of this (dec ecx/jne is faster), or "rep" (again, dec/jne
> is faster).

Heh... You may have noticed I didn't use any word related to speed :). The reason for that is that I don't know much about optimization for speed, especially where pipelines etc. are involved...

Hardware is weird.
August 07, 2006
Lionello Lunesu wrote:
>>But indeed, a cmp/sete combo seems to do the same in fewer instructions.
> 
> 
> But is it faster? I've noticed that many of the higher-level assembly
> instructions are actually slower than multiple lower-level ones. "loop" is
> the best example of this (dec ecx/jne is faster), or "rep" (again, dec/jne
> is faster).
> 
> L. 
> 
> 

If you'd looked at the sete instruction linked previously, you'd have seen that it consumes 3 cycles. And no; there are no jumps, loops, or any other reason to cause pipeline bubbles. If you need a primer on what causes modern CPUs to stall (the silly P4 in particular) then you could do a lot worse than to read the articles by Jon Stokes at Ars Technica.

Oh, and this is just daft. Why don't we all count the cycles for a call/return instead? Or, perhaps just exactly what it costs to compare the bytes of two strings until they start to look different? You'll find the cost of sete (and probably even the prior "extra" three instructions for boolean support) is relegated to background noise.

Let's face it: int is likely used instead of bool for historical reasons; probably just an artifact left over from pre-80386 days. Would be nice to get that codegen cleaned up ~ especially since it was W who claimed the reasons were performance related. Hacking the high-level code with int vs boolean, just to reflect some archaic machine instruction, is one of those things that come under the umbrella of "premature optimization".