Thread overview
Objection: bsf() return value
May 15, 2003
Georg Wrede
May 16, 2003
Georg Wrede
May 19, 2003
Walter
May 19, 2003
Georg Wrede
May 19, 2003
Walter
May 20, 2003
Georg Wrede
May 20, 2003
Walter
May 21, 2003
Serge K
May 15, 2003
The spec says:


>  int bsf(uint v)

>  Scans the bits in v starting with bit 0, looking for the first set bit.

>  int bsr(uint v)

>  Scans the bits in v from the most significant bit to the least
>  significant bit, looking for the first set bit.

Both return the bit number of the first set bit. The return value is undefined if v is zero.


Am I the only one who finds this unnerving? Would it not be more appropriate if the return value were -1 if v is zero?

This way no zero-check in programmer code is needed before bsr. And since immediately after bsr there usually is a branch anyhow, this would be the logical place to handle the zero case too.

Maybe it is for performance reasons? Yes, bsr is faster this way, but the test itself has to be done somewhere anyhow. Besides, if the return value is undefined, then there is all the reason to fear that bsr could return a reasonable looking value, and the programmer who forgot to check in advance for zero value would not be the wiser.

Bonus: this change would not break existing code. And it works the same with any practical size v.


May 16, 2003
It sounds like I'm talking about bsr only, but of course this goes for both.

In article <ba0fr5$1d2d$1@digitaldaemon.com>, Georg Wrede says...
>
>The spec says:
>
>
>>  int bsf(uint v)
>
>>  Scans the bits in v starting with bit 0, looking for the first set bit.
>
>>  int bsr(uint v)
>
>>  Scans the bits in v from the most significant bit to the least
>>  significant bit, looking for the first set bit.
>
>Both return the bit number of the first set bit. The return value is undefined if v is zero.
>
>
>Am I the only one who finds this unnerving? Would it not be more appropriate if the return value were -1 if v is zero?
>
>This way no zero-check in programmer code is needed before bsr. And since immediately after bsr there usually is a branch anyhow, this would be the logical place to handle the zero case too.
>
>Maybe it is for performance reasons? Yes, bsr is faster this way, but the test itself has to be done somewhere anyhow. Besides, if the return value is undefined, then there is all the reason to fear that bsr could return a reasonable looking value, and the programmer who forgot to check in advance for zero value would not be the wiser.
>
>Bonus: this change would not break existing code. And it works the same with any practical size v.
>
>


May 19, 2003
bsr and bsf are intended as the simplest possible shell around the x86 bsr and bsf opcodes. The compiler recognizes those functions and they're well integrated into the optimizer and code generator. This makes them great as building blocks for more advanced operations.

"Georg Wrede" <Georg_member@pathlink.com> wrote in message news:ba0fr5$1d2d$1@digitaldaemon.com...
> The spec says:
>
>
> >  int bsf(uint v)
>
> >  Scans the bits in v starting with bit 0, looking for the first set bit.
>
> >  int bsr(uint v)
>
> >  Scans the bits in v from the most significant bit to the least
> >  significant bit, looking for the first set bit.
>
> Both return the bit number of the first set bit. The return value is
undefined
> if v is zero.
>
>
> Am I the only one who finds this unnerving? Would it not be more
appropriate if
> the return value were -1 if v is zero?
>
> This way no zero-check in programmer code is needed before bsr. And since immediately after bsr there usually is a branch anyhow, this would be the logical place to handle the zero case too.
>
> Maybe it is for performance reasons? Yes, bsr is faster this way, but the
test
> itself has to be done somewhere anyhow. Besides, if the return value is undefined, then there is all the reason to fear that bsr could return a reasonable looking value, and the programmer who forgot to check in
advance for
> zero value would not be the wiser.
>
> Bonus: this change would not break existing code. And it works the same
with any
> practical size v.
>
>


May 19, 2003
I've thought about this, and I've given up.  The original thought went something like:


If speed here is really, really important for the programmer, then he'd probably code the parts of his routine in assembler anyhow. On the other hand, usually bsr would be used when examining data where you are not sure of which bits are set. In normal circumstances this might include the uint being entirely zero, for all he knows. In most cases where he has at least some idea of the bits, he'd probably test for individual bits instead.

Using bsr then is restricted to cases where you can guarantee that the uint is not entirely zero. This can be achieved either by testing for zero in D code, or by making the algorithm robust enough to never, ever encounter the zero case. The former case is bound to be way slower than having bsr return -1 and testing in D for -1 only after testing for the "valid" cases.

The latter case _requires_ the D code to be solid, which might be perfectly acceptable in a language like C or C++ (offence intended against them). D, however seems to avoid unnecessary "catches". (For which we all love Walter, right?)

Since I'm no Intel hacker, (I did only 6502 in the early '80s) I have no strong opinion whether the added clocks for testing for zero and returning -1 would be inappropriate here.

Also, since these are "Intrinsics", the average user is implicitly discouraged from using them?

Of course, if we leave bsr as-is, the compiler might be made to recogize and optimise idioms like

if (v) {
a = bsr(v);
if (a < b) {
// do something
else if (a > c) {
// do something
}
} else {
throw new Error("Who stole all our bits?");
}


In hindsight, it seems to me the above code already compiles to something just as fast as what could be achieved by changig the bsr function. So even making a non-intrinsic bsr that does the -1 thing would be of no value.



In article <ba9vj3$ee0$1@digitaldaemon.com>, Walter says...
>
>bsr and bsf are intended as the simplest possible shell around the x86 bsr and bsf opcodes. The compiler recognizes those functions and they're well integrated into the optimizer and code generator. This makes them great as building blocks for more advanced operations.
>
>"Georg Wrede" <Georg_member@pathlink.com> wrote in message news:ba0fr5$1d2d$1@digitaldaemon.com...
>> The spec says:
>>
>>
>> >  int bsf(uint v)
>>
>> >  Scans the bits in v starting with bit 0, looking for the first set bit.
>>
>> >  int bsr(uint v)
>>
>> >  Scans the bits in v from the most significant bit to the least
>> >  significant bit, looking for the first set bit.
>>
>> Both return the bit number of the first set bit. The return value is
>undefined
>> if v is zero.
>>
>>
>> Am I the only one who finds this unnerving? Would it not be more
>appropriate if
>> the return value were -1 if v is zero?
>>
>> This way no zero-check in programmer code is needed before bsr. And since immediately after bsr there usually is a branch anyhow, this would be the logical place to handle the zero case too.
>>
>> Maybe it is for performance reasons? Yes, bsr is faster this way, but the
>test
>> itself has to be done somewhere anyhow. Besides, if the return value is undefined, then there is all the reason to fear that bsr could return a reasonable looking value, and the programmer who forgot to check in
>advance for
>> zero value would not be the wiser.
>>
>> Bonus: this change would not break existing code. And it works the same
>with any
>> practical size v.
>>
>>
>
>


May 19, 2003
In general, you are right. However, the bsr and bsf intrinsics came about because I use bit vectors extensively in the compiler's optimizer implementation. The functions in it were large and complex, and so were not good candidates for recoding in assembler. Having direct access to bsr and bsf got about a 10% improvement in compiler optimization speed.

"Georg Wrede" <Georg_member@pathlink.com> wrote in message news:baae7r$smk$1@digitaldaemon.com...
>
> I've thought about this, and I've given up.  The original thought went
something
> like:
>
>
> If speed here is really, really important for the programmer, then he'd
probably
> code the parts of his routine in assembler anyhow. On the other hand,
usually
> bsr would be used when examining data where you are not sure of which bits
are
> set. In normal circumstances this might include the uint being entirely
zero,
> for all he knows. In most cases where he has at least some idea of the
bits,
> he'd probably test for individual bits instead.
>
> Using bsr then is restricted to cases where you can guarantee that the
uint is
> not entirely zero. This can be achieved either by testing for zero in D
code, or
> by making the algorithm robust enough to never, ever encounter the zero
case.
> The former case is bound to be way slower than having bsr return -1 and
testing
> in D for -1 only after testing for the "valid" cases.
>
> The latter case _requires_ the D code to be solid, which might be
perfectly
> acceptable in a language like C or C++ (offence intended against them). D,
> however seems to avoid unnecessary "catches". (For which we all love
Walter,
> right?)
>
> Since I'm no Intel hacker, (I did only 6502 in the early '80s) I have no
strong
> opinion whether the added clocks for testing for zero and returning -1
would be
> inappropriate here.
>
> Also, since these are "Intrinsics", the average user is implicitly
discouraged
> from using them?
>
> Of course, if we leave bsr as-is, the compiler might be made to recogize
and
> optimise idioms like
>
> if (v) {
> a = bsr(v);
> if (a < b) {
> // do something
> else if (a > c) {
> // do something
> }
> } else {
> throw new Error("Who stole all our bits?");
> }
>
>
> In hindsight, it seems to me the above code already compiles to something
just
> as fast as what could be achieved by changig the bsr function. So even
making a
> non-intrinsic bsr that does the -1 thing would be of no value.


May 20, 2003
In article <bab492$1k6u$1@digitaldaemon.com>, Walter says...
>
>In general, you are right. However, the bsr and bsf intrinsics came about because I use bit vectors extensively in the compiler's optimizer implementation. The functions in it were large and complex, and so were not good candidates for recoding in assembler. Having direct access to bsr and bsf got about a 10% improvement in compiler optimization speed.

Ahah. Thanks, I seem to learn a lot here!

This means that when one has a bitfield representing boolean values, say for whether to execute some from a set of subroutines, one gets to test for several falses at one go with bsf -- if several of them are zero you can skip testing those individually.

Knowing (or, rather suspecting) that one day they might all be false, all one has to do is reserve one bit that always stays true, i.e. is never actually used. This lets one skip the zero test, which saves a lot of time in a tight loop.

Careful desicions about which bit represents which action seems key here. Say one uses the msb for always-true, then the most likely falses should be at the other end, near lsb. But even this is not enough, since in real life the things the bits represent are not genuinely mutually independent. This means that if bit X is true, then the probabilities for the other bits being true are different than if X is false. This affects ordering decisions, especially if bsf is going to be used more than once on the bit set. This might be useful when a specific bit being true implies lowered probability for several of the following to be false.

A rigorous theoretical analysis would of course be required here, but in real life one seldom has this option. Then the answer is "profiling, profiling, profiling!" But before that, dumping the bitset in a log from several test runs could give invaluable statistical data about probabilities and correlations, which will make the order decisions made between profiling runs more profitable.

At this time it is also possible to change the meaning of the bits. Sure, true should always mean yes, but here it would be worthwhile to use false to mean yes for some bits to further increase the number of bits skipped by bsf.

The yes-no meaning and the ordering of the bits can even be changed at runtime. All one needs is a reshuffle (maybe with a lookup table), and an xor for the changing of yes-no for some bits. A matching inline copy of the branch logic is then used instead of the original.

I wonder if these optimizations are the very reason bsr and bsf exist in CPUs?

---

Why am I littering this important mailgroup with these ruminations? Well, I can assume some readers have not thought about this, maybe since they have a non-C(++) or non-asm bacground. D is especially attractive to those who have avoided C(++), but still have always wanted to use a Real programming language. Sharing is gold.

D makes it easy and straightforward to use this technique to gain execution speed.



May 20, 2003
"Georg Wrede" <Georg_member@pathlink.com> wrote in message news:badevg$2mb0$1@digitaldaemon.com...
> In article <bab492$1k6u$1@digitaldaemon.com>, Walter says...
> >
> >In general, you are right. However, the bsr and bsf intrinsics came about because I use bit vectors extensively in the compiler's optimizer implementation. The functions in it were large and complex, and so were
not
> >good candidates for recoding in assembler. Having direct access to bsr
and
> >bsf got about a 10% improvement in compiler optimization speed.
>
> Ahah. Thanks, I seem to learn a lot here!
>
> This means that when one has a bitfield representing boolean values, say
for
> whether to execute some from a set of subroutines, one gets to test for
several
> falses at one go with bsf -- if several of them are zero you can skip
testing
> those individually.

What I was doing was looking for any bits set in a very large bit vector. There were two loops, the first  did a scasd to find a non-zero 32 bit value, the inner was replaced with a bsf. So the bsf was only run on non-zero values.


May 21, 2003
> Of course, if we leave bsr as-is, the compiler might be made to recogize
and
> optimise idioms like
>
> if (v) {
> a = bsr(v);
> if (a < b) {
> // do something
> else if (a > c) {
> // do something
> }
> } else {
> throw new Error("Who stole all our bits?");
> }
>
> In hindsight, it seems to me the above code already compiles to something
just
> as fast as what could be achieved by changig the bsr function. So even
making a
> non-intrinsic bsr that does the -1 thing would be of no value.

Actually, both BSR & BSF are conditional operations, in case of a zero input
they do nothing.
It's quite easy to make them return some predefined values in such case.

Here is how I do that (in MSVC):

inline int bsf( int x )
{
  __asm  mov  eax, 32 // if(!x) return 32;
  __asm  bsf  eax, x
}

inline int bsr( int x )
{
  __asm  mov  eax, -1 // if(!x) return -1;
  __asm  bsr  eax, x
}