[Issue 4717] New: std.bitmanip.BitArray changes

Aug 24, 2010

bearophile_hugs@eml.cc

Aug 24, 2010

bearophile_hugs@eml.cc

Aug 24, 2010

Don

Aug 24, 2010

bearophile_hugs@eml.cc

Aug 24, 2010

Don

Aug 24, 2010

bearophile_hugs@eml.cc

Aug 24, 2010

bearophile_hugs@eml.cc

Aug 25, 2010

Don

Aug 26, 2010

bearophile_hugs@eml.cc

Sep 22, 2010

bearophile_hugs@eml.cc

Sep 22, 2010

Don

August 24, 2010

[Issue 4717] New: std.bitmanip.BitArray changes

Posted by bearophile_hugs@eml.cc

Permalink

bearophile_hugs@eml.cc

Permalink

http://d.puremagic.com/issues/show_bug.cgi?id=4717

           Summary: std.bitmanip.BitArray changes
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Phobos
        AssignedTo: nobody@puremagic.com
        ReportedBy: bearophile_hugs@eml.cc


--- Comment #0 from bearophile_hugs@eml.cc 2010-08-23 17:26:32 PDT ---
The method sort() of std.bitmanip.BitArray doesn't look so useful, and it may
be removed.

On the other hand there is some very commonly useful functionality that it is
missing in BitArray:
1) b[] = 0; and b[] = 1; to set and reset the whole array quickly, this is a
very common need.
2) countSet(): returns the number of bits set inside the bit array.
3) flip(n): to invert the state of the n-th bit of the bit array.
4) set(n): to set (to 1) the n-th bit of the bit array.
5) reset(n): to reset (set to 0) the n-th bit of the bit array.
6) flipAll(): to invert the state of all bits of the bit array.
7) toSting(): that converts the bit array into a string like
"BitArray(\"0101010011001\")".
8) this() (constructor) method that builds a bit array from a string like
"0101010011001", it's the opposite of the toString().

Optionally:
9) Basic Range interface for the BitArray, so you may use map() on it.
10) firstSet(): returns the index of the first bit that is set, starting the
search from the less significant bit. This is for more specialized usage, like
some heaps.


Notes:
- The count() is also known known as Population or Hamming weight. This is
useful for Hamming distances, to count bits in many situations, like for
example for the Sieve of Eratosthenes. There are ways and refined algorithms to
speed up this operation a lot. And this is a very commonly useful operation. I
may offer some D code if you want. See also:
http://en.wikipedia.org/wiki/Hamming_weight
http://graphics.stanford.edu/~seander/bithacks.html
And see also the __builtin_popcount() built-in function of GCC.
- The flip(n), set(n) and reset(n) methods are useful because they may be made
more efficient than opIndexAssign().
- Regarding firstSet(), see also the __builtin_ffs() built-in function of GCC.
- Methods like opXorAssign() probably need to be converted to the new operator
overloading of D2.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

http://d.puremagic.com/issues/show_bug.cgi?id=4717 --- Comment #1 from bearophile_hugs@eml.cc 2010-08-23 17:38:53 PDT --- As alternative flipAll() may be named flip() (with no arguments). -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------

http://d.puremagic.com/issues/show_bug.cgi?id=4717 Don <clugdbug@yahoo.com.au> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |clugdbug@yahoo.com.au --- Comment #2 from Don <clugdbug@yahoo.com.au> 2010-08-24 00:33:44 PDT --- (In reply to comment #0) > - The count() is also known known as Population or Hamming weight. This is > useful for Hamming distances, to count bits in many situations, like for > example for the Sieve of Eratosthenes. There are ways and refined algorithms to > speed up this operation a lot. And this is a very commonly useful operation. I > may offer some D code if you want. See also: > http://en.wikipedia.org/wiki/Hamming_weight > http://graphics.stanford.edu/~seander/bithacks.html > And see also the __builtin_popcount() built-in function of GCC. Curious fact: the built-in popcount instruction isn't much use for bit arrays. It's great for 64 bit longs (especially for chess programs!) but once you have a dozen machine words or more, it's faster to add the bits sideways. An interesting consequence of this is that Intel/AMD's new popcount instruction is hardly ever useful... -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------

http://d.puremagic.com/issues/show_bug.cgi?id=4717 --- Comment #3 from bearophile_hugs@eml.cc 2010-08-24 03:57:46 PDT --- Answer to Comment 2: The code in the bithacks site I have given URL of probably is what you were talking about. But then there are refined algorithms to use the basic code shown in bithacks, that becomes useful as the bit array gets a little larger. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------

http://d.puremagic.com/issues/show_bug.cgi?id=4717 --- Comment #4 from Don <clugdbug@yahoo.com.au> 2010-08-24 05:55:05 PDT --- (In reply to comment #3) > Answer to Comment 2: > The code in the bithacks site I have given URL of probably is what you were > talking about. > But then there are refined algorithms to use the basic code shown in bithacks, > that becomes useful as the bit array gets a little larger. No, that's not what I meant at all. The parallel adding I'm referring to does not involve any shifts. You basically implement a half adder. Given 2 words a, b the low bit of the sum is a^b, and the high bit is a&b. And with 3 words a, b, c, the low bit of the sum is a^b^c and the high word is (a&b)|((a^b)&c). The popcount is popcount(lo word) + 2* popcount(high word). So what you do is pass through the array in pairs, grabbing the values a, b. You accumulate popcount p += 2*popcount((a&b)|((a^b)&c)). calculate a new carry c = a^b^c. Then you add p+=popcount(c); at the end. In this way, you've dealt with two words, but only done one single-word popcount. In practice, you don't just use pairs, you grab 8 or 16 values at a time, and keep a 3 or 4 bit sum. You only have to perform one single-word popcount for every 8 or 16 words. You need to do a lot of logical operations, but they pipeline quite well. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------

http://d.puremagic.com/issues/show_bug.cgi?id=4717 --- Comment #5 from bearophile_hugs@eml.cc 2010-08-24 06:31:52 PDT --- I see, I think you are talking about using a SWAR approach then. I have never used it for this job, but it sounds intersting. I'd like to do some benchmarks to see what the most efficient solution is among those two. It looks like a simple problem, but has a surprisingly high number of interesting solutions. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------

http://d.puremagic.com/issues/show_bug.cgi?id=4717 --- Comment #6 from bearophile_hugs@eml.cc 2010-08-24 14:02:31 PDT --- For efficiency on 64 bit systems too you may change this code from the BitArray struct: struct BitArray { size_t len; uint* ptr; ... void init(void[] v, size_t numbits) in { assert(numbits <= v.length * 8); assert((v.length & 3) == 0); } Into: struct BitArray { size_t len; size_t* ptr; // changed here ... void init(void[] v, size_t numbits) in { assert(numbits <= v.length * 8); assert(v.length % size_t.sizeof == 0); // changed here } -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------

http://d.puremagic.com/issues/show_bug.cgi?id=4717 --- Comment #7 from Don <clugdbug@yahoo.com.au> 2010-08-25 01:47:59 PDT --- (In reply to comment #5) > I see, I think you are talking about using a SWAR approach then. I have never > used it for this job, but it sounds intersting. I'd like to do some benchmarks > to see what the most efficient solution is among those two. > It looks like a simple problem, but has a surprisingly high number of > interesting solutions. Found the link: http://www.icis.ntu.edu.sg/scs-ijit/91/91-1.pdf -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------

http://d.puremagic.com/issues/show_bug.cgi?id=4717 --- Comment #8 from bearophile_hugs@eml.cc 2010-08-26 07:56:20 PDT --- See also bug 4124 and bug 4123 -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------

http://d.puremagic.com/issues/show_bug.cgi?id=4717 --- Comment #9 from bearophile_hugs@eml.cc 2010-09-22 12:24:24 PDT --- See also: http://www.strchr.com/crc32_popcnt http://wm.ite.pl/articles/sse-popcount.html -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------

Forums