September 22, 2010 [Issue 4717] std.bitmanip.BitArray changes | ||||
---|---|---|---|---|
| ||||
Posted in reply to bearophile_hugs@eml.cc | http://d.puremagic.com/issues/show_bug.cgi?id=4717 --- Comment #10 from Don <clugdbug@yahoo.com.au> 2010-09-22 13:10:44 PDT --- (In reply to comment #9) > See also: > http://www.strchr.com/crc32_popcnt > http://wm.ite.pl/articles/sse-popcount.html Yes, I saw those. I made a simple 256-entry table lookup implementation (below, not optimised for size) which runs at 5 cycles for 4 bytes. It'd be painful to beat that for general-purpose 32 bit code (because AMD 32bit processors don't support SSE2). Cache misses will kill you, though, unless the array is quite long. I include my code here anyway, for future reference. For 64 bits, SWAR on SSE2 is a clear winner. ---------- const(uint[256]) makepopcountlookup(){ uint [256] result; for (int i = 0; i<= 0xFF; ++i) { result[i] = (i&1) + ((i&2)>>1) + ((i&4)>>2) + ((i&8)>>3) + ((i&16)>>4) + ((i&32)>>5) + ((i&64)>>6) + ((i&128)>>7); } return result; } __gshared uint[256] POPCOUNT_LOOKUP_TABLE = makepopcountlookup(); /* A lookup table is normally a bad way to do popcount since it risks a cache miss. But 1K table is not so terrible, and we're dealing with a large source array. The address of the lookup table is passed as a parameter to avoid PIC problems. */ int popcountArray(uint[] src, uint *lookuptable = &POPCOUNT_LOOKUP_TABLE[0]) { enum { LASTPARAM = 4*4 } // 3* pushes + return address. // TIMING: Core2: 12uops, 5.0 cycles/uint // It's entirely limited by the 5 loads. asm { naked; push ESI; push EDI; push EBX; mov EDI, EAX; // EDI = lookup table. mov ECX, [ESP + LASTPARAM + 0*4]; // src.length; mov ESI, [ESP + LASTPARAM + 1*4]; // src.ptr xor EAX, EAX; lea ESI, [ESI + 4*ECX]; // ESI = end of src neg ECX; // count UP to zero. mov EBX, [ESI + 4*ECX]; xor EDX, EDX; add ECX, 1; jz onlyone; L1: add EAX, [EDI + EDX * 4]; movzx EDX, BL; add EAX, [EDI + EDX * 4]; movzx EDX, BH; shr EBX, 16; add EAX, [EDI + EDX * 4]; movzx EDX, BH; add EAX, [EDI + EDX * 4]; movzx EDX, BL; mov EBX, [ESI + 4*ECX]; add ECX, 1; jnz L1; onlyone: add EAX, [EDI + EDX * 4]; movzx EDX, BL; add EAX, [EDI + EDX * 4]; movzx EDX, BH; shr EBX, 16; add EAX, [EDI + EDX * 4]; movzx EDX, BH; add EAX, [EDI + EDX * 4]; movzx EDX, BL; add EAX, [EDI + EDX * 4]; pop EBX; pop EDI; pop ESI; ret 2*4; } } -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- |
Copyright © 1999-2021 by the D Language Foundation