June 08, 2020
On Saturday, 6 June 2020 at 04:19:44 UTC, sebasg wrote:
> On Friday, 5 June 2020 at 19:12:43 UTC, ttk wrote:
>> Source is here .. a bit large, since I manually unrolled loops 100x to minimize the impact of looping logic (particularly the now() call):
>>
>> http://ciar.org/h/popcount.d
>
> Jesus. Shouldn't you be able to generate that instead of
> copy-pasting? It's D, after all.

A fair point.  Duplicating it in emacs was literally five keystrokes, but I should rewrite it to use templates anyway (more excuse to fiddle with D!).

Also, I figured out why the 16-bit lookup wasn't closer to 2.0x the performance of the 8-bit lookup.  The CPU's L1 cache is 192KB, but that's divided across its six cores, so this single-threaded program only got 32KB of L1 to play with.  The 8-bit lookup table fit in that, but only half of the 16-bit lookup table did.
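For anyone who wants to check the arithmetic, here is a rough sketch of the sizes involved. It assumes each table entry is a single `ubyte` (which matches "only half of the 16-bit table fit", but the real element type in popcount.d may differ), and the constant names are made up for illustration.

```d
import std.stdio : writefln;

// Assumption: one ubyte per entry (a bit count of 0..16 fits in a byte).
enum size_t table8Bytes  = (1 << 8)  * ubyte.sizeof;  // one entry per 8-bit value
enum size_t table16Bytes = (1 << 16) * ubyte.sizeof;  // one entry per 16-bit value
enum size_t l1PerCore    = 192 * 1024 / 6;            // figures quoted in the post

// Checked at compile time: the small table fits, the large one is twice the budget.
static assert(table8Bytes <= l1PerCore);
static assert(table16Bytes == 2 * l1PerCore);

void main()
{
    writefln("8-bit table:  %s bytes", table8Bytes);
    writefln("16-bit table: %s bytes", table16Bytes);
    writefln("L1 per core:  %s bytes", l1PerCore);
}
```

Under that assumption the 8-bit table is a tiny fraction of the per-core L1, while the 16-bit table needs exactly twice what one core has.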

June 08, 2020
On Mon, Jun 08, 2020 at 06:08:55PM +0000, ttk via Digitalmars-d wrote:
> On Saturday, 6 June 2020 at 04:19:44 UTC, sebasg wrote:
[...]
> > Jesus. Shouldn't you be able to generate that instead of copy-pasting? It's D, after all.
> 
> A fair point.  Duplicating it in emacs was literally five keystrokes, but I should rewrite it to use templates anyway (more excuse to fiddle with D!).

No templates needed, just use static foreach. ;-)  (See my other reply
to your code.)
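
A minimal sketch of what that could look like, not the actual benchmark: `unrolledSum` is a hypothetical name, and `core.bitop.popcnt` merely stands in for whichever popcount variant is being timed.

```d
import core.bitop : popcnt;
import std.stdio : writeln;

// Hypothetical helper: sums the bit counts of x, x+1, ..., x+99.
uint unrolledSum(uint x)
{
    uint total = 0;

    // static foreach is expanded at compile time, so the statement below
    // is emitted 100 times with i replaced by 0, 1, ..., 99. There is no
    // run-time loop counter or branch in the generated code.
    static foreach (i; 0 .. 100)
    {
        total += popcnt(x + i);
    }
    return total;
}

void main()
{
    writeln(unrolledSum(0xDEADBEEF));
}
```

Changing the unroll factor is then a one-character edit rather than another round of copy-paste.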


> Also, I figured out why the 16-bit lookup wasn't closer to 2.0x the performance of the 8-bit lookup.  The CPU's L1 cache is 192KB, but that's divided across its six cores, so this single-threaded program only got 32KB of L1 to play with.  The 8-bit lookup table fit in that, but only half of the 16-bit lookup table did.

I think your results are skewed; your copy-n-pasted block of code repeatedly overwrites `count` without using its value anywhere else, so the optimizer deleted all except the last block of code!  See my other reply.
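
To illustrate the pattern in isolation (a simplified sketch with hypothetical function names, again using `core.bitop.popcnt` as a stand-in for the routines under test):

```d
import core.bitop : popcnt;
import std.stdio : writeln;

// Each assignment replaces the previous value, so only the last call
// affects the result. An optimizer is free to drop the first two.
uint overwritten(ulong a, ulong b, ulong c)
{
    uint count;
    count = popcnt(a);  // dead store
    count = popcnt(b);  // dead store
    count = popcnt(c);  // the only observable result
    return count;
}

// Accumulating makes every call contribute to the returned value,
// so none of them can be discarded as a dead store.
uint accumulated(ulong a, ulong b, ulong c)
{
    uint count = 0;
    count += popcnt(a);
    count += popcnt(b);
    count += popcnt(c);
    return count;
}

void main()
{
    writeln(overwritten(1, 3, 7));
    writeln(accumulated(1, 3, 7));
}
```

Accumulating is not a complete guarantee (inputs known at compile time can still be constant-folded), so the benchmark inputs should come from something the compiler cannot see through, and the final sum should be printed or otherwise used.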


T

-- 
Tech-savvy: euphemism for nerdy.