April 08, 2015
On Wednesday, 8 April 2015 at 13:28:16 UTC, David Nadlinger wrote:
> On 8 Apr 2015, at 15:15, Daniel Murphy via digitalmars-d-ldc wrote:
>> I don't think it's so much about vectorizing as it is about avoiding the x87 FPU, which you can do when 80-bit precision is not needed.
>
> Indeed. On x86_64, the SSE registers (%xmm0 and so on) are used by default for single- and double-precision floating point operations. The x87 FPU is not particularly well-optimized on newer CPUs to begin with, and transferring data from the SSE registers to the FPU on function entry and then back again is quite costly too.
>
> For example, this is what made us (all D compilers) look bad on that Perlin noise microbenchmark (the thread from a couple of months ago).

Ah, ok. Didn't realize.

For future reference:
http://gruntthepeon.free.fr/ssemath

April 12, 2015
I wrote a non-asm ilogb that actually runs quite a bit faster than what DMD or LDC do by default, and it should also be much more portable.
See
https://github.com/D-Programming-Language/phobos/pull/3186
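
For readers who have not opened the pull request, here is a minimal sketch of what a non-asm ilogb can look like for double. This is not the code from the PR: the name ilogbSketch is hypothetical, and the return values for zero, infinity and NaN are placeholders (std.math defines its own platform-specific constants for those cases). The idea is to read the exponent field straight from the IEEE 754 bit pattern, and to use bsr only for subnormals.

```d
import core.bitop : bsr;

/// Hypothetical helper (not the PR's implementation): unbiased base-2
/// exponent of a double, computed with integer bit operations only.
int ilogbSketch(double x) @trusted pure nothrow @nogc
{
    // Reinterpret the IEEE 754 binary64 bit pattern as an integer.
    immutable ulong bits = *cast(const(ulong)*) &x;
    immutable int biased = cast(int)((bits >> 52) & 0x7FF);
    immutable ulong mantissa = bits & 0x000F_FFFF_FFFF_FFFFUL;

    if (biased == 0x7FF)
        return int.max;          // infinity or NaN (placeholder value)

    if (biased == 0)
    {
        if (mantissa == 0)
            return int.min;      // zero (placeholder value)
        // Subnormal: value == mantissa * 2^-1074, so the exponent is
        // the index of the highest set mantissa bit minus 1074.
        return bsr(mantissa) - 1074;
    }

    return biased - 1023;        // normal number: remove the exponent bias
}
```

With this sketch, ilogbSketch(8.0) gives 3 and ilogbSketch(0.75) gives -1. No x87 instruction is involved, so the value never has to leave the SSE registers for the FPU.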
April 13, 2015
On Sunday, 12 April 2015 at 22:35:20 UTC, Johan Engelen wrote:
> I wrote a non-asm ilogb that actually runs quite a bit faster than what DMD or LDC do by default, and it should also be much more portable.
> See
> https://github.com/D-Programming-Language/phobos/pull/3186

Nice! For LDC you can replace bsr with the llvm.ctlz.i# intrinsic. It is a template and also CTFE-enabled.
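
One thing to keep in mind when swapping the two: bsr returns the index of the highest set bit, while ctlz counts the leading zero bits, so for a nonzero 32-bit value bsr(v) == 31 - ctlz(v). A small sketch of that relationship follows. ctlzSketch and bsrViaCtlz are hypothetical names, and ctlzSketch is a plain software loop standing in for the intrinsic so that the example compiles with any D compiler; on LDC the intrinsic itself would presumably take its place.

```d
import core.bitop : bsr;

/// Hypothetical portable stand-in for a count-leading-zeros intrinsic.
/// Returns 32 for v == 0.
int ctlzSketch(uint v) pure nothrow @nogc @safe
{
    int n = 0;
    // Walk from the top bit down until the first set bit is found.
    for (uint mask = 0x8000_0000; mask != 0 && (v & mask) == 0; mask >>= 1)
        ++n;
    return n;
}

/// bsr expressed through ctlz; like bsr, only meaningful for v != 0.
int bsrViaCtlz(uint v) pure nothrow @nogc @safe
{
    assert(v != 0);
    return 31 - ctlzSketch(v);
}
```

For 64-bit operands the constant becomes 63 instead of 31.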

Regards,
Kai