How to deal with inline asm functions in Phobos/druntime? (page 2)

April 08, 2015

Re: How to deal with inline asm functions in Phobos/druntime?

Posted by Johan Engelen
in reply to David Nadlinger


On Wednesday, 8 April 2015 at 13:28:16 UTC, David Nadlinger wrote:
> On 8 Apr 2015, at 15:15, Daniel Murphy via digitalmars-d-ldc wrote:
>> I don't think it's so much about vectorizing as it is about avoiding the x87 FPU, which you can do when 80-bit precision is not needed.
>
> Indeed. On x86_64, the SSE registers (%xmm0 and so on) are used by default for single- and double-precision floating point operations. The x87 FPU is not particularly well-optimized on newer CPUs to begin with, and transferring data from the SSE registers to the FPU on function entry and then back again is quite costly too.
>
> For example, this is what made us (all D compilers) look bad on that Perlin noise microbenchmark (the thread from a couple of months ago).

Ah, ok. Didn't realize.

For future reference:
http://gruntthepeon.free.fr/ssemath

On Sunday, 12 April 2015 at 22:35:20 UTC, Johan Engelen wrote:
> I wrote a non-asm ilogb, that actually runs quite a bit faster than what DMD or LDC do standard, and should also be much more portable.
> See
> https://github.com/D-Programming-Language/phobos/pull/3186

Nice! For LDC you can replace bsr with the intrinsic llvm.ctlz.i#. It is a template and also CTFE-enabled.

Regards,
Kai
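
For readers following along, here is a rough sketch of what a non-asm ilogb can look like. It is not the code from the pull request above. It is written in plain C to stay compiler-neutral, and it assumes IEEE 754 binary64 doubles. The names `ilogb_sketch`, `highest_set_bit`, `SKETCH_ILOGB0` and `SKETCH_ILOGBNAN` are made up for this illustration, and the sentinel values are the sketch's own choice. The loop in `highest_set_bit` stands in for the single bit-scan step that bsr or a count-leading-zeros intrinsic would perform.

```c
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sentinels chosen for this sketch only. */
#define SKETCH_ILOGB0   INT_MIN   /* returned for +0.0 and -0.0        */
#define SKETCH_ILOGBNAN INT_MAX   /* returned for infinity and for NaN */

/* Index of the highest set bit; v must be nonzero.
   A bsr instruction or a count-leading-zeros intrinsic does this in one step. */
static int highest_set_bit(uint64_t v)
{
    int idx = 0;
    while (v >>= 1)
        ++idx;
    return idx;
}

/* Unbiased binary exponent of x, assuming IEEE 754 binary64 layout. */
int ilogb_sketch(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);          /* reinterpret the bit pattern */

    int      exp_field = (int)((bits >> 52) & 0x7FF);
    uint64_t mantissa  = bits & 0x000FFFFFFFFFFFFFULL;

    if (exp_field == 0x7FF)                  /* infinity or NaN */
        return SKETCH_ILOGBNAN;
    if (exp_field != 0)                      /* normal number: remove the bias */
        return exp_field - 1023;
    if (mantissa == 0)                       /* zero of either sign */
        return SKETCH_ILOGB0;
    /* Subnormal: the value is mantissa * 2^-1074, so the exponent
       is the position of the top mantissa bit minus 1074. */
    return highest_set_bit(mantissa) - 1074;
}
```

Only the subnormal path needs the bit scan; normal numbers are handled with a shift, a mask and a subtraction, so no x87 instructions are involved.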
