Thread overview
Spec for the ‘locality’ parameter to the LDC and GDC builtin magic functions for accessing special CPU prefetch instructions
Aug 19, 2023
Cecil Ward
Aug 20, 2023
Iain Buclaw
Aug 22, 2023
Guillaume Piolat
August 19, 2023
I’m trying to write a cross-platform function that gives access to the CPU’s prefetch instructions, such as x86 prefetcht0/t1/t2/prefetchnta, and the AArch64 equivalents too. I’ve found that the GDC and LDC compilers provide builtin magic functions for this, which are what I need. I am trying to put together a detailed plain-English spec for the respective builtin magic functions.

My questions:

Q1) I need to compare the spec for the GCC and LDC builtin magic functions’ "locality" parameter. Can anyone tell me if GDC and LDC have kept mutual compatibility here?

Q2) Could someone help me turn the GCC and LDC specs into English regarding the locality parameter? See (2) and (4) below.

Q3) Does the locality parameter determine which _level_ of the data cache hierarchy data is fetched into? Or is it always fetched into L1 data cache and the outer ones, and this parameter affects caches’ _future behaviour_?

Q4) Will these magic builtins work on AArch64?

Here’s what I’ve found so far:

1. GCC builtin published by the D runtime:
    import gcc.simd : prefetch;
    prefetch!( rw, locality )( p );

2. GCC: __builtin_prefetch (const void *addr, ...)
“This function is used to minimize cache-miss latency by moving data into a cache before it is accessed. You can insert calls to __builtin_prefetch into code for which you know addresses of data in memory that is likely to be accessed soon. If the target supports them, data prefetch instructions are generated. If the prefetch is done early enough before the access then the data will be in the cache by the time it is accessed.
The value of addr is the address of the memory to prefetch. There are two optional arguments, rw and locality. The value of rw is a compile-time constant one or zero; one means that the prefetch is preparing for a write to the memory address and zero, the default, means that the prefetch is preparing for a read. The value locality must be a compile-time constant integer between zero and three. A value of zero means that the data has no temporal locality, so it need not be left in the cache after the access. A value of three means that the data has a high degree of temporal locality and should be left in all levels of cache possible. Values of one and two mean, respectively, a low or moderate degree of temporal locality. The default is three.”

3. declare void @llvm.prefetch(ptr <address>, i32 <rw>, i32 <locality>, i32 <cache type>)

4. Regarding llvm.prefetch() I found the following spec:
“rw is the specifier determining if the fetch should be for a read (0) or write (1), and locality is a temporal locality specifier ranging from (0) - no locality, to (3) - extremely local keep in cache. The cache type specifies whether the prefetch is performed on the data (1) or instruction (0) cache. The rw, locality and cache type arguments must be constant integers.”

5. I also found this page, https://dlang.org/phobos/core_builtins.html, which is great for the syntax of the call to the LDC builtin, but the call for GDC is no good to me as it lacks the parameters that I want. This D runtime routine might benefit from accepting all the parameters that GCC’s prefetch builtin takes.
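
To make items (2) and (4) concrete, here is a minimal C sketch, assuming GCC or Clang, where __builtin_prefetch is available (as far as I know, Clang lowers it to llvm.prefetch with the data cache selected). Because rw and locality must be compile-time constants in both specs, a run-time locality value has to be dispatched through a switch. The names prefetch_read, sum_with_prefetch and PREFETCH_DISTANCE are hypothetical, and the distance of 16 elements is an arbitrary tuning guess, not something taken from either spec.

```c
#include <stddef.h>

/* Assumed tuning value: how many elements ahead to prefetch. */
enum { PREFETCH_DISTANCE = 16 };

/* Hypothetical wrapper. Returns 1 if a prefetch hint was issued,
   0 if locality was outside the documented 0..3 range. */
static int prefetch_read(const void *p, int locality)
{
    switch (locality) {
    case 0: __builtin_prefetch(p, 0, 0); return 1; /* no temporal locality */
    case 1: __builtin_prefetch(p, 0, 1); return 1; /* low */
    case 2: __builtin_prefetch(p, 0, 2); return 1; /* moderate */
    case 3: __builtin_prefetch(p, 0, 3); return 1; /* high: keep in all levels */
    default: return 0;                             /* out of range: no hint */
    }
}

/* Sums an array while hinting at elements a fixed distance ahead. */
static long sum_with_prefetch(const long *data, size_t n, int locality)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n)
            prefetch_read(&data[i + PREFETCH_DISTANCE], locality);
        total += data[i];
    }
    return total;
}
```

The result is the same for every locality value; only the cache behaviour may differ, and whether it does depends on the target.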

Many thanks in advance.

August 20, 2023

On Saturday, 19 August 2023 at 19:23:38 UTC, Cecil Ward wrote:

> I’m trying to write a cross-platform function that gives access to the CPU’s prefetch instructions, such as x86 prefetcht0/t1/t2/prefetchnta, and the AArch64 equivalents too. I’ve found that the GDC and LDC compilers provide builtin magic functions for this, which are what I need. I am trying to put together a detailed plain-English spec for the respective builtin magic functions.
>
> My questions:
>
> Q1) I need to compare the spec for the GCC and LDC builtin magic functions’ "locality" parameter. Can anyone tell me if GDC and LDC have kept mutual compatibility here?

I'd have thought GCC and LLVM have mutual compatibility thanks to a common target API in Intel's _mm_prefetch() function (and in fact, the magic locality numbers match the _MM_HINT_* constants as GCC's and Clang's xmmintrin.h define them).

#define _MM_HINT_T0 3
#define _MM_HINT_T1 2
#define _MM_HINT_T2 1
#define _MM_HINT_NTA 0
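
The numbering of these constants is header-specific, so it is worth checking on your own toolchain rather than assuming. A small C sketch, assuming GCC's or Clang's xmmintrin.h on an x86 target with SSE (where, to my knowledge, the constants equal the builtin's locality values 0..3; other vendors' headers may number them differently). HAVE_MM_PREFETCH and prefetch_t0 are hypothetical names.

```c
#include <stddef.h>

#if defined(__SSE__) && defined(__GNUC__)
#include <xmmintrin.h>
/* Assumption being checked: hint constants equal the locality values. */
_Static_assert(_MM_HINT_NTA == 0, "NTA is expected to equal locality 0");
_Static_assert(_MM_HINT_T2 == 1, "T2 is expected to equal locality 1");
_Static_assert(_MM_HINT_T1 == 2, "T1 is expected to equal locality 2");
_Static_assert(_MM_HINT_T0 == 3, "T0 is expected to equal locality 3");
#define HAVE_MM_PREFETCH 1
#else
#define HAVE_MM_PREFETCH 0
#endif

/* Hypothetical helper: read prefetch with the highest temporal locality. */
static void prefetch_t0(const void *p)
{
#if HAVE_MM_PREFETCH
    _mm_prefetch((const char *)p, _MM_HINT_T0);
#else
    __builtin_prefetch(p, 0, 3); /* generic GCC/Clang builtin */
#endif
}
```

If one of the static assertions fails to compile, the header in use numbers the hints differently and the locality values should not be passed to _mm_prefetch() directly.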
> Q2) Could someone help me turn the GCC and LDC specs into English regarding the locality parameter? See (2) and (4) below.

https://gcc.gnu.org/projects/prefetch.html

> Q3) Does the locality parameter determine which level of the data cache hierarchy data is fetched into? Or is it always fetched into L1 data cache and the outer ones, and this parameter affects caches’ future behaviour?

It really depends on the CPU, and what features it has.

The x86 SSE prefetch instructions are described in the x86 instruction manual, along with the meaning of T0, T1, T2 and NTA.

https://www.felixcloutier.com/x86/prefetchh

> Q4) Will these magic builtins work on AArch64?

It'll work on all targets that define a prefetch insn; on targets that don't, it'll be a no-op. Similarly, either or both of the read-write and locality arguments might be ignored.
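
Since the prefetch is only a hint, a wrapper can safely degrade to nothing where the builtin is missing. A minimal C sketch of that idea, assuming GCC or Clang for the builtin path; PREFETCH_READ and count_nonzero are hypothetical names, and the look-ahead of 8 elements is an arbitrary guess.

```c
#include <stddef.h>

/* PREFETCH_READ is a hypothetical macro name, not part of any library. */
#if defined(__has_builtin)
#  if __has_builtin(__builtin_prefetch)
#    define PREFETCH_READ(p, locality) __builtin_prefetch((p), 0, (locality))
#  endif
#endif
#if !defined(PREFETCH_READ) && defined(__GNUC__)
/* Older GCC lacks __has_builtin but still provides the builtin. */
#  define PREFETCH_READ(p, locality) __builtin_prefetch((p), 0, (locality))
#endif
#if !defined(PREFETCH_READ)
/* Unknown compiler: evaluate the arguments and do nothing. */
#  define PREFETCH_READ(p, locality) ((void)(p), (void)(locality))
#endif

/* Counts non-zero elements; the result must not depend on the hint. */
static size_t count_nonzero(const int *data, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 8 < n)
            PREFETCH_READ(&data[i + 8], 1); /* 8 ahead: arbitrary guess */
        if (data[i] != 0)
            count++;
    }
    return count;
}
```

The function returns the same value whichever branch of the macro is selected, which is the property that makes ignoring the hint acceptable.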

August 22, 2023
On Saturday, 19 August 2023 at 19:23:38 UTC, Cecil Ward wrote:
>
> I’m trying to write a cross-platform function that gives access to the CPU’s prefetch instructions, such as x86 prefetcht0/t1/t2/prefetchnta, and the AArch64 equivalents too. I’ve found that the GDC and LDC compilers provide builtin magic functions for this, which are what I need. I am trying to put together a detailed plain-English spec for the respective builtin magic functions.
>

Have you found this?

https://github.com/AuburnSounds/intel-intrinsics/blob/002da84215a58f098cee671c5ba4ab6052613865/source/inteli/xmmintrin.d#L1935C9-L1935C9