October 12, 2016 Re: Reducing the cost of autodecoding
Posted in reply to Andrei Alexandrescu | On Wednesday, 12 October 2016 at 23:47:45 UTC, Andrei Alexandrescu wrote:
>
> I think we should define two aliases "likely" and "unlikely" with default implementations:
>
> bool likely(bool b) { return b; }
> bool unlikely(bool b) { return b; }
>
> They'd go in druntime. Then implementers can hook them into their intrinsics.
>
> Works?
>
>
> Andrei
I was about to suggest the same.
I can prepare a PR.
October 13, 2016 Re: Reducing the cost of autodecoding
Posted in reply to Stefan Koch | On Wednesday, 12 October 2016 at 23:59:15 UTC, Stefan Koch wrote:
> On Wednesday, 12 October 2016 at 23:47:45 UTC, Andrei Alexandrescu wrote:
>>
>> I think we should define two aliases "likely" and "unlikely" with default implementations:
>>
>> bool likely(bool b) { return b; }
>> bool unlikely(bool b) { return b; }
>>
>> They'd go in druntime. Then implementers can hook them into their intrinsics.
>>
>> Works?
>>
>>
>> Andrei
>
> I was about to suggest the same.
> I can prepare a PR.
We should probably introduce a new module for stuff like this.
object.d is already filled with too many unrelated things.
October 13, 2016 Re: Reducing the cost of autodecoding
Posted in reply to Andrei Alexandrescu | On Wednesday, 12 October 2016 at 23:47:45 UTC, Andrei Alexandrescu wrote:
>
> Wait, so going through the bytes made almost no difference? Or did you subtract the overhead already?
>
It made little difference: LDC compiled it into AVX2 vectorized addition (vpmovzxbq & vpaddq).
October 13, 2016 Re: Reducing the cost of autodecoding
Posted in reply to safety0ff | On Thursday, 13 October 2016 at 00:32:36 UTC, safety0ff wrote:
>
> It made little difference: LDC compiled into AVX2 vectorized addition (vpmovzxbq & vpaddq.)
Measurements without -mcpu=native:

- overhead: 0.336s
- bytes: 0.610s
- without branch hints: 0.852s
- code pasted: 0.766s
October 12, 2016 Re: Reducing the cost of autodecoding
Posted in reply to Stefan Koch | On 10/12/2016 08:11 PM, Stefan Koch wrote:
> We should probably introduce a new module for stuff like this.
> object.d is already filled with too much unrelated things.
Yah, shouldn't go in object.d as it's fairly niche. On the other hand defining a new module for two functions seems excessive unless we have a good theme. On the third hand we may find an existing module that's topically close. Thoughts? -- Andrei
October 12, 2016 Re: Reducing the cost of autodecoding
Posted in reply to safety0ff | On 10/12/2016 08:41 PM, safety0ff wrote:
> On Thursday, 13 October 2016 at 00:32:36 UTC, safety0ff wrote:
>>
>> It made little difference: LDC compiled into AVX2 vectorized addition
>> (vpmovzxbq & vpaddq.)
>
> Measurements without -mcpu=native:
> overhead 0.336s
> bytes 0.610s
> without branch hints 0.852s
> code pasted 0.766s
So we should be able to reduce overhead by means of proper code arrangement and interplay of inlining and outlining. The prize, however, would be to get the AVX instructions for ASCII going. Is that possible? -- Andrei
October 13, 2016 Re: Reducing the cost of autodecoding
Posted in reply to Andrei Alexandrescu | On Thursday, 13 October 2016 at 01:26:17 UTC, Andrei Alexandrescu wrote:
> On 10/12/2016 08:11 PM, Stefan Koch wrote:
>> We should probably introduce a new module for stuff like this.
>> object.d is already filled with too much unrelated things.
>
> Yah, shouldn't go in object.d as it's fairly niche. On the other hand defining a new module for two functions seems excessive unless we have a good theme. On the third hand we may find an existing module that's topically close. Thoughts? -- Andrei
Maybe core.intrinsics?
Or core.codelayout?
We can control the layout at the object-file level.
We should be able to expose some of that functionality.
October 13, 2016 Re: Reducing the cost of autodecoding
Posted in reply to Andrei Alexandrescu | On Thursday, 13 October 2016 at 01:27:35 UTC, Andrei Alexandrescu wrote:
> On 10/12/2016 08:41 PM, safety0ff wrote:
>> On Thursday, 13 October 2016 at 00:32:36 UTC, safety0ff wrote:
>>>
>>> It made little difference: LDC compiled into AVX2 vectorized addition
>>> (vpmovzxbq & vpaddq.)
>>
>> Measurements without -mcpu=native:
>> overhead 0.336s
>> bytes 0.610s
>> without branch hints 0.852s
>> code pasted 0.766s
>
> So we should be able to reduce overhead by means of proper code arrangement and interplay of inlining and outlining. The prize, however, would be to get the AVX instructions for ASCII going. Is that possible? -- Andrei
AVX for ASCII?
What are you referring to?
Most text processing is terribly incompatible with SIMD.
SSE 4.2 has a few instructions that do help, but as far as I am aware it is not yet widespread.
October 12, 2016 Re: Reducing the cost of autodecoding
Posted in reply to Stefan Koch | On 10/12/2016 09:35 PM, Stefan Koch wrote:
> On Thursday, 13 October 2016 at 01:27:35 UTC, Andrei Alexandrescu wrote:
>> On 10/12/2016 08:41 PM, safety0ff wrote:
>>> On Thursday, 13 October 2016 at 00:32:36 UTC, safety0ff wrote:
>>>>
>>>> It made little difference: LDC compiled into AVX2 vectorized addition
>>>> (vpmovzxbq & vpaddq.)
>>>
>>> Measurements without -mcpu=native:
>>> overhead 0.336s
>>> bytes 0.610s
>>> without branch hints 0.852s
>>> code pasted 0.766s
>>
>> So we should be able to reduce overhead by means of proper code
>> arrangement and interplay of inlining and outlining. The prize,
>> however, would be to get the AVX instructions for ASCII going. Is that
>> possible? -- Andrei
>
> AVX for ascii ?
> What are you referring to ?
> Most text processing is terribly incompatible with simd.
> sse 4.2 has a few instructions that do help, but as far as I am aware it
> is not yet too far spread.
Oh ok, so it's that checksum in particular that got optimized. Bad benchmark! Bad! -- Andrei
October 13, 2016 Re: Reducing the cost of autodecoding
Posted in reply to Andrei Alexandrescu | On Thursday, 13 October 2016 at 01:26:17 UTC, Andrei Alexandrescu wrote:
> On 10/12/2016 08:11 PM, Stefan Koch wrote:
>> We should probably introduce a new module for stuff like this.
>> object.d is already filled with too much unrelated things.
>
> Yah, shouldn't go in object.d as it's fairly niche. On the other hand defining a new module for two functions seems excessive unless we have a good theme. On the third hand we may find an existing module that's topically close. Thoughts? -- Andrei

There could be some kind of "expect" theme, or a microoptimization theme: functions that have no observable effects and that provide hints for optimization (possibly with compiler-dependent implementations of those functions).

Besides providing the expected value of an expression, other "expect"/microopt functionality is checking explicitly for function pointer values (to inline a likely function), which I wrote about in [1]:

```
/// Calls `fptr(args)`, optimized for the case when fptr points to Likely().
pragma(inline, true)
auto is_likely(alias Likely, Fptr, Args...)(Fptr fptr, Args args)
{
    return (fptr == &Likely) ? Likely(args) : fptr(args);
}

// ...
void function() fptr = get_function_ptr();
fptr.is_likely!likely_function();
```

A similar function can be made for "expecting" a class type for virtual calls [1]. Other microopt thingies that come to mind are:

- cache prefetching
- function attributes for hot/cold functions

cheers,
  Johan

[1] https://johanengelen.github.io/ldc/2016/04/13/PGO-in-LDC-virtual-calls.html
Copyright © 1999-2021 by the D Language Foundation