On Sat, Oct 01, 2022 at 01:20:08PM +0000, realhet via Digitalmars-d-learn wrote:
> Hello,
>
> I just wanted to optimize a byte -> index lookup by using a 256
> element table instead of [1, 2, 3].countUntil(x), and I was amazed
> at what I found.
> My solution lookup[x] was not faster at all, because LDC2 amazingly
> optimized the linear search into a jump table.
>
> Anyone please can tell me, how it is possible?
Because it's LDC. ;-) Or more specifically, the D front-end tells the LLVM back-end that the table size is statically known, and the back-end pattern-matches linear searches through fixed-size tables into jump tables.
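For concreteness, here's a minimal sketch of the kind of code that
triggers this (the table contents and names are made up for
illustration):

    import std.algorithm.searching : countUntil;

    // Hypothetical stand-in for the byte -> index lookup above.
    immutable ubyte[] table = [7, 13, 42, 99];

    ptrdiff_t byteToIndex(ubyte x)
    {
        // A plain linear search over a statically-known table.
        // LDC's LLVM back-end can lower this to a jump table,
        // which is why the hand-written 256-entry lookup array
        // was no faster.
        return table.countUntil(x);
    }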
Generally, I wouldn't bother with optimizing minutiae like these unless the profiler indicates the hotspot is there. Just let the optimizer do its job. Quite often the optimizer can do a better job than a human can.
> I looked inside the countUntil() template, and I see no static ifs
> for compile-time optimizations.
It has nothing to do with the Phobos code, it's a pattern recognized by LDC's optimizer.
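If you want to check what the optimizer actually emitted, you can ask
LDC to dump the generated assembly (the filename here is just a
placeholder):

    ldc2 -O3 -output-s lookup.d
    # writes lookup.s; look for the jump table / indirect jump there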
> Is it true that: the compiler notices that there is a while loop over
> a static compile-time array, and it is clever enough to do all the
> calculations at compile time and generate a completely unrolled loop?
> Then the optimizer transforms that into the jump table and does it
> with "jmp rax"?
This is quite likely the case. I've personally seen LDC optimize an entire benchmark into a single instruction: it figured out that it could evaluate the function at compile time, so it replaced the entire function body with an instruction that just loads the final value.
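As a contrived illustration (not the benchmark in question), the same
effect looks like this:

    // LDC evaluates the whole loop during optimization and reduces
    // the function body to a single constant load.
    int sum1000()
    {
        int total = 0;
        foreach (i; 1 .. 1001)
            total += i;
        return total; // compiles down to `return 500500;`
    }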
> If I'm right, this is not just std library magic, it works even with
> my own template functions.
> I'm also getting crazy long compile times, so this gotta be the case
> :D
[...]
Yep.
T
--
If it tastes good, it's probably bad for you.