"Faster than Rust and C++: the PERFECT hash table"

Dec 11

Christopher Katko

Dec 11

Siarhei Siamashka

Jan 05

December 11

Re: "Faster than Rust and C++: the PERFECT hash table"

Posted by Christopher Katko
in reply to Siarhei Siamashka


If the whole set of the string identifiers to be parsed is known at compile time, then another approach for mapping them to numbers or whatever else is to use something like https://re2c.org

Holy crap! I was thinking about this kind of thing a few months ago. My situation is this:

I have textures (filename = string texturename) listed in a manifest.json file.
All textures are loaded and placed into a texture[string] associative array.
They're used in code that ends up looking like drawBitmap(bitmaps["grass"], ...).

It lets me immediately add a new texture: I add one line to the manifest and can then refer to it from my code. (I could map filename = stringname automatically, but that wouldn't let me do things like pack textures into atlases to reduce texture swaps, so that kind of automatic filename scanning isn't applicable.)

Noting that:

technically, a finite list of all potentially used texture names in code should be knowable to the compiler.
the list does not change at run time.

I had thought about using either D's compile-time features or a separate tool-based lexing step to build, say, an enum, and replacing all those string names so that "grass01" becomes GRASS01 = 17.
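To make the enum idea concrete, here is a minimal sketch of the compile-time variant. The name list, `makeEnum`, `Tex`, and the `Texture` struct are all made up for illustration, and it keeps the names lowercase rather than producing GRASS01-style members. It only covers names known when the program is compiled.

```d
// Hypothetical name list. In a real build it might come from
// import("textures.txt") with the -J compiler flag instead of being hard-coded.
enum string[] textureNames = ["grass01", "dirt01", "water01"];

// Runs at compile time (CTFE) and returns the source text of an enum declaration.
string makeEnum(string enumName, const string[] members)
{
    string code = "enum " ~ enumName ~ " { ";
    foreach (m; members)
        code ~= m ~ ", ";
    return code ~ "}";
}

// Expands to: enum Tex { grass01, dirt01, water01, }
mixin(makeEnum("Tex", textureNames));

// Hypothetical stand-in for a real texture type.
struct Texture { int gpuId; }

// One slot per enum member, indexed directly instead of hashed.
Texture[Tex.max + 1] textures;

// A misspelled member such as Tex.grsas01 would now be a compile error.
static assert(Tex.grass01 == 0 && Tex.water01 == 2);
```

With something like this, a call would look like drawBitmap(textures[Tex.grass01], ...), and a misspelled name fails at compile time instead of throwing a range error at run time.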

But I was also wondering if simply getting a faster container would improve things, because one flaw of the enum approach is that if I ever support mods, I no longer know the full set of texture names at compile time.

Still, once all my texture entries are received, they don't change after loading. Even if I load in assets based on scene requirements, the names will never swap or reorder. Even if the grass01 texture isn't loaded, the name grass01 can still be reserved. There are no key deletions or changes once set, so grass01 could simply be a null pointer until loaded.
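Since names are only ever added, one way this could be exploited without knowing the set at compile time is to turn each name into an integer handle once and index a plain array afterwards. A rough sketch, where `TextureTable`, `handleFor`, and the `Texture` struct are hypothetical names for illustration:

```d
// Hypothetical stand-in for a real texture type.
struct Texture { int gpuId; }

struct TextureTable
{
    private size_t[string] indexOf; // name -> slot, consulted only during setup
    private Texture*[] slots;       // slot -> texture, null until loaded

    /// Returns the handle for `name`, reserving a new slot on first use.
    size_t handleFor(string name)
    {
        if (auto p = name in indexOf)
            return *p;
        immutable idx = slots.length;
        slots.length = idx + 1;     // the new element defaults to null
        indexOf[name] = idx;
        return idx;
    }

    /// Attaches a loaded texture to a previously reserved slot.
    void load(size_t handle, Texture* tex) { slots[handle] = tex; }

    /// Hot-path lookup: a bounds-checked array index, no hashing.
    Texture* get(size_t handle) { return slots[handle]; }
}

unittest
{
    TextureTable table;
    immutable grass = table.handleFor("grass01"); // resolved once, e.g. at scene load
    assert(table.get(grass) is null);             // reserved but not loaded yet
    table.load(grass, new Texture(42));
    assert(table.get(grass).gpuId == 42);
}
```

The associative array is still there, but it is only touched when a handle is first resolved. Mods could register new names at load time the same way.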

A while back, during some profiling, D associative arrays were way higher up in total CPU usage in my game than they should have been compared to the act of drawing entire textures. Like in the top 7. BUT, as I'm thinking about it right now, I'm wondering whether the profiler's measuring code itself was inflating those AA accesses (a tiny bit of real code compared to the overhead of the logging code).
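One way to cross-check that suspicion would be to time the lookups in bulk with a stopwatch rather than an instrumenting profiler, so the measurement overhead is paid once per run instead of once per call. A small sketch using `StopWatch` from `std.datetime.stopwatch`; the `timeIt` helper and the iteration count are made up for illustration, and the numbers will depend heavily on compiler flags:

```d
import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;

// Runs `work` `iterations` times and returns the total elapsed time.
Duration timeIt(scope void delegate() work, size_t iterations)
{
    auto sw = StopWatch(AutoStart.yes);
    foreach (_; 0 .. iterations)
        work();
    sw.stop();
    return sw.peek();
}

unittest
{
    int[string] byName = ["grass01": 1];
    int[] byIndex = [1];
    int sink; // accumulating into a variable keeps the loads from being optimised away

    auto aaTime  = timeIt({ sink += byName["grass01"]; }, 1_000_000);
    auto arrTime = timeIt({ sink += byIndex[0]; }, 1_000_000);
    // Compare aaTime and arrTime, e.g. by printing them with std.stdio.writeln.
}
```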

On Monday, 11 December 2023 at 12:37:56 UTC, Siarhei Siamashka wrote:

On Monday, 11 December 2023 at 11:36:10 UTC, Basile.B wrote:

To fork on that subject... I was very "gperf-like-perfect-hash-friendly" for years, but nowadays I use lookup tables. Very fast too. Tries and suffix arrays are very fast too, but they tend not to be memory friendly.

If the whole set of the string identifiers to be parsed is known at compile time, then another approach for mapping them to numbers or whatever else is to use something like https://re2c.org

Sorry for the inaccuracy. This is part of what a compiler uses to generate the code to switch over a string. This is not used at run-time.

Generated code is more like: https://godbolt.org/z/Y54bKeWYe

What is used at run-time is: https://gitlab.com/styx-lang/styx/-/blob/master/library/rtl.sx?ref_type=heads#L426

In fine, you have decimated a string into a unique number and you just take the path of a normal switch. The decimation is not very different from hashing. It's just guaranteed to give unique indexes for a set of unique strings.

I believe D uses a different strategy for that. Binary search, IIRC, but I'm not quite sure.
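For readers who want to see the general shape of "reduce the string to a unique number, then switch", here is a toy sketch in D. It is not the Styx runtime code linked above and not what any compiler generates. The key set is hypothetical and the reduction (the first byte) was hand-picked because it happens to be unique for these three strings.

```d
// Hypothetical fixed key set.
static immutable string[] keys = ["grass01", "dirt01", "water01"];

// Compile-time check that the first byte really is unique within this set.
static assert(() {
    foreach (i, a; keys)
        foreach (b; keys[i + 1 .. $])
            if (a[0] == b[0]) return false;
    return true;
}());

/// Returns the index of `s` in `keys`, or -1 if it is not in the set.
int lookup(string s)
{
    if (s.length == 0)
        return -1;

    int candidate;
    switch (s[0]) // ordinary integer switch on the reduced value
    {
        case 'g': candidate = 0; break;
        case 'd': candidate = 1; break;
        case 'w': candidate = 2; break;
        default:  return -1;
    }

    // One full comparison rejects strings outside the set that share the byte.
    return s == keys[candidate] ? candidate : -1;
}
```

A generator would search for a reduction that is unique for whatever set it is given. The final comparison is still needed because inputs outside the set can collide with a member.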

Anyway just to clarify.
