Thread overview
try find the fastest way to convert a group string into index?
Sep 16, 2018
learnfirst1
Sep 16, 2018
Vladimir Panteleev
Sep 16, 2018
learnfirst1
Sep 16, 2018
learnfirst1
Sep 16, 2018
learnfirst1
Sep 16, 2018
Diederik de Groot
September 16, 2018
The use case is translate http header key into enum.

this is the current code : https://run.dlang.io/is/lpw29w

In this fake code I only list a few headers,  it should be more.  but less then 128 and only include the a few codes.

how to make this more fast like with one loop and get the results.

I am think of use PCRE compile for this, but not sure how.

It should also save the non default headers in to a new array, so when search non default headers will reduce the total size.

I am plan to write a more fast http server with D betterC compare to vibed.





September 16, 2018
On Sunday, 16 September 2018 at 10:04:09 UTC, learnfirst1 wrote:
> how to make this more fast like with one loop and get the results.

This is a more general problem than any specific programming language; you may want to look into perfect hashing:

https://en.wikipedia.org/wiki/Perfect_hash_function

There exists software to generate perfect hash functions, e.g. GNU gperf:

https://www.gnu.org/software/gperf/

It looks like Basile wrote one for D:

https://github.com/BBasile/IsItThere

If you don't want to go that far, you can generate a string switch using string mixins:

void main()
{
	static immutable string[] keywords = ["foo", "bar", "baz"];
	enum Keyword { foo, bar, baz }

	Keyword parseKeyword(string s)
	{
		switch (s)
		{
			mixin({
				string code;
				foreach (keyword; keywords)
					code ~= `case "` ~ keyword ~ `": return Keyword.` ~ keyword ~ `;`;
				return code;
			}());
			default:
				throw new Exception("No such keyword!");
		}
	}
	assert(parseKeyword("bar") == Keyword.bar);
}

September 16, 2018
Homework ?

I think it would also be better to only walk/match the Header.data and header_keys only once (using the generated switch statement that Vladimir suggested). So your Headers struct should have a parse() function, which will match all of the headers to the keys and store them locally in a 'matches[Type] = string' array (aka map). Then the get() function can simply  return matches[type] without any work.

Other improvements/Bonus Points:
- Because the parse() function is only called/initialize once, you could make the matches array immutable/const if.
- Your Type enum and header_keys contain the same data, and if you are going to use Vlafimir's suggestion of using a mixin, you can stringify the enum names an use them instead. This way you don't have to keep anything  in sync.
- If these Headers are coming from a client over the internet, it might be wise to set and check some (arbitrary) bounds for header-length and content. You are currently blindly copying it and expecting it is 'clean/correct'.
- Why copy the data to Header.data, instead of just providing it by ref to the get()/parse() function. You are copying it to then later indexing into Header.data using the map(). Why not copy/move piecemeal and assign them to an indexed array after inspecting/matching them. That  could make for cleaner code and nicer API. (With two caveats: we could loose the original headers if we don't save them somewhere else and unmatched/unknown Header Types will not be copied).

September 16, 2018
On Sunday, 16 September 2018 at 10:14:24 UTC, Vladimir Panteleev wrote:
> On Sunday, 16 September 2018 at 10:04:09 UTC, learnfirst1 wrote:
>> how to make this more fast like with one loop and get the results.


thanks for reply, minimal perfect hashing seems good for this task, I will do more test for confirm.

I try to storage the value into a static array, so when  diff middleware try access the key it will reused the array offset without need to run hash on key string. (all middleware can access the key by static enum name).

with Perfect Hash Function, I can translate the hash number into static enum and use them.

The code also need provide string get(string key) method, in this case the header key may not on the static preset arrays,  in this case it should search a smaller array (save on the parse process).   with this solution I has to run hash for each of request headers , with less than 127 items (maybe around 16 at max) I am not sure it is fast then direct string compare. for example:

foreach(int header_index, ref header; new_request.headers) {
   int hash = mpfh(header.key);
   auto enum_index = static_eum[ hash ];
   new_request.header[enum_index] = header_index  ;
}




the simple static switch is not the best way as I know here.   with compiled PCRE match it will "Context-Type", "Context-Encoding" in once ( and maybe there is some SSE4 to pass 16 byte for each loop).

>> Diederik de Groot

please ignore the Header.data assign part. which just for the demo. in the real app it will only parse once and storage in pre -alloc request/response object.


September 16, 2018
On Sunday, 16 September 2018 at 11:30:11 UTC, learnfirst1 wrote:
> On Sunday, 16 September 2018 at 10:14:24 UTC, Vladimir Panteleev wrote:

It has to be case Case Insensitive,  so before I run mpfh for each new request headers,  I need to convert the keys into low or up case.  this value can be storage on a stack local tmp cache.    and I has to run it on the get(string key) method again.


September 16, 2018
On Sunday, 16 September 2018 at 11:51:24 UTC, learnfirst1 wrote:
> On Sunday, 16 September 2018 at 11:30:11 UTC, learnfirst1 wrote:
>> On Sunday, 16 September 2018 at 10:14:24 UTC, Vladimir Panteleev wrote:


I will test pcre solution vs mpfc for benchmark.  the pcre is easy to deal with low/up case.

The PCRE:

/^(?:Accept(*:0)|Accept\-Charset(*:1)|Accept\-Encoding(*:2)|Accept\-Language(*:3)|Access\-Control\-Allow\-Credentials(*:4)|Access\-Control\-Allow\-Origin(*:5)|Access\-Control\-Allow\-Methods(*:6)|Access\-Control\-Allow\-Headers(*:7)|Access\-Control\-Max\-Age(*:8)|Access\-Control\-Expose\-Headers(*:9)|Access\-Control\-Request\-Method(*:10)|Access\-Control\-Request\-Headers(*:11)|Age(*:12)|Allow(*:13)|Authorization(*:14)|Cache\-Control(*:15)|Connection(*:16)|Content\-Encoding(*:17)|Content\-Language(*:18)|Content\-Length(*:19)|Content\-Location(*:20)|Content\-Range(*:21)|Content\-Security\-Policy(*:22)|Content\-Type(*:23)|Cookie(*:24)|Date(*:25)|ETag(*:26)|Expect(*:27)|Expires(*:28)|Host(*:29)|If\-Match(*:30)|If\-Modified\-Since(*:31)|If\-None\-Match(*:32)|If\-Range(*:33)|If\-Unmodified\-Since(*:34)|Last\-Modified(*:35)|Location(*:36)|Origin(*:37)|Pragma(*:38)|Proxy\-Authenticate(*:39)|Proxy\-Authorization(*:40)|Range(*:41)|Referer(*:42)|Retry\-After(*:43)|Sec\-Websocket\-Extensions(*:44)|Sec\-Websocket\-Key(*:45)|Sec\-Websocket\-Origin(*:46)|Sec\-Websocket\-Protocol(*:47)|Sec\-Websocket\-Version(*:48)|Server(*:49)|Set\-Cookie(*:50)|Strict\-Transport\-Security(*:51)|Transfer\-Encoding(*:52)|Upgrade(*:53)|User\-Agent(*:54)|Vary(*:55)|Via(*:56)|WWW\-Authenticate(*:57))$/i