Posted by John Colvin in reply to Kirill
On Monday, 10 November 2014 at 19:18:21 UTC, Kirill wrote:
> Dear D community (and specifically experts on cache optimization),
>
> I'm a C++ programmer and was waiting for a while to do a project in D.
>
> I'd like to build a cache-optimized decision tree forest library, and I'm debating between D and C++. I'd like to make it similar to atlas, spiral, or other libraries that partially use static optimization with recompilation and meta-programming to cache optimize the code for a specific architecture (specifically the latest xeons / xeon phi). Given D's compile speed and meta-programming, it should be a good fit. The problem that I might encounter is that C++ has a lot more information on the topic, which might be a significant bottleneck given I'm just learning cache optimization (from a few papers and "what every programmer should know about memory").
>
> From my understanding, cache optimization mostly involves breaking data and loops into segments that fit in cache, and making sure that commonly used variables (for example sum in sum+=i) stay in cache.
Assuming there isn't more frequently accessed data around, you would want that to stay in a register, not cache.
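To illustrate the register point, here is a minimal C++ sketch (the same idea should carry over to D). The function names are made up for illustration. Whether the accumulator actually ends up in a register depends on the compiler and flags, so check the generated assembly rather than taking this on faith.

```cpp
#include <cstddef>

// Hypothetical helper names, for illustration only.

// Accumulates through a pointer: because *out might alias data[i],
// the compiler may have to write the running total back to memory
// on every iteration.
void sum_via_pointer(const long* data, std::size_t n, long* out) {
    *out = 0;
    for (std::size_t i = 0; i < n; ++i)
        *out += data[i];
}

// Accumulates in a local: nothing else can observe `sum`, so an
// optimising compiler is free to keep it in a register for the whole loop.
long sum_via_local(const long* data, std::size_t n) {
    long sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += data[i];
    return sum;
}
```

Both functions compute the same result; the difference, if any, only shows up in the generated code.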
> Most of this should be solved by statically defining sizes and paddings of blocks to be used for caching. It's more related to low level -- C, from my understanding. Are there any hidden stones?
>
> The other question is how mature is the compiler in terms of optimizing for cache compared to C++? I think gnu C++ does a few tricks to optimize for cache and there are ways to tweak cache line alignment.
>
> My knowledge on the subject is not yet concrete and limited but I hope this gave an idea of what I'm looking for and you can recommend me a good direction to take.
>
> Best regards,
> --Kirill
D is a good language for this sort of thing. Using various metaprogramming techniques it might even be fun.
Most advice for C(++) will also apply to D w.r.t. cache.
You will probably have to learn assembly and also make use of tools such as cachegrind and perf unless you like trying to optimise blind.
A word of warning: modern CPU caches are complicated, and their effect on performance in specific cases can be difficult to understand.
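As a rough sketch of the "statically defined block sizes and padding" idea from the question, here is what it might look like in C++ (D templates and `align` would play a similar role). The 64-byte line size, the names `kCacheLine`, `PaddedCounter` and `transpose_blocked`, and the choice of a matrix transpose as the example are all assumptions for illustration, not something tuned or measured.

```cpp
#include <cstddef>
#include <vector>

// Assumed cache-line size; 64 bytes is common on x86 but should be
// checked for the target CPU.
constexpr std::size_t kCacheLine = 64;

// A value padded and aligned to a full cache line, so neighbouring
// elements in an array never share a line.
struct alignas(kCacheLine) PaddedCounter {
    long value = 0;
};
static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per line");

// Transposes a rows x cols row-major matrix in Tile x Tile blocks.
// Tile is a compile-time parameter, so different values can be tried
// by recompiling (a hypothetical tuning knob, not a measured optimum).
template <std::size_t Tile>
void transpose_blocked(const std::vector<double>& in, std::vector<double>& out,
                       std::size_t rows, std::size_t cols) {
    for (std::size_t i0 = 0; i0 < rows; i0 += Tile)
        for (std::size_t j0 = 0; j0 < cols; j0 += Tile) {
            // Clamp the block edges so partial tiles at the border work.
            const std::size_t i1 = i0 + Tile < rows ? i0 + Tile : rows;
            const std::size_t j1 = j0 + Tile < cols ? j0 + Tile : cols;
            for (std::size_t i = i0; i < i1; ++i)
                for (std::size_t j = j0; j < j1; ++j)
                    out[j * rows + i] = in[i * cols + j];
        }
}
```

A call such as `transpose_blocked<32>(in, out, rows, cols)` fixes the tile size at compile time; which value is best for a given cache is exactly the kind of thing to measure with cachegrind or perf rather than guess.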