Thread overview
Importance of memory organization for speed
Jun 10, 2008
Bill Cox
Jun 14, 2008
renoX
Jun 15, 2008
Nick B
Jun 15, 2008
Russell Lewis
June 10, 2008
Hi, all.

Waaay back, there was a short discussion of optimizing memory layout for speed. I've written a simple benchmark that traverses large graphs in two versions: one in very carefully memory-optimized C, the other in C++/STL. On my Ubuntu x64 Core Duo laptop, the C version is 15X faster and uses half the memory. Cachegrind shows the C version has a 16.7X lower L2 cache miss rate, which accounts for the speed difference.
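The post doesn't include the benchmark source, so what follows is only a minimal sketch, under my own assumptions, of the kind of layout difference involved. It is not the datadraw benchmark. `PtrGraph` stores every edge as its own heap object chained through a `next` pointer, roughly what a node-based container gives you. `CsrGraph` packs all edges of all nodes into one contiguous array (the compressed sparse row layout). All type and function names are hypothetical.

```c
#include <stdlib.h>

/* Pointer-based layout: one heap allocation per edge, linked per node. */
typedef struct Edge {
    int target;
    struct Edge *next;
} Edge;

typedef struct {
    int num_nodes;
    Edge **adj;       /* adj[v] = head of v's edge list */
} PtrGraph;

/* Compact layout (CSR): edges of v are targets[first[v] .. first[v+1]-1]. */
typedef struct {
    int num_nodes;
    int *first;
    int *targets;
} CsrGraph;

/* Build both layouts for a ring graph where v links to v+1 and v+2 (mod n). */
int build_ring_graphs(int n, PtrGraph *pg, CsrGraph *cg) {
    pg->num_nodes = cg->num_nodes = n;
    pg->adj = calloc((size_t)n, sizeof *pg->adj);
    cg->first = malloc(((size_t)n + 1) * sizeof *cg->first);
    cg->targets = malloc(2 * (size_t)n * sizeof *cg->targets);
    if (!pg->adj || !cg->first || !cg->targets)
        return -1;
    for (int v = 0; v < n; v++) {
        cg->first[v] = 2 * v;
        for (int k = 1; k <= 2; k++) {
            int t = (v + k) % n;
            cg->targets[2 * v + k - 1] = t;
            Edge *e = malloc(sizeof *e);  /* scattered wherever malloc puts it */
            if (!e)
                return -1;
            e->target = t;
            e->next = pg->adj[v];
            pg->adj[v] = e;
        }
    }
    cg->first[n] = 2 * n;
    return 0;
}

/* Each traversal sums the target ids of every edge exactly once. */
long long ptr_sum_targets(const PtrGraph *g) {
    long long sum = 0;
    for (int v = 0; v < g->num_nodes; v++)
        for (const Edge *e = g->adj[v]; e != NULL; e = e->next)
            sum += e->target;             /* each hop can be a cache miss */
    return sum;
}

long long csr_sum_targets(const CsrGraph *g) {
    long long sum = 0;
    for (int v = 0; v < g->num_nodes; v++)
        for (int i = g->first[v]; i < g->first[v + 1]; i++)
            sum += g->targets[i];         /* sequential, prefetch-friendly reads */
    return sum;
}

void free_graphs(PtrGraph *pg, CsrGraph *cg) {
    for (int v = 0; pg->adj && v < pg->num_nodes; v++) {
        Edge *e = pg->adj[v];
        while (e) {
            Edge *next = e->next;
            free(e);
            e = next;
        }
    }
    free(pg->adj);
    free(cg->first);
    free(cg->targets);
}
```

Both traversals return the same sum. The CSR version stores 4 bytes per edge in one array, while each `Edge` also carries a pointer plus allocator overhead, and its access pattern depends on where the allocator placed each node. In a tiny, freshly built example the allocator may hand out adjacent blocks anyway, so any speed gap should be measured (for example with Cachegrind) on large, long-lived graphs rather than assumed.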

So I'll just stress again the importance of keeping memory layout abstract and hidden from the user. More and more, the speed of memory-intensive applications comes down to cache performance. The benchmarks are in the examples/graph_benchmark directory of the svn repository for the datadraw project:

svn co https://datadraw.svn.sourceforge.net/svnroot/datadraw/trunk datadraw
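One reading of "keeping memory layout abstract" is that user code reaches fields only through accessor functions, so the storage behind them can change without touching any caller. The sketch below assumes a hypothetical `NodeTable` with two fields. By default it stores each field in its own array (struct of arrays); compiling with `-DNODE_AOS` switches to an array of structs. It illustrates the idea only and is not the code DataDraw generates.

```c
#include <stdlib.h>

/* Hypothetical node "class": callers use only these functions, never the
   struct fields, so the storage layout can change underneath them. */
#ifdef NODE_AOS
/* Array of structs: each node's fields sit next to each other. */
typedef struct { double weight; int color; } NodeRec;
typedef struct { NodeRec *recs; int count; } NodeTable;

int node_table_init(NodeTable *t, int count) {
    t->count = count;
    t->recs = calloc((size_t)count, sizeof *t->recs);
    return t->recs ? 0 : -1;
}
void node_table_free(NodeTable *t) { free(t->recs); }
double node_weight(const NodeTable *t, int n) { return t->recs[n].weight; }
void node_set_weight(NodeTable *t, int n, double w) { t->recs[n].weight = w; }
int node_color(const NodeTable *t, int n) { return t->recs[n].color; }
void node_set_color(NodeTable *t, int n, int c) { t->recs[n].color = c; }
#else
/* Struct of arrays (default): each field has its own contiguous array. */
typedef struct { double *weight; int *color; int count; } NodeTable;

int node_table_init(NodeTable *t, int count) {
    t->count = count;
    t->weight = calloc((size_t)count, sizeof *t->weight);
    t->color = calloc((size_t)count, sizeof *t->color);
    if (!t->weight || !t->color) {
        free(t->weight);
        free(t->color);
        return -1;
    }
    return 0;
}
void node_table_free(NodeTable *t) { free(t->weight); free(t->color); }
double node_weight(const NodeTable *t, int n) { return t->weight[n]; }
void node_set_weight(NodeTable *t, int n, double w) { t->weight[n] = w; }
int node_color(const NodeTable *t, int n) { return t->color[n]; }
void node_set_color(NodeTable *t, int n, int c) { t->color[n] = c; }
#endif

/* User code: depends only on the accessors, not on the layout. */
double total_weight(const NodeTable *t) {
    double sum = 0.0;
    for (int n = 0; n < t->count; n++)
        sum += node_weight(t, n);
    return sum;
}
```

`total_weight` compiles unchanged under either layout. With the default layout, its scan touches only the contiguous `weight` array and never pulls `color` values into the cache.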

Best regards,
Bill
June 14, 2008
Bill Cox wrote:
> Hi, all.
> 
> Waaay back, there was a short discussion of optimizing memory layout
> for speed.  I've written a simple benchmark that traverses large
> graphs, one written in very carefully memory optimized C, the other
> using C++/STL.  The C version is 15X faster, and uses 2X less memory
> on my Ubuntu x64 Core Duo laptop.  Cachegrind shows the C version has
> a 16.7X lower L2 cache miss rate, which accounts for the speed
> difference.
> 
> So, I'll just post again the importance of keeping memory layout
> abstract, and hidden from the user.

Uh? What you just did was use your knowledge of the memory layout in C to speed up your app, so it's the *opposite* of keeping the memory layout hidden from the user!

I don't get your point here.

Regards,
renoX


>  More and more, speed for memory
> intensive applications is all about cache performance.  Benchmarks
> can be found in the examples/graph_benchmark directory of svn for the
> datadraw project:
> 
> svn co https://datadraw.svn.sourceforge.net/svnroot/datadraw/trunk
> datadraw
> 
> Best regards, Bill
June 15, 2008
renoX wrote:
> Bill Cox wrote:
>> Hi, all.
>>
>> Waaay back, there was a short discussion of optimizing memory layout
>> for speed.  I've written a simple benchmark that traverses large
>> graphs, one written in very carefully memory optimized C, the other
>> using C++/STL.  The C version is 15X faster, and uses 2X less memory
>> on my Ubuntu x64 Core Duo laptop.  Cachegrind shows the C version has
>> a 16.7X lower L2 cache miss rate, which accounts for the speed
>> difference.
>>
>> So, I'll just post again the importance of keeping memory layout
>> abstract, and hidden from the user.
> 
> Uh? What you just did is using your knowledge of the memory layout in C to speedup your app, so it's the *opposite* of having the memory layout hidden from the user!
> 
> I don't catch your point here..


Hi there


Does anyone know how to measure L1 and L2 cache performance using D and Tango, or is Valgrind the _only_ way to do this?
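The question isn't answered in this part of the thread. One option on Linux, sketched here under stated assumptions, is to read the CPU's hardware cache counters with the `perf_event_open(2)` system call; note that this kernel interface was added after this 2008 thread. The `perf stat` command-line tool reads the same counters without code changes. The sketch is in C; D code could reach the same system call through its C interface. It counts last-level cache read misses (`PERF_COUNT_HW_CACHE_LL`; L1 data misses would use `PERF_COUNT_HW_CACHE_L1D`) while a caller-supplied function runs. `count_ll_read_misses`, `Walk` and `walk_buffer` are hypothetical names, and whether the event is available depends on the CPU, the kernel and the system's `perf_event_paranoid` setting.

```c
#define _GNU_SOURCE           /* for syscall() */
#include <linux/perf_event.h>
#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* glibc has no perf_event_open() wrapper, so invoke the raw system call. */
static int open_ll_read_miss_counter(void) {
    struct perf_event_attr pe;
    memset(&pe, 0, sizeof pe);
    pe.size = sizeof pe;
    pe.type = PERF_TYPE_HW_CACHE;
    /* Cache-event encoding from perf_event_open(2): id | (op << 8) | (result << 16). */
    pe.config = PERF_COUNT_HW_CACHE_LL
              | (PERF_COUNT_HW_CACHE_OP_READ << 8)
              | (PERF_COUNT_HW_CACHE_RESULT_MISS << 16);
    pe.disabled = 1;          /* start stopped; enabled explicitly below */
    pe.exclude_kernel = 1;    /* count user-space misses only */
    pe.exclude_hv = 1;
    /* pid 0 = calling process/thread, cpu -1 = any CPU, no group, no flags. */
    return (int)syscall(__NR_perf_event_open, &pe, 0, -1, -1, 0UL);
}

/* Runs work(arg) and returns the last-level cache read misses counted while
   it ran, or -1 if the counter is unavailable (unsupported CPU, a sandbox
   that blocks the syscall, or a restrictive perf_event_paranoid). */
long long count_ll_read_misses(void (*work)(void *), void *arg) {
    int fd = open_ll_read_miss_counter();
    if (fd == -1)
        return -1;
    long long count = 0;
    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    work(arg);
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    if (read(fd, &count, sizeof count) != (ssize_t)sizeof count)
        count = -1;
    close(fd);
    return count;
}

/* Hypothetical workload: touch one byte every `stride` bytes of a buffer. */
typedef struct {
    const unsigned char *data;
    size_t len, stride;
    unsigned long sum;
} Walk;

void walk_buffer(void *arg) {
    Walk *w = arg;
    for (size_t i = 0; i < w->len; i += w->stride)
        w->sum += w->data[i];
}
```

A typical use is to run the same traversal over two layouts and compare the counts. A return value of -1 only means the machine won't expose the counter; Cachegrind, which simulates the caches instead of reading hardware counters, remains the fallback there.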

regards
Nick B
June 15, 2008
renoX wrote:
> Bill Cox wrote:
>> Hi, all.
>>
>> Waaay back, there was a short discussion of optimizing memory layout
>> for speed.  I've written a simple benchmark that traverses large
>> graphs, one written in very carefully memory optimized C, the other
>> using C++/STL.  The C version is 15X faster, and uses 2X less memory
>> on my Ubuntu x64 Core Duo laptop.  Cachegrind shows the C version has
>> a 16.7X lower L2 cache miss rate, which accounts for the speed
>> difference.
>>
>> So, I'll just post again the importance of keeping memory layout
>> abstract, and hidden from the user.
> 
> Uh? What you just did is using your knowledge of the memory layout in C to speedup your app, so it's the *opposite* of having the memory layout hidden from the user!
> 
> I don't catch your point here..

In a perfect world, a compiler can perform deep optimizations similar to hand-tuning your program. But it can't do that if you have already half-specified the memory layout. So in that perfect world, you actually want to *underspecify* your program, so that the compiler can work miracles. However, if your compiler isn't that good, then hand-tuning is the better option.

An interesting observation: for straight-line code (confined to a single function), hand-tuned C (or, better yet, assembler) used to be much faster than anything a compiler could produce. Nowadays, compilers generally produce code that is as good as (if not better than) what assembly experts write. I suspect that 20 years from now, our compilers will rework memory layout just as they currently reorder the operations in our functions. But I don't think we're there yet.
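As a rough illustration of the kind of layout rework described above, here is hot/cold field splitting (also called structure splitting) done by hand. The assumption is a hypothetical node record where a traversal reads only `weight`; moving that field into its own array means a scan streams small `NodeHot` records through the cache instead of full `NodeFull` records. A compiler could only make this change on its own if the program never depended on the original struct layout (no pointer arithmetic across fields, no writing the raw struct to disk, and so on), which is the point about underspecifying the layout.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical original record: the scan below reads only `weight`, but
   every cache line it loads is mostly filled with the cold fields. */
typedef struct {
    double weight;          /* hot: read on every traversal */
    char name[48];          /* cold: rarely read */
    long created, modified; /* cold */
} NodeFull;

/* After splitting: hot and cold parts live in separate arrays, matched by index. */
typedef struct { double weight; } NodeHot;
typedef struct { char name[48]; long created, modified; } NodeCold;

/* The transformation itself: copy each record's fields into the two arrays. */
void split_nodes(const NodeFull *in, size_t n, NodeHot *hot, NodeCold *cold) {
    for (size_t i = 0; i < n; i++) {
        hot[i].weight = in[i].weight;
        memcpy(cold[i].name, in[i].name, sizeof cold[i].name);
        cold[i].created = in[i].created;
        cold[i].modified = in[i].modified;
    }
}

/* Same computation over each layout; only the bytes pulled through the cache differ. */
double sum_weights_full(const NodeFull *nodes, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += nodes[i].weight;
    return sum;
}

double sum_weights_hot(const NodeHot *hot, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += hot[i].weight;
    return sum;
}
```

On a typical 64-bit Linux system `NodeFull` is 72 bytes and `NodeHot` is 8, so `sum_weights_hot` reads about a ninth as much memory to compute the same result.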