Alias/template for value

Hi all, When playing with the graph library code, I noticed something odd. Here's a function to calculate the neighbours of a vertex: auto neighbours(immutable size_t v) { immutable size_t start = _sumTail[v] + _sumHead[v]; immutable size_t end = _sumTail[v + 1] + _sumHead[v + 1]; if(!_cacheNeighbours[v]) { size_t j = start; foreach (i; _sumTail[v] .. _sumTail[v + 1]) { _neighbours[j] = _head[_indexTail[i]]; ++j; } foreach (i; _sumHead[v] .. _sumHead[v + 1]) { _neighbours[j] = _tail[_indexHead[i]]; ++j; } assert(j == end); _cacheNeighbours[v] = true; } return _neighbours[start .. end]; } Now, I noticed that if instead of declaring the variables start, end, I instead manually write out these expressions in the code, I get a small but consistent speedup in the program. So, I'm curious (i) Why? As I'd have assumed the compiler could optimize away unnecessary variables like this, and (ii) is there a way of declaring start/end in the code such that at compile time the correct expression will be put in place where it's needed? I'm guessing some kind of template solution, but I didn't get it working (I probably need to study templates more:-). (... knocks head against wall to try and dislodge current micro-optimization obsession ...)

Joseph Rushton Wakeling: > Now, I noticed that if instead of declaring the variables start, end, I instead > manually write out these expressions in the code, I get a small but consistent > speedup in the program. > > So, I'm curious (i) Why? As I'd have assumed the compiler could optimize away unnecessary variables like this, Take a look at the asm. And try to compile with ldc2 too. If gdc is missing such small optimization then I suggest to write a small program that shows the problem, and write a gdc bug report (that also contains the resulting asm). Bye, bearophile

On Wednesday, 31 July 2013 at 11:15:31 UTC, Joseph Rushton Wakeling wrote: > Hi all, > > When playing with the graph library code, I noticed something odd. > > Here's a function to calculate the neighbours of a vertex: > > auto neighbours(immutable size_t v) > { > immutable size_t start = _sumTail[v] + _sumHead[v]; > immutable size_t end = _sumTail[v + 1] + _sumHead[v + 1]; > if(!_cacheNeighbours[v]) > { > size_t j = start; > foreach (i; _sumTail[v] .. _sumTail[v + 1]) > { > _neighbours[j] = _head[_indexTail[i]]; > ++j; > } > foreach (i; _sumHead[v] .. _sumHead[v + 1]) > { > _neighbours[j] = _tail[_indexHead[i]]; > ++j; > } > assert(j == end); > _cacheNeighbours[v] = true; > } > return _neighbours[start .. end]; > } > > Now, I noticed that if instead of declaring the variables start, end, I instead > manually write out these expressions in the code, I get a small but consistent > speedup in the program. > > So, I'm curious (i) Why? As I'd have assumed the compiler could optimize away > unnecessary variables like this, and (ii) is there a way of declaring start/end > in the code such that at compile time the correct expression will be put in > place where it's needed? > > I'm guessing some kind of template solution, but I didn't get it working (I > probably need to study templates more:-). > > (... knocks head against wall to try and dislodge current micro-optimization > obsession ...) compiler/version/flags? The answer to what's happening might be obvious from the assembly code for the function, if you posted it. Also, have you tried it on a different cpu?

On 07/31/2013 01:53 PM, bearophile wrote: > Take a look at the asm. And try to compile with ldc2 too. In this case I observed it with both GDC and LDC. > If gdc is missing such small optimization then I suggest to write a small program that shows the problem, and write a gdc bug report (that also contains the resulting asm). Good call, I'll try and do this.

July 31, 2013

Re: Alias/template for value

Posted by JS
in reply to Joseph Rushton Wakeling

Permalink

Posted in reply to Joseph Rushton Wakeling

Permalink

On Wednesday, 31 July 2013 at 11:15:31 UTC, Joseph Rushton Wakeling wrote:
> Hi all,
>
> When playing with the graph library code, I noticed something odd.
>
> Here's a function to calculate the neighbours of a vertex:
>
>         auto neighbours(immutable size_t v)
>         {
>             immutable size_t start = _sumTail[v] + _sumHead[v];
>             immutable size_t end = _sumTail[v + 1] + _sumHead[v + 1];
>             if(!_cacheNeighbours[v])
>             {
>                 size_t j = start;
>                 foreach (i; _sumTail[v] .. _sumTail[v + 1])
>                 {
>                     _neighbours[j] = _head[_indexTail[i]];
>                     ++j;
>                 }
>                 foreach (i; _sumHead[v] .. _sumHead[v + 1])
>                 {
>                     _neighbours[j] = _tail[_indexHead[i]];
>                     ++j;
>                 }
>                 assert(j == end);
>                 _cacheNeighbours[v] = true;
>             }
>             return _neighbours[start .. end];
>         }
>
> Now, I noticed that if instead of declaring the variables start, end, I instead
> manually write out these expressions in the code, I get a small but consistent
> speedup in the program.
>
> So, I'm curious (i) Why?  As I'd have assumed the compiler could optimize away
> unnecessary variables like this, and (ii) is there a way of declaring start/end
> in the code such that at compile time the correct expression will be put in
> place where it's needed?
>
> I'm guessing some kind of template solution, but I didn't get it working (I
> probably need to study templates more:-).
>
> (... knocks head against wall to try and dislodge current micro-optimization
> obsession ...)

There is no telling, as bear said, compare the disassemblies(you can try it on dpaste and compare).

I do not believe D's code generation is optimal from what I have seen(at least for DMD). It could be cache misses due to pipelining issues(the order of instructions matter) or other weird stuff.

It could be that when you alias start and end, D ends up using a different algorithm that somehow changes the code generation.

I don't think D's code generation is even close to being as mature as many of the common C++ compilers.

On 07/31/2013 01:56 PM, John Colvin wrote: > compiler/version/flags? gdmd and ldmd2, both with -O -release -inline. > The answer to what's happening might be obvious from the assembly code for the function, if you posted it. Yes, I'll get onto that. I'm completely inexperienced with assembly, but I think it's clear I have to learn ... > Also, have you tried it on a different cpu? No, but can and will.

Joseph Rushton Wakeling: > I'm completely inexperienced with assembly, but I > think it's clear I have to learn ... Learning to write very efficient asm takes several months or a couple years or more. Learning to write working asm takes weeks or few months. Learning to read a little some asm takes hours or two days. Bye, bearophile

On Wed, Jul 31, 2013 at 04:54:40PM +0200, bearophile wrote: > Joseph Rushton Wakeling: > > >I'm completely inexperienced with assembly, but I > >think it's clear I have to learn ... > > Learning to write very efficient asm takes several months or a couple years or more. Learning to write working asm takes weeks or few months. Learning to read a little some asm takes hours or two days. [...] It's completely worth the effort though, IMO. I know many people will disagree with me, but Knuth puts it best: By understanding a machine-oriented language, the programmer will tend to use a much more efficient method; it is much closer to reality. -- D. Knuth People who are more than casually interested in computers should have at least some idea of what the underlying hardware is like. Otherwise the programs they write will be pretty weird. -- D. Knuth T -- My program has no bugs! Only unintentional features...

On Wednesday, 31 July 2013 at 14:58:24 UTC, H. S. Teoh wrote: > People who are more than casually interested in computers should > have at least some idea of what the underlying hardware is like. > Otherwise the programs they write will be pretty weird. > -- D. Knuth Well, while I do agree in general, there is a huge difference between understanding how h/w executes machine code and casually reading assembly listings ;)

On 07/31/2013 04:56 PM, H. S. Teoh wrote: > It's completely worth the effort though, IMO. It will mean I can finally talk programming with my dad, who _only_ knows how to program in assembly (and still does so regularly:-)

Forums