Jump to page: 1 2
Thread overview
Alias/template for value
Jul 31, 2013
bearophile
Jul 31, 2013
John Colvin
Jul 31, 2013
bearophile
Jul 31, 2013
H. S. Teoh
Jul 31, 2013
Dicebot
Jul 31, 2013
H. S. Teoh
Jul 31, 2013
John Colvin
Jul 31, 2013
H. S. Teoh
Jul 31, 2013
JS
Jul 31, 2013
John Colvin
Jul 31, 2013
H. S. Teoh
July 31, 2013
Hi all,

When playing with the graph library code, I noticed something odd.

Here's a function to calculate the neighbours of a vertex:

        auto neighbours(immutable size_t v)
        {
            immutable size_t start = _sumTail[v] + _sumHead[v];
            immutable size_t end = _sumTail[v + 1] + _sumHead[v + 1];
            if(!_cacheNeighbours[v])
            {
                size_t j = start;
                foreach (i; _sumTail[v] .. _sumTail[v + 1])
                {
                    _neighbours[j] = _head[_indexTail[i]];
                    ++j;
                }
                foreach (i; _sumHead[v] .. _sumHead[v + 1])
                {
                    _neighbours[j] = _tail[_indexHead[i]];
                    ++j;
                }
                assert(j == end);
                _cacheNeighbours[v] = true;
            }
            return _neighbours[start .. end];
        }

Now, I noticed that if instead of declaring the variables start, end, I instead manually write out these expressions in the code, I get a small but consistent speedup in the program.

So, I'm curious (i) Why?  As I'd have assumed the compiler could optimize away unnecessary variables like this, and (ii) is there a way of declaring start/end in the code such that at compile time the correct expression will be put in place where it's needed?

I'm guessing some kind of template solution, but I didn't get it working (I probably need to study templates more:-).

(... knocks head against wall to try and dislodge current micro-optimization
obsession ...)
July 31, 2013
Joseph Rushton Wakeling:

> Now, I noticed that if instead of declaring the variables start, end, I instead
> manually write out these expressions in the code, I get a small but consistent
> speedup in the program.
>
> So, I'm curious (i) Why?  As I'd have assumed the compiler could optimize away unnecessary variables like this,

Take a look at the asm. And try to compile with ldc2 too.

If gdc is missing such small optimization then I suggest to write a small program that shows the problem, and write a gdc bug report (that also contains the resulting asm).

Bye,
bearophile
July 31, 2013
On Wednesday, 31 July 2013 at 11:15:31 UTC, Joseph Rushton Wakeling wrote:
> Hi all,
>
> When playing with the graph library code, I noticed something odd.
>
> Here's a function to calculate the neighbours of a vertex:
>
>         auto neighbours(immutable size_t v)
>         {
>             immutable size_t start = _sumTail[v] + _sumHead[v];
>             immutable size_t end = _sumTail[v + 1] + _sumHead[v + 1];
>             if(!_cacheNeighbours[v])
>             {
>                 size_t j = start;
>                 foreach (i; _sumTail[v] .. _sumTail[v + 1])
>                 {
>                     _neighbours[j] = _head[_indexTail[i]];
>                     ++j;
>                 }
>                 foreach (i; _sumHead[v] .. _sumHead[v + 1])
>                 {
>                     _neighbours[j] = _tail[_indexHead[i]];
>                     ++j;
>                 }
>                 assert(j == end);
>                 _cacheNeighbours[v] = true;
>             }
>             return _neighbours[start .. end];
>         }
>
> Now, I noticed that if instead of declaring the variables start, end, I instead
> manually write out these expressions in the code, I get a small but consistent
> speedup in the program.
>
> So, I'm curious (i) Why?  As I'd have assumed the compiler could optimize away
> unnecessary variables like this, and (ii) is there a way of declaring start/end
> in the code such that at compile time the correct expression will be put in
> place where it's needed?
>
> I'm guessing some kind of template solution, but I didn't get it working (I
> probably need to study templates more:-).
>
> (... knocks head against wall to try and dislodge current micro-optimization
> obsession ...)

compiler/version/flags?

The answer to what's happening might be obvious from the assembly code for the function, if you posted it.

Also, have you tried it on a different cpu?
July 31, 2013
On 07/31/2013 01:53 PM, bearophile wrote:
> Take a look at the asm. And try to compile with ldc2 too.

In this case I observed it with both GDC and LDC.

> If gdc is missing such small optimization then I suggest to write a small program that shows the problem, and write a gdc bug report (that also contains the resulting asm).

Good call, I'll try and do this.
July 31, 2013
On Wednesday, 31 July 2013 at 11:15:31 UTC, Joseph Rushton Wakeling wrote:
> Hi all,
>
> When playing with the graph library code, I noticed something odd.
>
> Here's a function to calculate the neighbours of a vertex:
>
>         auto neighbours(immutable size_t v)
>         {
>             immutable size_t start = _sumTail[v] + _sumHead[v];
>             immutable size_t end = _sumTail[v + 1] + _sumHead[v + 1];
>             if(!_cacheNeighbours[v])
>             {
>                 size_t j = start;
>                 foreach (i; _sumTail[v] .. _sumTail[v + 1])
>                 {
>                     _neighbours[j] = _head[_indexTail[i]];
>                     ++j;
>                 }
>                 foreach (i; _sumHead[v] .. _sumHead[v + 1])
>                 {
>                     _neighbours[j] = _tail[_indexHead[i]];
>                     ++j;
>                 }
>                 assert(j == end);
>                 _cacheNeighbours[v] = true;
>             }
>             return _neighbours[start .. end];
>         }
>
> Now, I noticed that if instead of declaring the variables start, end, I instead
> manually write out these expressions in the code, I get a small but consistent
> speedup in the program.
>
> So, I'm curious (i) Why?  As I'd have assumed the compiler could optimize away
> unnecessary variables like this, and (ii) is there a way of declaring start/end
> in the code such that at compile time the correct expression will be put in
> place where it's needed?
>
> I'm guessing some kind of template solution, but I didn't get it working (I
> probably need to study templates more:-).
>
> (... knocks head against wall to try and dislodge current micro-optimization
> obsession ...)

There is no telling, as bear said, compare the disassemblies(you can try it on dpaste and compare).

I do not believe D's code generation is optimal from what I have seen(at least for DMD). It could be cache misses due to pipelining issues(the order of instructions matter) or other weird stuff.

It could be that when you alias start and end, D ends up using a different algorithm that somehow changes the code generation.

I don't think D's code generation is even close to being as mature as many of the common C++ compilers.



July 31, 2013
On 07/31/2013 01:56 PM, John Colvin wrote:
> compiler/version/flags?

gdmd and ldmd2, both with -O -release -inline.

> The answer to what's happening might be obvious from the assembly code for the function, if you posted it.

Yes, I'll get onto that.  I'm completely inexperienced with assembly, but I think it's clear I have to learn ...

> Also, have you tried it on a different cpu?

No, but can and will.

July 31, 2013
Joseph Rushton Wakeling:

> I'm completely inexperienced with assembly, but I
> think it's clear I have to learn ...

Learning to write very efficient asm takes several months or a couple years or more. Learning to write working asm takes weeks or few months. Learning to read a little some asm takes hours or two days.

Bye,
bearophile
July 31, 2013
On Wed, Jul 31, 2013 at 04:54:40PM +0200, bearophile wrote:
> Joseph Rushton Wakeling:
> 
> >I'm completely inexperienced with assembly, but I
> >think it's clear I have to learn ...
> 
> Learning to write very efficient asm takes several months or a couple years or more. Learning to write working asm takes weeks or few months. Learning to read a little some asm takes hours or two days.
[...]

It's completely worth the effort though, IMO. I know many people will disagree with me, but Knuth puts it best:

	By understanding a machine-oriented language, the programmer
	will tend to use a much more efficient method; it is much closer
	to reality.
	-- D. Knuth

	People who are more than casually interested in computers should
	have at least some idea of what the underlying hardware is like.
	Otherwise the programs they write will be pretty weird.
	-- D. Knuth


T

-- 
My program has no bugs! Only unintentional features...
July 31, 2013
On Wednesday, 31 July 2013 at 14:58:24 UTC, H. S. Teoh wrote:
> 	People who are more than casually interested in computers should
> 	have at least some idea of what the underlying hardware is like.
> 	Otherwise the programs they write will be pretty weird.
> 	-- D. Knuth

Well, while I do agree in general, there is a huge difference between understanding how h/w executes machine code and casually reading assembly listings ;)
July 31, 2013
On 07/31/2013 04:56 PM, H. S. Teoh wrote:
> It's completely worth the effort though, IMO.

It will mean I can finally talk programming with my dad, who _only_ knows how to program in assembly (and still does so regularly:-)

« First   ‹ Prev
1 2