Thread overview
Optimisation possibilities: current, future and enhancements
August 25, 2016
I'm wondering if there are more opportunities in D for increased optimization in compilers that have not been mined yet. I'm specifically interested in the possibilities of D over and above what is possible in C and C++ because of the characteristics of D or because of our freedom to change compared with the inertia in the C/C++ world.

A useful phrase I saw today: “declaration of intent given by the programmer to the compiler”.

As well as the opportunities that exist in D as it stands and as it is _used_ currently, I wonder what could be achieved by enhancing the language or its usage patterns with new restrictive semantic markings, implemented with varying degrees of disruptiveness (from zero, up to idiom bans or even minor grammar changes [gulp!]). New attributes or property markings that fall into this category have already been added during the history of D, I believe. I'm also thinking of things like pragmas, functions with magic names and so forth.

Examples from the wider world, for discussion, no guarantees they allow increased optimisation:

* In C, the “restrict” marker
* In C++, the mechanism that makes possible optimised move-constructors and a solution to temporary-object hell
* assert()’s possibilities, but only if it is native and not deleted too early in the compiler stack: guarantees of the truth of predicates and restriction of values to known ranges, just as compilers can exploit the prior truth of if-statements. The same goes for static assert, of course
* Contracts, invariants, pre- and postconditions’ many, many delicious possibilities. Ideally, they need to be published, at two extra levels: within the same module and globally so that callers even from other translation units who have only the prototype can have a richer spec to guide inlining with truly first-class optimisation
* Custom calling conventions
* Some C compilers have magic to allow the declaration of an ISR. Stretching the category of optimisation a bit perhaps, but certainly does aid optimisation in the widest sense, because you don't then have to have unnecessary extra function-calls just to bolt assembler to a routine in C
* Similarly, inter-language calling mechanisms in general
* GCC and LDC’s extended asm interfacing specs, constraints and other magic
* Non-returning-function marking, first in GCC and then in C itself (C11’s _Noreturn; C++11 has [[noreturn]])
* the GCC "__builtin_expect()" / "likely()" and "unlikely()" magic marker functions to aid branch-prediction, code layout, etc
* GCC’s “builtin_assume_aligned()” function
* The GCC -ffast-math switch allows (if I understand correctly) the compiler to assume there are no IEEE floating-point weirdnesses such as infinities, NaNs, denormals anywhere, or to assume that the programmer just doesn't care. What if there were a mechanism to give kind of control down to a much more fine-grained level? (Such as per-function/operator/block/expression/object/variable.)

Sorry it's such a paltry list; however, I'm sure discussion will expand it.

August 25, 2016
On Thursday, 25 August 2016 at 11:16:52 UTC, Cecil Ward wrote:
>

I long for the day we ditch signalling NaNs — they would surely prevent `-ffast-math` from being effective.

I have a couple more ideas, here's one of them:
- if a function is pure and called with constexpr parameters, the compiler could potentially execute that call in the CTFE engine (automatically), as part of the constant-folding phase I guess. Such a technique will hopefully one day be practical, once the CTFE engine's performance improves.
August 25, 2016
On Thursday, 25 August 2016 at 11:16:52 UTC, Cecil Ward wrote:
> * the GCC "__builtin_expect()"

Instead of adding individual micro-optimisation features like this, I'd be more interested in the potential for profile-guided optimisation (which *ideally* can make these micro-optimisation decisions automatically). Since DMD already has some framework in place to support code profiling, I suspect this is at least a feasible enhancement.

On the other hand, it might not be worth trying to play catch-up with existing PGO features in GCC/LLVM. If you're using PGO, you're probably already using these other backends for their more advanced static optimisers.
August 25, 2016
On Thursday, 25 August 2016 at 11:55:08 UTC, Cauterite wrote:
> On Thursday, 25 August 2016 at 11:16:52 UTC, Cecil Ward wrote:
>> * the GCC "__builtin_expect()"
>
> Instead of adding individual micro-optimisation features like this, I'd be more interested in the potential for profile-guided optimisation (which *ideally* can make these micro-optimisation decisions automatically). Since DMD already has some framework in place to support code profiling, I suspect this is at least a feasible enhancement.
>
> On the other hand, it might not be worth trying to play catch-up with existing PGO features in GCC/LLVM. If you're using PGO, you're probably already using these other backends for their more advanced static optimisers.

One killer reason for me to use D rather than C or C++ would be if it either has or could be enhanced to have greater code optimisation possibilities. LTO isn't relevant here because it's equally applicable to other languages (in GCC at any rate, as I understand it). Aside from its own properties, D might have an advantage over C because its spec development could possibly be more agile, well, compared with C _standards_ anyway. GCC has kept innovating with non-standard features, to its credit. I think it's desirable for D not to fall _behind_ GCC's non-standard powerful and ingenious tricks.
August 25, 2016
On Thursday, 25 August 2016 at 12:27:20 UTC, Cecil Ward wrote:
>

When I said GCC/LLVM I meant GDC(GNU D Compiler)/LDC(LLVM D Compiler). I might have caused some confusion there.
August 25, 2016
On Thursday, 25 August 2016 at 11:16:52 UTC, Cecil Ward wrote:
> * Non-return-function marking, first in GCC and then in C itself. (iirc. And in C++?)

Maybe it could be implemented as

int blah()
out(result)
{
   assert(0);
}
body
{

}

instead of marking the function itself.

August 25, 2016
On Thursday, 25 August 2016 at 11:16:52 UTC, Cecil Ward wrote:
> I'm wondering if there are more opportunities in D for increased optimization in compilers that have not been mined yet. I'm specifically interested in the possibilities of D over and above what is possible in C and C++ because of the characteristics of D or because of our freedom to change compared with the inertia in the C/C++ world.
>
> [...]
> Sorry it's such a paltry list. However discussion will I'm sure expand it.

I'll add

* create temporaries based on the const function attribute.

I don't know why, but I believed this was already the case. After disassembling a short test with DMD and LDMD2, it is clear that it is not:

°°°°°°°°°°°°°°°°°°°°°°°°°°
struct Foo
{
    immutable _u = 8;
    int foo() const
    {
        return 8 * _u;
    }
}
int use(ref const(Foo) foo)
{
    return foo.foo() + foo.foo();
}
°°°°°°°°°°°°°°°°°°°°°°°°°°

disasm of use (LDC2 via LDMD2, -O -release)

0000000000402930h  sub rsp, 18h
0000000000402934h  mov qword ptr [rsp+10h], rdi
0000000000402939h  call 00000000004028F0h ; (Foo.foo)
000000000040293Eh  mov rdi, qword ptr [rsp+10h]
0000000000402943h  mov dword ptr [rsp+0Ch], eax
0000000000402947h  call 00000000004028F0h ; (Foo.foo)
000000000040294Ch  mov ecx, dword ptr [rsp+0Ch]
0000000000402950h  add ecx, eax
0000000000402952h  mov eax, ecx
0000000000402954h  add rsp, 18h
0000000000402958h  ret

But Foo.foo's constness guarantees that Foo's state is not modified. So the result of the first CALL could be cached in a temporary and reused instead of making the second CALL. This would help, for example, in loops where a getter function is called to obtain the iteration count.

August 25, 2016
I found it hard to believe LDC generates such crappy code when optimizing. These are my results using LDC master on Win64 (`ldc2 -O -release -output-s`):

struct Foo
{
    immutable _u = 8;
    int foo() const
    {
        return 8 * _u;
    }
}
int use(ref const(Foo) foo)
{
    return foo.foo() + foo.foo();
}

int main()
{
    Foo f;
    return use(f);
}


_D7current3Foo3fooMxFZi:
	movl	(%rcx), %eax
	shll	$3, %eax
	retq

_D7current3useFKxS7current3FooZi:
	movl	(%rcx), %eax
	shll	$4, %eax
	retq

_Dmain:
	movl	$128, %eax
	retq

Sure, Foo.foo() and use() could return a constant, but otherwise it can't get much better than this.
August 25, 2016
On Thursday, 25 August 2016 at 14:42:28 UTC, Basile B. wrote:
> On Thursday, 25 August 2016 at 11:16:52 UTC, Cecil Ward wrote:
>> [...]
>
> I'll add
>
> * create temporaries based on the const function attribute.
>
> I don't know why but I believed that it was already the case. After disassembling a short test with DMD and LDMD2 it appears clearly that this is not true:
>
> °°°°°°°°°°°°°°°°°°°°°°°°°°
> struct Foo
> {
>     immutable _u = 8;
>     int foo() const
>     {
>         return 8 * _u;
>     }
> }
> int use(ref const(Foo) foo)
> {
>     return foo.foo() + foo.foo();
> }
> °°°°°°°°°°°°°°°°°°°°°°°°°°
>
> disasm of use (LDC2 via LDMD2, -O -release)
>
> 0000000000402930h  sub rsp, 18h
> 0000000000402934h  mov qword ptr [rsp+10h], rdi
> 0000000000402939h  call 00000000004028F0h ; (Foo.foo)
> 000000000040293Eh  mov rdi, qword ptr [rsp+10h]
> 0000000000402943h  mov dword ptr [rsp+0Ch], eax
> 0000000000402947h  call 00000000004028F0h ; (Foo.foo)
> 000000000040294Ch  mov ecx, dword ptr [rsp+0Ch]
> 0000000000402950h  add ecx, eax
> 0000000000402952h  mov eax, ecx
> 0000000000402954h  add rsp, 18h
> 0000000000402958h  ret
>
> But Foo.foo constness guarantees that Foo state is not modified. So the result of the first CALL could be cached in a temporary and reused instead of the second CALL. This would help for example in loops when a getter function is called to know the iteration count.

The problem of not caching the results of appropriate function calls is not confined to methods. It is also observable when calling explicitly pure-marked external functions: e.g. my_pure() + my_pure() generates two calls. (Checked in GCC -O3, with an extern pure-marked function.)

This is often masked by inlining with full expansion, which is why non-extern functions don't show the problem.
August 25, 2016
On Thursday, 25 August 2016 at 17:22:27 UTC, kinke wrote:
> I found it hard to believe LDC generates such crappy code when optimizing. These are my results using LDC master on Win64 (`ldc2 -O -release -output-s`):
>
> struct Foo
> {
>     immutable _u = 8;
>     int foo() const
>     {
>         return 8 * _u;
>     }
> }
> int use(ref const(Foo) foo)
> {
>     return foo.foo() + foo.foo();
> }
>
> int main()
> {
>     Foo f;
>     return use(f);
> }
>
>
> _D7current3Foo3fooMxFZi:
> 	movl	(%rcx), %eax
> 	shll	$3, %eax
> 	retq
>
> _D7current3useFKxS7current3FooZi:
> 	movl	(%rcx), %eax
> 	shll	$4, %eax
> 	retq
>
> _Dmain:
> 	movl	$128, %eax
> 	retq
>
> Sure, Foo.foo() and use() could return a constant, but otherwise it can't get much better than this.

I think the optimisation here happens only because LDC can “see” the body of the method. The real test would be a case where expansion is not possible.
