Thread overview
Virtual value types during compile-time for static type safety, static optimizations and function overloading.
Jul 17, 2015
Tamas
Jul 17, 2015
ZombineDev
Jul 17, 2015
ZombineDev
Jul 17, 2015
Tamas
Jul 18, 2015
Adam D. Ruppe
Jul 18, 2015
Tamas
Jul 18, 2015
Tamas
Jul 18, 2015
Adam D. Ruppe
Jul 18, 2015
Tamas
Jul 18, 2015
Baz
July 17, 2015
I got inspired by Andrei's "Generic Programming Must Go" talk:
https://www.youtube.com/watch?v=mCrVYYlFTrA

I.e. writing functions with static if-s, based on what we know about the input.
So the question is how to annotate the input variables?

Adam showed an excellent solution for this, by wrapping the value into a struct, and using alias: http://stackoverflow.com/questions/31442059/virtual-static-value-types

Unfortunately, it's not without runtime penalty. I have checked this idea with ints, against the generated asm.dlang.org code. Unfortunately the struct wrapper results in longer asm code. In theory such run-time penalty is not necessary, as everything is available compile-time.

To understand my use-case, here is an example "math library" that skips certain validations or computations altogether, whether some properties of the input can be inferred from how the value was produced:

import std.math : sqrt;
import std.conv;

struct Positive {
	int num;
	alias num this;
}

Positive abs(T)(T n) {
	static if (is(T == Positive)) {
		return n;
	}
	if (n >= 0) return Positive(n);
	return Positive(-n);
}

Positive isqrt(T)(T x) {
	static if (is(T == Positive)) {
		return Positive(sqrt(x.to!float).to!int);
	}
	if (x<0) {
		throw new Exception("no root for negative numbers");
	}
	return Positive(sqrt(x.to!float).to!int);
}

unittest {
	assert(abs(-4) == 4); // full abs runs
	assert(isqrt(4).abs() == 2); // full isqrt runs, abs returns n immediately.
	assert(abs(-4).isqrt() == 2); // full abs runs, isqrt return immediately, skipping validation;
}

Although this code is fully operational, presents nice api and compile-time optimizations, the extra Struct wrapper is not without runtime penalty.

Is there a solution that results the same static optimizations, but has no runtime penalty, i.e. the functions just operates with ints? (At least when compiled)

Thanks, Tamas

July 17, 2015
On Friday, 17 July 2015 at 21:20:41 UTC, Tamas wrote:
> Is there a solution that results the same static optimizations, but has no runtime penalty, i.e. the functions just operates with ints? (At least when compiled)

Did you try looking at assembly generated by GDC or LDC with full optimizations? For example GDC does quite better than DMD for the proposed SafeInt type:
https://github.com/D-Programming-Language/phobos/pull/3389#issuecomment-119005595
July 17, 2015
On Friday, 17 July 2015 at 23:15:31 UTC, ZombineDev wrote:
> On Friday, 17 July 2015 at 21:20:41 UTC, Tamas wrote:
>> Is there a solution that results the same static optimizations, but has no runtime penalty, i.e. the functions just operates with ints? (At least when compiled)
>
> Did you try looking at assembly generated by GDC or LDC with full optimizations? For example GDC does quite better than DMD for the proposed SafeInt type:
> https://github.com/D-Programming-Language/phobos/pull/3389#issuecomment-119005595

Also, see the table at the bottom of this comment:
https://github.com/D-Programming-Language/phobos/pull/3389#issuecomment-117450524
July 17, 2015
On Friday, 17 July 2015 at 23:16:51 UTC, ZombineDev wrote:
> On Friday, 17 July 2015 at 23:15:31 UTC, ZombineDev wrote:
>> On Friday, 17 July 2015 at 21:20:41 UTC, Tamas wrote:
>>> Is there a solution that results the same static optimizations, but has no runtime penalty, i.e. the functions just operates with ints? (At least when compiled)
>>
>> Did you try looking at assembly generated by GDC or LDC with full optimizations? For example GDC does quite better than DMD for the proposed SafeInt type:
>> https://github.com/D-Programming-Language/phobos/pull/3389#issuecomment-119005595
>
> Also, see the table at the bottom of this comment:
> https://github.com/D-Programming-Language/phobos/pull/3389#issuecomment-117450524

Thanks for the pointers! My first priority a  performant library, secondary is a nice api, 3rd is a nice implementation. So that kind of rules put Amy degradation compared to ints. I used DMD, BTW.

I see no reason for the necessity of performance degradation. Essentially I just want to assign a similar qualifyer like const or immutable. They are checked and used at compile time, but erased for runtime. Same here.
July 18, 2015
On Friday, 17 July 2015 at 21:20:41 UTC, Tamas wrote:
> Although this code is fully operational, presents nice api and compile-time optimizations, the extra Struct wrapper is not without runtime penalty.
>
> Is there a solution that results the same static optimizations, but has no runtime penalty, i.e. the functions just operates with ints? (At least when compiled)
>
> Thanks, Tamas

Hi, I don't see the runtime penalities you talk about.

I've defined these alias:

---
alias abs1 = abs!int;
alias abs2 = abs!Positive;
alias isqrt1 = isqrt!int;
alias isqrt2 = isqrt!Positive;
---

and the asm produced for the `Positive` type is clearly faster:

;------- abs1 -------
00402074h  enter 000Ch, 00h
00402078h  test eax, eax
0040207Ah  js 00402081h
0040207Ch  mov dword ptr [ebp-0Ch], eax
0040207Fh  leave
00402080h  ret
00402081h  neg eax
00402083h  mov dword ptr [ebp-08h], eax
00402086h  leave
00402087h  ret
;----------------------------

;------- abs2 (Positive)-------
00402088h  enter 0004h, 00h
0040208Ch  leave
0040208Dh  ret
;----------------------------

;------- isqrt1 -------
00402090h  enter 0008h, 00h
00402094h  test eax, eax
00402096h  jns 004020CFh
00402098h  mov ecx, 00454D60h
0040209Dh  push ecx
0040209Eh  call 00403410h
004020A3h  add esp, 04h
004020A6h  push dword ptr [004540A4h]
004020ACh  push dword ptr [004540A0h]
004020B2h  push dword ptr [004540E4h]
004020B8h  push dword ptr [004540E0h]
004020BEh  push 0000001Bh
004020C0h  push 00000000h
004020C2h  call 004035C0h
004020C7h  push eax
004020C8h  call 00403400h
004020CDh  jmp 004020E6h
004020CFh  call 0040247Ch
004020D4h  fsqrt
004020D6h  sub esp, 04h
004020D9h  fstp dword ptr [esp]
004020DCh  call 00402494h
004020E1h  mov dword ptr [ebp-08h], eax
004020E4h  leave
004020E5h  ret
004020E6h  leave
004020E7h  ret
;----------------------------

;------- isqrt2 (Positive)-------
004020E8h  enter 0008h, 00h
004020ECh  call 00402560h
004020F1h  fsqrt
004020F3h  sub esp, 04h
004020F6h  fstp dword ptr [esp]
004020F9h  call 00402494h
004020FEh  mov dword ptr [ebp-08h], eax
00402101h  leave
00402102h  ret
;----------------------------

(dmd win32 bit, -w -wi).

Also note that i've tweaked the code to get rid of a few warnings:
---
Positive abs(T)(T n) {
    static if (is(T == Positive)) {
        return n;
    }
    else
    {
        if (n >= 0) return Positive(n);
        else return Positive(-n);
    }
}

Positive isqrt(T)(T x) {
    static if (is(T == Positive)) {
        return Positive(sqrt(x.to!float).to!int);
    }
    else
    {
        if (x<0)
            throw new Exception("no root for negative numbers");
        else return Positive(sqrt(x.to!float).to!int);
    }

}
---
July 18, 2015
On Friday, 17 July 2015 at 23:44:25 UTC, Tamas wrote:
> I used DMD, BTW.

what compile flags?
July 18, 2015
I made a thorough comparison using multiple compilers and a summary of the findings. In short, there is a runtime overhead.

I reduced the code to cut out the imports and made two versions with equivalent semantic content.
positive0.d contains the hand written specializations of the abs function.
positive.d contains the solution with function templates / static type analysis.

///////

/* positive0.d:

Compile & execute:
$ dmd positive0.d; ./positive0; echo $?
$ ldc2 positive0.d; ./positive0; echo $?

generate ASM source:
$ dmd positive0.d; gobjdump -d positive0.o > positive0.dmd.s
$ ldc2 positive0.d -output-s

*/

int absPositive(int n) {
  return n;
}

int abs(int n) {
  return (n>=0) ? n : -n;
}

int square(int x) {
  return x * x;
}

int main() {
  return !((abs(-16) == 16)
    && (abs(3) == 3)
    && (square(5).abs == 25)
    && (square(-4).abs == 16));
}

///////

/* positive.d:

Compile & execute:
$ dmd positive.d; ./positive; echo $?
$ ldc2 positive.d; ./positive; echo $?

generate ASM source:
$ dmd positive.d; gobjdump -d positive.o > positive.dmd.s
$ ldc2 positive.d -output-s

*/
struct Positive {
  int num;
  alias num this;
}

Positive abs(T)(T n) {
  static if (is(T == Positive)) {
    return n;
  } else {
    return Positive((n >= 0) ? n : -n);
  }
}

Positive square(int x) {
  return Positive(x * x);
}

int main() {
  return !((abs(-16) == 16)
    && (abs(3) == 3)
    && (square(5).abs == 25)
    && (square(-4).abs == 16));
}

///////

I compared the generated asms. The asm code was substantially longer in case of non-hand written specializations of the abs function.

The 'optimized' versions of the abs function were equivalent, but the 'non-optimzed' versions shows the runtime overhead for dmd and ldc2 as well, a double 'mov' commands instead of a single ones;

The compiled hand written code was roughly half the size for both compilers:

File sizes:
ldc:
2678 positive0.s
4313 positive.s

dmd:
3442 positive0.dmd.s
8701 positive.dmd.s

You can see the abs functions below, and you can spot the double 'mov' operations:

positive.dmd.s:
0000000000000230 <_D8positive10__T3absTiZ3absFNaNbNiNfiZS8positive8Positive>:
 230:	55                   	push   %rbp
 231:	48 8b ec             	mov    %rsp,%rbp
 234:	48 83 ec 10          	sub    $0x10,%rsp
 238:	85 ff                	test   %edi,%edi
 23a:	78 02                	js     23e <_D8positive10__T3absTiZ3absFNaNbNiNfiZS8positive8Positive+0xe>
 23c:	eb 02                	jmp    240 <_D8positive10__T3absTiZ3absFNaNbNiNfiZS8positive8Positive+0x10>
 23e:	f7 df                	neg    %edi
 240:	89 7d f0             	mov    %edi,-0x10(%rbp)
 243:	48 89 f8             	mov    %rdi,%rax
 246:	c9                   	leaveq
 247:	c3                   	retq

0000000000000248 <_D8positive28__T3absTS8positive8PositiveZ3absFNaNbNiNfS8positive8PositiveZS8positive8Positive>:
 248:	55                   	push   %rbp
 249:	48 8b ec             	mov    %rsp,%rbp
 24c:	48 83 ec 10          	sub    $0x10,%rsp
 250:	48 89 f8             	mov    %rdi,%rax
 253:	c9                   	leaveq
 254:	c3                   	retq
 255:	0f 1f 00             	nopl   (%rax)



positive0.dmd.s:
00000000000000a0 <_D9positive011absPositiveFiZi>:
  a0:	55                   	push   %rbp
  a1:	48 8b ec             	mov    %rsp,%rbp
  a4:	48 83 ec 10          	sub    $0x10,%rsp
  a8:	48 89 f8             	mov    %rdi,%rax
  ab:	c9                   	leaveq
  ac:	c3                   	retq
  ad:	0f 1f 00             	nopl   (%rax)

00000000000000b0 <_D9positive03absFiZi>:
  b0:	55                   	push   %rbp
  b1:	48 8b ec             	mov    %rsp,%rbp
  b4:	48 83 ec 10          	sub    $0x10,%rsp
  b8:	85 ff                	test   %edi,%edi
  ba:	78 05                	js     c1 <_D9positive03absFiZi+0x11>
  bc:	48 89 f8             	mov    %rdi,%rax
  bf:	eb 05                	jmp    c6 <_D9positive03absFiZi+0x16>
  c1:	48 89 f8             	mov    %rdi,%rax
  c4:	f7 d8                	neg    %eax
  c6:	c9                   	leaveq
  c7:	c3                   	retq


ldc2:
positive.s:

__D8positive10__T3absTiZ3absFNaNbNiNfiZS8positive8Positive:
	.cfi_startproc
	movl	%edi, -4(%rsp)
	cmpl	$0, -4(%rsp)
	jl	LBB2_2
	leaq	-4(%rsp), %rax
	movq	%rax, -16(%rsp)
	jmp	LBB2_3
LBB2_2:
	leaq	-20(%rsp), %rax
	xorl	%ecx, %ecx
	subl	-4(%rsp), %ecx
	movl	%ecx, -20(%rsp)
	movq	%rax, -16(%rsp)
LBB2_3:
	movq	-16(%rsp), %rax
	movl	(%rax), %ecx
	movl	%ecx, -8(%rsp)
	movl	%ecx, %eax
	retq
	.cfi_endproc

	.globl	__D8positive28__T3absTS8positive8PositiveZ3absFNaNbNiNfS8positive8PositiveZS8positive8Positive
	.weak_definition	__D8positive28__T3absTS8positive8PositiveZ3absFNaNbNiNfS8positive8PositiveZS8positive8Positive
	.align	4, 0x90
__D8positive28__T3absTS8positive8PositiveZ3absFNaNbNiNfS8positive8PositiveZS8positive8Positive:
	.cfi_startproc
	movl	%edi, -8(%rsp)
	movl	%edi, %eax
	retq
	.cfi_endproc

	.section	__TEXT,__text,regular,pure_instructions
	.align	4, 0x90


positive0.s:
__D9positive011absPositiveFiZi:
	.cfi_startproc
	movl	%edi, -4(%rsp)
	movl	-4(%rsp), %eax
	retq
	.cfi_endproc

	.globl	__D9positive03absFiZi
	.align	4, 0x90
__D9positive03absFiZi:
	.cfi_startproc
	movl	%edi, -4(%rsp)
	cmpl	$0, -4(%rsp)
	jl	LBB1_2
	leaq	-4(%rsp), %rax
	movq	%rax, -16(%rsp)
	jmp	LBB1_3
LBB1_2:
	leaq	-20(%rsp), %rax
	xorl	%ecx, %ecx
	subl	-4(%rsp), %ecx
	movl	%ecx, -20(%rsp)
	movq	%rax, -16(%rsp)
LBB1_3:
	movq	-16(%rsp), %rax
	movl	(%rax), %eax
	retq
	.cfi_endproc

	.globl	__D9positive06squareFiZi
	.align	4, 0x90


my compilers:

$ ldc2 -version
LDC - the LLVM D compiler (6d3923):
  based on DMD v2.066.1 and LLVM 3.6.1
  Default target: x86_64-apple-darwin14.4.0
  Host CPU: core-avx2

$ dmd --version
DMD64 D Compiler v2.067


July 18, 2015
Sorry, the main function of positive0.d correctly looks like this:

int main() {
  return !((abs(-16) == 16)
    && (abs(3) == 3)
    && (square(5).absPositive == 25)
    && (square(-4).absPositive == 16));
}

But this does not affect the results, the asm file sizs or the asm abs function bodies.


July 18, 2015
On Saturday, 18 July 2015 at 10:06:07 UTC, Tamas wrote:
> Compile & execute:
> $ dmd positive0.d; ./positive0; echo $?
> $ ldc2 positive0.d; ./positive0; echo $?

Try adding the automatic optimize flags in all your cases. For dmd, `-O -inline`. Not sure about ldc but I think it is `-O` as well.



July 18, 2015
On Saturday, 18 July 2015 at 13:16:26 UTC, Adam D. Ruppe wrote:
> On Saturday, 18 July 2015 at 10:06:07 UTC, Tamas wrote:
>> Compile & execute:
>> $ dmd positive0.d; ./positive0; echo $?
>> $ ldc2 positive0.d; ./positive0; echo $?
>
> Try adding the automatic optimize flags in all your cases. For dmd, `-O -inline`. Not sure about ldc but I think it is `-O` as well.

Thanks, indeed, after -O -inline the bodies of the two abs functions are the same! :)

The asm code of the templated version is still longer overall, but I think it's only some garbage that is not really executed. (e.g some with assert and unittest in the name, although I have none such)

Soo thank you, it's really awesome! :)