July 23, 2012
Walter Bright:

> dmd currently doesn't do that, but that's not the language's fault, it's a quality of implementation issue.

I understand the difference between what compilers are able to do (today or in future), and what the language specs allow compiler writers to do. So in theory I agree.

In practice there is also the well known fallacy of the "sufficiently smart compiler":
http://c2.com/cgi/wiki?SufficientlySmartCompiler

This fallacy implies that if you want to actually see a compiler able to perform a certain optimization, such optimization must be rather "easy", this means it must be easy for the compiler to infer as true all the conditions necessary to apply that optimization (and then you need someone to actually implement it, in a community as small as the D one optimizations can't be top priority).

The other problem with optimizations is that often if you can't rely on them, that means you can't be certain they are used in the code you are writing, then it's like they don't exist. A good example of this is the Scheme standard requiring all Scheme compilers to implement the tail call optimization.


> Since the final switch does not allow a 'default' case, the check can be omitted, and the generated code is a simple index-jump, just like the computed goto example.

You have seen the asm I have shown in the next post. You see those "jmp *%ecx" at the end of each case. Computed gotos in this case are not just a single index-jump, there is an index-jump at the end of each case. Is your future hypothetical D compiler able to do that?

Bye,
bearophile
July 23, 2012
On 7/23/2012 3:23 PM, bearophile wrote:
> This fallacy implies that if you want to actually see a compiler able to perform
> a certain optimization, such optimization must be rather "easy", this means it
> must be easy for the compiler to infer as true all the conditions necessary to
> apply that optimization (and then you need someone to actually implement it, in
> a community as small as the D one optimizations can't be top priority).

It is easy. Note that the compiler already generates a jump table, this would just leave off the default check. In fact, the compiler could do a better job doing data flow analysis with final switch than computed goto, because the jump targets are constrained and are all known at compile time.

BTW, if computed gotos were put into the language, one would also require someone to implement it. You're not saving anything.


> The other problem with optimizations is that often if you can't rely on them,
> that means you can't be certain they are used in the code you are writing, then
> it's like they don't exist. A good example of this is the Scheme standard
> requiring all Scheme compilers to implement the tail call optimization.

Sorry, but I cannot buy this as an argument for loading in more language features. Even worse, requiring that a certain semantic construct be implemented in a certain way precludes the implementer from finding an even better way to do it.


> You see those "jmp *%ecx"
> at the end of each case. Computed gotos in this case are not just a single
> index-jump, there is an index-jump at the end of each case. Is your future
> hypothetical D compiler able to do that?

Of course. There's no magic technology there. It's just a minor variation on the well known loop rotation technique (which dmd does do). Computed gotos are a rather hackish design that goes around the horn rather than doing what is needed directly.


July 24, 2012
Walter Bright:

> Of course. There's no magic technology there.

Thank you for your answers Walter :-)

Bye,
bearophile
July 24, 2012
On 7/23/2012 5:05 PM, bearophile wrote:
> Walter Bright:
>
>> Of course. There's no magic technology there.
>
> Thank you for your answers Walter :-)

You're welcome.


July 24, 2012
On 24-Jul-12 02:06, Walter Bright wrote:
> On 7/23/2012 2:25 PM, bearophile wrote:
>> Walter Bright:
>>
>>> D already has it: http://dlang.org/statement.html#FinalSwitchStatement
>>
>> Do you have a proof? (Show me asm code)
>
> Since the final switch does not allow a 'default' case, the check can be
> omitted, and the generated code is a simple index-jump, just like the
> computed goto example.
>

Bounds check is actually not that important.

> dmd currently doesn't do that, but that's not the language's fault, it's
> a quality of implementation issue.
>
> The point is, computed goto offers nothing that final switch doesn't.
>


Sorry, but it's just wrong. The trick is that there is no need for jump table at all. That saves one memory access to read jump table entry and saves on cache (need only one "table" - bytecode itself).

Now to the heart of it all - threaded interpreter looks like this (I'll use switches for clarity):

switch(opcode){
case OP1:
	... //do instruction 1
	//+ decode next
	opcode = code[pc++];
	switch(opcode){
	case OP1:
		// jump to case OP1 above
	... etc.
	}
//no break as it will jump away
case OP2:
	... //do instruction 2
	//+ decode next
	opcode = code[pc++];
	switch(opcode){
	case OP1:
		// jump to case OP1 above e.g. by planting label on it
	... etc.
	}
....
}

Now if I use final switches there is still:

A) jump table per switch, or maybe less but there is no guarantees
 (= depend on another optimization that's not here)
B) it's an ugly and a roundabout way to do this

However I think that always requiring tail call optimization or providing syntax to enforce it would work:

void op_1()
{
	...//some code for instruction 1
	opcode = cast(function void ())code[pc++];
	goto opcode(); //opcode is pointer to function op_xx
}
//can do without goto fn() iff tail call is GUARANTEED

-- 
Dmitry Olshansky
July 24, 2012
On 7/24/2012 12:58 AM, Dmitry Olshansky wrote:
> Now if I use final switches there is still:
>
> A) jump table per switch, or maybe less but there is no guarantees
>   (= depend on another optimization that's not here)
> B) it's an ugly and a roundabout way to do this
>
> However I think that always requiring tail call optimization or providing syntax
> to enforce it would work:
>
> void op_1()
> {
>      ...//some code for instruction 1
>      opcode = cast(function void ())code[pc++];
>      goto opcode(); //opcode is pointer to function op_xx
> }
> //can do without goto fn() iff tail call is GUARANTEED

I believe you can do this with:

    switch (pc++)

and there are the same number of indirections.


July 24, 2012
Le 23/07/2012 18:56, bearophile a écrit :
> deadalnix:
>
>> The presented example is insanely unsafe. By giving invalid input, you
>> can basically branch randomly.
>>
>> The check added by the switch is what is required to make it safe, so
>> it isn't faster at the end.
>>
>> The case in the article isn't very strong.
>
> Thank you for your answer. We have not yet designed how D computed gotos
> are, both in syntax and semantics. This means that maybe there are ways
> to make them a little safer, in non-release mode.
>
> Part of the GCC-C code in the article was:
>
> static void* dispatch_table[] = {
> &&do_halt, &&do_inc, &&do_dec, &&do_mul2,
> &&do_div2, &&do_add7, &&do_neg};
> #define DISPATCH() goto *dispatch_table[code[pc++]]
>
> In D there are no preprocessor macros and the array bounds are enforced
> at run-time, this avoids some possible bugs of similar code.
>
> Also, instead of void* maybe a different type can be introduced, so only
> label addresses of the current function can be put in dispatch_table and
> pointer variables given to goto. This statically avoids some other
> possible bugs.
>
> It's true that in the end computed gotos are unsafe, you generally need
> to validate the bytecode/state transition before feeding them to the
> computed goto, and in D (unlike GCC-C) you are able to forbid the usage
> of computed gotos in @safe functions/methods. This doesn't avoid bugs,
> but helps contain them in delimited places. We have the
> @safe/@trusted/@system for a purpose.
>
> In D modules like std.reflection suggested by Andrei are useful, because
> D is usable to write lot of high-level code too, I use a highly
> functional D style in some little D "scripts". But D is also a system
> language, and sometimes I desire to use it for tasks where C is used.
> GCC has computed gotos since many years (and Clang supports them) and as
> you see in the linked article some very common language interpreters you
> see around use computed gotos if the C compiler supports them. This is
> not a must-have feature for D, but dismissing it just because it's not
> safe is a bad idea.
>
> Bye,
> bearophile

That would make sense.
July 24, 2012
On 24-Jul-12 14:04, Walter Bright wrote:
> On 7/24/2012 12:58 AM, Dmitry Olshansky wrote:
>> Now if I use final switches there is still:
>>
>> A) jump table per switch, or maybe less but there is no guarantees
>>   (= depend on another optimization that's not here)
>> B) it's an ugly and a roundabout way to do this
>>
>> However I think that always requiring tail call optimization or
>> providing syntax
>> to enforce it would work:
>>
>> void op_1()
>> {
>>      ...//some code for instruction 1
>>      opcode = cast(function void ())code[pc++];
>>      goto opcode(); //opcode is pointer to function op_xx
>> }
>> //can do without goto fn() iff tail call is GUARANTEED
>
> I believe you can do this with:
>
>      switch (pc++)
>
> and there are the same number of indirections.

And how is pc is supposed to be an opcode? It's a counter after all...

The trick is that it must be switch(code[pc++])...
It's just if code contains function pointers (or goto jump pointers) then separate jump table table is not needed:

code = [ &op_1, &op_2, &op_1, ... ]; //generated somewhere
pc = 0;
code[pc]();
// op_1 increments ( or changes somehow) pc, decodes next op and jumps to it


So I still of opinion that enforced tail call is the cleanest way to support this idiom.

-- 
Dmitry Olshansky
July 24, 2012
On 7/24/2012 6:58 AM, Dmitry Olshansky wrote:
>>> void op_1()
>>> {
>>>      ...//some code for instruction 1
>>>      opcode = cast(function void ())code[pc++];
>>>      goto opcode(); //opcode is pointer to function op_xx
>>> }
>>> //can do without goto fn() iff tail call is GUARANTEED
>>
>> I believe you can do this with:
>>
>>      switch (pc++)
>>
>> and there are the same number of indirections.
>
> And how is pc is supposed to be an opcode? It's a counter after all...

    switch (pc++)

and
    goto code[pc++]

are the same (same as in same number of indirections).

    switch (code[pc++])

and

    goto code[pc++]()

are the same, too.
July 24, 2012
On 24-Jul-12 21:03, Walter Bright wrote:
> On 7/24/2012 6:58 AM, Dmitry Olshansky wrote:
>>>> void op_1()
>>>> {
>>>>      ...//some code for instruction 1
>>>>      opcode = cast(function void ())code[pc++];
>>>>      goto opcode(); //opcode is pointer to function op_xx
>>>> }
>>>> //can do without goto fn() iff tail call is GUARANTEED
>>>
>>> I believe you can do this with:
>>>
>>>      switch (pc++)
>>>
>>> and there are the same number of indirections.
>>
>> And how is pc is supposed to be an opcode? It's a counter after all...
>
>      switch (pc++)
>
> and
>      goto code[pc++]
>
> are the same (same as in same number of indirections).
>
>      switch (code[pc++])
>
> and
>
>      goto code[pc++]()
>
> are the same, too.

It's not. Let's get to assembly then. It's an illustration as I'm no expert and may have made some illegal shortcuts in this listing.

goto code[pc++]() is roughly:

mov ecx, [ebx]
inc ebx
jmp [ecx]

switch(code[pc++]) is:

mov ecx, [ebx]
inc ebx
mov ecx, [edx+ecx] // assuming jump table is at edx
jump [ecx]

If you count only jumps, then yes, the same number of indirect jumps. BUT note the use of extra register to point to the table & extra read of jump table contents. (BTW I assumed jump table address is loaded in register, a luxurious assumption esp. on 32bit).

Again, the biggest practical limitation of switches (loosing some performance hurts but not show stopper) is that last time I checked dmd doesn't try to merge equivalent jump tables.

Thus I can't put VM dispatch switch at the of each branch of main opcode switch (see my earlier posts) to help branch predictor. It just spawns ton of new tables, of course it has lower performance and wastes data cache.

-- 
Dmitry Olshansky