March 24, 2014
On 2014-03-23 21:22:58 +0000, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:

> Here's a baseline: http://goo.gl/91vIGc. Destroy!

Optimizing for smallest assembly size:

dchar front(char[] s)
{
 size_t bytesize;
 dchar result;
 switch (s[0])
 {
   case 0b00000000: .. case 0b01111111:
   	return s[0];
   case 0b11000000: .. case 0b11011111:
   	return ((s[0] & 0b00011111) << 6) | (s[1] & 0b00011111);
   case 0b11100000: .. case 0b11101111:
   	result = s[0] & 0b00001111;
   	bytesize = 3;
   	break;
   case 0b11110000: .. case 0b11110111:
   	result = s[0] & 0b00000111;
   	bytesize = 4;
   default:
      return dchar.init;
 }
 foreach (i; 1..bytesize)
     result = (result << 6) | (s[i] & 0b00111111);
 return result;
}

-- 
Michel Fortin
michel.fortin@michelf.ca
http://michelf.ca

March 24, 2014
On 2014-03-23 22:58:32 +0000, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:

> Array bounds checking takes care of that.

If you want the smallest assembly size with array bound checking, make the function @safe and see the disaster it causes to the assembly size. That's what you have to optimize. If you're going to optimize while looking at the assembly, better check for bounds manually:

dchar front(char[] s)
{
 if (s.length < 1) return dchar.init;
 size_t bytesize;
 dchar result;
 switch (s[0])
 {
   case 0b00000000: .. case 0b01111111:
   	return s[0];
   case 0b11000000: .. case 0b11011111:
   	result = s[0] & 0b00011111;
   	bytesize = 2;
   	break;
   case 0b11100000: .. case 0b11101111:
   	result = s[0] & 0b00001111;
   	bytesize = 3;
   	break;
   case 0b11110000: .. case 0b11110111:
   	result = s[0] & 0b00000111;
   	bytesize = 4;
   default:
      return dchar.init;
 }
 if (s.length < bytesize) return dchar.init;
 foreach (i; 1..bytesize)
     result = (result << 6) | (s[i] & 0b00111111);
 return result;
}



-- 
Michel Fortin
michel.fortin@michelf.ca
http://michelf.ca

March 24, 2014
On 3/23/14, 6:53 PM, Michel Fortin wrote:
> On 2014-03-23 21:22:58 +0000, Andrei Alexandrescu
> <SeeWebsiteForEmail@erdani.org> said:
>
>> Here's a baseline: http://goo.gl/91vIGc. Destroy!
>
> Optimizing for smallest assembly size:
>
> dchar front(char[] s)
> {
>   size_t bytesize;
>   dchar result;
>   switch (s[0])
>   {
>     case 0b00000000: .. case 0b01111111:
>         return s[0];
>     case 0b11000000: .. case 0b11011111:
>         return ((s[0] & 0b00011111) << 6) | (s[1] & 0b00011111);
>     case 0b11100000: .. case 0b11101111:
>         result = s[0] & 0b00001111;
>         bytesize = 3;
>         break;
>     case 0b11110000: .. case 0b11110111:
>         result = s[0] & 0b00000111;
>         bytesize = 4;
>     default:
>        return dchar.init;
>   }
>   foreach (i; 1..bytesize)
>       result = (result << 6) | (s[i] & 0b00111111);
>   return result;
> }
>

Nice, thanks! I'd hope for a short path for the ASCII subset, could you achieve that?

Andrei
March 24, 2014
On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:
> Here's a baseline: http://goo.gl/91vIGc. Destroy!
>
> Andrei

Here's my attempt: http://goo.gl/alJ9JS
I interpreted the challenge slightly differently, I went for fewer lines of code and fewer branches.

I verified the output for all valid inputs.
March 24, 2014
On 3/23/14, 8:36 PM, safety0ff wrote:
> On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:
>> Here's a baseline: http://goo.gl/91vIGc. Destroy!
>>
>> Andrei
>
> Here's my attempt: http://goo.gl/alJ9JS
> I interpreted the challenge slightly differently, I went for fewer lines
> of code and fewer branches.
>
> I verified the output for all valid inputs.

Very interesting! -- Andrei
March 24, 2014
On 2014-03-24 02:25:17 +0000, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:

> On 3/23/14, 6:53 PM, Michel Fortin wrote:
>> On 2014-03-23 21:22:58 +0000, Andrei Alexandrescu
>> <SeeWebsiteForEmail@erdani.org> said:
>> 
>>> Here's a baseline: http://goo.gl/91vIGc. Destroy!
>> 
>> Optimizing for smallest assembly size:
>> 
>> dchar front(char[] s)
>> {
>>   size_t bytesize;
>>   dchar result;
>>   switch (s[0])
>>   {
>>     case 0b00000000: .. case 0b01111111:
>>         return s[0];
>>     case 0b11000000: .. case 0b11011111:
>>         return ((s[0] & 0b00011111) << 6) | (s[1] & 0b00011111);
>>     case 0b11100000: .. case 0b11101111:
>>         result = s[0] & 0b00001111;
>>         bytesize = 3;
>>         break;
>>     case 0b11110000: .. case 0b11110111:
>>         result = s[0] & 0b00000111;
>>         bytesize = 4;
>>     default:
>>        return dchar.init;
>>   }
>>   foreach (i; 1..bytesize)
>>       result = (result << 6) | (s[i] & 0b00111111);
>>   return result;
>> }
> 
> Nice, thanks! I'd hope for a short path for the ASCII subset, could you achieve that?

Unfortunately, there's a bug in the above. A missing "break" results in a fallthrough to default case. That's why the optimizer is so good, it just omits the four-byte branch entirely. I noticed something was wrong by looking at the assembly. I really wish D had no implicit fallthrough.

But try this instead, the result is even shorter:

dchar front(char[] s)
{
 if (s[0] < 0b1000000)
   return s[0]; // ASCII

 // pattern     indicator  tailLength
 // 0b100xxxxx  0b00 (0)   1
 // 0b101xxxxx  0b01 (1)   1 == indicator
 // 0b110xxxxx  0b10 (2)   2 == indicator
 // 0b111xxxxx  0b11 (3)   3 == indicator
 // note: undefined result for illegal 0b1111xxxx case

 auto indicator = (s[0] >> 5) & 0b11;
 auto tailLength = indicator ? indicator : 1;

 dchar result = s[0] & (0b00111111 >> tailLength);
 foreach (i; 0..tailLength)
     result = (result << 6) | (s[1+i] & 0b00111111);
 return result;
}

(Disclaimer: not tested, but I did check that all the expected code paths are present in the assembly this time.)

-- 
Michel Fortin
michel.fortin@michelf.ca
http://michelf.ca

March 24, 2014
Michel Fortin:

> I really wish D had no implicit fallthrough.

Isn't your wish already granted?

Bye,
bearophile
March 24, 2014
On 2014-03-24 04:39:08 +0000, "bearophile" <bearophileHUGS@lycos.com> said:

> Michel Fortin:
> 
>> I really wish D had no implicit fallthrough.
> 
> Isn't your wish already granted?

Maybe. I don't have a D compiler installed at the moment to check. I'm just playing with d.godbolt.org and it accepts implicit fallthrough.

-- 
Michel Fortin
michel.fortin@michelf.ca
http://michelf.ca

March 24, 2014
On 3/23/14, 9:37 PM, Michel Fortin wrote:
> On 2014-03-24 02:25:17 +0000, Andrei Alexandrescu
> <SeeWebsiteForEmail@erdani.org> said:
>
>> On 3/23/14, 6:53 PM, Michel Fortin wrote:
>>> On 2014-03-23 21:22:58 +0000, Andrei Alexandrescu
>>> <SeeWebsiteForEmail@erdani.org> said:
>>>
>>>> Here's a baseline: http://goo.gl/91vIGc. Destroy!
>>>
>>> Optimizing for smallest assembly size:
>>>
>>> dchar front(char[] s)
>>> {
>>>   size_t bytesize;
>>>   dchar result;
>>>   switch (s[0])
>>>   {
>>>     case 0b00000000: .. case 0b01111111:
>>>         return s[0];
>>>     case 0b11000000: .. case 0b11011111:
>>>         return ((s[0] & 0b00011111) << 6) | (s[1] & 0b00011111);
>>>     case 0b11100000: .. case 0b11101111:
>>>         result = s[0] & 0b00001111;
>>>         bytesize = 3;
>>>         break;
>>>     case 0b11110000: .. case 0b11110111:
>>>         result = s[0] & 0b00000111;
>>>         bytesize = 4;
>>>     default:
>>>        return dchar.init;
>>>   }
>>>   foreach (i; 1..bytesize)
>>>       result = (result << 6) | (s[i] & 0b00111111);
>>>   return result;
>>> }
>>
>> Nice, thanks! I'd hope for a short path for the ASCII subset, could
>> you achieve that?
>
> Unfortunately, there's a bug in the above. A missing "break" results in
> a fallthrough to default case. That's why the optimizer is so good, it
> just omits the four-byte branch entirely. I noticed something was wrong
> by looking at the assembly. I really wish D had no implicit fallthrough.

-w

Probably time to move that from a warning to an error. In the short time we've used D at Facebook (forgot to add "-w" to the implicit flags) we've already have not one, but two bugs related to switch's implicit fallthrough.

> But try this instead, the result is even shorter:
>
> dchar front(char[] s)
> {
>   if (s[0] < 0b1000000)
>     return s[0]; // ASCII
>
>   // pattern     indicator  tailLength
>   // 0b100xxxxx  0b00 (0)   1
>   // 0b101xxxxx  0b01 (1)   1 == indicator
>   // 0b110xxxxx  0b10 (2)   2 == indicator
>   // 0b111xxxxx  0b11 (3)   3 == indicator
>   // note: undefined result for illegal 0b1111xxxx case
>
>   auto indicator = (s[0] >> 5) & 0b11;
>   auto tailLength = indicator ? indicator : 1;
>
>   dchar result = s[0] & (0b00111111 >> tailLength);
>   foreach (i; 0..tailLength)
>       result = (result << 6) | (s[1+i] & 0b00111111);
>   return result;
> }
>
> (Disclaimer: not tested, but I did check that all the expected code
> paths are present in the assembly this time.)

Nicely done. I collected what we have at http://goo.gl/W9V6E7.


Andrei

March 24, 2014
On 2014-03-24 04:37:22 +0000, Michel Fortin <michel.fortin@michelf.ca> said:

> But try this instead, the result is even shorter:

Oops, messed up my patterns. Here's a hopefully fixed front():

dchar front(char[] s)
{
 if (s[0] < 0b1000000)
   return s[0]; // ASCII

 // pattern     indicator  tailLength
 // 0b1100xxxx  0b00 (0)   1
 // 0b1101xxxx  0b01 (1)   1 == indicator
 // 0b1110xxxx  0b10 (2)   2 == indicator
 // 0b1111xxxx  0b11 (3)   3 == indicator
 // note: undefined result for illegal 0b11111xxx case

 auto indicator = (s[0] >> 4) & 0b11;
 auto tailLength = indicator ? indicator : 1;

 dchar result = s[0] & (0b00111111 >> tailLength);
 foreach (i; 0..tailLength)
     result = (result << 6) | (s[1+i] & 0b00111111);
 return result;
}

-- 
Michel Fortin
michel.fortin@michelf.ca
http://michelf.ca