Reducing the cost of autodecoding (page 3)

On 2016-10-13 03:26, Andrei Alexandrescu wrote:
> Yah, shouldn't go in object.d as it's fairly niche. On the other hand
> defining a new module for two functions seems excessive unless we have a
> good theme. On the third hand we may find an existing module that's
> topically close. Thoughts? -- Andrei

I think it should be a new module. I think core.intrinsics, as Stefan suggested, sounds like a good idea. I don't think having a module with only two functions is a problem, assuming we expect more of these functions. We already have that case with core.attribute [1], which only has _one_ attribute defined.

[1] https://github.com/dlang/druntime/blob/master/src/core/attribute.d#L54

-- 
/Jacob Carlborg

On Thursday, 13 October 2016 at 14:51:50 UTC, Kagamin wrote:
> On Wednesday, 12 October 2016 at 20:24:54 UTC, safety0ff wrote:
>> Code: http://pastebin.com/CFCpUftW
>
> Line 25 doesn't look trusted: reads past the end of an empty string.

Length is checked in the loop that calls this function.

In Phobos length is only checked with an assertion,

On Thursday, 13 October 2016 at 01:36:44 UTC, Andrei Alexandrescu wrote:
>
> Oh ok, so it's that checksum in particular that got optimized. Bad benchmark! Bad! -- Andrei

Also, I suspect a benchmark with a larger loop body might not benefit as significantly from branch hints as this one.
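To make "branch hints" concrete, here is a minimal sketch in C rather than D. It is not the benchmark from this thread: `count_code_points` and the `LIKELY` macro are hypothetical names chosen for illustration, and the loop assumes valid UTF-8 input. `__builtin_expect` is the GCC/Clang builtin; on other compilers the macro falls back to a plain test.

```c
#include <stddef.h>

/* Branch hint: tells GCC/Clang which outcome to lay out as the fall-through
   path. On other compilers the macro is just the condition itself. */
#if defined(__GNUC__)
#define LIKELY(x) __builtin_expect(!!(x), 1)
#else
#define LIKELY(x) (x)
#endif

/* Hypothetical kernel: counts code points in s[0..n), assuming the input is
   valid UTF-8. The hint marks the ASCII case as the hot path. */
static size_t count_code_points(const unsigned char *s, size_t n)
{
    size_t count = 0;
    size_t i = 0;
    while (i < n) {
        if (LIKELY(s[i] < 0x80))
            i += 1;             /* ASCII: one byte */
        else if (s[i] < 0xE0)
            i += 2;             /* lead byte 110xxxxx */
        else if (s[i] < 0xF0)
            i += 3;             /* lead byte 1110xxxx */
        else
            i += 4;             /* lead byte 11110xxx */
        count++;
    }
    return count;
}
```

With a body this small, the layout of the ASCII branch is a large share of the loop, which is the situation the remark above is about; a larger body would dilute that share.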

On Thursday, 13 October 2016 at 21:49:22 UTC, safety0ff wrote:
>> Bad benchmark! Bad! -- Andrei
>
> Also, I suspect a benchmark with a larger loop body might not benefit as significantly from branch hints as this one.

I disagree; in longer loops code compactness is as important as in small ones.

This is about the smallest inline version of decode I could come up with :

__gshared static immutable ubyte[] charWidthTab = [
            2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
            2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
            3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
            4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 1, 1
];

dchar myFront(ref char[] str) pure nothrow
{
    dchar c = cast(dchar) str[0];
    if ((c & 128))
    {
        if (c & 64)
            final switch(charWidthTab[c - 192])
        {
            case 2 :
                c |= ((str[1] & 0x80) >> 5);
            break;
            case 3 :
                c |= ((str[1] & 0x80) >> 4);
                c |= ((str[2] & 0x80) >> 10);
            break;
            case 4 :
                c |= ((str[1] & 0x80) >> 3);
                c |= ((str[2] & 0x80) >> 9);
                c |= ((str[3] & 0x80) >> 15);
            break;
            case 5,6,1 :
                goto Linvalid;
        }
        else
        Linvalid :
            c = dchar.init;

    }
    return c;
}

On Friday, 14 October 2016 at 20:47:39 UTC, Stefan Koch wrote:
> On Thursday, 13 October 2016 at 21:49:22 UTC, safety0ff wrote:
>>> Bad benchmark! Bad! -- Andrei
>>
>> Also, I suspect a benchmark with a larger loop body might not benefit as significantly from branch hints as this one.
>
> I disagree; in longer loops code compactness is as important as in small ones.
>
> This is about the smallest inline version of decode I could come up with :
>
> __gshared static immutable ubyte[] charWidthTab = [
>             2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
>             2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
>             3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
>             4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 1, 1
> ];
>
> dchar myFront(ref char[] str) pure nothrow
> {
>     dchar c = cast(dchar) str[0];
>     if ((c & 128))
>     {
>         if (c & 64)
>             final switch(charWidthTab[c - 192])
>         {
>             case 2 :
>                 c |= ((str[1] & 0x80) >> 5);
>             break;
>             case 3 :
>                 c |= ((str[1] & 0x80) >> 4);
>                 c |= ((str[2] & 0x80) >> 10);
>             break;
>             case 4 :
>                 c |= ((str[1] & 0x80) >> 3);
>                 c |= ((str[2] & 0x80) >> 9);
>                 c |= ((str[3] & 0x80) >> 15);
>             break;
>             case 5,6,1 :
>                 goto Linvalid;
>         }
>         else
>         Linvalid :
>             c = dchar.init;
>
>     }
>     return c;
> }

Disregard all that code.
It is horribly wrong!

This is more correct : (Though for some reason it does not pass the unittests)

dchar myFront(ref char[] str) pure
{
    dchar c = cast(dchar) str.ptr[0];
    if (c & 128)
    {
        if (c & 64)
        {
            auto l = charWidthTab.ptr[c - 192];
            if (str.length < l)
                goto Linvalid;

            final switch (l)
            {
            case 2:
                c = ((c & ~(64 | 128)) << 6);
                c |= (str.ptr[1] & ~0x80);
                break;
            case 3:
                c = ((c & ~(32 | 64 | 128)) << 12);
                c |= ((str.ptr[1] & ~0x80) << 6);
                c |= ((str.ptr[2] & ~0x80));
                break;
            case 4:
                c = ((c & ~(16 | 32 | 64 | 128)) << 18);
                c |= ((str.ptr[1] & ~0x80) << 12);
                c |= ((str.ptr[2] & ~0x80) << 6);
                c |= ((str.ptr[3] & ~0x80));
                break;
            case 5, 6, 1:
                goto Linvalid;
            }
        }
        else
    Linvalid : throw new Exception("yadayada");

    }
    return c;
}

October 15, 2016

Re: Reducing the cost of autodecoding

Posted by Patrick Schluter
in reply to Stefan Koch

On Saturday, 15 October 2016 at 00:50:08 UTC, Stefan Koch wrote:
> On Friday, 14 October 2016 at 20:47:39 UTC, Stefan Koch wrote:
>> On Thursday, 13 October 2016 at 21:49:22 UTC, safety0ff wrote:
>>>> Bad benchmark! Bad! -- Andrei
>>>
>>> Also, I suspect a benchmark with a larger loop body might not benefit as significantly from branch hints as this one.
>>
>> I disagree; in longer loops code compactness is as important as in small ones.
>>
>> This is about the smallest inline version of decode I could come up with :
>>
>> __gshared static immutable ubyte[] charWidthTab = [
>>             2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
>>             2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
>>             3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
>>             4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 1, 1
>> ];
>>
>> dchar myFront(ref char[] str) pure nothrow
>> {
>>     dchar c = cast(dchar) str[0];
>>     if ((c & 128))
>>     {
>>         if (c & 64)
>>         	final switch(charWidthTab[c - 192])
>>         {
>>             case 2 :
>>                 c |= ((str[1] & 0x80) >> 5);
>>             break;
>>             case 3 :
>>                c |= ((str[1] & 0x80) >> 4);
>>                c |= ((str[2] & 0x80) >> 10);
>>             break;
>>             case 4 :
>>                c |= ((str[1] & 0x80) >> 3);
>>                c |= ((str[2] & 0x80) >> 9);
>>                c |= ((str[3] & 0x80) >> 15);
>>             break;
>>             case 5,6,1 :
>>               goto Linvalid;
>>         }
>>         else
>>         Linvalid :
>>         	c = dchar.init;
>>
>>     }
>> 	return c;
>> }
>
> Disregard all that code.
> It is horribly wrong!
>
> This is more correct : (Though for some reason it does not pass the unittests)
>
> dchar myFront(ref char[] str) pure
> {
>     dchar c = cast(dchar) str.ptr[0];
>     if (c & 128)
>     {
>         if (c & 64)
>         {
>             auto l = charWidthTab.ptr[c - 192];
>             if (str.length < l)
>                 goto Linvalid;
>
>             final switch (l)
>             {
>             case 2:
>                 c = ((c & ~(64 | 128)) << 6);
>                 c |= (str.ptr[1] & ~0x80);
>                 break;
>             case 3:
>                 c = ((c & ~(32 | 64 | 128)) << 12);
>                 c |= ((str.ptr[1] & ~0x80) << 6);
>                 c |= ((str.ptr[2] & ~0x80));
>                 break;
>             case 4:
>                 c = ((c & ~(16 | 32 | 64 | 128)) << 18);
>                 c |= ((str.ptr[1] & ~0x80) << 12);
>                 c |= ((str.ptr[2] & ~0x80) << 6);
>                 c |= ((str.ptr[3] & ~0x80));
>                 break;
>             case 5, 6, 1:
>                 goto Linvalid;
>             }
>         }
>         else
>     Linvalid : throw new Exception("yadayada");
>
>     }
>     return c;
> }

Looks very verbose to me. I had found in the BSD codebase a very clever UTF-8 conversion function in C; maybe it can be used here. Sorry that I cannot take part in the testing, as I don't have a proper compilation environment here at home. Here is the routine I use at work (it's in C), posted here for inspiration.

DEFINE_INLINE uint_t xctomb(char *r, wchar_t wc)
{
  uint_t u8l = utf8len(wc);

  switch(u8l) {
    /* Note: code falls through cases! */
    case 4: r[3] = 0x80 | (wc & 0x3f); wc >>= 6; wc |= 0x10000;
    case 3: r[2] = 0x80 | (wc & 0x3f); wc >>= 6; wc |= 0x800;
    case 2: r[1] = 0x80 | (wc & 0x3f); wc >>= 6; wc |= 0xc0;
    case 1: r[0] = wc;
  }
  return u8l;
}

utf8len being

DEFINE_INLINE uint_t utf8len(wchar_t wc)
{
  if(wc < 0x80)
    return 1;
  else if(wc < 0x800)
    return 2;
  else
    if(wc < 0x10000)
      return 3;
    else
      return 4;
}


The code generated on SPARC with gcc 3.4.6 was really good. On x86_64 with gcc 5.1 it was also not bad. I have not tried many alternatives, as UTF-8 encoding is not a bottleneck in our project. There is also no check for lengths 5 and 6, as they are not possible on our system, but here it would have to be added. (The DEFINE_INLINE macro is either extern inline or inline, depending on some macro magic that is not important here.)
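The range check that is missing from the routine above could be sketched roughly like this. The names `utf8len_checked` and `xctomb_checked` are hypothetical, `uint32_t` stands in for `wchar_t`, and returning 0 for a rejected value is an assumed convention, not something from the original code. The limits themselves are the standard ones: UTF-8 stops at U+10FFFF and excludes the surrogate range U+D800..U+DFFF.

```c
#include <stdint.h>

/* Returns the UTF-8 length of cp, or 0 if cp cannot be encoded:
   surrogates (U+D800..U+DFFF) and values above U+10FFFF are rejected,
   which also rules out the old 5- and 6-byte forms. */
static unsigned utf8len_checked(uint32_t cp)
{
    if (cp < 0x80)
        return 1;
    if (cp < 0x800)
        return 2;
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;
    if (cp < 0x10000)
        return 3;
    if (cp <= 0x10FFFF)
        return 4;
    return 0;
}

/* Same fall-through scheme as xctomb: each case emits one continuation byte
   and ORs in a marker bit that ends up as the lead-byte prefix.
   Writes nothing and returns 0 when cp is rejected. */
static unsigned xctomb_checked(unsigned char *r, uint32_t cp)
{
    unsigned len = utf8len_checked(cp);

    switch (len) {
    case 4: r[3] = (unsigned char)(0x80 | (cp & 0x3f)); cp >>= 6; cp |= 0x10000; /* fall through */
    case 3: r[2] = (unsigned char)(0x80 | (cp & 0x3f)); cp >>= 6; cp |= 0x800;   /* fall through */
    case 2: r[1] = (unsigned char)(0x80 | (cp & 0x3f)); cp >>= 6; cp |= 0xc0;    /* fall through */
    case 1: r[0] = (unsigned char)cp;
    }
    return len;
}
```

For example, U+20AC takes the 3-byte path and comes out as the bytes E2 82 AC, while 0xD800 and 0x110000 leave the buffer untouched and return 0.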

Oooops, I should not post after drinking 2 glasses of Châteauneuf-du-Pape. That function does exactly the opposite of what popFront does: it converts from dchar to multibyte, not from multibyte to dchar as you did. Sorry for the inconvenience.
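For the direction actually under discussion (multibyte to dchar), a minimal C sketch could look like the following. It is neither the Phobos implementation nor the BSD routine mentioned earlier; the name `utf8_decode` and its "return bytes consumed, 0 on error" convention are assumptions made for illustration. Unlike the throwing D version quoted above, it reports errors through the return value.

```c
#include <stddef.h>
#include <stdint.h>

/* Decodes one code point from s[0..n). On success stores it in *out and
   returns the number of bytes consumed (1 to 4). Returns 0 on empty,
   truncated, or malformed input, leaving *out untouched. */
static size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *out)
{
    uint32_t c;
    uint32_t min;   /* smallest code point allowed for this length */
    size_t len;
    size_t i;

    if (n == 0)
        return 0;
    c = s[0];

    if (c < 0x80) {             /* ASCII fast path */
        *out = c;
        return 1;
    } else if (c < 0xC0) {
        return 0;               /* stray continuation byte */
    } else if (c < 0xE0) {
        len = 2; c &= 0x1F; min = 0x80;
    } else if (c < 0xF0) {
        len = 3; c &= 0x0F; min = 0x800;
    } else if (c < 0xF8) {
        len = 4; c &= 0x07; min = 0x10000;
    } else {
        return 0;               /* 5- and 6-byte lead bytes are invalid */
    }

    if (n < len)
        return 0;               /* sequence cut short */

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;           /* not a 10xxxxxx continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }

    /* Reject overlong encodings, surrogates, and values past U+10FFFF. */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;

    *out = c;
    return len;
}
```

A caller advancing through a buffer would add the return value to its index and treat 0 as the point where it has to decide between throwing, substituting a replacement character, or stopping.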
