October 13, 2016
On 2016-10-13 03:26, Andrei Alexandrescu wrote:

> Yah, shouldn't go in object.d as it's fairly niche. On the other hand
> defining a new module for two functions seems excessive unless we have a
> good theme. On the third hand we may find an existing module that's
> topically close. Thoughts? -- Andrei

I think it should be a new module. I think core.intrinsics, as Stefan suggested, sounds like a good idea. I don't think having a module with only two functions is a problem, assuming we expect more of these functions.

We already have that case with core.attribute [1], which only has _one_ attribute defined.

[1] https://github.com/dlang/druntime/blob/master/src/core/attribute.d#L54
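
For illustration only, a minimal sketch of what two branch-hint functions in such a module might look like. The names `likely` and `unlikely` are assumptions made for this example, not an existing druntime API, and the LDC branch assumes `llvm_expect` from `ldc.intrinsics` accepts `bool`. On other compilers the hint is simply a no-op.

```d
// Hypothetical helpers: `likely`/`unlikely` are illustrative names only.
version (LDC)
{
    import ldc.intrinsics : llvm_expect;

    // Tell the optimizer which outcome of the condition is expected.
    bool likely(bool cond) { return llvm_expect(cond, true); }
    bool unlikely(bool cond) { return llvm_expect(cond, false); }
}
else
{
    // No hint available: pass the condition through unchanged.
    bool likely(bool cond) { return cond; }
    bool unlikely(bool cond) { return cond; }
}

// Example use: the ASCII path is marked as the expected one.
size_t countAscii(const(char)[] s)
{
    size_t n;
    foreach (char c; s)
    {
        if (likely(c < 0x80))
            ++n;
    }
    return n;
}
```

The hint never changes the result of the condition; it only suggests how the compiler might lay out the branches.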

-- 
/Jacob Carlborg
October 13, 2016
On Wednesday, 12 October 2016 at 20:24:54 UTC, safety0ff wrote:
> Code: http://pastebin.com/CFCpUftW

Line 25 doesn't look trusted: reads past the end of an empty string.
October 13, 2016
On Thursday, 13 October 2016 at 14:51:50 UTC, Kagamin wrote:
> On Wednesday, 12 October 2016 at 20:24:54 UTC, safety0ff wrote:
>> Code: http://pastebin.com/CFCpUftW
>
> Line 25 doesn't look trusted: reads past the end of an empty string.

Length is checked in the loop that calls this function.

In Phobos, length is only checked with an assertion.
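
A rough sketch of the pattern being described, under the assumption that the caller's loop condition is the real length check and the callee only asserts. This is not the pastebin code; `firstUnit` and `countLeadUnits` are hypothetical names.

```d
// Hypothetical names; illustrates "caller checks length, callee only asserts".
char firstUnit(const(char)[] s) @trusted
{
    // This guard disappears in -release builds, so the caller
    // must guarantee a non-empty slice.
    assert(s.length > 0, "firstUnit called on an empty slice");
    return s.ptr[0]; // unchecked read
}

size_t countLeadUnits(const(char)[] s)
{
    size_t n;
    // The loop condition is the actual length check.
    while (s.length > 0)
    {
        if ((firstUnit(s) & 0xC0) != 0x80) // skip continuation bytes
            ++n;
        s = s[1 .. $];
    }
    return n;
}
```

Whether `@trusted` is justified then depends on every caller upholding the non-empty precondition, which is the concern raised above.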
October 13, 2016
On Thursday, 13 October 2016 at 01:36:44 UTC, Andrei Alexandrescu wrote:
>
> Oh ok, so it's that checksum in particular that got optimized. Bad benchmark! Bad! -- Andrei

Also, I suspect a benchmark with a larger loop body might not benefit as significantly from branch hints as this one.
October 14, 2016
On Thursday, 13 October 2016 at 21:49:22 UTC, safety0ff wrote:
>> Bad benchmark! Bad! -- Andrei
>
> Also, I suspect a benchmark with a larger loop body might not benefit as significantly from branch hints as this one.

I disagree. In longer loops, code compactness is as important as in small ones.

This is about the smallest inline version of decode I could come up with :

__gshared static immutable ubyte[] charWidthTab = [
            2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
            2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
            3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
            4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 1, 1
];

dchar myFront(ref char[] str) pure nothrow
{
    dchar c = cast(dchar) str[0];
    if (c & 128)
    {
        if (c & 64)
            final switch (charWidthTab[c - 192])
            {
            case 2:
                c |= ((str[1] & 0x80) >> 5);
                break;
            case 3:
                c |= ((str[1] & 0x80) >> 4);
                c |= ((str[2] & 0x80) >> 10);
                break;
            case 4:
                c |= ((str[1] & 0x80) >> 3);
                c |= ((str[2] & 0x80) >> 9);
                c |= ((str[3] & 0x80) >> 15);
                break;
            case 5, 6, 1:
                goto Linvalid;
            }
        else
        Linvalid:
            c = dchar.init;
    }
    return c;
}
October 15, 2016
On Friday, 14 October 2016 at 20:47:39 UTC, Stefan Koch wrote:
> On Thursday, 13 October 2016 at 21:49:22 UTC, safety0ff wrote:
>>> Bad benchmark! Bad! -- Andrei
>>
>> Also, I suspect a benchmark with a larger loop body might not benefit as significantly from branch hints as this one.
>
> I disagree. In longer loops, code compactness is as important as in small ones.
>
> This is about the smallest inline version of decode I could come up with :

Disregard all that code.
It is horribly wrong!

This is more correct (though for some reason it does not pass the unittests):

dchar myFront(ref char[] str) pure
{
    dchar c = cast(dchar) str.ptr[0];
    if (c & 128)
    {
        if (c & 64)
        {
            auto l = charWidthTab.ptr[c - 192];
            if (str.length < l)
                goto Linvalid;

            final switch (l)
            {
            case 2:
                c = ((c & ~(64 | 128)) << 6);
                c |= (str.ptr[1] & ~0x80);
                break;
            case 3:
                c = ((c & ~(32 | 64 | 128)) << 12);
                c |= ((str.ptr[1] & ~0x80) << 6);
                c |= ((str.ptr[2] & ~0x80));
                break;
            case 4:
                c = ((c & ~(16 | 32 | 64 | 128)) << 18);
                c |= ((str.ptr[1] & ~0x80) << 12);
                c |= ((str.ptr[2] & ~0x80) << 6);
                c |= ((str.ptr[3] & ~0x80));
                break;
            case 5, 6, 1:
                goto Linvalid;
            }
        }
        else
    Linvalid : throw new Exception("yadayada");

    }
    return c;
}
October 15, 2016
On Friday, 14 October 2016 at 20:47:39 UTC, Stefan Koch wrote:
> On Thursday, 13 October 2016 at 21:49:22 UTC, safety0ff wrote:
>>> Bad benchmark! Bad! -- Andrei
>>
>> Also, I suspect a benchmark with a larger loop body might not benefit as significantly from branch hints as this one.
>
> I disagree. In longer loops, code compactness is as important as in small ones.

You must have misunderstood:

My thought was simply that with a larger loop body, LLVM might not make such a dramatic rearrangement of the basic blocks.

Take your straw man elsewhere :-/

>
> This is more correct (though for some reason it does not pass the unittests):

You're only validating the first byte; the current code validates all of them.
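
To illustrate what validating every byte involves, here is a minimal sketch. It is not the Phobos implementation; `decodeFront` and its `len` out-parameter are hypothetical, and the overlong/surrogate checks are included on the assumption that full UTF-8 validity is wanted.

```d
// Sketch only: `decodeFront` is a hypothetical name, not the Phobos function.
dchar decodeFront(const(char)[] str, out size_t len) pure
{
    if (str.length == 0)
        throw new Exception("empty input");

    immutable uint b0 = str[0];
    if (b0 < 0x80)
    {
        len = 1;
        return cast(dchar) b0;
    }

    uint c;      // accumulated code point
    uint minVal; // smallest value allowed for this length (rejects overlongs)
    if ((b0 & 0xE0) == 0xC0)      { len = 2; c = b0 & 0x1F; minVal = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; c = b0 & 0x0F; minVal = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; c = b0 & 0x07; minVal = 0x10000; }
    else
        throw new Exception("invalid lead byte");

    if (str.length < len)
        throw new Exception("truncated sequence");

    // Every continuation byte must look like 10xxxxxx.
    foreach (i; 1 .. len)
    {
        immutable uint b = str[i];
        if ((b & 0xC0) != 0x80)
            throw new Exception("invalid continuation byte");
        c = (c << 6) | (b & 0x3F);
    }

    if (c < minVal || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        throw new Exception("overlong, out-of-range, or surrogate code point");

    return cast(dchar) c;
}
```

Note that the continuation bytes are masked with 0x3F (six payload bits) after checking their top two bits, rather than only clearing the high bit.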
October 15, 2016
On Saturday, 15 October 2016 at 00:50:08 UTC, Stefan Koch wrote:
> This is more correct (though for some reason it does not pass the unittests):
>
> dchar myFront(ref char[] str) pure
> {
>     dchar c = cast(dchar) str.ptr[0];
>     if (c & 128)
>     {
>         if (c & 64)
>         {
>             auto l = charWidthTab.ptr[c - 192];
>             if (str.length < l)
>                 goto Linvalid;
>
>             final switch (l)
>             {
>             case 2:
>                 c = ((c & ~(64 | 128)) << 6);
>                 c |= (str.ptr[1] & ~0x80);
>                 break;
>             case 3:
>                 c = ((c & ~(32 | 64 | 128)) << 12);
>                 c |= ((str.ptr[1] & ~0x80) << 6);
>                 c |= ((str.ptr[2] & ~0x80));
>                 break;
>             case 4:
>                 c = ((c & ~(16 | 32 | 64 | 128)) << 18);
>                 c |= ((str.ptr[1] & ~0x80) << 12);
>                 c |= ((str.ptr[2] & ~0x80) << 6);
>                 c |= ((str.ptr[3] & ~0x80));
>                 break;
>             case 5, 6, 1:
>                 goto Linvalid;
>             }
>         }
>         else
>     Linvalid : throw new Exception("yadayada");
>
>     }
>     return c;
> }

That looks very verbose to me. I found a very clever UTF-8 conversion function in C in the BSD codebase; maybe it can be used here. Sorry if I do not participate on the testing as I don't have a proper compilation environment here at home. Here is the routine I use at work (it's in C), posted for inspiration.

DEFINE_INLINE uint_t xctomb(char *r, wchar_t wc)
{
  uint_t u8l = utf8len(wc);

  switch(u8l) {
    /* Note: code falls through cases! */
    case 4: r[3] = 0x80 | (wc & 0x3f); wc >>= 6; wc |= 0x10000;
    case 3: r[2] = 0x80 | (wc & 0x3f); wc >>= 6; wc |= 0x800;
    case 2: r[1] = 0x80 | (wc & 0x3f); wc >>= 6; wc |= 0xc0;
    case 1: r[0] = wc;
  }
  return u8l;
}

utf8len being

DEFINE_INLINE uint_t utf8len(wchar_t wc)
{
  if(wc < 0x80)
    return 1;
  else if(wc < 0x800)
    return 2;
  else if(wc < 0x10000)
    return 3;
  else
    return 4;
}


The code generated on SPARC with gcc 3.4.6 was really good. On x86_64 with gcc 5.1 it was also not bad. I have not tried many alternatives, as UTF-8 encoding is not a bottleneck in our project. There is also no check for lengths 5 and 6, as they are not possible on our system, but here it would have to be added. (The DEFINE_INLINE macro is either extern inline or inline, depending on some macro magic that is not important here.)
October 15, 2016
Oooops, I should not post after drinking 2 glasses of Châteauneuf-du-Pape. That function does exactly the opposite of what popFront does: it converts from dchar to multibyte, not from multibyte to dchar as you did.
Sorry for the inconvenience.
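
For readers who want to try the encoder idea in D anyway, here is a hedged sketch of how the C routine might be rendered. The names `utf8Len` and `encodeUtf8` are assumptions; D has no implicit fall-through, so `goto case;` is used instead. Like the C original, it does no validation of the input code point.

```d
// Hypothetical D rendering of the C encoder; names are illustrative only.
uint utf8Len(dchar c) pure nothrow @nogc @safe
{
    if (c < 0x80) return 1;
    if (c < 0x800) return 2;
    if (c < 0x10000) return 3;
    return 4;
}

uint encodeUtf8(ref char[4] buf, dchar c) pure nothrow @nogc @safe
{
    immutable len = utf8Len(c);
    uint wc = c;
    switch (len)
    {
    // Each case emits one trailing byte, then ORs in the bit that
    // becomes part of the lead-byte marker after the remaining shifts.
    case 4:
        buf[3] = cast(char)(0x80 | (wc & 0x3F)); wc >>= 6; wc |= 0x10000;
        goto case;
    case 3:
        buf[2] = cast(char)(0x80 | (wc & 0x3F)); wc >>= 6; wc |= 0x800;
        goto case;
    case 2:
        buf[1] = cast(char)(0x80 | (wc & 0x3F)); wc >>= 6; wc |= 0xC0;
        goto case;
    case 1:
        buf[0] = cast(char) wc;
        break;
    default:
        assert(0); // utf8Len only returns 1 to 4
    }
    return len;
}
```

The marker bits accumulate as the value is shifted right, so the lead byte ends up as 0xC0, 0xE0, or 0xF0 plus the top payload bits without a separate table.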
October 15, 2016
On 10/15/2016 12:42 PM, Patrick Schluter wrote:
> Sorry if I do not participate on the testing as I don't have a proper
> compilation environment here at home.

https://ldc.acomirei.ru

Andrei