October 11, 2016
On Tuesday, 11 October 2016 at 02:48:22 UTC, Andrei Alexandrescu wrote:
> That looks good. I'm just worried about the jump forward - ideally the case c < 127 would simply entail a quick return. I tried a fix, but it didn't do what I wanted in ldc. We shouldn't assert(0) if wrong - just skip one byte. Also, are we right to not worry about 5- and 6-byte sequences? The docs keep on threatening with it, and then immediately mention those are not valid.
>
> [ ... ]
>
> Andrei
>

If you want to skip a byte it's easy to do as well.

void popFront3(ref char[] s) @trusted pure nothrow {
   immutable c = s[0];
   uint char_length = 1;
   if (c < 127)
   {
   Lend :
     s = s.ptr[char_length .. s.length];
   } else {
     if ((c & b01100_0000) == 0b1000_0000)
     {
       //just skip one in case this is not the beginning of a code-point char
       goto Lend;
     }
     if (c < 192)
     {
       char_length = 2;
       goto Lend;
     }
     if (c < 240)
     {
       char_length = 3;
       goto Lend;
     }
     if (c < 248)
     {
       char_length = 4;
       goto Lend;
     }
   }
 }

October 11, 2016
On Tuesday, 11 October 2016 at 03:00:45 UTC, Stefan Koch wrote:
> On Tuesday, 11 October 2016 at 02:48:22 UTC, Andrei Alexandrescu wrote:
>> That looks good. I'm just worried about the jump forward - ideally the case c < 127 would simply entail a quick return. I tried a fix, but it didn't do what I wanted in ldc. We shouldn't assert(0) if wrong - just skip one byte. Also, are we right to not worry about 5- and 6-byte sequences? The docs keep on threatening with it, and then immediately mention those are not valid.
>>
>> [ ... ]
>>
>> Andrei
>>
>
> If you want to skip a byte it's easy to do as well.
>
> void popFront3(ref char[] s) @trusted pure nothrow {
>    immutable c = s[0];
>    uint char_length = 1;
>    if (c < 127)
>    {
>    Lend :
>      s = s.ptr[char_length .. s.length];
>    } else {
>      if ((c & b01100_0000) == 0b1000_0000)
>      {
>        //just skip one in case this is not the beginning of a code-point char
>        goto Lend;
>      }
>      if (c < 192)
>      {
>        char_length = 2;
>        goto Lend;
>      }
>      if (c < 240)
>      {
>        char_length = 3;
>        goto Lend;
>      }
>      if (c < 248)
>      {
>        char_length = 4;
>        goto Lend;
>      }
>    }
>  }

Pardon me asking, but why all these gotos instead of else ifs:

  if (c < 192)
  {
    char_length = 2;
  }
  else if (c < 240)
  {
    char_length = 3;
  } else if (...) {
  }

Does it have any effect on generated code (I don't think it should)?
October 11, 2016
On Tuesday, 11 October 2016 at 03:18:24 UTC, Lurker wrote:
>
> Pardon me asking, but why all these gotos instead of else ifs:
>
>   if (c < 192)
>   {
>     char_length = 2;
>   }
>   else if (c < 240)
>   {
>     char_length = 3;
>   } else if (...) {
>   }
>
> Does it have any effect on generated code (I don't think it should)?

No it does not.
I wrote the gotos because that is how I am used to thinking about code.
If else should work fine here.

October 10, 2016
On 10/10/16 11:00 PM, Stefan Koch wrote:
> On Tuesday, 11 October 2016 at 02:48:22 UTC, Andrei Alexandrescu wrote:
>> That looks good. I'm just worried about the jump forward - ideally the
>> case c < 127 would simply entail a quick return. I tried a fix, but it
>> didn't do what I wanted in ldc. We shouldn't assert(0) if wrong - just
>> skip one byte. Also, are we right to not worry about 5- and 6-byte
>> sequences? The docs keep on threatening with it, and then immediately
>> mention those are not valid.
>>
>> [ ... ]
>>
>> Andrei
>>
>
> If you want to skip a byte it's easy to do as well.
>
> void popFront3(ref char[] s) @trusted pure nothrow {
>    immutable c = s[0];
>    uint char_length = 1;
>    if (c < 127)
>    {
>    Lend :
>      s = s.ptr[char_length .. s.length];
>    } else {
>      if ((c & b01100_0000) == 0b1000_0000)
>      {
>        //just skip one in case this is not the beginning of a code-point
> char
>        goto Lend;
>      }
>      if (c < 192)
>      {
>        char_length = 2;
>        goto Lend;
>      }
>      if (c < 240)
>      {
>        char_length = 3;
>        goto Lend;
>      }
>      if (c < 248)
>      {
>        char_length = 4;
>        goto Lend;
>      }
>    }
>  }
>

Affirmative. That's identical to the code in "[ ... ]" :o). Generated code still does a jmp forward though. -- Andrei
October 11, 2016
On Tuesday, 11 October 2016 at 03:58:59 UTC, Andrei Alexandrescu wrote:
> On 10/10/16 11:00 PM, Stefan Koch wrote:
>> On Tuesday, 11 October 2016 at 02:48:22 UTC, Andrei Alexandrescu wrote:
>>> That looks good. I'm just worried about the jump forward - ideally the
>>> case c < 127 would simply entail a quick return. I tried a fix, but it
>>> didn't do what I wanted in ldc. We shouldn't assert(0) if wrong - just
>>> skip one byte. Also, are we right to not worry about 5- and 6-byte
>>> sequences? The docs keep on threatening with it, and then immediately
>>> mention those are not valid.
>>>
>>> [ ... ]
>>>
>>> Andrei
>>>
>>
>> If you want to skip a byte it's easy to do as well.
>>
>> void popFront3(ref char[] s) @trusted pure nothrow {
>>    immutable c = s[0];
>>    uint char_length = 1;
>>    if (c < 127)
>>    {
>>    Lend :
>>      s = s.ptr[char_length .. s.length];
>>    } else {
>>      if ((c & b01100_0000) == 0b1000_0000)
>>      {
>>        //just skip one in case this is not the beginning of a code-point
>> char
>>        goto Lend;
>>      }
>>      if (c < 192)
>>      {
>>        char_length = 2;
>>        goto Lend;
>>      }
>>      if (c < 240)
>>      {
>>        char_length = 3;
>>        goto Lend;
>>      }
>>      if (c < 248)
>>      {
>>        char_length = 4;
>>        goto Lend;
>>      }
>>    }
>>  }
>>
>
> Affirmative. That's identical to the code in "[ ... ]" :o). Generated code still does a jmp forward though. -- Andrei

It was not identical.
((c & b01100_0000) == 0b1000_0000))
Can be true in all of the 3 following cases.
If we do not do a jmp to return here, we cannot guarantee that we will not skip over the next valid char.
Thereby corrupting already corrupt strings even more.

For best performance we need to leave the gotos in there.

October 11, 2016
On Tuesday, 11 October 2016 at 04:05:47 UTC, Stefan Koch wrote:
> On Tuesday, 11 October 2016 at 03:58:59 UTC, Andrei Alexandrescu wrote:
>> On 10/10/16 11:00 PM, Stefan Koch wrote:
>>> On Tuesday, 11 October 2016 at 02:48:22 UTC, Andrei Alexandrescu wrote:
>>>>[...]
>>>
>>> If you want to skip a byte it's easy to do as well.
>>>
>>> void popFront3(ref char[] s) @trusted pure nothrow {
>>>    immutable c = s[0];
>>>    uint char_length = 1;
>>>    if (c < 127)
>>>    {
>>>    Lend :
>>>      s = s.ptr[char_length .. s.length];
>>>    } else {
>>>      if ((c & b01100_0000) == 0b1000_0000)
>>>      {
>>>        //just skip one in case this is not the beginning of a code-point
>>> char
>>>        goto Lend;
>>>      }
>>>      if (c < 192)
>>>      {
>>>        char_length = 2;
>>>        goto Lend;
>>>      }
>>>      if (c < 240)
>>>      {
>>>        char_length = 3;
>>>        goto Lend;
>>>      }
>>>      if (c < 248)
>>>      {
>>>        char_length = 4;
>>>        goto Lend;
>>>      }
>>>    }
>>>  }
>>>
>>
>> Affirmative. That's identical to the code in "[ ... ]" :o). Generated code still does a jmp forward though. -- Andrei
>
> It was not identical.
> ((c & b01100_0000) == 0b1000_0000))
> Can be true in all of the 3 following cases.
> If we do not do a jmp to return here, we cannot guarantee that we will not skip over the next valid char.
> Thereby corrupting already corrupt strings even more.
>
> For best performance we need to leave the gotos in there.

A branch-free version:

void popFront4(ref char[] s) @trusted pure nothrow {
  immutable c = s[0];
  uint char_length = 1 + (c >= 192) + (c >= 240) + (c >= 248);
  s = s.ptr[char_length .. s.length];
}

Theoretically the char_length could be computed with three sub and addc instructions, but no compiler is smart enough to detect that.
October 11, 2016
On Tuesday, 11 October 2016 at 07:30:26 UTC, Matthias Bentrup wrote:
>
> A branch-free version:
>
> void popFront4(ref char[] s) @trusted pure nothrow {
>   immutable c = s[0];
>   uint char_length = 1 + (c >= 192) + (c >= 240) + (c >= 248);
>   s = s.ptr[char_length .. s.length];
> }
>
> Theoretically the char_length could be computed with three sub and addc instructions, but no compiler is smart enough to detect that.

You still need to special case c < 128
as well as the follow chars.

also smaller c's are more common the bigger ones making the branching version faster on average.
October 11, 2016
On Tuesday, 11 October 2016 at 08:03:40 UTC, Stefan Koch wrote:
> On Tuesday, 11 October 2016 at 07:30:26 UTC, Matthias Bentrup wrote:
>>
>> A branch-free version:
>>
>> void popFront4(ref char[] s) @trusted pure nothrow {
>>   immutable c = s[0];
>>   uint char_length = 1 + (c >= 192) + (c >= 240) + (c >= 248);
>>   s = s.ptr[char_length .. s.length];
>> }
>>
>> Theoretically the char_length could be computed with three sub and addc instructions, but no compiler is smart enough to detect that.
>
> You still need to special case c < 128
> as well as the follow chars.
>
> also smaller c's are more common the bigger ones making the branching version faster on average.

Also the code produces conditional set instructions which have a higher latency.
And worse throughput.

October 11, 2016
On Tuesday, 11 October 2016 at 08:17:52 UTC, Stefan Koch wrote:
> On Tuesday, 11 October 2016 at 08:03:40 UTC, Stefan Koch wrote:
>> On Tuesday, 11 October 2016 at 07:30:26 UTC, Matthias Bentrup wrote:
>>>
>>> A branch-free version:
>>>
>>> void popFront4(ref char[] s) @trusted pure nothrow {
>>>   immutable c = s[0];
>>>   uint char_length = 1 + (c >= 192) + (c >= 240) + (c >= 248);
>>>   s = s.ptr[char_length .. s.length];
>>> }
>>>
>>> Theoretically the char_length could be computed with three sub and addc instructions, but no compiler is smart enough to detect that.
>>
>> You still need to special case c < 128
>> as well as the follow chars.
>>
>> also smaller c's are more common the bigger ones making the branching version faster on average.
>
> Also the code produces conditional set instructions which have a higher latency.
> And worse throughput.

void popFront1(ref char[] s) @trusted pure nothrow
{
  import core.bitop, std.algorithm;
  auto v = bsr(~s[0] | 1);
  s = s[clamp(v, 1, v > 6 ? 1 : $)..$];
}

Seems to be less if i'm not wrong.
October 11, 2016
On Tuesday, 11 October 2016 at 08:17:52 UTC, Stefan Koch wrote:
>
> Also the code produces conditional set instructions which have a higher latency.
> And worse throughput.

On my arguably a bit dated laptop:
popFront3 performs 109 us best time
and popFront4 performs with 265 us best time

Testcode :
void main()
{
import std.datetime : StopWatch;
import std.stdio;
foreach(_;0 .. 255) {
char[] test1 = (import("blns.txt")).dup;
StopWatch sw;
sw.start;
while(test1.length) popFront(test1);
sw.stop;
writeln("pf1 took ", sw.peek.usecs, "us");
sw.reset();
}
}

blns.txt is taken from
https://github.com/minimaxir/big-list-of-naughty-strings/blob/master/blns.txt