October 12, 2016
On 10/12/2016 10:39 AM, Stefan Koch wrote:
> On Wednesday, 12 October 2016 at 14:12:30 UTC, Andrei Alexandrescu wrote:
>> On 10/12/2016 09:39 AM, Stefan Koch wrote:
>
>>
>> Thanks! I'd say make sure there is exactly 0% loss on performance
>> compared to the popFront in the ASCII case, and if so make a PR with
>> the table version. -- Andrei
>
> I measured again.
> The table version DECREASES performance for dmd by 1%.
> I think we should keep performance for dmd in mind.
> I could add the table version in a version (LDC) block.

No need. 1% for dmd is negligible. 25% would raise an eyebrow. -- Andrei
October 12, 2016
On Wednesday, 12 October 2016 at 14:46:32 UTC, Andrei Alexandrescu wrote:
>
> No need. 1% for dmd is negligible. 25% would raise an eyebrow. -- Andrei

Alright then
PR: https://github.com/dlang/phobos/pull/4849
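For readers who don't want to open the PR, here is a minimal sketch of what a table-driven popFront can look like. It is not the code from the PR. The names `utf8Stride` and `popFrontTable` are hypothetical, and the table is built so that it gives the same strides as the bsr-based popFront variants further down the thread: 1 for ASCII, continuation bytes and 0xFE/0xFF, otherwise the count of leading 1 bits.

```d
// Hypothetical sketch of a table-driven popFront; not the code from the PR.
// utf8Stride[b] is the number of bytes to skip when the current byte is b.
immutable ubyte[256] utf8Stride = () {
    ubyte[256] t;
    foreach (i; 0 .. 256)
    {
        // Count the run of leading 1 bits in the byte.
        uint ones = 0;
        while (ones < 8 && (i & (0x80 >> ones)))
            ++ones;
        // ASCII (0 ones) and 0xFE/0xFF (7 or 8 ones) advance by one byte;
        // everything else, including continuation bytes (1 one), advances
        // by its count of leading 1 bits.
        t[i] = cast(ubyte)((ones == 0 || ones > 6) ? 1 : ones);
    }
    return t;
}();

void popFrontTable(ref char[] s) @trusted pure nothrow
{
    size_t n = utf8Stride[cast(ubyte) s[0]];
    if (n > s.length)
        n = s.length; // truncated sequence at the end: stop at the end
    // Slice through .ptr to skip the redundant bounds check.
    s = s.ptr[n .. s.length];
}

unittest
{
    char[] s = "aé€😀".dup;
    popFrontTable(s); assert(s == "é€😀");
    popFrontTable(s); assert(s == "€😀");
    popFrontTable(s); assert(s == "😀");
    popFrontTable(s); assert(s.length == 0);
}
```

The table is filled at compile time (module-level immutable initializer), so the runtime cost is one load plus the clamp. With the unittest included, the file can be checked with `dmd -unittest -main -run file.d`.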
October 12, 2016
My current favorites:

void popFront(ref char[] s) @trusted pure nothrow {
  immutable byte c = s[0];
  if (c >= -2) {
    s = s.ptr[1 .. s.length];
  } else {
    import core.bitop;
    size_t i = 7u - bsr(~c);
    import std.algorithm;
    s = s.ptr[min(i, s.length) .. s.length];
  }
}
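To see why `7u - bsr(~c)` gives the sequence length, take the lead byte of "€", 0xE2 = 1110_0010. As a byte it is -30; after integer promotion, ~c is 29 = 0b11101, whose highest set bit is bit 4, so 7 - 4 = 3 bytes. The snippet below isolates that computation in a helper so it can be checked byte by byte. The name `leadStride` is hypothetical; it mirrors the first variant above, including treating 0xFE/0xFF as one-byte steps.

```d
import core.bitop : bsr;

// Hypothetical helper isolating the stride computation of the first variant.
// ~c turns the run of leading 1 bits into 0 bits, so bsr finds the highest
// remaining 1 just below that run; 7 minus its position is the run length.
size_t leadStride(char ch) @safe pure nothrow @nogc
{
    immutable byte c = cast(byte) ch;
    if (c >= -2)          // 0x00..0x7F, or the invalid bytes 0xFE/0xFF
        return 1;
    return 7u - bsr(~c);  // 0x80..0xFD: number of leading 1 bits
}

unittest
{
    assert(leadStride('A') == 1);
    assert(leadStride(cast(char) 0x80) == 1); // continuation byte
    assert(leadStride(cast(char) 0xC3) == 2); // lead byte of "é"
    assert(leadStride(cast(char) 0xE2) == 3); // lead byte of "€"
    assert(leadStride(cast(char) 0xF0) == 4); // lead byte of "😀"
    assert(leadStride(cast(char) 0xFF) == 1);
}
```

Continuation bytes (0x80..0xBF) have exactly one leading 1, so they also come out as a one-byte step without a separate branch.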

I also experimented with explicit speculation:

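void popFront(ref char[] s) @trusted pure nothrow {
  immutable byte c = s[0];
  // Speculate that c is a single-byte character and consume it up front.
  s = s.ptr[1 .. s.length];
  if (c < -2) {
    // Lead or continuation byte: skip the rest of the sequence.
    // 6u rather than 7u because one byte is already consumed
    // (continuation bytes yield 0 here).
    import core.bitop;
    size_t i = 6u - bsr(~c);
    import std.algorithm;
    s = s.ptr[min(i, s.length) .. s.length];
  }
}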


LDC and GDC both compile these to 23 instructions.
DMD does worse than with my other code.

You can influence GDC's block layout with __builtin_expect.

I notice that many other snippets posted use uint instead of size_t in the multi-byte branch. This generates extra instructions for me.
October 12, 2016
On Wednesday, 12 October 2016 at 16:48:36 UTC, safety0ff wrote:
> [Snip]

Didn't see the LUT implementation, nvm!

October 12, 2016
On 10/12/2016 01:05 PM, safety0ff wrote:
> On Wednesday, 12 October 2016 at 16:48:36 UTC, safety0ff wrote:
>> [Snip]
>
> Didn't see the LUT implementation, nvm!

Yah, that's pretty clever. Better yet, I suspect we can reuse the look-up table for front() as well. -- Andrei
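A rough sketch of how one stride table could serve front() as well as popFront(). Assumptions, all mine: the names `strideTab` and `frontTable` are hypothetical; malformed input returns U+FFFD instead of throwing, as Phobos' decoding does by default; overlong encodings and surrogates are not rejected. It only shows the table doing double duty (length lookup, then the lead-byte payload mask).

```d
// Hypothetical: the same kind of stride table, reused to decode front().
immutable ubyte[256] strideTab = () {
    ubyte[256] t;
    foreach (i; 0 .. 256)
    {
        uint ones = 0; // run of leading 1 bits
        while (ones < 8 && (i & (0x80 >> ones)))
            ++ones;
        t[i] = cast(ubyte)((ones == 0 || ones > 6) ? 1 : ones);
    }
    return t;
}();

// Simplified: malformed input yields U+FFFD instead of throwing, and
// overlong forms and surrogates are not rejected.
dchar frontTable(const(char)[] s) @safe pure nothrow @nogc
{
    immutable ubyte lead = cast(ubyte) s[0];
    if (lead < 0x80)
        return cast(dchar) lead;           // ASCII fast path
    immutable size_t n = strideTab[lead];
    if (n < 2 || n > 4 || n > s.length)    // stray continuation byte,
        return '\uFFFD';                   // 5/6-byte form or truncation
    // Keep the payload bits of the lead byte (0x1F, 0x0F or 0x07).
    dchar result = cast(dchar)(lead & (0x7F >> n));
    foreach (i; 1 .. n)
    {
        immutable ubyte b = cast(ubyte) s[i];
        if ((b & 0xC0) != 0x80)
            return '\uFFFD';               // expected a continuation byte
        result = cast(dchar)((result << 6) | (b & 0x3F));
    }
    return result;
}

unittest
{
    assert(frontTable("A") == 'A');
    assert(frontTable("€x") == '€');
    assert(frontTable([cast(char) 0xE2, cast(char) 0x82]) == '\uFFFD');
}
```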

October 14, 2016
On Wednesday, 12 October 2016 at 17:59:51 UTC, Andrei Alexandrescu wrote:
> On 10/12/2016 01:05 PM, safety0ff wrote:
>> On Wednesday, 12 October 2016 at 16:48:36 UTC, safety0ff wrote:
>>> [Snip]
>>
>> Didn't see the LUT implementation, nvm!
>
> Yah, that's pretty clever. Better yet, I suspect we can reuse the look-up table for front() as well. -- Andrei

The first results from stoke are in.
It turns out stoke likes to produce garbage :(
Its smallest result so far has around 100 instructions.
However, it might get better if I give it a few more hours to explore.
October 14, 2016
On Friday, 14 October 2016 at 04:21:28 UTC, Stefan Koch wrote:
> On Wednesday, 12 October 2016 at 17:59:51 UTC, Andrei Alexandrescu wrote:
>> On 10/12/2016 01:05 PM, safety0ff wrote:
>>> On Wednesday, 12 October 2016 at 16:48:36 UTC, safety0ff wrote:
>>>> [Snip]
>>>
>>> Didn't see the LUT implementation, nvm!
>>
>> Yah, that's pretty clever. Better yet, I suspect we can reuse the look-up table for front() as well. -- Andrei
>
> The first results from stoke are in.
> It turns out stoke likes to produce garbage :(
> Its smallest result so far has around 100 instructions.
> However, it might get better if I give it a few more hours to explore.

Also I doubt that it is correct :(

testb $0x8, 0x200aa9(%rip)
movl $0x6, %eax
prefetchnta 0x200a9d(%rip)
je .L_400650
mulb -0x4(%rsp)
movb $0xfa, -0x5(%rsp)
vmovd (%rax), %xmm6
pmovzxbd -0x5(%rsp), %xmm11
psrad $0xf9, %xmm6
movl $0xef, %esp
pextrd $0xfe, %xmm6, (%rax)
.L_4005b0:
vrsqrtps 0x200a69(%rip), %ymm13
vzeroall
incl %edi
cmpb %ah, %dl
cmpq %rdi, %rdi
jbe .L_400640
ja .L_4005f0
pcmpeqq -0x4(%rsp), %xmm10
sbbb %ah, 0x200a4d(%rip)
jmpq .L_400643
.L_4005f0:
ja .L_40060c
jmpq .L_400643
.L_40060c:
ja .L_400628
minsd 0x200a3c(%rip), %xmm10
jmpq .L_400643
.L_400628:
vmovsldup %ymm3, %ymm3
vrcpps %ymm12, %ymm7
vrsqrtps -0x4(%rsp), %xmm0
fldl2t
vmovmskpd %xmm8, %r10
vrcpps %xmm6, %xmm13
rcrw $0xf7, %ax
jbe .L_400643
sbbq $0x40, %rax
xorb $0xfe, 0x200a0d(%rip)
adcw $0xf0, %r10w
.L_400640:
vmaskmovpd %xmm4, %xmm10, 0x2009ff(%rip)
pabsb %xmm12, %xmm15
.L_400643:
jne .L_4005b0
.L_400650:
retq

I am not quite sure what this does.
But I am certain it has nothing to do with UTF-8 decoding :)

Oh, btw, using an end pointer instead of a length reduces the table version to 12 instructions.
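A minimal sketch of the pointer-pair shape, assuming a cursor that holds the current position and a one-past-the-end pointer instead of a slice. `Utf8Cursor` and `strideTab` are hypothetical names (not Phobos types), and this is not the code that was measured, so no claim is made about its instruction count. The point is that popFront only needs `end - ptr` for the clamp, not a separate length field to keep in sync.

```d
// Hypothetical stride table: 1 for ASCII, continuation bytes and 0xFE/0xFF,
// otherwise the number of leading 1 bits of the lead byte.
immutable ubyte[256] strideTab = () {
    ubyte[256] t;
    foreach (i; 0 .. 256)
    {
        uint ones = 0;
        while (ones < 8 && (i & (0x80 >> ones)))
            ++ones;
        t[i] = cast(ubyte)((ones == 0 || ones > 6) ? 1 : ones);
    }
    return t;
}();

// Hypothetical pointer-pair cursor over UTF-8 code units.
struct Utf8Cursor
{
    const(char)* ptr; // current position
    const(char)* end; // one past the last code unit

    this(const(char)[] s) @trusted pure nothrow @nogc
    {
        ptr = s.ptr;
        end = s.ptr + s.length;
    }

    bool empty() const @safe pure nothrow @nogc { return ptr == end; }

    // Caller must check empty() first; *ptr is read unconditionally.
    void popFront() @trusted pure nothrow @nogc
    {
        size_t n = strideTab[cast(ubyte) *ptr];
        immutable size_t left = end - ptr;
        if (n > left)
            n = left; // do not step past end on a truncated sequence
        ptr += n;
    }
}

unittest
{
    auto c = Utf8Cursor("a€😀");
    c.popFront(); c.popFront(); c.popFront();
    assert(c.empty);
}
```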