October 11, 2016 Re: Can you shrink it further? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Temtaime | On 10/11/2016 05:45 AM, Temtaime wrote:
> void popFront7(ref char[] s) @trusted pure nothrow
> {
> import core.bitop;
> auto v = 7 - bsr(~s[0] | 1);
> s = s[v > 6 ? 1 : (v ? (v > s.length ? s.length : v) : 1)..$];
> }
>
> Please check this.
Thanks. This does a lot of work on the frequent path c < 0x80:
pure nothrow @trusted void example.popFront7(ref char[]):
movq 8(%rdi), %rax
movzbl (%rax), %ecx
xorq $254, %rcx
orq $1, %rcx
bsrq %rcx, %rcx
notl %ecx
addl $8, %ecx
cmpl $6, %ecx
jg .LBB0_2
testl %ecx, %ecx
je .LBB0_2
movslq %ecx, %rdx
movq (%rdi), %rcx
cmpq %rcx, %rdx
cmovaq %rcx, %rdx
jmp .LBB0_3
.LBB0_2:
movq (%rdi), %rcx
movl $1, %edx
.LBB0_3:
addq %rdx, %rax
subq %rdx, %rcx
movq %rcx, (%rdi)
movq %rax, 8(%rdi)
retq
So I changed it to:
void popFront7(ref char[] s) @trusted pure nothrow
{
immutable c = s[0];
if (c < 0x80) {
s = s.ptr[1 .. s.length];
} else {
import core.bitop;
uint v = 7 - bsr(~c | (c > 0xfd) << 6u);
s = s.ptr[v > s.length ? s.length : v .. s.length];
}
}
That's about as large as the baseline.
Andrei
|
October 11, 2016 Re: Can you shrink it further? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Matthias Bentrup | On Tuesday, 11 October 2016 at 14:49:28 UTC, Matthias Bentrup wrote:
>
> This is the result I'd like to get, but I can't find a way to write it without inline assembly :(
>
> void popFrontAsmIntel(ref char[] s) @trusted pure nothrow {
> immutable c = s[0];
> if (c < 0x80) {
> s = s[1 .. $];
> } else {
> uint l = void;
> asm pure nothrow @nogc {
> mov EAX, 1;
> mov BL, 0xf8-1;
> sub BL, c;
> cmp BL, 0xf8-0xc0;
> adc EAX, 0;
> cmp BL, 0xf8-0xe0;
> adc EAX, 0;
> cmp BL, 0xf8-0xf0;
> adc EAX, 0;
> mov l, EAX;
> }
> s = s[l <= $ ? l : $ .. $];
> }
> }
This takes 180us.
Baseline takes 124us.
|
October 11, 2016 Re: Can you shrink it further? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Matthias Bentrup | On 10/11/2016 10:49 AM, Matthias Bentrup wrote: > > void popFrontAsmIntel(ref char[] s) @trusted pure nothrow { > immutable c = s[0]; > if (c < 0x80) { > s = s[1 .. $]; > } else { > uint l = void; > asm pure nothrow @nogc { > mov EAX, 1; > mov BL, 0xf8-1; > sub BL, c; > cmp BL, 0xf8-0xc0; > adc EAX, 0; > cmp BL, 0xf8-0xe0; > adc EAX, 0; > cmp BL, 0xf8-0xf0; > adc EAX, 0; > mov l, EAX; > } > s = s[l <= $ ? l : $ .. $]; > } > } Did you take a look at the codegen on http://ldc.acomirei.ru? It's huge. -- Andrei |
October 11, 2016 Re: Can you shrink it further? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Stefan Koch | On 10/10/2016 11:00 PM, Stefan Koch wrote:
>
> void popFront3(ref char[] s) @trusted pure nothrow {
> immutable c = s[0];
> uint char_length = 1;
> if (c < 127)
> {
> Lend :
> s = s.ptr[char_length .. s.length];
> } else {
> if ((c & b01100_0000) == 0b1000_0000)
> {
> //just skip one in case this is not the beginning of a code-point
> char
> goto Lend;
> }
> if (c < 192)
> {
> char_length = 2;
> goto Lend;
> }
> if (c < 240)
> {
> char_length = 3;
> goto Lend;
> }
> if (c < 248)
> {
> char_length = 4;
> goto Lend;
> }
> }
> }
Looked at this, still seems to generate a jump forward with ldc. Also, why do you leave a fallthrough path? Progress needs to be made on all paths, otherwise we have infinite loops.
Andrei
|
October 11, 2016 Re: Can you shrink it further? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Tuesday, 11 October 2016 at 15:08:34 UTC, Andrei Alexandrescu wrote:
> Looked at this, still seems to generate a jump forward with ldc.
ldc.intrinsics.llvm_expect might help to influence basic block layout.
— David
|
October 11, 2016 Re: Can you shrink it further? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Tuesday, 11 October 2016 at 15:08:34 UTC, Andrei Alexandrescu wrote:
> On 10/10/2016 11:00 PM, Stefan Koch wrote:
>>
>> [...]
>
> Looked at this, still seems to generate a jump forward with ldc. Also, why do you leave a fallthrough path? Progress needs to be made on all paths, otherwise we have infinite loops.
I forgot that the fall-trough did no longer end in Lend;
That forward jump to Lend is a very common and therefore predicted branch.
I will now run this problem through STOKE.
Let's see what it comes up with :)
|
October 11, 2016 Re: Can you shrink it further? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Stefan Koch | On 10/11/16 11:15 AM, Stefan Koch wrote: > I will now run this problem through STOKE. > Let's see what it comes up with :) http://stoke.stanford.edu you mean? That would be cool. Keep us posted! -- Andrei |
October 11, 2016 Re: Can you shrink it further? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Tuesday, 11 October 2016 at 16:13:45 UTC, Andrei Alexandrescu wrote:
> On 10/11/16 11:15 AM, Stefan Koch wrote:
>> I will now run this problem through STOKE.
>> Let's see what it comes up with :)
>
> http://stoke.stanford.edu you mean? That would be cool. Keep us posted! -- Andrei
Yep I mean that one.
It will take a while to work out the right cost-functions.
I'll do a PR as soon as this bears fruit.
|
October 12, 2016 Re: Can you shrink it further? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Tuesday, 11 October 2016 at 15:01:47 UTC, Andrei Alexandrescu wrote:
> On 10/11/2016 10:49 AM, Matthias Bentrup wrote:
>>
>> void popFrontAsmIntel(ref char[] s) @trusted pure nothrow {
>> immutable c = s[0];
>> if (c < 0x80) {
>> s = s[1 .. $];
>> } else {
>> uint l = void;
>> asm pure nothrow @nogc {
>> mov EAX, 1;
>> mov BL, 0xf8-1;
>> sub BL, c;
>> cmp BL, 0xf8-0xc0;
>> adc EAX, 0;
>> cmp BL, 0xf8-0xe0;
>> adc EAX, 0;
>> cmp BL, 0xf8-0xf0;
>> adc EAX, 0;
>> mov l, EAX;
>> }
>> s = s[l <= $ ? l : $ .. $];
>> }
>> }
>
> Did you take a look at the codegen on http://ldc.acomirei.ru? It's huge. -- Andrei
Here are three branch-less variants that use the sign instead of the carry bit.
The last one is the fastest on my machine, although it mixes the rare error case and the common 1-byte case into one branch.
void popFront1(ref char[] s) @trusted pure nothrow {
immutable c = cast(byte)s[0];
if (c >= 0) {
s = s[1 .. $];
} else if (c < -8) {
uint i = 4 + (c + 64 >> 31) + (c + 32 >> 31) + (c + 16 >> 31);
import std.algorithm;
s = s[min(i, $) .. $];
} else {
s = s[1 .. $];
}
}
void popFront1a(ref char[] s) @trusted pure nothrow {
immutable c = cast(byte)s[0];
if (c >= 0) {Three
s = s[1 .. $];
} else {
uint i = 1 + ((3 + (c + 64 >> 31) + (c + 32 >> 31) + (c + 16 >> 31)) & (c + 8 >> 31));
import std.algorithm;
s = s[min(i, $) .. $];
}
}
void popFront1b(ref char[] s) @trusted pure nothrow {
immutable c = cast(byte)s[0];
if (c >= -8) {
s = s[1 .. $];
} else {
uint i = 4 + (c + 64 >> 31) + (c + 32 >> 31) + (c + 16 >> 31);
import std.algorithm;
s = s[min(i, $) .. $];
}
}
|
October 12, 2016 Re: Can you shrink it further? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Matthias Bentrup | On Wednesday, 12 October 2016 at 08:56:59 UTC, Matthias Bentrup wrote:
>
> Here are three branch-less variants that use the sign instead of the carry bit.
>
> The last one is the fastest on my machine, although it mixes the rare error case and the common 1-byte case into one branch.
>
> void popFront1(ref char[] s) @trusted pure nothrow {
> immutable c = cast(byte)s[0];
> if (c >= 0) {
> s = s[1 .. $];
> } else if (c < -8) {
> uint i = 4 + (c + 64 >> 31) + (c + 32 >> 31) + (c + 16 >> 31);
> import std.algorithm;
> s = s[min(i, $) .. $];
> } else {
> s = s[1 .. $];
> }
> }
>
> void popFront1a(ref char[] s) @trusted pure nothrow {
> immutable c = cast(byte)s[0];
> if (c >= 0) {Three
> s = s[1 .. $];
> } else {
> uint i = 1 + ((3 + (c + 64 >> 31) + (c + 32 >> 31) + (c + 16 >> 31)) & (c + 8 >> 31));
> import std.algorithm;
> s = s[min(i, $) .. $];
> }
> }
>
> void popFront1b(ref char[] s) @trusted pure nothrow {
> immutable c = cast(byte)s[0];
> if (c >= -8) {
> s = s[1 .. $];
> } else {
> uint i = 4 + (c + 64 >> 31) + (c + 32 >> 31) + (c + 16 >> 31);
> import std.algorithm;
> s = s[min(i, $) .. $];
> }
> }
All three are slower than baseline, for my test-case.
What did you test it against.
|
Copyright © 1999-2021 by the D Language Foundation