Can you shrink it further? (page 5)

On Wednesday, 12 October 2016 at 09:23:53 UTC, Stefan Koch wrote: > On Wednesday, 12 October 2016 at 08:56:59 UTC, Matthias Bentrup wrote: >> [...] > > All three are slower than baseline, for my test-case. > What did you test it against. The blns.txt file mentioned upthread.

On Wednesday, 12 October 2016 at 10:15:17 UTC, Matthias Bentrup wrote: > On Wednesday, 12 October 2016 at 09:23:53 UTC, Stefan Koch wrote: >> On Wednesday, 12 October 2016 at 08:56:59 UTC, Matthias Bentrup wrote: >>> [...] >> >> All three are slower than baseline, for my test-case. >> What did you test it against. > > The blns.txt file mentioned upthread. And what were your timings ? BTW. the code you posted would not be a proper replacement for utf8 popFront since our UTF8-popFront does handle depreacted sequences correctly. making it equivalent to this code : void pfnew(ref char[] s) @trusted pure nothrow { immutable c = s[0]; uint char_length = 1; if (c < 192) { Lend : s = s.ptr[char_length .. s.length]; } else { if (c < 192+32) { char_length = 2; } else if (c < 192+32+16) { char_length = 3; } else if (c < 192+32+16+8) { char_length = 4; } else if (c < 192+32+16+8+4) { char_length = 5; } else if (c < 192+32+16+8+4+2) { char_length = 6; } char_length = char_length > s.length ? s.length : char_length; goto Lend; } }

I just confirmed that branching version is faster then table-lookup. please test it our for yourself http://paste.ofcode.org/3CpieAhkrTYEcSncbPKbrj The table-lookup does produce the smallest code though.

On 10/12/2016 04:56 AM, Matthias Bentrup wrote: > > void popFront1b(ref char[] s) @trusted pure nothrow { > immutable c = cast(byte)s[0]; > if (c >= -8) { > s = s[1 .. $]; > } else { > uint i = 4 + (c + 64 >> 31) + (c + 32 >> 31) + (c + 16 >> 31); > import std.algorithm; > s = s[min(i, $) .. $]; > } > } This is really small although it does two jumps on the frequent path. Perhaps an improvement over the current implementation. Nice work! -- Andrei

On 10/12/2016 05:23 AM, Stefan Koch wrote: > All three are slower than baseline, for my test-case. > What did you test it against. I'd say: (a) test for speed of ASCII-only text; (b) make it small. That's all we need. Nobody worries about 10-20% in multibyte-heavy text. -- Andrei

On 10/12/2016 06:56 AM, Stefan Koch wrote: > I just confirmed that branching version is faster then table-lookup. > > please test it our for yourself > http://paste.ofcode.org/3CpieAhkrTYEcSncbPKbrj > > The table-lookup does produce the smallest code though. Nice. I like that the table is NOT looked up on the ASCII path, so it stays cold in ASCII text. However, there's a problem with this pattern: size_t char_length = 1; immutable c = s[0]; if (c < 192) { Lend : s = s.ptr[char_length .. s.length]; return ; } as opposed to: immutable c = s[0]; if (c < 192) { Lend : s = s.ptr[1 .. s.length]; return ; } In the second case, the compiler generates an inc for bumping the pointer and a dec for decreasing the length (small instructions). If the variable char_length is used, add/sub must be used (larger). -- Andrei

October 12, 2016

Re: Can you shrink it further?

Posted by Stefan Koch
in reply to Andrei Alexandrescu

Permalink

Stefan Koch

Posted in reply to Andrei Alexandrescu

Permalink

On Wednesday, 12 October 2016 at 12:46:50 UTC, Andrei Alexandrescu wrote:
> On 10/12/2016 06:56 AM, Stefan Koch wrote:
>> I just confirmed that branching version is faster then table-lookup.
>>
>> please test it our for yourself
>> http://paste.ofcode.org/3CpieAhkrTYEcSncbPKbrj
>>
>> The table-lookup does produce the smallest code though.
>
> Nice. I like that the table is NOT looked up on the ASCII path, so it stays cold in ASCII text. However, there's a problem with this pattern:
>
>     size_t char_length = 1;
>     immutable c = s[0];
>     if (c < 192)
>     {
>         Lend :
>         s = s.ptr[char_length .. s.length];
>         return ;
>     }
>
> as opposed to:
>
>     immutable c = s[0];
>     if (c < 192)
>     {
>         Lend :
>         s = s.ptr[1 .. s.length];
>         return ;
>     }
>
> In the second case, the compiler generates an inc for bumping the pointer and a dec for decreasing the length (small instructions). If the variable char_length is used, add/sub must be used (larger). -- Andrei

btw.
We could get rid of of a sub if we changed slices from holding a pointer and a length to holding two pointers.

On Wednesday, 12 October 2016 at 13:32:45 UTC, Stefan Koch wrote: > On Wednesday, 12 October 2016 at 12:46:50 UTC, Andrei Alexandrescu wrote: >> >> In the second case, the compiler generates an inc for bumping the pointer and a dec for decreasing the length (small instructions). If the variable char_length is used, add/sub must be used (larger). -- Andrei When I write code so i can keep the str = str[1 .. str.length] It leads to lager code overall and decreases performance. both in table lookup and in the multi-branch code. update: for ldc the table-lookup code is faster 27% then the current popFront. on dmd the table-lookup is 2% faster then the current popFront. for ldc the branching code is 23% faster then the current popFront for dmd the branching code is 20% faster then the current popFront

On 10/12/2016 09:39 AM, Stefan Koch wrote: > On Wednesday, 12 October 2016 at 13:32:45 UTC, Stefan Koch wrote: >> On Wednesday, 12 October 2016 at 12:46:50 UTC, Andrei Alexandrescu wrote: >>> >>> In the second case, the compiler generates an inc for bumping the >>> pointer and a dec for decreasing the length (small instructions). If >>> the variable char_length is used, add/sub must be used (larger). -- >>> Andrei > > When I write code so i can keep the > str = str[1 .. str.length] > It leads to lager code overall and decreases performance. > both in table lookup and in the multi-branch code. > > update: > for ldc the table-lookup code is faster 27% then the current popFront. > on dmd the table-lookup is 2% faster then the current popFront. > > for ldc the branching code is 23% faster then the current popFront > for dmd the branching code is 20% faster then the current popFront Thanks! I'd say make sure there is exactly 0% loss on performance compared to the popFront in the ASCII case, and if so make a PR with the table version. -- Andrei

On Wednesday, 12 October 2016 at 14:12:30 UTC, Andrei Alexandrescu wrote: > On 10/12/2016 09:39 AM, Stefan Koch wrote: > > Thanks! I'd say make sure there is exactly 0% loss on performance compared to the popFront in the ASCII case, and if so make a PR with the table version. -- Andrei I measured again. The table version has a DECREASES the performance for dmd by 1%. I think we should keep performance for dmd in mind. I could add the table version in a version (LDC) block. I cannot currently measure on gdc. I'd appreciate more perf data.

Forums