October 12, 2016 Re: Can you shrink it further? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Stefan Koch | On Wednesday, 12 October 2016 at 09:23:53 UTC, Stefan Koch wrote:
> On Wednesday, 12 October 2016 at 08:56:59 UTC, Matthias Bentrup wrote:
>> [...]
>
> All three are slower than baseline, for my test-case.
> What did you test it against.
The blns.txt file mentioned upthread.
|
October 12, 2016 Re: Can you shrink it further? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Matthias Bentrup | On Wednesday, 12 October 2016 at 10:15:17 UTC, Matthias Bentrup wrote:
> On Wednesday, 12 October 2016 at 09:23:53 UTC, Stefan Koch wrote:
>> On Wednesday, 12 October 2016 at 08:56:59 UTC, Matthias Bentrup wrote:
>>> [...]
>>
>> All three are slower than baseline, for my test-case.
>> What did you test it against.
>
> The blns.txt file mentioned upthread.
And what were your timings ?
BTW. the code you posted would not be a proper replacement for utf8 popFront since our UTF8-popFront does handle depreacted sequences correctly.
making it equivalent to this code :
void pfnew(ref char[] s) @trusted pure nothrow
{
immutable c = s[0];
uint char_length = 1;
if (c < 192)
{
Lend :
s = s.ptr[char_length .. s.length];
} else {
if (c < 192+32)
{
char_length = 2;
}
else if (c < 192+32+16)
{
char_length = 3;
}
else if (c < 192+32+16+8)
{
char_length = 4;
}
else if (c < 192+32+16+8+4)
{
char_length = 5;
}
else if (c < 192+32+16+8+4+2)
{
char_length = 6;
}
char_length = char_length > s.length ? s.length : char_length;
goto Lend;
}
}
|
October 12, 2016 Re: Can you shrink it further? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Stefan Koch | I just confirmed that branching version is faster then table-lookup. please test it our for yourself http://paste.ofcode.org/3CpieAhkrTYEcSncbPKbrj The table-lookup does produce the smallest code though. |
October 12, 2016 Re: Can you shrink it further? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Matthias Bentrup | On 10/12/2016 04:56 AM, Matthias Bentrup wrote:
>
> void popFront1b(ref char[] s) @trusted pure nothrow {
> immutable c = cast(byte)s[0];
> if (c >= -8) {
> s = s[1 .. $];
> } else {
> uint i = 4 + (c + 64 >> 31) + (c + 32 >> 31) + (c + 16 >> 31);
> import std.algorithm;
> s = s[min(i, $) .. $];
> }
> }
This is really small although it does two jumps on the frequent path. Perhaps an improvement over the current implementation. Nice work! -- Andrei
|
October 12, 2016 Re: Can you shrink it further? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Stefan Koch | On 10/12/2016 05:23 AM, Stefan Koch wrote:
> All three are slower than baseline, for my test-case.
> What did you test it against.
I'd say: (a) test for speed of ASCII-only text; (b) make it small. That's all we need. Nobody worries about 10-20% in multibyte-heavy text. -- Andrei
|
October 12, 2016 Re: Can you shrink it further? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Stefan Koch | On 10/12/2016 06:56 AM, Stefan Koch wrote:
> I just confirmed that branching version is faster then table-lookup.
>
> please test it our for yourself
> http://paste.ofcode.org/3CpieAhkrTYEcSncbPKbrj
>
> The table-lookup does produce the smallest code though.
Nice. I like that the table is NOT looked up on the ASCII path, so it stays cold in ASCII text. However, there's a problem with this pattern:
size_t char_length = 1;
immutable c = s[0];
if (c < 192)
{
Lend :
s = s.ptr[char_length .. s.length];
return ;
}
as opposed to:
immutable c = s[0];
if (c < 192)
{
Lend :
s = s.ptr[1 .. s.length];
return ;
}
In the second case, the compiler generates an inc for bumping the pointer and a dec for decreasing the length (small instructions). If the variable char_length is used, add/sub must be used (larger). -- Andrei
|
October 12, 2016 Re: Can you shrink it further? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Wednesday, 12 October 2016 at 12:46:50 UTC, Andrei Alexandrescu wrote:
> On 10/12/2016 06:56 AM, Stefan Koch wrote:
>> I just confirmed that branching version is faster then table-lookup.
>>
>> please test it our for yourself
>> http://paste.ofcode.org/3CpieAhkrTYEcSncbPKbrj
>>
>> The table-lookup does produce the smallest code though.
>
> Nice. I like that the table is NOT looked up on the ASCII path, so it stays cold in ASCII text. However, there's a problem with this pattern:
>
> size_t char_length = 1;
> immutable c = s[0];
> if (c < 192)
> {
> Lend :
> s = s.ptr[char_length .. s.length];
> return ;
> }
>
> as opposed to:
>
> immutable c = s[0];
> if (c < 192)
> {
> Lend :
> s = s.ptr[1 .. s.length];
> return ;
> }
>
> In the second case, the compiler generates an inc for bumping the pointer and a dec for decreasing the length (small instructions). If the variable char_length is used, add/sub must be used (larger). -- Andrei
btw.
We could get rid of of a sub if we changed slices from holding a pointer and a length to holding two pointers.
|
October 12, 2016 Re: Can you shrink it further? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Stefan Koch | On Wednesday, 12 October 2016 at 13:32:45 UTC, Stefan Koch wrote:
> On Wednesday, 12 October 2016 at 12:46:50 UTC, Andrei Alexandrescu wrote:
>>
>> In the second case, the compiler generates an inc for bumping the pointer and a dec for decreasing the length (small instructions). If the variable char_length is used, add/sub must be used (larger). -- Andrei
When I write code so i can keep the
str = str[1 .. str.length]
It leads to lager code overall and decreases performance.
both in table lookup and in the multi-branch code.
update:
for ldc the table-lookup code is faster 27% then the current popFront.
on dmd the table-lookup is 2% faster then the current popFront.
for ldc the branching code is 23% faster then the current popFront
for dmd the branching code is 20% faster then the current popFront
|
October 12, 2016 Re: Can you shrink it further? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Stefan Koch | On 10/12/2016 09:39 AM, Stefan Koch wrote:
> On Wednesday, 12 October 2016 at 13:32:45 UTC, Stefan Koch wrote:
>> On Wednesday, 12 October 2016 at 12:46:50 UTC, Andrei Alexandrescu wrote:
>>>
>>> In the second case, the compiler generates an inc for bumping the
>>> pointer and a dec for decreasing the length (small instructions). If
>>> the variable char_length is used, add/sub must be used (larger). --
>>> Andrei
>
> When I write code so i can keep the
> str = str[1 .. str.length]
> It leads to lager code overall and decreases performance.
> both in table lookup and in the multi-branch code.
>
> update:
> for ldc the table-lookup code is faster 27% then the current popFront.
> on dmd the table-lookup is 2% faster then the current popFront.
>
> for ldc the branching code is 23% faster then the current popFront
> for dmd the branching code is 20% faster then the current popFront
Thanks! I'd say make sure there is exactly 0% loss on performance compared to the popFront in the ASCII case, and if so make a PR with the table version. -- Andrei
|
October 12, 2016 Re: Can you shrink it further? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Wednesday, 12 October 2016 at 14:12:30 UTC, Andrei Alexandrescu wrote: > On 10/12/2016 09:39 AM, Stefan Koch wrote: > > Thanks! I'd say make sure there is exactly 0% loss on performance compared to the popFront in the ASCII case, and if so make a PR with the table version. -- Andrei I measured again. The table version has a DECREASES the performance for dmd by 1%. I think we should keep performance for dmd in mind. I could add the table version in a version (LDC) block. I cannot currently measure on gdc. I'd appreciate more perf data. |
Copyright © 1999-2021 by the D Language Foundation