March 25, 2014 Re: Challenge: write a really really small front() for UTF8
Posted in reply to Andrei Alexandrescu
On 24.03.2014 17:44, Andrei Alexandrescu wrote:
> On 3/24/14, 5:51 AM, w0rp wrote:
>> On Monday, 24 March 2014 at 09:02:19 UTC, monarch_dodra wrote:
>>> On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:
>>>> Here's a baseline: http://goo.gl/91vIGc. Destroy!
>>>>
>>>> Andrei
>>>
>>> Before we roll this out, could we discuss a strategy/guideline in
>>> regards to detecting and handling invalid UTF sequences?
>>>
>>> Having a fast "front" is fine and all, but if it means your program
>>> asserting in release (or worse, silently corrupting memory) just
>>> because the client was trying to read a bad text file, I'm unsure this
>>> is acceptable.
>>
>> I would strongly advise to at least offer an option
>
> Options are fine for functions etc. But front would need to find an
> all-around good compromise between speed and correctness.
>
> Andrei
>
b"\255".decode("utf-8", errors="strict")   # UnicodeDecodeError
b"\255".decode("utf-8", errors="replace")  # replacement character used
b"\255".decode("utf-8", errors="ignore")   # empty string, invalid sequence removed
I think there should be a base range for UTF-8 iteration with policy-based error handling (as in Python), plus some variants derived from this base range with different error behavior. One of these would become the Phobos standard, i.e. the default parameter, so it is still switchable.
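A minimal sketch of what such a policy-parameterised range could look like in D. The names `OnError`, `Utf8Range` and `collect` are hypothetical and not part of Phobos; only `std.utf.decode` and `UTFException` are real. The replace and ignore policies here act on one byte at a time, which is a simplification and may differ in detail from Python's handlers.

```d
import std.utf : decode, UTFException;

// Hypothetical policy enum mirroring Python's errors="strict"/"replace"/"ignore".
enum OnError { strict, replace, ignore }

// Hypothetical base range; the error policy is a template parameter with a default.
struct Utf8Range(OnError policy = OnError.strict)
{
    private const(char)[] s;
    private dchar cur;      // current decoded code point
    private size_t len;     // number of bytes `cur` occupies in `s`
    private bool isEmpty;

    this(const(char)[] s) { this.s = s; advance(); }

    @property bool empty() const { return isEmpty; }
    @property dchar front() const { return cur; }
    void popFront() { s = s[len .. $]; advance(); }

    private void advance()
    {
        while (s.length)
        {
            size_t i = 0;
            try
            {
                cur = decode(s, i);  // throws UTFException on invalid input
                len = i;
                return;
            }
            catch (UTFException e)
            {
                static if (policy == OnError.strict)
                    throw e;                          // propagate the error
                else static if (policy == OnError.replace)
                {
                    cur = '\uFFFD';                   // substitute U+FFFD
                    len = 1;                          // and consume one bad byte
                    return;
                }
                else
                    s = s[1 .. $];                    // ignore: drop one byte, retry
            }
        }
        isEmpty = true;
    }
}

// Small helper that drains a range of dchar into a dstring.
dstring collect(R)(R r)
{
    dstring result;
    foreach (c; r)
        result ~= c;
    return result;
}

unittest
{
    const(char)[] bad = ['a', '\xFF', 'b'];  // 0xFF never occurs in valid UTF-8
    assert(collect(Utf8Range!(OnError.replace)(bad)) == "a\uFFFDb"d);
    assert(collect(Utf8Range!(OnError.ignore)(bad)) == "ab"d);

    bool threw = false;
    try { collect(Utf8Range!(OnError.strict)(bad)); }
    catch (UTFException) { threw = true; }
    assert(threw);
}
```

Because the policy is a compile-time parameter, each instantiation only contains the code for its own error path, and `Utf8Range!()` picks whatever default is chosen. The example can be run with `dmd -unittest -main -run file.d`.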
March 25, 2014 Re: Challenge: write a really really small front() for UTF8
Posted in reply to Daniel N
On 25 March 2014 00:04, Daniel N <ufo@orbiting.us> wrote:
> On Monday, 24 March 2014 at 12:21:55 UTC, Daniel N wrote:
>>
>> I'm currently too busy to submit a complete solution, but please feel free to use my idea if you think it sounds promising.
>
>
> I now managed to dig up my old C source... but I'm still blocked by dmd not accepting the 'pext' instruction...
>
> 1) I know my solution is not directly comparable to the rest in this
> thread(for many reasons).
> 2) It's of course trivial to add a fast path for ascii... if desired.
> 3) It throws safety and standards out the window.
>
4) It's tied to one piece of hardware.
No Thankee.
March 25, 2014 Re: Challenge: write a really really small front() for UTF8
Posted in reply to Iain Buclaw
On 3/25/2014 4:00 AM, Iain Buclaw wrote:
> On 25 March 2014 00:04, Daniel N <ufo@orbiting.us> wrote:
>> On Monday, 24 March 2014 at 12:21:55 UTC, Daniel N wrote:
>>>
>>> I'm currently too busy to submit a complete solution, but please feel free
>>> to use my idea if you think it sounds promising.
>>
>>
>> I now managed to dig up my old C source... but I'm still blocked by dmd not
>> accepting the 'pext' instruction...
>>
>> 1) I know my solution is not directly comparable to the rest in this
>> thread(for many reasons).
>> 2) It's of course trivial to add a fast path for ascii... if desired.
>> 3) It throws safety and standards out the window.
>>
>
>
> 4) It's tied to one piece of hardware.
>
> No Thankee.
>
bool supportCpuFeatureX;

void main() {
    supportCpuFeatureX = detectCpuFeatureX();
    doStuff();
}

void doStuff() {
    if(supportCpuFeatureX)
        doStuff_FeatureX();
    else
        doStuff_Fallback();
}

> dmd -inline blah.d
March 25, 2014 Re: Challenge: write a really really small front() for UTF8
Posted in reply to Nick Sabalausky
On 25.03.2014 11:38, Nick Sabalausky wrote:
> On 3/25/2014 4:00 AM, Iain Buclaw wrote:
>> On 25 March 2014 00:04, Daniel N <ufo@orbiting.us> wrote:
>>> On Monday, 24 March 2014 at 12:21:55 UTC, Daniel N wrote:
>>>>
>>>> I'm currently too busy to submit a complete solution, but please feel free
>>>> to use my idea if you think it sounds promising.
>>>
>>>
>>> I now managed to dig up my old C source... but I'm still blocked by dmd not
>>> accepting the 'pext' instruction...
>>>
>>> 1) I know my solution is not directly comparable to the rest in this
>>> thread(for many reasons).
>>> 2) It's of course trivial to add a fast path for ascii... if desired.
>>> 3) It throws safety and standards out the window.
>>>
>>
>>
>> 4) It's tied to one piece of hardware.
>>
>> No Thankee.
> void doStuff() {
>     if(supportCpuFeatureX)
>         doStuff_FeatureX();
>     else
>         doStuff_Fallback();
> }
>
> > dmd -inline blah.d
the extra branch could kill the performance benefit if doStuff is too small
March 25, 2014 Re: Challenge: write a really really small front() for UTF8
Posted in reply to dennis luehring
On Tuesday, 25 March 2014 at 10:42:59 UTC, dennis luehring wrote:
>> void doStuff() {
>>     if(supportCpuFeatureX)
>>         doStuff_FeatureX();
>>     else
>>         doStuff_Fallback();
>> }
>>
>> > dmd -inline blah.d
>
> the extra branch could kill the performance benefit if doStuff is too small
You'd simply have to hoist the condition outside the inner loop.
Furthermore, since the flag never changes after startup, the branch would be predicted correctly essentially every time; only unpredictable branches are terrible.
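As a rough illustration of hoisting the check, the sketch below tests the CPU feature once per call and then runs a whole loop in the selected kernel, instead of branching per element. The function names are hypothetical; `core.cpuid.hasPopcnt` and `core.bitop.popcnt` are real druntime symbols, used here only as a stand-in for "feature X". Whether `popcnt` actually compiles down to the hardware instruction depends on the compiler and flags.

```d
import core.bitop : popcnt;
import core.cpuid : hasPopcnt;

// Hypothetical "feature X" kernel: the whole loop lives inside it.
size_t countBitsFeatureX(const(uint)[] data)
{
    size_t n = 0;
    foreach (w; data)
        n += popcnt(w);
    return n;
}

// Hypothetical portable fallback: counts set bits one at a time.
size_t countBitsFallback(const(uint)[] data)
{
    size_t n = 0;
    foreach (w; data)
    {
        uint v = w;          // mutable copy of the element
        while (v)
        {
            n += v & 1;
            v >>= 1;
        }
    }
    return n;
}

// The feature test is evaluated once per call, outside the inner loop.
size_t countBits(const(uint)[] data)
{
    return hasPopcnt ? countBitsFeatureX(data) : countBitsFallback(data);
}

unittest
{
    uint[] data = [0b1011u, 0xFFFF_FFFFu, 0u];   // 3 + 32 + 0 set bits
    assert(countBitsFeatureX(data) == 35);
    assert(countBitsFallback(data) == 35);
    assert(countBits(data) == 35);
}
```

The cost of the check is then amortised over the length of `data` rather than paid on every element.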
March 25, 2014 Re: Challenge: write a really really small front() for UTF8
Posted in reply to Andrei Alexandrescu
dchar front(char[] s) {
    dchar c = s[0];
    if (!(c & 0x80))
        return c;
    byte b = (c >> 4) & 3;
    b += !b;
    c &= 63 >> b;
    char *p = s.ptr;
    do {
        p++;
        c = c << 6 | *p & 63;
    } while(--b);
    return c;
}
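For readers following the bit trick in the post above, here is a more spelled-out sketch of the same idea. The name `frontSketch` is hypothetical, and this version indexes the slice (so it is bounds-checked) rather than walking a raw pointer. Like the posted code, it assumes the input is non-empty, valid UTF-8 and does no validation.

```d
// Hypothetical re-spelling of the trick; assumes `s` is non-empty, valid UTF-8.
dchar frontSketch(const(char)[] s)
{
    uint c = s[0];
    if (!(c & 0x80))
        return cast(dchar) c;           // ASCII: single byte

    // Bits 5..4 of the lead byte give the number of continuation bytes:
    // 110x_xxxx -> 00 or 01 -> 1, 1110_xxxx -> 10 -> 2, 1111_0xxx -> 11 -> 3.
    uint n = (c >> 4) & 3;
    if (n == 0)
        n = 1;

    c &= 0x3F >> n;                     // keep only the payload bits of the lead byte

    foreach (i; 1 .. n + 1)
        c = (c << 6) | (s[i] & 0x3F);   // append 6 payload bits per continuation byte

    return cast(dchar) c;
}

unittest
{
    assert(frontSketch("a") == 'a');                    // 1 byte
    assert(frontSketch("\u00E9x") == '\u00E9');         // 2 bytes
    assert(frontSketch("\u20ACx") == '\u20AC');         // 3 bytes
    assert(frontSketch("\U0001F600") == '\U0001F600');  // 4 bytes
}
```

On invalid input (a stray continuation byte, or a truncated sequence) this sketch would either return a meaningless code point or fail the bounds check, which is exactly the trade-off discussed earlier in the thread.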
March 25, 2014 Re: Challenge: write a really really small front() for UTF8
Posted in reply to Dmitry Olshansky
On 24-Mar-2014 23:53, Dmitry Olshansky wrote:
> On 24-Mar-2014 01:22, Andrei Alexandrescu wrote:
>> Here's a baseline: http://goo.gl/91vIGc. Destroy!
>>
>> Andrei
>
> I had to join the party at some point.
> This seems like 25 instructions:
> http://goo.gl/N7sHtK
>
Interestingly gdc-4.8 produces better results.
http://goo.gl/1R7GMs
--
Dmitry Olshansky
March 26, 2014 Re: Challenge: write a really really small front() for UTF8
Posted in reply to dennis luehring
On 2014-03-25 11:42, dennis luehring wrote:
> On 25.03.2014 11:38, Nick Sabalausky wrote:
>> On 3/25/2014 4:00 AM, Iain Buclaw wrote:
>>> On 25 March 2014 00:04, Daniel N <ufo@orbiting.us> wrote:
>>>> On Monday, 24 March 2014 at 12:21:55 UTC, Daniel N wrote:
>>>>>
>>>>> I'm currently too busy to submit a complete solution, but please
>>>>> feel free
>>>>> to use my idea if you think it sounds promising.
>>>>
>>>>
>>>> I now managed to dig up my old C source... but I'm still blocked by
>>>> dmd not
>>>> accepting the 'pext' instruction...
>>>>
>>>> 1) I know my solution is not directly comparable to the rest in this
>>>> thread(for many reasons).
>>>> 2) It's of course trivial to add a fast path for ascii... if desired.
>>>> 3) It throws safety and standards out the window.
>>>>
>>>
>>>
>>> 4) It's tied to one piece of hardware.
>>>
>>> No Thankee.
>> void doStuff() {
>>     if(supportCpuFeatureX)
>>         doStuff_FeatureX();
>>     else
>>         doStuff_Fallback();
>> }
>>
>> > dmd -inline blah.d
>
> the extra branch could kill the performance benefit if doStuff is too small
void function() doStuff;

void main() {
    auto supportCpuFeatureX = detectCpuFeatureX();
    if (supportCpuFeatureX)
        doStuff = &doStuff_FeatureX;
    else
        doStuff = &doStuff_Fallback;
}
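A self-contained variant of the function-pointer approach might look like the sketch below. The two implementations and `detectCpuFeatureX` are hypothetical stubs (the detector simply returns `false` here; a real one might consult `core.cpuid`), and the pointer is set in a module constructor instead of `main` so that it is initialised before any caller runs.

```d
// Hypothetical stand-ins for the functions named in the post.
string doStuff_FeatureX() { return "feature X"; }
string doStuff_Fallback() { return "fallback"; }

// Hypothetical detector stub; always reports the feature as missing.
bool detectCpuFeatureX() { return false; }

// Module-level function pointer, thread-local by default in D.
string function() doStuff;

// Thread-local module constructor: runs once per thread, before main,
// so every thread's copy of the pointer is set before use.
static this()
{
    doStuff = detectCpuFeatureX() ? &doStuff_FeatureX : &doStuff_Fallback;
}

unittest
{
    assert(doStuff !is null);
    assert(doStuff() == "fallback");   // the stub detector returns false
}
```

This removes the branch from the call site, but calls through a runtime pointer generally cannot be inlined, so for something as small as `front` the indirect call may cost more than the well-predicted branch it replaces.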
March 26, 2014 Re: Challenge: write a really really small front() for UTF8
Posted in reply to Piotr Szturmaj
http://goo.gl/4RSWhr
Only 3 ifs.
Copyright © 1999-2021 by the D Language Foundation