May 27, 2013
On 05/26/2013 10:49 PM, Dmitry Olshansky wrote:
> If there is anything that came out of the UTF-8 discussion, it is that I decided
> to dust off my experimental implementation of a UTF-8 stride function.
> Just for fun.
>
> The key difference vs std is in handling the non-ASCII case.
> I'm replacing the bsr intrinsic with what I call an "in-register lookup
> table" (neat stuff that is a piece of cake, thx to CTFE).
>
> See unittest/benchmark here:
> https://gist.github.com/blackwhale/5653927
>
Looks promising.

> Test files I used:
> https://github.com/blackwhale/gsoc-bench-2012/blob/master/arwiki-latest-all-titles-in-ns0
>
> https://github.com/blackwhale/gsoc-bench-2012/blob/master/dewiki-latest-all-titles-in-ns0
>
> https://github.com/blackwhale/gsoc-bench-2012/blob/master/ruwiki-latest-all-titles-in-ns0
>
These are huge, and performance is most likely limited by memory bandwidth.

May 27, 2013
27-May-2013 23:21, Martin Nowak wrote:
>
> On 05/26/2013 10:49 PM, Dmitry Olshansky wrote:
>  > If there is anything that came out of the UTF-8 discussion, it is that I decided
>  > to dust off my experimental implementation of a UTF-8 stride function.
>  > Just for fun.
>  >
>  > The key difference vs std is in handling the non-ASCII case.
>  > I'm replacing the bsr intrinsic with what I call an "in-register lookup
>  > table" (neat stuff that is a piece of cake, thx to CTFE).
>  >
>  > See unittest/benchmark here:
>  > https://gist.github.com/blackwhale/5653927
>  >
> Looks promising.

Cool, I'm not alone in this :)

The only definitive result so far is that it takes fewer cycles on 32-bit. For me, AMD CodeAnalyst confirms this: measured in actual cycles, it is up to 33% fewer with smaller samples run in a loop. The ASCII-only case seems to stay more or less the same (at least cycle-wise, though not in time...), which saves my sanity.

>
> These are huge and most likely the performance is limited by the memory
> bandwidth.
>

That could be it. I'll take measurements on smaller samples of said files and spin on them. More tests to come tomorrow.


-- 
Dmitry Olshansky
May 27, 2013
On 05/27/2013 09:21 PM, Martin Nowak wrote:
>  > See unittest/benchmark here:
>  > https://gist.github.com/blackwhale/5653927
>  >
> Looks promising.

This will not detect 0xFF as an invalid UTF-8 sequence.
For 5- or 6-byte sequences, which aren't used for Unicode, it will return a stride of 4.
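
To make the problem concrete, here is a minimal sketch of how a table indexed by only three bits of the lead byte would behave. It assumes the index is taken from bits 6..4 of the byte; that is a guess for illustration, not the code from the gist, and `buildTable3`, `table3` and `stride3` are made-up names.

```d
// Hypothetical 8-slot "in-register" table, built at compile time via CTFE.
// Slot i (4 bits wide) holds the stride for non-ASCII bytes whose
// bits 6..4 equal i. This is an illustration, not the gist's code.
uint buildTable3()
{
    uint t = 0;
    foreach (i; 0 .. 8)
    {
        uint stride = i < 4  ? 0   // 0b1000..0b1011: continuation byte, not a valid lead
                    : i < 6  ? 2   // 0b1100, 0b1101: 2-byte lead
                    : i == 6 ? 3   // 0b1110: 3-byte lead
                    : 4;           // 0b1111: 0xF0..0xFF all land in this slot
        t |= stride << (i * 4);
    }
    return t;
}

enum uint table3 = buildTable3();

// Returns the stride for lead byte c, or 0 if the table marks it invalid.
uint stride3(ubyte c)
{
    if (c < 0x80)
        return 1; // ASCII fast path
    return (table3 >> (((c >> 4) & 7) * 4)) & 0xF;
}

// With only three index bits, 0xF8 and 0xFF cannot be told apart from 0xF0.
static assert(stride3(0xF0) == 4);
static assert(stride3(0xF8) == 4);
static assert(stride3(0xFF) == 4);
```

Under that assumption, every byte from 0xF0 to 0xFF shares one slot, so the 5- and 6-byte lead bytes and 0xFF all come back with a stride of 4.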

May 28, 2013
28-May-2013 00:42, Martin Nowak wrote:
> On 05/27/2013 09:21 PM, Martin Nowak wrote:
>>  > See unittest/benchmark here:
>>  > https://gist.github.com/blackwhale/5653927
>>  >
>> Looks promising.
>
> This will not detect 0xFF as an invalid UTF-8 sequence.
> For 5- or 6-byte sequences, which aren't used for Unicode, it will
> return a stride of 4.
>

First of all, there is a minor bug in std.utf in the sense that it accepts sequences of 5 and 6 bytes. These are explicitly not defined by the Unicode standard and should be rejected as invalid UTF-8 as well.

OK, I just need to consider the next bit, making the whole mask 4 bits wide. Thus I need 16 slots in a register.

The 64-bit version will fit just fine in a register: 4 bits * 16 slots = 64 bits.
The 32-bit version will have to pack 2 bits per slot and add 1 afterwards.
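
Roughly, the 64-bit layout could look like the sketch below. It assumes the 4-bit index comes from bits 6..3 of the lead byte and that a slot value of 0 means "invalid"; the names `buildTable4`, `table4` and `stride4` are made up for illustration, and this is not necessarily what fast_stride.d does. The 2-bit packed 32-bit variant is not shown.

```d
// Hypothetical 16-slot table packed into one 64-bit value (4 bits per slot),
// built at compile time via CTFE. Slot i holds the stride for non-ASCII
// bytes whose bits 6..3 equal i; 0 marks an invalid lead byte.
ulong buildTable4()
{
    ulong t = 0;
    foreach (i; 0 .. 16)
    {
        ulong stride = i < 8   ? 0   // 0x80..0xBF: continuation bytes
                     : i < 12  ? 2   // 0xC0..0xDF: 2-byte lead
                     : i < 14  ? 3   // 0xE0..0xEF: 3-byte lead
                     : i == 14 ? 4   // 0xF0..0xF7: 4-byte lead
                     : 0;            // 0xF8..0xFF: 5/6-byte leads and 0xFF, invalid
        t |= stride << (i * 4);
    }
    return t;
}

enum ulong table4 = buildTable4();

// Returns the stride for lead byte c, or 0 if the table marks it invalid.
// Note: overlong leads 0xC0/0xC1 and 0xF5..0xF7 are not rejected here.
uint stride4(ubyte c)
{
    if (c < 0x80)
        return 1; // ASCII fast path
    return cast(uint)((table4 >> (((c >> 3) & 0xF) * 4)) & 0xF);
}

// The extra index bit separates 0xF0..0xF7 from 0xF8..0xFF.
static assert(stride4(0xF0) == 4);
static assert(stride4(0xF8) == 0);
static assert(stride4(0xFF) == 0);
```

A caller would treat a return value of 0 as the signal to throw or otherwise report invalid UTF-8.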

Here is an updated version that I'm testing again:
https://github.com/blackwhale/gsoc-bench-2012/blob/master/fast_stride.d

-- 
Dmitry Olshansky