November 10, 2013 Combining decoding and matching
Following up on the D parsing thread, I had some time to experiment with decode-less matching of the full Unicode code point range. The end result is very pleasant; I'm still benchmarking it, but it already shows a great speed-up.

Without further ado:

Pull & peek at preliminary results
https://github.com/D-Programming-Language/phobos/pull/1685

Docs
http://blackwhale.github.io/phobos/std_uni.html#MatcherConcept

Caveats: like its 'backend', the std.uni Trie, it suffers from poor performance on DMD. Kudos to the LDC team, or I'd have given up on the whole idea.

--
Dmitry Olshansky
November 16, 2013 Re: Combining decoding and matching
Posted in reply to Dmitry Olshansky

Dmitry Olshansky:

> Pull & peek at preliminary results
> https://github.com/D-Programming-Language/phobos/pull/1685
>
> Docs
> http://blackwhale.github.io/phobos/std_uni.html#MatcherConcept

Good. Are those ideas usable for other Phobos functions, like group?

http://forum.dlang.org/thread/snnmkdmhxouqjqaneshu@forum.dlang.org?page=3#post-crnqodahnxjtuoqzisxw:40forum.dlang.org

Bye,
bearophile
November 16, 2013 Re: Combining decoding and matching
Posted in reply to bearophile

16-Nov-2013 17:02, bearophile wrote:
> Dmitry Olshansky:
>
>> Pull & peek at preliminary results
>> https://github.com/D-Programming-Language/phobos/pull/1685
>>
>> Docs
>> http://blackwhale.github.io/phobos/std_uni.html#MatcherConcept
>
> Good. Are those ideas usable for other Phobos functions, like group?
>
> http://forum.dlang.org/thread/snnmkdmhxouqjqaneshu@forum.dlang.org?page=3#post-crnqodahnxjtuoqzisxw:40forum.dlang.org
>

Directly? No. It was all about preparing a matcher for a set of code points in advance, using 4 distinct tables (for UTF-8), one per encoded length.

As to group, it has to find runs of identical items. It can be sped up for Unicode if you take into account 2 simple tricks:
- you don't need to decode: just identify the size of the current dchar (stride) and see how many repetitions of it follow;
- special-case the situation where the current (w)char is ASCII (or BMP for UTF-16) so as to speed up counting (1 char vs. a variable-length slice of 1-4 chars; ditto with wchar)

>
> Bye,
> bearophile

--
Dmitry Olshansky
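
For illustration, the two tricks for group could be sketched roughly as below. This is only a minimal sketch, not the code from the pull request: `Run` and `groupNoDecode` are hypothetical names made up for this example, the input is assumed to be valid UTF-8, and the only library call used is `std.utf.stride`.

```d
import std.utf : stride;

/// Hypothetical result type (not part of Phobos): one run of equal code points.
struct Run
{
    string item;   // the encoded code point, a slice of 1-4 code units
    size_t count;  // number of consecutive repetitions
}

/// Hypothetical helper (not part of Phobos): groups runs of identical
/// code points in a UTF-8 string without decoding any of them.
/// Assumes `s` is valid UTF-8.
Run[] groupNoDecode(string s)
{
    Run[] runs;
    size_t i = 0;
    while (i < s.length)
    {
        // Trick 1: only the encoded length is needed, not the decoded value.
        // ASCII code units are always exactly one code unit long.
        immutable size_t len = s[i] < 0x80 ? 1 : stride(s, i);
        string item = s[i .. i + len];
        size_t j = i + len;
        if (len == 1)
        {
            // Trick 2: ASCII fast path, compare single code units.
            while (j < s.length && s[j] == item[0])
                ++j;
        }
        else
        {
            // General path: compare whole encoded slices of `len` code units.
            while (j + len <= s.length && s[j .. j + len] == item)
                j += len;
        }
        runs ~= Run(item, (j - i) / len);
        i = j;
    }
    return runs;
}

unittest
{
    auto r = groupNoDecode("aaбббb");
    assert(r.length == 3);
    assert(r[0].item == "a" && r[0].count == 2);
    assert(r[1].item == "б" && r[1].count == 3);
    assert(r[2].item == "b" && r[2].count == 1);
}
```

Comparing raw slices is enough here because, in valid UTF-8, the lead code unit fixes the encoded length, so an equal slice starting at a code point boundary is the same code point. The same idea should carry over to wchar, with the BMP check taking the place of the ASCII check.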