Why do the same work about 'IndexOfAny' and 'indexOf' function? (page 3)

On Fri, 09 Jan 2015 13:54:00 +0000
Robert burner Schadek via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com>
wrote:

> On Friday, 9 January 2015 at 13:25:17 UTC, ketmar via Digitalmars-d-learn wrote:
> > if you are *really* concerned with speed here, you'd better consider using
> > regular expressions, as a regular expression can be precompiled and then
> > search for multiple words with only one pass over the source string. i
> > believe that std.regex will use a variation of the Thompson algorithm for
> > regular expressions when it is able to do so.
>
> IMO that is not sound advice. Creating the state machine and running it will be more costly than using canFind or indexOf, which basically only compare char by char.
>
> If speed is really needed, use strstr and check whether it uses SSE to compare multiple chars at a time. Anyway, benchmark and then benchmark some more.
std.regex can use CTFE to compile regular expressions (though it is sometimes
slower than the non-CTFE variant), and i mean that we compile the regexp before
doing a lot of searches, not before each single search. if you have a lot
of words to match or a lot of strings to check, a regexp can give a huge
boost.

sure, it all depends on code patterns.
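To make the two positions concrete, here is a minimal D sketch. The helper names `containsAnyNaive`, `buildKeywordRegex` and `containsAnyRegex` are hypothetical, not from the thread. It contrasts one `canFind` scan per keyword with a single alternation regex that is compiled once and then reused for every input string, which is the usage ketmar describes. It assumes the keywords are plain words with no regex metacharacters, because they are joined without escaping. It only shows that both approaches give the same answers; which one is faster depends on keyword count and input size, so it still needs benchmarking.

```d
// Sketch: per-keyword canFind versus one precompiled alternation regex.
// Assumption: keywords contain no regex metacharacters (no escaping done).
import std.algorithm.searching : any, canFind;
import std.array : join;
import std.regex : matchFirst, regex, Regex;
import std.stdio : writeln;

/// One canFind scan over `text` per keyword.
bool containsAnyNaive(string text, string[] keywords)
{
    return keywords.any!(k => text.canFind(k));
}

/// Builds `(?:kw1|kw2|...)` and compiles it at run time, once.
Regex!char buildKeywordRegex(string[] keywords)
{
    return regex("(?:" ~ keywords.join("|") ~ ")");
}

/// One regex search over `text` covers all keywords.
bool containsAnyRegex(string text, Regex!char re)
{
    return !matchFirst(text, re).empty;
}

void main()
{
    string[] keywords = ["home", "office", "sea", "plane"];
    auto re = buildKeywordRegex(keywords); // compiled once, outside the loop

    string[] lines = [
        "He is in the sea.",
        "She works from the office.",
        "Nothing to see here.",
    ];
    foreach (line; lines)
        writeln(line, " -> canFind: ", containsAnyNaive(line, keywords),
                ", regex: ", containsAnyRegex(line, re));
}
```

The point of `buildKeywordRegex` is that its cost is paid once, so the comparison only makes sense when many strings (or long ones) are checked against the same compiled pattern.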

On Friday, 9 January 2015 at 14:03:21 UTC, ketmar via Digitalmars-d-learn wrote:
> std.regex can use CTFE to compile regular expressions (though it is sometimes
> slower than the non-CTFE variant), and i mean that we compile the regexp before
> doing a lot of searches, not before each single search. if you have a lot
> of words to match or a lot of strings to check, a regexp can give a huge
> boost.
>
> sure, it all depends on code patterns.

Even with CTFE, regex still uses a state machine; _mm256_cmpeq_epi8 will beat that, even for multiple strings. Basically all lexers are handwritten; if regex were fast enough, nobody would do the work.

On Fri, 09 Jan 2015 14:11:49 +0000
Robert burner Schadek via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com>
wrote:

> On Friday, 9 January 2015 at 14:03:21 UTC, ketmar via Digitalmars-d-learn wrote:
>
> > std.regex can use CTFE to compile regular expressions (though it is sometimes
> > slower than the non-CTFE variant), and i mean that we compile the regexp before
> > doing a lot of searches, not before each single search. if you have a lot
> > of words to match or a lot of strings to check, a regexp can give a huge
> > boost.
> >
> > sure, it all depends on code patterns.
>
> Even with CTFE, regex still uses a state machine; _mm256_cmpeq_epi8 will beat that, even for multiple strings. Basically all lexers are handwritten; if regex were fast enough, nobody would do the work.

heh. regexps *are* fast enough. it's hard to beat a well-optimised
generated thingy on a complex grammar. ;-)

On Friday, 9 January 2015 at 14:21:04 UTC, ketmar via Digitalmars-d-learn wrote:
> heh. regexps *are* fast enough. it's hard to beat a well-optimised
> generated thingy on a complex grammar. ;-)

I don't see your point. Anyway, I think he got his help, or at least some help.

On Friday, 9 January 2015 at 14:03:21 UTC, ketmar via Digitalmars-d-learn wrote:
> On Fri, 09 Jan 2015 13:54:00 +0000
> Robert burner Schadek via Digitalmars-d-learn
> <digitalmars-d-learn@puremagic.com> wrote:
>
>> On Friday, 9 January 2015 at 13:25:17 UTC, ketmar via Digitalmars-d-learn wrote:
>> > if you are *really* concerned with speed here, you'd better consider using
>> > regular expressions, as a regular expression can be precompiled and then
>> > search for multiple words with only one pass over the source string. i
>> > believe that std.regex will use a variation of the Thompson algorithm for
>> > regular expressions when it is able to do so.
>>
>> IMO that is not sound advice. Creating the state machine and running it will be more costly than using canFind or indexOf, which basically only compare char by char.
>>
>> If speed is really needed, use strstr and check whether it uses SSE to compare multiple chars at a time. Anyway, benchmark and then benchmark some more.
> std.regex can use CTFE to compile regular expressions (though it is sometimes
> slower than the non-CTFE variant), and i mean that we compile the regexp before
> doing a lot of searches, not before each single search. if you have a lot
> of words to match or a lot of strings to check, a regexp can give a huge
> boost.
>
> sure, it all depends on code patterns.
import std.regex;
auto ctr = ctRegex!(`(home|office|sea|plane)`);
auto c2 = !matchFirst("He is in the sea.", ctr).empty;
----------------------------------------------------------
Tested with:  auto r = benchmark!(f0,f1, f2, f3,f4,f5)(10_0000);

Results:
filter is          42ms 85us
findAmong is       37ms 268us
foreach indexOf is 37ms 841us
canFind is         13ms
canFind indexOf is 39ms 455us
ctRegex is         138ms
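The bodies of `f0` to `f5` were not posted, so the following is only a hypothetical sketch of how such a comparison might be set up in D, with three stand-in functions (`viaCanFind`, `viaIndexOf` and `viaCtRegex` are illustrative names, not the original code). It uses `benchmark` from `std.datetime.stopwatch`, which is where that function lives in current Phobos (in 2015 it was in `std.datetime`). No timings are implied; they depend on compiler, flags and machine.

```d
// Hypothetical benchmark harness: the original f0..f5 were not posted,
// so these stand-ins only illustrate approaches named in the results.
import std.algorithm.searching : canFind;
import std.datetime.stopwatch : benchmark;
import std.regex : ctRegex, matchFirst;
import std.stdio : writefln;
import std.string : indexOf;

// Mutable globals, so the compiler cannot constant-fold the searches.
string sample = "He is in the sea.";
string[] keywords = ["home", "office", "sea", "plane"];
size_t hits; // updated by every benchmarked call so the work is kept

/// Variadic canFind: 0 if no needle occurs, else the 1-based needle index.
bool viaCanFind()
{
    return sample.canFind("home", "office", "sea", "plane") != 0;
}

/// One indexOf scan per keyword, stopping at the first hit.
bool viaIndexOf()
{
    foreach (k; keywords)
        if (sample.indexOf(k) != -1)
            return true;
    return false;
}

/// Pattern compiled at compile time (CTFE), with a non-capturing group.
bool viaCtRegex()
{
    auto re = ctRegex!(`(?:home|office|sea|plane)`);
    return !matchFirst(sample, re).empty;
}

void benchCanFind() { if (viaCanFind()) ++hits; }
void benchIndexOf() { if (viaIndexOf()) ++hits; }
void benchCtRegex() { if (viaCtRegex()) ++hits; }

void main()
{
    // Runs each function n times; returns one Duration per function.
    auto r = benchmark!(benchCanFind, benchIndexOf, benchCtRegex)(100_000);
    writefln("canFind: %s", r[0]);
    writefln("indexOf: %s", r[1]);
    writefln("ctRegex: %s", r[2]);
    writefln("hits:    %s", hits);
}
```

Numbers from such a harness are only meaningful with optimizations enabled (for example `dmd -O -release -inline` or `ldc2 -O3`) and on inputs resembling the real workload.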

On Fri, 09 Jan 2015 15:36:21 +0000
FrankLike via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com>
wrote:

> On Friday, 9 January 2015 at 14:03:21 UTC, ketmar via Digitalmars-d-learn wrote:
> > On Fri, 09 Jan 2015 13:54:00 +0000
> > Robert burner Schadek via Digitalmars-d-learn
> > <digitalmars-d-learn@puremagic.com> wrote:
> >
> >> On Friday, 9 January 2015 at 13:25:17 UTC, ketmar via Digitalmars-d-learn wrote:
> >> > if you are *really* concerned with speed here, you'd better consider using
> >> > regular expressions, as a regular expression can be precompiled and then
> >> > search for multiple words with only one pass over the source string. i
> >> > believe that std.regex will use a variation of the Thompson algorithm for
> >> > regular expressions when it is able to do so.
> >>
> >> IMO that is not sound advice. Creating the state machine and running it will be more costly than using canFind or indexOf, which basically only compare char by char.
> >>
> >> If speed is really needed, use strstr and check whether it uses SSE to compare multiple chars at a time. Anyway, benchmark and then benchmark some more.
> > std.regex can use CTFE to compile regular expressions (though it is sometimes
> > slower than the non-CTFE variant), and i mean that we compile the regexp before
> > doing a lot of searches, not before each single search. if you have a lot
> > of words to match or a lot of strings to check, a regexp can give a huge
> > boost.
> >
> > sure, it all depends on code patterns.
> import std.regex;
> auto ctr = ctRegex!(`(home|office|sea|plane)`);
> auto c2 = !matchFirst("He is in the sea.", ctr).empty;
> ----------------------------------------------------------
> Tested with:  auto r = benchmark!(f0,f1, f2, f3,f4,f5)(10_0000);
>
> Results:
> filter is          42ms 85us
> findAmong is       37ms 268us
> foreach indexOf is 37ms 841us
> canFind is         13ms
> canFind indexOf is 39ms 455us
> ctRegex is         138ms
1. stop doing captures in the regexp; this will speed up the comparison.
2. your sample is very artificial. i was talking about a lot more
keywords and a lot longer strings. sorry, i didn't say that clearly
enough.
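ketmar's first point is a small change to the pattern: `(...)` records a capture group, while `(?:...)` only groups the alternation. A minimal D sketch using the same keywords and sample sentence as the ctRegex post above (the function names `capturingMatch` and `nonCapturingMatch` are hypothetical). It only shows the difference in what gets recorded; how much time dropping the group actually saves is something to measure rather than assume.

```d
// Same alternation with and without a capture group (names hypothetical).
import std.regex : ctRegex, matchFirst;
import std.stdio : writeln;

/// Pattern as posted: the parentheses create capture group 1.
auto capturingMatch(string s)
{
    auto re = ctRegex!(`(home|office|sea|plane)`);
    return matchFirst(s, re);
}

/// Same keywords; `(?:...)` groups the alternation without capturing.
auto nonCapturingMatch(string s)
{
    auto re = ctRegex!(`(?:home|office|sea|plane)`);
    return matchFirst(s, re);
}

void main()
{
    auto c1 = capturingMatch("He is in the sea.");
    auto c2 = nonCapturingMatch("He is in the sea.");
    // Captures.length counts the whole match plus each capture group.
    writeln("capturing:     ", c1.length, " slot(s), hit = ", c1.hit);
    writeln("non-capturing: ", c2.length, " slot(s), hit = ", c2.hit);
}
```

Because the alternation is the whole pattern here, dropping the parentheses entirely (`home|office|sea|plane`) would also avoid the capture group.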

January 09, 2015

Re: Why do the same work about 'IndexOfAny' and 'indexOf' function?

Posted by FrankLike
in reply to ketmar


On Friday, 9 January 2015 at 15:57:21 UTC, ketmar via Digitalmars-d-learn wrote:
> On Fri, 09 Jan 2015 15:36:21 +0000
> FrankLike via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com>
> wrote:
>
>> On Friday, 9 January 2015 at 14:03:21 UTC, ketmar via Digitalmars-d-learn wrote:
>> > On Fri, 09 Jan 2015 13:54:00 +0000
>> > Robert burner Schadek via Digitalmars-d-learn
>> > <digitalmars-d-learn@puremagic.com> wrote:
>> >
>> >> On Friday, 9 January 2015 at 13:25:17 UTC, ketmar via Digitalmars-d-learn wrote:
>> >> > if you are *really* concerned with speed here, you'd better consider using
>> >> > regular expressions, as a regular expression can be precompiled and then
>> >> > search for multiple words with only one pass over the source string. i
>> >> > believe that std.regex will use a variation of the Thompson algorithm for
>> >> > regular expressions when it is able to do so.
>> >> 
>> >> IMO that is not sound advice. Creating the state machine and running it will be more costly than using canFind or indexOf, which basically only compare char by char.
>> >> 
>> >> If speed is really needed, use strstr and check whether it uses SSE to compare multiple chars at a time. Anyway, benchmark and then benchmark some more.
>> > std.regex can use CTFE to compile regular expressions (though it is sometimes
>> > slower than the non-CTFE variant), and i mean that we compile the regexp before
>> > doing a lot of searches, not before each single search. if you have a lot
>> > of words to match or a lot of strings to check, a regexp can give a huge
>> > boost.
>> >
>> > sure, it all depends on code patterns.
>> import std.regex;
>> auto ctr = ctRegex!(`(home|office|sea|plane)`);
>> auto c2 = !matchFirst("He is in the sea.", ctr).empty;
>> ----------------------------------------------------------
>> Tested with:  auto r = benchmark!(f0,f1, f2, f3,f4,f5)(10_0000);
>> 
>> Results:
>> filter is          42ms 85us
>> findAmong is       37ms 268us
>> foreach indexOf is 37ms 841us
>> canFind is         13ms
>> canFind indexOf is 39ms 455us
>> ctRegex is         138ms
> 1. stop doing captures in the regexp; this will speed up the comparison.
> 2. your sample is very artificial. i was talking about a lot more
> keywords and a lot longer strings. sorry, i didn't say that clearly
> enough.

Yes. With a lot more keywords and a lot longer strings, regex will do better.
Thank you.
