How to avoid ctRegex (solved)
August 21, 2016
With compile times of seconds per (character range) pattern, ctRegex slows down compilation like crazy, but it's not obvious how to avoid using it, since Regex!Char is kind of a weird type to spell out. So, here's what I do. I think this is right.

in the module scope, you start with:
auto pattern = ctRegex!"foobar";

and you substitute with:
typeof(regex("")) pattern;
static this() {
  pattern = regex("foobar");
}

That way you don't have to worry about whether to use a Regex!char, or a Regex!dchar, or a Regex!ubyte. It gives you the same functionality, at the cost of a few microseconds of slowdown when running your program. And once you're done debugging, you can always switch back, so...

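// Returns the source for a module-level regex declaration, meant to be mixed in.
// With -debug the regex is built at startup by regex() (quick to compile);
// otherwise it is built by ctRegex (slow to compile).
string defineRegex(string name, string pattern)() {
  import std.string: replace;
  return q{
    debug {
      pragma(msg, "fast $name");
      import std.regex: regex;
      typeof(regex("")) $name;
      static this() {
        $name = regex(`$pattern`);
      }
    } else {
      pragma(msg, "slooow $name");
      import std.regex: ctRegex;
      auto $name = ctRegex!`$pattern`;
    }
  }.replace("$pattern",pattern)
   .replace("$name",name);
}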

mixin(defineRegex!("naword",r"[\W]+"));
mixin(defineRegex!("alnum",r"[a-zA-Z]+"));
mixin(defineRegex!("pattern","foo([a-z]*?)bar"));
mixin(defineRegex!("pattern2","foobar([^0-9z]+)"));

void main() {
}

/*
$ time rdmd -release /tmp/derp.d
slooow naword
slooow alnum
slooow pattern
slooow pattern2
slooow naword
slooow alnum
slooow pattern
slooow pattern2
rdmd -release /tmp/derp.d  17.57s user 1.57s system 82% cpu 23.210 total

$ time rdmd -debug /tmp/derp.d
fast naword
fast alnum
fast pattern
fast pattern2
fast naword
fast alnum
fast pattern
fast pattern2
rdmd -debug /tmp/derp.d  2.92s user 0.37s system 71% cpu 4.623 total
*/

...sure would be nice if you could cache precompiled regular expressions as files.
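
For reference, the `static this()` substitution from the top of this post can be tried on its own. The following is only a minimal sketch: the helper `firstCapture` and the test strings are made up for illustration and are not part of the original code.

```d
import std.regex : matchFirst, regex;

// Module-level declaration; typeof(regex("")) avoids spelling out Regex!char.
typeof(regex("")) pattern;

// Module constructor: runs at program start-up, before main(),
// so the pattern is compiled at run time rather than by ctRegex.
static this() {
    pattern = regex("foo([a-z]*?)bar");
}

// Hypothetical helper: first capture group, or null when nothing matches.
string firstCapture(string text) {
    auto m = matchFirst(text, pattern);
    return m.empty ? null : m[1];
}

void main() {
    assert(firstCapture("xx fooabcbar yy") == "abc");
    assert(firstCapture("no match here") is null);
}
```

Since `pattern` is an ordinary thread-local global, the module constructor runs once per thread, which is where the small run-time cost mentioned above comes from.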
August 21, 2016
On 08/21/2016 10:06 PM, cy wrote:
> in the module scope, you start with:
> auto pattern = ctRegex!"foobar";
>
> and you substitute with:
> typeof(regex("")) pattern;
> static this() {
>   pattern = regex("foobar");
> }

I may be missing the point here, but just putting `auto pattern = regex("foobar");` at module level works for me.
August 23, 2016
On Sunday, 21 August 2016 at 21:18:11 UTC, ag0aep6g wrote:

> I may be missing the point here, but just putting `auto pattern = regex("foobar");` at module level works for me.

Really? I thought global variables could only be initialized with static stuff available during compile time, and you needed a "static this() {}" block to initialize them otherwise.
August 23, 2016
On 08/23/2016 06:06 AM, cy wrote:
> On Sunday, 21 August 2016 at 21:18:11 UTC, ag0aep6g wrote:
>
>> I may be missing the point here, but just putting `auto pattern =
>> regex("foobar");` at module level works for me.
>
> Really? I thought global variables could only be initialized with static
> stuff available during compile time, and you needed a "static this() {}"
> block to initialize them otherwise.

That's true, and apparently `regex("foobar")` can be evaluated at compile time.
August 24, 2016
On Tuesday, 23 August 2016 at 04:51:19 UTC, ag0aep6g wrote:

> That's true, and apparently `regex("foobar")` can be evaluated at compile time.

Then what's ctRegex in there for at all...?

August 24, 2016
On 08/24/2016 03:07 AM, cy wrote:
> Then what's ctRegex in there for at all...?

Optimization.

ctRegex requires that the pattern is available as a compile time constant. It uses that property to "generate optimized native machine code".

The plain regex function doesn't have such a requirement. It also works with a pattern that's generated at run time, e.g. from user input. But you can use it with a compile time constant, too. And it works in CTFE then, but it does not "generate optimized native machine code".
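
A small sketch of the difference, assuming nothing beyond `std.regex` itself. The helper names `captureRuntime` and `captureStatic` are hypothetical, and both assume the pattern has at least one capture group.

```d
import std.regex : ctRegex, matchFirst, regex;

// Hypothetical helper: the pattern is only known at run time (think user
// input), so only the plain regex function can be used here.
string captureRuntime(string text, string userPattern) {
    auto re = regex(userPattern);
    auto m = matchFirst(text, re);
    return m.empty ? null : m[1];
}

// Hypothetical helper: the pattern is a compile-time constant,
// which is what ctRegex requires as its template argument.
string captureStatic(string text) {
    auto re = ctRegex!(`foo([a-z]*?)bar`);
    auto m = matchFirst(text, re);
    return m.empty ? null : m[1];
}

void main() {
    assert(captureRuntime("fooxyzbar", `foo([a-z]*?)bar`) == "xyz");
    assert(captureStatic("fooxyzbar") == "xyz");
}
```

Both helpers give the same matches; the difference is in when the pattern is processed and how the matcher is generated, not in what matches.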
August 24, 2016
On Wednesday, 24 August 2016 at 05:29:57 UTC, ag0aep6g wrote:
> On 08/24/2016 03:07 AM, cy wrote:
>> Then what's ctRegex in there for at all...?
>
> Optimization.
>
> ctRegex requires that the pattern is available as a compile time constant. It uses that property to "generate optimized native machine code".
>
> The plain regex function doesn't have such a requirement. It also works with a pattern that's generated at run time, e.g. from user input. But you can use it with a compile time constant, too. And it works in CTFE then, but it does not "generate optimized native machine code".

Yep, that's why ctRegex is 2x faster than the highly-tuned grep, e.g.

https://github.com/dlang/phobos/pull/4286
August 27, 2016
On Wednesday, 24 August 2016 at 05:29:57 UTC, ag0aep6g wrote:
> The plain regex function doesn't have such a requirement. It also works with a pattern that's generated at run time, e.g. from user input. But you can use it with a compile time constant, too. And it works in CTFE then, but it does not "generate optimized native machine code".

It's not using it with a compile time constant that struck me as weird. It's using it to assign a global variable that struck me as weird.

When I saw `auto a = b;` at the module level, I thought that b had to be something you could evaluate at compile time. But I guess it can be a runtime calculated value, acting like it was assigned in a static this() clause, and the requirement for it to be compile time generated is only for immutable? like `immutable auto a = b`?
August 27, 2016
On Saturday, 27 August 2016 at 17:35:04 UTC, cy wrote:
> On Wednesday, 24 August 2016 at 05:29:57 UTC, ag0aep6g wrote:
>> The plain regex function doesn't have such a requirement. It also works with a pattern that's generated at run time, e.g. from user input. But you can use it with a compile time constant, too. And it works in CTFE then, but it does not "generate optimized native machine code".
>
> It's not using it with a compile time constant that struck me as weird. It's using it to assign a global variable that struck me as weird.

But the actual value of that Regex struct is perfectly known at compile time. Thus it is possible and fine to use it as an initializer. You can use any struct or class as an initializer if it can be computed at compile time.
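
To illustrate with something smaller than a Regex, here is a sketch in which an ordinary function is used as a module-level initializer and therefore gets run through CTFE. The names `Squares` and `makeSquares` are invented for this example. Note that the initializer of a module-level variable has to be computable at compile time whether or not the variable is immutable.

```d
struct Squares {
    int[] values;
}

// An ordinary function; nothing marks it as compile-time only.
Squares makeSquares(int n) {
    Squares s;
    foreach (i; 0 .. n)
        s.values ~= i * i;
    return s;
}

// Module-level initializer: the compiler evaluates makeSquares(5) via CTFE
// and stores the resulting value in the binary.
auto squares = makeSquares(5);

void main(string[] args) {
    assert(squares.values == [0, 1, 4, 9, 16]);

    // The same function also works with a value only known at run time.
    auto n = cast(int) args.length + 2;
    assert(makeSquares(n).values.length == n);
}
```

If the function could not be evaluated at compile time (for example, because it reads a file or an environment variable), the module-level initializer would be rejected, and a `static this()` block would be the place to do that work instead.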
August 27, 2016
On Saturday, 27 August 2016 at 17:47:33 UTC, Dicebot wrote:
> But the actual value of that Regex struct is perfectly known at compile time. Thus it is possible and fine to use it as an initializer. You can use any struct or class as an initializer if it can be computed at compile time.

Yes, regex() is CTFEable, but this still comes at a significant compile-time cost as the constructor does quite a bit of string manipulation, etc. I've seen this, i.e. inconsiderate use of regex() globals, cost tens of seconds in build time for bigger codebases.

 — David