December 17

So I’ve been working on rewind-regex, trying to correct all of the decisions in the original engine that slowed it down and dropping some features that I know I cannot implement efficiently (backreferences have to go).

So while I’m obsessed with simplicity and speed, I thought I’d ask people whether that is an issue and what they really want from a gen2 regex library.


Dmitry Olshansky
CEO @ Glowlabs
https://olshansky.me

December 17

On Sunday, 17 December 2023 at 15:43:22 UTC, Dmitry Olshansky wrote:

> So I’ve been working on rewind-regex trying to correct all of the decisions in the original engine that slowed it down, dropping some features that I knew I cannot implement efficiently (backreferences have to go).
>
> So while I’m obsessed with simplicity and speed I thought I’d ask people if it was an issue and what they really want from gen2 regex library.



I have never needed backreferences in the last 25 years. Something with performance and scaling similar to RE2, Plan 9 grep, or the original awk (modulo Unicode) would be best. If somebody needs backreferences, they can use PCRE or some third-party library.

December 18

On Sunday, 17 December 2023 at 17:40:21 UTC, Witold Baryluk wrote:

> On Sunday, 17 December 2023 at 15:43:22 UTC, Dmitry Olshansky wrote:
>
>> So I’ve been working on rewind-regex trying to correct all of the decisions in the original engine that slowed it down, dropping some features that I knew I cannot implement efficiently (backreferences have to go).
>>
>> So while I’m obsessed with simplicity and speed I thought I’d ask people if it was an issue and what they really want from gen2 regex library.



> I have never needed backreferences in the last 25 years. Something with performance and scaling similar to RE2, Plan 9 grep, or the original awk (modulo Unicode) would be best.

Yes, RE2 is the main contender; rewind-regex aims at basically the same feature set with the same or better performance.

> If somebody needs backreferences, they can use PCRE or some third-party library.

Same thoughts here.


Dmitry Olshansky
CEO @ Glowlabs
https://olshansky.me

December 18

On Sunday, 17 December 2023 at 15:43:22 UTC, Dmitry Olshansky wrote:

>

[...]

I really like regex as a thing, but I have had to drop it because of the increased compilation memory and time requirements that use of std.regex incurred. Maybe that can't be avoided, I don't know.

I have not had much use for backreferences, so to me the important part is that it doesn't require 300+ MB more RAM and an extra second or two on the wall clock to compile.

December 18
On 18/12/2023 10:20 PM, Anonymouse wrote:
> I really like regex as a thing, but I have had to drop it because of the increased compilation memory and time requirements that use of std.regex incurred. Maybe that can't be avoided, I don't know.
> 
> I have not had much use for backreferences, so to me the important part is that it doesn't require 300+ MB more RAM and an extra second or two on the wall clock to compile.

As long as you use std.regex at runtime (that is, not through ctRegex), this is no longer the case.

https://github.com/dlang/phobos/pull/8699

https://github.com/dlang/phobos/pull/8698

Looks like I messed up and did not get it into the changelog.

Anyway, 2.103 should be the release that has them in it.
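
For readers skimming the thread, here is a minimal sketch of what runtime use looks like, assuming it means building the pattern with `regex` instead of `ctRegex`. The function name `lastPathSegment` and the pattern are illustrative only and do not come from the linked pull requests.

```d
import std.regex;

// Hypothetical helper: the pattern is compiled when the program runs,
// so the compiler does not have to evaluate the regex engine via CTFE.
string lastPathSegment(string path)
{
    auto re = regex(`([^/]+)/?$`);   // built at runtime
    auto m = matchFirst(path, re);   // Captures: empty if nothing matched
    return m.empty ? null : m[1];    // m[1] is the first capture group
}
```

Calling `lastPathSegment("foo/bar")` returns the last path component; swapping `regex` for `ctRegex!` would move the pattern compilation back into the build.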
December 18
On Sun, Dec 17, 2023 at 03:43:22PM +0000, Dmitry Olshansky via Digitalmars-d wrote:
> So I’ve been working on rewind-regex trying to correct all of the decisions in the original engine that slowed it down, dropping some features that I knew I cannot implement efficiently (backreferences have to go).
> 
> So while I’m obsessed with simplicity and speed I thought I’d ask people if it was an issue and what they really want from gen2 regex library.
[...]

What I really want:

- Reduce compile-time cost of `import std.regex;` to zero, or at least
  close enough it's no longer noticeable.

- Automatic caching of fixed-string regexes, i.e., the equivalent of:

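	import std.regex;

	// One instantiation, and so one cached Regex, per distinct pattern string.
	struct Re(string ctKnownRe) {
		static Regex!char re;	// thread-local storage
		static this() {
			// Runs once per thread at startup.
			re = regex(ctKnownRe);
		}
		static Regex!char get() {
			return re;
		}
	}

	void main() {
		string s;
		if (s.matchFirst(Re!`some\+pattern`.get)) {
			// ...
		}

		// This should reuse the Regex instance from before:
		if (s.matchFirst(Re!`some\+pattern`.get)) {
			// ...
		}
	}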

- Reasonably fast runtime performance. I don't really care if it's the
  top-of-the-line superfast regex matcher, even though that would be
  really nice.  The primary pain points are the cost of import, and the
  need to manually write code for automatic caching of fixed runtime
  regexen.

- Get rid of ctRegex -- it adds a huge compile-time cost with
  questionable runtime benefit. Unless there's a way to do this at
  compile-time that *doesn't* add like 5 seconds per regex to compile
  times.


T

-- 
That's not a bug; that's a feature!
December 19
If you import std.regex and do nothing else:

sema1 61ms
sema2 101ms

std.internal.unicode_tables:
sema1 4ms
sema2 100ms

https://github.com/dlang/phobos/blob/master/std/internal/unicode_tables.d



Now for:

```d
auto ctr = ctRegex!(`^.*/([^/]+)/?$`);

// It works just like a normal regex:
auto c2 = matchFirst("foo/bar", ctr);   // First match found here, if any
assert(!c2.empty);   // Be sure to check if there is a match before examining contents!
assert(c2[1] == "bar");   // Captures is a range of submatches: 0 = full match.
```

toImpl:
sema3 125ms

parseSet:
sema3 111ms

findAny (child of parseRegex and parseSet):
CTFE 51ms

While there are improvements to be had, I have already done all the big ones when I shaved off ~600ms earlier this year.

Gotta love time trace!
December 18
On Tue, Dec 19, 2023 at 06:34:04AM +1300, Richard (Rikki) Andrew Cattermole via Digitalmars-d wrote:
> If you import std.regex and do nothing else:
> 
> sema1 61ms
> sema2 101ms
> 
> std.internal.unicode_tables:
> sema1 4ms
> sema2 100ms
[...]
> While there are improvements to be had, I have already done all the big ones when I shaved off ~600ms earlier this year.
> 
> Gotta love time trace!

Cool, awesome stuff!


T

-- 
If blunt statements had a point, they wouldn't be blunt...
December 19
Yeah, basically std.regex itself is no longer the cause of the slowdown when importing std.regex.

It's stuff like std.conv and std.uni.
December 18
On Tue, Dec 19, 2023 at 06:47:00AM +1300, Richard (Rikki) Andrew Cattermole via Digitalmars-d wrote:
> Yeah, basically std.regex itself is no longer the cause of the slowdown when importing std.regex.
> 
> It's stuff like std.conv and std.uni.

I haven't noticed too much horrible slowdown from std.conv, but std.uni could use some fixing. I'm tempted to suggest that those internal tables in std.uni should be pre-generated rather than done at compile-time. There comes a point where repeatedly doing something at every compile just isn't worth it when the desired output could be autogenerated beforehand and saved as a straight .d file with hard-coded values.
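
To make the trade-off concrete, here is a toy sketch of the two approaches. It is an assumption-laden illustration, not how std.uni is actually organised: the 16-entry table and the names `buildNibbleBits`, `nibbleBitsCtfe` and `nibbleBitsPregen` are made up for this example.

```d
// Option 1: the table is computed by CTFE, so the compiler re-runs
// buildNibbleBits() every time this module is compiled.
ubyte[16] buildNibbleBits()
{
    ubyte[16] t;
    foreach (i; 0 .. 16)
        foreach (b; 0 .. 4)
            if (i & (1 << b))
                ++t[i];              // count the set bits of i
    return t;
}

immutable ubyte[16] nibbleBitsCtfe = buildNibbleBits();

// Option 2: the same values written out once by a generator and checked in
// as a plain literal; the compiler only has to parse it.
immutable ubyte[16] nibbleBitsPregen = [
    0, 1, 1, 2, 1, 2, 2, 3,
    1, 2, 2, 3, 2, 3, 3, 4,
];
```

Both tables hold identical data; the difference is only where the work happens. For a table this small the cost is negligible either way, so any real gain would have to be measured on the much larger Unicode data.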


T

-- 
Heads I win, tails you lose.