September 22, 2009 D 1.0: std.regexp incredibly slow!
Hi *,

i stumbled on what seems to be a bug in std.regexp: It is incredibly slow using the following pattern:
RegExp("^\\s+(\\d+)\\s+(\\d+)\\s+\\w+\\s+(\\w+)\\s+\\S+\\s+\\S+\\s+\\S+\\s+\\S+\\s+\\S+\\s+(.*)\r?\n?$")

I don't really get the regexp code, so i can't debug it myself, but i have a PHP (!!) script that executes the same regexp in milliseconds.

I attached code to test it, can someone please confirm?

Thanks,
Markus

PS: Is there a quick way to fix this or are there bindings for other RegExp libs that i can use (Linux and Windows required) - i need to fix my program soon :) atm i'm looking for workarounds (splitting it into small regexps).

September 22, 2009 Re: D 1.0: std.regexp incredibly slow!
Posted in reply to Markus Dangl
On Tue, 22 Sep 2009 11:55:53 -0400, Markus Dangl <danglm@in.tum.de> wrote:
> Hi *,
>
> i stumbled on what seems to be a bug in std.regexp: It is incredibly
> slow using the following pattern:
> RegExp("^\\s+(\\d+)\\s+(\\d+)\\s+\\w+\\s+(\\w+)\\s+\\S+\\s+\\S+\\s+\\S+\\s+\\S+\\s+\\S+\\s+(.*)\r?\n?$")
>
> I don't really get the regexp code, so i can't debug it myself, but i
> have a PHP (!!) script that executes the same regexp in milliseconds.
>
> I attached code to test it, can someone please confirm?
>
> Thanks,
> Markus
>
> PS: Is there a quick way to fix this or are there bindings for other
> RegExp libs that i can use (Linux and Windows required) - i need to fix
> my program soon :) atm i'm looking for workarounds (splitting it into
> small regexps).
This is a common problem with some regex designs. Java has (or had) the same problem. I don't know if it's fixable, but you may want to try Tango's regex package.
-Steve

September 22, 2009 Re: D 1.0: std.regexp incredibly slow!
Posted in reply to Markus Dangl
Markus Dangl wrote:
> Hi *,
>
> i stumbled on what seems to be a bug in std.regexp: It is incredibly
> slow using the following pattern:
> RegExp("^\\s+(\\d+)\\s+(\\d+)\\s+\\w+\\s+(\\w+)\\s+\\S+\\s+\\S+\\s+\\S+\\s+\\S+\\s+\\S+\\s+(.*)\r?\n?$")
>
> I don't really get the regexp code, so i can't debug it myself, but i
> have a PHP (!!) script that executes the same regexp in milliseconds.
>
> I attached code to test it, can someone please confirm?
>
> Thanks,
> Markus
>
> PS: Is there a quick way to fix this or are there bindings for other
> RegExp libs that i can use (Linux and Windows required) - i need to fix
> my program soon :) atm i'm looking for workarounds (splitting it into
> small regexps).
>
You could write your own bindings to PCRE and use that, since it's what PHP uses. Maybe someone on dsource has already done it, too.

September 22, 2009 Re: D 1.0: std.regexp incredibly slow!
Posted in reply to Markus Dangl
Some regexes are very slow with Phobos; I believe this is due to backtracking. I'm not familiar enough with the issue to say whether some other regex engine could avoid the backtracking, or how to rewrite the pattern. I did find this link, though; perhaps it is useful to you: http://www.regular-expressions.info/catastrophic.html
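
For readers unfamiliar with the effect, here is a minimal sketch of backtracking blow-up. It uses Python's `re` module as a stand-in for any backtracking engine; it does not exercise std.regexp, and the pattern and the helper name `time_failed_match` are hypothetical illustrations, not the pattern from this thread. The idea is only that a failing match can force the engine to try a very large number of ways to divide the input among nested quantifiers.

```python
import re
import time

# Hypothetical demo pattern (not the one from the thread): the nested
# quantifiers let a run of "a" characters be divided between the inner and
# outer "+" in exponentially many ways.
NESTED = re.compile(r"^(a+)+$")


def time_failed_match(n):
    """Time a match attempt that fails only after the engine backtracks."""
    subject = "a" * n + "!"  # the trailing "!" guarantees failure
    start = time.perf_counter()
    result = NESTED.match(subject)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    # Small sizes only: the time grows very quickly with n.
    for n in (10, 14, 18):
        result, elapsed = time_failed_match(n)
        print(f"n={n:2d}  matched={result is not None}  {elapsed:.4f}s")
```

Timings depend on the machine and the engine, so none are shown here. Whether the pattern from the original post hits the same kind of behaviour in std.regexp is not established by this sketch.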

September 22, 2009 Re: D 1.0: std.regexp incredibly slow!
Posted in reply to Markus Dangl
Markus Dangl wrote:
> Hi *,
>
> i stumbled on what seems to be a bug in std.regexp: It is incredibly
> slow using the following pattern:
> RegExp("^\\s+(\\d+)\\s+(\\d+)\\s+\\w+\\s+(\\w+)\\s+\\S+\\s+\\S+\\s+\\S+\\s+\\S+\\s+\\S+\\s+(.*)\r?\n?$")
>
I admit my regex-fu is weak (especially for PCRE), but doesn't $ match end of line, making \r?\n? unnecessary or even causing the thing to match one line with a bunch of stuff followed by an empty line?
If you take \r?\n? out, RegExp performs considerably faster, though I couldn't say what implications it would have on what you're using it for.

September 22, 2009 Re: D 1.0: std.regexp incredibly slow!
Posted in reply to Ellery Newcomer
Ellery Newcomer wrote:
> Markus Dangl wrote:
>> Hi *,
>>
>> i stumbled on what seems to be a bug in std.regexp: It is incredibly
>> slow using the following pattern:
>> RegExp("^\\s+(\\d+)\\s+(\\d+)\\s+\\w+\\s+(\\w+)\\s+\\S+\\s+\\S+\\s+\\S+\\s+\\S+\\s+\\S+\\s+(.*)\r?\n?$")
>>
>
> I admit my regex-fu is weak (especially for PCRE), but doesn't $ match end of line, making \r?\n? unnecessary or even causing the thing to match one line with a bunch of stuff followed by an empty line?
>
> If you take \r?\n? out, RegExp performs considerably faster, though I couldn't say what implications it would have on what you're using it for.
In this case $ matches end of subject, i.e. the end of the string. I assume there is an option for making it match at end-of-line. (At least that's the way it works in PCRE.)
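
The distinction between end-of-subject and end-of-line anchoring can be sketched as follows. This uses Python's `re` module, whose default and multiline handling of `$` is similar to the PCRE behaviour described above; it is an illustration only and says nothing about how std.regexp treats `$`. The sample text is made up.

```python
import re

text = "first line\nsecond line\nthird line"

# Default mode: "$" matches only at the end of the subject (or just before
# a single trailing newline), so a line in the middle is not matched.
default_match = re.search(r"^first line$", text)

# Multiline mode: "^" and "$" also match at every line boundary.
multiline_match = re.search(r"^first line$", text, re.MULTILINE)

# The last line is at the end of the subject, so it matches in either mode.
last_line_match = re.search(r"third line$", text)

print(default_match is None)        # no match without the multiline flag
print(multiline_match is not None)  # match with the multiline flag
print(last_line_match is not None)  # match at end of subject
```

When input is fed to the matcher one line at a time, end-of-subject and end-of-line coincide, which is why the choice of mode matters mainly for multi-line subjects.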

September 22, 2009 Re: D 1.0: std.regexp incredibly slow!
Posted in reply to Jeremie Pelletier
Jeremie Pelletier wrote:
> You could write your own bindings to PCRE and use that, since it's what PHP uses. Maybe someone on dsource already did it too.

Thank you for the tip, i found some bindings at:
http://svn.dsource.org/projects/dwin/trunk/text/pcre/
I'll have a look at them later. For now i wrote a custom workaround for the problematic regexp. (Doing std.string.split before the regexp can help in the presented case...)

Anyway, the implementation in Phobos is questionable.
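
The split-first workaround mentioned above might look roughly like the sketch below. It is written in Python rather than D, the function name `parse_line` and the sample line are hypothetical, and the actual workaround code was not posted in the thread. The idea is to split on whitespace with a field limit, so the free-form tail stays intact, and then validate only the short individual fields.

```python
import re

DIGITS = re.compile(r"\d+")
WORD = re.compile(r"\w+")


def parse_line(line):
    """Return (number1, number2, name, rest), or None if the line does not fit.

    Mirrors the four capture groups of the pattern from the original post.
    Assumption: a non-empty remainder is required, which is slightly
    stricter than the trailing (.*) in that pattern.
    """
    line = line.rstrip("\r\n")
    if not line[:1].isspace():
        return None  # the pattern requires leading whitespace
    # Nine whitespace-separated fields, then the untouched remainder.
    parts = line.split(None, 9)
    if len(parts) < 10:
        return None
    number1, number2, word, name = parts[:4]
    rest = parts[9]
    if not (DIGITS.fullmatch(number1) and DIGITS.fullmatch(number2)):
        return None
    if not (WORD.fullmatch(word) and WORD.fullmatch(name)):
        return None
    return number1, number2, name, rest


if __name__ == "__main__":
    print(parse_line("  12 345 foo bar a b c d e the rest here\r\n"))
```

Because each check runs against a single short field, no step has to choose among many ways of dividing the whole line, which is where a backtracking matcher can lose time.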
Copyright © 1999-2021 by the D Language Foundation