Thread overview
Why does "*" cause my tiny regextester program to crash?
Jan 31, 2011
Alex Folland
Jan 31, 2011
Jesse Phillips
Jan 31, 2011
Vladimir Panteleev
Jan 31, 2011
Alex Folland
Jan 31, 2011
Alex Folland
Jan 31, 2011
Alex Folland
Jan 31, 2011
Vladimir Panteleev
Jan 31, 2011
Dmitry Olshansky
Jan 31, 2011
Alex Folland
January 31, 2011
I wrote this little program to test for regular expression matches.  I compiled it on Windows with DMD 2.051 through Visual Studio 2010 with Visual D.  It crashes if regexbuf is just the single character, "*".  Why?  Shouldn't it match the entire string?

Visual Studio's debug output is this:

First-chance exception at 0x76fde124 in regextester.exe: 0xE0440001: 0xe0440001.
The program '[5492] regextester.exe: Native' has exited with code 1 (0x1).

Also, why does it match an unlimited number of times on "$" instead of just once?  Is this a Phobos-specific issue, or are regular expressions supposed to do that?  I mean, it doesn't match an unlimited number of times on "h", for example.

Mind you, I'm very new to both regular expressions and D.  I'm also not an experienced programmer in anything else.  I've spent years dabbling on the surface of various programming languages without learning anything meaty.  I've learned syntax mostly.

My debug build is here: http://lex.clansfx.co.uk/projects/regextester.exe

Here's the source code:

import std.stdio, std.regex;

void main()
{
  char[] regexbuf;
  char[] teststring;
  while(1)
  {
    write("test string: ");
    std.stdio.readln(teststring);
    teststring.length=teststring.length-1; // drop the trailing newline
    while(teststring.length>0)
    {
      uint i=0; // number of matches printed for this regex
      write("regex input: ");
      std.stdio.readln(regexbuf);
      regexbuf.length=regexbuf.length-1; // drop the trailing newline
      if(regexbuf.length>0)
      {
        foreach(m; match(teststring, regex(regexbuf)))
        {
          i++;
          writefln("Match number %s: %s[%s]%s",i,m.pre,m.hit,m.post);
          if(i >= 50) { writefln("There have been %s matches.  I'm breaking for safety.",i); break; }
        }
      }
    }
  }
}
January 31, 2011
Alex Folland Wrote:

> I wrote this little program to test for regular expression matches.  I compiled it on Windows with DMD 2.051 through Visual Studio 2010 with Visual D.  It crashes if regexbuf is just the single character, "*".  Why?  Shouldn't it match the entire string?

It would be best to give your example data, but the regular expression for matching all data is ".*".

* means zero or more repetitions of whatever precedes it.
. matches any character.

So "*" on its own makes no sense.
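As a rough illustration of the two symbols, here is a minimal sketch using matchFirst from std.regex. The helper name firstHit and the sample strings are made up for this example; they are not part of the original program.

```d
import std.regex;
import std.stdio;

// Hypothetical helper: returns the text of the first match of `pattern`
// in `input`, or null when the pattern does not match at all.
string firstHit(string input, string pattern)
{
    auto m = matchFirst(input, regex(pattern));
    return m.empty ? null : m.hit;
}

void main()
{
    // "." is any character and "*" repeats it, so ".*" takes the whole line.
    writeln(firstHit("hello world", ".*"));   // hello world
    // "l*" applies to the preceding "l": zero or more of them after "he".
    writeln(firstHit("hello world", "hel*")); // hell
    // Zero repetitions is also fine: there is no "x", so only "he" matches.
    writeln(firstHit("hello world", "hex*")); // he
}
```

In each pattern the "*" has something in front of it to repeat, which is exactly what a lone "*" lacks.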
January 31, 2011
On Mon, 31 Jan 2011 03:57:44 +0200, Alex Folland <lexlexlex@gmail.com> wrote:

> I wrote this little program to test for regular expression matches.  I compiled it on Windows with DMD 2.051 through Visual Studio 2010 with Visual D.  It crashes if regexbuf is just the single character, "*".  Why?  Shouldn't it match the entire string?

"*" in regular expressions means 0 or more instances of the previous entity:
http://www.regular-expressions.info/repeat.html
It doesn't make sense at the start of an expression. ".*"  is the regexp that matches anything[1].

std.regex probably can't handle invalid regexps very well. Note that std.regex is a new module that intends to replace the older std.regexp, but still has some problems.

> Also, why does it match an unlimited number of times on "$" instead of just once?

Looks like another std.regex bug.

> My debug build is here: http://lex.clansfx.co.uk/projects/regextester.exe

A note for the future: compiled executables aren't very useful when source is available, especially considering many people here don't use Windows.

  [1]: A dot in a regular expression may not match newlines, depending on the implementation and search options.

-- 
Best regards,
 Vladimir                            mailto:vladimir@thecybershadow.net
January 31, 2011
On 2011-01-30 21:47, Vladimir Panteleev wrote:
> On Mon, 31 Jan 2011 03:57:44 +0200, Alex Folland <lexlexlex@gmail.com>
> wrote:
>
>> I wrote this little program to test for regular expression matches. I
>> compiled it on Windows with DMD 2.051 through Visual Studio 2010
>> with Visual D. It crashes if regexbuf is just the single character,
>> "*". Why? Shouldn't it match the entire string?
>
> "*" in regular expressions means 0 or more instances of the previous
> entity:
> http://www.regular-expressions.info/repeat.html
> It doesn't make sense at the start of an expression. ".*" is the regexp
> that matches anything[1].
>
> std.regex probably can't handle invalid regexps very well. Note that
> std.regex is a new module that intends to replace the older std.regexp,
> but still has some problems.

Okay, so that particular regex is invalid.  Yeah, it still shouldn't crash.  You're right.  How should I prevent my program from crashing without fixing std.regex (code I definitely don't trust myself to touch)?  Would the Scope statement be useful?  I still can't figure out exactly what it does.  I tried using scope(exit)writeln("Bad regex."); just before my foreach loop, but it still crashes.  I then tried changing "exit" to "failure", but that didn't help either; same behavior.  Am I using scope wrong?

>> Also, why does it match an unlimited number of times on "$" instead of
>> just once?
>
> Looks like another std.regex bug.

I thought it through and decided that it might not be std.regex's bug.  I mean, there's no way m could have an unlimited number of elements for foreach to loop through, right?  Actually, it probably is std.regex's bug.  Though, all of this doesn't really matter since nobody uses just "$" as a regex, since it'd match an obvious point in any input.  I bet Andrei would still be irked by it if he knew though.

>> My debug build is here: http://lex.clansfx.co.uk/projects/regextester.exe
>
> A note for the future: compiled executables aren't very useful when
> source is available, especially considering many people here don't use
> Windows.

Right.

> [1]: A dot in a regular expression may not match newlines, depending on
> the implementation and search options.

Thanks for the extra info.  :)
January 31, 2011
On 2011-01-31 0:50, Alex Folland wrote:
> On 2011-01-30 21:47, Vladimir Panteleev wrote:
>> On Mon, 31 Jan 2011 03:57:44 +0200, Alex Folland <lexlexlex@gmail.com>
>> wrote:
>>
>>> I wrote this little program to test for regular expression matches. I
>>> compiled it on Windows with DMD 2.051 through Visual Studio 2010
>>> with Visual D. It crashes if regexbuf is just the single character,
>>> "*". Why? Shouldn't it match the entire string?
>>
>> "*" in regular expressions means 0 or more instances of the previous
>> entity:
>> http://www.regular-expressions.info/repeat.html
>> It doesn't make sense at the start of an expression. ".*" is the regexp
>> that matches anything[1].
>>
>> std.regex probably can't handle invalid regexps very well. Note that
>> std.regex is a new module that intends to replace the older std.regexp,
>> but still has some problems.
>
> Okay, so that particular regex is invalid. Yeah, it still shouldn't
> crash. You're right. How should I prevent my program from crashing
> without fixing std.regex (code I definitely don't trust myself to
> touch)? Would the Scope statement be useful? I still can't figure out
> exactly what it does. I tried using scope(exit)writeln("Bad regex.");
> just before my foreach loop, but it still crashes. I then tried changing
> "exit" to "failure", but that didn't help either; same behavior. Am I
> using scope wrong?

Yeah, you nitwit.  You didn't realize that scope(failure) doesn't stop the exception from propagating and ending the program.  It merely runs its body on the way out, and then the exception carries on.  However, one can use scope(failure){writeln("Bad regex");break;} to prevent an actual crash and just write "Bad regex"!  I expected that it would break out of the scope automatically.  I was wrong.  Anyway, this is beautiful.  I love D.  Haha.
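To make the plain case concrete (no break or continue in the body), here is a minimal sketch. The functions failingStep and exceptionEscapes are hypothetical stand-ins; the throw plays the role of regex() rejecting a pattern.

```d
import std.stdio;

// Hypothetical stand-in for any call that throws,
// such as regex() on a pattern it rejects.
void failingStep()
{
    // Runs only when this scope is left because of an exception.
    scope(failure) writeln("scope(failure) body runs");
    throw new Exception("bad pattern");
}

// Returns true when the exception still reached the caller.
bool exceptionEscapes()
{
    try
    {
        failingStep();
        return false; // not reached, because failingStep always throws
    }
    catch (Exception e)
    {
        return true;
    }
}

void main()
{
    // The guard's message is printed first, then this prints true.
    writeln(exceptionEscapes());
}
```

The guard body runs, but the exception still arrives at the caller, so something further up has to catch it.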
January 31, 2011
On 2011-01-31 2:28, Alex Folland wrote:
> scope(failure){writeln("Bad regex");break;}

Oh, that resets the program rather than continuing from where that line was placed.  The continue statement is what I wanted.
January 31, 2011
On Mon, 31 Jan 2011 09:28:25 +0200, Alex Folland <lexlexlex@gmail.com> wrote:

> scope(failure){writeln("Bad regex");break;}

I think the proper construct here is a try/catch block.
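A minimal sketch of what that could look like, assuming a current std.regex where matchAll is available. The helper name countMatches and its -1 return value are made up for this example, and the limit of 50 simply mirrors the safety break in the original program.

```d
import std.regex;
import std.stdio;

// Hypothetical helper: counts matches of `pattern` in `input`, stopping
// at `limit`. Returns -1 instead of throwing when the pattern is rejected.
int countMatches(string input, string pattern, int limit = 50)
{
    try
    {
        int n = 0;
        foreach (m; matchAll(input, regex(pattern)))
        {
            if (++n >= limit) break; // same safety valve as the original loop
        }
        return n;
    }
    catch (Exception e)
    {
        // An invalid pattern lands here; the caller can ask for another one.
        writeln("Bad regex: ", e.msg);
        return -1;
    }
}

void main()
{
    writeln(countMatches("hello", "l")); // 2
    writeln(countMatches("hello", "*")); // prints the error text, then -1
}
```

Because the exception is handled inside the helper, the surrounding input loop keeps running and can simply prompt again.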

-- 
Best regards,
 Vladimir                            mailto:vladimir@thecybershadow.net
January 31, 2011
On 31.01.2011 4:57, Alex Folland wrote:
> I wrote this little program to test for regular expression matches.  I compiled it on Windows with DMD 2.051 through Visual Studio 2010 with Visual D.  It crashes if regexbuf is just the single character, "*".  Why?  Shouldn't it match the entire string?
>
> Visual Studio's debug output is this:
>
> First-chance exception at 0x76fde124 in regextester.exe: 0xE0440001: 0xe0440001.
> The program '[5492] regextester.exe: Native' has exited with code 1 (0x1).
To shed some light on this output:
After using Visual D quite a lot, I can tell you that this output *doesn't* mean the program crashed (in terms of a segfault etc.).  All it means is that there is an uncaught exception, which is quite reasonable. The regexp parsing function throws when it detects an invalid pattern.  In general, when you are uncertain about what is going on, I suggest wrapping the suspicious code with:
try{
    ... //code
}catch(Exception e){
    writeln(e);
}
to see what the errors are.

With a lone "*" as the regexp, the output would be:
object.Exception: *+? not allowed in atom
Not very detailed, perhaps, but it gives a hint ;)

-- 
Dmitry Olshansky

January 31, 2011
On 2011-01-31 15:43, Dmitry Olshansky wrote:
> On 31.01.2011 4:57, Alex Folland wrote:
>> I wrote this little program to test for regular expression matches. I
>> compiled it on Windows with DMD 2.051 through Visual Studio 2010
>> with Visual D. It crashes if regexbuf is just the single character,
>> "*". Why? Shouldn't it match the entire string?
>>
>> Visual Studio's debug output is this:
>>
>> First-chance exception at 0x76fde124 in regextester.exe: 0xE0440001:
>> 0xe0440001.
>> The program '[5492] regextester.exe: Native' has exited with code 1
>> (0x1).
> To shed some light on this output:
> After using Visual D quite a lot, I can tell you that this output
> *doesn't* mean the program crashed (in terms of a segfault etc.). All it
> means is that there is an uncaught exception, which is quite reasonable.
> The regexp parsing function throws when it detects an invalid pattern. In
> general, when you are uncertain about what is going on, I suggest
> wrapping the suspicious code with:
> try{
> ... //code
> }catch(Exception e){
> writeln(e);
> }
> to see what the errors are.

Ty.  Did that.  My updated code is here: http://lex.clansfx.co.uk/projects/regextester.d