regex issue
March 16, 2012
Hello,

Does anyone know why I would get different results between
ctRegex and regex in the following snippet?

Thanks,
Josh

---
#!/usr/local/bin/rdmd
import std.stdio, std.regex;

void main() {
    string strcmd = "./myApp.rb -os OSX -path \"/GIT/Ruby Apps/sec\" -conf 'no timer'";

    auto ctre = ctRegex!(`(".*")|('.*')`, "g");
    auto   re =   regex (`(".*")|('.*')`, "g");

    auto ctm = match(strcmd, ctre);
    foreach(ct; ctm)
      writeln(ct.hit());

    auto m = match(strcmd, re);
    foreach(h; m)
      writeln(h.hit());
}
/* output */
"/GIT/Ruby Apps/sec"
'no timer'
"/GIT/Ruby Apps/sec"
March 16, 2012
On 16.03.2012 7:36, Joshua Niehus wrote:
> Hello,
>
> Does anyone know why I would get different results between
> ctRegex and regex in the following snippet?

Ehm, because they use different engines that _should_ give identical results. The default one apparently has a bug, which I'm looking into.
Please file a bug report.

>
> Thanks,
> Josh
>
> ---
> #!/usr/local/bin/rdmd
> import std.stdio, std.regex;
>
> void main() {
> string strcmd = "./myApp.rb -os OSX -path \"/GIT/Ruby
> Apps/sec\" -conf 'no timer'";
>
> auto ctre = ctRegex!(`(".*")|('.*')`, "g");
> auto re = regex (`(".*")|('.*')`, "g");
>
> auto ctm = match(strcmd, ctre);
> foreach(ct; ctm)
> writeln(ct.hit());
>
> auto m = match(strcmd, re);
> foreach(h; m)
> writeln(h.hit());
> }
> /* output */
> "/GIT/Ruby Apps/sec"
> 'no timer'
> "/GIT/Ruby Apps/sec"


-- 
Dmitry Olshansky
March 16, 2012
On Friday, 16 March 2012 at 08:34:18 UTC, Dmitry Olshansky wrote:
> Ehm, because they have different engines that _should_ give identical results. And the default one apparently has a bug, that I'm looking into.
> Fill the bug report plz.

Ok, submitted: id 7718

Thanks,
Josh
March 17, 2012
On 16.03.2012 20:05, Joshua Niehus wrote:
> On Friday, 16 March 2012 at 08:34:18 UTC, Dmitry Olshansky wrote:
>> Ehm, because they have different engines that _should_ give identical
>> results. And the default one apparently has a bug, that I'm looking into.
>> Fill the bug report plz.
>
> Ok, submitted: id 7718
>
> Thanks,
> Josh

And the fix is coming
https://github.com/D-Programming-Language/phobos/pull/462

I'd also like to take this opportunity to thank you: this was an interestingly big oversight in that engine's code, and it revealed some fundamental things to me.

-- 
Dmitry Olshansky
March 19, 2012
On Friday, 16 March 2012 at 03:36:12 UTC, Joshua Niehus wrote:
> Hello,
>
> Does anyone know why I would get different results between
> ctRegex and regex in the following snippet?
>
> Thanks,
> Josh
>
>

I'm also having questions about the matchers. From what I understand in the docs, if I use this greedy matcher to count lines, it should have counted all the lines in the first match (when I had it outside the foreach). In that case, I should have been able to do something like:

matches=match(input,ctr);
l_cnt = matches.length();

But I only get length=1, and so I'm a bit concerned that greedy is not really working. In fact, it is about 3x faster to just run the second piece of code, so I think something must be wrong...


void wcp_ctRegex(string fn)
{
	string input = cast(string)std.file.read(fn);
	enum ctr =  ctRegex!("\n","g");
	ulong l_cnt;
	foreach(m; match(input,ctr))
	{
		l_cnt ++;
	}
}


void wcp_char(string fn)
{
	string input = cast(string)std.file.read(fn);
	ulong l_cnt;
	foreach(c; input)
	{
		if (c == '\n')
			l_cnt ++;
	}
}

March 19, 2012
On 19.03.2012 6:50, Jay Norwood wrote:
> On Friday, 16 March 2012 at 03:36:12 UTC, Joshua Niehus wrote:
>> Hello,
>>
>> Does anyone know why I would get different results between
>> ctRegex and regex in the following snippet?
>>
>> Thanks,
>> Josh
>>
>>
>
> I'm also having questions about the matchers. From what I understand in
> the docs, if I use this greedy matcher to count lines, it should have
> counted all the lines in the first match (when I had it outside the
> foreach).

As I said in the main D group, that's wrong: regex doesn't just count matches, it finds the slices that match.
So, to make it more efficient, it returns a lazy range that searches on request. "g" means global :)
Then code like this is cool and fast:
foreach(m; match(input, ctr))
{
	if(m.hit == "magic we are looking for")
		break; // <<< ---- no greedy find it all syndrome
}
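
If a count of all matches is what's wanted, one option is to walk the lazy range to its end. This is only a minimal sketch: countLines is a hypothetical helper name, the sample input is made up, and it assumes std.range.walkLength accepts the range returned by match (it only needs an input range).

```d
import std.range : walkLength;
import std.regex : match, regex;
import std.stdio : writeln;

// Hypothetical helper (not from the thread): counts newlines by
// walking the lazy range that match() returns for a "g" regex.
size_t countLines(string input)
{
    auto re = regex("\n", "g");
    // Each step of the walk runs one more search on request;
    // nothing is found ahead of time.
    return match(input, re).walkLength;
}

void main()
{
    // Made-up sample input with three newlines.
    writeln(countLines("one\ntwo\nthree\n"));
}
```

This still runs the full regex engine for every hit, so a plain character loop can be expected to stay faster for a single-character pattern.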

> In that case, I should have been able to do something like:
>
> matches=match(input,ctr);
> l_cnt = matches.length();
>
> But I only get length=1, and so I'm a bit concerned that greedy is not
> really working. In fact, it is about 3x faster to just run the second
> piece of code, so I think something must be wrong...
>
>
> void wcp_ctRegex(string fn)
> {
> string input = cast(string)std.file.read(fn);
> enum ctr = ctRegex!("\n","g");
> ulong l_cnt;
> foreach(m; match(input,ctr))
> {
> l_cnt ++;
> }
> }
>
>
> void wcp_char(string fn)
> {
> string input = cast(string)std.file.read(fn);
> ulong l_cnt;
> foreach(c; input)
> {
> if (c == '\n')
> l_cnt ++;
> }
> }
>


-- 
Dmitry Olshansky
March 19, 2012
On 19.03.2012 12:05, Dmitry Olshansky wrote:
> On 19.03.2012 6:50, Jay Norwood wrote:
>> On Friday, 16 March 2012 at 03:36:12 UTC, Joshua Niehus wrote:
>>> Hello,
>>>
>>> Does anyone know why I would get different results between
>>> ctRegex and regex in the following snippet?
>>>
>>> Thanks,
>>> Josh
>>>
>>>
>>
>> I'm also having questions about the matchers. From what I understand in
>> the docs, if I use this greedy matcher to count lines, it should have
>> counted all the lines in the first match (when I had it outside the
>> foreach).
>
> Like I told in main D group it's wrong - regex doesn't only count
> matches. It finds slices that do match.
> Thus to make it more efficient, it returns lazy range that does searches
> on request. "g" - means global :)
> Then code like this is cool and fast:
> foreach(m; match(input, ctr))
> {
> if(m.hit == "magic we are looking for")
> break; // <<< ---- no greedy find it all syndrome
> }
>
> In that case, I should have been able to do something like:
>>
>> matches=match(input,ctr);
>> l_cnt = matches.length();

I'm curious what this length() does as I have no length for RegexMatch in the API :)

>>
>> But I only get length=1, and so I'm a bit concerned that greedy is not
>> really working. In fact, it is about 3x faster to just run the second
>> piece of code, so I think something must be wrong...


-- 
Dmitry Olshansky
March 19, 2012
On Monday, 19 March 2012 at 08:14:18 UTC, Dmitry Olshansky wrote:
> On 19.03.2012 12:05, Dmitry Olshansky wrote:
>>
>> In that case, I should have been able to do something like:
>>>
>>> matches=match(input,ctr);
>>> l_cnt = matches.length();
>
> I'm curious what this length() does as I have no length for RegexMatch in the API :)
>
>>>
>>> But I only get length=1, and so I'm a bit concerned that greedy is not
>>> really working. In fact, it is about 3x faster to just run the second
>>> piece of code, so I think something must be wrong...

http://dlang.org/phobos/std_regex.html#length

Yes, I should have typed matches.captures.length. It always returns 1, even though the description indicates the "g" flag should create a match object that contains all the submatches.

March 19, 2012
On 19.03.2012 16:59, Jay Norwood wrote:
> On Monday, 19 March 2012 at 08:14:18 UTC, Dmitry Olshansky wrote:
>> On 19.03.2012 12:05, Dmitry Olshansky wrote:
>>>
>>> In that case, I should have been able to do something like:
>>>>
>>>> matches=match(input,ctr);
>>>> l_cnt = matches.length();
>>
>> I'm curious what this length() does as I have no length for RegexMatch
>> in the API :)
>>
>>>>
>>>> But I only get length=1, and so I'm a bit concerned that greedy is not
>>>> really working. In fact, it is about 3x faster to just run the second
>>>> piece of code, so I think something must be wrong...
>
> http://dlang.org/phobos/std_regex.html#length
>
> Yes, I should have typed matches.captures.length. It is always returning
> 1, even though the description indicates the "g" flag should create a
> match object that contains all the submatches.
>

Captures is a range of the submatches of a single match: "(a)(b)(c)" has 3 submatches + 1 whole match == 4.
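
That is also why a pattern without groups, like "\n", reports a length of 1: there is only the whole match. A minimal sketch to check both cases; captureCount is a hypothetical helper name, not part of std.regex.

```d
import std.regex : match, regex;
import std.stdio : writeln;

// Hypothetical helper (not part of std.regex): number of entries in
// the captures of the first match, or 0 if nothing matched.
size_t captureCount(string input, string pattern)
{
    auto m = match(input, regex(pattern));
    return m.empty ? 0 : m.captures.length;
}

void main()
{
    // Three groups plus the whole match.
    writeln(captureCount("abc", "(a)(b)(c)"));
    // No groups, so only the whole match is in captures.
    writeln(captureCount("line\n", "\n"));

    auto m = match("abc", regex("(a)(b)(c)"));
    assert(!m.empty);
    assert(m.captures[0] == "abc"); // index 0 is the full match
    assert(m.captures[1] == "a");   // index 1 is the first group
}
```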

-- 
Dmitry Olshansky
March 19, 2012
On Monday, 19 March 2012 at 08:05:18 UTC, Dmitry Olshansky wrote:
> Like I told in main D group it's wrong - regex doesn't only count matches. It finds slices that do match.
> Thus to make it more efficient, it returns lazy range that does searches on request. "g" - means global :)
> Then code like this is cool and fast:
> foreach(m; match(input, ctr))
> {
> 	if(m.hit == "magic we are looking for")
> 		break; // <<< ---- no greedy find it all syndrome
> }
>

OK, global. So the documentation implies that I should be able to get a single match object with a count of the submatches. I think maybe I've jumped to the wrong conclusion about how to use it, thinking I could just use "\n" and the "g" flag to get all the matches for the range of "\n". So it looks like the term "submatches" needs more explanation. What exactly constitutes a submatch? I inferred it just meant any single match among many.

  //create static regex at compile-time, contains fast native code
  enum ctr = ctRegex!(`^.*/([^/]+)/?$`);

  //works just like normal regex:
  auto m2 = match("foo/bar", ctr);   //first match found here if any
  assert(m2);   // be sure to check if there is a match, before examining contents!
  assert(m2.captures[1] == "bar");//captures is a range of submatches, 0 - full match


By the way, I couldn't get this \p option to work for the Unicode properties. Can you provide a working example?

\p{PropertyName}  Matches character that belongs to Unicode PropertyName set. Single letter abbreviations could be used without surrounding {,}.
