June 05, 2012
I am trying to see if all regex matches in one file are present in another file.
The code works, but partway through the nested foreach loops I get the error listed in the subject line.  I would have expected this error to come up when the regular expressions were executed, not while I'm iterating through the resulting matches.

Is there a better way to do this or can I just allocate more memory?
Thanks.

// Execute Regex expressions
auto uniCapturesOld = match(uniFileOld, regex(r"^NAME   = (?P<comp>[a-zA-Z0-9_]+):*(?P<blk>[a-zA-Z0-9_]*)","gm"));
auto uniCapturesNew = match(uniFileNew, regex(r"^NAME   = (?P<comp>[a-zA-Z0-9_]+):*(?P<blk>[a-zA-Z0-9_]*)","gm"));

// Iterate through the match collections to see if both files contain the same matches.
foreach (matchOld; uniCapturesOld) {
    cntOld++;
    found = false;
    foreach (matchNew; uniCapturesNew) {
        cntNew++;
        // The following line is for troubleshooting.
        writeln(cntOld,"  ",cntNew,"  ",matchOld.hit,"  ",matchNew.hit);
        if (matchOld.hit == matchNew.hit) { found = true; break; }
    }
    if (!found) writeln(cntNF++," ",matchOld.hit," not found");
}

June 05, 2012
On 06.06.2012 0:25, Paul wrote:
> I am trying to see if all regex matches in one file are present in
> another file.
> The code works; but, part way through the nested foreach(s) I get the
> error listed in the subject line. I would think this error would come up
> when the Regex expressions were executed not when I'm iterating through
> the resultant matches.
>
To get the next match, the engine is run again, then again for the match after that, and so on. It's lazy evaluation at its finest (who knows, maybe you'll break out of the loop halfway through). Evidently it either loses some RAM between calls or it just bugs out when it reaches some specific text.
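
Because each step re-runs the engine, the nested loops rescan the new file once per old match. A minimal sketch of one way to avoid that, assuming the matched text fits comfortably in memory: walk each lazy range once and keep the hits in an associative array used as a set. The helper names (collectHits, missingFrom) and the sample strings are made up for illustration and are not part of the original program.

```d
import std.regex;
import std.stdio;

enum pattern = r"^NAME   = (?P<comp>[a-zA-Z0-9_]+):*(?P<blk>[a-zA-Z0-9_]*)";

// Walk the lazy match range exactly once and remember every hit.
bool[string] collectHits(string text)
{
    bool[string] hits;
    foreach (m; match(text, regex(pattern, "gm")))
        hits[m.hit.idup] = true;
    return hits;
}

// Return the hits of oldText that never occur in newText.
string[] missingFrom(string oldText, string newText)
{
    auto seenNew = collectHits(newText);
    string[] missing;
    foreach (m; match(oldText, regex(pattern, "gm")))
        if (m.hit !in seenNew)
            missing ~= m.hit.idup;
    return missing;
}

void main()
{
    // Hypothetical sample data standing in for uniFileOld / uniFileNew.
    string oldText = "NAME   = cpu:alu\nNAME   = mem\nNAME   = io:uart\n";
    string newText = "NAME   = mem\nNAME   = cpu:alu\n";
    foreach (i, hit; missingFrom(oldText, newText))
        writeln(i, " ", hit, " not found");
}
```

This only sidesteps the repeated scans; it does not explain the out-of-memory error itself.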

> Is there a better way to do this or can I just allocate more memory?
> Thanks.

Looks like you found a bug, meaning that I probably miscalculated the required amount of RAM or am losing some free list nodes between calls.

Please file a bug report, and keep in mind that I need the data to reproduce it.

Until I figure it out, I recommend falling back on the bmatch function, which is slower and in general unbounded in memory use, but should work.
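
For reference, the fallback is a one-word change, since bmatch takes the same arguments as match. A minimal sketch, with a short made-up string standing in for uniFileOld:

```d
import std.regex;
import std.stdio;

void main()
{
    // Hypothetical sample input standing in for the real uniFileOld contents.
    string uniFileOld = "NAME   = cpu:alu\nNAME   = mem\n";

    // Same pattern and flags as before; only match -> bmatch changes,
    // which selects the backtracking engine.
    auto uniCapturesOld = bmatch(uniFileOld,
        regex(r"^NAME   = (?P<comp>[a-zA-Z0-9_]+):*(?P<blk>[a-zA-Z0-9_]*)", "gm"));

    // Named groups are available on each match by name.
    foreach (m; uniCapturesOld)
        writeln(m.hit, "  comp=", m["comp"], "  blk=", m["blk"]);
}
```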

Another idea: try modifying one of the regexes insignificantly, so that they don't reuse data structures internally (just in case it has to do with that).
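
One possible reading of an insignificant change, as a sketch: the two patterns below differ textually but accept the same strings. Reordering the character classes is an arbitrary choice made for illustration, and whether it actually avoids the problem is only a guess.

```d
import std.regex;
import std.stdio;

void main()
{
    // Original pattern, used for the old file.
    auto reOld = regex(r"^NAME   = (?P<comp>[a-zA-Z0-9_]+):*(?P<blk>[a-zA-Z0-9_]*)", "gm");
    // Same set of accepted strings, but the character classes are written in a different order.
    auto reNew = regex(r"^NAME   = (?P<comp>[A-Za-z0-9_]+):*(?P<blk>[A-Za-z0-9_]*)", "gm");

    // Hypothetical sample line; checks that both patterns produce the same hit.
    string sample = "NAME   = cpu:alu\n";
    writeln(match(sample, reOld).front.hit == match(sample, reNew).front.hit);
}
```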

> // Execute Regex expressions
> auto uniCapturesOld = match(uniFileOld, regex(r"^NAME =
> (?P<comp>[a-zA-Z0-9_]+):*(?P<blk>[a-zA-Z0-9_]*)","gm"));
> auto uniCapturesNew = match(uniFileNew, regex(r"^NAME =
> (?P<comp>[a-zA-Z0-9_]+):*(?P<blk>[a-zA-Z0-9_]*)","gm"));


>
> // Iterate through match collections to see if both files contain the
> same matches.
> foreach (matchOld; uniCapturesOld) {
> cntOld++;
> found = false;
> foreach (matchNew; uniCapturesNew) {
> cntNew++;
> // The following line is for troubleshooting.
> writeln(cntOld," ",cntNew," ",matchOld.hit," ",matchNew.hit);
> if (matchOld.hit == matchNew.hit) {found=true;break;}}
> if (!found) writeln(cntNF++," ",matchOld.hit," not found");}
>


-- 
Dmitry Olshansky