Thread overview
Capture offset of matches in std.regex.matchAll?
Jul 07, 2014  JD
Jul 08, 2014  JR
Jul 08, 2014  Justin Whear
Jul 08, 2014  JD
Sep 25, 2020  Lewis
July 07, 2014
I'm using a compile time regex to find some tags in an input
string. Is it possible to capture the offset of the matches in
some way? Otherwise I have to "calculate" the offsets myself by
iterating over the results of matchAll.

Thanks,
Jeroen

---

Example code:

import std.stdio;
import std.regex;

void main()
{
	auto input = "<html><body><p>{{ message }}</p></body></html>";
	
	auto ctr = ctRegex!(`(\{\{|\{\%|\{\#)?`, "s");

	auto matches = matchAll(input, ctr);

	/*
	auto offset = 0;
	foreach (match; matches)
	{
		writeln(offset, ":", match);
		++offset;
	}
	*/
}
July 08, 2014
On Monday, 7 July 2014 at 21:32:30 UTC, JD wrote:
> I'm using a compile time regex to find some tags in an input
> string. Is it possible to capture the offset of the matches in
> some way? Otherwise I have to "calculate" the offsets myself by
> iterating over the results of matchAll.
>
> Thanks,
> Jeroen

I believe the range that matchAll returns evaluates its .front lazily, so aye; you need to pop it until you reach the match you want. :< (Assuming I understand the question correctly.)

You can however index the *captured fields* in a specific match. I couldn't wrap my head around your example pattern but see http://dpaste.dzfl.pl/f693db93c3a4 for a dumbed-down version.

You can't slice a match, nor can you have foreach provide an index variable. This may be so that foreach can include named fields? Not sure.
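
A minimal sketch of the two points above: popping the lazy range until the wanted match is at the front, then indexing a captured field in that match. The pattern and the helper name nthTagName are assumptions made up for illustration; they are not taken from the original post or the dpaste link.

```d
import std.regex : matchAll, regex;
import std.stdio : writeln;

/// Returns capture group 1 of the n-th match (0-based),
/// or null if the input has fewer than n + 1 matches.
/// Hypothetical helper, for illustration only.
string nthTagName(string input, size_t n)
{
    // Assumed pattern: {{ name }} with the name in group 1.
    auto re = regex(`\{\{\s*(\w+)\s*\}\}`);
    auto matches = matchAll(input, re);

    // The range is lazy, so walk it by popping earlier matches.
    foreach (_; 0 .. n)
    {
        if (matches.empty)
            return null;
        matches.popFront();
    }
    // Index 0 is the whole match; index 1 is the first group.
    return matches.empty ? null : matches.front[1];
}

void main()
{
    writeln(nthTagName("{{ first }} and {{ second }}", 1));
}
```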
July 08, 2014
On Mon, 07 Jul 2014 21:32:29 +0000, JD wrote:

> I'm using a compile time regex to find some tags in an input string. Is it possible to capture the offset of the matches in some way? Otherwise I have to "calculate" the offsets myself by iterating over the results of matchAll.
> 
> Thanks,
> Jeroen
> 
> ---
> 
> Example code:
> 
> import std.stdio;
> import std.regex;
> 
> void main()
> {
> 	auto input = "<html><body><p>{{ message }}</p></body></html>";
> 
> 	auto ctr = ctRegex!(`(\{\{|\{\%|\{\#)?`, "s");
> 
> 	auto matches = matchAll(input, ctr);
> 
>            /*
> 	auto offset = 0;
> 	foreach(match;matches)
> 	{
> 		writeln(offset, ":", match);
> 		++offset;
> 	}
>           */
> }

What do you mean by offset?  If you simply mean the index of the match, as your example seems to indicate, you can zip the matches with iota or sequence!"n".
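
A rough sketch of the zip-with-iota idea, for the case where only the ordinal number of each match is wanted. The pattern (the question's tags without the trailing `?`) and the function name numberedHits are assumptions for illustration.

```d
import std.range : iota, zip;
import std.regex : matchAll, regex;
import std.stdio : writeln;
import std.typecons : Tuple, tuple;

/// Pairs each match's text with its ordinal position (0, 1, 2, ...).
/// Hypothetical helper, for illustration only.
Tuple!(size_t, string)[] numberedHits(string input)
{
    // Assumed pattern: opening template tags only.
    auto re = regex(`\{\{|\{%|\{#`);
    Tuple!(size_t, string)[] result;

    // zip stops at the shorter range, so a very long iota simply
    // supplies the counter for however many matches there are.
    foreach (i, m; zip(iota(size_t.max), matchAll(input, re)))
        result ~= tuple(i, m.hit);
    return result;
}

void main()
{
    foreach (pair; numberedHits("<p>{{ a }}</p>{% b %}{# c #}"))
        writeln(pair[0], ": ", pair[1]);
}
```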

If you want the offset in the string where each match begins I think you're out of luck.  I needed something similar a while ago and the best I could find was using the cumulative length of the pre property.
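
A sketch of the pre-based approach. It assumes that pre is the slice of the whole input before the current match, which is how I understand current Phobos to behave; under that assumption its length is already the offset and no running total is needed. If your Phobos version behaves differently, check before relying on it. The pattern and the name matchOffsets are made up for illustration, and offsets are counted in code units (bytes for a string), not characters.

```d
import std.regex : matchAll, regex;
import std.stdio : writeln;

/// Returns the offset (in code units) at which each match begins.
/// Hypothetical helper, for illustration only.
size_t[] matchOffsets(string input)
{
    // Assumed pattern: opening template tags only.
    auto re = regex(`\{\{|\{%|\{#`);
    size_t[] offsets;
    foreach (m; matchAll(input, re))
        offsets ~= m.pre.length; // assumption: pre starts at input[0]
    return offsets;
}

void main()
{
    auto input = "<html><body><p>{{ message }}</p></body></html>";
    writeln(matchOffsets(input));
}
```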
July 08, 2014
On Tuesday, 8 July 2014 at 15:58:47 UTC, Justin Whear wrote:
>
> What do you mean by offset?  If you simply mean the index of the match,
> as your example seems to indicate, you can zip the matches with iota or
> sequence!"n".
>
> If you want the offset in the string where each match begins I think
> you're out of luck.  I needed something similar a while ago and the best
> I could find was using the cumulative length of the pre property.


Sorry for my confusing example!

Yes, I was looking for the offset in the string where the matches begin.
I did some programming in PHP in the past. Its preg_match_all function has an optional PREG_OFFSET_CAPTURE flag. I was hoping for something similar in std.regex...

Good tip, I'll use the cumulative length of pre.

Thank you both for your replies!



September 25, 2020
On Monday, 7 July 2014 at 21:32:30 UTC, JD wrote:
> I'm using a compile time regex to find some tags in an input
> string. Is it possible to capture the offset of the matches in
> some way? Otherwise I have to "calculate" the offsets myself by
> iterating over the results of matchAll.
>
> Thanks,
> Jeroen
>

For anyone coming to this later, I believe the captured strings are all slices of the original input. So you can do 'capture[1].ptr - originalString.ptr' to get the index at which the capture starts.
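
A small sketch of the pointer-subtraction trick, assuming the captures really are slices of the original string. The helper name offsetIn and the pattern are hypothetical. As far as I know, an optional group that did not participate in the match comes back as an empty string, so check for emptiness before trusting its offset.

```d
import std.regex : matchAll, regex;
import std.stdio : writeln;

/// Offset of `slice` within `original`, in code units.
/// Only meaningful when `slice` is an actual slice of `original`.
/// Hypothetical helper, for illustration only.
size_t offsetIn(string original, string slice)
{
    assert(slice.ptr >= original.ptr
        && slice.ptr + slice.length <= original.ptr + original.length,
        "slice does not point into original");
    return cast(size_t)(slice.ptr - original.ptr);
}

void main()
{
    auto input = "<p>{{ message }}</p>";
    // Assumed pattern: {{ name }} with the name in group 1.
    auto re = regex(`\{\{\s*(\w+)\s*\}\}`);

    foreach (m; matchAll(input, re))
    {
        writeln("match starts at ", offsetIn(input, m.hit));
        writeln("group 1 starts at ", offsetIn(input, m[1]));
    }
}
```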