regex format string problem

Nov 22, 2015

yawniek

Nov 23, 2015

Rikki Cattermole

Nov 23, 2015

yawniek

Nov 23, 2015

Rikki Cattermole

Hi!

How can I format a string with captures from a regular expression? Basically, I want to make this pass: https://gist.github.com/f17647fb2f8ff2261d42

Context: I'm trying to write an implementation for https://github.com/ua-parser, where the regular expressions as well as the format strings are given.

On 23/11/15 12:41 PM, yawniek wrote:
> Hi!
>
> How can I format a string with captures from a regular expression?
> Basically, I want to make this pass:
> https://gist.github.com/f17647fb2f8ff2261d42
>
> Context: I'm trying to write an implementation for
> https://github.com/ua-parser
> where the regular expressions as well as the format strings are given.

I take it that browscap[0] does not do what you want?
I have a generator at [1].
Feel free to steal.

Also, once you do get yours working, you'll want to use ctRegex and generate a file with all of them in it. That'll increase performance significantly.

Regarding regex, if you want a named sub part use:
(?P<text>[a-z]*)
Where [a-z]* is just an example.

I would recommend learning how input ranges work. They are how you get the matches out, e.g.

auto rgx = ctRegex!`([a-z])[123]`;
foreach (match; matchAll("b3", rgx)) {
    writeln(match.hit);
}

Or something along those lines, I did it off the top of my head.

[0] https://github.com/rikkimax/Cmsed/blob/master/tools/browser_detection/browscap.ini
[1] https://github.com/rikkimax/Cmsed/blob/master/tools/browser_detection/generator.d
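
To make the two suggestions above concrete, here is a minimal sketch that combines a named group with iteration over the range returned by matchAll. The helper name lettersOf and the sample input are made up for illustration, and it uses a runtime regex() for brevity; ctRegex could presumably be swapped in as the post recommends.

```d
import std.regex;
import std.stdio : writeln;

// Hypothetical helper: collects the named group "letter" from every match.
string[] lettersOf(string input)
{
    // (?P<letter>...) names the group; a digit from [123] must follow it.
    auto rgx = regex(`(?P<letter>[a-z])[123]`);

    string[] found;
    // matchAll yields a lazy input range with one Captures per match.
    foreach (m; matchAll(input, rgx))
        found ~= m["letter"]; // look the group up by name
    return found;
}

void main()
{
    // "z9" is skipped because 9 is not in [123].
    writeln(lettersOf("b3 c1 z9")); // prints ["b", "c"]
}
```

Each element of the range is a Captures object, so m.hit (the whole match) and m[1] (the group by number) are available alongside the by-name lookup.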

Hi Rikki,

On Monday, 23 November 2015 at 03:57:06 UTC, Rikki Cattermole wrote:
> I take it that browscap[0] does not do what you want?
> I have a generator at [1].
> Feel free to steal.

This looks interesting, thanks for the hint. However, it might be a bit limited: I have 15M+ different user agents with all kinds of weird cases, and sometimes not even the extensive ua-core regexes work. (If you're interested in testing, let me know.)

> Also, once you do get yours working, you'll want to use ctRegex and generate a file with all of them in it. That'll increase performance significantly.

That was my plan.

> Regarding regex, if you want a named sub part use:
> (?P<text>[a-z]*)
> Where [a-z]* is just an example.
>
> I would recommend learning how input ranges work. They are how you get the matches out, e.g.
>
> auto rgx = ctRegex!`([a-z])[123]`;
> foreach (match; matchAll("b3", rgx)) {
>     writeln(match.hit);
> }

I'm aware of how this works; the problem is a different one:

I have a second string that contains $n's, which can occur in any order. Of course I could just write another regex and replace them, job done. But from looking at std.regex this seems to be built in; I just failed to get it to work properly, see my gist. I hoped this would be a one-liner.
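
For reference, the $n placeholders described here are the format-string syntax that replaceFirst and replaceAll in std.regex accept, and they may appear in any order. A minimal sketch follows; the helper name reorder and the sample strings are invented for illustration and are not taken from the gist.

```d
import std.regex;
import std.stdio : writeln;

// Hypothetical helper: rewrites the first match of `pattern` using `fmt`.
string reorder(string input, string pattern, string fmt)
{
    // In fmt, $1, $2, ... expand to capture groups and $& to the whole match.
    return replaceFirst(input, regex(pattern), fmt);
}

void main()
{
    // Groups can be referenced in any order.
    writeln(reorder("John Smith", `(\w+) (\w+)`, "$2, $1")); // prints Smith, John

    // Text outside the match ("id ") is left untouched.
    writeln(reorder("id 7", `(\d+)`, "<$&>")); // prints id <7>
}
```

The second call shows the main caveat: only the matched span is replaced, so the result is the formatted string alone only when the regex matches the entire input.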

November 23, 2015

Re: regex format string problem

Posted by Rikki Cattermole
in reply to yawniek


On 23/11/15 9:22 PM, yawniek wrote:
> Hi Rikki,
>
> On Monday, 23 November 2015 at 03:57:06 UTC, Rikki Cattermole wrote:
>> I take it that browscap[0] does not do what you want?
>> I have a generator at [1].
>> Feel free to steal.
>
> This looks interesting, thanks for the hint. However, it might be a bit
> limited:
> I have 15M+ different user agents with all kinds of weird cases, and
> sometimes not even the extensive ua-core regexes work. (If you're
> interested in testing, let me know.)
>
>> Also once you do get yours working, you'll want to use ctRegex and
>> generate a file with all of them in it. That'll increase performance
>> significantly.
>
> That was my plan.
>
>> Regarding regex, if you want a named sub part use:
>> (?P<text>[a-z]*)
>> Where [a-z]* is just an example.
>>
>> I would recommend learning how input ranges work. They are how you
>> get the matches out, e.g.
>>
>> auto rgx = ctRegex!`([a-z])[123]`;
>> foreach (match; matchAll("b3", rgx)) {
>>     writeln(match.hit);
>> }
>
> I'm aware of how this works; the problem is a different one:
>
> I have a second string that contains $n's, which can occur in any order.
> Of course I could just write another regex and replace them, job
> done.
> But from looking at std.regex this seems to be built in; I just failed
> to get it to work properly, see my gist. I hoped this would be a one-liner.

So like this?

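import std.regex;
import std.stdio : readln, stdout, write, writeln;
import std.string : chomp;

// The first word goes into $1, the optional remainder into $2.
auto REG = ctRegex!(`(\S+)(?: (.*))?`);

void main() {
        for(;;) {
                write("> ");
                stdout.flush();
                // chomp strips the trailing newline and is safe on empty input (EOF),
                // unlike line.length--, which underflows on an empty string.
                string line = readln().chomp;

                if (line.length == 0)
                        return;

                // $1 in the format string expands to the first capture group.
                writeln("< ", line.replaceAll(REG, "Unknown program: $1"));
        }
}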
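
Note that replaceAll and replaceFirst keep whatever text the regex did not match. If only the expanded format string is wanted (which may be the case for ua-parser style rules, though that is an assumption about the use case), one possible sketch is to cut the replacement back out using the pre and post slices of the match. formatFirst is a hypothetical helper, not part of std.regex, and it matches the input twice for the sake of simplicity.

```d
import std.regex;
import std.stdio : writeln;

// Hypothetical helper: returns only the expanded format string for the
// first match of `re` in `input`, or null when there is no match.
string formatFirst(RegEx)(string input, RegEx re, string fmt)
{
    auto m = matchFirst(input, re);
    if (m.empty)
        return null;

    // replaceFirst yields m.pre ~ expanded format ~ m.post ...
    auto replaced = replaceFirst(input, re, fmt);
    // ... so slicing off both ends leaves just the expanded format.
    return replaced[m.pre.length .. $ - m.post.length];
}

void main()
{
    auto re = regex(`(Firefox)/(\d+)\.(\d+)`);
    // The unmatched "Mozilla/5.0 " prefix does not appear in the result.
    writeln(formatFirst("Mozilla/5.0 Firefox/42.0", re, "v$2.$3 of $1"));
    // prints v42.0 of Firefox
}
```

A regex that is anchored to match the whole input would make the slicing unnecessary, at the cost of editing the given patterns.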
