regex format string problem
November 22, 2015
hi!

how can i format a string with captures from a regular expression?
basically make this pass:
https://gist.github.com/f17647fb2f8ff2261d42


context: i'm trying to write an implementation of https://github.com/ua-parser
where the regular expressions as well as the format strings are given.


November 23, 2015
On 23/11/15 12:41 PM, yawniek wrote:
> hi!
>
> how can i format a string with captures from a regular expression?
> basically make this pass:
> https://gist.github.com/f17647fb2f8ff2261d42
>
>
> context: i'm trying to write an implementation of
> https://github.com/ua-parser
> where the regular expressions as well as the format strings are given.

I take it that browscap[0] does not do what you want?
I have a generator at [1].
Feel free to steal.

Also once you do get yours working, you'll want to use ctRegex and generate a file with all of them in it. That'll increase performance significantly.

Regarding regex, if you want a named sub part, std.regex spells it:
(?P<text>[a-z]*)
Where [a-z]* is just an example.
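
As a rough sketch of reading a named group back out: std.regex writes named groups as `(?P<name>...)`, and the capture can then be indexed by name. The function name `firstWord` is made up for this example.

```d
import std.regex;

// Hypothetical helper: returns the text captured by the named group "text",
// or null when the pattern does not match at all.
string firstWord(string input)
{
    auto m = matchFirst(input, regex(`(?P<text>[a-z]*)`));
    return m.empty ? null : m["text"];
}
```

For example, `firstWord("abc123")` gives `"abc"`.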

I would recommend learning how input ranges work. They are how you get the matches out, e.g.

import std.regex, std.stdio;

auto rgx = ctRegex!`([a-z])[123]`;
// matchAll takes the input first, then the regex
foreach(match; "b3".matchAll(rgx)) {
    writeln(match.hit);
}

Or something along those lines, I did it off the top of my head.
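
To spell out the input-range part: the result of matchAll can also be walked by hand with empty, front and popFront, which is what foreach does behind the scenes. A minimal sketch, where `firstCaptures` is only an illustrative name:

```d
import std.regex;

// Hypothetical helper: collects the first capture group of every match.
string[] firstCaptures(string input)
{
    string[] result;
    auto matches = matchAll(input, regex(`([a-z])[123]`));
    while (!matches.empty)
    {
        // front is the current match; index 1 is the first capture group
        result ~= matches.front[1];
        matches.popFront();
    }
    return result;
}
```

For example, `firstCaptures("b3 x1 q9 c2")` gives `["b", "x", "c"]`, since "q9" does not match the pattern.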

[0] https://github.com/rikkimax/Cmsed/blob/master/tools/browser_detection/browscap.ini
[1] https://github.com/rikkimax/Cmsed/blob/master/tools/browser_detection/generator.d

November 23, 2015
Hi Rikki,

On Monday, 23 November 2015 at 03:57:06 UTC, Rikki Cattermole wrote:
> I take it that browscap[0] does not do what you want?
> I have a generator at [1].
> Feel free to steal.

This looks interesting, thanks for the hint. However, it might be a bit limited:
i have 15M+ different user agents with all kinds of weird cases, and sometimes not even the extensive ua-core regexes work. (let me know if you're interested in them for testing)

> Also once you do get yours working, you'll want to use ctRegex and generate a file with all of them in it. That'll increase performance significantly.

that was my plan.

> Regarding regex, if you want a named sub part, std.regex spells it:
> (?P<text>[a-z]*)
> Where [a-z]* is just an example.
>
> I would recommend learning how input ranges work. They are how you get the matches out, e.g.
>
> auto rgx = ctRegex!`([a-z])[123]`;
> foreach(match; "b3".matchAll(rgx)) {
>     writeln(match.hit);
> }

i'm aware of how this works, the problem is a different one:

i have a second string, the format string, that contains $n placeholders, which can occur in any order.
of course i could just write another regex to replace them, job done.
but from looking at std.regex this seems to be built in, i just failed to get it to work properly, see my gist. i had hoped this would be a one-liner.
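
For illustration, the "write another regex and replace it" fallback could look roughly like the sketch below. `formatCaptures` is a hypothetical helper, not something std.regex provides, and the behaviour for out-of-range placeholders (empty string) is an assumption of this example.

```d
import std.conv : to;
import std.regex;

// Hypothetical helper: matches `pattern` against `input`, then expands
// $1, $2, ... in `fmt` with the corresponding capture groups.
// Returns null when the pattern does not match.
string formatCaptures(string input, string pattern, string fmt)
{
    auto m = matchFirst(input, regex(pattern));
    if (m.empty)
        return null;

    // A second regex finds each $n placeholder; the callback looks up capture n.
    return replaceAll!((Captures!string ph) {
        immutable n = ph[1].to!size_t;
        return n < m.length ? m[n] : "";
    })(fmt, regex(`\$(\d+)`));
}
```

With this, `formatCaptures("Firefox/42.0", `(\w+)/(\d+)\.(\d+)`, "$2.$3 of $1")` gives `"42.0 of Firefox"`, so the placeholders can appear in any order.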


November 23, 2015
On 23/11/15 9:22 PM, yawniek wrote:
> Hi Rikki,
>
> On Monday, 23 November 2015 at 03:57:06 UTC, Rikki Cattermole wrote:
>> I take it that browscap[0] does not do what you want?
>> I have a generator at [1].
>> Feel free to steal.
>
> This looks interesting, thanks for the hint. However, it might be a bit
> limited:
> i have 15M+ different user agents with all kinds of weird cases, and
> sometimes not even the extensive ua-core regexes work. (let me know if
> you're interested in them for testing)
>
>> Also once you do get yours working, you'll want to use ctRegex and
>> generate a file with all of them in it. That'll increase performance
>> significantly.
>
> that was my plan.
>
>> Regarding regex, if you want a named sub part, std.regex spells it:
>> (?P<text>[a-z]*)
>> Where [a-z]* is just an example.
>>
>> I would recommend learning how input ranges work. They are how you get
>> the matches out, e.g.
>>
>> auto rgx = ctRegex!`([a-z])[123]`;
>> foreach(match; "b3".matchAll(rgx)) {
>>     writeln(match.hit);
>> }
>
> i'm aware of how this works, the problem is a different one:
>
> i have a second string, the format string, that contains $n placeholders,
> which can occur in any order.
> of course i could just write another regex to replace them, job done.
> but from looking at std.regex this seems to be built in, i just failed
> to get it to work properly, see my gist. i had hoped this would be a
> one-liner.

So like this?

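import std.regex;
import std.stdio : readln, writeln, write, stdout;
import std.string : chomp;

// Group 1 is the first word; the optional rest of the line is group 2.
auto REG = ctRegex!(`(\S+)(?: (.*))?`);

void main() {
        for(;;) {
                write("> ");
                stdout.flush;
                // chomp drops the trailing newline and is safe on empty input (EOF)
                string line = readln().chomp;

                if (line.length == 0)
                        return;

                // $1 in the format string is replaced by the first capture
                writeln("< ", line.replaceAll(REG, "Unknown program: $1"));
        }
}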
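
The same format-string mechanism should also take several placeholders in any order. A minimal sketch, assuming a made-up `describe` helper and a toy pattern rather than a real ua-parser rule:

```d
import std.regex;

// Hypothetical helper: rewrites a "Name/major.minor" string using a format
// string whose $n placeholders are deliberately out of order.
string describe(string ua)
{
    // Anchored so that the whole input is replaced, not just a part of it.
    auto re = ctRegex!`^(\w+)/(\d+)\.(\d+)$`;
    return ua.replaceFirst(re, "$1 (minor $3, major $2)");
}
```

For example, `describe("Firefox/42.0")` gives `"Firefox (minor 0, major 42)"`. Input that does not match is returned unchanged, so a real implementation would probably check for a match first.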