regexp suggestion

It would be really nice to have a method of RegExp similar to test(), but matching the regexp only at the position given, not advancing further on failure, and returning the number of bytes read (or 0 on failure). It could be used for easy token parsing:

    RegExp identifier = new RegExp('\w', "");
    char[] code, token;
    int pos;
    ...
    int count = identifier.get(code, pos);
    if (count)
    {
        token = code[pos .. pos + count];
        pos += count;    // next token
    }
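
For comparison, a minimal sketch of the requested behaviour in Python rather than D, since Python's `re` module already offers an anchored match at a given position. The helper name `match_at` and the identifier pattern `[A-Za-z_]\w*` are assumptions made for illustration; neither comes from this thread or from any D library.

```python
import re

def match_at(pattern: re.Pattern, text: str, pos: int) -> int:
    """Return the length of the match starting exactly at pos, or 0 if none."""
    # Pattern.match(text, pos) only succeeds if the match begins at pos;
    # unlike search(), it never scans ahead for a later match.
    m = pattern.match(text, pos)
    return m.end() - pos if m else 0

identifier = re.compile(r"[A-Za-z_]\w*")  # assumed identifier syntax
code = "foo123 = bar456 + 789;"
pos = 0

count = match_at(identifier, code, pos)
if count:
    token = code[pos:pos + count]
    pos += count  # next token starts here
```

A return value of 0 doubles as the failure signal, as in the proposal, so this only works for patterns that cannot match the empty string.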

I believe you can already do that with regexp by looking at the match array and using it to slice the input array.

"Pavel Minayev" <evilone@omen.ru> wrote in message news:a41ccn$2m50$1@digitaldaemon.com...
> It would be really nice to have a method of RegExp similar to test(),
> but only matching regexp at the position given, not advancing
> further on error, and returning number of bytes read (or 0 on failure).
> It could be used for easy token parsing:
>
>     RegExp identifier = new RegExp('\w', "");
>     char[] code, token;
>     int pos;
>     ...
>     int count = identifier.get(code, pos);
>     if (count)
>     {
>         token = code[pos .. pos + count];
>         pos += count;    // next token
>     }

"Walter" <walter@digitalmars.com> wrote in message news:a41imc$2pnk$1@digitaldaemon.com... > I believe you can already do that with regexp by looking at the match array > and using it to slice the input array. Yes, but it's sloooooow!

You can also use the "g" attribute.

"Pavel Minayev" <evilone@omen.ru> wrote in message news:a41jep$2q3p$1@digitaldaemon.com...
> "Walter" <walter@digitalmars.com> wrote in message news:a41imc$2pnk$1@digitaldaemon.com...
> > I believe you can already do that with regexp by looking at the match array
> > and using it to slice the input array.
>
> Yes, but it's sloooooow!

"Walter" <walter@digitalmars.com> wrote in message news:a41oek$2se5$1@digitaldaemon.com... > You can also use the "g" attribute. Sorry, I'm not very familiar with regexp... how is it supposed to do what I want?

"Pavel Minayev" <evilone@omen.ru> wrote in message news:a42jse$6h1$1@digitaldaemon.com... > "Walter" <walter@digitalmars.com> wrote in message news:a41oek$2se5$1@digitaldaemon.com... > > > You can also use the "g" attribute. > > Sorry, I'm not very familiar with regexp... how is > it supposed to do what I want? If you use the "g" attribute to the RegExp constructor, and repeated calls to exec() will each pick up where the previous left off.

"Walter" <walter@digitalmars.com> wrote in message news:a42tc9$hrc$1@digitaldaemon.com... > If you use the "g" attribute to the RegExp constructor, and repeated calls to exec() will each pick up where the previous left off. But doesn't it try to search for the regexp further if it doens't match in current position?

"Pavel Minayev" <evilone@omen.ru> wrote in message news:a433vk$l3i$1@digitaldaemon.com... > "Walter" <walter@digitalmars.com> wrote in message news:a42tc9$hrc$1@digitaldaemon.com... > > > If you use the "g" attribute to the RegExp constructor, and repeated calls > > to exec() will each pick up where the previous left off. > > But doesn't it try to search for the regexp further if it doens't match in current position? Yes.

"Walter" <walter@digitalmars.com> wrote in message news:a43tq3$11uk$2@digitaldaemon.com... > > But doesn't it try to search for the regexp further if it doens't match in current position? > > Yes. Then I don't understand how it can be used to tokenize the string. Suppose I have: foo123 = bar456 + 789; Now I first search for the identifier, and get "foo123" and "bar456". Then I search for numbers and get "123", "456" and "789" - and only the latter is correct... With my suggestion implemented, however, it'd look somewhat different. First I check for identifier, and get "foo123". Now I advance after the end of that token, and perform another check... when I get to "789", I check if it matches an identifier /\w.../ - it doesn't, so I check if it is a number /0-9+/ and succeed... that's how it is supposed to work.

February 09, 2002

Re: regexp suggestion

Posted by Sean L. Palmer
in reply to Pavel Minayev

I think sscanf could do this if it could return a pointer to how far it got in the input string during processing in addition to how many fields were converted.  sscanf as it exists in C is not so useful.

Sean

"Pavel Minayev" <evilone@omen.ru> wrote in message news:a443lq$147s$1@digitaldaemon.com...
> "Walter" <walter@digitalmars.com> wrote in message news:a43tq3$11uk$2@digitaldaemon.com...
>
> > > But doesn't it try to search for the regexp further if it doesn't match in the current position?
> >
> > Yes.
>
> Then I don't understand how it can be used to tokenize the string. Suppose I have:
>
>     foo123 = bar456 + 789;
>
> Now I first search for identifiers, and get "foo123" and "bar456". Then I search for numbers and get "123", "456" and "789" - and only the last is correct...
>
> With my suggestion implemented, however, it'd look somewhat different. First I check for an identifier, and get "foo123". Now I advance past the end of that token, and perform another check... when I get to "789", I check if it matches an identifier /\w.../ - it doesn't, so I check if it is a number /[0-9]+/ and succeed... that's how it is supposed to work.
