February 08, 2002 regexp suggestion
It would be really nice to have a method of RegExp similar to test(), but only matching the regexp at the position given, not advancing further on error, and returning the number of bytes read (or 0 on failure). It could be used for easy token parsing:

    RegExp identifier = new RegExp('\w', "");
    char[] code, token;
    int pos;
    ...
    int count = identifier.get(code, pos);
    if (count)
    {
        token = code[pos .. pos + count];
        pos += count;    // next token
    }
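For comparison, here is a minimal sketch of the requested behaviour in Python, whose `re` module happens to offer it: `Pattern.match(string, pos)` tries the pattern only at `pos` and never scans ahead. The helper name `match_length` and the identifier pattern are assumptions made for illustration, not part of the proposed D API, and the count is in characters rather than bytes.

```python
import re

def match_length(pattern, text, pos):
    """Return the length of the match starting exactly at pos, or 0."""
    m = pattern.match(text, pos)  # anchored at pos; does not scan ahead
    return m.end() - pos if m else 0

# Assumed identifier syntax: a letter or underscore, then word characters.
identifier = re.compile(r"[A-Za-z_]\w*")

code = "foo123 = bar456"
pos = 0
count = match_length(identifier, code, pos)
if count:
    token = code[pos:pos + count]
    pos += count  # next token starts here
```

A return value of 0 doubles as the failure signal, as in the proposal, which works as long as the pattern cannot match the empty string.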
February 08, 2002 Re: regexp suggestion
Posted in reply to Pavel Minayev

I believe you can already do that with regexp by looking at the match array and using it to slice the input array.

"Pavel Minayev" <evilone@omen.ru> wrote in message news:a41ccn$2m50$1@digitaldaemon.com...
> It would be really nice to have a method of RegExp similar to test(),
> but only matching regexp at the position given, not advancing
> further on error, and returning number of bytes read (or 0 on failure).
> It could be used for easy token parsing:
>
>     RegExp identifier = new RegExp('\w', "");
>     char[] code, token;
>     int pos;
>     ...
>     int count = identifier.get(code, pos);
>     if (count)
>     {
>         token = code[pos .. pos + count];
>         pos += count;    // next token
>     }
February 08, 2002 Re: regexp suggestion
Posted in reply to Walter

"Walter" <walter@digitalmars.com> wrote in message news:a41imc$2pnk$1@digitaldaemon.com...
> I believe you can already do that with regexp by looking at the match array
> and using it to slice the input array.

Yes, but it's sloooooow!
February 08, 2002 Re: regexp suggestion
Posted in reply to Pavel Minayev

You can also use the "g" attribute.

"Pavel Minayev" <evilone@omen.ru> wrote in message news:a41jep$2q3p$1@digitaldaemon.com...
> "Walter" <walter@digitalmars.com> wrote in message news:a41imc$2pnk$1@digitaldaemon.com...
> > I believe you can already do that with regexp by looking at the match array
> > and using it to slice the input array.
>
> Yes, but it's sloooooow!
February 09, 2002 Re: regexp suggestion
Posted in reply to Walter

"Walter" <walter@digitalmars.com> wrote in message news:a41oek$2se5$1@digitaldaemon.com...
> You can also use the "g" attribute.

Sorry, I'm not very familiar with regexp... how is it supposed to do what I want?
February 09, 2002 Re: regexp suggestion
Posted in reply to Pavel Minayev

"Pavel Minayev" <evilone@omen.ru> wrote in message news:a42jse$6h1$1@digitaldaemon.com...
> "Walter" <walter@digitalmars.com> wrote in message news:a41oek$2se5$1@digitaldaemon.com...
>
> > You can also use the "g" attribute.
>
> Sorry, I'm not very familiar with regexp... how is
> it supposed to do what I want?

If you pass the "g" attribute to the RegExp constructor, repeated calls to exec() will each pick up where the previous one left off.
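The "pick up where the previous one left off" behaviour can be sketched in Python, assuming `Pattern.search(text, pos)` as a rough stand-in for repeated exec() calls with the "g" attribute. The function name `exec_all` is hypothetical. Note that each call scans forward from the resume position until it finds a match, so text between matches is skipped silently.

```python
import re

def exec_all(pattern, text):
    """Collect successive matches, each search resuming after the previous one."""
    hits = []
    pos = 0
    while pos <= len(text):
        m = pattern.search(text, pos)  # scans forward from pos
        if m is None:
            break
        hits.append((m.start(), m.group()))
        # Step past an empty match so the loop always advances.
        pos = m.end() if m.end() > m.start() else m.end() + 1
    return hits

numbers = re.compile(r"[0-9]+")
hits = exec_all(numbers, "ab 12 cd 34")
```

Here `hits` records where each match started, which makes the skipped-over gaps visible.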
February 09, 2002 Re: regexp suggestion
Posted in reply to Walter

"Walter" <walter@digitalmars.com> wrote in message news:a42tc9$hrc$1@digitaldaemon.com...
> If you pass the "g" attribute to the RegExp constructor, repeated calls
> to exec() will each pick up where the previous one left off.

But doesn't it try to search for the regexp further if it doesn't match in the current position?
February 09, 2002 Re: regexp suggestion
Posted in reply to Pavel Minayev

"Pavel Minayev" <evilone@omen.ru> wrote in message news:a433vk$l3i$1@digitaldaemon.com...
> "Walter" <walter@digitalmars.com> wrote in message news:a42tc9$hrc$1@digitaldaemon.com...
>
> > If you pass the "g" attribute to the RegExp constructor, repeated calls
> > to exec() will each pick up where the previous one left off.
>
> But doesn't it try to search for the regexp further if it doesn't match in
> the current position?

Yes.
February 09, 2002 Re: regexp suggestion
Posted in reply to Walter

"Walter" <walter@digitalmars.com> wrote in message news:a43tq3$11uk$2@digitaldaemon.com...
> > But doesn't it try to search for the regexp further if it doesn't match in
> > the current position?
>
> Yes.

Then I don't understand how it can be used to tokenize the string. Suppose I have:

    foo123 = bar456 + 789;

Now I first search for identifiers, and get "foo123" and "bar456". Then I search for numbers and get "123", "456" and "789" - and only the last one is correct...

With my suggestion implemented, however, it'd look somewhat different. First I check for an identifier, and get "foo123". Now I advance past the end of that token, and perform another check... when I get to "789", I check if it matches an identifier /\w.../ - it doesn't, so I check if it is a number /[0-9]+/ and succeed... that's how it is supposed to work.
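The tokenizing loop described above can be sketched in Python, again using the anchored `Pattern.match(text, pos)`. The token kinds, the pattern table and the function name `tokenize` are assumptions for illustration. In particular the identifier pattern is assumed to require a leading letter or underscore, since a bare `\w` would also accept the digits of "789".

```python
import re

# Patterns are tried in order, and only at the current position.
TOKEN_PATTERNS = [
    ("identifier", re.compile(r"[A-Za-z_]\w*")),
    ("number", re.compile(r"[0-9]+")),
    ("operator", re.compile(r"[=+;]")),
]
WHITESPACE = re.compile(r"\s+")

def tokenize(code):
    """Split code into (kind, text) pairs without ever scanning ahead."""
    tokens = []
    pos = 0
    while pos < len(code):
        ws = WHITESPACE.match(code, pos)
        if ws:
            pos = ws.end()  # skip blanks between tokens
            continue
        for kind, pattern in TOKEN_PATTERNS:
            m = pattern.match(code, pos)  # anchored at pos
            if m:
                tokens.append((kind, m.group()))
                pos = m.end()  # advance past the token
                break
        else:
            raise ValueError(f"unexpected character at position {pos}")
    return tokens

tokens = tokenize("foo123 = bar456 + 789;")
```

Because every attempt is anchored, "123" and "456" are never reported as numbers: by the time the number pattern is tried, the position has already moved past the identifiers that contain them.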
February 09, 2002 Re: regexp suggestion
Posted in reply to Pavel Minayev

I think sscanf could do this if, in addition to how many fields were converted, it could return a pointer to how far it got in the input string during processing. sscanf as it exists in C is not so useful.

Sean

"Pavel Minayev" <evilone@omen.ru> wrote in message news:a443lq$147s$1@digitaldaemon.com...
> "Walter" <walter@digitalmars.com> wrote in message news:a43tq3$11uk$2@digitaldaemon.com...
>
> > > But doesn't it try to search for the regexp further if it doesn't match in
> > > the current position?
> >
> > Yes.
>
> Then I don't understand how it can be used to tokenize the string. Suppose I have:
>
>     foo123 = bar456 + 789;
>
> Now I first search for identifiers, and get "foo123" and "bar456". Then I search for numbers and get "123", "456" and "789" - and only the last one is correct...
>
> With my suggestion implemented, however, it'd look somewhat different. First I check for an identifier, and get "foo123". Now I advance past the end of that token, and perform another check... when I get to "789", I check if it matches an identifier /\w.../ - it doesn't, so I check if it is a number /[0-9]+/ and succeed... that's how it is supposed to work.
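A scanner that reports both results might look like the following Python sketch. The function name `scan_ints` and its signature are hypothetical, and it handles only integers. (C's sscanf can report the consumed length through the `%n` conversion, which covers part of this, though it does not return a pointer.)

```python
import re

# Optional leading whitespace, then a signed decimal integer.
INT = re.compile(r"\s*([+-]?[0-9]+)")

def scan_ints(text, pos, max_fields):
    """Convert up to max_fields integers starting at pos.

    Returns (values, new_pos), where new_pos is how far scanning got.
    """
    values = []
    while len(values) < max_fields:
        m = INT.match(text, pos)  # anchored at pos
        if m is None:
            break
        values.append(int(m.group(1)))
        pos = m.end()
    return values, pos

values, end = scan_ints("12 34 foo", 0, 3)
```

The caller can compare `len(values)` with the number of fields requested and resume parsing from `end` with a different scanner.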
Copyright © 1999-2021 by the D Language Foundation