February 17, 2006 Questions about builtin RegExp | ||||
---|---|---|---|---|
| ||||
1) Will builtin RegExp increase the minimal size of a D executable? I mean if the executable does not use regexps at all.

2) Is it possible to override operator ~~ ?

3) What is the main purpose of incorporating interpreted regexps in a natively compiled language?

4) When is a regexp checked for syntax correctness: at compile time or at runtime?
"..." ~~ "..."
If ~~ is a part of the language syntax then one can assume that the expression is getting compiled somehow.

Andrew.
|
February 17, 2006 Re: Questions about builtin RegExp | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrew Fedoniouk |
Andrew Fedoniouk wrote:
> 1) Will builtin RegExp increase the minimal size of a D executable? I mean if the executable does not use regexps at all.

No. As far as I understood, this was one of the considerations.

> 2) Is it possible to override operator ~~ ?

Yes. opMatch() and opNext().

> 3) What is the main purpose of incorporating
> interpreted regexps in a natively compiled language?

To make regexps more accessible, I guess. It makes D seem like an alternative to scripting languages.

> 4) When is a regexp checked for syntax correctness:
> at compile time or at runtime? "..." ~~ "..."
> If ~~ is a part of the language syntax then one can assume that the expression
> is getting compiled somehow.

At runtime. For now at least. In the future it could possibly be compiled at compile time, but there will always be a need to support run-time regexps anyway.

/Oskar
|
February 17, 2006 Re: Questions about builtin RegExp | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrew Fedoniouk |
"Andrew Fedoniouk" <news@terrainformatica.com> wrote in message news:dt3v1o$27nk$1@digitaldaemon.com...
> 1) Will builtin RegExp increase the minimal size of a D executable? I mean if the executable does not use regexps at all.

No.

> 2) Is it possible to override operator ~~ ?

Overload, yes. With opMatch().

> 3) What is the main purpose of incorporating
> interpreted regexps in a natively compiled language?

Make them easier to use.

> 4) When is a regexp checked for syntax correctness:
> at compile time or at runtime? "..." ~~ "..."

Right now, at runtime. But the compiler is allowed to diagnose it at compile time, if it's a string literal.

> If ~~ is a part of the language syntax then one can assume that the expression is getting compiled somehow.
|
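For readers on a current D toolchain, the compile-time versus run-time distinction raised here can be sketched with Phobos `std.regex`. This is an assumption on my part: nothing in this thread mentions that module, the `~~` operator is not part of present-day D, and the helper names below are hypothetical. `ctRegex` builds the matcher during compilation, so a malformed literal pattern is a compile error, while `regex` parses its pattern only when the call executes.

```d
import std.regex;
import std.stdio;

// Hypothetical helper: the pattern is a literal, so it is parsed
// and checked while the program is being compiled.
bool hasWordCharCT(string s)
{
    auto re = ctRegex!(`\w`);
    return !matchFirst(s, re).empty;
}

// Hypothetical helper: the pattern is an ordinary string, so it is
// parsed (and rejected, if malformed) only when this call runs.
bool matchesRT(string s, string pattern)
{
    auto re = regex(pattern);
    return !matchFirst(s, re).empty;
}

void main()
{
    writeln(hasWordCharCT("a"));
    writeln(matchesRT("a", `\w`));

    try
    {
        // An unclosed group: any complaint can only surface at run time.
        matchesRT("a", `(`);
    }
    catch (RegexException e)
    {
        writeln("bad pattern reported at run time");
    }
}
```

The run-time form is still needed whenever the pattern comes from user input or a file, which is the point made earlier in the thread.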
February 17, 2006 Re: Questions about builtin RegExp | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright |
Thanks, Walter,

>> 2) Is it possible to override operator ~~ ?
>
> Overload, yes. With opMatch().

Next questions then:

[char string literal] ~~ [char string literal]

1) For which object do I need to overload opMatch so that it gets invoked in the line above?

2) For some kinds of RE-like expressions there is no need to create an instance of RegExp, e.g. the test
"*.ext" ~~ file_name
can be implemented many times faster than standard RE creation/invocation.

3) Some objects have no string representation of the match operation. For example, a CSS selector as an object has a match operation with a DOM element as its argument. But you have a requirement:

"Both operands must be implicitly convertible to char[]."

What to do in this case?

>> 3) What is the main purpose of incorporating
>> interpreted regexps in a natively compiled language?
>
> Make them easier to use.

Easier? What is wrong with the standard way:

regexp re = new regexp(".....");
re.test(...);

And easier does not mean more effective.

while( true )
{
  if( "mask" ~~ file_name )
    ....
}

As far as I understand you will generate:

while( true )
{
  regexp re = new regexp("mask");
  re.test(file_name);
  ....
}

>> 4) When is a regexp checked for syntax correctness:
>> at compile time or at runtime? "..." ~~ "..."
>
> Right now, at runtime. But the compiler is allowed to diagnose it at compile time, if it's a string literal.

If it does not compile this regexp at compile time then this is just a fake and not a solution at all for a language of D's level.
Even Perl compiles its regular expressions at compile time.

So the real meaning of the
arg1 ~~ arg2
notation is just a shortcut for
arg1.test(arg2)

In general shortcuts are good, but in this particular case it has a hidden side effect: the creation of a new RegExp object on each test invocation.

Andrew.
|
February 17, 2006 Re: Questions about builtin RegExp | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrew Fedoniouk |
Andrew Fedoniouk wrote:
> Thanks, Walter,
>
>>> 2) Is it possible to override operator ~~ ?
>>
>> Overload, yes. With opMatch().
>
> Next questions then:
> [char string literal] ~~ [char string literal]
>
> 1) For which object do I need to overload opMatch so that it gets
> invoked in the line above?
>
> 2) For some kinds of RE-like expressions there is no need
> to create an instance of RegExp, e.g. the test
> "*.ext" ~~ file_name
> can be implemented many times faster than standard RE creation/invocation.
>
> 3) Some objects have no string representation of the match operation.
> For example, a CSS selector as an object has a match operation with
> a DOM element as its argument. But you have a requirement:
>
> "Both operands must be implicitly convertible to char[]."
>
> What to do in this case?

Instead of an answer, a quick example of what I tried and what works:

<CODE>
import std.stdio;

class ArrayBeginsWith
{
  static ArrayBeginsWith opCall(int a)
  {
    check = a;
    return instance;
  }
  static ArrayBeginsWith instance;
  static int check;
  static this()
  {
    instance = new ArrayBeginsWith;
  }
  static bool opMatch(int[] nums)
  {
    if(nums.length < 1)return false;
    if(nums[0] == check) return true;
    else return false;
  }
}

static bool opMatch(int[] nums)
{
  if(nums.length < 2)return false;
  if(nums[0] == 0 && nums[1] == 1) return true;
  else return false;
}

void main()
{
  static int[] somearray1 = [0,1,2];
  static int[] somearray2 = [2,1,2];

  writefln(ArrayBeginsWith(0) ~~ somearray1);
  writefln(ArrayBeginsWith(0) ~~ somearray2);

  writefln(ArrayBeginsWith(2) ~~ somearray1);
  writefln(ArrayBeginsWith(2) ~~ somearray2);
}
</CODE>

>>> 3) What is the main purpose of incorporating
>>> interpreted regexps in a natively compiled language?
>>
>> Make them easier to use.
>
> Easier? What is wrong with the standard way:
>
> regexp re = new regexp(".....");
> re.test(...);

Nothing is wrong with this, but ~~ is easier :)

> And easier does not mean more effective.
>
> while( true )
> {
>   if( "mask" ~~ file_name )
>     ....
> }
>
> As far as I understand you will generate:
>
> while( true )
> {
>   regexp re = new regexp("mask");
>   re.test(file_name);
>   ....
> }

I don't think this is too hard to optimize away. The compiler can even generate a global RegExp instance for each regular expression literal and use it many times.

>>> 4) When is a regexp checked for syntax correctness:
>>> at compile time or at runtime? "..." ~~ "..."
>>
>> Right now, at runtime. But the compiler is allowed to diagnose it at compile time, if it's a string literal.
>
> If it does not compile this regexp at compile time then this is just a fake and not
> a solution at all for a language of D's level.
> Even Perl compiles its regular expressions at compile time.
>
> So the real meaning of the
> arg1 ~~ arg2
> notation is just a shortcut for
> arg1.test(arg2)
>
> In general shortcuts are good, but in this particular case
> it has a hidden side effect: the creation of a new RegExp object on each test invocation.

A new RegExp doesn't have to be generated every time. But ~~ provides us with a way of testing arbitrary types for arbitrary things.
|
February 17, 2006 Re: Questions about builtin RegExp | ||||
---|---|---|---|---|
| ||||
Posted in reply to Ivan Senji |
Thanks, Ivan, see below:

"Ivan Senji" <ivan.senji_REMOVE_@_THIS__gmail.com> wrote in message news:dt5b54$h1q$1@digitaldaemon.com...
> Andrew Fedoniouk wrote:
>> Thanks, Walter,
>>
>>>> 2) Is it possible to override operator ~~ ?
>>>
>>> Overload, yes. With opMatch().
>>
>> Next questions then:
>> [char string literal] ~~ [char string literal]
>>
>> 1) For which object do I need to overload opMatch so that it gets
>> invoked in the line above?
>>
>> 2) For some kinds of RE-like expressions there is no need
>> to create an instance of RegExp, e.g. the test
>> "*.ext" ~~ file_name
>> can be implemented many times faster than standard RE creation/invocation.
>>
>> 3) Some objects have no string representation of the match operation. For example, a CSS selector as an object has a match operation with a DOM element as its argument. But you have a requirement:
>>
>> "Both operands must be implicitly convertible to char[]."
>>
>> What to do in this case?
>
> Instead of an answer, a quick example of what I tried and what works:
>
> <CODE>
> import std.stdio;
>
> class ArrayBeginsWith
> {
>   static ArrayBeginsWith opCall(int a)
>   {
>     check = a;
>     return instance;
>   }
>   static ArrayBeginsWith instance;
>   static int check;
>   static this()
>   {
>     instance = new ArrayBeginsWith;
>   }
>   static bool opMatch(int[] nums)
>   {
>     if(nums.length < 1)return false;
>     if(nums[0] == check) return true;
>     else return false;
>   }
> }
>
> static bool opMatch(int[] nums)
> {
>   if(nums.length < 2)return false;
>   if(nums[0] == 0 && nums[1] == 1) return true;
>   else return false;
> }
>
> void main()
> {
>   static int[] somearray1 = [0,1,2];
>   static int[] somearray2 = [2,1,2];
>
>   writefln(ArrayBeginsWith(0) ~~ somearray1);
>   writefln(ArrayBeginsWith(0) ~~ somearray2);
>
>   writefln(ArrayBeginsWith(2) ~~ somearray1);
>   writefln(ArrayBeginsWith(2) ~~ somearray2);
> }
> </CODE>

bool startsWith( int[] arr, int v )
{
  if(arr.length < 1) return false;
  return arr[0] == v;
}

and its usage:

static int[] somearray2 = [2,1,2];

if( somearray2.startsWith( 0 ) ) ...

will be more a) compact b) human readable c) maintainable d) natural

The same applies to

bool match( const char[] str, RegExp re ) { ... }

if( mystr.match(someRe) ) ....

------------------------------------
I would go with a normal implementation of outer methods instead of this :p~~.

>>>> 3) What is the main purpose of incorporating
>>>> interpreted regexps in a natively compiled language?
>>>
>>> Make them easier to use.
>>
>> Easier? What is wrong with the standard way:
>>
>> regexp re = new regexp(".....");
>> re.test(...);
>
> Nothing is wrong with this, but ~~ is easier :)
>
>> And easier does not mean more effective.
>>
>> while( true )
>> {
>>   if( "mask" ~~ file_name )
>>     ....
>> }
>>
>> As far as I understand you will generate:
>>
>> while( true )
>> {
>>   regexp re = new regexp("mask");
>>   re.test(file_name);
>>   ....
>> }
>
> I don't think this is too hard to optimize away. The compiler can even generate a global RegExp instance for each regular expression literal and use it many times.
>
>>>> 4) When is a regexp checked for syntax correctness:
>>>> at compile time or at runtime? "..." ~~ "..."
>>>
>>> Right now, at runtime. But the compiler is allowed to diagnose it at compile time, if it's a string literal.
>>
>> If it does not compile this regexp at compile time then this is just a
>> fake and not a solution at all for a language of D's level.
>> Even Perl compiles its regular expressions at compile time.
>>
>> So the real meaning of the
>> arg1 ~~ arg2
>> notation is just a shortcut for
>> arg1.test(arg2)
>>
>> In general shortcuts are good, but in this particular case
>> it has a hidden side effect: the creation of a new RegExp object on each test
>> invocation.
>
> A new RegExp doesn't have to be generated every time. But ~~ provides us with a way of testing arbitrary types for arbitrary things.

As I said, having a defined function with the name 'match' and clearly defined parameters is way better than making the syntax of the language look like an Xmas tree, with all possible smiley notations (http://www.helpbytes.co.uk/smileys.php)

Andrew.
|
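The free-function alternative argued for in this post can be written as a small self-contained sketch. It assumes a current D compiler; the `const` qualifier and the `main` driver are additions for illustration and are not taken from the thread.

```d
import std.stdio;

// Free function playing the role of an "outer method" on int[].
// An empty array never starts with anything.
bool startsWith(const int[] arr, int v)
{
    return arr.length > 0 && arr[0] == v;
}

void main()
{
    int[] somearray2 = [2, 1, 2];

    // D lets a free function be called with method syntax on its first argument.
    writeln(somearray2.startsWith(2)); // true
    writeln(somearray2.startsWith(0)); // false
}
```

The trade-off debated in the thread is only about notation: the named function says what kind of match is meant, while an overloaded operator leaves that to the type.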
February 17, 2006 Re: Questions about builtin RegExp | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrew Fedoniouk |
Andrew Fedoniouk wrote:
> Thanks, Ivan, see below:
...
>
> bool startsWith( int[] arr, int v )
> {
>   if(arr.length < 1) return false;
>   return arr[0] == v;
> }
>
> and its usage:
>
> static int[] somearray2 = [2,1,2];
>
> if( somearray2.startsWith( 0 ) ) ...
>
> will be more a) compact b) human readable c) maintainable d) natural

Naturally, but this was just a see-if-it-can-be-done example. :)

> As I said, having a defined function with the name 'match' and clearly defined parameters
> is way better than making the syntax of the language look like an Xmas tree

Well, I don't see it like that. I see it as an abstracted concept of "matching", and that can be interpreted as an elementary operation. Plus we can overload ~~ to mean matching of any kind we want that makes sense.
|
February 17, 2006 Re: Questions about builtin RegExp | ||||
---|---|---|---|---|
| ||||
Posted in reply to Ivan Senji |
>> static int[] somearray2 = [2,1,2];
>>
>> if( somearray2.startsWith( 0 ) ) ...
>>
>> will be more a) compact b) human readable c) maintainable d) natural
>
> Naturally, but this was just a see-if-it-can-be-done example. :)

:D or better :~~D

>> As I said, having a defined function with the name 'match' and clearly defined
>> parameters
>> is way better than making the syntax of the language look like an Xmas
>> tree
>
> Well, I don't see it like that. I see it as an abstracted concept of "matching", and that can be interpreted as an elementary operation. Plus we can overload ~~ to mean matching of any kind we want that makes sense.

:)

1) According to http://www.digitalmars.com/d/expression.html#MatchExpression
"Both operands must be implicitly convertible to char[]. "
so your "matching of any kind we want" is not strictly true.

2) ~~ has side effects. Moreover, it is implemented as a stateful comparison, so consecutive ~~'s on the same arguments will yield different results.

3)
while(true)
{
  bool r = "a" ~~ r"\w";
}

must allocate a new RegExp.
|
February 18, 2006 Re: Questions about builtin RegExp | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrew Fedoniouk |
Andrew Fedoniouk wrote:
>>> static int[] somearray2 = [2,1,2];
>>>
>>> if( somearray2.startsWith( 0 ) ) ...
>>>
>>> will be more a) compact b) human readable c) maintainable d) natural
>>
>> Naturally, but this was just a see-if-it-can-be-done example. :)
>
> :D or better :~~D

That's a good smiley.

>>> As I said, having a defined function with the name 'match' and clearly defined parameters
>>> is way better than making the syntax of the language look like an Xmas tree
>>
>> Well, I don't see it like that. I see it as an abstracted concept of "matching", and that can be interpreted as an elementary operation. Plus we can overload ~~ to mean matching of any kind we want that makes sense.
>
> :)
>
> 1) According to http://www.digitalmars.com/d/expression.html#MatchExpression
> "Both operands must be implicitly convertible to char[]. "
> so your "matching of any kind we want" is not strictly true.

Well, it wouldn't be the first time that the documentation is wrong or incomplete. Both types *do* have to be implicitly convertible to char[], unless you use a match expression with your own type that defines the opMatch operator.

> 2) ~~ has side effects. Moreover, it is implemented as a stateful comparison, so
> consecutive ~~'s on the same arguments will yield different results.

char[] ~~ char[] is implemented that way, but a user's Foo ~~ Bar[] doesn't have to behave that way (but it can, if it makes sense that there are more matches).

> 3)
> while(true)
> {
>   bool r = "a" ~~ r"\w";
> }
>
> must allocate a new RegExp.

Why?

Why couldn't a compiler optimize this away into something like:

RegExp __regexp0001;
static this()
{
  __regexp0001 = new RegExp("a");
}

and then later whenever the literal "a" is used as a regex:

while(true)
{
  bool r = __regexp0001 ~~ r"\w";
}

So it is true that a new RegExp is allocated, but it needs to be done only once.
|
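The hoisting described in this post, building the matcher once and reusing it on every iteration, can be done by hand today. The sketch below assumes Phobos `std.regex` from current D, which this thread does not mention; the function name, the pattern, and the file names are made up for illustration.

```d
import std.regex;
import std.stdio;

// Hypothetical helper: counts the names matching a fixed mask.
size_t countTextFiles(string[] names)
{
    // Built once, before the loop, in the spirit of the
    // compiler-generated __regexp0001 proposed in the post.
    auto mask = regex(`\.txt$`);

    size_t count = 0;
    foreach (name; names)
    {
        // Only the match itself runs per iteration; no pattern parsing here.
        if (!matchFirst(name, mask).empty)
            ++count;
    }
    return count;
}

void main()
{
    writeln(countTextFiles(["a.txt", "b.d", "c.txt"])); // 2
}
```

Writing `regex(...)` inside the loop body would give the per-iteration construction that the earlier objection in the thread is about.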
February 18, 2006 Re: Questions about builtin RegExp | ||||
---|---|---|---|---|
| ||||
Posted in reply to Ivan Senji |
>> 3)
>> while(true)
>> {
>> bool r = "a" ~~ r"\w";
>> }
>>
>> must allocate new RegExp.
>
> Why?
>
> Why couldn't a compiler optimize this away into something like:
>
> RegExp __regexp0001;
> static this()
> {
> __regexp0001 = new RegExp("a");
> }
>
> and then later whenever literal "a" is used as regex:
> while(true)
> {
> bool r = __regexp0001 ~~ r"\w";
> }
>
> So it is true that a new RegExp is allocated but it needs only to be done once.
>

And what is this opNext for then?

And more: traditionally there are two "test" operations in RegExps: 'match' and 'test', as far as I remember.
match returns the matched substring and test returns a boolean.
There is also the /g flag, which allows scanning the whole string (Perl):

$i = 0;
while ($string =~ m/regex/g) {
  print "Gotcha #" . $i++ . "!\n";
}

So what exactly does this ~~ do?

Andrew.
|
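The three operations named in this post (a boolean test, returning the matched substring, and a /g-style scan) can be told apart in a short sketch. It uses Phobos `std.regex` from current D as an assumption, with hypothetical helper names; it does not describe what the `~~` operator itself did.

```d
import std.regex;
import std.stdio;

// "test": only a yes/no answer.
bool test(string s, string pattern)
{
    return !matchFirst(s, regex(pattern)).empty;
}

// "match": the first matched substring, or an empty string if none.
string firstHit(string s, string pattern)
{
    auto m = matchFirst(s, regex(pattern));
    return m.empty ? "" : m.hit;
}

// /g-style scan: every non-overlapping match, left to right.
string[] allHits(string s, string pattern)
{
    string[] hits;
    foreach (m; matchAll(s, regex(pattern)))
        hits ~= m.hit;
    return hits;
}

void main()
{
    string s = "a1 b22 c333";
    writeln(test(s, `\d+`));     // true
    writeln(firstHit(s, `\d+`)); // 1
    foreach (i, hit; allHits(s, `\d+`))
        writeln("Gotcha #", i, ": ", hit);
}
```

Keeping the scan in its own range, rather than in hidden state between calls, avoids the situation raised earlier where repeating the same comparison gives a different result.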
Copyright © 1999-2021 by the D Language Foundation