February 16, 2006
"Chr. Grade" <Chr._member@pathlink.com> wrote in message news:dt0gmk$24v3$1@digitaldaemon.com...
>
>>
>>Currently, evaluating ("^abc"~~string) invokes the full std.regexp
>>machinery. But a compiler is free to optimize (1) into (2). I'm thinking of
>>Eric and Don's examples of generating custom recognizers for static regex
>>strings. This could make D's regex support into a real screamer.
>>
>
> Static regex? Umm...
> Again, this might be absurd, but there could be a type "regex".
>
> regex rxSome  = "§|&|=";
> regex rxMore  = "[a-n]";
> regex rxMerge = "foo($rxSome)?($rxMore)+";

---------------------------
import std.regexp;

auto rxSome = RegExp("§|&|=");
if (rxSome ~~ "string")
    ...
-----------------------------
works now.
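To make the "custom recognizer" idea concrete: for a constant pattern like "^abc", the whole regex reduces to a prefix test. This sketch is written against present-day D2 Phobos (std.regex's regex and matchFirst), not the 2006 std.regexp and ~~ syntax used in this thread. The function names matchesCaretAbc and matchesWithEngine are hypothetical, and the code only illustrates what a specialized recognizer could look like, not what DMD actually generates.

```d
import std.regex : matchFirst, regex;

// Hypothetical hand-specialized recognizer for the static pattern "^abc".
// A compiler that sees a constant pattern could, in principle, emit
// something like this instead of calling into the regex engine.
bool matchesCaretAbc(string input)
{
    // "^abc" matches exactly when the input begins with "abc".
    return input.length >= 3 && input[0 .. 3] == "abc";
}

// General path: build and run the regex at run time (present-day std.regex).
bool matchesWithEngine(string input)
{
    return !matchFirst(input, regex("^abc")).empty;
}
```

The specialized version does one length check and one three-byte comparison, while the engine version compiles and runs a general matcher; that gap is what a compile-time regex scheme would aim to close.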


February 16, 2006
In article <4z8zsk5s3ozv$.1xsunk1521nn9.dlg@40tude.net>, Derek Parnell says...

>> Maybe this explains what I meant, maybe it is just absurd.
>> 
>
>I'm really sorry, but this has just made it worse for me. I have absolutely no idea what you are trying to do or say.
>
>Are you talking about a list of pointers to strings and searching over the referenced strings in one ~~ operation?
>

Yes, whole list in one operation, indexing matches. The regexp engine would have to do the pointer hopping as needed.

Here's an example of what I mean, but it can't handle discontinuous buffers:
www.boost.org/libs/regex/example/snippets/regex_search_example.cpp
The code there could be wrapped up in a class/struct which only exposes the
iteration through the map/list with matches via overloaded operators.
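A minimal sketch of the "whole list in one call, indexed matches" idea, assuming present-day D2 std.regex (regex, matchAll, and the pre/hit members of each match) rather than the 2006 std.regexp. Hit and searchAll are hypothetical names. Like the Boost example, it searches each buffer on its own, so a match that straddles two buffers is not found.

```d
import std.regex : matchAll, regex;

// One hit: which buffer it came from, and where in that buffer.
struct Hit
{
    size_t buffer;  // index into the list of buffers
    size_t offset;  // byte offset of the match within that buffer
    string text;    // the matched text
}

// Hypothetical helper: run one pattern over every buffer in a list and
// collect all matches, tagged with the buffer they came from.
// Limitation: each buffer is searched separately, so a match that would
// span the boundary between two buffers is NOT found.
Hit[] searchAll(string[] buffers, string pattern)
{
    auto re = regex(pattern);
    Hit[] hits;
    foreach (i, buf; buffers)
    {
        foreach (m; matchAll(buf, re))
        {
            // m.pre is the part of buf before this match, so its
            // length is the match's byte offset.
            hits ~= Hit(i, m.pre.length, m.hit);
        }
    }
    return hits;
}
```

Handling discontinuous buffers properly would need the engine itself to walk across buffer boundaries, which is the "pointer hopping" described above.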

Chr. Grade

>-- 
>Derek
>(skype: derek.j.parnell)
>Melbourne, Australia
>"Down with mediocracy!"
>16/02/2006 11:20:32 AM


February 16, 2006
On Wed, 15 Feb 2006 16:29:21 -0800, Walter Bright wrote:

> "Derek Parnell" <derek@psych.ward> wrote in message news:k0lbfijz1ng3$.7oaf5rf2w9ut$.dlg@40tude.net...
>> On Wed, 15 Feb 2006 13:52:12 -0800, Walter Bright wrote:
>>
>>> Added match expressions.
>>
>> Too lazy to test, sorry. Do match expressions support Unicode or just ASCII?
> 
> I know it works with ASCII, and it's supposed to work with UTF. I wouldn't be surprised if the latter is buggy, though, since I haven't written test cases for it.
> 
> It's designed, however, so the compiler itself need know nothing about regular expressions. The work is all done by std.regexp.

Seems to be working, but more unittests could be written.

void main()
{
    assert( "\uff16" ~~ "\u2341\uff16" );  // succeeds correctly
    //assert( "\xff" ~~ "\u2341\uff16" );  // fails correctly
    //assert( "^\uff16" ~~ "\u2341\uff16" );  // fails correctly
    assert( "\uff16$" ~~ "\u2341\uff16" );  // succeeds correctly
}

BTW, one side effect of the new matching syntax is that you don't have to explicitly import std.regexp.

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocracy!"
16/02/2006 11:56:43 AM
February 16, 2006
Walter Bright wrote:
> Added match expressions.
> 
> http://www.digitalmars.com/d/changelog.html
> 
> 
> 

Is it possible to drop in compile-time regex support? (i.e. Eric's solution)
February 16, 2006
"Derek Parnell" <derek@psych.ward> wrote in message news:fdfenjm7wj46.1xmq12pyjxp8c$.dlg@40tude.net...
> Seems to be working, but more unittests could be written.
>
> void main()
> {
>    assert( "\uff16" ~~ "\u2341\uff16" );  // succeeds correctly
>    //assert( "\xff" ~~ "\u2341\uff16" );  // fails correctly
>    //assert( "^\uff16" ~~ "\u2341\uff16" );  // fails correctly
>    assert( "\uff16$" ~~ "\u2341\uff16" );  // succeeds correctly
> }

You can use !~ for the failing cases.

> BTW, one side effect of the new matching syntax is that you don't have to explicitly import std.regexp.

That was on purpose. It uses a proxy.


February 16, 2006
In article <dt088d$1svm$1@digitaldaemon.com>, Walter Bright says...
>
>Added match expressions.
>
>http://www.digitalmars.com/d/changelog.html

A question: do you fix the regressions that arise with each of these releases? (I ask because I don't see those fixes in the changelog, but maybe I'm wrong.)

Thanks in advance,

P.S.: Another little question (I know, that makes two :-D). Sorry about my ignorance of common emoticons and such, but what does <g> mean?

Tom;
February 16, 2006
On Wed, 15 Feb 2006 17:13:48 -0800, Walter Bright wrote:

> "Derek Parnell" <derek@psych.ward> wrote in message news:fdfenjm7wj46.1xmq12pyjxp8c$.dlg@40tude.net...
>> Seems to be working, but more unittests could be written.

And here they are ...

void main()
{
    char[] target = "\u2341\u2201\uff16";

    assert( "\xff" !~  target );  // fails correctly
    assert( "\x22" !~  target );  // fails correctly

    assert( ".\x22." !~  target );  // fails correctly

    assert( "\uff16" ~~ target );  // succeeds correctly
    assert( "^\uff16" !~ target );  // fails correctly
    assert( "\uff16$" ~~ target );  // succeeds correctly

    assert( "\u2341" ~~ target );  // succeeds correctly
    assert( "^\u2341" ~~ target );  // succeeds correctly
    assert( "\u2341$" !~ target );  // fails correctly

    assert( "\u2201" ~~ target );  // succeeds correctly
    assert( "^\u2201" !~ target );  // fails correctly
    assert( "\u2201$" !~ target );  // fails correctly

    assert( "\u2201\uff16" ~~ target );  // succeeds correctly
    assert( "^\u2201\uff16" !~ target );  // fails correctly
    assert( "\u2201\uff16$" ~~ target );  // succeeds correctly

    assert( "\u2341\u2201" ~~ target );  // succeeds correctly
    assert( "^\u2341\u2201" ~~ target );  // succeeds correctly
    assert( "\u2341\u2201$" !~ target );  // fails correctly

    assert( "\u2341\u2201\uff16" ~~ target );  // succeeds correctly
    assert( "^\u2341\u2201\uff16" ~~ target );  // succeeds correctly
    assert( "\u2341\u2201\uff16$" ~~ target );  // succeeds correctly
    assert( "^\u2341\u2201\uff16$" ~~ target );  // succeeds correctly

    //assert( "\u2341.\uff16" ~~ target );  // fails
    //assert( "^\u2341.\uff16" ~~ target );  // fails
    //assert( "\u2341.\uff16$" ~~ target );  // fails
    //assert( "^\u2341.\uff16$" ~~ target );  // fails

    assert( "\u2341.." ~~ target );  // succeeds correctly
    assert( "^\u2341.." ~~ target );  // succeeds correctly
    //assert( "\u2341..$" ~~ target );  // fails
    //assert( "^\u2341..$" ~~ target );  // fails

    assert( ".." ~~ target );  // succeeds correctly
    assert( "^.." ~~ target );  // succeeds correctly
    assert( "..$" ~~ target );  // succeeds correctly
    assert( "^..$" !~ target );  // fails correctly

    assert( "..\uff16" ~~ target );  // succeeds correctly
    //assert( "^..\uff16" ~~ target );  // fails
    assert( "..\uff16$" ~~ target );  // succeeds correctly
    //assert( "^..\uff16$" ~~ target );  // fails

    assert( "..." ~~ target );  // succeeds correctly
    assert( "^..." ~~ target );  // succeeds correctly
    assert( "...$" ~~ target );  // succeeds correctly
    //assert( "^...$" ~~ target );  // fails

}

It seems that the pattern "." only tries to match a single byte and not a single character.
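The target above is three characters but nine bytes, since each of U+2341, U+2201 and U+FF16 takes three bytes in UTF-8. A small check of the two counts, assuming present-day D2 Phobos (std.utf.count); byteLength and charLength are hypothetical helper names:

```d
import std.utf : count;

// The target string from the tests above: three non-ASCII characters,
// each encoded as three UTF-8 code units (bytes).
enum string target = "\u2341\u2201\uff16";

// Number of UTF-8 code units (bytes): what "." appears to step over.
size_t byteLength(string s) { return s.length; }

// Number of Unicode code points: what "." would step over if it
// matched whole characters.
size_t charLength(string s) { return count(s); }
```

If "." consumed whole characters, "^...$" would match the target; if it steps one byte at a time, it would need nine dots, which fits the failing "^...$" and "^..\uff16$" cases above.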

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocracy!"
16/02/2006 1:16:12 PM
February 16, 2006
In article <dt0k9b$27jr$1@digitaldaemon.com>, Kyle Furlong says...
>
>Walter Bright wrote:
>> Added match expressions.
>> 
>> http://www.digitalmars.com/d/changelog.html
>> 
>> 
>> 
>
>Is it possible to drop in compile-time regex support? (i.e. Eric's solution)

IMHO, it's not quite ready for prime-time yet.  In fact, some parts of it are still somewhat incomplete. :(

- Eric Anderton at yahoo
February 16, 2006
Walter Bright wrote:
> "Derek Parnell" <derek@psych.ward> wrote in message news:fdfenjm7wj46.1xmq12pyjxp8c$.dlg@40tude.net...
> 
>> BTW, one side effect of the new matching syntax is that you don't have to
>> explicitly import std.regexp.
> 
> That was on purpose. It uses a proxy. 

As cool as this is, I don't entirely like the prospect of cutting yet more ties between standard library components and runtime code.  My approach with Ares has been to separate the two, which until now has meant moving only std.utf into the DMD runtime.  Now it looks like std.regexp will end up there as well (along with std.outbuffer perhaps).  With the new language features, is there any reason to continue regex library support?  Just how much can't be done by the built-in syntax?


Sean
February 16, 2006
"Sean Kelly" <sean@f4.ca> wrote ...
> Walter Bright wrote:
>> "Derek Parnell" <derek@psych.ward> wrote in message news:fdfenjm7wj46.1xmq12pyjxp8c$.dlg@40tude.net...
>>
>>> BTW, one side effect of the new matching syntax is that you don't have to
>>> explicitly import std.regexp.
>>
>> That was on purpose. It uses a proxy.
>
> As cool as this is, I don't entirely like the prospect of cutting yet more ties between standard library components and runtime code.  My approach with Ares has been to separate the two, which until now has meant moving only std.utf into the DMD runtime.  Now it looks like std.regexp will end up there as well (along with std.outbuffer perhaps). With the new language features, is there any reason to continue regex library support?  Just how much can't be done by the built-in syntax?

I agree. And it's hard to fathom what the sudden rush to get this is about. I listed a number of (IMO) serious issues on the main forum, so I'll add my support here: hooking RegExp (and all its various imports) into the compiler is just bad news *at this point in time*.

Let's just suppose for a minute that the regex-templates work out well. It seems to me that any built-in support for regex (within the D grammar) would be nothing more than a thin veneer over the template syntax (for regex-templates), to make it somewhat more palatable for the masses? That may not come to pass, but it seems that we should at least wait until there's a bit of education and experience in this regard, rather than hurriedly tie the grammar to something which clearly has a number of fundamental problems. Again: what's the big rush here?

- Kris