February 06, 2018
On Tuesday, 6 February 2018 at 13:51:01 UTC, Nathan S. wrote:
>> Just use the run-time version, it’s not that much slower. But then again static ipRegex = regex(...) will parse and build regex at CTFE.
>>
>> Maybe lazy init?
>
> FYI I've made a pull request that replaces uses of regexes in std.net.isemail. It turns out they weren't being used for anything indispensable. Import benchmark results were encouraging.
>
> https://github.com/dlang/phobos/pull/6129

Then again, you may not need regex for IPv4 / IPv6.

In theory it should have been the go-to case for ctRegex, but not at the cost of such horrible compile times.
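
For illustration, here is a minimal sketch of checking a dotted-quad IPv4 address without any regex. The name `isIPv4` is hypothetical and not part of Phobos, and this is not necessarily what the pull request does. It accepts the same octet forms as the regex quoted in this thread (one to three digits, value at most 255), but it requires the whole string to be the address.

```d
import std.ascii : isDigit;

/// Hypothetical helper (not a Phobos API): true if `s` is exactly a
/// dotted-quad IPv4 address such as "192.168.0.5".
bool isIPv4(const(char)[] s) pure nothrow @safe @nogc
{
    size_t octets = 0;
    size_t i = 0;
    while (true)
    {
        // Parse one octet: 1 to 3 decimal digits with value <= 255.
        uint value = 0;
        size_t digits = 0;
        while (i < s.length && isDigit(s[i]) && digits < 3)
        {
            value = value * 10 + (s[i] - '0');
            ++i;
            ++digits;
        }
        if (digits == 0 || value > 255)
            return false;

        ++octets;
        if (octets == 4)
            return i == s.length;   // nothing may follow the fourth octet

        // Octets 1 to 3 must be followed by a dot.
        if (i == s.length || s[i] != '.')
            return false;
        ++i;
    }
}
```

Since nothing here depends on std.regex, importing a module that contains such a function would not pull in the regex engine at compile time.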

February 06, 2018
On Tue, Feb 06, 2018 at 05:44:17AM +0000, Dmitry Olshansky via Digitalmars-d wrote: [...]
> Honestly I’m tired to hell of working with our compiler and its compile time features. When it doesn’t pee itself due to OOM I’m almost happy.

Heh, dmd's famous memory usage is causing me tons of grief on low-memory
systems, too.  Basically if you have anything less than 2GB of RAM,
you might as well give up trying to compile anything non-trivial. We
need to get a serious handle on dmd's memory consumption -- at least let
there be an option or something that will turn on the GC or whatever.
It's better for dmd to be (gosh) slow, than for it not to be able to
compile anything at all due to it provoking the kernel OOM killer.


> In retrospect I should have just provided a C interface and compiled the whole thing separately. And CTFE could easily be replaced by a small custom JIT compiler, it would also work at run-time(!).

We seriously need to get newCTFE finished and merged.  Stefan is very busy with other stuff ATM; I wonder if a few of us can continue his work and get newCTFE into a mergeable state.  Given how much D's "compile-time" features are advertised, and D's new (ick) slogan of being fast or whatever, it's high time we actually delivered on our promises by actually making CTFE more usable.

On that note, though, I think a JIT regex compiler totally makes sense. I'd totally support that.


> Especially considering that it’s been 6 years but it’s still not practical to use ctRegex.

I find that using just plain `regex` is Good Enough(tm) for my purposes. Do we really need ctRegex?  The idea of generating an optimal FSM at compile-time is rather appealing, but in the grand scheme of things, doesn't seem like an absolute must-have.


> > The latter department has also suffered a regression; see for example: https://github.com/dlang/phobos/pull/5981.)
> 
> Yup, Martin seems on top of it, thankfully.
[...]

Unfortunately, Martin's PR is only to improve runtime performance.  It's still dog-slow to *compile* std.regex. :-(


T

-- 
Dogs have owners ... cats have staff. -- Krista Casada
February 06, 2018
On Tue, Feb 06, 2018 at 05:35:44AM +0000, Dmitry Olshansky via Digitalmars-d wrote:
> On Tuesday, 6 February 2018 at 04:35:42 UTC, Steven Schveighoffer wrote:
> > On 2/5/18 11:09 PM, psychoticRabbit wrote:
[...]
> > > ----
> > > import std.net.isemail;
> > > 
> > > void main()
> > > {
> > >      auto checkEmail = "someone@somewhere.com".isEmail();
> > > }
> > > ----
> > 
> > I was surprised at this, then I looked at the first line of isEmail:
> > 
> >     static ipRegex = ctRegex!(`\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}` ~
> >         `(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$`.to!(const(Char)[]));
> > 
> > So it's really still related to regex.

Yeah, ctRegex is a bear at compile-time.  Why can't we just use a runtime regex?  It will at least take "only" 3 seconds to compile. :-D Or just don't use a regex at all.


> That’s a really bad idea - isEmail is a template, so the burden of freaking
> slow ctRegex is paid on a per-instantiation basis. Could be horrible
> with separate compilation.
[...]

I'm not sure I'm seeing the value of using ctRegex here.  What's wrong with a module static runtime regex initialized by a static this()?

And before anyone complains about initializing the regex if user code never actually uses it, it's possible to use static this() on an as-needed basis:

	template ipRegex()
	{
		// Eponymous templates FTW!
		Regex!char ipRegex;

		static this()
		{
			ipRegex = regex(`blah blah blah`);
		}
	}

	auto isEmail(... blah blah ...)
	{
		...
		if (ipRegex.match(...)) ...
		...
	}

Basically, if `ipRegex` is never referenced, the template is never instantiated and the static this() basically doesn't exist. :-D Pay-as-you-go FTW!


T

-- 
If you want to solve a problem, you need to address its root cause, not just its symptoms. Otherwise it's like treating cancer with Tylenol...
February 06, 2018
On 2/6/18 2:07 PM, H. S. Teoh wrote:
> I'm not sure I'm seeing the value of using ctRegex here.  What's wrong
> with a module static runtime regex initialized by a static this()?

No, I'd rather have it initialized on first call.
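
A minimal sketch of what initialization on first call could look like, assuming a hypothetical accessor function named `ipRegex` (not an existing Phobos symbol) and the run-time `regex` with the pattern quoted earlier in the thread:

```d
import std.regex : Regex, regex, matchFirst;

/// Hypothetical accessor: builds the run-time regex the first time it is
/// called. Function-local statics are thread-local in D, so each thread
/// builds its own copy and no locking is needed.
Regex!char ipRegex()
{
    static Regex!char re;
    static bool initialized;
    if (!initialized)
    {
        re = regex(`\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}` ~
                   `(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$`);
        initialized = true;
    }
    return re;
}

/// Hypothetical caller: true if `s` ends with a valid dotted-quad IPv4 address.
bool endsWithIPv4(string s)
{
    return !matchFirst(s, ipRegex()).empty;
}
```

No static constructor is involved, so importing modules do not gain a module constructor. The cost of compiling std.regex itself is still paid, though, because the run-time engine is still imported.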

> 
> And before anyone complains about initializing the regex if user code
> never actually uses it, it's possible to use static this() on an
> as-needed basis:
> 
> 	template ipRegex()
> 	{
> 		// Eponymous templates FTW!
> 		Regex!char ipRegex;
> 
> 		static this()
> 		{
> 			ipRegex = regex(`blah blah blah`);
> 		}
> 	}
> 
> 	auto isEmail(... blah blah ...)
> 	{
> 		...
> 		if (ipRegex.match(...)) ...
> 		...
> 	}
> 
> Basically, if `ipRegex` is never referenced, the template is never
> instantiated and the static this() basically doesn't exist. :-D
> Pay-as-you-go FTW!

You may not realize that this actually compiles it for ALL modules that use it, and the compiler puts in a gate to prevent it from running more than once. So you pay every time anyways (compile-time wise at least). It also makes any importing module now a module that defines a static ctor, so cycles are much more likely.

In any case, there is a PR in the works that should eliminate the need for regex altogether: https://github.com/dlang/phobos/pull/6129

-Steve
February 06, 2018
On 2/5/2018 9:35 PM, Dmitry Olshansky wrote:
> That’s a really bad idea - isEmail is a template, so the burden of freaking slow ctRegex
> is paid on a per-instantiation basis. Could be horrible with separate compilation.

std.string.isEmail() in D1 was a simple function. Maybe regex is just the wrong solution for this problem.

---------------------- std.string.isEmail --------------------

/***************************
 * Does string s[] start with an email address?
 * Returns:
 *      null    it does not
 *      char[]  it does, and this is the slice of s[] that is that email address
 * References:
 *      RFC2822
 */
char[] isEmail(char[] s)
{   size_t i;

    if (!isalpha(s[0]))
        goto Lno;

    for (i = 1; 1; i++)
    {
        if (i == s.length)
            goto Lno;
        auto c = s[i];
        if (isalnum(c))
            continue;
        if (c == '-' || c == '_' || c == '.')
            continue;
        if (c != '@')
            goto Lno;
        i++;
        break;
    }
    //writefln("test1 '%s'", s[0 .. i]);

    /* Now do the part past the '@'
     */
    size_t lastdot;
    for (; i < s.length; i++)
    {
        auto c = s[i];
        if (isalnum(c))
            continue;
        if (c == '-' || c == '_')
            continue;
        if (c == '.')
        {
            lastdot = i;
            continue;
        }
        break;
    }
    if (!lastdot || (i - lastdot != 3 && i - lastdot != 4))
        goto Lno;

    return s[0 .. i];

Lno:
    return null;
}
February 06, 2018
On 2/6/18 3:11 PM, Walter Bright wrote:
> On 2/5/2018 9:35 PM, Dmitry Olshansky wrote:
>> That’s a really bad idea - isEmail is a template, so the burden of freaking slow ctRegex
>> is paid on a per-instantiation basis. Could be horrible with separate compilation.
> 
> std.string.isEmail() in D1 was a simple function. Maybe regex is just the wrong solution for this problem.
> 

The regex in question I think is to ensure an email address like abc@192.168.0.5 has a valid IP address. The D1 function doesn't support that requirement.

I admit, I've never used it, so I don't know why it needs to be so complex. But I assume some people depend on that functionality.

-Steve
February 06, 2018
another weird gotcha:
  auto s="foo".isEmail;
  writeln(s.toString); // ok
  writeln(s); // compile error


On Tue, Feb 6, 2018 at 12:30 PM, Steven Schveighoffer via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On 2/6/18 3:11 PM, Walter Bright wrote:
>>
>> On 2/5/2018 9:35 PM, Dmitry Olshansky wrote:
>>>
>>> That’s a really bad idea - isEmail is a template, so the burden of freaking slow ctRegex
>>> is paid on a per-instantiation basis. Could be horrible with separate compilation.
>>
>>
>> std.string.isEmail() in D1 was a simple function. Maybe regex is just the wrong solution for this problem.
>>
>
> The regex in question I think is to ensure an email address like abc@192.168.0.5 has a valid IP address. The D1 function doesn't support that requirement.
>
> I admit, I've never used it, so I don't know why it needs to be so complex. But I assume some people depend on that functionality.
>
> -Steve

February 06, 2018
On 2018-02-06 21:11, Walter Bright wrote:

> std.string.isEmail() in D1 was a simple function. Maybe regex is just the wrong solution for this problem.

If I recall correctly, the current implementation of std.net.isEmail was requested by you.

-- 
/Jacob Carlborg
February 06, 2018
On 2/6/2018 2:03 PM, Jacob Carlborg wrote:
> On 2018-02-06 21:11, Walter Bright wrote:
> 
>> std.string.isEmail() in D1 was a simple function. Maybe regex is just the wrong solution for this problem.
> 
> If I recall correctly, the current implementation of std.net.isEmail was requested by you.

Regardless of whether it was requested by me or not, if the current version is not working for us, we need to explore alternatives.
February 06, 2018
On Tue, Feb 06, 2018 at 02:29:07PM -0800, Walter Bright via Digitalmars-d wrote:
> On 2/6/2018 12:30 PM, Steven Schveighoffer wrote:
> > The regex in question I think is to ensure an email address like abc@192.168.0.5 has a valid IP address. The D1 function doesn't support that requirement.
> > 
> > I admit, I've never used it, so I don't know why it needs to be so complex. But I assume some people depend on that functionality.
> 
> Regex is well known to not always be the best solution for string processing tasks. For example, it does not work well at all where recursion is desired, and nobody uses regex for the lexer in a compiler.

Are you sure?  What about lex and its successors, like flex?

Of course, one could argue that the generated code isn't strictly a regex implementation in the same way as std.regex... but isn't that just a QoI issue?


T

-- 
Life would be easier if I had the source code. -- YHL