August 06, 2015
On 06-Aug-2015 11:26, MakersF wrote:
> On Sunday, 2 August 2015 at 18:22:01 UTC, Jacob Carlborg wrote:
>> On 02/08/15 19:15, Xinok wrote:
>>
>>> I guess you're not familiar with the theoretical aspect of "formal
>>> languages". The D grammar is a context-free grammar which cannot be
>>> reduced to a regular expression. As cym13 stated, there are some simple
>>> context-free grammars which can be rewritten as regular expressions, but
>>> the D grammar cannot be. Take a look at the Chomsky Hierarchy [1] for a
>>> better understanding.
>>>
>>> The classic example of a context-free language is the set of balanced
>>> parenthesis, i.e. (()) is balanced and ())))) is not. This language is
>>> not regular meaning you cannot write a regular expression for it, but
>>> you can write a context-free grammar for it.
>>
>> TextMate grammars are not _just_ regular expressions. They can define
>> balanced parentheses [1].
>>
>> The point of a language grammar in a text editor is not to have a 100%
>> correct implementation of the grammar. Rather it should syntax
>> highlight the code in a way that is useful for the user.
>>
>> [1] https://manual.macromates.com/en/language_grammars
>
> Then your best shot is to approximate the grammar with the regual
> expressions you have access to. You'll get to a point where some
> constructs can not be correctly represented; at that point you should
> probably write a regex which produces what the grammar produces and some
> more.
>

If one limits the depth of nested constructs to some reasonable value (e.g. 5-6) then any context-free grammar is regular. In big grammars combinatorial explosion may get quite high so that limit better be low.

If regular expressions are not hardcoded but rather themsleves are generated then it's quite feasible to do. In fact Perl folks have been doing this "approximate by regex" for years.

> In the example before of generating paired interleaved parentheses, you
> could generate every possible combination of parentheses, like
> ( (|)|[|]|{|}|" )*
> where only the external parentheses are syntax for the regex. That regex
> matches all the productions of the paired parentheses grammar, and many
> more strings.

Here again a regex constructed to match e.g. 10-level deep expressions is more then enough. Like this:

unittest
{
    import std.regex;
    string x = ""; // the first level is going to be ([^()]*\([^()]*\)?)*
    foreach(_; 0..10)
        x = `([^()]*\([^()]*`~ x ~`\)?)*`; // pump an extra level of parenthesised expression
    x = "^"~x~"$"; //make sure we match the whole string
    assert(matchFirst("a+(b*c-d*(e+45)+(aaaa-(d))+(c*2+1)", x));
}



-- 
Dmitry Olshansky
August 06, 2015
On 06/08/15 10:26, MakersF wrote:

> Then your best shot is to approximate the grammar with the regual
> expressions you have access to. You'll get to a point where some
> constructs can not be correctly represented; at that point you should
> probably write a regex which produces what the grammar produces and some
> more.
>
> In the example before of generating paired interleaved parentheses, you
> could generate every possible combination of parentheses, like
> ( (|)|[|]|{|}|" )*
> where only the external parentheses are syntax for the regex. That regex
> matches all the productions of the paired parentheses grammar, and many
> more strings.
>
> At the end of the day you want to highlight correct syntax, and if an
> user writes wrong syntax is OK to have wrong highlight, so be sure your
> regex work for the right syntax, and can do random stuff for the wrong one

I was hoping to enhance the current TextMate grammar for D to get it a bit closer to the D grammar [1]. That's why I started this thread.

[1] http://dlang.org/grammar.html

-- 
/Jacob Carlborg
August 06, 2015
On Sunday, 2 August 2015 at 16:08:24 UTC, cym13 wrote:
> On Sunday, 2 August 2015 at 14:50:35 UTC, Jacob Carlborg wrote:
>> I'm trying to read the D grammar [1] to enhance the D TextMate bundle. If we take the add expression as an example. It's defined like this in the grammar:
>>
>> AddExpression:
>>     MulExpression
>>     AddExpression + MulExpression
>>     AddExpression - MulExpression
>>     CatExpression
>>
>> And like this in the grammar made by Brian [2]:
>>
>> addExpression:
>>       mulExpression
>>     | addExpression ('+' | '-' | '~') mulExpression
>>     ;
>>
>> I'm not so familiar with grammars but this looks like it's recursive. Is it possible to translate this piece of grammar to a regular expression? TextMate uses regular expressions and a couple of enhancements/extensions to define a grammar for a language.
>>
>> [1] http://dlang.org/grammar.html
>> [2] https://rawgit.com/Hackerpilot/DGrammar/master/grammar.html
>
> You can't build a regular expression for any grammar. You can for some grammars but those are only a simple subset. For example, checking parens balance is impossible with common (not recursive) regular expressions only, and even with recursion it soon reaches its limitations.

You can for Regular grammar or any subclass: https://en.wikipedia.org/wiki/Regular_grammar

You can't for more complex grammars.

That being said, you can have regex with placeholder and rerun the regex on the content of the placeholder is the grammar is recursive. I don't think that is an efficient and/or convenient way to parse D.

August 06, 2015
On Sunday, 2 August 2015 at 14:50:35 UTC, Jacob Carlborg wrote:
> I'm trying to read the D grammar [1] to enhance the D TextMate bundle. If we take the add expression as an example. It's defined like this in the grammar:
>
> AddExpression:
>     MulExpression
>     AddExpression + MulExpression
>     AddExpression - MulExpression
>     CatExpression
>
> And like this in the grammar made by Brian [2]:
>
> addExpression:
>       mulExpression
>     | addExpression ('+' | '-' | '~') mulExpression
>     ;
>
> I'm not so familiar with grammars but this looks like it's recursive. Is it possible to translate this piece of grammar to a regular expression? TextMate uses regular expressions and a couple of enhancements/extensions to define a grammar for a language.
>
> [1] http://dlang.org/grammar.html
> [2] https://rawgit.com/Hackerpilot/DGrammar/master/grammar.html

http://stackoverflow.com/a/1732454/672906
August 06, 2015
On 06/08/15 20:14, deadalnix wrote:

> http://stackoverflow.com/a/1732454/672906

I don't have so much of a choice. TextMate supports what it supports. There are TextMate grammars for very many languages, so it's not a problem of not being able to represent a language grammar in TextMate. I just wanted to get closer to the D grammar to get a more accurate grammar in TextMate.

-- 
/Jacob Carlborg
August 06, 2015
On Sunday, 2 August 2015 at 16:37:06 UTC, MakersF wrote:
> Of course it's recursive! Do you want the grammar to be able to only define a finite number of programs?

a* seems pretty infinite to me. :P
August 06, 2015
On 08/07/2015 01:07 AM, "Casper =?UTF-8?B?RsOmcmdlbWFuZCI=?= <shorttail@hotmail.com>" wrote:
> On Sunday, 2 August 2015 at 16:37:06 UTC, MakersF wrote:
>> Of course it's recursive! Do you want the grammar to be able to only
>> define a finite number of programs?
>
> a* seems pretty infinite to me. :P

And any grammar defining that language is recursive.
August 06, 2015
On Thursday, 6 August 2015 at 23:08:01 UTC, Casper Færgemand wrote:
> On Sunday, 2 August 2015 at 16:37:06 UTC, MakersF wrote:
>> Of course it's recursive! Do you want the grammar to be able to only define a finite number of programs?
>
> a* seems pretty infinite to me. :P

(0|1)*

Just define your language in binary, problem solved
August 06, 2015
On Thu, Aug 06, 2015 at 11:15:15PM +0000, Tofu Ninja via Digitalmars-d wrote:
> On Thursday, 6 August 2015 at 23:08:01 UTC, Casper Færgemand wrote:
> >On Sunday, 2 August 2015 at 16:37:06 UTC, MakersF wrote:
> >>Of course it's recursive! Do you want the grammar to be able to only define a finite number of programs?
> >
> >a* seems pretty infinite to me. :P
> 
> (0|1)*
> 
> Just define your language in binary, problem solved

Finite is not a problem, if the upper limit is Graham's Number...


T

-- 
Bomb technician: If I'm running, try to keep up.
1 2
Next ›   Last »