Thread overview
Interpreting the D grammar
Jacob Carlborg (Aug 02, 2015)
cym13 (Aug 02, 2015)
Jacob Carlborg (Aug 02, 2015)
cym13 (Aug 02, 2015)
deadalnix (Aug 06, 2015)
MakersF (Aug 02, 2015)
Jacob Carlborg (Aug 02, 2015)
Xinok (Aug 02, 2015)
Timon Gehr (Aug 06, 2015)
Tofu Ninja (Aug 06, 2015)
H. S. Teoh (Aug 06, 2015)
Xinok (Aug 02, 2015)
Jacob Carlborg (Aug 02, 2015)
MakersF (Aug 06, 2015)
Dmitry Olshansky (Aug 06, 2015)
Jacob Carlborg (Aug 06, 2015)
deadalnix (Aug 06, 2015)
Jacob Carlborg (Aug 06, 2015)
August 02, 2015
I'm trying to read the D grammar [1] to enhance the D TextMate bundle. Take the add expression as an example. It's defined like this in the grammar:

AddExpression:
    MulExpression
    AddExpression + MulExpression
    AddExpression - MulExpression
    CatExpression

And like this in the grammar made by Brian [2]:

addExpression:
      mulExpression
    | addExpression ('+' | '-' | '~') mulExpression
    ;

I'm not so familiar with grammars but this looks like it's recursive. Is it possible to translate this piece of grammar to a regular expression? TextMate uses regular expressions and a couple of enhancements/extensions to define a grammar for a language.

[1] http://dlang.org/grammar.html
[2] https://rawgit.com/Hackerpilot/DGrammar/master/grammar.html
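For reference, a rough sketch (Python, purely illustrative) of what a regex-only version of this rule could look like. Brian's version makes the shape explicit: a mulExpression followed by zero or more ('+' | '-' | '~') mulExpression pairs, so once the operand is something a regex can match, the repetition is just `*`. The big assumption here is that an operand is only an identifier or an integer literal; real D operands (calls, casts, parenthesized subexpressions) are not covered, and the names below are made up.

```python
import re

# Hypothetical stand-in for mulExpression: an identifier or an integer literal.
# Real D operands (calls, casts, parenthesized expressions) are NOT covered.
OPERAND = r"(?:[A-Za-z_]\w*|\d+)"

# addExpression without left recursion: operand (('+' | '-' | '~') operand)*
ADD_EXPRESSION = re.compile(rf"{OPERAND}(?:\s*[-+~]\s*{OPERAND})*")


def is_add_expression(text: str) -> bool:
    """Return True if the whole string is a flat chain of operands joined by +, - or ~."""
    return ADD_EXPRESSION.fullmatch(text.strip()) is not None


for sample in ["a + b - 1 ~ c", "x", "a + + b", "(a + b) * c"]:
    print(f"{sample!r}: {is_add_expression(sample)}")
```

The last sample fails only because the stand-in operand has no idea about parentheses, which is exactly where the recursion comes back.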

-- 
/Jacob Carlborg
August 02, 2015
On Sunday, 2 August 2015 at 14:50:35 UTC, Jacob Carlborg wrote:
> I'm not so familiar with grammars but this looks like it's recursive. Is it possible to translate this piece of grammar to a regular expression? TextMate uses regular expressions and a couple of enhancements/extensions to define a grammar for a language.

You can't build a regular expression for every grammar. You can for some grammars, but those are only a simple subset. For example, checking that parentheses are balanced is impossible with plain (non-recursive) regular expressions, and even with recursion you soon reach the limits.
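To make the limitation concrete, a small illustrative sketch in Python (the helper names are made up): a plain regular expression can only handle parentheses nested up to a depth fixed in advance, while a simple counter handles any depth. The counter needs unbounded memory, which is exactly what a finite automaton lacks.

```python
import re


def nested_parens_regex(max_depth: int) -> re.Pattern:
    """Build a plain (non-recursive) regex for balanced '()' strings
    nested at most max_depth levels deep."""
    inner = ""  # depth 0: the empty string only
    for _ in range(max_depth):
        # One more level: zero or more groups of '(' <previous level> ')'.
        inner = rf"(?:\({inner}\))*"
    return re.compile(inner)


def is_balanced(text: str) -> bool:
    """Counter-based check for '()' strings; handles any nesting depth."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' with no open '(' before it
                return False
    return depth == 0


depth2 = nested_parens_regex(2)
print(bool(depth2.fullmatch("(())")))    # True: within the depth bound
print(bool(depth2.fullmatch("((()))")))  # False: one level too deep for this regex
print(is_balanced("((()))"))             # True
```

Whatever depth you pick for the regex, there is always a balanced string one level deeper that it rejects.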
August 02, 2015
On Sunday, 2 August 2015 at 14:50:35 UTC, Jacob Carlborg wrote:
> I'm not so familiar with grammars but this looks like it's recursive. Is it possible to translate this piece of grammar to a regular expression? TextMate uses regular expressions and a couple of enhancements/extensions to define a grammar for a language.

Of course it's recursive! Do you want the grammar to be able to only define a finite number of programs?

But in this case you could write the original grammar rule as
mul |
cat |
(mul|cat)((+|-) (mul|cat))* (+|-) (mul|cat)

but you lose the precedence of the operations, because the result is a flat list rather than a tree.
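That said, the flat match still records the operands and operators in order, so a tool that needs the tree can rebuild the left-nested shape the left-recursive rule implies by folding left to right afterwards. A sketch, assuming integer operands and only + and - (the function names are hypothetical, and this is not something a TextMate grammar does):

```python
from __future__ import annotations

import re

TOKEN = re.compile(r"\s*(\d+|[-+])")  # integer literal or '+' / '-'


def tokenize(text: str) -> list[str]:
    """Split e.g. '10 - 3 - 2' into ['10', '-', '3', '-', '2']."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at {pos}: {text[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def fold_left(tokens: list[str]):
    """Turn a flat 'operand (op operand)*' list into a left-nested tree.

    Assumes the list is well formed (odd length, operands at even indices).
    """
    tree = int(tokens[0])
    for i in range(1, len(tokens), 2):
        tree = (tokens[i], tree, int(tokens[i + 1]))
    return tree


def evaluate(tree) -> int:
    if isinstance(tree, int):
        return tree
    op, lhs, rhs = tree
    return evaluate(lhs) + evaluate(rhs) if op == "+" else evaluate(lhs) - evaluate(rhs)


tree = fold_left(tokenize("10 - 3 - 2"))
print(tree)            # ('-', ('-', 10, 3), 2)
print(evaluate(tree))  # 5, i.e. (10 - 3) - 2, not 10 - (3 - 2)
```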
August 02, 2015
On Sunday, 2 August 2015 at 14:50:35 UTC, Jacob Carlborg wrote:
> I'm not so familiar with grammars but this looks like it's recursive. Is it possible to translate this piece of grammar to a regular expression? TextMate uses regular expressions and a couple of enhancements/extensions to define a grammar for a language.

I guess you're not familiar with the theoretical aspect of "formal languages". The D grammar is a context-free grammar which cannot be reduced to a regular expression. As cym13 stated, there are some simple context-free grammars which can be rewritten as regular expressions, but the D grammar cannot be. Take a look at the Chomsky Hierarchy [1] for a better understanding.

The classic example of a context-free language is the set of balanced parentheses, e.g. (()) is balanced and ())))) is not. This language is not regular, meaning you cannot write a regular expression for it, but you can write a context-free grammar for it.
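One common context-free grammar for it is S → ( S ) S | ε. A minimal sketch of a recursive-descent recognizer that follows that grammar directly, one call per nesting level (the function names are illustrative; Python's recursion limit caps the depth at roughly a thousand levels):

```python
def parse_s(text: str, pos: int = 0) -> int:
    """S -> '(' S ')' S | empty. Returns the position just after the S at pos."""
    if pos < len(text) and text[pos] == "(":
        after_inner = parse_s(text, pos + 1)            # inner S
        if after_inner < len(text) and text[after_inner] == ")":
            return parse_s(text, after_inner + 1)       # trailing S
        # No matching ')': only the empty alternative is left.
    return pos


def matches_grammar(text: str) -> bool:
    """True if the whole string is derivable from S."""
    return parse_s(text) == len(text)


print(matches_grammar("(())"))    # True
print(matches_grammar("())))"))  # False
```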

[1] https://en.wikipedia.org/wiki/Chomsky_hierarchy#The_hierarchy
August 02, 2015
On 02/08/15 18:08, cym13 wrote:

> You can't build a regular expression for any grammar. You can for some
> grammars but those are only a simple subset. For example, checking
> parens balance is impossible with common (not recursive) regular
> expressions only, and even with recursion it soon reaches its limitations.

TextMate grammars support recursion; it's possible to define a grammar for balanced parentheses [1].

[1] https://manual.macromates.com/en/language_grammars
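As far as I understand the manual, a rule with begin/end patterns whose nested patterns include the rule itself behaves like a stack of open scopes. Here is a hypothetical Python simulation of just that idea, not TextMate's actual engine: it ignores other patterns, captures and the precedence rules between end and nested patterns, and the scope name is made up.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class BeginEndRule:
    """Hypothetical stand-in for a TextMate rule with begin/end patterns
    whose nested patterns include the rule itself."""
    name: str
    begin: str
    end: str


PAREN = BeginEndRule("meta.group.paren", r"\(", r"\)")


def scopes_for(text: str, rule: BeginEndRule) -> list[tuple[str, tuple[str, ...]]]:
    """Pair each delimiter match or plain character with the scopes active for it."""
    begin, end = re.compile(rule.begin), re.compile(rule.end)
    stack: list[str] = []
    spans = []
    pos = 0
    while pos < len(text):
        m = begin.match(text, pos)
        if m and m.end() > pos:
            stack.append(rule.name)                # open a nested scope
            spans.append((m.group(), tuple(stack)))
            pos = m.end()
            continue
        m = end.match(text, pos) if stack else None
        if m and m.end() > pos:
            spans.append((m.group(), tuple(stack)))
            stack.pop()                            # close the innermost scope
            pos = m.end()
            continue
        spans.append((text[pos], tuple(stack)))    # plain text, or an unmatched ')'
        pos += 1
    return spans


for piece, scopes in scopes_for("(a(b))", PAREN):
    print(piece, len(scopes))
```

The stack is what lets nesting go arbitrarily deep even though each individual begin/end pattern is an ordinary regular expression.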

-- 
/Jacob Carlborg
August 02, 2015
On 02/08/15 18:37, MakersF wrote:

> Of course it's recursive! Do you want the grammar to be able to only
> define a finite number of programs?

I don't know how this works; that's why I'm asking. But I read something about left recursion needing to be removed before a grammar can be parsed, at least by some parsers.

> But in this case you could write the original grammar rule as
> mul |
> cat |
> (mul|cat)((+|-) (mul|cat))* (+|-) (mul|cat)
>
> but you lose the precedence of the operation as it is a flat list and
> not a tree

I don't think that's important for syntax highlighting.

-- 
/Jacob Carlborg
August 02, 2015
On Sunday, 2 August 2015 at 17:29:57 UTC, Jacob Carlborg wrote:
> On 02/08/15 18:08, cym13 wrote:
>
>> You can't build a regular expression for any grammar. You can for some
>> grammars but those are only a simple subset. For example, checking
>> parens balance is impossible with common (not recursive) regular
>> expressions only, and even with recursion it soon reaches its limitations.
>
> TextMate grammars support recursion, it's possible to define a grammar with balanced parentheses [1].
>
> [1] https://manual.macromates.com/en/language_grammars

Yes, that works for this simple example, but what about interleaved brackets? Say you want (), [] and "" to match; how would you do that?

    [[("]("), "])(", ")"]]

Some constructs cannot be expressed even with extended regular expressions. That's why grammars were invented, after all.
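To see what extra machinery the interleaved case needs: a recognizer has to remember which brackets are open (a stack, not just a counter) and must stop treating brackets as brackets inside a string. A minimal Python sketch, assuming string literals have no escape sequences (the function name is made up):

```python
PAIRS = {")": "(", "]": "["}


def brackets_match(text: str) -> bool:
    """Check (), [] nesting while treating "..." as an opaque string literal.

    Assumes strings contain no escape sequences such as \\".
    """
    stack = []
    in_string = False
    for ch in text:
        if in_string:
            if ch == '"':
                in_string = False  # closing quote; brackets inside were ignored
            continue
        if ch == '"':
            in_string = True
        elif ch in "([":
            stack.append(ch)
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False  # wrong kind of closer, or nothing open
    return not stack and not in_string


print(brackets_match('[[("]("), "])(", ")"]]'))  # True
print(brackets_match("[(])"))                   # False
```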

Reading the documentation you linked, it seems you are not expected to reduce the grammar to a single regular expression; rather, a TextMate grammar uses many regular expressions to describe parts of the language grammar, so that should work.
August 02, 2015
On Sunday, 2 August 2015 at 17:33:35 UTC, Jacob Carlborg wrote:
> On 02/08/15 18:37, MakersF wrote:
>
>> Of course it's recursive! Do you want the grammar to be able to only
>> define a finite number of programs?
>
> I don't know how this work, that's why I'm asking. But I read something about left recursion needs to be removed to be able to parse a grammar, at least for some parsers.

There are lots of videos online that show how to do this. I suppose some parsers are smart enough to rewrite the grammar to remove left recursion. Otherwise, for a simple parser which does nothing more than a breadth-first search, it may require exponential time to parse a string.
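The usual textbook rewrite turns A: A α | β into A: β A', A': α A' | ε, which is just β followed by zero or more α. For Brian's rule that gives mulExpression (('+' | '-' | '~') mulExpression)*. A naive recursive-descent parser that followed the left-recursive form literally would call addExpression again before consuming any input and never terminate (in Python, a RecursionError); the rewritten form becomes a loop. A sketch where a single token stands in for mulExpression (a hypothetical simplification):

```python
from __future__ import annotations


def parse_add_expression(tokens: list[str], pos: int = 0):
    """addExpression: mulExpression (('+' | '-' | '~') mulExpression)*

    Loop version of the left-recursive rule; returns (tree, next_pos).
    Building the tree inside the loop keeps it left-nested, i.e. left associative.
    """
    tree, pos = parse_mul_expression(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-", "~"):
        op = tokens[pos]
        rhs, pos = parse_mul_expression(tokens, pos + 1)
        tree = (op, tree, rhs)
    return tree, pos


def parse_mul_expression(tokens: list[str], pos: int):
    """Hypothetical stand-in: a mulExpression is just one operand token."""
    if pos >= len(tokens) or tokens[pos] in ("+", "-", "~"):
        raise SyntaxError(f"expected operand at token {pos}")
    return tokens[pos], pos + 1


print(parse_add_expression(["a", "-", "b", "~", "c"]))
```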
August 02, 2015
On 02/08/15 19:15, Xinok wrote:

> I guess you're not familiar with the theoretical aspect of "formal
> languages". The D grammar is a context-free grammar which cannot be
> reduced to a regular expression. As cym13 stated, there are some simple
> context-free grammars which can be rewritten as regular expressions, but
> the D grammar cannot be. Take a look at the Chomsky Hierarchy [1] for a
> better understanding.
>
> The classic example of a context-free language is the set of balanced
> parenthesis, i.e. (()) is balanced and ())))) is not. This language is
> not regular meaning you cannot write a regular expression for it, but
> you can write a context-free grammar for it.

TextMate grammars are not _just_ regular expressions. They can define balanced parentheses [1].

The point of a language grammar in a text editor is not to have a 100% correct implementation of the grammar. Rather it should syntax highlight the code in a way that is useful for the user.

[1] https://manual.macromates.com/en/language_grammars

-- 
/Jacob Carlborg
August 06, 2015
On Sunday, 2 August 2015 at 18:22:01 UTC, Jacob Carlborg wrote:
> On 02/08/15 19:15, Xinok wrote:
>
>> I guess you're not familiar with the theoretical aspect of "formal
>> languages". The D grammar is a context-free grammar which cannot be
>> reduced to a regular expression. As cym13 stated, there are some simple
>> context-free grammars which can be rewritten as regular expressions, but
>> the D grammar cannot be. Take a look at the Chomsky Hierarchy [1] for a
>> better understanding.
>>
>> The classic example of a context-free language is the set of balanced
>> parenthesis, i.e. (()) is balanced and ())))) is not. This language is
>> not regular meaning you cannot write a regular expression for it, but
>> you can write a context-free grammar for it.
>
> TextMate grammars are not _just_ regular expressions. They can define balanced parentheses [1].
>
> The point of a language grammar in a text editor is not to have a 100% correct implementation of the grammar. Rather it should syntax highlight the code in a way that is useful for the user.
>
> [1] https://manual.macromates.com/en/language_grammars

Then your best shot is to approximate the grammar with the regular expressions you have access to. At some point you will reach constructs that cannot be represented correctly; there you should probably write a regex that matches everything the grammar produces, and some more.

For the earlier example of paired, interleaved parentheses, you could instead generate every possible combination of bracket characters, like
( (|)|[|]|{|}|" )*
where only the external parentheses are syntax for the regex. That regex matches all the productions of the paired parentheses grammar, and many more strings.
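A small Python sketch of that over-approximation (names are illustrative): the loose regex accepts every string the balanced-parentheses grammar S → ( S ) S | ε produces, checked here by enumerating the grammar's output up to four pairs, and it also accepts garbage like "(]".

```python
import re

# Over-approximation: any run of bracket/quote characters, balanced or not.
LOOSE = re.compile(r'[()\[\]{}"]*')


def loose_match(text: str) -> bool:
    return LOOSE.fullmatch(text) is not None


def balanced_strings(pairs: int) -> list:
    """All strings S -> '(' S ')' S | empty produces with exactly `pairs` pairs."""
    if pairs == 0:
        return [""]
    out = []
    for inner in range(pairs):
        for a in balanced_strings(inner):
            for b in balanced_strings(pairs - 1 - inner):
                out.append(f"({a}){b}")
    return out


print(all(loose_match(s) for n in range(5) for s in balanced_strings(n)))  # True
print(loose_match("(]"))  # True: the "some more" part
```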

At the end of the day you want to highlight correct syntax; if a user writes wrong syntax, it's OK for the highlighting to be wrong too. So make sure your regex works for correct syntax; it can do anything it likes with incorrect input.