Jump to page: 1 2
Thread overview
Context-Free Grammars? What about arrays?
Feb 11, 2011
%u
Feb 11, 2011
Jonathan M Davis
Feb 11, 2011
Jacob Carlborg
Feb 11, 2011
spir
Feb 11, 2011
Jonathan M Davis
Feb 11, 2011
Ary Manzana
Feb 11, 2011
%u
Feb 11, 2011
%u
Feb 14, 2011
Nick Sabalausky
Feb 14, 2011
Nick Sabalausky
Feb 16, 2011
%u
February 11, 2011
Hi,

I think I'm having a little trouble understanding what's meant by context-free grammar. I've read that D is context-free, but is it really? What about an expression like:

int[U] s;

? You can't tell -- without looking at the context -- whether U is a data type or a number, and so because associative arrays and regular arrays are syntactically different elements of the language, the syntax of D is tied in with its semantics, just like in C++.

So is D really context-free? Or am I misunderstanding the meaning of the term?

Thanks!
February 11, 2011
On Friday 11 February 2011 01:03:21 %u wrote:
> Hi,
> 
> I think I'm having a little trouble understanding what's meant by context-free grammar. I've read that D is context-free, but is it really? What about an expression like:
> 
> int[U] s;
> 
> ? You can't tell -- without looking at the context -- whether U is a data type or a number, and so because associative arrays and regular arrays are syntactically different elements of the language, the syntax of D is tied in with its semantics, just like in C++.
> 
> So is D really context-free? Or am I misunderstanding the meaning of the term?

Yes. You're misunderstanding. It's actually quite difficult to parse a language if its grammar isn't context-free. You really should read up in context-free grammars if you want to understand it properly.

Context-free means that it's unambiguous as to which grammar rule is to be applied at any point during the parsing process. In this case, U is a symbol. The parser probably doesn't care about much more than that. That symbol could end up being a number or a type. And actually, it probably isn't even that hard. U is a class, struct, alias, or template argument. The template argument and alias would be substituted with a real type or value before the compiler had to determine what type int[U] was.

Not to mention, the context-free grammar is entirely for parsing tokens into an abstract syntax tree. The compiler then has to do whatever it does with the AST to actually generate assembly from it. So, as long as the parser can parse it into the AST, it could be the next stage's problem to determine what that really means.

Regardless, I suggest that you read a compiler book if you really want to understand context-free grammars. I have "Compiler Construction: Principles and Practices" by Kenneth C. Louden, which is quite good, but the most popular one is the so-called "dragon" book. I'm not sure what it's actual title is though.

- Jonathan M Davis
February 11, 2011
On 02/11/2011 10:03 AM, %u wrote:
> Hi,
>
> I think I'm having a little trouble understanding what's meant by context-free
> grammar. I've read that D is context-free, but is it really? What about an
> expression like:
>
> int[U] s;
>
> ? You can't tell -- without looking at the context -- whether U is a data type
> or a number, and so because associative arrays and regular arrays are
> syntactically different elements of the language, the syntax of D is tied in
> with its semantics, just like in C++.
>
> So is D really context-free? Or am I misunderstanding the meaning of the term?

Have you tried it? int[U] cannot be a plain array because the size must be constant. But I agree there is some context in play for the compiler (or rather the linker?) to determine /that/:
--> some error if U undefined
--> some other error if defined, but neither uint nore type
--> some other error if uint but not constant
--> ass array if type
(I guess)

Denis
-- 
_________________
vita es estrany
spir.wikidot.com

February 11, 2011
On Friday 11 February 2011 03:22:54 spir wrote:
> On 02/11/2011 10:03 AM, %u wrote:
> > Hi,
> > 
> > I think I'm having a little trouble understanding what's meant by context-free grammar. I've read that D is context-free, but is it really? What about an expression like:
> > 
> > int[U] s;
> > 
> > ? You can't tell -- without looking at the context -- whether U is a data type or a number, and so because associative arrays and regular arrays are syntactically different elements of the language, the syntax of D is tied in with its semantics, just like in C++.
> > 
> > So is D really context-free? Or am I misunderstanding the meaning of the term?
> 
> Have you tried it? int[U] cannot be a plain array because the size must be
> constant. But I agree there is some context in play for the compiler (or
> rather the linker?) to determine /that/:
> --> some error if U undefined
> --> some other error if defined, but neither uint nore type
> --> some other error if uint but not constant
> --> ass array if type
> (I guess)

Actually, it _can_ be a plain, static array. U could be a template argument which is a number.

- Jonathan M Davis
February 11, 2011
On 2/11/11 6:03 AM, %u wrote:
> Hi,
>
> I think I'm having a little trouble understanding what's meant by context-free
> grammar. I've read that D is context-free, but is it really? What about an
> expression like:
>
> int[U] s;
>
> ? You can't tell -- without looking at the context -- whether U is a data type
> or a number, and so because associative arrays and regular arrays are
> syntactically different elements of the language, the syntax of D is tied in
> with its semantics, just like in C++.
>
> So is D really context-free? Or am I misunderstanding the meaning of the term?
>
> Thanks!

That will always parse to an associative array. Then in the semantic pass, if U is a constant expression that turns out to be an integer it is reinterpreted as a static array.

Take a look: https://github.com/D-Programming-Language/dmd/blob/master/src/mtype.c#L3924
February 11, 2011
> That will always parse to an associative array. Then in the semantic pass, if U
is a constant expression that turns out to be an integer it is reinterpreted as a static array.

Ah, interesting. But that describes the implementation (i.e. how the compiler copes with the ambiguity) and not the language itself.

> Context-free means that it's unambiguous as to which grammar rule is to be
applied at any point during the parsing process.

The problem is that it _is_ ambiguous what rule to apply. To me, just because static arrays and associative arrays happen to have similar _looks_ doesn't make parsing them context-free -- they're defined with completely separate syntaxes, and they just happen to look identical. The fact that the compiler has to do a semantic analysis in order to figure this out (which was originally a syntax problem) really makes me wonder why you mention that D is still context-free.

> Not to mention, the context-free grammar is entirely for parsing tokens into an
abstract syntax tree.

Again, just because the AST's _happen_ to _look_ the same for static and associative arrays, does that mean that this makes D context-free? The meaning of the expression is still ambiguous, whether or not it fits well within the tree.

Furthermore, I would say that a struct like this:

struct Temp(T...) { int[T] arr; }

should work perfectly if T.length == 0, since it would then be a dynamic array. But the parser is forced to treat it as an associative array, and so _both_ of these break:

Temp!() a;
Temp!(2) b;      //Why does this break?

Why should the second one break, in any case?
February 11, 2011
Sorry, please ignore the very last line:
   Temp!(2) b;      //Why does this break?

I totally wasn't thinking about the fact that T is a tuple in "int[T] arr;". (I would intuitively think that it should still work, given that a tuple is just a bunch of types, but I can see why it wouldn't.)

However, why the previous statement breaks:
    Temp!() a;
is still puzzling me, since giving nothing to the tuple should be equivalent to
nothing being in the array.

Thanks.
February 11, 2011
On 2011-02-11 10:37, Jonathan M Davis wrote:
> On Friday 11 February 2011 01:03:21 %u wrote:
>> Hi,
>>
>> I think I'm having a little trouble understanding what's meant by
>> context-free grammar. I've read that D is context-free, but is it really?
>> What about an expression like:
>>
>> int[U] s;
>>
>> ? You can't tell -- without looking at the context -- whether U is a data
>> type or a number, and so because associative arrays and regular arrays are
>> syntactically different elements of the language, the syntax of D is tied
>> in with its semantics, just like in C++.
>>
>> So is D really context-free? Or am I misunderstanding the meaning of the
>> term?
>
> Yes. You're misunderstanding. It's actually quite difficult to parse a language if
> its grammar isn't context-free. You really should read up in context-free
> grammars if you want to understand it properly.
>
> Context-free means that it's unambiguous as to which grammar rule is to be
> applied at any point during the parsing process. In this case, U is a symbol.
> The parser probably doesn't care about much more than that. That symbol could
> end up being a number or a type. And actually, it probably isn't even that hard.
> U is a class, struct, alias, or template argument. The template argument and
> alias would be substituted with a real type or value before the compiler had to
> determine what type int[U] was.
>
> Not to mention, the context-free grammar is entirely for parsing tokens into an
> abstract syntax tree. The compiler then has to do whatever it does with the AST
> to actually generate assembly from it. So, as long as the parser can parse it
> into the AST, it could be the next stage's problem to determine what that really
> means.
>
> Regardless, I suggest that you read a compiler book if you really want to
> understand context-free grammars. I have "Compiler Construction: Principles and
> Practices" by Kenneth C. Louden, which is quite good, but the most popular one
> is the so-called "dragon" book. I'm not sure what it's actual title is though.
>
> - Jonathan M Davis

The title of the so called "dragon" book is "Compilers: Principles, Techniques, and Tools" and can be found here:

http://www.amazon.com/Compilers-Principles-Techniques-Tools-2nd/dp/0321486811/ref=sr_1_1?s=books&ie=UTF8&qid=1297440057&sr=1-1

It's also listed on the bookshelf page on the DigitalMars site.

-- 
/Jacob Carlborg
February 14, 2011
"%u" <wfunction@hotmail.com> wrote in message news:ij3la9$t53$1@digitalmars.com...
>
> The problem is that it _is_ ambiguous what rule to apply. To me, just
> because
> static arrays and associative arrays happen to have similar _looks_
> doesn't make
> parsing them context-free -- they're defined with completely separate
> syntaxes,
> and they just happen to look identical.

Yes, they just happen to look identical: but that's the crux: How it "looks" is the *only* thing that parser/grammar are concerned with. The fact that it could be one of two different *types* is irrelevent since type systems are purely a semantic matter and (at least in D) what type something is never affects how the code's *structure* is interpreted.

True, there are times when different types are distinguished via different syntaxes (for instance, function vs integer vs array), but that's incidental. The important thing is that that semantic-level type information never actually needs to feed back into the parser.

In the case of "int[U] s;", the *only* thing the parser is concerned with is "Ok, this is a variable declaration, and a variable declaration is a statement, and a statement can occur in x, y and z places". All of that stays exactly the same no matter what "U" is and no matter what type "int[U]" is. Anything beyond that is not the parser's job.

Consider this:

double a;
bool b;

Two different variables that are treated very differently. But the *structure* is exactly the same: {type} {identifier} {semicolon}. In all cases, that means "variable declaration". And a variable declaration is a statement, etc. For both variables, that structure is identical, and that's all that the parser is concerned with. In a grammar-definition that would look something like this:

<Var Decl> ::= <Type> <Ident> ';'
<Statement> ::= <Var Decl>
                       |  any other type of statement here
                       |  etc

Likewise, in the case of "int[U] s;" we have:

<Var Decl> ::= <Type> '[' <Ident> ']' <Ident> ';'

Note that everything about that is true no matter what the ident inside the brackets is. Also note that there is *nothing* in there the pulls in any information from the semantic phase. If it did, it wouldn't be context-free.

If it worked like this:

<Array Decl> ::= <Type> '[' <Ident> ']' <Ident> ';'
<Assoc Array Decl> ::= <Type> '[' <Ident> ']' <Ident> ';'

...*Then* it would no longer be context-free because the parser would have to rely on the semantics of 'U' to determine how to parse it. But D doesn't do that. It doesn't even bother distinguishing between array declarations and assoc array declarations until the semantic phase. Therefore the grammar (ie, the parsing) is still context-free.

One other thing that would prevent it from being context-free would be if the left-side of a production rule had more than one token:

<A> <B> ::= etc...


> The fact that the compiler has to do a
> semantic analysis in order to figure this out (which was originally a
> syntax
> problem) really makes me wonder why you mention that D is still
> context-free.
>

It's not a syntax problem unless the grammar defines it to be a syntax problem. Anything that's defined in the grammar is syntax; anything that isn't defined in the grammar is not syntax. Also, remember that grammar only apples to lexing/parsing and does not specify anything about semantics (and "sementics" effectively means "anything that isn't handled by the lexer or the parser").



February 14, 2011
"%u" <wfunction@hotmail.com> wrote in message news:ij3la9$t53$1@digitalmars.com...
>
> Again, just because the AST's _happen_ to _look_ the same for static and associative arrays, does that mean that this makes D context-free?

Grammar is *just* about how it *looks*, not what it means.

> The meaning of
> the expression is still ambiguous, whether or not it fits well within the
> tree.
>

"Meaning" is semanitcs, not syntax. Therefore, whether or not the *meaning* is ambiguous is irrelevent to the grammar. The parser/grammar does *not* deal with meaning, just structure.


« First   ‹ Prev
1 2