Thread overview
declaration/expression
Jun 23, 2009
Ellery Newcomer
Jun 23, 2009
BCS
Jun 23, 2009
Michal Minich
Jun 24, 2009
Ellery Newcomer
Jun 24, 2009
BCS
Jun 24, 2009
Tim Matthews
Jun 25, 2009
Ellery Newcomer
Jun 25, 2009
Tim Matthews
June 23, 2009
Sorry for not posting this in learn, but I'd also like to hear the Language Designer's input on this one.

How does dmd resolve the declaration/expression ambiguity?

My first instinct would be to try the declaration, and if it doesn't work because the type doesn't exist or something like that then try the expression, or vice versa. But that could easily lead to undefined and unexpected behavior. What if both are valid?

Are there any straightforward rules for determining how to proceed?

And I might be wrong, but I don't think any of this is mentioned in the spec? Should it be?

I can think of a number of examples, most of which dmd handles gracefully. Here are a couple which, though contrived, seem to illustrate that the rules are complicated, or more so than I would conceive off the top of my head. Is the compiler doing what it should, and if so, how?

import tango.io.Stdout;
void main(){
    int[4] i = [1,2,3,4];
    T(t); // compiler: I think this is an expression *barf*
    t(i[])(i[]); //compiler: I think this is a declaration *barf*
}

class T{
    public T opCall(int[] i){
        Stdout(i).newline;
        return this;
    }
}
June 23, 2009
Hello Ellery,

> How does dmd resolve the declaration/expression ambiguity?
> 

Could you elaborate? I'm not understanding the problem.

> 
> import tango.io.Stdout;
> void main(){
> int[4] i = [1,2,3,4];
> T(t); // compiler: I think this is an expression *barf*

I don't think this can be interpreted as a declaration

> t(i[])(i[]); //compiler: I think this is a declaration *barf*

nor that

> }
> class T{
> public T opCall(int[] i){
> Stdout(i).newline;
> return this;
> }
> }

the only problem case I know of is:

a * b = d;

where this can be a decl of a pointer to type a called b and set to d
or the result of the expression a times b getting assigned d (operator overloading can make this valid).
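
A minimal sketch of the declaration reading, assuming `a` names a type. The struct `a`, the function `demo`, and the variable `value` are hypothetical names chosen only for this example:

```d
import std.stdio;

struct a { int x; }   // here `a` names a type

a* demo(a* d)
{
    a * b = d;        // declaration: `b` has type `a*` and is initialized from `d`
    return b;
}

void main()
{
    a value = a(42);
    writeln(demo(&value).x);   // reads value.x through the pointer declared above
}
```

If `a` and `b` were instead variables of a type that overloads multiplication, the same token sequence would read as the assignment described above, so the parser has to pick one reading before it knows what `a` is.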


June 23, 2009
> import tango.io.Stdout;
> void main(){
>     int[4] i = [1,2,3,4];
>     T(t); // compiler: I think this is an expression *barf*
>     t(i[])(i[]); //compiler: I think this is a declaration *barf*
> }
> 
> class T{
>     public T opCall(int[] i){
>         Stdout(i).newline;
>         return this;
>     }
> }

There are at least two errors in your example.

T(t); -- "t" is not defined at that point
public T opCall(int[] i) { -- this must be called on an instance, as t(i); if you want to
call T(i), make this method static.

I personally have not encountered the problem you describe. But you should be aware that expressions that have no effect, like (x + x), are illegal in expression statements. If such an expression is needed, casting it to void will make it legal.

http://www.digitalmars.com/d/1.0/statement.html#ExpressionStatement
June 23, 2009
On Tue, Jun 23, 2009 at 1:00 AM, Ellery Newcomer<ellery-newcomer@utulsa.edu> wrote:
> Sorry for not posting this in learn, but I'd also like to hear the Language Designer's input on this one.
>
> How does dmd resolve the declaration/expression ambiguity?
>
> My first instinct would be to try the declaration, and if it doesn't work because the type doesn't exist or something like that then try the expression, or vice versa. But that could easily lead to undefined and unexpected behavior. What if both are valid?

You're right; if a statement begins with an identifier, the compiler requires arbitrary lookahead to determine whether it's looking at an expression or a declaration.  There's a good bit of duplicated code in DMD dedicated to parsing declarations.  IIRC there's one version of the parsing that just returns whether or not it's "probably" a declaration, and another version that does the exact same thing but which actually builds the AST.  Kind of icky.
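
As a rough illustration of that scan-first, build-later split, here is a minimal sketch in D. It is not DMD's code: the function names `isIdent` and `isProbablyDeclaration`, the pre-split token array, and the rule itself (a possibly dotted name, optional `*` or `[...]` suffixes, then a new identifier followed by `=` or `;`) are all assumptions made for the example, and it deliberately ignores parenthesized C-style declarators.

```d
import std.stdio;
import std.ascii : isAlpha, isAlphaNum;

// Hypothetical helper: true if tok is shaped like an identifier.
bool isIdent(string tok)
{
    if (tok.length == 0)
        return false;
    if (!(isAlpha(tok[0]) || tok[0] == '_'))
        return false;
    foreach (c; tok[1 .. $])
        if (!(isAlphaNum(c) || c == '_'))
            return false;
    return true;
}

// Scan-only pass: looks ahead over the tokens and builds no AST.
bool isProbablyDeclaration(string[] toks)
{
    if (toks.length == 0 || !isIdent(toks[0]))
        return false;
    size_t i = 1;
    // Accept a dotted name such as a.b.c
    while (i + 1 < toks.length && toks[i] == "." && isIdent(toks[i + 1]))
        i += 2;
    // Accept type suffixes: '*' or a bracketed part '[' ... ']'
    while (i < toks.length && (toks[i] == "*" || toks[i] == "["))
    {
        if (toks[i] == "[")
        {
            while (i < toks.length && toks[i] != "]")
                ++i;
            if (i == toks.length)
                return false;   // unbalanced bracket
        }
        ++i;
    }
    // A declaration then needs a new name followed by '=' or ';'
    return i + 1 < toks.length && isIdent(toks[i])
        && (toks[i + 1] == "=" || toks[i + 1] == ";");
}

void main()
{
    writeln(isProbablyDeclaration(["a", "*", "b", "=", "d", ";"]));  // true
    writeln(isProbablyDeclaration(["T", "(", "t", ")", ";"]));       // false
}
```

In this sketch the answer depends only on token shapes, never on whether a name is actually a type. An AST-building pass would then run separately on the same tokens, which is where the duplicated logic comes from.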

But that being said, I don't think there are actually any ambiguities in the grammar when it comes to this.  Neither of the "problem" lines in your example code could possibly be interpreted as declarations, and I don't think I can come up with any actually ambiguous code.
June 24, 2009
Jarrett Billingsley wrote:
> On Tue, Jun 23, 2009 at 1:00 AM, Ellery Newcomer<ellery-newcomer@utulsa.edu> wrote:
>> Sorry for not posting this in learn, but I'd also like to hear the Language Designer's input on this one.
>>
>> How does dmd resolve the declaration/expression ambiguity?
>>
>> My first instinct would be to try the declaration, and if it doesn't work because the type doesn't exist or something like that then try the expression, or vice versa. But that could easily lead to undefined and unexpected behavior. What if both are valid?
> 
> You're right; if a statement begins with an identifier, the compiler requires arbitrary lookahead to determine whether it's looking at an expression or a declaration.  There's a good bit of duplicated code in DMD dedicated to parsing declarations.  IIRC there's one version of the parsing that just returns whether or not it's "probably" a declaration, and another version that does the exact same thing but which actually builds the AST.  Kind of icky.

Heh. I saw that. I also saw a toExpression function in various declaration structs. Didn't look deeply into it though.
> 
> But that being said, I don't think there are actually any ambiguities in the grammar when it comes to this.  Neither of the "problem" lines in your example code could possibly be interpreted as declarations, and I don't think I can come up with any actually ambiguous code.

Wrong. Both are perfectly valid declarations (and did you miss my note?
the compiler *IS* interpreting the second as a declaration).
Okay, consider the rule Declarator, which is (or should be, if the grammar
wants to correctly reflect what the compiler is doing) defined like so

Declarator:
      BasicType2opt Identifier DeclaratorSuffixesopt
      BasicType2opt ( Declarator ) DeclaratorSuffixesopt

This is what allows D to accept C-style (forgot about those, didn't ya?) declarations, and it's mostly what I'm referring to. Watch:

int(i); //compiles exactly the same as 'int i;'
int(*i)(int[]); //compiles the same as 'int function(int[]) i;'

So when I give something like

T(t);
t(*i)(i[]); //changed it a little, since 'int (i[])(int[])'
//            is semantically invalid

I intend a declaration and an expression. I get the opposite. Fortunately, neither compiles, due to semantic errors.

To restate my question, if I'm a parser and I see

Identifier ( Identifier ) ;

which do I interpret it as?

Type ( NewSymbol ) ;
FunctionName ( Argument ) ;

If I see

Identifier . Identifier ( * Identifier ) ;

what do I resolve it as?

And it just goes downhill from there.
June 24, 2009
Hello Ellery,

> This is what allows D to accept C-style (forgot about those, didn't
> ya?) declarations, and it's mostly what I'm referring to. Watch:

Yes, and now I wish even more that DMD wouldn't :(


June 24, 2009
On Tue, Jun 23, 2009 at 8:35 PM, Ellery Newcomer<ellery-newcomer@utulsa.edu> wrote:

> Wrong. Both are perfectly valid declarations (and did you miss my note?
> the compiler *IS* interpreting the second as a declaration).
> Okay, consider the rule Declarator, which is (or should be, if the grammar
> wants to correctly reflect what the compiler is doing) defined like so
>
> Declarator:
>      BasicType2opt Identifier DeclaratorSuffixesopt
>      BasicType2opt ( Declarator ) DeclaratorSuffixesopt

Ah, fuck.  I can't believe D still accepts those.  All the ambiguity probably goes away without them, huh.
June 24, 2009
Ellery Newcomer wrote:

> 
> To restate my question, if I'm a parser and I see
> 
> Identifier ( Identifier ) ;
> 
> which do I interpret it as?
> 
> Type ( NewSymbol ) ;
> FunctionName ( Argument ) ;
> 


After some incremental parsing iterations you should be able to gradually resolve dependencies for each expression. If it's not ambiguous what the source is trying to describe and all its dependencies are resolved, then you add the new types that it may be declaring to a collection of parsed types. Repeat until everything can be parsed, and eventually you should know exactly what the first ID is (type, func, etc.). IIRC opCall cannot be declared static.

Sorry if I am completely missing the point but this doesn't seem complex (in a problem solving sense but the code writing may be tedious)
June 25, 2009
Tim Matthews wrote:
> Ellery Newcomer wrote:
> 
>>
>> To restate my question, if I'm a parser and I see
>>
>> Identifier ( Identifier ) ;
>>
>> which do I interpret it as?
>>
>> Type ( NewSymbol ) ;
>> FunctionName ( Argument ) ;
>>
> 
> 
> After some incremental parsing iterations you should be able to gradually resolve dependencies for each expression. If it's not ambiguous what the source is trying to describe and all its dependencies are resolved, then you add the new types that it may be declaring to a collection of parsed types. Repeat until everything can be parsed, and eventually you should know exactly what the first ID is (type, func, etc.). IIRC opCall cannot be declared static.

Remember back in D1 land when we didn't have struct constructors?
> 
> Sorry if I am completely missing the point but this doesn't seem complex (in a problem solving sense but the code writing may be tedious)

Yeah, you're missing the point. The point is the D Language is billed as one whose lexer is completely independent of its parser, which is completely independent of its semantic analysis. The parser must be able to decide all of these without any help from semantic. Anything less is either failure or just plain wrong.

If you'll have another gander at my original example, you'll see that's
exactly what DMD does. The compiler decides that T(t) is an expression
and t(i[])(i[]) is a declaration, and if they don't resolve, then by
golly that's just too bad. It's an error. Game over.

It's mildly restrictive from the user's perspective, but from the
compiler writer's perspective, it is infinitely better than mixing
semantic and syntactic analysis. And anyways, T(t) can be rewritten the
normal way, and t(i[])(i[]) can be surrounded with parentheses to force
it to be an expression.

But the question remains: how does the compiler decide this? I'm hoping for some simple rule, like: if it is a C-style declaration, then it must have a suffix or prefix for each level. It seems to be behaving something like this.

You are right, though, none of this is complex, just tedious. Reading the compiler's source code especially, though it sounds like I'm not going to get answers any other way.
June 25, 2009
Ellery Newcomer wrote:

> 
> Yeah, you're missing the point. The point is the D Language is billed as
> one whose lexer is completely independent of its parser, which is
> completely independent of its semantic analysis. The parser must be able
> to decide all of these without any help from semantic. Anything less is
> either failure or just plain wrong.


So you are following those steps from http://digitalmars.com/d/2.0/lex.html. I don't think these are strict requirements that a tool must meet to be called a D language parser, or that they prevent you from re-parsing. I think the page is really trying to point out the first few steps, and perhaps it should be rewritten as:

1 source character set
The source file is checked to see what character set it is, and the appropriate scanner is loaded. ASCII and UTF formats are accepted.
2 script line
If the first line starts with #! then the first line is ignored.
3 parse

Also from this page http://digitalmars.com/d/2.0/overview.html
features to drop: C source code compatibility.

This is not valid code with dmd v2.030 because D is not strictly compatible with C/C++.

struct A
{
    int i;
}

void main()
{
    A(a);
}

I've now tested that with structs, classes, and a typedef'd int, none of which worked. This does compile, however:

void main()
{
    int(a);
    a = 2;
}

Given that, compatibility with dmd should be far simpler, but going beyond that would be nicer.

> it is infinitely better than mixing
> semantic and syntactic analysis

I didn't recommend that.


> But question remains: how does the compiler decide this?

Built-in types have the extra C compatibility. Dmd doesn't like the following, though; if it matters enough to you, report it as a bug:

alias int T;

void main()
{
    T(a);
    a = 2;
}


> 
> You are right, though, none of this is complex, just tedious. Reading
> the compiler's source code especially, though it sounds like I'm not
> going to get answers any other way.


If you have a parser that allows that syntax to work a bit more, then could you please provide an example of code here that is completely ambiguous to the compiler?