June 23, 2009
declaration/expression
Sorry for not posting this in learn, but I'd also like to hear the
Language Designer's input on this one.

How does dmd resolve the declaration/expression ambiguity?

My first instinct would be to try the declaration, and if it doesn't
work because the type doesn't exist or something like that then try the
expression, or vice versa. But that could easily lead to undefined and
unexpected behavior. What if both are valid?

Are there any straightforward rules for determining how to proceed?

And I might be wrong, but I don't think any of this is mentioned in the
spec? Should it be?

I can think of a number of examples, most of which dmd handles
gracefully. Here are a couple which, though contrived, seem to
illustrate that the rules are complicated, or more so than I would
conceive off the top of my head. Is the compiler doing what it should,
and if so, how?

import tango.io.Stdout;
void main(){
   int[4] i = [1,2,3,4];
   T(t); // compiler: I think this is an expression *barf*
   t(i[])(i[]); //compiler: I think this is a declaration *barf*
}

class T{
   public T opCall(int[] i){
       Stdout(i).newline;
       return this;
   }
}
June 23, 2009
Re: declaration/expression
Hello Ellery,

> How does dmd resolve the declaration/expression ambiguity?
> 

Could you elaborate? I'm not understanding the problem.

> 
> import tango.io.Stdout;
> void main(){
> int[4] i = [1,2,3,4];
> T(t); // compiler: I think this is an expression *barf*

I don't think this can be interpreted as a declaration

> t(i[])(i[]); //compiler: I think this is a declaration *barf*

nor that

> }
> class T{
> public T opCall(int[] i){
> Stdout(i).newline;
> return this;
> }
> }

the only problem case I know of is:

a * b = d;

Here the statement can be either a declaration of a pointer to type a,
named b and initialized to d, or an assignment of d to the result of the
expression a times b (operator overloading can make this valid).
June 23, 2009
Re: declaration/expression
> import tango.io.Stdout;
> void main(){
>     int[4] i = [1,2,3,4];
>     T(t); // compiler: I think this is an expression *barf*
>     t(i[])(i[]); //compiler: I think this is a declaration *barf*
> }
> 
> class T{
>     public T opCall(int[] i){
>         Stdout(i).newline;
>         return this;
>     }
> }

There are at least two errors in your example.

T(t); -- "t" is not defined there
public T opCall(int[] i) { -- this should be called as t(i); if you want
to call T(i), make this method static.

I personally have not run into the problem you are describing. But you
should be aware of this: Expressions that have no effect, like (x + x), are
illegal in expression statements. If such an expression is needed,
casting it to void will make it legal.

http://www.digitalmars.com/d/1.0/statement.html#ExpressionStatement
June 23, 2009
Re: declaration/expression
On Tue, Jun 23, 2009 at 1:00 AM, Ellery
Newcomer<ellery-newcomer@utulsa.edu> wrote:
> Sorry for not posting this in learn, but I'd also like to hear the
> Language Designer's input on this one.
>
> How does dmd resolve the declaration/expression ambiguity?
>
> My first instinct would be to try the declaration, and if it doesn't
> work because the type doesn't exist or something like that then try the
> expression, or vice versa. But that could easily lead to undefined and
> unexpected behavior. what if both are valid?

You're right; if a statement begins with an identifier, the compiler
requires arbitrary lookahead to determine whether it's looking at an
expression or a declaration.  There's a good bit of duplicated code in
DMD dedicated to parsing declarations.  IIRC there's one version of
the parsing that just returns whether or not it's "probably" a
declaration, and another version that does the exact same thing but
which actually builds the AST.  Kind of icky.
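
To make the two-pass idea concrete, here is a toy sketch in D. It is not
DMD's code: the names isIdent, looksLikeDeclaration and classify are made
up for illustration, the input is a pre-split token array, and the rule
only recognizes "Identifier, optional '*'s, Identifier, then ';' or '='".
The first pass skims the tokens and only answers yes or no; a real second
pass would re-read the same tokens and build the AST.

```d
import std.ascii : isAlpha, isAlphaNum;
import std.stdio : writeln;

/// Toy identifier test: a letter or '_' first, then letters, digits or '_'.
/// (Keywords are not excluded; this is only a sketch.)
bool isIdent(string tok)
{
    if (tok.length == 0 || !(isAlpha(tok[0]) || tok[0] == '_'))
        return false;
    foreach (char c; tok[1 .. $])
        if (!(isAlphaNum(c) || c == '_'))
            return false;
    return true;
}

/// Pass 1 (hypothetical): skim the tokens without building anything and
/// guess whether the statement is a declaration. No symbol table is used.
bool looksLikeDeclaration(const string[] toks)
{
    size_t i = 0;
    if (i >= toks.length || !isIdent(toks[i]))
        return false;
    ++i;                                    // candidate type name
    while (i < toks.length && toks[i] == "*")
        ++i;                                // pointer suffixes
    if (i >= toks.length || !isIdent(toks[i]))
        return false;
    ++i;                                    // candidate variable name
    return i < toks.length && (toks[i] == ";" || toks[i] == "=");
}

/// Pass 2 would build the AST; here it only reports the chosen branch.
string classify(const string[] toks)
{
    return looksLikeDeclaration(toks) ? "declaration" : "expression";
}

void main()
{
    writeln(classify(["T", "t", ";"]));
    writeln(classify(["a", "*", "b", "=", "d", ";"]));
    writeln(classify(["f", "(", "x", ")", ";"]));
}
```

Under this toy rule the decision depends only on token shapes, so
"a * b = d;" is taken as a declaration regardless of what a and b
actually are.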

But that being said, I don't think there are actually any ambiguities
in the grammar when it comes to this.  Neither of the "problem" lines
in your example code could possibly be interpreted as declarations,
and I don't think I can come up with any actually ambiguous code.
June 24, 2009
Re: declaration/expression
Jarrett Billingsley wrote:
> On Tue, Jun 23, 2009 at 1:00 AM, Ellery
> Newcomer<ellery-newcomer@utulsa.edu> wrote:
>> Sorry for not posting this in learn, but I'd also like to hear the
>> Language Designer's input on this one.
>>
>> How does dmd resolve the declaration/expression ambiguity?
>>
>> My first instinct would be to try the declaration, and if it doesn't
>> work because the type doesn't exist or something like that then try the
>> expression, or vice versa. But that could easily lead to undefined and
>> unexpected behavior. what if both are valid?
> 
> You're right; if a statement begins with an identifier, the compiler
> requires arbitrary lookahead to determine whether it's looking at an
> expression or a declaration.  There's a good bit of duplicated code in
> DMD dedicated to parsing declarations.  IIRC there's one version of
> the parsing that just returns whether or not it's "probably" a
> declaration, and another version that does the exact same thing but
> which actually builds the AST.  Kind of icky.

Heh. I saw that. I also saw a toExpression function in various
declaration structs. Didn't look deeply into it though.
> 
> But that being said, I don't think there are actually any ambiguities
> in the grammar when it comes to this.  Neither of the "problem" lines
> in your example code could possibly be interpreted as declarations,
> and I don't think I can come up with any actually ambiguous code.

Wrong. Both are perfectly valid declarations (and did you miss my note?
the compiler *IS* interpreting the second as a declaration).
Okay, consider the rule Declarator, which is (or should be, if the grammar
wants to correctly reflect what the compiler is doing) defined like so

Declarator:
     BasicType2opt Identifier DeclaratorSuffixesopt
     BasicType2opt ( Declarator ) DeclaratorSuffixesopt

This is what allows D to accept C-style (forgot about those, didn't ya?)
declarations, and it's mostly what I'm referring to. Watch:

int(i); //compiles exactly the same as 'int i;'
int(*i)(int[]); //compiles the same as 'int function(int[]) i;'

So when I give something like

T(t);
t(*i)(i[]); //changed it a little, since 'int (i[])(int[])'
//            is semantically invalid

I intend a declaration and an expression. I get the opposite.
Fortunately, neither compiles, due to semantic errors.

To restate my question, if I'm a parser and I see

Identifier ( Identifier ) ;

which do I interpret it as?

Type ( NewSymbol ) ;
FunctionName ( Argument ) ;

If I see

Identifier . Identifier ( * Identifier ) ;

what do I resolve it as?

And it just goes downhill from there.
June 24, 2009
Re: declaration/expression
Hello Ellery,

> This is what allows D to accept C-style (forgot about those, didn't
> ya?) declarations, and it's mostly what I'm referring to. Watch:

Yes, and now I wish even more that DMD didn't :(
June 24, 2009
Re: declaration/expression
On Tue, Jun 23, 2009 at 8:35 PM, Ellery
Newcomer<ellery-newcomer@utulsa.edu> wrote:

> Wrong. Both are perfectly valid declarations (and did you miss my note?
> the compiler *IS* interpreting the second as a declaration).
> Okay, consider the rule declarator, which is (or should, if the grammar
> wants to correctly reflect what the compiler is doing) defined like so
>
> Declarator:
>      BasicType2opt Identifier DeclaratorSuffixesopt
>      BasicType2opt ( Declarator ) DeclaratorSuffixesopt

Ah, fuck.  I can't believe D still accepts those.  All the ambiguity
probably goes away without them, huh.
June 24, 2009
Re: declaration/expression
Ellery Newcomer wrote:

> 
> To restate my question, if I'm a parser and I see
> 
> Identifier ( Identifier ) ;
> 
> which do I interpret it as?
> 
> Type ( NewSymbol ) ;
> FunctionName ( Argument ) ;
> 


After some incremental parsing iterations you should be able to
gradually resolve the dependencies for each expression. If it's not
ambiguous what the source is trying to describe and all its
dependencies are resolved, then you add the new types that it may be
declaring to a collection of parsed types. Repeat until everything can
be parsed, and eventually you should know exactly what the first ID is
(type, func etc). IIRC opCall cannot be declared static.

Sorry if I am completely missing the point, but this doesn't seem complex
(in a problem-solving sense, though writing the code may be tedious).
June 25, 2009
Re: declaration/expression
Tim Matthews wrote:
> Ellery Newcomer wrote:
> 
>>
>> To restate my question, if I'm a parser and I see
>>
>> Identifier ( Identifier ) ;
>>
>> which do I interpret it as?
>>
>> Type ( NewSymbol ) ;
>> FunctionName ( Argument ) ;
>>
> 
> 
> After some incremental parsing iterations you should be able to
> gradually resolve dependencies for each expression. If it's not
> ambiguous on what the source is trying to describe and all its
> dependencies are resolved then you add the new types that it may be
> declaring to a collection of parsed types. Repeat until everything can
> be passed and eventually you should know exactly what the first ID is
> (type, func etc). IIRC opCall can not be declared static.

Remember back in D1 land when we didn't have struct constructors?
> 
> Sorry if I am completely missing the point but this doesn't seem complex
> (in a problem solving sense but the code writing may be tedious)

Yeah, you're missing the point. The point is the D Language is billed as
one whose lexer is completely independent of its parser, which is
completely independent of its semantic analysis. The parser must be able
to decide all of these without any help from semantic. Anything less is
either failure or just plain wrong.

If you'll have another gander at my original example, you'll see that's
exactly what DMD does. The compiler decides that T(t) is an expression
and t(i[])(i[]) is a declaration, and if they don't resolve, then by
golly that's just too bad. It's an error. Game over.

It's mildly restrictive from the user's perspective, but from the
compiler writer's perspective, it is infinitely better than mixing
semantic and syntactic analysis. And anyways, T(t) can be rewritten the
normal way, and t(i[])(i[]) can be surrounded with parentheses to force
it to be an expression.
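
A minimal sketch of the parenthesization workaround, with some
assumptions: it uses Phobos's std.stdio rather than Tango, and the calls
counter and the demo() helper are hypothetical additions so the effect
can be checked. As far as I can tell, a statement that starts with '('
cannot begin a declaration, so the parser has to treat it as an
expression.

```d
import std.stdio : writeln;

class T
{
    int calls; // hypothetical counter: how many times opCall has run

    T opCall(int[] i)
    {
        ++calls;
        writeln(i);
        return this;
    }
}

/// Runs the parenthesized statement and returns the number of opCall calls.
int demo()
{
    int[4] i = [1, 2, 3, 4];
    auto t = new T;

    // The leading '(' keeps this from being read as a declaration:
    // t(i[]) calls opCall, and (i[]) calls it again on the result.
    (t(i[])(i[]));

    return t.calls;
}

void main()
{
    writeln(demo());
}
```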

But the question remains: how does the compiler decide this? I'm hoping for
some simple rule like if it is a C-style declaration, then it must have
a suffix or prefix for each level. It seems to be behaving something
like this.

You are right, though, none of this is complex, just tedious. Reading
the compiler's source code especially, though it sounds like I'm not
going to get answers any other way.
June 25, 2009
Re: declaration/expression
Ellery Newcomer wrote:

> 
> Yeah, you're missing the point. The point is the D Language is billed as
> one whose lexer is completely independent of its parser, which is
> completely independent of its semantic analysis. The parser must be able
> to decide all of these without any help from semantic. Anything less is
> either failure or just plain wrong.


So you are following the steps from
http://digitalmars.com/d/2.0/lex.html. I don't think these are strict
requirements that your tool must meet to be called a D language parser,
or that they prevent you from re-parsing. I think the page is really
trying to point out the first few steps, and perhaps it should be
rewritten as:

1. source character set
   The source file is checked to see what character set it is, and the
   appropriate scanner is loaded. ASCII and UTF formats are accepted.
2. script line
   If the first line starts with #! then the first line is ignored.
3. parse

Also from this page http://digitalmars.com/d/2.0/overview.html
features to drop: C source code compatibility.

This is not valid code with dmd v2.030 because D is not strictly
compatible with C/C++.

struct A
{
    int i;
}

void main()
{
    A(a);
}

I've now tested that with structs, classes, and a typedef'd int, none
of which worked. This does compile, however:

void main()
{
    int(a);
    a = 2;
}

Given that, matching dmd's behavior should be far simpler, though going
beyond it would be nicer.

> it is infinitely better than mixing
> semantic and syntactic analysis

I didn't recommend that.


> But question remains: how does the compiler decide this?

Built-in types get the extra C compatibility. dmd doesn't accept the
following, though; if it matters enough to you, report it as a bug:

alias int T;

void main()
{
    T(a);
    a = 2;
}


> 
> You are right, though, none of this is complex, just tedious. Reading
> the compiler's source code especially, though it sounds like I'm not
> going to get answers any other way.


If you have a parser that lets that syntax work in more cases, then
could you please provide an example here of code that is completely
ambiguous to the compiler?