Study of GCC frontend and Walter's DMD compiler sources] (page 2) - D Programming Language Discussion Forum


June 03, 2002

Re: Study of GCC frontend and Walter's DMD compiler sources]

Posted by Sean L. Palmer
in reply to Martin M. Pedersen

"Martin M. Pedersen" <mmp@www.moeller-pedersen.dk> wrote in message news:adg4i1$1o2s$1@digitaldaemon.com...
> - It is stated:
>         "Floats can be in decimal or hexadecimal notation, as in standard C".
>    Hexadecimal notation is not standard, is it? It definitely is not standard
>    C++ (not described in ISO/IEC 14882). I expect this will cause
>    some difficulty because the underlying C library does not support
>    hexadecimal notation using strtod().

I don't see why someone couldn't write:

-0xfeed.beef // same as -65261.7458343505859375

or

0b10.01  // same as 2.25

although since e is a valid hex char it would require a different form of scientific notation.
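
For reference, C99 resolves the exponent clash by using p (a power of two) instead of e. The following is only a sketch of how a lexer could convert such text without relying on the C library's strtod(); parse_hex_real is a hypothetical helper, it does no error reporting, and it does not handle the 0b form.

```c
#include <assert.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_digit_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: parse [-]0x<hex>[.<hex>][p[+-]<decimal>]. */
double parse_hex_real(const char *s)
{
    double value = 0.0;
    double scale = 1.0 / 16.0;   /* weight of the next fraction digit */
    int negative = 0;
    int d;

    if (*s == '-') { negative = 1; s++; }
    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) s += 2;

    /* Integer part. */
    while ((d = hex_digit_value(*s)) >= 0) {
        value = value * 16.0 + d;
        s++;
    }

    /* Optional fraction part. */
    if (*s == '.') {
        s++;
        while ((d = hex_digit_value(*s)) >= 0) {
            value += d * scale;
            scale /= 16.0;
            s++;
        }
    }

    /* Optional binary exponent, written with 'p' because 'e' is a hex digit. */
    if (*s == 'p' || *s == 'P') {
        int exponent = 0;
        int exponent_negative = 0;
        s++;
        if (*s == '-') { exponent_negative = 1; s++; }
        else if (*s == '+') s++;
        while (*s >= '0' && *s <= '9') {
            exponent = exponent * 10 + (*s - '0');
            s++;
        }
        while (exponent-- > 0)
            value = exponent_negative ? value / 2.0 : value * 2.0;
    }

    return negative ? -value : value;
}
```

With this sketch, parse_hex_real("-0xfeed.beef") gives -65261.7458343505859375, and parse_hex_real("0x1.2p1") gives 2.25.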

Sean

June 03, 2002

Re: Study of GCC frontend and Walter's DMD compiler sources]

Posted by andy
in reply to Martin M. Pedersen

> 
> 
> I agree. The problem is that the code will depend on the C++ RTL which GCC
> probably does not link with. A solution to this is to put the C++ code into
> a shared library and include the C++ RTL in the shared library and resolve
> references to that internally. There might be problems initializing the C++
> RTL, but that can be overcome. I have used this technique a couple of times.
> It's not without problems, though.
> 

Such as?  Other than not diverging from Walter's front end as much, what advantages do you see?


> I would prefer a pure C solution, and in any case, we need a solution that

I'm trying not to let my personal preferences get in the way.

> does not use features specific to DMC. I have found:
> 
> - complex_t
> 
> - Hexadecimal notation of double literals supported by strtod()
> 
> - Use of strtof() and strtold(). Neither my Windows compiler nor GCC has them,
>   and they are not part of the X/Open specification either. If strtod() is used
>   instead of strtold(), precision is lost. I don't know if this can be solved.
> 
> - Use of contracts.
> 
> I don't know if there are others. If Walter's source, not the documentation,
> is to be authoritative, it would be nice if it could be used as-is. This would
> allow the source to be easily updated.
> 

True.  But when does the cost of the GCC pain and troubles exceed that?


> I have a working lexer, and a partial bison grammar that has all statements
> and expressions. I've stopped, temporarily anyway, for a number of reasons:
> - The D specification is incomplete in the sense that it does not cover all language
> constructs. For example, BasicType and BasicType2 are not described.

Perhaps we can reconstruct them from the source?

> - I ran into conflicts attempting to model the declarations by
> reverse-engineering Walter's front-end. It may be my fault.
> - I would like to consider the D specification authoritative - not the
> existing front-end. But I'm not sure if I can.
> 

Can you list all of the deficiencies (or are they all listed below)? I'll do my best to contribute to completing the Spec (running my patches by Walter of course).

> I prefer a bison solution over recursive descent because it allows grammar
> documentation to be extracted directly from the source. My approach is also
> a little different from Walter's regarding strings. I use wchar_t's
> everywhere internally.

Interesting, you don't find the memory consumption arguments compelling?

> 
> Some problems I have identified are described below. Don't get me wrong. I
> think that Walter has been doing a great job, and I hope a list with even
> the smallest things described will help make it a better product.
> 
> In http://www.digitalmars.com/d/lex.html :
> 
> - "synchronized" not listed as keyword
> 
> - "===" and "!==" are not described as tokens.
> 
> - "asm" and "delegate" are not described as keywords.
> 
> - "volatile" is described as a keyword. However, it is stated in
>   "statement.html" that "D has no volatile storage type". I suspect
>   that way too many keywords are reserved.
> 
> - Literal grammar uses 0b, not 0x, for hexadecimal literals.
>   So this is wrong:
>             Hexadecimal:
>               0b HexDigits
> 
> - It is not specified that 0B and 0X are valid prefixes for literal integers;
>   only 0b and 0x are described. Case does not matter in "lexer.c".
> 
> - It is stated:
>         "Floats can be in decimal or hexadecimal notation, as in standard C".
>    Hexadecimal notation is not standard, is it? It definitely is not standard
>    C++ (not described in ISO/IEC 14882). I expect this will cause
>    some difficulty because the underlying C library does not support
>    hexadecimal notation using strtod().
> 
> http://www.digitalmars.com/d/expression.html
> 
>     ShiftExpression:
>       AddExpression
>       AddExpression << AddExpression
>       AddExpression >> AddExpression
>       AddExpression <<< AddExpression
>     The <<< operator is not defined by "lex.html", and "dmd" will not compile it.
>     It is probably a typo, and the operator should be >>>. The same problem
>     exists with <<<= vs. >>>=.
> 
>     In the definition of the ShiftExpression above, AddExpression should be
>     replaced with ShiftExpression on the left side of the operator (in a Bison
>     grammar). The same goes for other binary expressions. If not, expressions
>     like a+b+c will not compile.
> 
>     "ArgumentList" is not described.
> 
>     "NewExpression" is not described.
> 
>     The description of variable initialization is ambiguous. It says:
>         IdentifierList:
>           Variable
>           Variable , IdentifierList
>         Variable:
>           Identifier
>           Identifier = Expression
>     "Expression" is a comma separated list of AssignmentExpression's, and it
>     therefore conflicts with IdentifierList. So, Expression should be
>     AssignmentExpression instead (like it is in C).
> 
> http://www.digitalmars.com/d/statement.html
> 
>     {} is both an EmptyStatement and a BlockStatement. EmptyStatement
>     should not be in the grammar.
> 
>     Synchronize Statement uses the keyword "synchronize". However, the
>     keyword is defined as "synchronized" (with a "d") in both "lexer.c" and "lex.html".
> 
>     How about rethrowing exceptions? The ThrowStatement does not
>     allow throw to be used without an expression:
>         ThrowStatement:
>           throw Expression ;
> 
>     The AsmStatement is described as:
>         AsmStatement:
>           { }
>           { AsmInstructionList }
>     It lacks the "asm" keyword.
> 
> In http://www.digitalmars.com/d/declaration.html :
> 
> - BasicType and BasicType2 are not described.
> 
> - Only const, static, final, synchronized, and deprecated are described
>    as storage specifiers (or whatever it should be called). However,
>    "parser.c" recognizes many more.
> 

June 03, 2002

Re: Study of GCC frontend and Walter's DMD compiler sources]

Posted by Walter
in reply to Martin M. Pedersen

"Martin M. Pedersen" <mmp@www.moeller-pedersen.dk> wrote in message news:adg4i1$1o2s$1@digitaldaemon.com...
> I agree. The problem is that the code will depend on the C++ RTL which GCC
> probably does not link with. A solution to this is to put the C++ code into
> a shared library and include the C++ RTL in the shared library and resolve
> references to that internally. There might be problems initializing the C++
> RTL, but that can be overcome. I have used this technique a couple of times.
> It's not without problems, though.

What's the issue with recompiling gcc with g++?

> I would prefer a pure C solution, and in any case, we need a solution that
> does not use features specific to DMC. I have found:
> - complex_t
> - Hexadecimal notation of double literals supported by strtod()
> - Use of strtof() and strtold(). Neither my Windows compiler nor GCC has them,
>   and they are not part of the X/Open specification either. If strtod() is used
>   instead of strtold(), precision is lost. I don't know if this can be solved.

These are all part of the standard C99 spec. I'm a little surprised they are not in gcc. Doesn't gcc support long doubles? If so, isn't there a conversion in the gcc library for it? If not, just use strtod(); that's what I used for bootstrapping myself. One can always go back later and upgrade it.
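
One way to keep that fallback in a single place is a small wrapper. This is only a sketch: parse_real is a hypothetical name, and the test on __STDC_VERSION__ assumes that a compiler announcing C99 also ships a library with strtold(), which may not hold everywhere.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical wrapper: convert text to long double, using strtold()
 * where the compiler claims C99 and falling back to strtod() (with
 * reduced precision) everywhere else.
 */
long double parse_real(const char *text)
{
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
    return strtold(text, NULL);
#else
    return (long double)strtod(text, NULL);
#endif
}
```

The front end would then call parse_real() everywhere, so upgrading later means changing one function.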


> - Use of contracts.

Those can just be #ifdef'd out.
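
As a rough illustration of compiling contract checks out, the sketch below uses a plain macro. REQUIRE and ENABLE_CONTRACTS are hypothetical names invented for this example; it does not show the contract syntax actually used in the DMD sources.

```c
#include <assert.h>

/* ENABLE_CONTRACTS and REQUIRE are hypothetical names for this sketch. */
#ifdef ENABLE_CONTRACTS
#define REQUIRE(cond) assert(cond)   /* check the precondition */
#else
#define REQUIRE(cond) ((void)0)      /* compiled out entirely */
#endif

/* Integer division with a precondition that vanishes unless contracts are enabled. */
int checked_div(int numerator, int denominator)
{
    REQUIRE(denominator != 0);
    return numerator / denominator;
}
```

Because a correct program never violates its contracts, the behaviour of valid calls is the same with or without the macro enabled.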

> I have a working lexer, and a partial bison grammar that has all statements
> and expressions. I've stopped, temporarily anyway, for a number of reasons:
> - The D specification is incomplete in the sense that it does not cover all language
> constructs. For example, BasicType and BasicType2 are not described.

What those do is enable both C and D style array declarations to work:
    int foo[];
    int[] foo;
both declare foo to be an array of int's.

> I prefer a bison solution over recursive descent because it allows grammar
> documentation to be extracted directly from the source. My approach is also
> a little different from Walter's regarding strings. I use wchar_t's everywhere internally.

That will work, but since wchar_t's on linux are 4 bytes, I found with another project on linux that it consumed lots of memory and ran slower. I also ran into maddening gaps in gcc's library support for wchar_t's.
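
To put a number on the memory cost, the sketch below compares the storage needed for the same identifier as a char string and as a wchar_t string. The function names are hypothetical, and the result depends on the platform's sizeof(wchar_t).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Bytes needed to store a narrow string, including its terminator. */
size_t narrow_bytes(const char *s)
{
    return (strlen(s) + 1) * sizeof(char);
}

/* Bytes needed to store the same text as wide characters, including the terminator. */
size_t wide_bytes(const wchar_t *s)
{
    return (wcslen(s) + 1) * sizeof(wchar_t);
}
```

Where wchar_t is 4 bytes, wide_bytes(L"foo") is 16 against 4 for narrow_bytes("foo"), so every identifier and string in the compiler costs four times as much.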

> Some problems I have identified are described below. Don't get me wrong. I
> think that Walter has been doing a great job, and I hope a list with even the smallest things described will help make it a better product.

It's great that you're posting these problems, that enables me to fix them.

> In http://www.digitalmars.com/d/lex.html :
> - "synchronized" not listed as keyword

Yes it is!

> - "===" and "!==" are not described as tokens.

Fixed.

> - "asm" and "delegate" are not described as keywords.

Fixed.

> - "volatile" is described as a keyword. However, it is stated in
>   "statement.html" that "D has no volatile storage type". I suspect
>   that way too many keywords are reserved.

volatile is now removed.

> - Literal grammar uses 0b, not 0x, for hexadecimal literals.
>   So this is wrong:
>             Hexadecimal:
>               0b HexDigits

Fixed.

> - It is not specified that 0B and 0X are valid prefixes for literal integers;
>   only 0b and 0x are described. Case does not matter in "lexer.c".

Fixed.
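
A sketch of case-insensitive prefix handling, as described for "lexer.c", might look like the following. parse_int_literal is a hypothetical helper, not code from the DMD sources; it ignores octal literals, suffixes, and overflow.

```c
#include <assert.h>

/*
 * Hypothetical helper: parse an unsigned integer literal.
 * 0b/0B selects binary, 0x/0X selects hexadecimal, anything else is decimal.
 */
unsigned long parse_int_literal(const char *s)
{
    unsigned base = 10;
    unsigned long value = 0;

    if (s[0] == '0' && (s[1] == 'b' || s[1] == 'B')) {
        base = 2;
        s += 2;
    } else if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) {
        base = 16;
        s += 2;
    }

    for (; *s != '\0'; s++) {
        unsigned digit;
        if (*s >= '0' && *s <= '9')
            digit = (unsigned)(*s - '0');
        else if (*s >= 'a' && *s <= 'f')
            digit = (unsigned)(*s - 'a') + 10;
        else if (*s >= 'A' && *s <= 'F')
            digit = (unsigned)(*s - 'A') + 10;
        else
            break;                      /* not a digit at all */
        if (digit >= base)
            break;                      /* digit not valid in this base */
        value = value * base + digit;
    }
    return value;
}
```

Both parse_int_literal("0b101") and parse_int_literal("0B101") give 5, and "0xff" and "0XFF" both give 255.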

> - It is stated:
>         "Floats can be in decimal or hexadecimal notation, as in standard C".
>    Hexadecimal notation is not standard, is it? It definitely is not standard
>    C++ (not described in ISO/IEC 14882). I expect this will cause
>    some difficulty because the underlying C library does not support
>    hexadecimal notation using strtod().

It is in C99, but not in C++98.

> http://www.digitalmars.com/d/expression.html
>
>     ShiftExpression:
>       AddExpression
>       AddExpression << AddExpression
>       AddExpression >> AddExpression
>       AddExpression <<< AddExpression
>     The <<< operator is not defined by "lex.html", and "dmd" will not compile it.
>     It is probably a typo, and the operator should be >>>. The same problem
>     exists with <<<= vs. >>>=.

Fixed.

>     In the definition of the ShiftExpression above, AddExpression should be
>     replaced with ShiftExpression on the left side of the operator (in a Bison
>     grammar). The same goes for other binary expressions. If not, expressions
>     like a+b+c will not compile.

a+b+c should parse as ((a+b)+c). As long as that works, you should be ok.
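
In a bison grammar a left-recursive rule produces that grouping; in a hand-written parser a loop does the same job. The sketch below is a toy evaluator, not the DMD parser: eval_additive is a hypothetical function that handles only single-digit operands joined by + or -.

```c
#include <assert.h>

/*
 * Toy evaluator: single-digit operands joined by '+' or '-'.
 * The loop folds each new operand into the running value, so
 * "a-b-c" is computed as ((a-b)-c), i.e. left-associatively.
 */
int eval_additive(const char *s)
{
    int value = *s++ - '0';             /* first operand */

    while (*s == '+' || *s == '-') {
        char op = *s++;                 /* operator */
        int rhs = *s++ - '0';           /* next operand */
        value = (op == '+') ? value + rhs : value - rhs;
    }
    return value;
}
```

Subtraction makes the grouping visible: eval_additive("9-4-3") is 2, whereas a right-associative parse would give 8.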

>     "ArgumentList" is not described.
>     "NewExpression" is not described.

I'll add these.

>     The description of variable initialization is ambiguous. It says:
>         IdentifierList:
>           Variable
>           Variable , IdentifierList
>         Variable:
>           Identifier
>           Identifier = Expression
>     "Expression" is a comma separated list of AssignmentExpression's, and it
>     therefore conflicts with IdentifierList. So, Expression should be
>     AssignmentExpression instead (like it is in C).

Done.

> http://www.digitalmars.com/d/statement.html
>
>     {} is both an EmptyStatement and a BlockStatement. EmptyStatement
>     should not be in the grammar.

Fixed.

>     Synchronize Statement uses the keyword "synchronize". However, the
>     keyword is defined as "synchronized" (with a "d") in both "lexer.c" and "lex.html".

Fixed.

>     How about rethrowing exceptions? The ThrowStatement does not
>     allow throw to be used without an expression:
>         ThrowStatement:
>           throw Expression ;

This has been discussed at length. It's probably a good idea.

>     The AsmStatement is described as:
>         AsmStatement:
>           { }
>           { AsmInstructionList }
>     It lacks the "asm" keyword.

Fixed.

> In http://www.digitalmars.com/d/declaration.html :
>
> - BasicType and BasicType2 are not described.

Yes, this needs work.

> - Only const, static, final, synchronized, and deprecated are described
>    as storage specifiers (or whatever it should be called). However,
>    "parser.c" recognizes many more.

This needs work too.

June 03, 2002

Re: Study of GCC frontend and Walter's DMD compiler sources]

Posted by Martin M. Pedersen
in reply to Pavel Minayev

"Pavel Minayev" <evilone@omen.ru> wrote in message news:adg7s3$1rcf$1@digitaldaemon.com...
> "Martin M. Pedersen" <mmp@www.moeller-pedersen.dk> wrote in message news:adg4i1$1o2s$1@digitaldaemon.com...
>
> If this is what I think it is, then won't "typedef complex complex_t" work?

The problem is that there is no complex type defined by the standard, to my knowledge. Standard C++ defines it as a template. Regarding GCC, I just studied the documentation of its extensions. It seems that it has a __complex__ type that might be used. I also noted that it supports hex floating point literals. So it seems that these issues are not problems after all with GCC. But they still represent portability problems (or challenges :-), generally speaking.
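
For comparison, C99 itself provides complex types through <complex.h>, so on a C99 compiler a typedef along the lines Pavel suggests is possible. This is a sketch only: the typedef below is an assumption about what complex_t could map to, not the definition used in the DMD sources, and complex_mul is a hypothetical helper.

```c
#include <assert.h>
#include <complex.h>

/* Assumption for this sketch: complex_t maps onto C99's double complex. */
typedef double complex complex_t;

/* Hypothetical helper: multiply two complex values using the built-in operator. */
complex_t complex_mul(complex_t a, complex_t b)
{
    return a * b;
}
```

For example, complex_mul(1.0 + 2.0*I, 3.0 + 4.0*I) yields -5.0 + 10.0*I. Compilers without C99 support would still need another solution.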

> > - Use of strtof() and strtold(). Neither my Windows compiler or GCC have
> The solution is obvious: write your own versions of these. =)

Yes, it's obvious. But it still represents a challenge.

> > - Use of contracts.
> Since a bugless program should never ever produce contract violations, and since contracts don't have side-effects, I think you could just strip these away.

Yes. Another approach could be to put them into #ifdef __DMC__ conditionals. I guess Walter has just forgotten it, as he does use such conditionals elsewhere.


Regards,
Martin M. Pedersen

June 03, 2002

Re: Study of GCC frontend and Walter's DMD compiler sources]

Posted by Walter
in reply to andy

"andy" <acoliver@apache.org> wrote in message news:3CFBB141.7080909@apache.org...
> Walter wrote:
> > Could GCC simply be compiled with the g++ compiler, rather than gcc? That
> > would then make it easy to interface D's C++ front end to GCC's C.
> Cost / benefit analysis --
> Here are disadvantages:
> 1. We'd never get into the GCC standard distribution that way.  I'm not
> sure if we want that, but wider adoption for D would certainly result
> from default inclusion.

The fact that apparently nobody has already done this with GCC is discouraging.

> 2. We'd forever be *out of sync* with the GCC folks.

That's a serious problem.

> 3. The GCC folks would think us odd and probably not work very closely with us.

They probably will anyway <g>.

> Advantages:
>
> 1. Resist the effort of rewriting the D Front end in C
> 2. The D community is largely composed of C++ advocates who might be
> more likely to contribute with a C++ effort.

Is g++ written in C, too?

June 03, 2002

Re: Study of GCC frontend and Walter's DMD compiler sources]

Posted by andy
in reply to Walter

>>Here are disadvantages:
>>1. We'd never get into the GCC standard distribution that way.  I'm not
>>sure if we want that, but wider adoption for D would certainly result
>>from default inclusion.
> 
> 
> The fact that apparently nobody has already done this with GCC is
> discouraging.
> 

yes.

> 
>>2. We'd forever be *out of sync* with the GCC folks.
> 
> 
> That's a serious problem.
> 

yes.

> 
>>3. The GCC folks would think us odd and probably not work very closely
>>with us.
> 
> 
> They probably will anyway <g>.
> 

Very true, but I'm sure they thought the Cobol front end folks were weird, but yet I see some collaboration there.

> 
>>Advantages:
>>
>>1. Resist the effort of rewriting the D Front end in C
>>2. The D community is largely composed of C++ advocates who might be
>>more likely to contribute with a C++ effort.
> 
> 
> Is g++ written in C, too?
> 

You bet it is.  For irony look at the $GCC_ROOT/gcc/cp/Class.c ;-).


June 03, 2002

Re: Study of GCC frontend and Walter's DMD compiler sources]

Posted by Martin M. Pedersen
in reply to Walter

"Walter" <walter@digitalmars.com> wrote in message news:adgbub$1vgu$1@digitaldaemon.com...

>> A solution to this is to put the C++ code into
> > a shared library and include the C++ RTL in the shared library
> What's the issue with recompiling gcc with g++?

I don't know - it might be that simple. But when I have used the other solution, I did not have that option.

> > - complex_t
> > - Hexadecimal notation of double literals supported by strtod()
> > - Use of strtof() and strtold().
> These are all part of the standard C99 spec. I'm a little surprised they are
> not in gcc.

Then I have learned something new :-) Looking deeper into this, it turns out
that it has a __complex__ type, the hex notation is supported, and both
strtold() and strtof() can be found in the headers and libraries (but not
the man pages). So these do not represent any real problems with GCC after
all.

> > I prefer a bison solution over recursive descent because it allows grammar
> > documentation to be extracted directly from the source. My approach is also a little different from Walter's regarding strings. I use wchar_t's everywhere internally.
>
> That will work, but since wchar_t's on linux are 4 bytes, I found with another project on linux that it consumed lots of memory and ran slower. I also ran into maddening gaps in gcc's library support for wchar_t's.

Yes, I have noticed such gaps too. I started out using wchar_t's in order to support unicode identifiers and file names. The spec does not specify what a letter in an identifier is, but I see in "lexer.c" that it is [A-Za-z]. I guess that is how it should be - keeping link compatibility with C.

> It's great that you're posting these problems, that enables me to fix them.

And you are quick to actually do it, too :-)

Regards,
Martin M. Pedersen

June 03, 2002

Re: Study of GCC frontend and Walter's DMD compiler sources]

Posted by Walter
in reply to Martin M. Pedersen

"Martin M. Pedersen" <mmp@www.moeller-pedersen.dk> wrote in message news:adgfa2$23tj$1@digitaldaemon.com...
> Yes, I have noticed such gaps too. I started out using wchar_t's in order to
> support unicode identifiers and file names. The spec does not specify what a
> letter in an identifier is, but I see in "lexer.c" that it is [A-Za-z]. I guess that is how it should be - keeping link compatibility with C.

While D supports unicode source files, unicode comments, unicode strings, and generating unicode apps, the identifiers are standard C identifiers. This ensures compatibility with existing linkers, librarians, debuggers, disassemblers, etc. Rewriting all of that stuff is way, way beyond the scope of D!
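
As an illustration of what "standard C identifiers" means in practice, a check could look like the sketch below. is_c_identifier is a hypothetical helper following the usual C rule (a letter or underscore, then letters, digits, or underscores); it has not been checked against "lexer.c".

```c
#include <assert.h>

/* True for the characters allowed at the start of a C identifier. */
static int is_ident_start(int c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
}

/* Hypothetical helper: returns 1 if s is a plain ASCII C identifier, else 0. */
int is_c_identifier(const char *s)
{
    if (!is_ident_start((unsigned char)*s))
        return 0;                       /* empty, or bad first character */

    for (s++; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        if (!is_ident_start(c) && !(c >= '0' && c <= '9'))
            return 0;                   /* includes any non-ASCII byte */
    }
    return 1;
}
```

Any byte outside ASCII, such as those in a UTF-8 encoded name, is rejected, which is what keeps the symbols acceptable to existing linkers and debuggers.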

June 03, 2002

Re: Study of GCC frontend and Walter's DMD compiler sources]

Posted by Martin M. Pedersen
in reply to andy

"andy" <acoliver@apache.org> wrote in message news:3CFBB3C6.7090803@apache.org...
> > A solution to this is to put the C++ code into
> > a shared library and include the C++ RTL in the shared library and resolve
> > references to that internally. There might be problems initializing the C++
> > RTL, but that can be overcome. I have used this technique a couple of times.
> > It's not without problems, though.
> Such as?  Other than not diverging from Walter's front end as much, what advantages do you see?

The major problem is that even though shared libraries are widely supported, there are also differences in how they are made, and how they work. I have used them on Linux, AIX, Solaris, and HP-UX. They were all different. So using shared libraries causes portability problems.

The major advantage of using Walter's front end is not diverging. Another advantage, if it can be easily updated, is that Walter does bug fixing :-)

> > I don't know if there are others. If Walter's source, not the documentation,
> > is to be authoritative, it would be nice if it could be used as-is. This would
> > allow the source to be easily updated.
>
> True.  But when does the cost of the GCC pain and troubles exceed that?

I don't know.

>> For example, BasicType and BasicType2 are not described.
> Perhaps we can reconstruct them from the source?
> > - I ran into conflicts attempting to model the declarations by reverse-engineering Walter's front-end.

More work on my part will probably help, and more updates to the spec will help too.

> Can you list all of the deficiencies (or are they all listed below)?

They were all listed.

> > I prefer a bison solution over recursive descent because it allows grammar
> > documentation to be extracted directly from the source. My approach is also
> > a little different from Walter's regarding strings. I use wchar_t's everywhere internally.
> Interesting, you don't find the memory consumption arguments compelling?

Memory consumption does not really bother me. Walter probably knows better, but I don't think the amount of string data used while compiling is significant. I have never experienced the amount of memory used by my compilers to be a problem (except for DOS compilers), so I figured that this would not be a problem either. If it were, all the symbols could go into a hash table, keeping only one representation of each symbol used. The same symbols appear again and again, so much could be gained in this way. It also has the advantage that once strings are inserted into the hash table, they can be compared for equality simply by comparing pointers.

But the question is: What is gained? Because symbols are all ASCII, there really is no point in doing it this way.
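
The hash table idea described above is usually called string interning. A minimal sketch, assuming a fixed-size chained table and names invented for this example (intern, hash_name, BUCKET_COUNT), could look like this:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BUCKET_COUNT 64u    /* arbitrary size chosen for this sketch */

struct symbol {
    struct symbol *next;    /* next symbol in the same bucket */
    char name[];            /* C99 flexible array member holding the text */
};

static struct symbol *buckets[BUCKET_COUNT];

/* Simple multiplicative string hash reduced to a bucket index. */
static unsigned hash_name(const char *s)
{
    unsigned h = 5381u;
    while (*s != '\0')
        h = h * 33u + (unsigned char)*s++;
    return h % BUCKET_COUNT;
}

/* Return the single shared copy of name, creating it on first use. */
const char *intern(const char *name)
{
    unsigned slot = hash_name(name);
    struct symbol *sym;

    for (sym = buckets[slot]; sym != NULL; sym = sym->next)
        if (strcmp(sym->name, name) == 0)
            return sym->name;           /* already stored */

    sym = malloc(sizeof *sym + strlen(name) + 1);
    if (sym == NULL)
        abort();
    strcpy(sym->name, name);
    sym->next = buckets[slot];
    buckets[slot] = sym;
    return sym->name;
}
```

Two calls to intern("foo") return the same pointer, so after interning, symbol equality is a pointer comparison rather than a strcmp().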


Regards,
Martin M. Pedersen

June 04, 2002

Re: Study of GCC frontend and Walter's DMD compiler sources]

Posted by Sean L. Palmer
in reply to Sean L. Palmer

Oh and for hex floats, the f suffix would be ambiguous.

Drat, I've done gone and killed my own proposal.  ;(

Sean

"Sean L. Palmer" <seanpalmer@earthlink.net> wrote in message news:adgbl9$1vco$1@digitaldaemon.com...
>
> "Martin M. Pedersen" <mmp@www.moeller-pedersen.dk> wrote in message news:adg4i1$1o2s$1@digitaldaemon.com...
> > - It is stated:
> >         "Floats can be in decimal or hexadecimal notation, as in standard C".
> >    Hexadecimal notation is not standard, is it? It definitely is not standard
> >    C++ (not described in ISO/IEC 14882). I expect this will cause
> >    some difficulty because the underlying C library does not support
> >    hexadecimal notation using strtod().
>
> I don't see why someone couldn't write:
>
> -0xfeed.beef // same as -65261.7458343505859375
>
> or
>
> 0b10.01  // same as 2.25
>
> although since e is a valid hex char it would require a different form of scientific notation.
>
> Sean
