Want to help DMD bugfixing? Write a simple utility. (page 4)

"Regan Heath" <regan@netmail.co.nz> wrote in message news:op.vswbv8qj54xghj@puck.auriga.bhead.co.uk... > On Wed, 23 Mar 2011 21:16:02 -0000, Jonathan M Davis <jmdavisProg@gmx.com> wrote: >> There are tasks for which you need to be able to lex and parse D code. To 100% correctly remove unit tests would be one such task. > > Is that last bit true? You definitely need to be able to lex it, but instead of actually parsing it you just count { and } and remove 'unittest' plus { plus } plus everything in between right? > No, to do it 100% reliably, you do need lexing/parsing, and also the semantics stage. Example: string makeATest(string str) { return "unit"~"test { "~str~" }"; } mixin(makeATest(q{ // Do tests }));

"Alexey Prokhin" <alexey.prokhin@yandex.ru> wrote in message news:mailman.2713.1300954193.4748.digitalmars-d-learn@puremagic.com... >> Currently, as far as I know, there are only two lexers and two parsers >> for >> D: the C++ front end which dmd, gdc, and ldc use and the D front end >> which >> ddmd uses and which is based on the C++ front end. Both of those are >> under >> the GPL (which makes them useless for a lot of stuff) and both of them >> are >> tied to compilers. Being able to lex D code and get the list of tokens in >> a D program and being able to parse D code and get the resultant abstract >> syntax tree would be very useful for a number of programs. > There is a third one: http://code.google.com/p/dil/. The main page says > that > the lexer and the parser are fully implemented for both D1 and D2. But the > license is also the GPL. The nearly-done v0.4 of my Goldie parsing system (zlib/libpng license) comes with a mostly-complete lexing-only grammar for D2. http://www.dsource.org/projects/goldie/browser/trunk/lang/dlex.grm The limitations of it right now: - Doesn't do nested comments. That requires a feature (that's going to be introduced in the related tool GOLD Parsing System v4.2) that I haven't had a chance to add into Goldie just yet. - It's possible there might be some edge-case bugs regarding either the ".." operator and/or float literals. - It's ASCII-only. Goldie supports Unicode, but character set optimization isn't implemented yet, so unicode grammars are technically possible but impractical ATM (this will be the top priority after I get v0.4 released).

"Andrej Mitrovic" <andrej.mitrovich@gmail.com> wrote in message news:mailman.2696.1300895928.4748.digitalmars-d-learn@puremagic.com... > On 3/23/11, Jonathan M Davis <jmdavisProg@gmx.com> wrote: >> That would require a full-blown D lexer and parser. >> >> - Jonathan M Davis >> > Isn't DDMD written in D? I'm not sure about how finished it is though. I've done a little bit of playing around with DDMD for a (still only just barely-started) project, and it seems to be fairly well up to the task of building an AST and running semantics. It is still based on a somewhat older version of D2, though, and my understanding is that actually building a real-world program with it is still impractical (though I haven't tried).

March 25, 2011

Re: Want to help DMD bugfixing? Write a simple utility.

Posted by Nick Sabalausky
in reply to Nick Sabalausky

"Nick Sabalausky" <a@a.a> wrote in message news:imivp7$2fu$1@digitalmars.com...
> "Alexey Prokhin" <alexey.prokhin@yandex.ru> wrote in message news:mailman.2713.1300954193.4748.digitalmars-d-learn@puremagic.com...
>>> Currently, as far as I know, there are only two lexers and two parsers for D: the C++ front end which dmd, gdc, and ldc use and the D front end which ddmd uses and which is based on the C++ front end. Both of those are under the GPL (which makes them useless for a lot of stuff) and both of them are tied to compilers. Being able to lex D code and get the list of tokens in a D program and being able to parse D code and get the resultant abstract syntax tree would be very useful for a number of programs.
>> There is a third one: http://code.google.com/p/dil/. The main page says that the lexer and the parser are fully implemented for both D1 and D2. But the license is also the GPL.
>
> The nearly-done v0.4 of my Goldie parsing system (zlib/libpng license) comes with a mostly-complete lexing-only grammar for D2.
>
> http://www.dsource.org/projects/goldie/browser/trunk/lang/dlex.grm
>
> The limitations of it right now:
>
> - Doesn't do nested comments. That requires a feature (that's going to be introduced in the related tool GOLD Parsing System v4.2) that I haven't had a chance to add into Goldie just yet.
>

Note that this probably isn't as big of a problem as it sounds:

For one thing, it still recognizes "/+" and "+/" as tokens. It'll just try to lex everything in between too. And when Goldie is used to just lex, you still get the entire source lexed even if it has errors, and the lex-error tokens get included in the resulting token array. So it would be pretty easy to just call Goldie's lex function, and then step through the token array removing balanced /+ and +/ sections manually.
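A minimal sketch of that post-processing step, assuming a hypothetical Token struct that carries only the token text. Goldie's real token type and lex function are not shown here and will look different; only the depth-counting idea is the point.

```d
import std.stdio : write, writeln;

// Hypothetical stand-in for a lexer token; not Goldie's actual type.
struct Token
{
    string text;
}

// Returns a copy of `tokens` with every balanced /+ ... +/ section removed.
// A depth counter handles nesting. A stray "+/" outside any comment is kept,
// so the caller can still report it as an error.
Token[] stripNestedComments(Token[] tokens)
{
    Token[] result;
    size_t depth = 0;

    foreach (tok; tokens)
    {
        if (tok.text == "/+")
            ++depth;
        else if (tok.text == "+/" && depth > 0)
            --depth;
        else if (depth == 0)
            result ~= tok;
    }
    return result;
}

void main()
{
    // Token stream for:  int /+ a /+ b +/ +/ x ;
    auto tokens = [Token("int"), Token("/+"), Token("a"), Token("/+"),
                   Token("b"), Token("+/"), Token("+/"), Token("x"), Token(";")];

    foreach (tok; stripNestedComments(tokens))
        write(tok.text, " ");
    writeln();
}
```

Any lex-error tokens produced between the comment delimiters are dropped along with everything else at depth > 0, so they never reach later stages.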

"Jonathan M Davis" <jmdavisProg@gmx.com> wrote in message news:mailman.2700.1300915109.4748.digitalmars-d-learn@puremagic.com... >> On 3/23/11, Jonathan M Davis <jmdavisProg@gmx.com> wrote: >> > That would require a full-blown D lexer and parser. >> > >> > - Jonathan M Davis >> >> Isn't DDMD written in D? I'm not sure about how finished it is though. > > Yes, but the lexer and parser in ddmd are not only GPL (which would be a > problem for some stuff but not others - for something like Don's utility, > it > wouldn't be a problem), and more importantly, it is tied to the compiler > code. > It's not designed to be used by an arbitrary program. For that, you would > need > a lexer and parser which were designed with an API such that an arbitrary > D > program could use them. For instance, the lexer could produce a range of > tokens to be processed, and a program which wants to use the lexer can > then > process that range. > I don't know about the license issues, but I don't think the API is a big deal. I'm in the early stages of a DDMD-based project to compile D code down to Haxe, and all I really had to do was comment out the backend-related section at the end of main(), inject my AST-walking/processing functions into the AST classes (though, admittedly, there is 1.5 metric fuckton of these AST classes), and then add a little bit of code at the end of main() to launch my AST-traversal. The main() function could easily be converted to a non-main one. The only real difficultly is the fact that the AST isn't really documented, except for what little exists on one particular Wiki4D page (sorry, don't have the link ATM). Hmm, although, depending what you're doing with it, you may also want to hook DDMD's stdout/stderr output, or at least the error/warning functions.
