D port of dmd: Lexer, Parser, AND CodeGenerator fully operational (page 3) - D Programming Language Discussion Forum

March 08, 2012

Re: D port of dmd: Lexer, Parser, AND CodeGenerator fully operational

Posted by Jonathan M Davis
in reply to Zach the Mystic

On Thursday, March 08, 2012 03:12:48 Zach the Mystic wrote:
> On Thursday, 8 March 2012 at 01:43:26 UTC, Daniel Murphy wrote:
> > "Zach the Mystic" <reachMINUSTHISzachgmail@dot.com> wrote in
> > message
> > news:afqmbmvuvizvgfooefqj@forum.dlang.org...
> > 
> >> I'll gladly put a license on it if the leaders of the community tell me which one to use ( Artistic, libpng, Boost ).
> >> 
> >> Zach
> > 
> > It will need to be the same license as the frontend
> > (GPL/Artistic).  It
> > should be at the top of each c++ source file.
> 
> It looks like the license is going to have to be GPL because it says so strictly in dmd's readme.txt. Somehow that license scares me, though. The "Free Software Foundation" seems like a very Orwellian institution to me. I hope it doesn't scare users away.

If you took it from ddmd, then it's definitely going to have to be GPL.

Now, there is interest in having a D parser and lexer in Phobos. I don't know if your version will fit the bill (e.g. it must have a range-based API), but we need one at some point. The original idea was to more or less directly port dmd's lexer and parser with some adjustments to the API as necessary (primarily to make it range-based). But no one has had the time to complete such a project yet (I originally volunteered to do it, but I just haven't had the time).

When that project was proposed, Walter agreed to let that port be Boost rather than GPL (since he holds the copyright and the port would be going in Phobos, which uses boost).

The problem with what you have (even if the API and implementation were perfect) is that it comes from ddmd, which had other contributors working on it. So, you would have to get permission from not only Walter but all of the relevant ddmd contributors. If you were able to do _that_, and it could get past the review process, then what you've done could be put into Phobos. But that requires that you take the time and effort to take care of getting the appropriate permissions, making sure that the API and implementation are acceptable for Phobos, and putting it through the Phobos review process. It would be great if you could do that though.

- Jonathan M Davis

March 08, 2012

Re: D port of dmd: Lexer, Parser, AND CodeGenerator fully operational

Posted by Daniel Murphy
in reply to Zach the Mystic

"Zach the Mystic" <reachMINUSTHISzachgmail@dot.com> wrote in message news:duefgfqidzxwcfvgefac@forum.dlang.org...
> On Thursday, 8 March 2012 at 01:38:43 UTC, Daniel Murphy wrote:
>>
>> You should check, but I think isBit is dead code anyway.
>
> I think it is. I've left a fair amount of dead code in because it helped me understand the whole system better. There are a lot of isXXX() functions which come in handy, so maybe isBit will too. I have no reason to cut it until I fully understand that it's not going to be useful.

It's a relic from when 'bit' was a basic type in D.  I removed a bunch of bit-related stuff about six months ago but missed this one.

March 08, 2012

Re: D port of dmd: Lexer, Parser, AND CodeGenerator fully operational

Posted by dolive
in reply to Zach the Mystic

Zach the Mystic Wrote:

> Check it out:
> https://github.com/zachthemystic/ddmd-clean/
> 
> This program is an adaptation of the work done by the ddmd team: http://www.dsource.org/projects/ddmd
> 
> I described most of it in the README. I hope it runs smoothly for you. I only ran it on MAC OSX, and I don't know much about github or about how to get things running smoothly for others.
> 
> Don't expect miracles. The parser is NOT up to date, e.g. it does not handle => from the latest lambda syntax.
> 
> I'll gladly put a license on it if the leaders of the community tell me which one to use ( Artistic, libpng, Boost ).
> 
> Onward and upward to IDE functionality!
> 
> Zach


Great work!

A few questions:
Is there documentation? Can it parse the dmd C source files? (Will the IDE's auto-complete feature involve the dmd C source files?)

Thanks,

Dolive

March 08, 2012

Re: D port of dmd: Lexer, Parser, AND CodeGenerator fully operational

Posted by Zach the Mystic
in reply to Daniel Murphy

On Thursday, 8 March 2012 at 05:05:46 UTC, Daniel Murphy wrote:
> "Zach the Mystic" <reachMINUSTHISzachgmail@dot.com> wrote in message
> news:duefgfqidzxwcfvgefac@forum.dlang.org...
>> On Thursday, 8 March 2012 at 01:38:43 UTC, Daniel Murphy wrote:
>>>
>>> You should check, but I think isBit is dead code anyway.
>>
>> I think it is. I've left a fair amount of dead code in because it helped me understand the whole system better. There are a lot of isXXX() functions which come in handy, so maybe isBit will too. I have no reason to cut it until I fully understand that it's not going to be useful.
>
> It's a relic from when 'bit' was a basic type in D.  I removed a bunch of
> bit-related stuff about six months ago but missed this one.

Okay, great. Now I DO fully understand that it's not going to be useful. Thanks for the info, which I couldn't possibly have known without consulting someone who's been here much longer than I have!

March 08, 2012

Re: D port of dmd: Lexer, Parser, AND CodeGenerator fully operational

Posted by Zach the Mystic
in reply to Jonathan M Davis

On Thursday, 8 March 2012 at 04:56:07 UTC, Jonathan M Davis wrote:
> If you took it from ddmd, then it's definitely going to have to be GPL.
>
> Now, there is interest in having a D parser and lexer in Phobos. I don't know
> if your version will fit the bill (e.g. it must have a range-based API), but we
> need one at some point. The original idea was to more or less directly port
> dmd's lexer and parser with some adjustments to the API as necessary
> (primarily to make it range-based). But no one has had the time to complete
> such a project yet (I originally volunteered to do it, but I just haven't had
> the time).
>
> When that project was proposed, Walter agreed to let that port be Boost rather
> than GPL (since he holds the copyright and the port would be going in Phobos,
> which uses boost).
>
> The problem with what you have (even if the API and implementation were
> perfect) is that it comes from ddmd, which had other contributors working on
> it. So, you would have to get permission from not only Walter but all of the
> relevant ddmd contributors. If you were able to do _that_, and it could get
> past the review process, then what you've done could be put into Phobos. But
> that requires that you take the time and effort to take care of getting the
> appropriate permissions, making sure that the API and implementation are
> acceptable for Phobos, and putting it through the Phobos review process. It
> would be great if you could do that though.
>
> - Jonathan M Davis

This is great news. I was really worried that the license was etched in stone. I'll need help finding out who owns the code, plus legal advice if the process is more than just getting a simple confirmation email from each of the original authors.

I have some comments I feel are very interesting regarding the lexer and pointers. There are no pointers in any of the code besides the lexer, so I think that will be very satisfying to you. Now I don't know everything about ranges, but if you simply mean dynamic arrays, then yes, everything except the lexer uses arrays when necessary, although there's simply a lot of code which doesn't need them because most of the objects are really just lists of members, many of which are not arrays.

About the lexer, one thing I realized about the Wild-West pointer style as I was porting it is that it must be blazing fast. To my understanding, to call p.popFront() requires two operations, ++p; followed by --p.length; plus possibly array bounds checking, I don't know.

++p is all that the current lexer needs. It used to only check for EOF at each junction, but since I'm parsing little chunks of code instead of whole files now, it checks "if ( p >= endBuf )" at the beginning of each token scan, which gets pretty close to not going out of bounds, since most tokens aren't very long. That lexer is a tribute to very fast programming of an old school which will go away if it changes. Still, I can sense a tidal wave of RANGES coming, and I fear I'll just have to bid the little thing goodbye! :-(
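
To make the comparison concrete, here is a minimal sketch of the two styles. It is not the ddmd-clean lexer: the names `isIdentChar`, `identLengthPtr`, and `identLengthRange` are hypothetical, and the pointer version assumes the buffer is terminated by a 0 sentinel byte so that a single check at the start of the token is enough. It works on `const(ubyte)[]` rather than `string`, so `popFront` here really is just a pointer bump plus a length decrement, with no UTF-8 decoding involved.

```d
import std.range.primitives : empty, front, popFront;

// Hypothetical helper: ASCII identifier characters only.
bool isIdentChar(ubyte c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
        || (c >= '0' && c <= '9') || c == '_';
}

// Pointer style: one bounds check at the start of the token, then a bare ++p.
// Assumes the byte at endBuf is a 0 sentinel, so the loop cannot run off the end.
size_t identLengthPtr(const(ubyte)* p, const(ubyte)* endBuf)
{
    if (p >= endBuf)
        return 0;
    const(ubyte)* start = p;
    while (isIdentChar(*p))
        ++p;
    return cast(size_t)(p - start);
}

// Range style: `empty` is tested on every step, and popFront on a slice
// advances the pointer and shrinks the length.
size_t identLengthRange(const(ubyte)[] r)
{
    size_t n = 0;
    while (!r.empty && isIdentChar(r.front))
    {
        r.popFront();
        ++n;
    }
    return n;
}

void main()
{
    // "foo_1 = 2" followed by the 0 sentinel the pointer version relies on.
    auto buf = cast(immutable(ubyte)[]) "foo_1 = 2\0";
    const(ubyte)* endBuf = buf.ptr + buf.length - 1; // points at the sentinel

    assert(identLengthPtr(buf.ptr, endBuf) == 5);
    assert(identLengthRange(buf[0 .. $ - 1]) == 5);
}
```

Both functions return the same length for the same input; the difference is only in how much bookkeeping happens per character. Whether that difference matters in practice would need measuring.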

March 08, 2012

Re: D port of dmd: Lexer, Parser, AND CodeGenerator fully operational

Posted by Zach the Mystic
in reply to dolive

On Thursday, 8 March 2012 at 05:41:46 UTC, dolive wrote:
> Great work!
>
> A few questions:
> Is there documentation? Can it parse the dmd C source files? (Will the IDE's auto-complete feature involve the dmd C source files?)
>
> Thanks,
>
> Dolive

No documentation. Even the API is inconsistent between similar parsing calls. The only one called from outside originally was p.parseModule(), and that was after the buffer had been primed with detectByteOrderMarkandConvertToAscii (still not implemented), and #!ignoreFirstlineHashbang. Another downside is that you currently have to prime each buffer you load with p.nextToken() before calling p.parseXXX() to return what you want. Most of the parsing calls are made internally by the parser itself. It's going to take a little more effort to work out a reasonable API for it, but I suppose that will also be the ideal time to document it.
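
The priming requirement is easier to see in a toy example. The sketch below is purely illustrative: `ToyParser`, its fields, and `parseSum` are hypothetical stand-ins and not ddmd-clean's classes. The only thing borrowed from the description above is the shape of the call sequence, `p.nextToken()` first and a `p.parseXXX()` call afterwards, because each parse function assumes the current token has already been loaded.

```d
import std.conv : to;

// Hypothetical toy parser, not the real ddmd-clean Parser class.
struct ToyParser
{
    string[] tokens; // pre-split tokens, standing in for a real lexer
    size_t pos;      // index of the next token to load
    string token;    // current token; empty until nextToken() is called

    // Loads the next token into `token`, or null at end of input.
    void nextToken()
    {
        token = pos < tokens.length ? tokens[pos++] : null;
    }

    // Parses a sum such as `1 + 2 + 3`.
    // Assumes `token` already holds the first token, i.e. the parser is primed.
    int parseSum()
    {
        int total = token.to!int;
        nextToken();
        while (token == "+")
        {
            nextToken();
            total += token.to!int;
            nextToken();
        }
        return total;
    }
}

void main()
{
    auto p = ToyParser(["1", "+", "2", "+", "3"]);
    p.nextToken();              // prime the parser with the first token
    assert(p.parseSum() == 6);
}
```

In this toy, calling `parseSum()` without the priming call would try to convert an empty token and throw. One way a cleaned-up API could hide that step is to do the priming in a constructor or in each public entry point.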

As far as parsing C source files, I think you'll need a C compiler for that!

March 08, 2012

Re: D port of dmd: Lexer, Parser, AND CodeGenerator fully operational

Posted by Zach the Mystic
in reply to Zach the Mystic

On Thursday, 8 March 2012 at 07:21:19 UTC, Zach the Mystic wrote:
> On Thursday, 8 March 2012 at 04:56:07 UTC, Jonathan M Davis wrote:
>> If you took it from ddmd, then it's definitely going to have to be GPL.
>>
>> Now, there is interest in having a D parser and lexer in Phobos. I don't know
>> if your version will fit the bill (e.g. it must have a range-based API), but we
>> need one at some point. The original idea was to more or less directly port
>> dmd's lexer and parser with some adjustments to the API as necessary
>> (primarily to make it range-based). But no one has had the time to complete
>> such a project yet (I originally volunteered to do it, but I just haven't had
>> the time).

I have another question. The parser needs to parse the tokens into some kind of objects, which I presume will be the current dmd object structure? Many of the class members are only of value to the semantic pass and can be cut. But as valuable as a parser is, designing the structures of the objects it creates takes just as much work. That is mostly done, by the way; I'm just not sure what exactly you're trying to put into Phobos.

March 08, 2012

Re: D port of dmd: Lexer, Parser, AND CodeGenerator fully operational

Posted by Jonathan M Davis
in reply to Zach the Mystic

On Thursday, March 08, 2012 08:21:17 Zach the Mystic wrote:
> On Thursday, 8 March 2012 at 04:56:07 UTC, Jonathan M Davis wrote:
> > If you took it from ddmd, then it's definitely going to have to be GPL.
> > 
> > Now, there is interest in having a D parser and lexer in
> > Phobos. I don't know
> > if your version will fit the bill (e.g. it must have a
> > range-based API), but we
> > need one at some point. The original idea was to more or less
> > directly port
> > dmd's lexer and parser with some adjustments to the API as
> > necessary
> > (primarily to make it range-based). But no one has had the time
> > to complete
> > such a project yet (I originally volunteered to do it, but I
> > just haven't had
> > the time).
> > 
> > When that project was proposed, Walter agreed to let that port
> > be Boost rather
> > than GPL (since he holds the copyright and the port would be
> > going in Phobos,
> > which uses boost).
> > 
> > The problem with what you have (even if the API and
> > implementation were
> > perfect) is that it comes from ddmd, which had other
> > contributors working on
> > it. So, you would have to get permission from not only Walter
> > but all of the
> > relevant ddmd contributors. If you were able to do _that_, and it
> > could get
> > past the review process, then what you've done could be put
> > into Phobos. But
> > that requires that you take the time and effort to take care of
> > getting the
> > appropriate permissions, making sure that the API and
> > implementation are
> > acceptable for Phobos, and putting it through the Phobos review
> > process. It
> > would be great if you could do that though.
> > 
> > - Jonathan M Davis
> 
> This is great news. I was really worried that the license was etched in stone. I'll need help finding out who owns the code, plus legal advice if the process is more than just getting a simple confirmation email from each of the original authors.
> 
> I have some comments I feel are very interesting regarding the lexer and pointers. There are no pointers in any of the code besides the lexer, so I think that will be very satisfying to you. Now I don't know everything about ranges, but if you simply mean dynamic arrays, then yes, everything except the lexer uses arrays when necessary, although there's simply a lot of code which doesn't need them because most of the objects are really just lists of members, many of which are not arrays.
> 
> About the lexer, one thing I realized about the Wild-West pointer style as I was porting it is that it must be blazing fast. To my understanding, to call p.popFront() requires two operations, ++p; followed by --p.length; plus possibly array bounds checking, I don't know.
> 
> ++p is all that the current lexer needs. It used to only check for EOF at each junction, but since I'm parsing little chunks of code instead of whole files now, it checks "if ( p >= endBuf )" at the beginning of each token scan, which gets pretty close to not going out of bounds, since most tokens aren't very long. That lexer is a tribute to very fast programming of an old school which will go away if it changes. Still, I can sense a tidal wave of RANGES coming, and I fear I'll just have to bid the little thing goodbye! :-(

A range is not necessarily a dynamic array, though a dynamic array is a range. The lexer is going to need to take a range of dchar (which may or may not be an array), and it's probably going to need to return a range of tokens. The parser would then take a range of tokens and output the AST in some form or other - it probably couldn't be a range, but I'm not sure. And while the lexer would need to operate on generic ranges of dchar, it would probably have to be special-cased for strings in a number of places in order to make it faster (e.g. checking the first char in a string rather than using front when it's known that the value being checked against is an ASCII character and will therefore fit in a single char - front has to decode the next character, which is less efficient).
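
As a rough illustration of that kind of special-casing, here is a minimal sketch of one small lexer step, skipping whitespace, written against a generic range of dchar with a fast path for narrow strings. The function names `isAsciiSpace` and `skipWhitespace` are hypothetical; the Phobos pieces it relies on (`isInputRange`, `ElementType`, `isNarrowString`, `front`, `popFront`) are real, but this is only one possible way to structure it.

```d
import std.range.primitives : ElementType, empty, front, isInputRange, popFront;
import std.traits : Unqual, isNarrowString;

// Hypothetical helper: the ASCII whitespace a lexer would skip between tokens.
bool isAsciiSpace(dchar c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

// Accepts any input range whose elements are dchar. Because Phobos decodes
// narrow strings in front, that includes string and wstring as well as dstring.
void skipWhitespace(R)(ref R input)
if (isInputRange!R && is(Unqual!(ElementType!R) == dchar))
{
    static if (isNarrowString!R)
    {
        // Fast path: ASCII always fits in one code unit, so compare the raw
        // code units and slice, instead of decoding with front/popFront.
        size_t i = 0;
        while (i < input.length && isAsciiSpace(input[i]))
            ++i;
        input = input[i .. $];
    }
    else
    {
        // Generic path: works for dstring and for non-array ranges of dchar.
        while (!input.empty && isAsciiSpace(input.front))
            input.popFront();
    }
}

void main()
{
    import std.algorithm.iteration : filter;

    string s = " \t x = 1";
    skipWhitespace(s);
    assert(s == "x = 1");

    dstring d = "\n\tfoo"d;
    skipWhitespace(d);
    assert(d == "foo"d);

    // A range of dchar that is not an array.
    auto r = "  ab"d.filter!(c => c != 'a');
    skipWhitespace(r);
    assert(r.front == 'b');
}
```

The fast path is safe because it only ever steps over single-code-unit ASCII characters, so the slice always starts on a code point boundary.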

So, if you're not familiar with ranges, you probably have a fair bit of learning ahead of you, and you're probably going to have to make a number of changes to your lexer and parser (though the majority of it will probably be able to stay intact). Unfortunately, a proper article and tutorial on them is currently lacking in spite of the fact that Phobos uses them heavily. Fortunately however, in a book that Ali Çehreli is writing on D, he has a chapter on ranges that should help get you started:

http://ddili.org/ders/d.en/ranges.html

But I'd suggest that you play around with ranges a fair bit (especially with strings) before trying to change what you have to use them. std.algorithm in particular makes heavy use of ranges. And it wouldn't surprise me at all if some portions of your lexer and parser really should be using some of Phobos' functions but currently aren't, because it's originally a port from C++. You should also make sure that you understand the basics of Unicode fairly well - especially how they pertain to char, wchar, and dchar - since that will affect your ability to correctly translate code to use ranges as well as properly optimize it.

It would probably help if other D developers who are more familiar with ranges took a look at what you have and maybe even helped you start adjusting your code, but I don't know how many will both have the time and be interested. If I have time, I'll probably start poking at it, but I don't know that I'll have time any time soon, much as I'd like to.

Regardless, you need to familiarize yourself with ranges if you want to get the lexer and parser ready for inclusion in Phobos. And you really should familiarize yourself with them anyway, since they're heavily used in D code in general. Not being able to use ranges in D would be like not being able to use iterators in C++. You can program in it, but you'd be fairly crippled - particularly when dealing with the standard library.

- Jonathan M Davis

March 08, 2012

Re: D port of dmd: Lexer, Parser, AND CodeGenerator fully operational

Posted by Jonathan M Davis
in reply to Zach the Mystic

On Thursday, March 08, 2012 08:45:13 Zach the Mystic wrote:
> On Thursday, 8 March 2012 at 07:21:19 UTC, Zach the Mystic wrote:
> > On Thursday, 8 March 2012 at 04:56:07 UTC, Jonathan M Davis wrote:
> >> If you took it from ddmd, then it's definitely going to have to be GPL.
> >> 
> >> Now, there is interest in having a D parser and lexer in
> >> Phobos. I don't know
> >> if your version will fit the bill (e.g. it must have a
> >> range-based API), but we
> >> need one at some point. The original idea was to more or less
> >> directly port
> >> dmd's lexer and parser with some adjustments to the API as
> >> necessary
> >> (primarily to make it range-based). But no one has had the
> >> time to complete
> >> such a project yet (I originally volunteered to do it, but I
> >> just haven't had
> >> the time).
> 
> I have another question. The parser needs to parse the tokens into some kind of objects, which I presume will be the current dmd object structure? Many of the class members are only of value to the semantic pass and can be cut. But as valuable as a parser is, designing the structures of the objects it creates takes just as much work. That is mostly done, by the way; I'm just not sure what exactly you're trying to put into Phobos.

Well, the first concern is the lexer. As far as getting stuff into Phobos goes, I'd advise concentrating on getting the lexer ready rather than necessarily trying to do it all at once - _especially_ since the lexer needs to be usable separately from the parser.

As for what the parser should output, it's going to need to be the Abstract Syntax Tree. What exactly that looks like in dmd, I don't know. But we're pretty much going to want all of that information output by the parser. Ideally, it would be possible to build a compiler on top of the lexer and parser in Phobos. But I can't really give you specifics without studying the code in detail.

- Jonathan M Davis

March 08, 2012

Re: D port of dmd: Lexer, Parser, AND CodeGenerator fully operational

Posted by Jacob Carlborg
in reply to Zach the Mystic

On 2012-03-07 21:02, Zach the Mystic wrote:
> Check it out:
> https://github.com/zachthemystic/ddmd-clean/
>
> This program is an adaptation of the work done by the ddmd team:
> http://www.dsource.org/projects/ddmd
>
> I described most of it in the README. I hope it runs smoothly for you. I
> only ran it on MAC OSX, and I don't know much about github or about how
> to get things running smoothly for others.
>
> Don't expect miracles. The parser is NOT up to date, e.g. it does not
> handle => from the latest lambda syntax.
>
> I'll gladly put a license on it if the leaders of the community tell me
> which one to use ( Artistic, libpng, Boost ).
>
> Onward and upward to IDE functionality!
>
> Zach

This looks awesome.

-- 
/Jacob Carlborg
