March 08, 2012
Re: D port of dmd: Lexer, Parser, AND CodeGenerator fully operational
On 2012-03-08 05:54, Jonathan M Davis wrote:
> On Thursday, March 08, 2012 03:12:48 Zach the Mystic wrote:
>> On Thursday, 8 March 2012 at 01:43:26 UTC, Daniel Murphy wrote:
>>> "Zach the Mystic"<reachMINUSTHISzachgmail@dot.com>  wrote in
>>> message
>>> news:afqmbmvuvizvgfooefqj@forum.dlang.org...
>>>
>>>> I'll gladly put a license on it if the leaders of the
>>>> community tell me which one to use ( Artistic, libpng, Boost ).
>>>>
>>>> Zach
>>>
>>> It will need to be the same license as the frontend
>>> (GPL/Artistic).  It
>>> should be at the top of each c++ source file.
>>
>> It looks like the license is going to have to be GPL because it
>> says so strictly in dmd's readme.txt. Somehow that license scares
>> me, though. The "Free Software Foundation" seems like a very
>> Orwellian institution to me. I hope it doesn't scare users away.
>
> If you took it from ddmd, then it's definitely going to have to be GPL.
>
> Now, there is interest in having a D parser and lexer in Phobos. I don't know
> if your version will fit the bill (e.g. it must have a range-based API), but we
> need one at some point. The original idea was to more or less directly port
> dmd's lexer and parser with some adjustments to the API as necessary
> (primarily to make it range-based). But no one has had the time to complete
> such a project yet (I originally volunteered to do it, but I just haven't had
> the time).
>
> When that project was proposed, Walter agreed to let that port be Boost rather
> than GPL (since he holds the copyright and the port would be going in Phobos,
> which uses boost).
>
> The problem with what you have (even if the API and implementation were
> perfect) is that it comes from ddmd, which had other contributors working on
> it. So, you would have to get permission from not only Walter but all of the
> relevant ddmd contributors. If you were able to do _that_, and it could get
> past the review process, then what you've done could be put into Phobos. But
> that requires that you take the time and effort to take care of getting the
> appropriate permissions, making sure that the API and implementation are
> acceptable for Phobos, and putting it through the Phobos review process. It
> would be great if you could do that though.
>
> - Jonathan M Davis

It would be nice if the frontend written in D (whichever it will be)
could be used by DMD. Then there wouldn't be any problems of being out 
of sync.

-- 
/Jacob Carlborg
March 08, 2012
Re: D port of dmd: Lexer, Parser, AND CodeGenerator fully operational
On Thursday, March 08, 2012 09:11:03 Jacob Carlborg wrote:
> On 2012-03-08 05:54, Jonathan M Davis wrote:
> > On Thursday, March 08, 2012 03:12:48 Zach the Mystic wrote:
> >> On Thursday, 8 March 2012 at 01:43:26 UTC, Daniel Murphy wrote:
> >>> "Zach the Mystic"<reachMINUSTHISzachgmail@dot.com>  wrote in
> >>> message
> >>> news:afqmbmvuvizvgfooefqj@forum.dlang.org...
> >>> 
> >>>> I'll gladly put a license on it if the leaders of the
> >>>> community tell me which one to use ( Artistic, libpng, Boost ).
> >>>> 
> >>>> Zach
> >>> 
> >>> It will need to be the same license as the frontend
> >>> (GPL/Artistic).  It
> >>> should be at the top of each c++ source file.
> >> 
> >> It looks like the license is going to have to be GPL because it
> >> says so strictly in dmd's readme.txt. Somehow that license scares
> >> me, though. The "Free Software Foundation" seems like a very
> >> Orwellian institution to me. I hope it doesn't scare users away.
> > 
> > If you took it from ddmd, then it's definitely going to have to be GPL.
> > 
> > Now, there is interest in having a D parser and lexer in Phobos. I don't
> > know if your version will fit the bill (e.g. it must have a range-based
> > API), but we need one at some point. The original idea was to more or
> > less directly port dmd's lexer and parser with some adjustments to the
> > API as necessary (primarily to make it range-based). But no one has had
> > the time to complete such a project yet (I originally volunteered to do
> > it, but I just haven't had the time).
> > 
> > When that project was proposed, Walter agreed to let that port be Boost
> > rather than GPL (since he holds the copyright and the port would be going
> > in Phobos, which uses boost).
> > 
> > The problem with what you have (even if the API and implementation were
> > perfect) is that it comes from ddmd, which had other contributors working
> > on it. So, you would have to get permission from not only Walter but all
> > of the relevant ddmd contributors. If you were able to do _that_, and it
> > could get past the review process, then what you've done could be put
> > into Phobos. But that requires that you take the time and effort to take
> > care of getting the appropriate permissions, making sure that the API and
> > implementation are acceptable for Phobos, and putting it through the
> > Phobos review process. It would be great if you could do that though.
> > 
> > - Jonathan M Davis
> 
> It would be nice if the frontend written in D (whichever it will be)
> could be used by DMD. Then there wouldn't be any problems of being out
> of sync.

Well, having it be easier to keep in sync is one of the reasons that it was 
originally proposed to simply port the lexer in dmd's frontend to D. But 
having it _be_ the frontend of dmd would be even better. I don't know how 
interested Walter is or isn't in that, but having it be a port of the current 
frontend would likely make convincing him easier than it would be if it were 
one written from scratch.

- Jonathan M Davis
March 08, 2012
Re: D port of dmd: Lexer, Parser, AND CodeGenerator fully operational
On Wednesday, 7 March 2012 at 20:02:57 UTC, Zach the Mystic wrote:
> https://github.com/zachthemystic/ddmd-clean/

By the way, in compilers, »code generation« is commonly used to 
refer to the generation of machine code; so using the term to 
refer to .di file generation/pretty-printing could be misleading 
for some (it was for me, at least^^).

David
March 08, 2012
Re: D port of dmd: Lexer, Parser, AND CodeGenerator fully operational
On 8 March 2012 22:05, David Nadlinger <see@klickverbot.at> wrote:
> On Wednesday, 7 March 2012 at 20:02:57 UTC, Zach the Mystic wrote:
>>
>> https://github.com/zachthemystic/ddmd-clean/
>

I would like to see the parser output an AST for use in other
situations. It would be nice to have a tool that can analyse the AST
and do things with it (like autocompletion). Clang has done some pretty
cool things in this respect, to the point that it can practically do
code completion itself.

--
James Miller
March 08, 2012
Re: D port of dmd: Lexer, Parser, AND CodeGenerator fully operational
On Thursday, 8 March 2012 at 09:05:05 UTC, David Nadlinger wrote:
> On Wednesday, 7 March 2012 at 20:02:57 UTC, Zach the Mystic 
> wrote:
>> https://github.com/zachthemystic/ddmd-clean/
>
> By the way, in compilers, »code generation« is commonly used 
> to refer to the generation of machine code; so using the term 
> to refer to .di file generation/pretty-printing could be 
> misleading for some (it was for me, at least^^).
>
> David

I'm sorry. I see your point. I guess I'll call it 
pretty-printing, although it also occurred to me to call it 
"disparsing", which then humorously led to the notion 
"dyslexing". Wait, did I spell that right? :-)
March 08, 2012
Re: D port of dmd: Lexer, Parser, AND CodeGenerator fully operational
On Thursday, 8 March 2012 at 04:53:20 UTC, Zach the Mystic wrote:
> Anyway, the first thing I need is a gui, and a code generator 
> capable of coloring its output appropriately, so I'm working on 
> that, but it's not (even close to) ready for show yet!

By "Code Generator" I actually mean pretty-printer, a distinction 
I was recently made aware of.
March 08, 2012
Re: D port of dmd: Lexer, Parser, AND CodeGenerator fully operational
On Thursday, 8 March 2012 at 07:49:57 UTC, Jonathan M Davis wrote:
> The lexer is going to need to take a range of dchar (which may 
> or may not be an array),
> And while the lexer would need to operate on generic ranges of 
> dchar, it would probably have to be special-cased for strings 
> in a number of places

I know what you mean. I actually cut out ddmd's conversion stuff 
because I had glanced over Phobos and saw plenty of functions 
designed for this! I must have intuited what you are saying. dmd 
does all conversion to char* before sending the buffer to the 
lexer. I doubt there's a reason to change this procedure, except to 
put that conversion code directly into module dmd.lexer instead.

> The parser would then take a range of tokens and then output 
> the AST in some form or other - it probably couldn't be a 
> range, but I'm not sure.

Dmd's AST is pretty idiosyncratic.
Example: class FuncDeclaration (function declaration) has a 
bunch of named members:

class FuncDeclaration
{
    Identifier ident;        // the function's name
    Parameter[] parameters;  // its parameters
    Statement frequire;      // the in{} contract, if present
    Statement fbody;         // function body
    // etc.
}

Each one has its own name. I actually was working on how to turn 
it into a more iterable format, since if you want to edit the AST 
directly you're going to need to cursor down or up to the element 
you want. It's actually doable, but it's not a natural range-ish 
format. That's where I'm confused about the licensing issues, 
since I'm not sure if the particular object structure which gets 
parsed is also going to be in phobos or if it must remain GPL, 
which I'm not sure I want to continue using.
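One way to picture the "more iterable format" mentioned above is a common base class that exposes each node's children as an array, so that generic code can walk the tree without knowing every member name. The following is only a sketch under that assumption: the `Node` base class, the `children()` method, the simplified `Statement`, and `countNodes` are all hypothetical names invented for illustration, and none of this reflects how dmd or ddmd actually lay out their AST.

```d
/// Hypothetical common base class; not part of dmd's real AST.
abstract class Node
{
    /// Child nodes in source order; leaf nodes return an empty array.
    Node[] children() { return null; }
}

/// Simplified stand-in for a statement node (assumption for this sketch).
class Statement : Node
{
    string text;
    this(string text) { this.text = text; }
}

/// Simplified stand-in for FuncDeclaration with a few named members.
class FuncDeclaration : Node
{
    string ident;        // the function's name (a plain string here)
    Statement frequire;  // the in{} contract, if present
    Statement fbody;     // function body

    this(string ident, Statement frequire, Statement fbody)
    {
        this.ident = ident;
        this.frequire = frequire;
        this.fbody = fbody;
    }

    /// Collects the named members that are present into one array.
    override Node[] children()
    {
        Node[] result;
        if (frequire !is null) result ~= frequire;
        if (fbody !is null) result ~= fbody;
        return result;
    }
}

/// Counts every node in the tree rooted at `root`, depth first.
size_t countNodes(Node root)
{
    size_t n = 1;
    foreach (child; root.children())
        n += countNodes(child);
    return n;
}
```

The named members stay available for code that wants them, while a cursor-style editor could move up and down through `children()`. Building a fresh array on every call is a deliberate simplification; a real design would probably avoid the allocation.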


> So, if you're not familiar with ranges, you probably have a 
> fair bit of
> learning ahead of you, and you're probably going to have to 
> make a number of
> changes to your lexer and parser (though the majority of it 
> will probably be
> able to stay intact). Unfortunately, a proper article and 
> tutorial on them is
> currently lacking in spite of the fact that Phobos uses them 
> heavily.
> Fortunately however, in a book that Ali Çehreli is writing on 
> D, he has a
> chapter on ranges that should help get you started:
>
> http://ddili.org/ders/d.en/ranges.html
>
> But I'd suggest that you play around with ranges a fair bit 
> (especially with
> strings) before trying to change what you have to use them. 
> std.algorithm in
> particular makes heavy use of ranges. And it wouldn't surprise 
> me at all if
> some portions of your lexer and parser really should be using 
> some of Phobos'
> functions but aren't currently, because it's originally a port 
> from C++. You
> should also make sure that you understand the basics of Unicode 
> fairly well -
> especially with how they pertain to char, wchar, and dchar - 
> since that will
> affect your ability to correctly translate code to use ranges 
> as well as
> properly optimize them.
>
> It would probably help if other D developers who are more 
> familiar with ranges
> took a look at what you have and maybe even helped you start 
> adjusting your
> code, but I don't know how many will both have the time and be 
> interested. If
> I have time, I'll probably start poking at it, but I don't know 
> that I'll have
> time any time soon, much as I'd like to.
>
> Regardless, you need to familiarize yourself with ranges if you 
> want to get
> the lexer and parser ready for inclusion in Phobos. And you 
> really should
> familiarize yourself with them anyway, since they're heavily 
> used in D code in
> general. Not being able to use ranges in D would be like not 
> being able to use
> iterators in C++. You can program in it, but you'd be fairly 
> crippled -
> particularly when dealing with the standard library.
>
> - Jonathan M Davis
March 08, 2012
Re: D port of dmd: Lexer, Parser, AND CodeGenerator fully operational
I hit "send" before I was done writing!

On Thursday, 8 March 2012 at 07:49:57 UTC, Jonathan M Davis wrote:
> Fortunately however, in a book that Ali Çehreli is writing on 
> D, he has a
> chapter on ranges that should help get you started:
>
> http://ddili.org/ders/d.en/ranges.html

Thanks. This is a really helpful guide.

Okay I'm done!
March 08, 2012
Re: D port of dmd: Lexer, Parser, AND CodeGenerator fully operational
On 08.03.2012 11:48, Jonathan M Davis wrote:
> On Thursday, March 08, 2012 08:21:17 Zach the Mystic wrote:
>> On Thursday, 8 March 2012 at 04:56:07 UTC, Jonathan M Davis wrote:
>>> If you took it from ddmd, then it's definitely going to have to
>>> be GPL.
>>>
>>> Now, there is interest in having a D parser and lexer in
>>> Phobos. I don't know
>>> if your version will fit the bill (e.g. it must have a
>>> range-based API), but we
>>> need one at some point. The original idea was to more or less
>>> directly port
>>> dmd's lexer and parser with some adjustments to the API as
>>> necessary
>>> (primarily to make it range-based). But no one has had the time
>>> to complete
>>> such a project yet (I originally volunteered to do it, but I
>>> just haven't had
>>> the time).
>>>
>>> When that project was proposed, Walter agreed to let that port
>>> be Boost rather
>>> than GPL (since he holds the copyright and the port would be
>>> going in Phobos,
>>> which uses boost).
>>>
>>> The problem with what you have (even if the API and
>>> implementation were
>>> perfect) is that it comes from ddmd, which had other
>>> contributors working on
>>> it. So, you would have to get permission from not only Walter
>>> but all of the
>>> relevant ddmd contributors. If you were able to do _that_, and it
>>> could get
>>> past the review process, then what you've done could be put
>>> into Phobos. But
>>> that requires that you take the time and effort to take care of
>>> getting the
>>> appropriate permissions, making sure that the API and
>>> implementation are
>>> acceptable for Phobos, and putting it through the Phobos review
>>> process. It
>>> would be great if you could do that though.
>>>
>>> - Jonathan M Davis
>>
>> This is great news. I was really worried that the license was
>> etched in stone. I'll need help finding out who owns the code,
>> plus legal advice if the process is more than just getting a
>> simple confirmation email from each of the original authors.
>>
>> I have some comments I feel are very interesting regarding the
>> lexer and pointers. There are no pointers in any of the code
>> besides the lexer, so I think that will be very satisfying to
>> you. Now I don't know everything about ranges, but if you simply
>> mean dynamic arrays, then yes, everything except the lexer uses
>> arrays when necessary, although there's simply a lot of code
>> which doesn't need them because most of the objects are really
>> just lists of members, many of which are not arrays.
>>
>> About the lexer, one thing I realized about the Wild-West pointer
>> style as I was porting it is that it must be blazing fast. To my
>> understanding, to call p.popFront() requires two operations, ++p;
>> followed by --p.length; plus possibly array bounds checking, I
>> don't know.
>>
>> ++p is all that the current lexer needs. It used to only check
>> for EOF at each junction, but since I'm parsing little chunks of
>> code instead of whole files now, it checks "if ( p>= endBuf )"
>> at the beginning of each token scan, which gets pretty close to
>> not going out of bounds, since most tokens aren't very long. That
>> lexer is a tribute to very fast programming of an old school
>> which will go away if it changes. Still, I can sense a tidal wave
>> of RANGES coming, and I fear I'll just have to bid the little
>> thing goodbye! :-(
>
> A range is not necessarily a dynamic array, though a dynamic array is a range.
> The lexer is going to need to take a range of dchar (which may or may not be
> an array), and it's probably going to need to return a range of tokens. The
> parser would then take a range of tokens and then output the AST in some form
> or other - it probably couldn't be a range, but I'm not sure. And while the
> lexer would need to operate on generic ranges of dchar, it would probably have
> to be special-cased for strings in a number of places in order to make it
> faster (e.g. checking the first char in a string rather than using front when
> it's known that the value being checked against is an ASCII character and will
> therefore fit in a single char - front has to decode the next character, which
> is less efficient).

Simply put, the decision on decoding should belong to the lexer. Thus 
strings should be wrapped as input ranges of char, wchar & dchar 
respectively.
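As a rough illustration of the wrapping idea, the sketch below presents an array of code units as an input range of those same code units, so nothing is decoded unless the consumer chooses to decode. The names `CodeUnits` and `codeUnits` are hypothetical and chosen only for this example; current Phobos provides `std.utf.byCodeUnit` for a similar purpose.

```d
import std.traits : isSomeChar;

/// Hypothetical wrapper: an input range over the raw code units of a
/// string, performing no UTF decoding.
struct CodeUnits(C)
    if (isSomeChar!C)
{
    private const(C)[] data;

    /// True when no code units remain.
    @property bool empty() const { return data.length == 0; }

    /// The current code unit (not a decoded code point).
    @property C front() const { return data[0]; }

    /// Advances by exactly one code unit.
    void popFront() { data = data[1 .. $]; }
}

/// Convenience constructor that infers the code unit type.
auto codeUnits(C)(const(C)[] s)
    if (isSomeChar!C)
{
    return CodeUnits!C(s);
}
```

Iterating `codeUnits("\u00E9")` with `foreach` visits two `char` values, because U+00E9 occupies two UTF-8 code units, whereas iterating the same text as a range of dchar would yield one decoded character.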

>
> So, if you're not familiar with ranges, you probably have a fair bit of
> learning ahead of you, and you're probably going to have to make a number of
> changes to your lexer and parser (though the majority of it will probably be
> able to stay intact). Unfortunately, a proper article and tutorial on them is
> currently lacking in spite of the fact that Phobos uses them heavily.
> Fortunately however, in a book that Ali Çehreli is writing on D, he has a
> chapter on ranges that should help get you started:
>
> http://ddili.org/ders/d.en/ranges.html
>
> But I'd suggest that you play around with ranges a fair bit (especially with
> strings) before trying to change what you have to use them. std.algorithm in
> particular makes heavy use of ranges. And it wouldn't surprise me at all if
> some portions of your lexer and parser really should be using some of Phobos'
> functions but aren't currently, because it's originally a port from C++. You
> should also make sure that you understand the basics of Unicode fairly well -
> especially with how they pertain to char, wchar, and dchar - since that will
> affect your ability to correctly translate code to use ranges as well as
> properly optimize them.
>
> It would probably help if other D developers who are more familiar with ranges
> took a look at what you have and maybe even helped you start adjusting your
> code, but I don't know how many will both have the time and be interested. If
> I have time, I'll probably start poking at it, but I don't know that I'll have
> time any time soon, much as I'd like to.
>
> Regardless, you need to familiarize yourself with ranges if you want to get
> the lexer and parser ready for inclusion in Phobos. And you really should
> familiarize yourself with them anyway, since they're heavily used in D code in
> general. Not being able to use ranges in D would be like not being able to use
> iterators in C++. You can program in it, but you'd be fairly crippled -
> particularly when dealing with the standard library.
>
> - Jonathan M Davis


-- 
Dmitry Olshansky
March 08, 2012
Re: D port of dmd: Lexer, Parser, AND CodeGenerator fully operational
On Thursday, March 08, 2012 22:03:12 Dmitry Olshansky wrote:
> On 08.03.2012 11:48, Jonathan M Davis wrote:
> > A range is not necessarily a dynamic array, though a dynamic array is a
> > range. The lexer is going to need to take a range of dchar (which may or
> > may not be an array), and it's probably going to need to return a range
> > of tokens. The parser would then take a range of tokens and then output
> > the AST in some form or other - it probably couldn't be a range, but I'm
> > not sure. And while the lexer would need to operate on generic ranges of
> > dchar, it would probably have to be special-cased for strings in a number
> > of places in order to make it faster (e.g. checking the first char in a
> > string rather than using front when it's known that the value being
> > checked against is an ASCII character and will therefore fit in a single
> > char - front has to decode the next character, which is less efficient).
> 
> Simply put, the decision on decoding should belong to the lexer. Thus
> strings should be wrapped as input ranges of char, wchar & dchar
> respectively.

??? The normal way to handle this is to simply special-case certain 
operations. e.g.

static if(is(Unqual!(ElementEncodingType!R) == char))
{ ... }
else
{ ... }

I'm not sure that wrapping char and wchar arrays in structs that treat them as 
ranges of char or wchar is a good idea. At minimum, I'm not aware of anything 
in Phobos currently working that way (unless you did something like that in 
std.regex?). Everything either treats them as generic ranges of dchar or 
special cases them. And when you want to be most efficient with string 
processing, I would think that you'd want to treat them exactly as the arrays 
of code units that they are rather than ranges - in which case treating them 
as generic ranges of dchar in most places and then special casing certain 
sections of code which can take advantage of the fact that they're arrays of 
code units seems like the way to go. The lexer is then choosing when something 
decodes, though the default is to decode, since it requires special-casing to 
avoid it.
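To make the special-casing approach concrete, here is a minimal sketch of a helper that tests whether the input begins with a given ASCII character. The function name `startsWithAscii` is hypothetical, and the imports assume a recent Phobos in which the range primitives live in `std.range.primitives`. For UTF-8 strings it compares the first code unit directly; for every other range it falls back to `front`.

```d
import std.range.primitives : ElementEncodingType, empty, front, isInputRange;
import std.traits : Unqual, isNarrowString;

/// Returns true if `input` begins with the ASCII character `c`.
/// Hypothetical helper, written only to illustrate the technique.
bool startsWithAscii(R)(R input, char c)
    if (isInputRange!R)
{
    assert(c < 0x80, "c must be an ASCII character");

    static if (isNarrowString!R && is(Unqual!(ElementEncodingType!R) == char))
    {
        // UTF-8 string: an ASCII character is exactly one code unit,
        // so index the array directly and skip decoding.
        return input.length > 0 && input[0] == c;
    }
    else
    {
        // Any other range: compare against whatever element front yields.
        return !input.empty && input.front == c;
    }
}
```

With this, `startsWithAscii("if (x)", 'i')` takes the array branch, while `startsWithAscii("x = 1"d, 'x')` takes the generic branch. The fast path is safe only because a UTF-8 lead or continuation byte can never equal an ASCII value.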

- Jonathan M Davis