August 01, 2011
On 7/31/2011 9:29 PM, Johann MacDonagh wrote:
> Anyway, Jim, if you want to do this I can move on to something else. If
> you want, I can continue on. I didn't see a branch in your repo so I'm
> not sure what you've done.

Derp. I meant to say "Jonathan".
August 01, 2011
On Sunday 31 July 2011 21:29:31 Johann MacDonagh wrote:
> On 7/31/2011 5:57 AM, Jacob Carlborg wrote:
> >> * Lexing and parsing:
> >> 
> >> Standard facilities for these tasks could be very useful. Perhaps D could get its own dlex and dyacc or some such tools. Personally, I prefer sticking to LL(1), but LALR is generally more convenient and flexible, and thus I'd suggest something YACC/ANTLR-like.
> >> 
> >> (I know this doesn't have much to do with Phobos per se, but I figured
> >> I'd mention it.)
> > 
> > I think someone is working on this.
> 
> I've started on a port of DMD's lexer (not really a port ;) ):
> 
> https://github.com/jmacdonagh/phobos/compare/master...std.lang.d.lexer
> 
> Basically, you give it some string (string, wstring, or dstring), and it gives you a range of tokens back. The token has the type, a slice of the input that corresponds to the token, line / column, and a value (e.g. an integer constant).
> 
> Some features I'm planning:
> 
> 1. Support D1 and D2.
> 2. Warnings and errors returned in the tokens. For example, if you use
> an octal constant for D2 code, it will correctly return an integer
> constant token with some kind of warning flag set and a message. In
> terms of errors, if the lexer hits "0xz012", it will return an error
> token for the slice "0xz" and then start lexing an integer constant
> "012". No exceptions, easy peasy.
> 3. CTFEable. Although I'll probably have to wait till the next DMD release.
> 4. Support any kind of character range. Not sure if people want to lex
> something that's not a string/wstring/dstring.
> 
> I'm glad this was brought up. I remember Walter's post last year asking for this module, but the conversation seemed to kill the idea. I started on this just for the fun of it, but then doubted whether Phobos wanted it. I feel that a hand-written lexer / parser is going to be faster than something generated, but maybe I'm old-fashioned.
> 
> Anyway, Jim, if you want to do this I can move on to something else. If you want, I can continue on. I didn't see a branch in your repo so I'm not sure what you've done.
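To make the API described above a bit more concrete, here is a minimal Python sketch of the behaviour Johann outlines: a lazily produced sequence of tokens carrying a kind, the matching slice of the input, line/column and an optional value, with malformed input turned into error tokens rather than exceptions. Everything in it (the `Token` fields, the kind names, the `lex` function and its `d2` flag) is a hypothetical illustration that only covers identifiers, integer constants and one-character operators. It is not the code in the std.lang.d.lexer branch, and a Python generator merely stands in for a D input range.

```python
import string
from dataclasses import dataclass
from typing import Iterator, Optional

DIGITS = set(string.digits)
HEX = set(string.hexdigits)
OCT = set(string.octdigits)
IDENT_START = set(string.ascii_letters + "_")
IDENT_REST = IDENT_START | DIGITS


@dataclass
class Token:
    kind: str                    # "identifier", "integer", "operator" or "error" (hypothetical)
    text: str                    # slice of the input that produced the token
    line: int
    column: int
    value: Optional[int] = None  # parsed value of integer constants
    warning: Optional[str] = None


def lex(src: str, d2: bool = True) -> Iterator[Token]:
    """Lazily yield tokens; bad input becomes an "error" token, never an exception."""
    i, line, col, n = 0, 1, 1, len(src)
    while i < n:
        c = src[i]
        if c == "\n":
            i, line, col = i + 1, line + 1, 1
            continue
        if c.isspace():
            i, col = i + 1, col + 1
            continue
        start = i
        if c in IDENT_START:
            while i < n and src[i] in IDENT_REST:
                i += 1
            tok = Token("identifier", src[start:i], line, col)
        elif src.startswith(("0x", "0X"), i):
            i += 2
            while i < n and src[i] in HEX:
                i += 1
            if i == start + 2:
                # "0x" without hex digits: swallow the offending character (if any)
                # into an error token, then resume lexing right after it.
                if i < n and src[i] in IDENT_REST:
                    i += 1
                tok = Token("error", src[start:i], line, col)
            else:
                tok = Token("integer", src[start:i], line, col, int(src[start + 2:i], 16))
        elif c in DIGITS:
            while i < n and src[i] in DIGITS:
                i += 1
            text = src[start:i]
            if len(text) > 1 and text[0] == "0":
                if set(text) <= OCT:
                    # Still an integer constant, but flagged when lexing D2.
                    warn = "octal literal (deprecated in D2)" if d2 else None
                    tok = Token("integer", text, line, col, int(text, 8), warn)
                else:
                    tok = Token("error", text, line, col)
            else:
                tok = Token("integer", text, line, col, int(text))
        else:
            i += 1  # every other character is a one-character operator in this sketch
            tok = Token("operator", c, line, col)
        col += i - start  # tokens never span newlines here
        yield tok


for t in lex("x = 0xz012;"):
    print(t.kind, repr(t.text), t.line, t.column, t.value, t.warning)
```

With `x = 0xz012;` this yields an error token for `0xz` followed by an integer token for `012` (octal, so its value is 10, with the warning set), which is the recovery described in point 2.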

If we do a hand-written lexer of D for Phobos, it really should be a fairly direct port of the dmd front-end. It should be _somewhat_ D-ified as appropriate (and the API should definitely be properly range-based and all that), but the implementation needs to be fairly close to dmd itself so that it's easy for someone to port changes and fixes back and forth between the two. Otherwise, they're going to get out of sync fairly easily. If we're not going to do a direct port, then we might as well just do the template-based lexer generator that Andrei and others would really like to see (which we should still do, but I think that the hand-written lexer is nowhere near as valuable if it's not a direct port of dmd's lexer).

Also, I see _zero_ value in making it support D1. If it's for D2's standard library, then what's the point of it lexing D1? That just complicates the lexer for what is essentially a legacy product. And given that the differences between D1 and D2 in dmd's lexer are covered with #ifdefs, it would be rather complicated to try and do a direct port which covers both D1 and D2. It would probably be easier if the two were completely separate.

As for what I've done so far, I'd have to go look. I haven't touched it in a couple of months, I expect. There has been a lot of other stuff that I've needed to do, and Andrei was trying to discourage such an implementation the last time that I brought it up. So, I haven't exactly been in a rush to get it done. I'd like to do it, but I've been rather busy.

So, if you really want to work on a potential D lexer for Phobos, that's fine, but I really think that it needs to be a rather direct port, and that doesn't sound like what you've been doing.

- Jonathan M Davis
August 01, 2011
Are you sure it's LGPL? There are two licenses AFAIK: GPL and Artistic. I'm not seeing any mention of LGPL in the code.
August 01, 2011
On Monday 01 August 2011 04:03:11 Andrej Mitrovic wrote:
> Are you sure it's LGPL? There's two licenses AFAIK, GPL and Artistic. I'm not seeing any mention of LGPL in the code.

It could be GPL. I don't remember whether it's GPL or LGPL, but it doesn't really matter as far as Phobos goes. It needs to be Boost for Phobos, and Walter gave his permission to port dmd's lexer over to D under the Boost license, but no such permission has been given by the ddmd guys. Now, I haven't asked for it yet either. It hasn't been high enough on my priority list to deal with that yet, and I want to do the majority of the port myself to ensure that I understand it. Looking at ddmd and possibly copying some of what it did would be helpful, so I'd like to get the permission thing sorted out with ddmd if possible. I just haven't taken the time to do it.

Regardless, the point is that dmd and ddmd are not under Boost (whether they're under GPL or LGPL), and without permission from the appropriate copyright holders, a port can't be relicensed under Boost.

- Jonathan M Davis
August 01, 2011
Licenses are the greatest code smell.
August 01, 2011
On 7/31/2011 9:56 PM, Jonathan M Davis wrote:
> If we do a hand-written lexer of D for Phobos, it really should be a fairly
> direct port of the dmd front-end. It should be _somewhat_ D-ified as
> appropriate, (and the API should definitely be properly range-based and all
> that), but the implementation needs to be fairly close to dmd itself so that
> it's easy for someone to port changes and fixes back and forth between the two.
> Otherwise, they're going to get out of sync fairly easily. If we're not going
> to do a direct port, then we might as well just do the template-based lexer
> generator that Andrei and others would really like to see (which we should
> still do, but I think that the hand-written lexer is nowhere near as valuable
> if it's not a direct port of dmd's lexer).

Yeah, I get the point, but I feel that I could port most trivial changes from DMD's lexer to lexer.d. The layout of the code is different, of course, but I'm borrowing most of the lexing logic from DMD. Plus with unittests, I think we could ensure things are lexing the same.

> Also, I see _zero_ value in making it support D1. If it's for D2's standard
> library, then what's the point of it lexing D1? That just complicates the
> lexer for what is essentially a legacy product. And given that the differences
> between D1 and D2 in dmd's lexer are covered with #ifdefs, it would be rather
> complicated to try and do a direct port which covers both D1 and D2. It would
> probably be easier if the two were completely separate.

Well, it doesn't appear that there are that many lexical differences between D1 and D2. A few operators, a few keywords, D2 supports a few different string constants, etc...

I wanted this to be usable by IDEs. Do we not want to support D1 for development? I could certainly make this D2 only.

> As for what I've done so far, I'd have to go look. I haven't touched it in a
> couple of months, I expect. There has been a lot of other stuff that I've
> needed to do, and Andrei was trying to discourage such an implementation the
> last time that I brought it up. So, I haven't exactly been in a rush to get it
> done. I'd like to do it, but I've been rather busy.

Well, if Andrei wants to flex the power of D2's templates / etc... with a parser generator then maybe we should go down that route. Andrei, what do you think? Would the lexer/parser be generated at compile time or by a regular tool that would generate the appropriate D files?

> So, if you really want to work on a potential D lexer for Phobos, that's fine,
> but I really think that it needs to be a rather direct port, and that doesn't
> sound like what you've been doing.
>
> - Jonathan M Davis

If we decide to scrap my idea and go a more generic route, I'd probably start working on a standard database interface. I'd like to see D become a little more web framework friendly, and a nice generic database interface is definitely a start.

So, Andrei, what do you think about a lexer / parser in Phobos? Generic or a straight port?
August 01, 2011
On Sunday 31 July 2011 22:28:51 Johann MacDonagh wrote:
> So, Andrei, what do you think about a lexer / parser in Phobos? Generic or a straight port?

I can tell you right now that he wants a generic, template-based lexer / parser generator rather than a hand-written solution. He doesn't like the idea of the hand-written solution. And I definitely think that we should have a template-based lexer / parser generator. The question is whether it's also worth having a hand-written port of the dmd front-end's lexer and parser to Phobos so that we can have an official lexer and parser in Phobos which parse D the same way that the compiler does.

- Jonathan M Davis
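For contrast with the hand-written approach, the following Python sketch shows the general shape of a generic, rule-driven lexer generator: token kinds are declared as (name, regex) pairs and `make_lexer` builds a tokenizer from them. The names `make_lexer`, `D_SUBSET` and `lex_d_subset` and the rule set are assumptions for illustration only. This version builds the lexer at run time with Python's `re` module, whereas the template-based generator discussed here would do the equivalent work at compile time in D.

```python
import re
from typing import Callable, Iterator, List, Tuple

Rule = Tuple[str, str]  # (token kind, regular expression)


def make_lexer(rules: List[Rule], skip: Tuple[str, ...] = ("ws",)) -> Callable[[str], Iterator[Tuple[str, str]]]:
    """Build a tokenizer from declarative rules (hypothetical API, for illustration)."""
    # One named group per rule; patterns should use (?:...) for internal grouping.
    master = re.compile("|".join(f"(?P<{kind}>{pattern})" for kind, pattern in rules))

    def tokenize(src: str) -> Iterator[Tuple[str, str]]:
        pos = 0
        while pos < len(src):
            m = master.match(src, pos)
            if m is None or m.end() == pos:
                # No rule matches here: report one character as an error and move on.
                yield ("error", src[pos])
                pos += 1
                continue
            if m.lastgroup not in skip:
                yield (m.lastgroup, m.group())
            pos = m.end()

    return tokenize


# An illustrative subset of D, not a real D grammar.
D_SUBSET: List[Rule] = [
    ("ws", r"\s+"),
    ("keyword", r"(?:int|return|immutable)\b"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("integer", r"0[xX][0-9A-Fa-f]+|\d+"),
    ("operator", r"==|[-+*/=;(){}]"),
]

lex_d_subset = make_lexer(D_SUBSET)
print(list(lex_d_subset("immutable int x = 0x10;")))
```

Python's alternation is leftmost-first rather than longest-match (as in lex/flex), so rule order matters here: keywords must come before identifiers, and `==` before `=`.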
August 01, 2011
On 7/31/2011 10:34 PM, Jonathan M Davis wrote:
> I can tell you right now that he wants a generic, template-based lexer /
> parser generator rather than a hand-written solution. He doesn't like the idea
> of the hand-written solution. And I definitely think that we should have a
> template-based lexer / parser generator. The question is whether it's also
> worth having a hand-written port of the dmd front-end's lexer and parser to
> Phobos so that we can have an official lexer and parser in Phobos which parse D
> the same way that the compiler does.
>
> - Jonathan M Davis

I see. In that case, my hybrid solution doesn't solve either problem ;)

I'll keep my code in that branch and start on something new.
August 01, 2011
On Sunday 31 July 2011 22:28:51 Johann MacDonagh wrote:
> Well, it doesn't appear that there are that many lexical differences between D1 and D2. A few operators, a few keywords, D2 supports a few different string constants, etc...
> 
> I wanted this to be usable by IDEs. Do we not want to support D1 for development? I could certainly make this D2 only.

Phobos for D2 is for D2, not D1. Bringing D1 into the mix complicates things. If it's easy to tell the lexer to lex D1 or D2 without negatively impacting performance and without seriously complicating the implementation, then we can support D1. But personally, I don't think that it's worth the extra complication. The compiler separates them with #ifdefs, which isn't going to work as well in Phobos. You'd probably end up having to use templates and static ifs to deal with the differences. It can be done, but I don't think that the gain is worth it. The lexer is complicated enough as it is.

- Jonathan M Davis
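The templates-and-`static if` idea above can be roughly approximated in Python by resolving the language version once, when the lexer is constructed, so that the per-token path carries no version checks. The sketch below only covers keyword classification. `DVersion`, `make_word_classifier` and both keyword sets are hypothetical, and the D2-only set is a small illustrative sample, not a complete diff of dmd's keyword tables.

```python
from enum import Enum
from typing import Callable, FrozenSet


class DVersion(Enum):
    D1 = 1
    D2 = 2


# Illustrative keyword subsets (an assumption, not dmd's actual tables).
COMMON_KEYWORDS: FrozenSet[str] = frozenset({"class", "int", "invariant", "return"})
D2_ONLY_KEYWORDS: FrozenSet[str] = frozenset({"immutable", "nothrow", "pure", "shared"})


def make_word_classifier(version: DVersion) -> Callable[[str], str]:
    """Resolve the D1/D2 difference once, when the lexer is built.

    This stands in for what a template parameter plus `static if` would decide
    at compile time in D; dmd itself handles it with preprocessor #ifdefs.
    """
    table = COMMON_KEYWORDS | (D2_ONLY_KEYWORDS if version is DVersion.D2 else frozenset())

    def classify(word: str) -> str:
        # Hot path: one set lookup, no per-token version check.
        return "keyword" if word in table else "identifier"

    return classify


classify_d1 = make_word_classifier(DVersion.D1)
classify_d2 = make_word_classifier(DVersion.D2)
print(classify_d1("immutable"), classify_d2("immutable"))
```

Keywords are only one of the differences mentioned in this thread; the D2-only operators and string literal forms would need the same treatment, which is where the extra complication comes from.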
August 01, 2011
On 07/31/2011 09:28 PM, Johann MacDonagh wrote:
> Well, if Andrei wants to flex the power of D2's templates / etc... with
> a parser generator then maybe we should go down that route. Andrei, what
> do you think? Would the lexer/parser be generated at compile time or by a
> regular tool that would generate the appropriate D files?

Until we have fully integrated, EASY to use, and FAST lexer and parser generators, we haven't yet proven the power of CTFE technology. I don't have the time to embark on such a project for the time being.

Andrei