August 03, 2012
On 08/03/2012 05:08 AM, Jonathan M Davis wrote:
> On Thursday, August 02, 2012 23:00:41 Andrei Alexandrescu wrote:
>> On 8/2/12 10:40 PM, Walter Bright wrote:
>>> To reiterate another point, since we are in the compiler business,
>>> people will expect std.d.lexer to be of top quality, not some bag on the
>>> side. It needs to be usable as a base for writing a professional quality
>>> compiler. It's the reason why I'm pushing much harder on this than I do
>>> for other modules.
>>
>> The lexer must be configurable enough to tokenize other languages than
>> D. I confess I'm very unhappy that there seem to be no less than three
>> people determined to write lexers for D. We're wasting precious talent
>> and resources doubly. Once, several people are working in parallel on
>> the same product. Second, none of them is actually solving the problem
>> that should be solved.
>
> You're not going to get as fast a lexer if it's not written specifically for D.

This should certainly be possible.

> Writing a generic lexer is a different problem.

It is a more general problem.

August 03, 2012
On 8/2/2012 8:00 PM, Andrei Alexandrescu wrote:
> On 8/2/12 10:40 PM, Walter Bright wrote:
>> To reiterate another point, since we are in the compiler business,
>> people will expect std.d.lexer to be of top quality, not some bag on the
>> side. It needs to be usable as a base for writing a professional quality
>> compiler. It's the reason why I'm pushing much harder on this than I do
>> for other modules.
>
> The lexer must be configurable enough to tokenize other languages than D. I
> confess I'm very unhappy that there seem to be no less than three people
> determined to write lexers for D. We're wasting precious talent and resources
> doubly. Once, several people are working in parallel on the same product.
> Second, none of them is actually solving the problem that should be solved.

I agree and I hope the three can combine their efforts with the best ideas of each.

I don't think the lexer needs to be configurable for other languages. I think it's fine that it be custom for D. However, and this is a big however, its user-facing interface should be usable for other languages. I see no obvious reason why it cannot be.

August 03, 2012
On Friday, 3 August 2012 at 03:14:14 UTC, Walter Bright wrote:
> On 8/2/2012 8:00 PM, Andrei Alexandrescu wrote:
>> On 8/2/12 10:40 PM, Walter Bright wrote:
>>> To reiterate another point, since we are in the compiler business,
>>> people will expect std.d.lexer to be of top quality, not some bag on the
>>> side. It needs to be usable as a base for writing a professional quality
>>> compiler. It's the reason why I'm pushing much harder on this than I do
>>> for other modules.
>>
>> The lexer must be configurable enough to tokenize other languages than D. I
>> confess I'm very unhappy that there seem to be no less than three people
>> determined to write lexers for D. We're wasting precious talent and resources
>> doubly. Once, several people are working in parallel on the same product.
>> Second, none of them is actually solving the problem that should be solved.
>
> I agree and I hope the three can combine their efforts with the best ideas of each.

If the other guys think they've got it, then I can withdraw my
efforts. I was just thinking I had a lexer just sitting around,
may as well use it, but if the other guys have it, then I'm fine
with withdrawing.

August 03, 2012
On 8/2/12 11:11 PM, Bernard Helyer wrote:
> On Friday, 3 August 2012 at 03:00:42 UTC, Andrei Alexandrescu wrote:
>> The lexer must be configurable enough to tokenize other languages than D.
>
> You're going to have to defend that one.

I wouldn't know how to. To me it's all too obvious that it's better to have a tool that allows writing lexers for many languages (and incidentally using it for D) than a tool that can only tokenize D code.

Andrei

August 03, 2012
On 8/2/12 11:08 PM, Jonathan M Davis wrote:
> You're not going to get as fast a lexer if it's not written specifically for D.
> Writing a generic lexer is a different problem. It's also one that needs to be
> solved, but I think that it's a mistake to think that a generic lexer is going
> to be able to be as fast as one specifically optimized for D.

Do you have any evidence to back that up? I mean you're just saying it.

Andrei


August 03, 2012
On Thursday, August 02, 2012 23:41:39 Andrei Alexandrescu wrote:
> On 8/2/12 11:08 PM, Jonathan M Davis wrote:
> > You're not going to get as fast a lexer if it's not written specifically for D. Writing a generic lexer is a different problem. It's also one that needs to be solved, but I think that it's a mistake to think that a generic lexer is going to be able to be as fast as one specifically optimized for D.
> 
> Do you have any evidence to back that up? I mean you're just saying it.

Because all of the rules are built directly into the code. You don't have to use regexes or anything like that. Pieces of the lexer could certainly be generic or copied over to other lexers just fine, but when you write the lexer by hand specifically for D, you can guarantee that it checks exactly what it needs to for D without any extra cruft or lost efficiency due to decoding where it doesn't need to or checking an additional character at any point or anything along those lines. And tuning it is much easier, because you have control over the whole thing. Also, given features such as token strings, I would think that using a generic lexer on D would be rather difficult anyway.
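To make "rules built directly into the code" concrete, here is a minimal sketch of hand-written operator lexing. It is not taken from any of the lexers under discussion; `TokKind`, `Tok`, and `lexOperator` are names invented for this illustration, and only a handful of operators are covered.

```d
// Hypothetical token kinds, invented for this sketch.
enum TokKind { plus, plusPlus, plusAssign, lt, shl, shlAssign, le, unknown }

struct Tok { TokKind kind; size_t length; }

// Lex one operator at the start of `src`. Each branch inspects only as many
// code units as it needs, and nothing is ever decoded from UTF-8.
Tok lexOperator(const(char)[] src)
{
    if (src.length == 0)
        return Tok(TokKind.unknown, 0);

    // Code unit at index `i`, or '\0' when `i` is past the end.
    char at(size_t i) { return i < src.length ? src[i] : '\0'; }

    switch (src[0])
    {
        case '+':
            if (at(1) == '+') return Tok(TokKind.plusPlus, 2);
            if (at(1) == '=') return Tok(TokKind.plusAssign, 2);
            return Tok(TokKind.plus, 1);
        case '<':
            if (at(1) == '<')
                return at(2) == '=' ? Tok(TokKind.shlAssign, 3)
                                    : Tok(TokKind.shl, 2);
            if (at(1) == '=') return Tok(TokKind.le, 2);
            return Tok(TokKind.lt, 1);
        default:
            return Tok(TokKind.unknown, 1);
    }
}

void main()
{
    assert(lexOperator("<<= 1") == Tok(TokKind.shlAssign, 3));
    assert(lexOperator("++i") == Tok(TokKind.plusPlus, 2));
    assert(lexOperator("+ 1") == Tok(TokKind.plus, 1));
}
```

The control flow is the grammar: there is no table lookup and no pattern matching at run time, which is the property being claimed for a hand-written lexer.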

If someone wants to try to write a generic lexer for D and see if they can beat out any hand-written ones, then more power to them, but I don't see how you could possibly expect to shave the operations down to the bare minimum necessary to get the job done with a generic lexer, whereas a hand-written lexer can do that given enough effort.

- Jonathan M Davis

August 03, 2012
On Friday, August 03, 2012 05:36:05 Bernard Helyer wrote:
> If the other guys think they've got it, then I can withdraw my efforts. I was just thinking I had a lexer just sitting around, may as well use it, but if the other guys have it, then I'm fine with withdrawing.

I'm a fair ways along already. Making changes according to what Walter wants will slow me down some but not a lot. It's something that I've wanted to do for quite some time but just finally got around to starting a couple of weeks ago. So, I definitely want to complete it regardless of what anyone else is doing. It also gives me an opportunity to make sure that the spec and dmd match (and I've already found a few bugs with the spec and fixed them).

Feel free to do whatever you want, but I'm definitely going to complete what I'm working on.

- Jonathan M Davis

August 03, 2012
On Friday, 3 August 2012 at 03:59:29 UTC, Jonathan M Davis wrote:
> On Friday, August 03, 2012 05:36:05 Bernard Helyer wrote:
>> If the other guys think they've got it, then I can withdraw my
>> efforts. I was just thinking I had a lexer just sitting around,
>> may as well use it, but if the other guys have it, then I'm fine
>> with withdrawing.
>
> I'm a fair ways along already. Making changes according to what Walter wants
> will slow me down some but not a lot. It's something that I've wanted to do
> for quite some time but just finally got around to starting a couple of weeks
> ago. So, I definitely want to complete it regardless of what anyone else is
> doing. It also gives me an opportunity to make sure that the spec and dmd
> match (and I've already found a few bugs with the spec and fixed them).
>
> Feel free to do whatever you want, but I'm definitely going to complete what
> I'm working on.
>
> - Jonathan M Davis

I'll let you get on with it then. I'll amuse myself with the
thought of someone asking why SDC doesn't use std.d.lexer or
a parser generator. I'll then hit them with my cane, and tell
them to get off of my lawn. :P


August 03, 2012
On Friday, August 03, 2012 06:02:29 Bernard Helyer wrote:
> I'll let you get on with it then. I'll amuse myself with the thought of someone asking why SDC doesn't use std.d.lexer or a parser generator. I'll then hit them with my cane, and tell them to get off of my lawn. :P

Well, if std.d.lexer is done correctly, then it should at least be very usable by SDC, though whether it would be worth changing SDC to use it, I have no idea. That'll be up to you and the other SDC devs. But when code gets posted, you should definitely chime in on it.

- Jonathan M Davis

August 03, 2012
On 08/03/2012 05:53 AM, Jonathan M Davis wrote:
> On Thursday, August 02, 2012 23:41:39 Andrei Alexandrescu wrote:
>> On 8/2/12 11:08 PM, Jonathan M Davis wrote:
>>> You're not going to get as fast a lexer if it's not written specifically
>>> for D. Writing a generic lexer is a different problem. It's also one that
>>> needs to be solved, but I think that it's a mistake to think that a
>>> generic lexer is going to be able to be as fast as one specifically
>>> optimized for D.
>>
>> Do you have any evidence to back that up? I mean you're just saying it.
>
> Because all of the rules are built directly into the code. You don't have to
> use regexes or anything like that.

The parts that can be specified with simple regexen are certainly not a
problem. A generic string-mixin-based lexer should be able to generate
very close to optimal code by, e.g., merging common token prefixes and the
like.
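A rough sketch of what merging common token prefixes via a string mixin could look like follows. This is only an illustration under stated assumptions, not an existing library: `genMatcher`, `operators`, and `matchOperator` are hypothetical names, and the generator assumes token characters need no escaping inside a character literal.

```d
import std.conv : to;

// Runs at compile time. All tokens in `toks` share their first `depth`
// characters; `best` is the length of the longest complete token among
// the characters matched so far.
string genMatcher(string[] toks, size_t depth, size_t best)
{
    char[] heads; // distinct next characters, in first-seen order
    foreach (t; toks)
    {
        if (t.length == depth) { best = depth; continue; }
        bool seen = false;
        foreach (h; heads)
            if (h == t[depth]) seen = true;
        if (!seen) heads ~= t[depth];
    }

    string fallback = "return " ~ best.to!string ~ ";";
    if (heads.length == 0)
        return fallback;

    string code = "if (s.length > " ~ depth.to!string ~ ") switch (s["
                ~ depth.to!string ~ "]) {";
    foreach (h; heads)
    {
        // Every token continuing with `h` shares one branch, so the
        // common prefix is examined exactly once.
        string[] group;
        foreach (t; toks)
            if (t.length > depth && t[depth] == h) group ~= t;
        code ~= "case '" ~ h ~ "': " ~ genMatcher(group, depth + 1, best);
    }
    return code ~ "default: break; } " ~ fallback;
}

// Hypothetical token list; any language's operators could go here.
enum string[] operators = ["+", "++", "+=", "<", "<<", "<<=", "<="];

// Length of the longest operator at the start of `s`, or 0 if none matches.
size_t matchOperator(const(char)[] s)
{
    mixin(genMatcher(operators, 0, 0));
}

void main()
{
    assert(matchOperator("<<= x") == 3);
    assert(matchOperator("<") == 1);
    assert(matchOperator("+1") == 1);
    assert(matchOperator("x") == 0);
}
```

The generated body is a nest of `switch` statements on single code units, so the compiled code has the same shape as a hand-written longest-match routine, while the token list stays data that could be swapped out for another language.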

> Pieces of the lexer could certainly be
> generic or copied over to other lexers just fine, but when you write the lexer
> by hand specifically for D, you can guarantee that it checks exactly what it
> needs to for D without any extra cruft or lost efficiency due to decoding where
> it doesn't need to or checking an additional character at any point or
> anything along those lines.

This is achievable if it is fully generic as well. Just add generic
peephole optimizations until the generated lexer is identical to what
the hand-written one would have looked like.

> And tuning it is much easier,

Yes.

> because you have control over the whole thing.

The library writer has control over the whole thing in each case.

> Also, given features such as token strings, I
> would think that using a generic lexer on D would be rather difficult anyway.
>

It would of course need to support custom parsing routines.
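One possible shape for such support, sketched under assumptions: a rule table in which an entry may carry a hand-written scanner. `Rule`, `scanTokenString`, and `nextTokenLength` are hypothetical names, and the token-string scanner is deliberately simplified: it only balances braces and ignores braces inside nested string literals or comments, which a real D lexer would have to handle.

```d
// A rule is a fixed prefix plus an optional custom scanner (hypothetical).
struct Rule
{
    string prefix;
    // Given input that starts with `prefix`, returns the full token length.
    size_t function(const(char)[] src) scan;
}

// Simplified scanner for token strings of the form q{ ... }.
// Assumes `src` starts with "q{" and only counts brace nesting.
size_t scanTokenString(const(char)[] src)
{
    size_t depth = 0;
    foreach (i, c; src)
    {
        if (c == '{') ++depth;
        else if (c == '}' && --depth == 0) return i + 1;
    }
    return src.length; // unterminated: consume the rest of the input
}

// Generic driver: the first matching rule wins. Rules without a scanner
// are plain fixed tokens; anything unmatched is one code unit long.
size_t nextTokenLength(const(char)[] src, const Rule[] rules)
{
    foreach (r; rules)
    {
        if (src.length >= r.prefix.length && src[0 .. r.prefix.length] == r.prefix)
            return r.scan is null ? r.prefix.length : r.scan(src);
    }
    return src.length ? 1 : 0;
}

void main()
{
    auto rules = [Rule("q{", &scanTokenString), Rule("{"), Rule("}")];
    assert(nextTokenLength("q{ a { b } c } rest", rules) == 14);
    assert(nextTokenLength("{x", rules) == 1);
}
```

The generic part stays language-agnostic; only the scanner passed in knows about D's token strings.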

> If someone wants to try and write a generic lexer for D and see if they can
> beat out any hand-written ones,

I'll possibly give it a shot if I can find the time.

> then more power to them, but I don't see how
> you could possibly expect to shave the operations down to the bare minimum
> necessary to get the job done with a generic lexer, whereas a hand-written
> parser can do that given enough effort.
>

If it is optimal for D lexing and optimal or close to optimal for other
languages, then it is profoundly more useful than just a D lexer.