std.d.lexer : voting thread (page 8) - D Programming Language Discussion Forum

October 07, 2013

Re: std.d.lexer : voting thread

Posted by Jonathan M Davis
in reply to simendsjo

On Monday, October 07, 2013 17:47:27 simendsjo wrote:
> On Monday, 7 October 2013 at 13:29:30 UTC, ilya-stromberg wrote:
> > On Sunday, 6 October 2013 at 18:54:55 UTC, Dicebot wrote:
> >> Any formal review may potentially result in a short vote afterwards if no critical issues are found, so I don't think it makes sense to make any additional announcements. There are no special points of attention - if a review was declared and you want to give some input, it should be done right there.
> > 
> > Yes, but people are lazy. I'm not talking about all of us, but
> > most people are lazy.
> > Some of us will vote because it's interesting, but will not
> > read or write in the review thread because that takes time.
> > So an additional announcement of the upcoming vote can help: "Guys, if
> > you want to vote, it's time to read the documentation and write up
> > your really cool idea before voting".
> 
> This is the reason I've not cast any votes for standard modules - I haven't had the time, or don't have the competence, to cast a valid vote. It would be like voting for a political party without knowing where all parties stand in all cases.

So, it would be like your typical political vote then. ;)

- Jonathan M Davis

October 07, 2013

Lexers (was: std.d.lexer : voting thread)

Posted by Artur Skawina
in reply to Jacob Carlborg

On 10/06/13 10:57, Jacob Carlborg wrote:
> On 2013-10-05 19:52, Artur Skawina wrote:
> 
>> The assumption, that a hand-written lexer will be much faster than a generated one, is wrong.
> 
> I never said that the generated one would be slow. I only said that the hand written would be fast :)

I know, but you said that having both is an option -- that would not
make sense unless there's a significant advantage.
A lexer is really a rather trivial piece of software; there's not much
room for improvement over the obvious "fetch-a-character,
use-it-to-determine-a-new-state, repeat-until-done, return the found
state (== matched token)" approach. So the core of an efficient hand-written
lexer will not be very different from this:
http://repo.or.cz/w/girtod.git/blob/refs/heads/lexer:/mainloop.d
That is already ~2kLOC and it's *just* the top-level loop; it does not
include handling of nontrivial tokens (matches just keywords, punctuators
and identifiers). Could a handwritten lexer be faster? Not by much, and
any trick that would help the manually-written one could also be used
by the generator. In fact, working on the generator is much easier than
dealing with this kind of fragile hand-tuned mess. Imagine changing the
lexical grammar a bit, or introducing a new kind of literal. With a
more declarative solution this only involves a local change spanning
a few lines and is relatively risk-free. Updating a handwritten lexer
would involve many more changes, often in several different areas, and
lots of opportunities for making mistakes.
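
To make the "fetch a character, pick a state, repeat" loop concrete, here is a minimal sketch in D. It is not the linked mainloop.d: the names `Tok`, `Token` and `nextToken` are made up for illustration, and it recognizes only identifiers, decimal integers and `+`.

```d
// Hypothetical names throughout; this is an illustration, not the linked code.
enum Tok { eof, ident, number, plus, unknown }

struct Token { Tok kind; string text; }

// Look at the first character to choose a state, keep consuming characters
// while they belong to that state, then return the state that was reached.
Token nextToken(ref string src)
{
    import std.ascii : isAlpha, isAlphaNum, isDigit, isWhite;

    while (src.length && isWhite(src[0]))
        src = src[1 .. $];              // skip leading whitespace
    if (!src.length)
        return Token(Tok.eof, "");

    size_t i = 1;                       // number of characters consumed
    Tok state;
    immutable c = src[0];
    if (isAlpha(c) || c == '_')
    {
        state = Tok.ident;
        while (i < src.length && (isAlphaNum(src[i]) || src[i] == '_'))
            ++i;
    }
    else if (isDigit(c))
    {
        state = Tok.number;
        while (i < src.length && isDigit(src[i]))
            ++i;
    }
    else if (c == '+')
        state = Tok.plus;
    else
        state = Tok.unknown;

    auto tok = Token(state, src[0 .. i]);
    src = src[i .. $];                  // advance the input past the token
    return tok;
}

unittest
{
    string s = "x1 + 42";
    assert(nextToken(s) == Token(Tok.ident, "x1"));
    assert(nextToken(s) == Token(Tok.plus, "+"));
    assert(nextToken(s) == Token(Tok.number, "42"));
    assert(nextToken(s).kind == Tok.eof);
}
```

Even at this size, adding a new kind of literal means touching the enum, the dispatch on the first character and the consuming loop, which is the maintenance cost the paragraph above is pointing at.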

> Would it be able to lex Scala and Ruby? Method names in Scala can contain many symbols that are not usually allowed in other languages. You can have a method named "==". In Ruby method names are allowed to end with "=", "?" or "!".

Yes, D makes it easy, you can for example simply define a function that determines what is and what isn't an identifier and pass that as an alias or mixin parameter. "Lexing" binary formats would be possible too :^).
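
As a rough illustration of the alias-parameter idea (a sketch with hypothetical names, not code from the linked repository), the rule for what counts as an identifier can be passed in as template alias parameters. `identLength`, `isCStart`, `isCRest` and `isOpChar` are all assumptions made for this example.

```d
import std.ascii : isAlpha, isAlphaNum;

// Length of the identifier at the start of `src`, or 0 if there is none.
// The two predicates are alias parameters, so each language supplies its own.
size_t identLength(alias isStart, alias isRest)(string src)
{
    if (!src.length || !isStart(src[0]))
        return 0;
    size_t n = 1;
    while (n < src.length && isRest(src[n]))
        ++n;
    return n;
}

// C-like identifiers: a letter or underscore, then letters, digits, underscores.
bool isCStart(char c) { return isAlpha(c) || c == '_'; }
bool isCRest(char c)  { return isAlphaNum(c) || c == '_'; }

// A simplified stand-in for symbolic method names such as "==".
bool isOpChar(char c)
{
    switch (c)
    {
        case '=', '<', '>', '!', '+', '-', '*', '/': return true;
        default: return false;
    }
}

unittest
{
    assert(identLength!(isCStart, isCRest)("foo_1 = 2") == 5);
    assert(identLength!(isOpChar, isOpChar)("== b") == 2);
    assert(identLength!(isCStart, isCRest)("42") == 0);
}
```

The scanning loop is written once; only the predicates change between languages.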

A complete D lexer can look as simple as this:
http://repo.or.cz/w/girtod.git/blob/refs/heads/lexer:/dlanglexer.d
which should also give you a good idea of how easy supporting
other languages would be. (The "actions" are defined in separate
modules, so that the grammars can be reused everywhere).
There's a D PEG lexical grammar in there too, btw.

I forgot to change the subject previously, sorry; I was not trying
to influence the voting. I'm just saying that Andrei's
approach goes in the right direction (even if I disagree with
the details). And IMHO the time before a useful std-lib-worthy
lexer infrastructure materializes is measured in months, if not years.
So if I were voting I'd probably say "yes" - because waiting for a
better, but non-existent alternative is not going to help anybody.
The hard part of the required work isn't coding - it's the design.
If a better solution appears later, it should be able to /replace/
the hand-written one. And in the meantime, the experience from using
the less-generic lexer can only help any "new" design.

artur

October 08, 2013

Re: std.d.lexer : voting thread

Posted by Andrei Alexandrescu
in reply to Andrei Alexandrescu

On 10/4/13 5:24 PM, Andrei Alexandrescu wrote:
> On 10/2/13 7:41 AM, Dicebot wrote:
>> After brief discussion with Brian and gathering data from the review
>> thread, I have decided to start voting for `std.d.lexer` inclusion into
>> Phobos.
>
> Thanks all involved for the work, first of all Brian.
>
> I have the proverbial good news and bad news. The only bad news is that
> I'm voting "no" on this proposal.
>
> But there's plenty of good news.
>
> 1. I am not attempting to veto this, so just consider it a normal vote
> when tallying.
>
> 2. I do vote for inclusion in the /etc/ package for the time being.
>
> 3. The work is good and the code valuable, so even in the case my
> suggestions (below) will be followed, virtually all code pulp that
> gets work done can be reused.
[snip]

To put my money where my mouth is, I have a proof-of-concept tokenizer for C++ in working state.

http://dpaste.dzfl.pl/d07dd46d

It contains some rather unsavory bits (I'm sure a ctRegex would be nicer for parsing numbers etc), but it works on a lot of code just swell.

Most importantly, there's a clear distinction between the generic core and the C++-specific part. It should be obvious how to use the generic matcher for defining a D tokenizer.

Token representation is minimalistic and expressive. Just write tk!"<<" for left shift, tk!"int" for int etc. Typos will be detected during compilation. One does NOT need to define and use TK_LEFTSHIFT or TK_INT; all the generic tokenizer needs is the list of tokens. In return, it offers an efficient trie-based matcher for all tokens.
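
For readers without access to the paste, here is one way a `tk!"..."` style representation could be put together. This is a sketch under stated assumptions, not the code behind the link: `tokenList` and `indexOfToken` are hypothetical, and the actual `tk` may well yield a richer type than a plain `int`.

```d
// Hypothetical token list; a real one would name every token of the language.
enum string[] tokenList = ["<<", "<", "int", "if"];

// Runs at compile time (CTFE): position of `s` in tokenList, or -1 if absent.
int indexOfToken(string s)
{
    foreach (i, t; tokenList)
        if (t == s)
            return cast(int) i;
    return -1;
}

// tk!"<<" evaluates to a small integer id that is unique per token.
// A spelling that is not in tokenList is rejected by the static assert.
template tk(string s)
{
    static assert(indexOfToken(s) >= 0, "unknown token: " ~ s);
    enum int tk = indexOfToken(s);
}

unittest
{
    static assert(tk!"<<" != tk!"<");              // distinct ids
    static assert(!__traits(compiles, tk!"itn"));  // typo caught by the compiler
    int kind = tk!"int";
    switch (kind)
    {
        case tk!"int": break;                      // usable as a case label
        default: assert(0);
    }
}
```

Because the ids are compile-time constants, they can be used directly as `case` labels, and no separate list of TK_* names has to be kept in sync with the token list.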

(Keyword matching is unusual in that keywords are first found by the trie matcher, and then a simple check figures out whether more characters follow, e.g. "if" vs. "iffy". Given that many tokenizers use a hashtable anyway to look up all symbols, there's no net loss of speed with this approach.)
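
The "if" vs. "iffy" check can be sketched as follows. This is only an illustration with hypothetical names (`keywords`, `matchKeyword`), and a linear scan over the keyword list stands in for the trie matcher.

```d
import std.algorithm.searching : startsWith;
import std.ascii : isAlphaNum;

// Hypothetical keyword list for the example.
enum string[] keywords = ["if", "int", "else"];

// Returns the keyword at the start of `src`, or null when the text there is
// not a keyword (for example because it is only the prefix of an identifier).
string matchKeyword(string src)
{
    string best;
    foreach (kw; keywords)              // linear scan in place of the trie
        if (src.startsWith(kw) && kw.length > best.length)
            best = kw;                  // keep the longest candidate
    if (best.length == 0)
        return null;
    // Follow-up check: the match only counts as a keyword if the next
    // character cannot continue an identifier.
    if (best.length < src.length
        && (isAlphaNum(src[best.length]) || src[best.length] == '_'))
        return null;
    return best;
}

unittest
{
    assert(matchKeyword("if (x)") == "if");
    assert(matchKeyword("iffy") is null);
    assert(matchKeyword("int x") == "int");
    assert(matchKeyword("integer") is null);
}
```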

The lexer generator compiles fast and should run fast. If not, it should be easy to improve at the matcher level.

Now, what I'm asking for is that std.d.lexer builds on this design instead of the traditional one. At a slight delay, we get the proverbial fishing rod IN ADDITION TO the equally proverbial fish, FOR FREE. It is quite evident there's a bunch of code sharing going on already between std.d.lexer and the proposed design, so it shouldn't be hard to effect the adaptation.

So with this I'm leaving it all within the hands of the submitter and the review manager. I didn't count the votes, but we may have a "yes" majority built up. Since additional evidence has been introduced, I suggest at least a revote. Ideally, there would be enough motivation for Brian to suspend the review and integrate the proposed design within std.d.lexer.

Andrei

October 08, 2013

Re: std.d.lexer : voting thread

Posted by Jakob Ovrum
in reply to Andrei Alexandrescu

On Tuesday, 8 October 2013 at 00:16:45 UTC, Andrei Alexandrescu wrote:
> http://dpaste.dzfl.pl/d07dd46d

I have to say, that `generateCases` function is rather disgusting. I'm really worried about the trend of using string mixins when not necessary, for no apparent gain. Surely you could have used static foreach to generate those cases instead, allowing code that is actually readable. It would probably have much better compile-time performance as well, but that's just speculation.

October 08, 2013

Re: std.d.lexer : voting thread

Posted by Andrei Alexandrescu
in reply to Jakob Ovrum

On 10/7/13 9:21 PM, Jakob Ovrum wrote:
> On Tuesday, 8 October 2013 at 00:16:45 UTC, Andrei Alexandrescu wrote:
>> http://dpaste.dzfl.pl/d07dd46d
>
> I have to say, that `generateCases` function is rather disgusting. I'm
> really worried about the trend of using string mixins when not
> necessary, for no apparent gain. Surely you could have used static
> foreach to generate those cases instead, allowing code that is actually
> readable. It would probably have much better compile-time performance as
> well, but that's just speculation.

This is the first shot, and I'm more interested in the API with the implementation to be improved. Your idea sounds great - care to put it in code so we see how it does?

Andrei

October 08, 2013

Re: std.d.lexer : voting thread

Posted by Andrei Alexandrescu
in reply to Andrei Alexandrescu

On 10/7/13 9:26 PM, Andrei Alexandrescu wrote:
> On 10/7/13 9:21 PM, Jakob Ovrum wrote:
>> On Tuesday, 8 October 2013 at 00:16:45 UTC, Andrei Alexandrescu wrote:
>>> http://dpaste.dzfl.pl/d07dd46d
>>
>> I have to say, that `generateCases` function is rather disgusting. I'm
>> really worried about the trend of using string mixins when not
>> necessary, for no apparent gain. Surely you could have used static
>> foreach to generate those cases instead, allowing code that is actually
>> readable. It would probably have much better compile-time performance as
>> well, but that's just speculation.
>
> This is the first shot, and I'm more interested in the API with the
> implementation to be improved. Your idea sounds great - care to put it
> in code so we see how it does?
>
> Andrei

FWIW I just tried this, and it seems to work swell.

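import std.typetuple;  // provides TypeTuple (named AliasSeq in std.meta in later Phobos releases)

int main(string[] args) {
  alias TypeTuple!(1, 2, 3, 4) tt;
  switch (args.length) {
    // foreach over a compile-time tuple is unrolled, so each
    // iteration contributes its own case label to the switch.
    foreach (i, _; tt) {
      case i + 1: return i * 42;
    }
    default: break;
  }
  return 0;
}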

Interesting!


Andrei

October 08, 2013

Re: std.d.lexer : voting thread

Posted by Andrei Alexandrescu
in reply to Andrei Alexandrescu

On 10/7/13 9:34 PM, Andrei Alexandrescu wrote:
> On 10/7/13 9:26 PM, Andrei Alexandrescu wrote:
>> On 10/7/13 9:21 PM, Jakob Ovrum wrote:
>>> On Tuesday, 8 October 2013 at 00:16:45 UTC, Andrei Alexandrescu wrote:
>>>> http://dpaste.dzfl.pl/d07dd46d
>>>
>>> I have to say, that `generateCases` function is rather disgusting. I'm
>>> really worried about the trend of using string mixins when not
>>> necessary, for no apparent gain. Surely you could have used static
>>> foreach to generate those cases instead, allowing code that is actually
>>> readable. It would probably have much better compile-time performance as
>>> well, but that's just speculation.
>>
>> This is the first shot, and I'm more interested in the API with the
>> implementation to be improved. Your idea sounds great - care to put it
>> in code so we see how it does?
>>
>> Andrei
>
> FWIW I just tried this, and it seems to work swell.
>
> int main(string[] args) {
>    alias TypeTuple!(1, 2, 3, 4) tt;
>    int a;
>    switch (args.length) {
>      foreach (i, _; tt) {
>        case i + 1: return i * 42;
>      }
>      default: break;
>    }
>    return 0;
> }
>
> Interesting!
>
>
> Andrei

On the other hand, I find it difficult to figure how the needed processing can be done with reasonable ease with just the above. So I guess it's your turn.

Andrei

October 08, 2013

Re: std.d.lexer : voting thread

Posted by Jonathan M Davis
in reply to Andrei Alexandrescu

On Monday, October 07, 2013 17:16:45 Andrei Alexandrescu wrote:
> So with this I'm leaving it all within the hands of the submitter and the review manager. I didn't count the votes, but we may have a "yes" majority built up. Since additional evidence has been introduced, I suggest at least a revote. Ideally, there would be enough motivation for Brian to suspend the review and integrate the proposed design within std.d.lexer.

I think that it's worth noting that if this vote passes, it will be the first vote for a Phobos module which passed and had any "no" votes cast against it (at least, if any of the previous modules had any "no" votes, I don't recall them; it's always been overwhelmingly in favor of inclusion). That in and of itself implies that the situation needs further examination. Though maybe it's simply that this particular module is in an area where we have more posters with strong opinions.

Also, in general, I tend to think that we should move towards not merging new modules into Phobos as quickly as we have in the past. Whether the "stdx" proposal is the way to go or not is another matter, but I think that we should aim for having modules be more battle-tested before actually becoming full-fledged modules in Phobos. We've had great stuff reviewed and merged thus far, but we also tend to end up having to make minor tweaks to the API or later come to regret including it at all (e.g. std.net.curl). Having some sort of intermediate step prior to full inclusion for at least one or two releases would be a good move IMHO.

- Jonathan M Davis

October 08, 2013

Re: std.d.lexer : voting thread

Posted by Brian Schott
in reply to Jonathan M Davis

On Tuesday, 8 October 2013 at 05:22:32 UTC, Jonathan M Davis wrote:
> I think that it's worth noting that if this vote passes, it will be the first
> vote for a Phobos module which passed and had any "no" votes cast against it
> (at least, if any of the previous modules had any "no" votes, I don't recall
> them; it's always been overwhelmingly in favor of inclusion). That in and of
> itself implies that the situation needs further examination. Though maybe it's
> simply that this particular module is in an area where we have more posters
> with strong opinions.

I had noticed this. I'm not sure if a simple majority is good enough for the standard library.

October 08, 2013

Re: std.d.lexer : voting thread

Posted by Dicebot
in reply to Jonathan M Davis

On Tuesday, 8 October 2013 at 05:22:32 UTC, Jonathan M Davis wrote:
> I think that it's worth noting that if this vote passes, it will be the first
> vote for a Phobos module which passed and had any "no" votes cast against it
> (at least, if any of the previous modules had any "no" votes, I don't recall
> them; it's always been overwhelmingly in favor of inclusion).

Guess what the main point of my concerns was while following this voting thread. Until now there has been at most one "No" vote for accepted proposals, and an exact "Yes" vote threshold is not defined anywhere. When voting ends I will sum up some formal stats on the topic and, after some hard thinking, will make a separate announcement/topic on possible outcomes.
