Request for comments: std.d.lexer

On 2013-02-05 11:46, Jonathan M Davis wrote:
> It turns out that it has nothing to do with string mixins (though it does have
> to do with CTFE):
>
> http://d.puremagic.com/issues/show_bug.cgi?id=9452
>
> Fortunately, there's a simple workaround that'll let me continue until the bug
> is fixed.

Ok, good that it's not a blocker.

--
/Jacob Carlborg

On 2/5/13 12:45 AM, deadalnix wrote:
> On Tuesday, 5 February 2013 at 03:22:52 UTC, Andrei Alexandrescu wrote:
>> On 2/4/13 10:19 PM, Brian Schott wrote:
>>> More optimizing:
>>> http://hackerpilot.github.com/experimental/std_lexer/images/times2.png
>>>
>>> Still only half speed. I'm becoming more and more convinced that Walter
>>> is actually a wizard.
>>
>> Suggestion: take lexer.c and convert it to D. Should take one day, and
>> you'll have performance on par.
>>
>
> DMD's lexer is not suitable for Phobos IMO. It doesn't take a range as
> input and doesn't produce a range. It also lacks the features you may want
> from a multi-purpose D lexer.

I looked at the code. Once it's converted, turning the lexer into an input range is trivial.

Andrei

February 05, 2013

Re: Request for comments: std.d.lexer

Posted by Andrei Alexandrescu
in reply to Jonathan M Davis

On 2/5/13 5:44 AM, Jonathan M Davis wrote:
> On Tuesday, February 05, 2013 09:14:35 Jacob Carlborg wrote:
>> On 2013-02-05 04:22, Andrei Alexandrescu wrote:
>>> Suggestion: take lexer.c and convert it to D. Should take one day, and
>>> you'll have performance on par.
>>
>> There's a reason why nobody has just extracted the lexer from DMD. It
>> will probably take more than a day just to extract the lexer to be able
>> to use it without the rest of DMD.
>
> There are basic ideas about how it works which are obviously good and should
> be in the finished product in D, but it's not range-based, which forces you to
> do things differently. It's also not configurable, which forces you to do things
> differently.
>
> If it could be ported as-is and then compared for speed, then that would be a
> great test, since it would be able to show how much of the speed problem is
> purely a compiler issue as opposed to a design issue, but you wouldn't be able
> to actually use it for anything more than what Brian is doing with his
> performance testing, because as you point out, it's too integrated into dmd.
> It _would_ be valuable though as a performance test of the compiler.

As far as I could tell the dependencies of the lexer are fairly contained (util, token, identifier) and conversion to input range is immediate.

Andrei
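The input-range conversion Andrei describes essentially means caching one token and advancing on `popFront`. The following is a minimal sketch, not dmd's lexer: it assumes a hypothetical pull-style lexer with a single `next()` method, and `PointerLexer`, `Token`, `TokKind` and `TokenRange` are invented names. The toy tokenizer only recognizes ASCII identifiers, integers and single-character symbols.

```d
// Hypothetical stand-in for a pull-style lexer: one `next()` call scans one
// token from the current position. Not dmd's actual lexer or API.
import std.ascii : isAlpha, isAlphaNum, isDigit, isWhite;
import std.range.primitives : isInputRange;

enum TokKind { identifier, number, symbol, eof }

struct Token
{
    TokKind kind;
    string text;
}

struct PointerLexer
{
    string src;   // whole source buffer
    size_t pos;   // current scan position (the role dmd's char* plays)

    Token next()
    {
        while (pos < src.length && isWhite(src[pos]))
            ++pos;
        if (pos >= src.length)
            return Token(TokKind.eof, null);
        immutable start = pos;
        if (isAlpha(src[pos]) || src[pos] == '_')
        {
            while (pos < src.length && (isAlphaNum(src[pos]) || src[pos] == '_'))
                ++pos;
            return Token(TokKind.identifier, src[start .. pos]);
        }
        if (isDigit(src[pos]))
        {
            while (pos < src.length && isDigit(src[pos]))
                ++pos;
            return Token(TokKind.number, src[start .. pos]);
        }
        ++pos; // anything else is treated as a one-character symbol
        return Token(TokKind.symbol, src[start .. pos]);
    }
}

// Input-range adapter: holds the current token and pulls the next on popFront.
struct TokenRange
{
    private PointerLexer lexer;
    private Token current;

    this(string source)
    {
        lexer = PointerLexer(source, 0);
        current = lexer.next(); // prime the first token
    }

    @property bool empty() const { return current.kind == TokKind.eof; }
    @property Token front() const { return current; }
    void popFront() { current = lexer.next(); }
}

static assert(isInputRange!TokenRange);
```

Because `TokenRange` satisfies `isInputRange`, it can be used directly with `foreach` or passed to range algorithms; the lexer's scanning code itself does not change.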

On 2013-02-05 14:34, Andrei Alexandrescu wrote:
> As far as I could tell the dependencies of the lexer are fairly
> contained (util, token, identifier) and conversion to input range is
> immediate.

That's not what I remember from the last time I gave it a try.

--
/Jacob Carlborg

On 2/5/13, Brian Schott <briancschott@gmail.com> wrote:
> I gave up on getting it to compile.

It seems master is broken. An earlier commit is buildable with a small change, but it doesn't seem to parse modules like std.datetime. I don't know which exact front-end version it was ported from, but I've tried to parse datetime from 2.055 through 2.061 and it didn't work. So yeah, it's not usable in its current state.

On Tuesday, 5 February 2013 at 08:52:53 UTC, Dmitry Olshansky wrote:
> Time to do some hacking on your lexer I guess. I'll try to add a couple of tricks and see if it helps.
>
> What command do you use for benchmarking?

I've been using avgtime[1] for measuring run times, and perf[2] and callgrind/kcachegrind[3] for profiling.

[1] https://github.com/jmcabo/avgtime
[2] https://perf.wiki.kernel.org/index.php/Tutorial
[3] http://kcachegrind.sourceforge.net/html/Home.html

05-Feb-2013 22:25, Brian Schott wrote:
> On Tuesday, 5 February 2013 at 08:52:53 UTC, Dmitry Olshansky
> wrote:
>> Time to do some hacking on your lexer I guess. I'll try to add a couple
>> of tricks and see if it helps.
>>
>> What command do you use for benchmarking?
>
> I've been using avgtime[1] for measuring run times, and perf[2] and
> callgrind/kcachegrind[3] for profiling.
>
> [1] https://github.com/jmcabo/avgtime
> [2] https://perf.wiki.kernel.org/index.php/Tutorial
> [3] http://kcachegrind.sourceforge.net/html/Home.html

Thanks. I've made a pass through the code and found some places to improve. Sadly, I've been rather busy at work today. Anyway, get ready for pull requests should my ideas prove worthwhile performance-wise.

--
Dmitry Olshansky

On Tuesday, February 05, 2013 08:34:29 Andrei Alexandrescu wrote:
> As far as I could tell the dependencies of the lexer are fairly contained
> (util, token, identifier) and conversion to input range is immediate.

I don't remember all of the details at the moment, as it's been several months since I looked at dmd's lexer, but a lot of the problem stems from the fact that it's all written around the assumption that it's dealing with a char*. Converting it to operate on string might be fairly straightforward, but it gets more complicated when dealing with ranges. Also, both Walter and others have stated that the lexer in D should be configurable in a number of ways, and dmd's lexer isn't configurable at all. So, while a direct translation would likely be quick, refactoring it to do everything it's been asked to do would not be.

I'm quite a ways along with one that's written from scratch, but I need to find the time to finish it. Also, doing it from scratch has had the added benefit of helping me find bugs in the spec and in dmd.

- Jonathan M Davis

On 2/5/13 10:29 PM, Jonathan M Davis wrote:
> On Tuesday, February 05, 2013 08:34:29 Andrei Alexandrescu wrote:
>> As far as I could tell the dependencies of the lexer are fairly
>> contained (util, token, identifier) and conversion to input range is
>> immediate.
>
> I don't remember all of the details at the moment, as it's been several
> months since I looked at dmd's lexer, but a lot of the problem stems from the
> fact that it's all written around the assumption that it's dealing with a
> char*. Converting it to operate on string might be fairly straightforward, but
> it gets more complicated when dealing with ranges. Also, both Walter and
> others have stated that the lexer in D should be configurable in a number of
> ways, and dmd's lexer isn't configurable at all. So, while a direct translation
> would likely be quick, refactoring it to do everything it's been asked to do
> would not be.
>
> I'm quite a ways along with one that's written from scratch, but I need to find
> the time to finish it. Also, doing it from scratch has had the added benefit of
> helping me find bugs in the spec and in dmd.

I think it would be reasonable for a lexer to require a range of ubyte as input and carry out its own decoding. As a first approximation, it may even require a random-access range of ubyte.

Andrei

On Tuesday, February 05, 2013 22:51:32 Andrei Alexandrescu wrote:
> I think it would be reasonable for a lexer to require a range of ubyte as
> input and carry out its own decoding. As a first approximation, it may even
> require a random-access range of ubyte.

I'd have to think about how you'd handle the Unicode stuff in that case, since I'm not quite sure what you mean by having it handle its own decoding if it's a range of code units, but what I've been working on works with all of the character types and is careful to avoid unnecessary decoding. And that wasn't all that hard as far as the lexer's code goes. The hard part was making std.utf work with ranges of code units rather than just strings, and that was committed months ago.

- Jonathan M Davis
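One possible reading of Andrei's suggestion (a ubyte input with the lexer doing its own decoding), which also reflects Jonathan's concern about avoiding unnecessary decoding, is to classify ASCII bytes directly and call a UTF-8 decoder only when a byte of 0x80 or above appears. This is a minimal sketch under stated assumptions: the input is a random-access `const(ubyte)[]` holding UTF-8, `scanIdentifier` is a hypothetical helper, and the identifier rule (ASCII letters, digits and underscore, plus any Unicode letter via `std.uni.isAlpha`) is a simplification, not the exact D grammar.

```d
// Minimal sketch, assuming the lexer receives a random-access slice of ubyte
// holding UTF-8. `scanIdentifier` is hypothetical; the identifier rule is
// simplified and is not the exact D grammar.
import std.ascii : isASCIIAlpha = isAlpha, isASCIIAlphaNum = isAlphaNum;
import std.uni : isUniAlpha = isAlpha;
import std.utf : decode;

/// Returns the length in bytes of the identifier starting at input[start],
/// or 0 if no identifier starts there.
size_t scanIdentifier(const(ubyte)[] input, size_t start)
{
    // View the same bytes as UTF-8 code units: no copy, no decoding yet.
    auto text = cast(const(char)[]) input;
    size_t i = start;
    while (i < input.length)
    {
        immutable char ch = cast(char) input[i];
        if (ch < 0x80)
        {
            // Fast path: ASCII is classified byte by byte without decoding.
            immutable bool ok = (i == start)
                ? (isASCIIAlpha(ch) || ch == '_')
                : (isASCIIAlphaNum(ch) || ch == '_');
            if (!ok)
                break;
            ++i;
        }
        else
        {
            // Slow path: decode one code point only when a non-ASCII lead byte
            // appears. decode advances `next` past the sequence and throws
            // std.utf.UTFException on malformed UTF-8.
            size_t next = i;
            immutable dchar c = decode(text, next);
            if (!isUniAlpha(c))
                break;
            i = next;
        }
    }
    return i - start;
}
```

Because the input is a random-access slice, the token text can afterwards be taken as `input[start .. start + len]` without copying.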
