February 04, 2013
After several hours of optimizing I've managed to make it so that dmd's lexer is only three times faster.

http://hackerpilot.github.com/experimental/std_lexer/images/times.png

The funny thing is that compiling with LDC gave a bigger speed boost than any of my code refactoring.
February 04, 2013
On 2013-02-04 01:22, Brian Schott wrote:
> After several hours of optimizing I've managed to make it so that dmd's lexer is
> only three times faster.

What are you comparing here? How do you time DMD's lexing stage?

February 04, 2013
On Monday, 4 February 2013 at 00:38:57 UTC, FG wrote:
> On 2013-02-04 01:22, Brian Schott wrote:
>> After several hours of optimizing I've managed to make it so that dmd's lexer is
>> only three times faster.
>
> What are you comparing here? How do you time DMD's lexing stage?

A simple hack of module.c that prints out the file's token count and then calls exit(0).
February 04, 2013
On 2013-02-04 01:41, Brian Schott wrote:
>> What are you comparing here? How do you time DMD's lexing stage?
>
> A simple hack of module.c that prints out the file's token count and then calls
> exit(0).

Ah, fine. Then it's not that bad. :)
Have you already made it use slices of the whole source,
without copying the string values of tokens?
What kind of optimizations have you made?
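
For anyone following along, here is a minimal sketch of the "slices of the whole source" idea. It is written in C++17 rather than D and is not taken from std_lexer or DMD; the names `Token` and `lexSlices` are hypothetical, and the token rules are deliberately simplistic (identifier/number runs plus single-character punctuation). The point is only that each token refers to a range of the original buffer instead of owning a copy.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical token type: `text` is a view into the source buffer, not a copy.
struct Token {
    std::string_view text;
};

// Splits `source` into identifier/number runs and single-character
// punctuation, skipping whitespace. Every token aliases `source`, so the
// source buffer must outlive the returned tokens.
std::vector<Token> lexSlices(std::string_view source) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < source.size()) {
        unsigned char c = static_cast<unsigned char>(source[i]);
        if (std::isspace(c)) {
            ++i;  // whitespace produces no token
            continue;
        }
        std::size_t start = i;
        if (std::isalnum(c) || c == '_') {
            // Consume the whole identifier or number run.
            while (i < source.size()) {
                unsigned char d = static_cast<unsigned char>(source[i]);
                if (!std::isalnum(d) && d != '_') break;
                ++i;
            }
        } else {
            ++i;  // any other character is a one-character token
        }
        // substr on a string_view only adjusts pointer and length.
        tokens.push_back(Token{source.substr(start, i - start)});
    }
    return tokens;
}
```

The trade-off is lifetime: the tokens are only valid while the source text stays in memory, which is usually fine for a lexer that keeps the whole file loaded anyway.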
February 04, 2013
On 2013-02-04 01:50, FG wrote:

> Ah, fine. Then it's not that bad. :)
> Have you already made it use slices of the whole source,
> without copying the string values of tokens?
> What kind of optimizations have you made?

That would be interesting to hear. Three times slower than DMD doesn't sound good. I know that DMD is fast, but three times?

-- 
/Jacob Carlborg
February 04, 2013
On Monday, 4 February 2013 at 00:22:42 UTC, Brian Schott wrote:
> After several hours of optimizing I've managed to make it so that dmd's lexer is only three times faster.
>
> http://hackerpilot.github.com/experimental/std_lexer/images/times.png
>
> The funny thing is that compiling with LDC gave a bigger speed boost than any of my code refactoring.

Where is the current bottleneck?

(Should be easy to find just by running the program ~5 times and suddenly breaking into it with a debugger.)

Also, I'm assuming you've already tried disabling range-checking on arrays?
February 04, 2013
On 2013-02-04 08:57, Jacob Carlborg wrote:
> On 2013-02-04 01:50, FG wrote:
>
>> Ah, fine. Then it's not that bad. :)
>> Have you already made it use slices of the whole source,
>> without copying the string values of tokens?
>> What kind of optimizations have you made?
>
> That would be interesting to hear. Three times slower than DMD doesn't sound
> good. I know that DMD is fast, but three times.

Looking at the current source, there is now a StringCache to hold the strings.
It is however also used to store all those long comments, so I believe quite
some time is unnecessarily wasted on generating hashes for them.
But probably there are other parts slowing things down.
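
To illustrate what I mean, here is a rough sketch in C++17 (not the actual std_lexer StringCache, and all names here are hypothetical). It assumes the cost really is in hashing long comment text, which I have not measured. Identifiers go through the cache so equal spellings share one stored string; comments bypass it and just keep their slice of the source.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_set>

enum class TokenKind { Identifier, Comment };

// Hypothetical string cache: stores one copy of each distinct string.
class StringCache {
public:
    // Hashes `s`, stores a copy if it is new, and returns a view of the
    // stored copy. Elements of std::unordered_set are not moved on rehash,
    // so the returned view stays valid for the lifetime of the cache.
    std::string_view intern(std::string_view s) {
        auto it = pool_.insert(std::string(s)).first;
        return *it;
    }

    std::size_t size() const { return pool_.size(); }

private:
    std::unordered_set<std::string> pool_;
};

// Picks the text to store in a token. Comments are returned as the
// original source slice, so their (possibly long) text is never hashed.
std::string_view tokenText(StringCache& cache, TokenKind kind,
                           std::string_view slice) {
    if (kind == TokenKind::Comment) {
        return slice;
    }
    return cache.intern(slice);
}
```

Comments are rarely repeated verbatim, so interning them buys almost nothing, while identifiers repeat constantly and benefit from sharing.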

February 05, 2013
More optimizing:
http://hackerpilot.github.com/experimental/std_lexer/images/times2.png

Still only half speed. I'm becoming more and more convinced that Walter is actually a wizard.
February 05, 2013
On 2/4/13 10:19 PM, Brian Schott wrote:
> More optimizing:
> http://hackerpilot.github.com/experimental/std_lexer/images/times2.png
>
> Still only half speed. I'm becoming more and more convinced that Walter
> is actually a wizard.

Suggestion: take lexer.c and convert it to D. Should take one day, and you'll have performance on par.

Andrei
February 05, 2013
On 2/5/13, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> Suggestion: take lexer.c and convert it to D. Should take one day, and you'll have performance on par.

This was already done for DDMD, and the more recent minimal version of it:

https://github.com/zachthemystic/ddmd-clean/blob/master/dmd/lexer.d