March 03, 2013
> Just rip off the const why would you need it anyway?

The data in 'tokens' is final, so why shouldn't it be const?
What exactly is the problem with const members? Is this a known problem?
Otherwise I will go and investigate it.
March 03, 2013
03-Mar-2013 20:45, Namespace пишет:
>> Just rip off the const why would you need it anyway?
>
> The data in 'tokens' is final, so why shouldn't it be const?

Then just return const(Token) from the accessor, i.e. plain const on the way out. In general I'd avoid marking the stored data as bitwise const unless that is actually true; for all you know, some algorithm further up might tweak existing token values directly.
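A minimal sketch of that idea (the `Lexer` and `slice` names here are mine, not from the code under discussion): keep the stored tokens mutable so they can be appended, moved and overwritten, and apply const only on the way out.

```d
import std.stdio;

// Hypothetical token type: its fields stay mutable, so the struct can be
// freely copied, moved and overwritten inside arrays and buffers.
struct Token
{
    string value;
    uint line;
}

struct Lexer
{
    Token[] tokens;   // mutable storage: appending and in-place tweaks work

    // Callers only get a read-only view; nothing is bitwise const.
    const(Token)[] slice() const { return tokens; }
}

void main()
{
    auto lx = Lexer([Token("int", 1), Token("x", 1)]);
    lx.tokens[0].value = "uint";       // internal mutation still allowed
    writeln(lx.slice()[0].value);      // the const view reflects it
    // lx.slice()[0].value = "long";   // would not compile: view is const
}
```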

> What exactly is the problem with const members? Is this a known
> problem?

It's a quite old and ugly issue: a lot of code needs to bit-blit stuff over such a struct (to move or overwrite it) and fails to do so when it has const members.

> Otherwise, I will go and investigate it.


-- 
Dmitry Olshansky
March 03, 2013
On Sunday, March 03, 2013 17:19:25 Namespace wrote:
> But hasn't each range an array internally?

Many do, but some don't. Take std.range.iota or the ranges in std.random, for instance: they're generative. And if all ranges had arrays underneath the hood, infinite ranges wouldn't be possible except for something like std.range.cycle. Something like

struct Range
{
    @property int front() { return _i; }
    void popFront() { ++_i; }
    @property bool empty() { return _i == int.max; }

    private int _i;
}

would be a valid range. In the case of a lexer, you generate the next token on each popFront call, so you consume the range of characters that you're lexing as you go but don't need to allocate memory for more than the current token at a time.
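As an illustration of that last point (a toy, not the lexer under discussion): a range whose popFront consumes input up to the next blank and produces one "token", with no token array ever allocated. A real lexer would do the same with proper tokenization rules inside popFront.

```d
import std.stdio;

// Toy "lexer" range: splits on single spaces, one token per popFront.
struct TokenRange
{
    private string input;
    private string current;

    this(string src)
    {
        input = src;
        popFront();            // prime the first token
    }

    @property bool empty() const { return current.length == 0; }
    @property string front() const { return current; }

    void popFront()
    {
        while (input.length && input[0] == ' ')   // skip blanks
            input = input[1 .. $];
        size_t i = 0;
        while (i < input.length && input[i] != ' ')
            ++i;
        current = input[0 .. i];                  // next token (may be empty)
        input = input[i .. $];                    // consume it from the source
    }
}

void main()
{
    foreach (tok; TokenRange("int x = 42 ;"))
        writeln(tok);
}
```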

- Jonathan M Davis
March 03, 2013
It's quite amazing: with Appender I reach between 115 and 125 msecs.
I adapted Appender for my purposes and solved the problem with memcpy/memmove.
That works fine.
But is there anything else I can do (besides the LazyForwardRange) to gain more performance?
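For reference, the stock std.array.Appender already supports pre-allocation via reserve; whether that matches the adaptation above I can't say, but plain usage looks like this:

```d
import std.array : appender;
import std.stdio;

void main()
{
    auto app = appender!(int[])();
    app.reserve(1000);            // one up-front allocation instead of
                                  // repeated growth while appending
    foreach (i; 0 .. 1000)
        app.put(i);
    writeln(app.data.length);     // all 1000 elements in place
}
```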
March 04, 2013
I've installed Cygwin to use the time benchmark command, and these are the results:
$ time ./Lexer.exe

real    0m0.225s
user    0m0.015s
sys     0m0.046s

I also read in the std.d.lexer thread that using 'char' isn't that good for a lexer, because the range primitives want to decode 'char' to 'dchar'. Using ubyte should be better. Is that right?
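That matches my understanding: Phobos range primitives auto-decode ranges of char into dchar code points, while a ubyte slice is iterated byte-by-byte with no decoding. A small check (not from the lexer in question):

```d
import std.range;
import std.stdio;

void main()
{
    string s = "hello";

    // Range primitives treat a string as a range of decoded dchar:
    static assert(is(ElementType!string == dchar));

    // A ubyte slice of the same bytes is iterated without any decoding:
    auto bytes = cast(immutable(ubyte)[]) s;
    static assert(is(ElementType!(typeof(bytes)) == ubyte));

    writeln(bytes.front);   // raw byte value of 'h'
}
```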
March 04, 2013
By reserving 60% of the file size in memory up front I get a big speed boost.

------------------------
Total time (ms): 17544.6
Repetitions    : 200
Sample mode    : 87 (121 occurrences)
Median time    : 87.3835
Avg time       : 87.7232
Std dev.       : 1.77905
Minimum        : 84.238
Maximum        : 101.919
95% conf.int.  : [84.2363, 91.21]  e = 3.48687
99% conf.int.  : [83.1407, 92.3057]  e = 4.58252
EstimatedAvg95%: [87.4766, 87.9697]  e = 0.246559
EstimatedAvg99%: [87.3991, 88.0472]  e = 0.324033

But I'm still far from:
Avg time       : 69.3088 http://forum.dlang.org/thread/dpdgcycrgfspcxenzrjf@forum.dlang.org?page=8#post-abncwoqtpnaoohihkliw:40forum.dlang.org

I've also tried 10%, 40% and 80%, but 60% seems optimal.

And I think I've reached my performance limit. I could implement the LazyForwardRange and, if anyone can confirm the statement about char <-> dchar decoding and that using ubyte would be crucially better, I will try that too.
March 04, 2013
04-Mar-2013 16:17, Namespace пишет:
> By reserving 60% of the file size in memory up front I get a big speed boost.
>
> ------------------------
> Total time (ms): 17544.6
> Repetitions    : 200
> Sample mode    : 87 (121 occurrences)
> Median time    : 87.3835
> Avg time       : 87.7232
> Std dev.       : 1.77905
> Minimum        : 84.238
> Maximum        : 101.919
> 95% conf.int.  : [84.2363, 91.21]  e = 3.48687
> 99% conf.int.  : [83.1407, 92.3057]  e = 4.58252
> EstimatedAvg95%: [87.4766, 87.9697]  e = 0.246559
> EstimatedAvg99%: [87.3991, 88.0472]  e = 0.324033
>
> But I'm still far from:
> Avg time       : 69.3088
> http://forum.dlang.org/thread/dpdgcycrgfspcxenzrjf@forum.dlang.org?page=8#post-abncwoqtpnaoohihkliw:40forum.dlang.org

Keep in mind the CPU you are testing on. What's your chip?

In any case, Dscanner has been improved to take far, far less time than 60 ms. Brian posted graphs showing under 20 ms on his CPU for all of them, and under 10 ms for every module in std except std.datetime.

> I've also tried 10%, 40% and 80%, but 60% seems optimal.
>
> And I think I've reached my performance limit. I could implement the
> LazyForwardRange and, if anyone can confirm the statement about char <->
> dchar decoding and that using ubyte would be crucially better, I
> will try that too.

-- 
Dmitry Olshansky
March 04, 2013
Yes, I know. But I don't think I have a chance of getting even closer to it. At least I don't know how.
I'm on an AMD Athlon 64 X2 Dual Core Processor 5400+.
March 04, 2013
04-Mar-2013 22:58, Namespace пишет:
> Yes, I know. But I don't think I have a chance of getting even closer
> to it. At least I don't know how.

One thing that all of these lexers (dmd's original and Dscanner) do is produce tokens as you need them, lazily. You will always be slower if you take the time to append them to some array first (no matter how). The idiomatic way, as I said, is to use ranges.
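The difference can be sketched with standard ranges (a toy example, not lexer code): eagerly materializing results into an intermediate array before scanning them, versus consuming each element as it is produced.

```d
import std.algorithm : count, map;
import std.array : array;
import std.range : iota;

void main()
{
    // Eager: build the whole intermediate array first, then scan it.
    auto collected = iota(0, 1_000_000).map!(i => i * 2).array;
    auto eager = collected.count!(x => x % 3 == 0);

    // Lazy: each element is produced on demand; no intermediate storage.
    auto lazyCount = iota(0, 1_000_000).map!(i => i * 2)
                                       .count!(x => x % 3 == 0);

    assert(eager == lazyCount);   // same answer, very different allocation
}
```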

> I'm on an AMD Athlon 64 X2 Dual Core Processor 5400+.

Mm, that's not that far from mine. But Brian has got something even more powerful :)

You can easily download and test Dscanner on your own CPU, so that you get numbers that are actually comparable.


-- 
Dmitry Olshansky
March 04, 2013
Yes, but I'm not that familiar with ranges. But I want to learn more about them and then give it a try.
What do you think: how much of a gain would using ranges bring compared to my current method? 30%?