std.d.lexer performance (WAS: std.d.lexer : voting thread) (page 12) - D Programming Language Discussion Forum

October 11, 2013

Re: std.d.lexer performance (WAS: std.d.lexer : voting thread)

Posted by Dmitry Olshansky
in reply to Jonathan M Davis

11-Oct-2013 13:07, Jonathan M Davis wrote:
> On Friday, October 11, 2013 12:56:14 Dmitry Olshansky wrote:
>> 04-Oct-2013 15:28, Brian Schott wrote:
>>> On Thursday, 3 October 2013 at 20:11:02 UTC, Andrei Alexandrescu wrote:
>>>> I see we're considerably behind dmd. If improving performance would
>>>> come at the price of changing the API, it may be sensible to hold off
>>>> adoption for a bit.
>>>>
>>>> Andrei
>>>
>>> The old benchmarks measured total program run time. I ran a new set of
>>> benchmarks, placing stopwatch calls around just the lexing code to
>>> bypass any slowness caused by druntime startup. I also made a similar
>>> modification to DMD.
>>>
>>> Here's the result:
>>>
>>> https://raw.github.com/Hackerpilot/hackerpilot.github.com/master/experimental/std_lexer/images/times5.png
>>>
>>>
>>> I suspect that I've made an error in the benchmarking due to how much
>>> faster std.d.lexer is than DMD now, so I've uploaded what I have to
>>> Github.
>>>
>>> https://github.com/Hackerpilot/lexerbenchmark
>>
>> I'm suspicious of:
>> printf("%s\t%f\n", srcname, (total / 200.0) / (1000 * 100));
>>
>> Plus I think clock_gettime often has too coarse resolution (I'd use
>> gettimeofday as more reliable).
>> Also check core\time.d  TickDuration.currSystemTick as it uses
>> CLOCK_MONOTONIC on *nix. You should do the same to make timings meaningful.
>
> Why not just use std.datetime's benchmark or StopWatch? Though looking at
> lexerbenchmark.d it looks like he's using StopWatch rather than clock_gettime
> directly, and there are no printfs, so I don't know what code you're referring
> to here. From the looks of it though, he's basically reimplemented
> std.datetime.benchmark in benchmarklexer.d and probably should have just used
> benchmark instead.

Cause it's C++ damn it! ;)
>
> - Jonathan M Davis
>


-- 
Dmitry Olshansky

October 11, 2013

Re: std.d.lexer : voting thread

Posted by Dmitry Olshansky
in reply to Dicebot

02-Oct-2013 18:41, Dicebot wrote:
> After brief discussion with Brian and gathering data from the review
> thread, I have decided to start voting for `std.d.lexer` inclusion into
> Phobos.
>

I'd have to answer as NO.

In order to get to a YES state, it needs to:
a) Use the tok!"==" notation (in line with the generic lexer). This also makes it far more convenient in the parser down the road.
b) Ideally, use the generic lexer framework. Since that would mean two modules to include, at least make it easy to switch to later (no breakage, etc.).
c) Abstract away the string table: let users provide their own hooks for it, and provide a default StringCache.
d) Allow operation without a StringTable at all (make it optional), including a "just slice the input" mode.
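
A minimal sketch of what points c) and d) could look like, written in C++ for concreteness rather than in D. Every name here (SliceOnly, InterningCache, lexIdentifier) is hypothetical and is not part of std.d.lexer or of any proposed API. The idea is only that the lexer takes the string storage as a policy, so "just slice the input" and "intern into a cache" are interchangeable from the caller's side.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_set>

// Policy 1: "just slice the input". No allocation and no table; the returned
// view is only valid while the source buffer is alive.
struct SliceOnly
{
    std::string_view store(std::string_view text) { return text; }
};

// Policy 2: intern each distinct spelling once. Returned views point into the
// cache, so they outlive the source buffer (but not the cache itself).
struct InterningCache
{
    std::unordered_set<std::string> table;

    std::string_view store(std::string_view text)
    {
        // emplace returns the existing element when the spelling is known.
        return *table.emplace(text).first;
    }
};

// Scans the identifier starting at `pos`, advances `pos` past it, and hands
// the text to whichever storage policy the caller supplied.
template <typename Store>
std::string_view lexIdentifier(std::string_view src, std::size_t& pos, Store& store)
{
    assert(pos <= src.size());
    const std::size_t start = pos;
    while (pos < src.size() &&
           (std::isalnum(static_cast<unsigned char>(src[pos])) || src[pos] == '_'))
        ++pos;
    return store.store(src.substr(start, pos - start));
}
```

With SliceOnly the token text aliases the input buffer; with InterningCache two occurrences of the same identifier share one stored copy, which is what a parser comparing identifiers by pointer would rely on.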

P.S. I'm not a fan of etc.d.lexer. Instead, a dub repo seems like a good place for the moment, for those who need it right now. Others may collectively wait for it, or help in getting it to perfection.

-- 
Dmitry Olshansky

October 11, 2013

Re: std.d.lexer : voting thread

Posted by Dmitry Olshansky
in reply to Dmitry Olshansky

11-Oct-2013 13:52, Dmitry Olshansky wrote:
> 11-Oct-2013 01:41, Brian Schott wrote:
>> On Thursday, 10 October 2013 at 17:34:01 UTC, Andrei Alexandrescu wrote:
>>> Excellent point! In fact one would need to use t!"<<".id instead of
>>> t!"<<".
>>>
>>> I'll work on that next.
>>>
>>>
>>> Andrei
>>
>> I don't suppose this new lexer is on Github or something. I'd like to
>> help get this new implementation up and running.
>
> Love this attitude! :)
>
> Having helped with std.d.lexer before (mostly w.r.t. performance) I'm
> inclined to land a hand in perfecting the more generic one.
>

s/land/lend/


-- 
Dmitry Olshansky

October 11, 2013

Re: std.d.lexer performance (WAS: std.d.lexer : voting thread)

Posted by Jonathan M Davis
in reply to Dmitry Olshansky

On Friday, October 11, 2013 13:53:29 Dmitry Olshansky wrote:
> 11-Oct-2013 13:07, Jonathan M Davis wrote:
> > On Friday, October 11, 2013 12:56:14 Dmitry Olshansky wrote:
> >> 04-Oct-2013 15:28, Brian Schott wrote:
> >>> On Thursday, 3 October 2013 at 20:11:02 UTC, Andrei Alexandrescu wrote:
> >>>> I see we're considerably behind dmd. If improving performance would come at the price of changing the API, it may be sensible to hold off adoption for a bit.
> >>>> 
> >>>> Andrei
> >>> 
> >>> The old benchmarks measured total program run time. I ran a new set of benchmarks, placing stopwatch calls around just the lexing code to bypass any slowness caused by druntime startup. I also made a similar modification to DMD.
> >>> 
> >>> Here's the result:
> >>> 
> >>> https://raw.github.com/Hackerpilot/hackerpilot.github.com/master/experimental/std_lexer/images/times5.png
> >>> 
> >>> 
> >>> I suspect that I've made an error in the benchmarking due to how much faster std.d.lexer is than DMD now, so I've uploaded what I have to Github.
> >>> 
> >>> https://github.com/Hackerpilot/lexerbenchmark
> >> 
> >> I'm suspicious of:
> >> printf("%s\t%f\n", srcname, (total / 200.0) / (1000 * 100));
> >> 
> >> Plus I think clock_gettime often has too coarse resolution (I'd use
> >> gettimeofday as more reliable).
> >> Also check core\time.d  TickDuration.currSystemTick as it uses
> >> CLOCK_MONOTONIC on *nix. You should do the same to make timings
> >> meaningful.
> > 
> > Why not just use std.datetime's benchmark or StopWatch? Though looking at lexerbenchmark.d it looks like he's using StopWatch rather than clock_gettime directly, and there are no printfs, so I don't know what code you're referring to here. From the looks of it though, he's basically reimplemented std.datetime.benchmark in benchmarklexer.d and probably should have just used benchmark instead.
> 
> Cause it's C++ damn it! ;)

Your comments would make perfect sense for C++, but lexerbenchmark.d is in D. And I don't know what else you could be talking about, because that's all I see referenced here.

- Jonathan M Davis

October 11, 2013

Re: std.d.lexer performance (WAS: std.d.lexer : voting thread)

Posted by Dmitry Olshansky
in reply to Jonathan M Davis

11-Oct-2013 14:58, Jonathan M Davis wrote:
> On Friday, October 11, 2013 13:53:29 Dmitry Olshansky wrote:
>> 11-Oct-2013 13:07, Jonathan M Davis wrote:
>>> On Friday, October 11, 2013 12:56:14 Dmitry Olshansky wrote:
>>>> 04-Oct-2013 15:28, Brian Schott wrote:
>>>>> On Thursday, 3 October 2013 at 20:11:02 UTC, Andrei Alexandrescu wrote:
>>>>>> I see we're considerably behind dmd. If improving performance would
>>>>>> come at the price of changing the API, it may be sensible to hold off
>>>>>> adoption for a bit.
>>>>>>
>>>>>> Andrei
>>>>>
>>>>> The old benchmarks measured total program run time. I ran a new set of
>>>>> benchmarks, placing stopwatch calls around just the lexing code to
>>>>> bypass any slowness caused by druntime startup. I also made a similar
>>>>> modification to DMD.
>>>>>
>>>>> Here's the result:
>>>>>
>>>>> https://raw.github.com/Hackerpilot/hackerpilot.github.com/master/experimental/std_lexer/images/times5.png
>>>>>
>>>>>
>>>>> I suspect that I've made an error in the benchmarking due to how much
>>>>> faster std.d.lexer is than DMD now, so I've uploaded what I have to
>>>>> Github.
>>>>>
>>>>> https://github.com/Hackerpilot/lexerbenchmark
>>>>
>>>> I'm suspicious of:
>>>> printf("%s\t%f\n", srcname, (total / 200.0) / (1000 * 100));
>>>>
>>>> Plus I think clock_gettime often has too coarse resolution (I'd use
>>>> gettimeofday as more reliable).
>>>> Also check core\time.d  TickDuration.currSystemTick as it uses
>>>> CLOCK_MONOTONIC on *nix. You should do the same to make timings
>>>> meaningful.
>>>
>>> Why not just use std.datetime's benchmark or StopWatch? Though looking
>>> at lexerbenchmark.d it looks like he's using StopWatch rather than
>>> clock_gettime directly, and there are no printfs, so I don't know what
>>> code you're referring to here. From the looks of it though, he's
>>> basically reimplemented std.datetime.benchmark in benchmarklexer.d and
>>> probably should have just used benchmark instead.
>>
>> Cause it's C++ damn it! ;)
>
> Your comments would make perfect sense for C++, but lexerbenchmark.d is in D.
> And I don't know what else you could be talking about, because that's all I
> see referenced here.

Actually, I was looking at dmd.diff in the linked repo:
https://github.com/Hackerpilot/lexerbenchmark/blob/master/dmd.diff

lexerbenchmark.d uses StopWatch.
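
For reference, a minimal sketch of the kind of timing suggested earlier for the C++ side: read CLOCK_MONOTONIC through POSIX clock_gettime and spell out the unit conversion with named constants, so that the averaging is easy to audit. This is not the code from dmd.diff; monotonicNanos, averageMillis and timeRuns are hypothetical helper names, and the sketch assumes a POSIX system.

```cpp
#include <cassert>
#include <cstdint>
#include <time.h>   // POSIX: clock_gettime, CLOCK_MONOTONIC

// Monotonic timestamp in nanoseconds; not affected by wall-clock adjustments.
inline std::int64_t monotonicNanos()
{
    timespec ts;
    const int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc; // silence "unused" warnings when asserts are compiled out
    return static_cast<std::int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}

// Average milliseconds per run, with the units spelled out.
inline double averageMillis(std::int64_t totalNanos, int runs)
{
    assert(runs > 0);
    const double nanosPerMilli = 1000.0 * 1000.0;
    return (static_cast<double>(totalNanos) / runs) / nanosPerMilli;
}

// Calls `work` `runs` times and returns the total elapsed nanoseconds.
// Only the call itself sits between the two clock reads.
template <typename Work>
std::int64_t timeRuns(Work work, int runs)
{
    std::int64_t total = 0;
    for (int i = 0; i < runs; ++i)
    {
        const std::int64_t start = monotonicNanos();
        work();
        total += monotonicNanos() - start;
    }
    return total;
}
```

A caller would wrap just the lexing call in a lambda, pass it to timeRuns with the run count, and print averageMillis of the result, so the divisor and the unit are visible in one place.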

>
> - Jonathan M Davis
>


-- 
Dmitry Olshansky

October 11, 2013

Re: std.d.lexer : voting thread

Posted by Andrei Alexandrescu
in reply to Dmitry Olshansky

On 10/11/13 2:17 AM, Dmitry Olshansky wrote:
> 06-Oct-2013 20:07, Andrei Alexandrescu wrote:
>> On 10/6/13 5:40 AM, Joseph Rushton Wakeling wrote:
>>> How quickly do you think this vision could be realized? If soon, I'd say
>>> it's worth delaying a decision on the current proposed lexer, if not ...
>>> well, jam tomorrow, perfect is the enemy of good, and all that ...
>>
>> I'm working on related code, and got all the way there in one day
>> (Friday) with a C++ tokenizer for linting purposes (doesn't open
>> #includes or expand #defines etc; it wasn't meant to).
>>
>> The core generated fragment that does the matching is at
>> https://dpaste.de/GZY3.
>>
>> The surrounding switch statement (also in library code) handles
>> whitespace and line counting. The client code needs to handle by hand
>> things like parsing numbers (note how the matcher stops upon the first
>> digit), identifiers, comments (matcher stops upon detecting "//" or
>> "/*") etc. Such things can be achieved with hand-written code (as I do),
>> other similar tokenizers, DFAs, etc. The point is that the core loop
>> that looks at every character looking for a lexeme is fast.
>
> This is something I agree with.
> I'd call that loop the "dispatcher loop" in a sense that it detects the
> kind of stuff and forwards to a special hot loop for that case (if any,
> e.g. skipping comments).
>
> BTW it absolutely must be able to do so in one step, the generated code
> already knows that the token is tok!"//" hence it may call proper
> handler right there.
>
> case '/':
> ... switch(s[1]){
> ...
>      case '/':
>          // it's a pseudo token anyway so instead of
>          //t = tok!"//";
>
>          // just _handle_ it!
>          t = hookFor!"//"(); //user hook for pseudo-token
>          // eats whitespace & returns tok!"comment" or some such
>          // if need be
>          break token_scan;
> }
>
> This also helps to get not only "raw" tokens but allow user to cook
> extra tokens by hand for special cases that can't be handled by
> "dispatcher loop".

That's a good idea. The only concerns I have are:

* I'm biased toward patterns for laying efficient code, having hacked into such for the past year. Even discounting for that, I have the feeling that speed is near the top of the list of people who evaluate lexer generators. I fear that too much inline code present inside a fairly large switch statement may hurt efficiency, which is why I'm biased in favor of "small core loop dispatching upon the first few characters, out-of-line code for handling particular cases that need attention".

* I've grown to be a big fan of the simplicity of the generator. Yes, that also means bare on features but it's simple enough to be used casually for the simplest tasks that people wouldn't normally think of using a lexer for. If we add hookFor, it would be great if it didn't impact simplicity a lot.
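
A minimal sketch of the shape described in the first point, written in C++: a small dispatcher that only looks at the first one or two characters, with the comment case forwarded in one step to an out-of-line handler. All names (Tok, Token, lexLineComment, nextToken) are hypothetical, the token set is deliberately tiny, and nothing here is the generated code from the dpaste link. Whether keeping handlers out of line is actually faster would have to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

enum class Tok { Slash, SlashAssign, Comment, Plus, Unknown, End };

struct Token
{
    Tok kind;
    std::size_t length; // characters consumed from the input
};

// Out-of-line handler: `src` starts at "//". Consumes up to (not including)
// the end of the line and reports a single Comment token.
Token lexLineComment(std::string_view src)
{
    assert(src.substr(0, 2) == "//");
    std::size_t n = 2;
    while (n < src.size() && src[n] != '\n')
        ++n;
    return {Tok::Comment, n};
}

// Small core dispatcher: inspects the first one or two characters only and
// forwards anything that needs a real loop to an out-of-line handler.
Token nextToken(std::string_view src)
{
    if (src.empty())
        return {Tok::End, 0};
    switch (src[0])
    {
    case '/':
        if (src.size() > 1)
        {
            switch (src[1])
            {
            case '/': return lexLineComment(src); // handle the pseudo-token right away
            case '=': return {Tok::SlashAssign, 2};
            default:  break;
            }
        }
        return {Tok::Slash, 1};
    case '+':
        return {Tok::Plus, 1};
    default:
        return {Tok::Unknown, 1};
    }
}
```

The switch stays short because the comment-skipping loop lives in its own function; a user-supplied hook would take the place of lexLineComment.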


Andrei

October 12, 2013

Re: etc vs. package managers

Posted by SomeDude
in reply to Jonathan M Davis

On Monday, 7 October 2013 at 07:12:13 UTC, Jonathan M Davis wrote:
> On Monday, October 07, 2013 08:36:16 Jacob Carlborg wrote:
>> On 2013-10-06 22:40, Andrei Alexandrescu wrote:
>> > I think /etc/ should be a stepping stone to std, just like in C++ boost
>> > is for std (and boost's sandbox is for boost).
>> 
>> Currently "etc" seems like where C bindings are placed.
>
> That's what I thought that it was for. I don't remember etc ever really being
> discussed before, and all it has are C bindings, so the idea that it would
> hold anything other than C bindings is news to me, though I think that we
> should probably shy away from putting C bindings in Phobos in general.
>
> - Jonathan M Davis

The problem is, if these C bindings are removed, the immediate reflex will be to think that Phobos doesn't have the features that were provided by these bindings. So the impulse will be to reinvent the wheel, when these bindings are perfectly okay and do the job well. C bindings are a way to save us time and build upon proven, quality libraries. I don't see any problem with C bindings being in the standard library, as long as they are really useful and high quality. The "not invented here" itch is a bad one. The workforce of the community should be directed at real problems and filling real gaps, rather than being wasted on reinventing the wheel merely for aesthetic/ideological reasons.

I don't see any need to remove etc.

October 12, 2013

Re: etc vs. package managers

Posted by Jonathan M Davis
in reply to SomeDude

On Saturday, October 12, 2013 11:09:21 SomeDude wrote:
> On Monday, 7 October 2013 at 07:12:13 UTC, Jonathan M Davis wrote:
> > On Monday, October 07, 2013 08:36:16 Jacob Carlborg wrote:
> >> On 2013-10-06 22:40, Andrei Alexandrescu wrote:
> >> > I think /etc/ should be a stepping stone to std, just like
> >> > in C++ boost
> >> > is for std (and boost's sandbox is for boost).
> >> 
> >> Currently "etc" seems like where C bindings are placed.
> > 
> > That's what I thought that it was for. I don't remember etc
> > ever really being
> > discussed before, and all it has are C bindings, so the idea
> > that it would
> > hold anything other than C bindings is news to me, though I
> > think that we
> > should probably shy away from putting C bindings in Phobos in
> > general.
> > 
> > - Jonathan M Davis
> 
> The problem is, if these C bindings are removed, the immediate reflex will be to think that Phobos doesn't have the features that were provided by these bindings. So the impulse will be to reinvent the wheel, when these bindings are perfectly okay and do the job well. C bindings are a way to save us time and build upon proven, quality libraries. I don't see any problem with C bindings being in the standard library, as long as they are really useful and high quality. The "not invented here" itch is a bad one. The workforce of the community should be directed at real problems and filling real gaps, rather than being wasted on reinventing the wheel merely for aesthetic/ideological reasons.
> 
> I don't see any need to remove etc.

Deimos is for C bindings, not Phobos. We don't want any more modules in std built on top of C bindings for libraries that aren't guaranteed to be on all of the systems that we support. Having std.net.curl has been very problematic due to the problems with getting a proper version of libcurl to link against in Windows, and there has even been some discussion of removing it entirely. So, there will be no more Phobos modules built on anything like curl or openssl or gcrypt or any other C library which isn't guaranteed to be on all systems. That being the case, there's no point in putting C bindings in Phobos. Deimos was created specifically so that there would be a place to get bindings to C libraries. We may want to make some adjustments to how Deimos is handled, but it's our solution to C bindings, not Phobos:

https://github.com/D-Programming-Deimos

druntime should have C bindings for the OSes that we support, but those are the only C bindings that should be in D's standard libraries. Whether we'll remove any that we have is still up for debate, but we're not adding any more.

- Jonathan M Davis

October 13, 2013

Re: etc vs. package managers

Posted by Paulo Pinto
in reply to Jonathan M Davis

On 13.10.2013 01:11, Jonathan M Davis wrote:
> On Saturday, October 12, 2013 11:09:21 SomeDude wrote:
>> On Monday, 7 October 2013 at 07:12:13 UTC, Jonathan M Davis wrote:
>>> On Monday, October 07, 2013 08:36:16 Jacob Carlborg wrote:
>>>> On 2013-10-06 22:40, Andrei Alexandrescu wrote:
>>>>> I think /etc/ should be a stepping stone to std, just like
>>>>> in C++ boost
>>>>> is for std (and boost's sandbox is for boost).
>>>>
>>>> Currently "etc" seems like where C bindings are placed.
>>>
>>> That's what I thought that it was for. I don't remember etc
>>> ever really being
>>> discussed before, and all it has are C bindings, so the idea
>>> that it would
>>> hold anything other than C bindings is news to me, though I
>>> think that we
>>> should probably shy away from putting C bindings in Phobos in
>>> general.
>>>
>>> - Jonathan M Davis
>>
>> The problem is, if these C bindings are removed, the immediate
>> reflex will be to think that Phobos doesn't have the features
>> that were fulfilled by these bindings. So the impulse will be to
>> reinvent the wheel, when these bindings are perfectly okay and do
>> the job well. C bindings are a way to save us time and build upon
>> proven, quality libraries. I don't see any problem with C bindings
>> being in the standard library, as long as they are really useful
>> and high quality. The "not invented here" itch is a bad one. The
>> workforce of the community should be directed at real problems
>> and filling real gaps, rather than being wasted on reinventing
>> the wheel merely for aesthetic/ideological reasons.
>>
>> I don't see any need to remove etc.
>
> Deimos is for C bindings, not Phobos. We don't want any more modules in std
> built on top of C bindings for libraries that aren't guaranteed to be on all
> of the systems that we support. Having std.net.curl has been very problematic
> due to the problems with getting a proper version of libcurl to link against
> in Windows, and there has even been some discussion of removing it entirely.
> So, there will be no more Phobos modules built on anything like curl or
> openssl or gcrypt or any other C library which isn't guaranteed to be on all
> systems. That being the case, there's no point in putting C bindings in
> Phobos. Deimos was created specifically so that there would be a place to get
> bindings to C libraries. We may want to make some adjustments to how Deimos is
> handled, but it's our solution to C bindings, not Phobos:
>
> https://github.com/D-Programming-Deimos
>
> druntime should have C bindings for the OSes that we support, but those are the
> only C bindings that should be in D's standard libraries. Whether we'll remove
> any that we have is still up for debate, but we're not adding any more.
>
> - Jonathan M Davis
>

+1 for removing std.net.curl.

--
Paulo

October 13, 2013

Re: etc vs. package managers

Posted by Jordi Sayol

On 13/10/13 01:11, Jonathan M Davis wrote:
> On Saturday, October 12, 2013 11:09:21 SomeDude wrote:
>> On Monday, 7 October 2013 at 07:12:13 UTC, Jonathan M Davis wrote:
>>> On Monday, October 07, 2013 08:36:16 Jacob Carlborg wrote:
>>>> On 2013-10-06 22:40, Andrei Alexandrescu wrote:
>>>>> I think /etc/ should be a stepping stone to std, just like
>>>>> in C++ boost
>>>>> is for std (and boost's sandbox is for boost).
>>>>
>>>> Currently "etc" seems like where C bindings are placed.
>>>
>>> That's what I thought that it was for. I don't remember etc
>>> ever really being
>>> discussed before, and all it has are C bindings, so the idea
>>> that it would
>>> hold anything other than C bindings is news to me, though I
>>> think that we
>>> should probably shy away from putting C bindings in Phobos in
>>> general.
>>>
>>> - Jonathan M Davis
>>
>> The problem is, if these C bindings are removed, the immediate reflex will be to think that Phobos doesn't have the features that were provided by these bindings. So the impulse will be to reinvent the wheel, when these bindings are perfectly okay and do the job well. C bindings are a way to save us time and build upon proven, quality libraries. I don't see any problem with C bindings being in the standard library, as long as they are really useful and high quality. The "not invented here" itch is a bad one. The workforce of the community should be directed at real problems and filling real gaps, rather than being wasted on reinventing the wheel merely for aesthetic/ideological reasons.
>>
>> I don't see any need to remove etc.
> 
> Deimos is for C bindings, not Phobos. We don't want any more modules in std built on top of C bindings for libraries that aren't guaranteed to be on all of the systems that we support. Having std.net.curl has been very problematic due to the problems with getting a proper version of libcurl to link against in Windows, and there has even been some discussion of removing it entirely. So, there will be no more Phobos modules built on anything like curl or openssl or gcrypt or any other C library which isn't guaranteed to be on all systems. That being the case, there's no point in putting C bindings in Phobos. Deimos was created specifically so that there would be a place to get bindings to C libraries. We may want to make some adjustments to how Deimos is handled, but it's our solution to C bindings, not Phobos:
> 
> https://github.com/D-Programming-Deimos
> 
> druntime should have C bindings for the OSes that we support, but those are the only C bindings that should be in D's standard libraries. Whether we'll remove any that we have is still up for debate, but we're not adding any more.
> 
> - Jonathan M Davis
> 

+1 for removing std.net.curl too

-- 
Jordi Sayol

Copyright © 1999-2021 by the D Language Foundation