std.d.lexer requirements (page 14)

On 2012-08-02 22:51, Christophe Travert wrote:
> Jacob Carlborg, in message (digitalmars.D:174069), wrote:
>> On 2012-08-02 10:15, Walter Bright wrote:
>>
>>> Worst case use an adapter range.
>>
>> And that is better than a plain string?
>>
> because its front method does not do any decoding.

If it was a plain string you wouldn't use "front". You would handle any decoding that is needed yourself, internally in the lexer. This is what Jonathan already does, it seems:

    static if(isNarrowString!R)
        Unqual!(ElementEncodingType!R) first = range[0];
    else
        dchar first = range.front;

If you're only supporting plain UTF-8 strings you would just do:

    char first = range[0];

--
/Jacob Carlborg
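The point about handling decoding inside the lexer can be sketched as follows: reading the first code unit by index costs nothing, and decoding is only needed when that unit is not ASCII. This is a minimal C++ illustration of the idea, not code from dmd or from Jonathan's lexer. The helper name `nextCodePoint` is hypothetical, and the sketch assumes well-formed UTF-8 and does no validation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of dmd or Phobos): returns the code point
// that starts at s[i] and advances i past it.
// Assumption: the input is well-formed UTF-8; nothing is validated here.
std::uint32_t nextCodePoint(const std::string& s, std::size_t& i) {
    assert(i < s.size());

    // Equivalent of `range[0]`: look at the raw code unit, no decoding.
    const unsigned char first = static_cast<unsigned char>(s[i]);

    // ASCII fast path: most D source text ends up here.
    if (first < 0x80) {
        ++i;
        return first;
    }

    // Number of continuation bytes implied by the lead byte.
    const std::size_t extra = (first >= 0xF0) ? 3 : (first >= 0xE0) ? 2 : 1;
    assert(i + extra < s.size());

    // Keep the payload bits of the lead byte (5, 4 or 3 bits).
    std::uint32_t cp = first & (0x3F >> extra);

    // Each continuation byte contributes its low 6 bits.
    for (std::size_t k = 1; k <= extra; ++k) {
        cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
    }

    i += extra + 1;
    return cp;
}
```

A lexer built this way pays for decoding only on non-ASCII input, which is the behaviour the `isNarrowString` branch above is after.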

On 2012-08-03 08:49, dennis luehring wrote:
> wouldn't it be better to extract the lexer part of dmd into its own
> (hopefully small) library - that way the lexer is still usable by dmd
> AND benchmarkable from outside - it is then even possible to replace the
> dmd lexer by a D version due to the C linkage feature of D

That was what I meant. If it's easy someone would have already done it.

--
/Jacob Carlborg

On 8/2/2012 11:37 PM, Jacob Carlborg wrote:
> I'm not sure how easy it would be to just measure the lexing phase of DMD. If
> it's easy someone would probably already have extracted the lexer from DMD.

You don't need to extract it to measure it. Just have it lex the source files in a loop, and time that loop.

On 2012-08-03 08:59, Walter Bright wrote:
> You don't need to extract it to measure it. Just have it lex the source
> files in a loop, and time that loop.

Well, that's the problem. It's not like DMD has a single "lex" function that does the whole job.

Would it perhaps be possible to time Parser::parseModule and remove everything that doesn't seem related to lexing?

Remove stuff like:

    a = new Identifiers();
    md = new ModuleDeclaration(a, id, safe);

And similar.

--
/Jacob Carlborg

On 8/3/2012 12:11 AM, Jacob Carlborg wrote:
> On 2012-08-03 08:59, Walter Bright wrote:
>
>> You don't need to extract it to measure it. Just have it lex the source
>> files in a loop, and time that loop.
>
> Well, that's the problem. It's not like DMD has a single "lex" function that
> does all the job.
>
> Would it perhaps be possible to time Parser::parseModule and remove all things
> that doesn't seem related to lexing?
>
> Remove stuff like:
>
> a = new Identifiers();
> md = new ModuleDeclaration(a, id, safe);
>
> And similar.
>

Look in doc.c at highlightCode2() for how to call the lexer by itself.

On 08/02/2012 04:41 AM, Walter Bright wrote:
> On 8/2/2012 1:21 AM, Jonathan M Davis wrote:
>> How would we measure that? dmd's lexer is tied to dmd, so how would we
>> test
>> the speed of only its lexer?
>
> Easy. Just make a special version of dmd that lexes only, and time it.

I made a lexing-only version of dmd at

https://github.com/edmccard/dmd/tree/lexonly

by stripping non-lexer-related code from mars.c, and adding a lexModule function that is called instead of Module::parse. There's no benchmarking code yet; it basically just does

    while (token.value != TOKeof) nextToken();

for each D source file passed on the command line.

--Ed

Jacob Carlborg, in message (digitalmars.D:174131), wrote:
> static if(isNarrowString!R)
>     Unqual!(ElementEncodingType!R) first = range[0];
> else
>     dchar first = range.front;

I find it more comfortable to just use first = range.front, with a range of char or ubyte. This range does not have to be a string; it can be something over a file, stream, or socket. It can also be the result of an algorithm, because you *can* use algorithms on ranges of char, and it makes sense if you know what you are doing. If Walter discovers that the lexer does not work with a socket or a "file.byChunk.join", and has to do expensive UTF-8 decoding for the lexer to work because it can only use a range of dchar and not a range of char (except that it special-cases strings), he may not be happy.

If the range happens to be a string, I would use an adapter to make it appear as a range of char, not dchar as the library likes to do. I think Andrei suggested that already.

--
Christophe

August 03, 2012

Re: std.d.lexer requirements

Posted by dennis luehring
in reply to Ed McCardell

On 03.08.2012 09:56, Ed McCardell wrote:
> On 08/02/2012 04:41 AM, Walter Bright wrote:
>> On 8/2/2012 1:21 AM, Jonathan M Davis wrote:
>>> How would we measure that? dmd's lexer is tied to dmd, so how would we
>>> test
>>> the speed of only its lexer?
>>
>> Easy. Just make a special version of dmd that lexes only, and time it.
>
> I made a lexing-only version of dmd at
>
> https://github.com/edmccard/dmd/tree/lexonly
>
> by stripping non-lexer-related code from mars.c, and adding a lexModule
> function that is called instead of Module::parse. There's no
> benchmarking code yet; it basically just does
>
>     while (token.value != TOKeof) nextToken();
>
> for each D source file passed on the command line.
>
> --Ed
>

Walter mentioned an easier way, without "forking" dmd. Wouldn't it be better to make this an integral part of the ongoing development, maybe under a tools or benchmark section?

news://news.digitalmars.com:119/jvfv14$1ri0$1@digitalmars.com
->Look in doc.c at highlightCode2() for how to call the lexer by itself.

On 8/3/2012 1:27 AM, dennis luehring wrote:
> Walter mentioned an easier way - without "forking" dmd - should be better an
> integral part of the ongoing development maybe under a tools or benchmark section?

Forking it is fine. This is just for a one-off benchmarking thing.

On 2012-08-03 09:35, Walter Bright wrote:
> Look in doc.c at highlightCode2() for how to call the lexer by itself.

So basically:

    Token tok;

    //start timer

    while (tok.value != TOKeof)
        lex.scan(&tok);

    //end timer

Something like that?

--
/Jacob Carlborg
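A self-contained sketch of the timing loop described above might look like the following. It is only an illustration under stated assumptions. `ToyLexer`, its simplified `Token` and `TOK` types, and `timeLexing` are hypothetical stand-ins, not dmd's real classes; only the loop shape (`scan` until `TOKeof`, wrapped in a timer) comes from the thread. The sketch also initializes the token before the first comparison and counts tokens so the compiler cannot optimize the loop away.

```cpp
#include <cassert>
#include <cctype>
#include <chrono>
#include <cstddef>
#include <string>

// Hypothetical stand-ins for dmd's token kinds and Token struct.
enum TOK { TOKeof, TOKword, TOKother };

struct Token {
    TOK value = TOKother;
};

// Toy lexer (NOT dmd's Lexer): splits input into identifier/number-like
// words and single punctuation characters, skipping whitespace.
struct ToyLexer {
    const std::string& src;
    std::size_t pos = 0;

    explicit ToyLexer(const std::string& s) : src(s) {}

    void scan(Token* t) {
        while (pos < src.size() &&
               std::isspace(static_cast<unsigned char>(src[pos]))) {
            ++pos;
        }
        if (pos >= src.size()) {
            t->value = TOKeof;
            return;
        }
        auto isWordChar = [](char c) {
            return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
        };
        if (isWordChar(src[pos])) {
            while (pos < src.size() && isWordChar(src[pos])) ++pos;
            t->value = TOKword;
        } else {
            ++pos;
            t->value = TOKother;
        }
    }
};

struct LexTiming {
    std::size_t tokens;  // total tokens seen across all passes
    double seconds;      // wall-clock time spent in the lexing loop
};

// Lex `source` `repeats` times and time only the scanning loop.
LexTiming timeLexing(const std::string& source, int repeats) {
    assert(repeats > 0);
    std::size_t tokens = 0;

    const auto start = std::chrono::steady_clock::now();  // start timer
    for (int r = 0; r < repeats; ++r) {
        ToyLexer lex(source);
        Token tok;
        lex.scan(&tok);
        while (tok.value != TOKeof) {
            ++tokens;
            lex.scan(&tok);
        }
    }
    const auto stop = std::chrono::steady_clock::now();   // end timer

    return {tokens, std::chrono::duration<double>(stop - start).count()};
}
```

Reading the source files into memory before starting the timer keeps file I/O out of the measurement. Repeating the pass several times smooths out noise on small inputs.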
