August 03, 2012
On 2012-08-02 22:51, Christophe Travert wrote:
> Jacob Carlborg, in message (digitalmars.D:174069), wrote:
>> On 2012-08-02 10:15, Walter Bright wrote:
>>
>>> Worst case use an adapter range.
>>
>> And that is better than a plain string?
>>
> because its front method does not do any decoding.

If it were a plain string you wouldn't use "front". You would handle any necessary decoding yourself, internally in the lexer. This is what Jonathan already does, it seems:

static if(isNarrowString!R)
    Unqual!(ElementEncodingType!R) first = range[0];
else
    dchar first = range.front;

If you're only supporting plain UTF-8 strings you would just do:

char first = range[0];
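
To illustrate what handling the decoding internally in the lexer could look like, here is a rough sketch in C++ (the language of dmd's own lexer). It is not taken from Jonathan's lexer or from dmd; `peekCodePoint` is a hypothetical helper name, and the code assumes well-formed UTF-8 with all continuation bytes present. The point is only that the first code unit is read by index and decoding happens solely for non-ASCII input:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not dmd or Phobos code): returns the code point that
// starts at s[i] and stores its length in code units in `len`.
// ASCII takes the fast path with no decoding at all.
// Assumes well-formed UTF-8; a real lexer would have to validate the input.
inline std::uint32_t peekCodePoint(const std::string& s, std::size_t i, std::size_t& len)
{
    const unsigned char first = static_cast<unsigned char>(s[i]);
    if (first < 0x80) {          // plain ASCII: the code unit is the code point
        len = 1;
        return first;
    }
    // The lead byte tells us how many code units the sequence has.
    len = (first >= 0xF0) ? 4 : (first >= 0xE0) ? 3 : 2;
    // Keep only the payload bits of the lead byte.
    std::uint32_t cp = first & (0xFF >> (len + 1));
    // Each continuation byte contributes its low six bits.
    for (std::size_t k = 1; k < len; ++k)
        cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
    return cp;
}
```

A lexer built this way would advance its index by `len` after each call, so ASCII-heavy source text never pays for decoding.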

-- 
/Jacob Carlborg
August 03, 2012
On 2012-08-03 08:49, dennis luehring wrote:

> wouldn't it be better to extract the lexer part of dmd into its own
> (hopefully small) library - that way the lexer is still useable by dmd
> AND benchmarkable from outside - it is then even possible to replace the
> dmd lexer by a D version due to the C linkage feature of D

That was what I meant. If it's easy someone would have already done it.

-- 
/Jacob Carlborg
August 03, 2012
On 8/2/2012 11:37 PM, Jacob Carlborg wrote:
> I'm not sure how easy it would be to just measure the lexing phase of DMD. If
> it's easy someone would probably already have extracted the lexer from DMD.


You don't need to extract it to measure it. Just have it lex the source files in a loop, and time that loop.
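
A minimal sketch of that timing pattern in C++, assuming a stand-in `countTokens` function in place of dmd's real lexer. The names `countTokens`, `LexTiming`, and `timeLexing` are placeholders made up for this illustration, and the whitespace splitting is not how dmd tokenizes; only the loop-and-time structure is the point:

```cpp
#include <cctype>
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

// Placeholder for the real lexer: counts whitespace-separated tokens.
inline std::size_t countTokens(const std::string& src)
{
    std::size_t n = 0;
    bool inToken = false;
    for (unsigned char c : src) {
        if (std::isspace(c))
            inToken = false;
        else if (!inToken) {
            inToken = true;
            ++n;
        }
    }
    return n;
}

struct LexTiming {
    std::size_t tokens;   // total tokens seen across all repetitions
    double seconds;       // wall-clock time spent in the lexing loop
};

// Lexes every source buffer `repeat` times and times only that loop,
// so reading the files from disk stays outside the measurement.
inline LexTiming timeLexing(const std::vector<std::string>& sources, int repeat)
{
    std::size_t total = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeat; ++r)
        for (const auto& src : sources)
            total += countTokens(src);
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return {total, elapsed.count()};
}
```

Returning the token total alongside the elapsed time keeps the compiler from optimizing the loop away and gives a quick sanity check that each run lexed the same input.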

August 03, 2012
On 2012-08-03 08:59, Walter Bright wrote:

> You don't need to extract it to measure it. Just have it lex the source
> files in a loop, and time that loop.

Well, that's the problem. It's not like DMD has a single "lex" function that does the whole job.

Would it perhaps be possible to time Parser::parseModule and remove everything that doesn't seem related to lexing?

Remove stuff like:

a = new Identifiers();
md = new ModuleDeclaration(a, id, safe);

And similar.

-- 
/Jacob Carlborg
August 03, 2012
On 8/3/2012 12:11 AM, Jacob Carlborg wrote:
> On 2012-08-03 08:59, Walter Bright wrote:
>
>> You don't need to extract it to measure it. Just have it lex the source
>> files in a loop, and time that loop.
>
> Well, that's the problem. It's not like DMD has a single "lex" function that
> does the whole job.
>
> Would it perhaps be possible to time Parser::parseModule and remove everything
> that doesn't seem related to lexing?
>
> Remove stuff like:
>
> a = new Identifiers();
> md = new ModuleDeclaration(a, id, safe);
>
> And similar.
>

Look in doc.c at highlightCode2() for how to call the lexer by itself.

August 03, 2012
On 08/02/2012 04:41 AM, Walter Bright wrote:
> On 8/2/2012 1:21 AM, Jonathan M Davis wrote:
>> How would we measure that? dmd's lexer is tied to dmd, so how would we
>> test
>> the speed of only its lexer?
>
> Easy. Just make a special version of dmd that lexes only, and time it.

I made a lexing-only version of dmd at

https://github.com/edmccard/dmd/tree/lexonly

by stripping non-lexer-related code from mars.c, and adding a lexModule function that is called instead of Module::parse. There's no benchmarking code yet; it basically just does

  while (token.value != TOKeof) nextToken();

for each D source file passed on the command line.

--Ed

August 03, 2012
Jacob Carlborg, in message (digitalmars.D:174131), wrote:
> static if(isNarrowString!R)
>      Unqual!(ElementEncodingType!R) first = range[0];
> else
>      dchar first = range.front;

I find it more comfortable to just use
first = range.front, with a range of char or ubyte.

This range does not have to be a string; it can be something over a file, stream, or socket. It can also be the result of an algorithm, because you *can* use algorithms on ranges of char, and it makes sense if you know what you are doing.

If Walter discovers that the lexer does not work with a socket or a "file.byChunk.join", and that it has to do expensive UTF-8 decoding to work at all because it can only use a range of dchar and not a range of char (except that it special-cases strings), he may not be happy.

If the range happens to be a string, I would use an adapter to make it appear as a range of char rather than a range of dchar, which is how the library likes to treat it. I think Andrei suggested that already.

-- 
Christophe
August 03, 2012
On 03.08.2012 09:56, Ed McCardell wrote:
> On 08/02/2012 04:41 AM, Walter Bright wrote:
>> On 8/2/2012 1:21 AM, Jonathan M Davis wrote:
>>> How would we measure that? dmd's lexer is tied to dmd, so how would we
>>> test
>>> the speed of only its lexer?
>>
>> Easy. Just make a special version of dmd that lexes only, and time it.
>
> I made a lexing-only version of dmd at
>
> https://github.com/edmccard/dmd/tree/lexonly
>
> by stripping non-lexer-related code from mars.c, and adding a lexModule
> function that is called instead of Module::parse. There's no
> benchmarking code yet; it basically just does
>
>     while (token.value != TOKeof) nextToken();
>
> for each D source file passed on the command line.
>
> --Ed
>

Walter mentioned an easier way, without "forking" dmd. Wouldn't it be better to make this an integral part of the ongoing development, maybe under a tools or benchmark section?

news://news.digitalmars.com:119/jvfv14$1ri0$1@digitalmars.com
->Look in doc.c at highlightCode2() for how to call the lexer by itself.




August 03, 2012
On 8/3/2012 1:27 AM, dennis luehring wrote:
> Walter mentioned an easier way, without "forking" dmd. Wouldn't it be better
> to make this an integral part of the ongoing development, maybe under a tools
> or benchmark section?


Forking it is fine. This is just for a one-off benchmarking thing.

August 03, 2012
On 2012-08-03 09:35, Walter Bright wrote:

> Look in doc.c at highlightCode2() for how to call the lexer by itself.

So basically:

Token tok;

//start timer

while (tok.value != TOKeof)
    lex.scan(&tok);

//end timer

Something like that?

-- 
/Jacob Carlborg