Jump to page: 1 27  
Page
Thread overview
Lexer in D
Mar 02, 2013
Namespace
Mar 02, 2013
Dmitry Olshansky
Mar 02, 2013
Namespace
Mar 02, 2013
David
Mar 02, 2013
Dmitry Olshansky
Mar 02, 2013
Namespace
Mar 02, 2013
Namespace
Mar 02, 2013
Namespace
Mar 02, 2013
Dmitry Olshansky
Mar 02, 2013
Namespace
Mar 02, 2013
Dmitry Olshansky
Mar 02, 2013
Dmitry Olshansky
Mar 02, 2013
Namespace
Mar 02, 2013
Dmitry Olshansky
Mar 02, 2013
Namespace
Mar 02, 2013
Namespace
Mar 02, 2013
Dmitry Olshansky
Mar 02, 2013
Zach the Mystic
Mar 02, 2013
Dmitry Olshansky
Mar 02, 2013
Namespace
Mar 02, 2013
Namespace
Mar 02, 2013
Namespace
Mar 03, 2013
Dmitry Olshansky
Mar 03, 2013
Namespace
Mar 03, 2013
Dmitry Olshansky
Mar 03, 2013
Namespace
Mar 03, 2013
Namespace
Mar 03, 2013
Dmitry Olshansky
Mar 03, 2013
Namespace
Mar 03, 2013
Dmitry Olshansky
Mar 03, 2013
Namespace
Mar 04, 2013
Namespace
Mar 04, 2013
Namespace
Mar 04, 2013
Dmitry Olshansky
Mar 04, 2013
Namespace
Mar 04, 2013
Dmitry Olshansky
Mar 04, 2013
Namespace
Mar 04, 2013
Dmitry Olshansky
Mar 11, 2013
Namespace
Mar 11, 2013
Namespace
Mar 11, 2013
FG
Mar 11, 2013
Namespace
Mar 11, 2013
David
Mar 11, 2013
Namespace
Mar 11, 2013
monarch_dodra
Mar 11, 2013
Namespace
Mar 11, 2013
monarch_dodra
Mar 11, 2013
Namespace
Mar 13, 2013
Namespace
Mar 14, 2013
Dmitry Olshansky
Mar 14, 2013
Namespace
Mar 14, 2013
Dmitry Olshansky
Mar 14, 2013
Artur Skawina
Mar 14, 2013
Dmitry Olshansky
Mar 15, 2013
Namespace
Mar 21, 2013
Namespace
Mar 23, 2013
Namespace
Mar 24, 2013
Namespace
Mar 24, 2013
Dmitry Olshansky
Mar 24, 2013
Namespace
Mar 24, 2013
Brian Schott
Mar 24, 2013
Namespace
Mar 14, 2013
Dmitry Olshansky
Mar 15, 2013
Namespace
Mar 04, 2013
Brian Schott
Mar 04, 2013
Dmitry Olshansky
Mar 03, 2013
Dmitry Olshansky
Mar 03, 2013
Jonathan M Davis
Mar 03, 2013
Dmitry Olshansky
March 02, 2013
For one of our courses this year we have a very cool topic: Lexer and Compiler.
In the past I have already tried some experiments in this direction with Romulus and Remus, but I've never measured my Lexing time.
That's why I wanted to start again to write a lexer (hitherto purely for the learning effect).
I looked at the code of some of the other lexers, that are also written in D, to get some inspiration from, but currently my Lexer is a little bit slow, it requires for example ~200 msecs for std.datetime.
So I wanted to ask whether there are nice people who help me to improve our lexer.

Code: https://github.com/Dgame/Puzzle/blob/master/lexer.d

Thanks in advance. :)
March 02, 2013
02-Mar-2013 13:33, Namespace пишет:
> For one of our courses this year we have a very cool topic: Lexer and
> Compiler.
> In the past I have already tried some experiments in this direction with
> Romulus and Remus, but I've never measured my Lexing time.
> That's why I wanted to start again to write a lexer (hitherto purely for
> the learning effect).
> I looked at the code of some of the other lexers, that are also written
> in D, to get some inspiration from, but currently my Lexer is a little
> bit slow, it requires for example ~200 msecs for std.datetime.
> So I wanted to ask whether there are nice people who help me to improve
> our lexer.
>
> Code: https://github.com/Dgame/Puzzle/blob/master/lexer.d
>
> Thanks in advance. :)

See this one I'm currently investing my time into:

https://github.com/Hackerpilot/Dscanner/tree/range-based-lexer

Other then this use cachegrind and compile with ldc/gdc :)

Also for dmd check the flags are "-release -O -inline  -noboundscheck"

-- 
Dmitry Olshansky
March 02, 2013
I'm already using these flags. But thanks for your Link I will take a look at it. :)
March 02, 2013
Am 02.03.2013 10:33, schrieb Namespace:
> For one of our courses this year we have a very cool topic: Lexer and
> Compiler.
> In the past I have already tried some experiments in this direction with
> Romulus and Remus, but I've never measured my Lexing time.
> That's why I wanted to start again to write a lexer (hitherto purely for
> the learning effect).
> I looked at the code of some of the other lexers, that are also written
> in D, to get some inspiration from, but currently my Lexer is a little
> bit slow, it requires for example ~200 msecs for std.datetime.
> So I wanted to ask whether there are nice people who help me to improve
> our lexer.
> 
> Code: https://github.com/Dgame/Puzzle/blob/master/lexer.d
> 
> Thanks in advance. :)

Making `types` and `keywords` an array might improve the speed a little bit:

if (id in types) -> if (types.canFind(id))
(same with keywords)
March 02, 2013
02-Mar-2013 15:04, David пишет:
> Am 02.03.2013 10:33, schrieb Namespace:
>> For one of our courses this year we have a very cool topic: Lexer and
>> Compiler.
>> In the past I have already tried some experiments in this direction with
>> Romulus and Remus, but I've never measured my Lexing time.
>> That's why I wanted to start again to write a lexer (hitherto purely for
>> the learning effect).
>> I looked at the code of some of the other lexers, that are also written
>> in D, to get some inspiration from, but currently my Lexer is a little
>> bit slow, it requires for example ~200 msecs for std.datetime.
>> So I wanted to ask whether there are nice people who help me to improve
>> our lexer.
>>
>> Code: https://github.com/Dgame/Puzzle/blob/master/lexer.d
>>
>> Thanks in advance. :)
>
> Making `types` and `keywords` an array might improve the speed a little bit:
>
> if (id in types) -> if (types.canFind(id))
> (same with keywords)

That would be slower as canFind is linear search.
Simpler way is use first letter and length to select a bucket of keywords.

But otherwise, yes, the first advice is never use built-in AA as these take quite some time to allocate. Plus they not that fast to lookup as they used mod-prime not mod-power-of-2 to pick a bucket.


-- 
Dmitry Olshansky
March 02, 2013
With canFind it takes ~300 msecs. :/
Any idea how I could improve it?

> Simpler way is use first letter and length to select a bucket of keywords.

I don't understand how you could do this without AA's and how this could be more performant than canFind.
March 02, 2013
I sorted both, types and keywords, alphabetical and wrote my own 'canFind':

bool canFind(const size_t size)(ref const string[size] values, string value) pure nothrow {
	if (value[0] < 'm' || value[0] == '_') {
		foreach (string _val; values) {
			if (_val == value) return true;
		}
	} else {
		foreach_reverse (string _val; values) {
			if (_val == value) return true;
		}
	}
	
	return false;
}

That brings ~20 msecs, so I have ~280 mesecs instead of ~ 300. But it isn't perfect.
March 02, 2013
> That would be slower as canFind is linear search.
> Simpler way is use first letter and length to select a bucket of keywords.

I think I get it.
Did you mean something like this?

[code]
immutable ubyte[char] typeMap;
immutable ubyte[char] keywordMap;

static this() {
	typeMap = [
		'd' : 7, 'l' : 16, 'i' : 12, 'u' : 20, 'b' : 0, 'f' : 10, 'r' : 17, 'v' : 25, 'c' : 2, 's' : 18, 'w' : 26
	];
	
	keywordMap = [
		'_' : 0, 'a' : 6, 'b' : 12, 'c' : 14, 'd' : 20, 'e' : 26, 'f' : 30, 'g' : 36, 'i' : 37, 'l' : 45,
		'm' : 46, 'n' : 49, 'o' : 52, 'p' : 54, 'r' : 60, 's' : 62, 't' : 69, 'u' : 77, 'v' : 79, 'w' : 81
	];
}

bool isKeyword(string value) pure nothrow {
	if (value[0] !in keywordMap) return false;
	
	foreach (string kword; keywords[keywordMap[value[0]] .. $]) {
		if (kword == value) return true;
		if (kword[0] != value[0]) return false;
	}
	
	return false;
}

bool isType(string value) pure nothrow {
	if (value[0] !in typeMap) return false;
	
	foreach (string type; types[typeMap[value[0]] .. $]) {
		if (type == value) return true;
		if (type[0] != value[0]) return false;
	}
	
	return false;
}
[/code]

I'm now by ~230 msecs.
March 02, 2013
02-Mar-2013 18:53, Namespace пишет:
>> That would be slower as canFind is linear search.
>> Simpler way is use first letter and length to select a bucket of
>> keywords.
>
> I think I get it.
> Did you mean something like this?
>

No-no and no.

Just translate list of keywords/types to a bunch of switch-cases that will give you near optimal performance. Use CTFE to construct that string of D code.

In fact I proposed to do just 2 levels of switches: first switch by length then by first char.

Even simple if you don't want to go for CTFE code-generation is to make plain array of array. The length of array is exactly 256 i.e.

Even this should have decent speed:

string[][256] keywords = [
null,
null,
...
//at index 'i'
['int', 'if', 'invariant', 'idouble', 'ifloat' ...]
//array of strings starting with char for each
...
];

bool isKeyword(string value)
{
	string[] toSearch = keywords[value[0]];
	return toSearch.canFind(value);
}

Better yet only store [1..$] parts of keywords and search for value[1..$].

Even faster is to make the same for each possible length of keyword.

> [code]
> immutable ubyte[char] typeMap;
> immutable ubyte[char] keywordMap;
>
> static this() {
>      typeMap = [
>          'd' : 7, 'l' : 16, 'i' : 12, 'u' : 20, 'b' : 0, 'f' : 10, 'r' :
> 17, 'v' : 25, 'c' : 2, 's' : 18, 'w' : 26
>      ];
>
>      keywordMap = [
>          '_' : 0, 'a' : 6, 'b' : 12, 'c' : 14, 'd' : 20, 'e' : 26, 'f' :
> 30, 'g' : 36, 'i' : 37, 'l' : 45,
>          'm' : 46, 'n' : 49, 'o' : 52, 'p' : 54, 'r' : 60, 's' : 62, 't'
> : 69, 'u' : 77, 'v' : 79, 'w' : 81
>      ];
> }
>
> bool isKeyword(string value) pure nothrow {
>      if (value[0] !in keywordMap) return false;
>
>      foreach (string kword; keywords[keywordMap[value[0]] .. $]) {
>          if (kword == value) return true;
>          if (kword[0] != value[0]) return false;
>      }
>
>      return false;
> }
>
> bool isType(string value) pure nothrow {
>      if (value[0] !in typeMap) return false;
>
>      foreach (string type; types[typeMap[value[0]] .. $]) {
>          if (type == value) return true;
>          if (type[0] != value[0]) return false;
>      }
>
>      return false;
> }
> [/code]
>
> I'm now by ~230 msecs.


-- 
Dmitry Olshansky
March 02, 2013
> No-no and no.
>
> Just translate list of keywords/types to a bunch of switch-cases that will give you near optimal performance. Use CTFE to construct that string of D code.
> In fact I proposed to do just 2 levels of switches: first switch by length then by first char.

I don't understand.
You don't mean something like:

switch (value) {
case "if": isKeyword = true; break;
...

right?

> Even simple if you don't want to go for CTFE code-generation is to make plain array of array. The length of array is exactly 256 i.e.
>
> Even this should have decent speed:
>
> string[][256] keywords = [
> null,
> null,
> ...
> //at index 'i'
> ['int', 'if', 'invariant', 'idouble', 'ifloat' ...]
> //array of strings starting with char for each
> ...
> ];
>
> bool isKeyword(string value)
> {
> 	string[] toSearch = keywords[value[0]];
> 	return toSearch.canFind(value);
> }

Ok I will try it. Thank you.
« First   ‹ Prev
1 2 3 4 5 6 7