Thread overview | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
March 02, 2013 Lexer in D | ||||
---|---|---|---|---|
| ||||
For one of our courses this year we have a very cool topic: Lexer and Compiler. In the past I have already tried some experiments in this direction with Romulus and Remus, but I've never measured my Lexing time. That's why I wanted to start again to write a lexer (hitherto purely for the learning effect). I looked at the code of some of the other lexers, that are also written in D, to get some inspiration from, but currently my Lexer is a little bit slow, it requires for example ~200 msecs for std.datetime. So I wanted to ask whether there are nice people who help me to improve our lexer. Code: https://github.com/Dgame/Puzzle/blob/master/lexer.d Thanks in advance. :) |
March 02, 2013 Re: Lexer in D | ||||
---|---|---|---|---|
| ||||
Posted in reply to Namespace | 02-Mar-2013 13:33, Namespace пишет: > For one of our courses this year we have a very cool topic: Lexer and > Compiler. > In the past I have already tried some experiments in this direction with > Romulus and Remus, but I've never measured my Lexing time. > That's why I wanted to start again to write a lexer (hitherto purely for > the learning effect). > I looked at the code of some of the other lexers, that are also written > in D, to get some inspiration from, but currently my Lexer is a little > bit slow, it requires for example ~200 msecs for std.datetime. > So I wanted to ask whether there are nice people who help me to improve > our lexer. > > Code: https://github.com/Dgame/Puzzle/blob/master/lexer.d > > Thanks in advance. :) See this one I'm currently investing my time into: https://github.com/Hackerpilot/Dscanner/tree/range-based-lexer Other then this use cachegrind and compile with ldc/gdc :) Also for dmd check the flags are "-release -O -inline -noboundscheck" -- Dmitry Olshansky |
March 02, 2013 Re: Lexer in D | ||||
---|---|---|---|---|
| ||||
Posted in reply to Dmitry Olshansky | I'm already using these flags. But thanks for your Link I will take a look at it. :) |
March 02, 2013 Re: Lexer in D | ||||
---|---|---|---|---|
| ||||
Posted in reply to Namespace | Am 02.03.2013 10:33, schrieb Namespace:
> For one of our courses this year we have a very cool topic: Lexer and
> Compiler.
> In the past I have already tried some experiments in this direction with
> Romulus and Remus, but I've never measured my Lexing time.
> That's why I wanted to start again to write a lexer (hitherto purely for
> the learning effect).
> I looked at the code of some of the other lexers, that are also written
> in D, to get some inspiration from, but currently my Lexer is a little
> bit slow, it requires for example ~200 msecs for std.datetime.
> So I wanted to ask whether there are nice people who help me to improve
> our lexer.
>
> Code: https://github.com/Dgame/Puzzle/blob/master/lexer.d
>
> Thanks in advance. :)
Making `types` and `keywords` an array might improve the speed a little bit:
if (id in types) -> if (types.canFind(id))
(same with keywords)
|
March 02, 2013 Re: Lexer in D | ||||
---|---|---|---|---|
| ||||
Posted in reply to David | 02-Mar-2013 15:04, David пишет: > Am 02.03.2013 10:33, schrieb Namespace: >> For one of our courses this year we have a very cool topic: Lexer and >> Compiler. >> In the past I have already tried some experiments in this direction with >> Romulus and Remus, but I've never measured my Lexing time. >> That's why I wanted to start again to write a lexer (hitherto purely for >> the learning effect). >> I looked at the code of some of the other lexers, that are also written >> in D, to get some inspiration from, but currently my Lexer is a little >> bit slow, it requires for example ~200 msecs for std.datetime. >> So I wanted to ask whether there are nice people who help me to improve >> our lexer. >> >> Code: https://github.com/Dgame/Puzzle/blob/master/lexer.d >> >> Thanks in advance. :) > > Making `types` and `keywords` an array might improve the speed a little bit: > > if (id in types) -> if (types.canFind(id)) > (same with keywords) That would be slower as canFind is linear search. Simpler way is use first letter and length to select a bucket of keywords. But otherwise, yes, the first advice is never use built-in AA as these take quite some time to allocate. Plus they not that fast to lookup as they used mod-prime not mod-power-of-2 to pick a bucket. -- Dmitry Olshansky |
March 02, 2013 Re: Lexer in D | ||||
---|---|---|---|---|
| ||||
Posted in reply to Dmitry Olshansky | With canFind it takes ~300 msecs. :/
Any idea how I could improve it?
> Simpler way is use first letter and length to select a bucket of keywords.
I don't understand how you could do this without AA's and how this could be more performant than canFind.
|
March 02, 2013 Re: Lexer in D | ||||
---|---|---|---|---|
| ||||
Posted in reply to Namespace | I sorted both, types and keywords, alphabetical and wrote my own 'canFind': bool canFind(const size_t size)(ref const string[size] values, string value) pure nothrow { if (value[0] < 'm' || value[0] == '_') { foreach (string _val; values) { if (_val == value) return true; } } else { foreach_reverse (string _val; values) { if (_val == value) return true; } } return false; } That brings ~20 msecs, so I have ~280 mesecs instead of ~ 300. But it isn't perfect. |
March 02, 2013 Re: Lexer in D | ||||
---|---|---|---|---|
| ||||
Posted in reply to Dmitry Olshansky | > That would be slower as canFind is linear search.
> Simpler way is use first letter and length to select a bucket of keywords.
I think I get it.
Did you mean something like this?
[code]
immutable ubyte[char] typeMap;
immutable ubyte[char] keywordMap;
static this() {
typeMap = [
'd' : 7, 'l' : 16, 'i' : 12, 'u' : 20, 'b' : 0, 'f' : 10, 'r' : 17, 'v' : 25, 'c' : 2, 's' : 18, 'w' : 26
];
keywordMap = [
'_' : 0, 'a' : 6, 'b' : 12, 'c' : 14, 'd' : 20, 'e' : 26, 'f' : 30, 'g' : 36, 'i' : 37, 'l' : 45,
'm' : 46, 'n' : 49, 'o' : 52, 'p' : 54, 'r' : 60, 's' : 62, 't' : 69, 'u' : 77, 'v' : 79, 'w' : 81
];
}
bool isKeyword(string value) pure nothrow {
if (value[0] !in keywordMap) return false;
foreach (string kword; keywords[keywordMap[value[0]] .. $]) {
if (kword == value) return true;
if (kword[0] != value[0]) return false;
}
return false;
}
bool isType(string value) pure nothrow {
if (value[0] !in typeMap) return false;
foreach (string type; types[typeMap[value[0]] .. $]) {
if (type == value) return true;
if (type[0] != value[0]) return false;
}
return false;
}
[/code]
I'm now by ~230 msecs.
|
March 02, 2013 Re: Lexer in D | ||||
---|---|---|---|---|
| ||||
Posted in reply to Namespace | 02-Mar-2013 18:53, Namespace пишет: >> That would be slower as canFind is linear search. >> Simpler way is use first letter and length to select a bucket of >> keywords. > > I think I get it. > Did you mean something like this? > No-no and no. Just translate list of keywords/types to a bunch of switch-cases that will give you near optimal performance. Use CTFE to construct that string of D code. In fact I proposed to do just 2 levels of switches: first switch by length then by first char. Even simple if you don't want to go for CTFE code-generation is to make plain array of array. The length of array is exactly 256 i.e. Even this should have decent speed: string[][256] keywords = [ null, null, ... //at index 'i' ['int', 'if', 'invariant', 'idouble', 'ifloat' ...] //array of strings starting with char for each ... ]; bool isKeyword(string value) { string[] toSearch = keywords[value[0]]; return toSearch.canFind(value); } Better yet only store [1..$] parts of keywords and search for value[1..$]. Even faster is to make the same for each possible length of keyword. > [code] > immutable ubyte[char] typeMap; > immutable ubyte[char] keywordMap; > > static this() { > typeMap = [ > 'd' : 7, 'l' : 16, 'i' : 12, 'u' : 20, 'b' : 0, 'f' : 10, 'r' : > 17, 'v' : 25, 'c' : 2, 's' : 18, 'w' : 26 > ]; > > keywordMap = [ > '_' : 0, 'a' : 6, 'b' : 12, 'c' : 14, 'd' : 20, 'e' : 26, 'f' : > 30, 'g' : 36, 'i' : 37, 'l' : 45, > 'm' : 46, 'n' : 49, 'o' : 52, 'p' : 54, 'r' : 60, 's' : 62, 't' > : 69, 'u' : 77, 'v' : 79, 'w' : 81 > ]; > } > > bool isKeyword(string value) pure nothrow { > if (value[0] !in keywordMap) return false; > > foreach (string kword; keywords[keywordMap[value[0]] .. $]) { > if (kword == value) return true; > if (kword[0] != value[0]) return false; > } > > return false; > } > > bool isType(string value) pure nothrow { > if (value[0] !in typeMap) return false; > > foreach (string type; types[typeMap[value[0]] .. $]) { > if (type == value) return true; > if (type[0] != value[0]) return false; > } > > return false; > } > [/code] > > I'm now by ~230 msecs. -- Dmitry Olshansky |
March 02, 2013 Re: Lexer in D | ||||
---|---|---|---|---|
| ||||
Posted in reply to Dmitry Olshansky | > No-no and no. > > Just translate list of keywords/types to a bunch of switch-cases that will give you near optimal performance. Use CTFE to construct that string of D code. > In fact I proposed to do just 2 levels of switches: first switch by length then by first char. I don't understand. You don't mean something like: switch (value) { case "if": isKeyword = true; break; ... right? > Even simple if you don't want to go for CTFE code-generation is to make plain array of array. The length of array is exactly 256 i.e. > > Even this should have decent speed: > > string[][256] keywords = [ > null, > null, > ... > //at index 'i' > ['int', 'if', 'invariant', 'idouble', 'ifloat' ...] > //array of strings starting with char for each > ... > ]; > > bool isKeyword(string value) > { > string[] toSearch = keywords[value[0]]; > return toSearch.canFind(value); > } Ok I will try it. Thank you. |
Copyright © 1999-2021 by the D Language Foundation