May 11, 2012
On Friday, 11 May 2012 at 09:02:12 UTC, Jacob Carlborg wrote:
> I think that the end goal of a project like this, putting a D frontend in Phobos, should be that the compiler is built using this library. This would result in the compiler and library always being in sync and having the same behavior. Otherwise this could easily become just another tool that tries to lex and parse D code, always out of sync with the compiler and not having the same behavior.
>
> For this to happen, for Walter to start using this, I think there would be a greater chance if the frontend was a port of the DMD frontend and not changed too much.

My plan is to create a front end that would be much better than the existing one, both in design and implementation. I have decided to work on this full time for several months.

The front end will not produce the same data as the DMD front end does, so most likely it will not be suitable to replace the existing C++ implementation. But since no information will be lost, I will consider creating a separate project which would build output compatible with the existing one (not identical; I don't think that would be feasible).
May 11, 2012
On Friday, 11 May 2012 at 09:21:29 UTC, dennis luehring wrote:
> On 11.05.2012 11:02, Jacob Carlborg wrote:
>> For this to happen, for Walter to start using this, I think there would
>> be a greater chance if the frontend was a port of the DMD frontend and
>> not changed too much.
>
> or a pure D version of it with the features:
>
> -very fast in parsing/lexing - there needs to be a benchmark environment from the very start

Will add that to the May roadmap.

> -easy to extend and fix
>
> and I think that is what Walter wants from a D-ish frontend in the first place

I plan to go this way.
May 11, 2012
On 2012-05-11 11:22, Roman D. Boiko wrote:

>> What about line and column information?
> Indices of the first code unit of each line are stored inside the lexer, and
> a function will compute the Location (line number, column number, file
> specification) for any index. This way the size of a Token instance is reduced
> to the minimum. It is assumed that Location can be computed on demand
> and is not needed frequently, so the column is calculated by a reverse walk
> to the previous end of line, etc. It will be possible to calculate Locations
> either taking into account special token sequences (e.g., #line 3
> "ab/c.d") or discarding them.

Aha, clever. As long as I can get out the information I'm happy :) How about adding properties for this in the token struct?
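
For readers following along, the on-demand lookup described in the quoted text could look roughly like the sketch below. It is only an illustration under stated assumptions, not the library's code: `Location`, `line_starts` and `locate` are hypothetical names, the column is obtained by subtracting the stored line start instead of a reverse walk, the file specification and `#line` handling are left out, and Python is used purely for brevity.

```python
from bisect import bisect_right
from typing import List, NamedTuple


class Location(NamedTuple):
    line: int    # 1-based line number
    column: int  # 1-based column, counted in characters


def line_starts(source: str) -> List[int]:
    """Collect the index of the first character of every line."""
    starts = [0]
    for i, ch in enumerate(source):
        if ch == "\n":
            starts.append(i + 1)
    return starts


def locate(starts: List[int], index: int) -> Location:
    """Compute a Location on demand; tokens only need to store an index."""
    line = bisect_right(starts, index)  # number of line starts <= index
    return Location(line, index - starts[line - 1] + 1)


source = "module main;\nvoid foo() {}\n"
starts = line_starts(source)
print(locate(starts, source.index("foo")))
```

The table of line starts is built once per file, and each lookup is a binary search, so tokens stay small while locations remain cheap to recover.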

>>>> * Does it convert numerical literals and similar to their actual values
>>> It is planned to add a post-processor for that as part of parser,
>>> please see README.md for some more details.
>>
>> Isn't that a job for the lexer?
> That might be done in lexer for efficiency reasons (to avoid lexing
> token value again). But separating this into a dedicated post-processing
> phase leads to a much cleaner design (IMO), also suitable for uses when
> such values are not needed.

That might be the case. But I don't think it belongs in the parser.

> Also I don't think that performance would be
> improved given the ratio of number of literals to total number of tokens
> and the need to store additional information per token if it is done in
> lexer. I will elaborate on that later.

Ok, fair enough. Perhaps this could be a property in the Token struct as well. In that case I would suggest renaming "value" to lexeme/spelling/representation, or something like that, and then name the new property "value".
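
A rough sketch of that naming, assuming a hypothetical `Token` type with a `kind` tag: `lexeme` holds the exact source spelling, while `value` is converted only when somebody asks for it. Python is used for brevity, and only plain decimal and `0x` integer literals are handled; suffixes and other literal kinds are deliberately ignored.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Token:
    kind: str    # e.g. "int_literal" or "identifier" (illustrative tags)
    lexeme: str  # exact spelling as it appears in the source

    @property
    def value(self) -> Union[int, str]:
        """Convert the spelling to a typed value on demand."""
        if self.kind == "int_literal":
            # Drop digit separators, then let base 0 pick decimal or hex.
            return int(self.lexeme.replace("_", ""), 0)
        return self.lexeme


print(Token("int_literal", "0x1F").value)
print(Token("int_literal", "1_000").value)
```

Because the conversion is a property, nothing extra is stored per token, and callers that never need the value never pay for it.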

-- 
/Jacob Carlborg
May 11, 2012
On 2012-05-11 11:31, Roman D. Boiko wrote:

> My plan is to create a front end that would be much better than the existing one,
> both in design and implementation. I have decided to work on this full time
> for several months.

That's good news.

> The front end will not produce the same data as the DMD front end does, so most
> likely it will not be suitable to replace the existing C++ implementation.

That's too bad.

> But since no information will be lost, I will consider creating a
> separate project which would build output compatible with the existing one (not
> identical; I don't think that would be feasible).

Ok.

-- 
/Jacob Carlborg
May 11, 2012
On 2012-05-11 11:23, Roman D. Boiko wrote:
> On Friday, 11 May 2012 at 09:19:07 UTC, dennis luehring wrote:
>> does the parser/lexer allow half-finished syntax parsing? for being
>> useable in an IDE for syntax-highlighting while coding?
> That's planned, but I would like to see your usage scenarios
> (pseudo-code would help a lot).

Example from TextMate:

* "void" - keyword is colored
* "module main" - nothing colored until I type a semicolon
* "module main;" - keyword and "main" is colored (differently)
* "void foo" - keyword is colored
* "void foo (" - keyword and "foo" is colored (differently)
* "struct F" - keyword and "F" is colored (differently)

* Literals are always colored.
* User-defined constants are always colored. It's basically any token that is all upper case.

-- 
/Jacob Carlborg
May 11, 2012
On 11.05.2012 11:33, Roman D. Boiko wrote:
>>  -very fast in parsing/lexing - there needs to be a benchmark
>>  environment from the very start
> Will add that to the May roadmap.

Are you using slices to prevent copying everything around?

The parser/lexer needs to be as fast as the original one, maybe even faster. Otherwise it won't ever replace Walter's, because speed matters very much here.
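
To illustrate the idea of not copying: a token can refer to the source by position instead of owning its text. In D a slice is already a view over the same memory; Python string slicing copies, so this sketch stores offsets and only materializes text on request. `Span` and `split_words` are hypothetical names, and splitting on whitespace stands in for real lexing.

```python
from typing import Iterator, NamedTuple


class Span(NamedTuple):
    start: int  # index of the first character
    end: int    # index one past the last character


def split_words(source: str) -> Iterator[Span]:
    """Yield the extent of each whitespace-separated word without copying text."""
    i, n = 0, len(source)
    while i < n:
        if source[i].isspace():
            i += 1
            continue
        start = i
        while i < n and not source[i].isspace():
            i += 1
        yield Span(start, i)


source = "void foo"
spans = list(split_words(source))
print(spans)
print(source[spans[1].start:spans[1].end])  # text is extracted only here
```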
May 11, 2012
On Friday, 11 May 2012 at 09:28:36 UTC, dennis luehring wrote:
> On 11.05.2012 11:23, Roman D. Boiko wrote:
>> On Friday, 11 May 2012 at 09:19:07 UTC, dennis luehring wrote:
>>> does the parser/lexer allow half-finished syntax parsing? for
>>> being useable in an IDE for syntax-highlighting while coding?
>> That's planned, but I would like to see your usage scenarios
>> (pseudo-code would help a lot).
>>
>
> try to syntax-highlight while coding - that's the scenario; parts
> of the source code aren't fully valid while writing

It depends on the IDE. For example, sublime-text (and most likely TextMate) uses regex for syntax highlighting. That makes it impossible to use for D in some scenarios (like nested block comments). Any IDE that provides an API for coloring will get correct information if the code is valid.
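
To make the nested-comment point concrete: D's `/+ ... +/` comments nest, so finding the end requires counting depth, which a classic (non-recursive) regular expression cannot do for arbitrary nesting. A minimal sketch of such a counter follows; `nesting_comment_end` is a hypothetical helper written in Python for brevity, not code from the project.

```python
def nesting_comment_end(source: str, start: int) -> int:
    """Return the index just past the '+/' matching the '/+' at `start`,
    or -1 if the comment is never closed."""
    assert source.startswith("/+", start)
    depth = 0
    i = start
    while i < len(source) - 1:
        pair = source[i:i + 2]
        if pair == "/+":      # a nested comment opens
            depth += 1
            i += 2
        elif pair == "+/":    # the innermost open comment closes
            depth -= 1
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    return -1


src = "/+ a /+ b +/ c +/ int x;"
end = nesting_comment_end(src, 0)
print(repr(src[end:]))
```

A highlighter that stops at the first `+/` would treat ` c +/` as code, which is the failure mode described above.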

If it is not valid, it is only possible to handle specific kinds of errors, but in general there will always be cases when highlighting (or some parsing / semantic analysis information) is "incorrect". I aim to handle common errors gracefully.

In practice, I think, it is possible to handle 99% of problems, but this requires a lot of design / specification work.
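
As one example of handling a common error gracefully, a lexer can flag an unterminated string literal and resume at the end of the line, so the rest of the file still gets tokens. The sketch below is only an assumption about how that might look: `lex_tolerant` and its token kinds are hypothetical, resuming at the line end is a heuristic (D string literals may legally span lines), escape sequences are ignored, and Python is used for brevity.

```python
from typing import List, Tuple


def lex_tolerant(source: str) -> List[Tuple[str, str]]:
    """Tokenize words, symbols and strings; never give up on bad input."""
    tokens: List[Tuple[str, str]] = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch.isspace():
            i += 1
        elif ch == '"':
            j = i + 1
            while j < n and source[j] not in '"\n':
                j += 1
            if j < n and source[j] == '"':
                tokens.append(("string", source[i:j + 1]))
                i = j + 1
            else:
                # Unterminated: flag what was seen and resume at the line end.
                tokens.append(("error", source[i:j]))
                i = j
        elif ch.isalpha() or ch == "_":
            j = i
            while j < n and (source[j].isalnum() or source[j] == "_"):
                j += 1
            tokens.append(("word", source[i:j]))
            i = j
        else:
            tokens.append(("symbol", ch))
            i += 1
    return tokens


print(lex_tolerant('x = "abc\nvoid foo'))
```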
May 11, 2012
On Friday, 11 May 2012 at 09:36:28 UTC, Jacob Carlborg wrote:
> On 2012-05-11 11:22, Roman D. Boiko wrote:
>> It will be possible to calculate Locations
>> either taking into account special token sequences (e.g., #line 3
>> "ab/c.d") or discarding them.
>
> Aha, clever. As long as I can get out the information I'm happy :) How about adding properties for this in the token struct?

There is a method for that in the Lexer interface; to me it looks like it belongs there and not in the token. A version accepting a token and producing a pair of start/end Locations will be added.

>>>>> * Does it convert numerical literals and similar to their actual values
>>>> It is planned to add a post-processor for that as part of parser,
>>>> please see README.md for some more details.
>>>
>>> Isn't that a job for the lexer?
>> That might be done in lexer for efficiency reasons (to avoid lexing
>> token value again). But separating this into a dedicated post-processing
>> phase leads to a much cleaner design (IMO), also suitable for uses when
>> such values are not needed.
>
> That might be the case. But I don't think it belongs in the parser.
I will provide example code and a dedicated post later to illustrate my point.

>> Also I don't think that performance would be
>> improved given the ratio of number of literals to total number of tokens
>> and the need to store additional information per token if it is done in
>> lexer. I will elaborate on that later.
>
> Ok, fair enough. Perhaps this could be a property in the Token struct as well. In that case I would suggest renaming "value" to lexeme/spelling/representation, or something like that, and then name the new property "value".

I was going to rename value, but couldn't find a nice term. Thanks for your suggestions!
As for the property with the strongly typed literal value, currently I plan to put it into the AST.
May 11, 2012
On Friday, 11 May 2012 at 10:01:17 UTC, dennis luehring wrote:
> On 11.05.2012 11:33, Roman D. Boiko wrote:
>>> -very fast in parsing/lexing - there needs to be a benchmark
>>> environment from the very start
>> Will add that to the May roadmap.
>
> Are you using slices to prevent copying everything around?
>
> The parser/lexer needs to be as fast as the original one, maybe even faster. Otherwise it won't ever replace Walter's, because speed matters very much here.

I tried to optimize the code wherever that didn't complicate the design, and more optimizations will be added later.

It would be interesting to have benchmarks for comparing performance, but I don't plan to branch and edit the DMD front end to prepare it for benchmarking (any time soon).
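
For what such a benchmark could measure, here is a minimal throughput harness, sketched in Python for brevity. `throughput` is a hypothetical helper and `str.split` merely stands in for a lexer; comparing against the DMD front end would need equivalent instrumentation on that side, which this sketch does not attempt.

```python
import timeit
from typing import Callable


def throughput(lex: Callable[[str], object], source: str, repeat: int = 5) -> float:
    """Best-of-`repeat` throughput in characters per second."""
    best = min(timeit.repeat(lambda: lex(source), number=1, repeat=repeat))
    # Guard against a zero reading on very small inputs.
    return len(source) / max(best, 1e-9)


sample = "void foo() {}\n" * 10_000
print(f"{throughput(str.split, sample):,.0f} chars/s")
```

Taking the best of several runs reduces noise from other processes; running it on the same corpus after every change gives a regression signal from the start.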
May 11, 2012
On Friday, 11 May 2012 at 10:40:43 UTC, Roman D. Boiko wrote:
> On Friday, 11 May 2012 at 10:01:17 UTC, dennis luehring wrote:
>> On 11.05.2012 11:33, Roman D. Boiko wrote:
>>>> -very fast in parsing/lexing - there needs to be a benchmark
>>>> environment from the very start
>>> Will add that to the May roadmap.
>>
>> Are you using slices to prevent copying everything around?
>>
>> The parser/lexer needs to be as fast as the original one, maybe even faster. Otherwise it won't ever replace Walter's, because speed matters very much here.
>
> I tried to optimize the code wherever that didn't complicate the design, and more optimizations will be added later.
>
> It would be interesting to have benchmarks for comparing performance, but I don't plan to branch and edit the DMD front end to prepare it for benchmarking (any time soon).


Ever thought of asking the VisualD developer to integrate your library into his IDE extension? Might be cool to do so because of extended completion abilities etc. (lol I'm the Mono-D dev -- but why not? ;D)