May 11, 2012
DCT: D compiler as a collection of libraries
There were several discussions about the need for a D compiler 
library.

I propose my draft implementation of a lexer for community review:
https://github.com/roman-d-boiko/dct

The lexer is based on Brian Schott's project 
https://github.com/Hackerpilot/Dscanner, but it has been 
refactored and extended (and more changes are on the way).

The goal is to have source code loading, lexer, parser and 
semantic analysis available as parts of Phobos. These libraries 
should be designed to be usable in multiple scenarios (e.g., 
refactoring, code analysis, etc.).

My commitment is to have at least the front end built this year 
(conforming to the D2 specification unless explicitly stated 
otherwise for some particular aspect).

Please post any feedback here. A dedicated project website will 
be created later.
May 11, 2012
Re: DCT: D compiler as a collection of libraries
On 2012-05-11 10:01, Roman D. Boiko wrote:
> There were several discussions about the need for a D compiler library.
> [...]

(Re-posting here)

A couple of questions:

* What's the state of the lexer
* Does it convert numerical literals and similar to their actual values
* Does it retain full source information
* Is there an example we can look at to see how the API is used
* Does it have a range based interface

-- 
/Jacob Carlborg
May 11, 2012
Re: DCT: D compiler as a collection of libraries
On Friday, 11 May 2012 at 08:38:36 UTC, Jacob Carlborg wrote:
> (Re-posting here)
> A couple of questions:
>
> * What's the state of the lexer
I consider it to be in a draft state: it has gone through several
rewrites recently and I plan to do more, especially based on
community feedback. However, the implementation handles almost
all possible cases. Because of the rewrites it is most likely
broken at the moment; I'm going to fix it ASAP (in a day or two).

The lexer will provide a random-access range of tokens (this is
not done yet).

Each token contains:
* start index (position in the original encoding, 0 corresponds
to the first code unit after BOM),
* token value encoded as UTF-8 string,
* token kind (e.g., token.kind = TokenKind.Float),
* possibly enum with annotations (e.g., token.annotations =
FloatAnnotation.Hex | FloatAnnotation.Real)
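To make the shape concrete, here is a rough sketch in D of what such a token might look like. All names below are hypothetical, inferred from the list above rather than taken from the actual DCT source:

```d
import std.stdio : writeln;

// Hypothetical declarations, inferred from the description above --
// not the actual DCT source.
enum TokenKind { Identifier, Float }

enum FloatAnnotation : ubyte
{
    None = 0,
    Hex  = 1 << 0,
    Real = 1 << 1,
}

struct Token
{
    size_t startIndex; // code-unit index in the original encoding
    string value;      // token text, re-encoded as UTF-8
    TokenKind kind;
    ubyte annotations; // e.g. FloatAnnotation.Hex | FloatAnnotation.Real
}

void main()
{
    auto tok = Token(42, "0x1.8p3", TokenKind.Float,
                     FloatAnnotation.Hex | FloatAnnotation.Real);
    assert(tok.annotations & FloatAnnotation.Hex);
    writeln(tok.kind, " at index ", tok.startIndex);
}
```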

> * Does it convert numerical literals and similar to their 
> actual values
It is planned to add a post-processor for that as part of the
parser; please see README.md for more details.

> * Does it retain full source information
Yes, this is a design choice to preserve all information. Source
code is converted to UTF-8 and stored as token.value, even
whitespace. Information about code unit indices in the original
encoding is preserved, too.

> * Is there an example we can look at to see how the API is used
TBD soon (see Roadmap in the readme file)

> * Does it have a range based interface
Yes, this is what I consider one of its strengths.
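Since it is range-based, standard Phobos algorithms should compose directly with the token stream. A minimal sketch of what usage could look like; the Token type and field names here are illustrative (not the actual DCT API), and a plain array stands in for the lexer's output:

```d
import std.algorithm.iteration : filter;
import std.range : isRandomAccessRange;

// Illustrative stand-ins: a minimal Token and a plain array playing
// the role of the lexer's (planned) random-access token range.
enum TokenKind { Identifier, Float }
struct Token { TokenKind kind; string value; }

void main()
{
    Token[] tokens = [Token(TokenKind.Identifier, "x"),
                      Token(TokenKind.Float, "1.5")];
    static assert(isRandomAccessRange!(Token[]));

    // Phobos range algorithms compose directly with the token stream:
    auto floats = tokens.filter!(t => t.kind == TokenKind.Float);
    assert(!floats.empty && floats.front.value == "1.5");

    // Random access gives cheap lookahead, handy for a parser:
    assert(tokens[1].kind == TokenKind.Float);
}
```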
May 11, 2012
Re: DCT: D compiler as a collection of libraries
On 2012-05-11 10:01, Roman D. Boiko wrote:
> There were several discussions about the need for a D compiler library.
> [...]

I think that the end goal of a project like this, putting a D frontend 
in Phobos, should be for the compiler itself to be built using this 
library. That would keep the compiler and the library always in sync, 
with the same behavior. Otherwise this would easily become just another 
tool that tries to lex and parse D code, always out of sync with the 
compiler and not behaving the same way.

For this to happen, for Walter to start using this, I think there would 
be a greater chance if the frontend were a port of the DMD frontend and 
not changed too much.

-- 
/Jacob Carlborg
May 11, 2012
Re: DCT: D compiler as a collection of libraries
On 2012-05-11 10:58, Roman D. Boiko wrote:
> On Friday, 11 May 2012 at 08:38:36 UTC, Jacob Carlborg wrote:
>> (Re-posting here)
>> A couple of questions:
>>
>> * What's the state of the lexer
> I consider it a draft state, because it has got several rewrites
> recently and I plan to do more, especially based on community
> feedback. However, implementation handles almost all possible
> cases. Because of rewrites it is most likely broken at this
> moment, I'm going to fix it ASAP (in a day or two).

I see.

> Lexer will provide a random-access range of tokens (this is not
> done yet).

Ok.

> Each token contains:
> * start index (position in the original encoding, 0 corresponds
> to the first code unit after BOM),
> * token value encoded as UTF-8 string,
> * token kind (e.g., token.kind = TokenKind.Float),
> * possibly enum with annotations (e.g., token.annotations =
> FloatAnnotation.Hex | FloatAnnotation.Real)

What about line and column information?

>> * Does it convert numerical literals and similar to their actual values
> It is planned to add a post-processor for that as part of parser,
> please see README.md for some more details.

Isn't that a job for the lexer?

>> * Does it retain full source information
> Yes, this is a design choice to preserve all information. Source
> code is converted to UTF-8 and stored as token.value, even
> whitespaces. Information about code unit indices in the original
> encoding is preserved, too.

That's sounds good.

>> * Is there an example we can look at to see how the API is used
> TBD soon (see Roadmap in the readme file)
>
>> * Does it have a range based interface
> Yes, this is what I consider one of its strengths.

I see. Thanks.

-- 
/Jacob Carlborg
May 11, 2012
Re: DCT: D compiler as a collection of libraries
Am 11.05.2012 10:01, schrieb Roman D. Boiko:
> There were several discussions about the need for a D compiler library.
> [...]

Does the parser/lexer allow half-finished syntax to be parsed, so it 
can be used in an IDE for syntax highlighting while coding?
May 11, 2012
Re: DCT: D compiler as a collection of libraries
Am 11.05.2012 11:02, schrieb Jacob Carlborg:
> For this to happen, for Walter to start using this, I think there would
> be a greater change if the frontend was a port of the DMD frontend and
> not changed too much.

or a pure D version of it with these features:

- very fast parsing/lexing; there needs to be a benchmark environment 
from the very start

- easy to extend and fix

I think that is what Walter wants from a D-ish frontend in the first 
place
May 11, 2012
Re: DCT: D compiler as a collection of libraries
On Friday, 11 May 2012 at 09:08:24 UTC, Jacob Carlborg wrote:
> On 2012-05-11 10:58, Roman D. Boiko wrote:
>> Each token contains:
>> * start index (position in the original encoding, 0 corresponds
>> to the first code unit after BOM),
>> * token value encoded as UTF-8 string,
>> * token kind (e.g., token.kind = TokenKind.Float),
>> * possibly enum with annotations (e.g., token.annotations =
>> FloatAnnotation.Hex | FloatAnnotation.Real)
>
> What about line and column information?
Indices of the first code unit of each line are stored inside the
lexer, and a function will compute a Location (line number, column
number, file specification) for any index. This keeps the size of
a Token instance to a minimum. The assumption is that a Location
can be computed on demand and is not needed frequently, so the
column is calculated by walking backwards to the previous end of
line, etc. It will be possible to calculate locations either
taking special token sequences (e.g., #line 3 "ab/c.d") into
account or ignoring them.
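To illustrate the idea (this is only a sketch of the approach described above, not the DCT implementation): the lexer keeps a sorted table of line-start indices, a binary search recovers the line, and the column follows by subtraction:

```d
import std.range : assumeSorted;

// Sketch of the approach (illustrative, not the DCT implementation).
struct Location { size_t line; size_t column; } // both 1-based

// lineStarts: sorted indices of the first code unit of each line.
Location locate(const size_t[] lineStarts, size_t index)
{
    // Binary search for the last line start <= index.
    auto before = lineStarts.assumeSorted.lowerBound(index + 1);
    immutable line = before.length;                  // 1-based line
    immutable column = index - lineStarts[line - 1] + 1;
    return Location(line, column);
}

void main()
{
    // Line starts for "ab\ncd\nef": lines begin at indices 0, 3 and 6.
    auto starts = [size_t(0), 3, 6];
    auto loc = locate(starts, 4); // the 'd' on the second line
    assert(loc.line == 2 && loc.column == 2);
}
```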

>>> * Does it convert numerical literals and similar to their 
>>> actual values
>> It is planned to add a post-processor for that as part of 
>> parser,
>> please see README.md for some more details.
>
> Isn't that a job for the lexer?
That might be done in the lexer for efficiency reasons (to avoid
lexing the token value again). But separating this into a
dedicated post-processing phase leads to a much cleaner design
(IMO), and also suits uses where such values are not needed. I
also don't think performance would improve, given the ratio of
literals to the total number of tokens and the need to store
additional information per token if it were done in the lexer. I
will elaborate on that later.
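A minimal sketch of that separation, with hypothetical names (not the DCT API): the lexer stores only the raw slice, and a separate on-demand pass converts it for consumers that actually need the value:

```d
import std.conv : to;

// Hypothetical names, showing the separation argued for above:
// the lexer keeps only the raw token text.
enum TokenKind { Identifier, Float }
struct Token { TokenKind kind; string value; }

// A later, optional pass converts the text when a value is needed:
double floatValue(in Token tok)
{
    assert(tok.kind == TokenKind.Float);
    return tok.value.to!double; // re-reads the stored slice on demand
}

void main()
{
    auto tok = Token(TokenKind.Float, "1.25");
    assert(floatValue(tok) == 1.25);
}
```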
May 11, 2012
Re: DCT: D compiler as a collection of libraries
On Friday, 11 May 2012 at 09:19:07 UTC, dennis luehring wrote:
> does the parser/lexer allow half-finished syntax parsing? for 
> being useable in an IDE for syntax-highlighting while coding?
That's planned, but I would like to see your usage scenarios 
(pseudo-code would help a lot).
May 11, 2012
Re: DCT: D compiler as a collection of libraries
Am 11.05.2012 11:23, schrieb Roman D. Boiko:
> On Friday, 11 May 2012 at 09:19:07 UTC, dennis luehring wrote:
>>  does the parser/lexer allow half-finished syntax parsing? for
>>  being useable in an IDE for syntax-highlighting while coding?
> That's planned, but I would like to see your usage scenarios
> (pseudo-code would help a lot).
>

Trying to syntax-highlight while coding, that's the scenario: parts
of the source code aren't fully valid while you're writing.