Thread overview
DCT: D compiler as a collection of libraries
May 11, 2012
Roman D. Boiko
May 11, 2012
There were several discussions about the need for a D compiler library.

I propose my draft implementation of a lexer for community review:
https://github.com/roman-d-boiko/dct

The lexer is based on Brian Schott's project https://github.com/Hackerpilot/Dscanner, but it has been refactored and extended (and more changes are on the way).

The goal is to have source code loading, lexer, parser and semantic analysis available as parts of Phobos. These libraries should be designed to be usable in multiple scenarios (e.g., refactoring, code analysis, etc.).

My commitment is to have at least the front end built this year (conforming to the D2 specification unless explicitly stated otherwise for some particular aspect).

Please post any feedback here. A dedicated project website will be created later.

May 11, 2012
On 2012-05-11 10:01, Roman D. Boiko wrote:
> There were several discussions about the need for a D compiler library.
>
> I propose my draft implementation of lexer for community review:
> https://github.com/roman-d-boiko/dct
>
> Lexer is based on Brian Schott's project
> https://github.com/Hackerpilot/Dscanner, but it has been refactored and
> extended (and more changes are on the way).
>
> The goal is to have source code loading, lexer, parser and semantic
> analysis available as parts of Phobos. These libraries should be
> designed to be usable in multiple scenarios (e.g., refactoring, code
> analysis, etc.).
>
> My commitment is to have at least front end built this year (and
> conforming to the D2 specification unless explicitly stated otherwise
> for some particular aspect).
>
> Please post any feedback here. A dedicated project website will be created
> later.
>

(Re-posting here)

A couple of questions:

* What's the state of the lexer
* Does it convert numerical literals and similar to their actual values
* Does it retain full source information
* Is there an example we can look at to see how the API is used
* Does it have a range based interface

-- 
/Jacob Carlborg
May 11, 2012
On Friday, 11 May 2012 at 08:38:36 UTC, Jacob Carlborg wrote:
> (Re-posting here)
> A couple of questions:
>
> * What's the state of the lexer
I consider it to be in a draft state, because it has gone through
several rewrites recently and I plan to do more, especially based on
community feedback. However, the implementation handles almost all
possible cases. Because of the rewrites it is most likely broken at
the moment; I'm going to fix it ASAP (in a day or two).

The lexer will provide a random-access range of tokens (this is not
done yet).

Each token contains:
* start index (position in the original encoding, 0 corresponds
to the first code unit after BOM),
* token value encoded as UTF-8 string,
* token kind (e.g., token.kind = TokenKind.Float),
* possibly enum with annotations (e.g., token.annotations =
FloatAnnotation.Hex | FloatAnnotation.Real)
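A rough sketch of such a token in D (all names here are illustrative, not the actual DCT API):

```d
// Illustrative token kinds and annotation flags (hypothetical names).
enum TokenKind : ubyte { Identifier, Float, StringLiteral /* ... */ }

enum FloatAnnotation : uint { None = 0, Hex = 1 << 0, Real = 1 << 1 }

struct Token
{
    size_t startIndex; // index in the original encoding; 0 is the first code unit after the BOM
    string value;      // token text re-encoded as UTF-8 (whitespace tokens included)
    TokenKind kind;    // e.g. TokenKind.Float
    uint annotations;  // e.g. FloatAnnotation.Hex | FloatAnnotation.Real
}
```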

> * Does it convert numerical literals and similar to their actual values
It is planned to add a post-processor for that as part of the parser;
please see README.md for more details.

> * Does it retain full source information
Yes, this is a design choice to preserve all information. Source
code is converted to UTF-8 and stored as token.value, even
whitespace. Information about code unit indices in the original
encoding is preserved, too.

> * Is there an example we can look at to see how the API is used
TBD soon (see Roadmap in the readme file)

> * Does it have a range based interface
Yes, this is what I consider one of its strengths.
May 11, 2012
On 2012-05-11 10:01, Roman D. Boiko wrote:
> There were several discussions about the need for a D compiler library.
>
> I propose my draft implementation of lexer for community review:
> https://github.com/roman-d-boiko/dct
>
> Lexer is based on Brian Schott's project
> https://github.com/Hackerpilot/Dscanner, but it has been refactored and
> extended (and more changes are on the way).
>
> The goal is to have source code loading, lexer, parser and semantic
> analysis available as parts of Phobos. These libraries should be
> designed to be usable in multiple scenarios (e.g., refactoring, code
> analysis, etc.).
>
> My commitment is to have at least front end built this year (and
> conforming to the D2 specification unless explicitly stated otherwise
> for some particular aspect).
>
> Please post any feedback here. A dedicated project website will be created
> later.

I think that the end goal of a project like this, putting a D frontend in Phobos, should be that the compiler itself is built using this library. This would result in the compiler and the library always being in sync and having the same behavior. Otherwise this could easily become just another tool that tries to lex and parse D code, always out of sync with the compiler and not having the same behavior.

For this to happen, for Walter to start using this, I think there would be a greater chance if the frontend were a port of the DMD frontend and not changed too much.

-- 
/Jacob Carlborg
May 11, 2012
On 2012-05-11 10:58, Roman D. Boiko wrote:
> On Friday, 11 May 2012 at 08:38:36 UTC, Jacob Carlborg wrote:
>> (Re-posting here)
>> A couple of questions:
>>
>> * What's the state of the lexer
> I consider it to be in a draft state, because it has gone through
> several rewrites recently and I plan to do more, especially based on
> community feedback. However, the implementation handles almost all
> possible cases. Because of the rewrites it is most likely broken at
> the moment; I'm going to fix it ASAP (in a day or two).

I see.

> The lexer will provide a random-access range of tokens (this is not
> done yet).

Ok.

> Each token contains:
> * start index (position in the original encoding, 0 corresponds
> to the first code unit after BOM),
> * token value encoded as UTF-8 string,
> * token kind (e.g., token.kind = TokenKind.Float),
> * possibly enum with annotations (e.g., token.annotations =
> FloatAnnotation.Hex | FloatAnnotation.Real)

What about line and column information?

>> * Does it convert numerical literals and similar to their actual values
> It is planned to add a post-processor for that as part of parser,
> please see README.md for some more details.

Isn't that a job for the lexer?

>> * Does it retain full source information
> Yes, this is a design choice to preserve all information. Source
> code is converted to UTF-8 and stored as token.value, even
> whitespaces. Information about code unit indices in the original
> encoding is preserved, too.

That sounds good.

>> * Is there an example we can look at to see how the API is used
> TBD soon (see Roadmap in the readme file)
>
>> * Does it have a range based interface
> Yes, this is what I consider one of its strengths.

I see. Thanks.

-- 
/Jacob Carlborg
May 11, 2012
Am 11.05.2012 10:01, schrieb Roman D. Boiko:
> There were several discussions about the need for a D compiler
> library.
>
> I propose my draft implementation of lexer for community review:
> https://github.com/roman-d-boiko/dct
>
> Lexer is based on Brian Schott's project
> https://github.com/Hackerpilot/Dscanner, but it has been
> refactored and extended (and more changes are on the way).
>
> The goal is to have source code loading, lexer, parser and
> semantic analysis available as parts of Phobos. These libraries
> should be designed to be usable in multiple scenarios (e.g.,
> refactoring, code analysis, etc.).
>
> My commitment is to have at least front end built this year (and
> conforming to the D2 specification unless explicitly stated
> otherwise for some particular aspect).
>
> Please post any feedback here. A dedicated project website will be
> created later.
>

Does the parser/lexer allow parsing of half-finished syntax, so that it is usable in an IDE for syntax highlighting while coding?
May 11, 2012
Am 11.05.2012 11:02, schrieb Jacob Carlborg:
> For this to happen, for Walter to start using this, I think there would
> be a greater change if the frontend was a port of the DMD frontend and
> not changed too much.

Or a pure D version of it with these features:

- very fast parsing/lexing; there needs to be a benchmark environment from the very start

- easy to extend and fix

I think that is what Walter wants from a D-ish frontend in the first place.


May 11, 2012
On Friday, 11 May 2012 at 09:08:24 UTC, Jacob Carlborg wrote:
> On 2012-05-11 10:58, Roman D. Boiko wrote:
>> Each token contains:
>> * start index (position in the original encoding, 0 corresponds
>> to the first code unit after BOM),
>> * token value encoded as UTF-8 string,
>> * token kind (e.g., token.kind = TokenKind.Float),
>> * possibly enum with annotations (e.g., token.annotations =
>> FloatAnnotation.Hex | FloatAnnotation.Real)
>
> What about line and column information?
Indices of the first code unit of each line are stored inside the lexer, and a function will compute a Location (line number, column number, file specification) for any index. This way the size of a Token instance is reduced to the minimum. It is assumed that a Location can be computed on demand and is not needed frequently, so the column is calculated by a reverse walk to the previous end of line, etc. It will be possible to calculate locations either taking special token sequences into account (e.g., #line 3 "ab/c.d") or discarding them.
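A minimal sketch of that lookup in D (illustrative names, not the DCT API; this variant uses a binary search over the stored line-start indices rather than a literal reverse walk):

```d
import std.range : assumeSorted;

struct Location { size_t line; size_t column; string file; }

// lineStarts holds the index of the first code unit of each line,
// in ascending order (line 1 starts at index 0).
Location locate(const(size_t)[] lineStarts, size_t index, string file)
{
    // Binary search for the last line start that is <= index.
    auto starts = assumeSorted(lineStarts);
    size_t line = starts.lowerBound(index + 1).length; // 1-based line number
    size_t column = index - lineStarts[line - 1] + 1;  // 1-based column, in code units
    return Location(line, column, file);
}
```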

>>> * Does it convert numerical literals and similar to their actual values
>> It is planned to add a post-processor for that as part of parser,
>> please see README.md for some more details.
>
> Isn't that a job for the lexer?
That might be done in the lexer for efficiency reasons (to avoid lexing the token value again). But separating this into a dedicated post-processing phase leads to a much cleaner design (IMO), one also suitable for uses where such values are not needed. I also don't think performance would improve, given the ratio of the number of literals to the total number of tokens and the need to store additional information per token if it were done in the lexer. I will elaborate on that later.
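For illustration, such a post-processing helper could be as simple as the following sketch (hypothetical name; it handles only plain decimal literals and ignores suffixes and hexadecimal floats):

```d
import std.array : replace;
import std.conv : to;

// Hypothetical post-processing helper: convert the UTF-8 text of a
// decimal Float token into its value. D literals may contain '_'
// separators, which std.conv does not accept, so strip them first.
double floatValue(string tokenText)
{
    return tokenText.replace("_", "").to!double;
}
```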



May 11, 2012
On Friday, 11 May 2012 at 09:19:07 UTC, dennis luehring wrote:
> Does the parser/lexer allow parsing of half-finished syntax, so that it is usable in an IDE for syntax highlighting while coding?
That's planned, but I would like to see your usage scenarios (pseudo-code would help a lot).

May 11, 2012
Am 11.05.2012 11:23, schrieb Roman D. Boiko:
> On Friday, 11 May 2012 at 09:19:07 UTC, dennis luehring wrote:
>>  Does the parser/lexer allow parsing of half-finished syntax, so
>>  that it is usable in an IDE for syntax highlighting while coding?
> That's planned, but I would like to see your usage scenarios
> (pseudo-code would help a lot).
>

Try to syntax-highlight while coding - that's the scenario: parts
of the source code aren't fully valid while writing.
