Tasks Accomplished
Learning AST Nodes and Semantic Analysis in Compiler Design:
Before delving into how to decouple AST nodes from semantic functions, I looked at how compilers work in general and the processes involved.
A typical compiler works this way:
Character Stream => |Lexer| => Tokens => |Parser| => AST => |Semantic Routines| => Intermediate Representation (Optimization) => |Code Generator| => Assembly Code
Character stream: the source code, i.e. the input the programmer wrote.
Lexer/scanner: lexing (lexical analysis) is the process of breaking a string of characters into meaningful units called tokens.
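To make the lexer stage concrete, here is a minimal sketch in C++ (chosen because dmd's AST classes are declared extern (C++) and the visitor resource listed below is written in C++). It assumes a made-up toy language with identifiers, integer literals and single-character symbols; the TokKind and Token names are hypothetical and unrelated to dmd's own lexer.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical token kinds for a toy language (not dmd's token enum).
enum class TokKind { Identifier, Number, Symbol, End };

struct Token {
    TokKind kind;
    std::string text;  // the characters that make up the token
};

// Breaks a character stream into tokens: identifiers, integer literals and
// single-character symbols. Whitespace separates tokens and is discarded.
std::vector<Token> lex(const std::string& src) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) {
            ++i;
        } else if (std::isalpha(c) || c == '_') {
            std::size_t start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_'))
                ++i;
            tokens.push_back({TokKind::Identifier, src.substr(start, i - start)});
        } else if (std::isdigit(c)) {
            std::size_t start = i;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i])))
                ++i;
            tokens.push_back({TokKind::Number, src.substr(start, i - start)});
        } else {
            tokens.push_back({TokKind::Symbol, std::string(1, src[i])});
            ++i;
        }
    }
    tokens.push_back({TokKind::End, ""});  // marks the end of the input
    return tokens;
}
```

For example, lex("int x = 42;") yields six tokens: int, x, =, 42, ; and an End marker. Here int comes out as a plain identifier; a real lexer would typically give keywords their own token kinds.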
Parser: the parser takes the stream of tokens from the lexer and verifies that it can be generated by the grammar of the source language. It detects and reports syntax errors and builds a tree representation of the program from which intermediate code can later be generated.
The output of the parser is an abstract syntax tree (AST).
Abstract syntax tree (AST): the AST is like a blueprint of the code's structure. It breaks the code into smaller pieces and organizes them into a tree so that later stages of the compiler can analyze it.
An important point I learnt is that the AST contains only the information needed to analyze the program and omits extra syntactic information that was only needed for parsing the text.
In the dmd compiler codebase, AST nodes are classes and structs, while the semantic routines are functions tightly coupled to the AST classes.
I also learnt the core differences between an AST and a parse tree. In summary, an AST focuses on the essential elements and their relationships: it captures the underlying structure and meaning of the code and excludes unnecessary syntactic details. A parse tree, by contrast, captures the complete structure of the input, including every syntactic detail such as parentheses, semicolons and other language-specific constructs.
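The parser stage and the AST versus parse tree distinction can be illustrated together with a small recursive-descent parser. This is a sketch for a hypothetical grammar of single-digit numbers, +, * and parentheses, not dmd's parser. Parentheses steer how the tree is built, but no node is created for them, so (1+2)*3 and ((1+2))*3 produce the same AST.

```cpp
#include <cassert>
#include <cctype>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// AST node for the toy language. There is deliberately no node kind for
// parentheses: they only guide the parser and never appear in the tree.
struct Node {
    char op = 'n';              // '+', '*', or 'n' for a number literal
    int value = 0;              // meaningful only when op == 'n'
    std::unique_ptr<Node> lhs;  // operands of '+' and '*'
    std::unique_ptr<Node> rhs;
};

// Recursive-descent parser for the hypothetical grammar:
//   expr   := term ('+' term)*
//   term   := factor ('*' factor)*
//   factor := digit | '(' expr ')'
// Each grammar rule becomes one member function; syntax errors throw.
class Parser {
public:
    explicit Parser(std::string src) : src_(std::move(src)) {}

    std::unique_ptr<Node> parse() {
        auto tree = expr();
        if (pos_ != src_.size()) throw std::runtime_error("syntax error: trailing input");
        return tree;
    }

private:
    std::string src_;
    std::size_t pos_ = 0;

    char peek() const { return pos_ < src_.size() ? src_[pos_] : '\0'; }

    static std::unique_ptr<Node> binary(char op, std::unique_ptr<Node> l, std::unique_ptr<Node> r) {
        auto n = std::make_unique<Node>();
        n->op = op;
        n->lhs = std::move(l);
        n->rhs = std::move(r);
        return n;
    }

    std::unique_ptr<Node> expr() {
        auto n = term();
        while (peek() == '+') { ++pos_; n = binary('+', std::move(n), term()); }
        return n;
    }

    std::unique_ptr<Node> term() {
        auto n = factor();
        while (peek() == '*') { ++pos_; n = binary('*', std::move(n), factor()); }
        return n;
    }

    std::unique_ptr<Node> factor() {
        char c = peek();
        if (std::isdigit(static_cast<unsigned char>(c))) {
            ++pos_;
            auto n = std::make_unique<Node>();
            n->value = c - '0';
            return n;
        }
        if (c == '(') {
            ++pos_;
            auto inner = expr();
            if (peek() != ')') throw std::runtime_error("syntax error: expected ')'");
            ++pos_;
            return inner;  // the parentheses leave no trace in the AST
        }
        throw std::runtime_error("syntax error: unexpected character");
    }
};

// Prints the AST in prefix form, e.g. "(* (+ 1 2) 3)".
std::string dump(const Node& n) {
    if (n.op == 'n') return std::to_string(n.value);
    return std::string("(") + n.op + " " + dump(*n.lhs) + " " + dump(*n.rhs) + ")";
}
```

A parse tree for (1+2)*3 would keep nodes for the parentheses and for every grammar rule applied; the AST above keeps only the two operators and three numbers.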
A simple AST node I constructed for practice:
https://github.com/dchidindu5/test_demo/blob/main/README.md
Semantic analysis: the compilation phase in which the compiler checks whether the code is logical and meaningful, not just syntactically valid. A major part of it is type checking, which confirms that variable declarations, functions and control flow adhere to the semantics of the language.
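Type checking can be sketched as a recursive walk over the AST that computes a type for each node and records an error when an operand does not fit. The three-type language, the Kind and Expr names and the error messages below are all illustrative assumptions; dmd's semantic analysis is far more involved.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Types of the toy language; Error marks an expression that failed to check.
enum class Type { Int, Bool, Error };

// Node kinds. Literal values are omitted because only their types matter here.
enum class Kind { IntLit, BoolLit, Add, Cond };

struct Expr;
using ExprPtr = std::shared_ptr<Expr>;

struct Expr {
    Kind kind;
    std::vector<ExprPtr> operands;  // Add: lhs, rhs; Cond: condition, then, else
};

ExprPtr node(Kind k, std::vector<ExprPtr> ops = {}) {
    return std::make_shared<Expr>(Expr{k, std::move(ops)});
}

// Walks the AST bottom-up, computing a type per node and collecting errors.
class TypeChecker {
public:
    std::vector<std::string> errors;

    Type check(const Expr& e) {
        switch (e.kind) {
        case Kind::IntLit:
            return Type::Int;
        case Kind::BoolLit:
            return Type::Bool;
        case Kind::Add: {
            Type l = check(*e.operands[0]);
            Type r = check(*e.operands[1]);
            if (l == Type::Error || r == Type::Error) return Type::Error;  // already reported
            if (l != Type::Int || r != Type::Int) {
                errors.push_back("operands of '+' must be int");
                return Type::Error;
            }
            return Type::Int;
        }
        case Kind::Cond: {
            Type c = check(*e.operands[0]);
            Type t = check(*e.operands[1]);
            Type f = check(*e.operands[2]);
            if (c == Type::Error || t == Type::Error || f == Type::Error) return Type::Error;
            if (c != Type::Bool) {
                errors.push_back("condition must be bool");
                return Type::Error;
            }
            if (t != f) {
                errors.push_back("both branches must have the same type");
                return Type::Error;
            }
            return t;
        }
        }
        return Type::Error;  // not reached for the kinds above
    }
};
```

Returning Type::Error for an already-broken subexpression without adding a new message keeps one mistake from producing a cascade of follow-on errors.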
The stages covered so far (lexing, parsing and semantic analysis) make up the front end of the dmd compiler.
- To understand the directory layout of the dmd codebase, I used the following guide, which outlines each file and what it does: https://github.com/dlang/dmd/blob/master/compiler/src/dmd/README.md
- Looked into each of the files I would be working on.
Initial Refactoring of DMD AST
- Chose the attrib.d AST node file as recommended by my mentor
- Examined the imports and commented out import dmd.dsymbolsem, which is a semantic import.
- Rebuilt the compiler, which produced errors.
- Read the error messages and moved the affected semantic functions into dsymbolsem.d, a semantic analysis module.
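- The affected functions included newScope, which I converted into a visitor. The Visitor is a design pattern that separates an operation from the classes it operates on, so the semantic code can live outside the AST node classes.
- I had trouble mastering the pattern, so my mentor sent me an earlier commit, "Extract dsymbol.Dsymbol.importAll and turn it into a visitor", as a reference, and I applied the same approach to the newScope function.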
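The refactoring idea can be sketched in C++ as follows. The node classes only know how to accept a visitor, while the semantic operation lives in a separate visitor class wrapped by a free function, so the AST module no longer needs to import the semantic module. Dsymbol, AttribDeclaration, Scope, NewScopeVisitor and newScope are simplified stand-ins named after the dmd concepts; their real signatures and behavior in dmd differ, and the "depth + 1" rule is invented purely for illustration.

```cpp
#include <cassert>

// ---------- AST module: node classes, no semantic code ----------
// Simplified stand-ins named after dmd concepts (hypothetical, not dmd's API).
class Visitor;

class Dsymbol {
public:
    virtual ~Dsymbol() = default;
    virtual void accept(Visitor& v);
};

class AttribDeclaration : public Dsymbol {
public:
    void accept(Visitor& v) override;
};

// One visit overload per node class. By default a subclass's overload
// falls back to its base class's overload.
class Visitor {
public:
    virtual ~Visitor() = default;
    virtual void visit(Dsymbol*) {}
    virtual void visit(AttribDeclaration* d) { visit(static_cast<Dsymbol*>(d)); }
};

// Double dispatch: the virtual accept() selects the node class, the static
// type of `this` selects the overload, and the virtual visit() selects the visitor.
void Dsymbol::accept(Visitor& v) { v.visit(this); }
void AttribDeclaration::accept(Visitor& v) { v.visit(this); }

// ---------- Semantic module: the operation lives here ----------
struct Scope { int depth = 0; };  // hypothetical, greatly simplified

class NewScopeVisitor : public Visitor {
    Scope sc_;
public:
    Scope result;  // visit() returns void, so the answer is stored here
    explicit NewScopeVisitor(const Scope& sc) : sc_(sc), result(sc) {}

    void visit(Dsymbol*) override { result = sc_; }  // default: keep the scope
    void visit(AttribDeclaration*) override { result = Scope{sc_.depth + 1}; }  // illustrative rule
};

// Free function replacing what used to be a member function on the node.
Scope newScope(Dsymbol& d, const Scope& sc) {
    NewScopeVisitor v(sc);
    d.accept(v);
    return v.result;
}
```

Callers that previously wrote something like node.newScope(sc) would instead call the free function newScope(node, sc), and only the semantic module has to know about NewScopeVisitor.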
First error encountered:
src/dmd/dsymbolsem.d(7494): Error: function `extern (C++) Scope* dmd.dsymbolsem.newScopeVisitor.visit(Scope* sc)` does not override any function, did you mean to override alias `dmd.visitor.Visitor.visit`?
src/dmd/dsymbolsem.d(7494): Functions are the only declarations that may be overridden
First commit: https://github.com/dlang/dmd/commit/c01f76b25b4eb210d92d0ab858dd025ee72bfc6a
Solution
My mentor helped me discover that the method signature in newScopeVisitor did not exactly match any visit method declared in the base class Visitor. To override a method, the derived method must have exactly the same name, return type and parameter types.
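I fixed it by using the exact name and parameter type from the base class, with a void return type, because the base class's virtual visit methods do not return a value. Any result the visitor computes therefore has to be stored elsewhere, for example in a field of the visitor.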
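The sketch below (C++, with hypothetical class names) shows why an exact match matters. Because the D method was marked override, dmd rejected the mismatch at compile time. In C++ the same mistake compiles silently if override is omitted: the mismatched method is just an unrelated overload, and calls made through the base Visitor never reach it.

```cpp
#include <cassert>
#include <string>

struct Scope { int depth = 0; };  // hypothetical, greatly simplified
class AttribDeclaration;

// Base visitor: every override must match this signature exactly.
class Visitor {
public:
    virtual ~Visitor() = default;
    virtual void visit(AttribDeclaration*) { lastCalled = "Visitor::visit"; }
    std::string lastCalled;  // records which method actually ran
};

class AttribDeclaration {
public:
    void accept(Visitor& v) { v.visit(this); }
};

// Mismatched parameter and return type: this is a new, unrelated overload,
// not an override. Writing `override` here would be a compile-time error.
class BrokenVisitor : public Visitor {
public:
    Scope* visit(Scope* sc) { lastCalled = "BrokenVisitor::visit"; return sc; }
};

// Fixed: same name, same parameter type, void return.
// The computed scope is stored in a member instead of being returned.
class FixedVisitor : public Visitor {
    Scope sc_;
public:
    Scope result;
    explicit FixedVisitor(const Scope& sc) : sc_(sc), result(sc) {}
    void visit(AttribDeclaration*) override {
        lastCalled = "FixedVisitor::visit";
        result = Scope{sc_.depth + 1};
    }
};
```

Adding override to BrokenVisitor::visit would make the C++ compiler reject it as well, which is why marking overrides explicitly is worthwhile.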
Challenges
I am still refactoring the code and working through new errors.
Current commit:
https://github.com/dlang/dmd/compare/master...dchidindu5:dmd:practice1?expand=1
https://github.com/dlang/dmd/commit/36489c94755a502f7141168ed6e006ef95339062
Summary:
This week was focused on building a strong theoretical foundation in compiler design, particularly around AST nodes and semantic analysis, while also getting acquainted with the practical aspects of contributing to the DMD compiler project.
Resources:
AST
https://medium.com/basecs/leveling-up-ones-parsing-game-with-asts-d7a6fc2400ff
https://pgrandinetti.github.io/compilers/page/what-is-semantic-analysis-in-compilers/
Visitors
https://www.geeksforgeeks.org/visitor-method-design-patterns-in-c/
D language book
http://ddili.org/ders/d.en/index.html