Adela Vais - SAOC Milestone 3 Update 4 - Dlang GLR Parser for GNU Bison (page 2)

On Friday, 18 December 2020 at 21:03:51 UTC, Adela Vais wrote:
>>>>>[...]

Hello!

As of the last update:

- I added version identifiers for running the internationalization code. The D backend uses functions from libintl.h, which is a non-standard C header, so Bison cannot use it by default; the user chooses whether to install its prerequisites. [1]

- I created a dub package [2] to make it easier to import the libintl.h functions in Bison and other projects. I also closed the druntime PR [3].

- All the patches from last week were accepted, and I submitted 3 additional ones:
    * I removed the getter methods for the semantic value and positions from the Lexer interface, now unnecessary because of the complete symbol approach [4];
    * I modified the backend to use the aliases for location, position, and semantic value throughout [5];
    * I removed a comparison inside the parser(), unnecessary after I removed the variable for the external token from this method [6].

The plan for next week:
- I will use the libintl dub package with Bison.
- I will modify the error parsing by eliminating "verbose", a backward compatibility option not needed in D. At the moment, the "verbose" and "detailed" options generate the same code. With this change, I will also restructure the way the SymbolKind (the internal types) names are handled when generating the error messages.
- Continue working on the push parser.

During the next milestone of #SAOC2020 I decided to focus on writing the last remaining parts of the LALR1 and to postpone the work on the GLR. While I will not make significant progress on the GLR itself, the 2 parsers share the same user interface, so a lot of my work on the LALR1 will carry over to the GLR.

[1]: https://github.com/adelavais/bison/commits/fix-i18n
[2]: https://code.dlang.org/packages/libintl
[3]: https://github.com/dlang/druntime/pull/3300
[4]: https://github.com/akimd/bison/commit/32bb53870bb9caa2f8de081fdb53cb3540c8ce7a
[5]: https://github.com/akimd/bison/commit/27109d9d4ac11665612119344141df0b9f440fbb
[6]: https://github.com/akimd/bison/commit/2b4451c4afb8ed90795f8bb7b198996143d769c9
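
The opt-in internationalization described in this update can be illustrated with a small sketch. The D backend does this with version identifiers; the version below shows the same idea in C++ using the preprocessor, to match the C++ parser discussed later in the thread. The flag SKETCH_ENABLE_NLS, the helper names translate and syntaxError, and the "bison-runtime" domain name are assumptions made for illustration, not the code from the patch.

```cpp
#include <cassert>
#include <string>

// SKETCH_ENABLE_NLS is a hypothetical build flag standing in for the
// D version identifier; define it only when libintl is installed.
#ifdef SKETCH_ENABLE_NLS
#  include <libintl.h>
// Look the message up in a translation catalog. The domain name is
// illustrative.
inline const char* translate(const char* msg)
{
    assert(msg != nullptr);
    return dgettext("bison-runtime", msg);
}
#else
// Default build: no dependency on libintl, messages pass through unchanged.
inline const char* translate(const char* msg)
{
    assert(msg != nullptr);
    return msg;
}
#endif

// Hypothetical helper showing how an error message would be assembled.
inline std::string syntaxError(const std::string& unexpected)
{
    return std::string(translate("syntax error, unexpected ")) + unexpected;
}
```

Built without the flag, syntaxError("NUM") returns the untranslated English text, so a user who never installs libintl is unaffected.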

On Tuesday, 22 December 2020 at 21:12:02 UTC, Adela Vais wrote:
>>>>>>[...]

Hello!

As of the last update:

- I added internationalisation to the LALR1 parser. [1]

- Starting from the code generated by the calc example, I modified it to use a push parser.

- I removed a name parsing function used for error messages. It was a backward-compatible feature for the other parsers, but D does not have to use it. [2]

- I removed some imports that were unnecessary after I added the internationalization. [3]

- I removed support for the error parsing option 'verbose'. This is another backward-compatible feature for the other parsers. Before the removal, 'detailed' and 'verbose' options generated the same output. [4]

- I created a way of reporting the number of errors found by the parser. [5]

- I submitted a patch for fixing a test function. If lookahead correction with trace debugging was used, then the output was getting mixed up with the error message reporting. [6]

- I started working on a way to introduce in the calc example the std.conv.parse enhancement I made. [7]

The plan for next week:
- Continue working on the unfinished tasks from above. For the push parser, the next step will be to see how the push parser works in a program using lookahead correction.

[1]: https://github.com/akimd/bison/commit/594cae57ca63fc7b3f18dad3d6472e043c626df0
[2]: https://github.com/akimd/bison/commit/5bac3ddcee7c6eebb4833d1954f614fced475073
[3]: https://github.com/akimd/bison/commit/dc8b16424a89297368dbc66c69787ed0882966f0
[4]: https://github.com/akimd/bison/commit/c13b3c02d39edd4d46480b8ee065466d8720939f
[5]: https://github.com/akimd/bison/commit/8d01c60e9c1aa5975e38602b8ffeb128833a8518
[6]: https://github.com/adelavais/bison/tree/local-test
[7]: https://github.com/adelavais/bison/tree/parse

January 16, 2021

Re: Adela Vais - SAOC Milestone 4 Update 3 - Dlang GLR Parser for GNU Bison

Posted by Adela Vais
in reply to Adela Vais

On Thursday, 7 January 2021 at 23:30:29 UTC, Adela Vais wrote:
>>>>>>>[...]

Hello!

As of the last update:

- I modified the calc example to use lookahead correction. Then, reusing last week's push-parser work on the unmodified calc example, I converted this program to a push parser as well.

- I made some small style fixes in the examples and submitted them for review along with the std.conv.parse modification started last week. The enhancement to parse was introduced in Dlang v2.095.0, so we will support both versions for the calc example: the one demonstrating the new parse feature, and the one using the old code. [1]

- I started working on token constructors. [2] In the C++ parser, this feature works only with the '%define api.value.type variant' option.

By default, the user must write a union containing all the different types of values needed for yylex(), and then use them when declaring the tokens:

%union {
  int ival;
}
[...]
%token <ival> NUM "number"

Variant allows the user to simply write:

%token <int> NUM "number"

D does not support this feature, but I plan to introduce it in the near future (not necessarily during SAOC).

From yylex(), the user must return a Symbol by calling its constructor. This constructor does not check whether the TokenKind and the value correspond in any way, so the user can return Symbol(TokenKind.NUMBER, "I am a string") and the error will be caught only much later in the program, at runtime.

As a solution to this, the C++ parser provides the option of calling the make_<token_name> function (example: make_NUMBER), which calls the Symbol constructor with the correct arguments and turns a mismatched value into a compile-time error. I want to provide this feature for D (modified to be called as Symbol.NUMBER(someNumber)).
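
To make the difference concrete, here is a minimal C++ sketch of the two styles. It is hand-written for illustration and is not the code Bison generates: the Symbol layout, the consistent() check, and numberValue are hypothetical, and only the make_NUMBER naming follows the convention mentioned above.

```cpp
#include <cassert>
#include <string>

enum class TokenKind { NUMBER, IDENT };

// Simplified stand-in for a generated symbol type (hypothetical layout).
struct Symbol
{
    TokenKind kind;
    bool holdsNumber;   // records which constructor was used
    int number;
    std::string text;

    // Unchecked constructors: nothing ties `kind` to the value type, so
    // Symbol(TokenKind::NUMBER, "I am a string") compiles without complaint.
    Symbol(TokenKind k, int v)
        : kind(k), holdsNumber(true), number(v) {}
    Symbol(TokenKind k, const std::string& s)
        : kind(k), holdsNumber(false), number(0), text(s) {}

    // True when the stored value matches what `kind` promises.
    bool consistent() const
    {
        return (kind == TokenKind::NUMBER) == holdsNumber;
    }
};

// Checked style: each token gets a function whose parameter type fixes the
// value type, so make_NUMBER("I am a string") is rejected by the compiler.
inline Symbol make_NUMBER(int value)
{
    return Symbol(TokenKind::NUMBER, value);
}

inline Symbol make_IDENT(const std::string& name)
{
    return Symbol(TokenKind::IDENT, name);
}

// With the unchecked constructors a mismatch only surfaces here, at run time.
inline int numberValue(const Symbol& s)
{
    assert(s.consistent() && s.holdsNumber);
    return s.number;
}
```

A lexer written against make_NUMBER and make_IDENT cannot build the inconsistent symbol at all, whereas the plain constructor accepts it and leaves the problem to be found when the value is read.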

The C++ parser generates the make_ functions with M4, so all the functions are put in a header file for the user. I want to limit the space occupied by these methods by generating them using D. [3] I created a version that does not use variant, as a proof of concept that this task can be done from D. But once I add support for variant, the token constructors should work without modifications.
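
The idea of letting the language produce the constructors, instead of emitting one function per token from M4, can be sketched as follows. The post does this with D's compile-time features; the version below is only a loose C++ analogue using a template and a kind-to-type table. All names (ValueOf, make, store, numberOf) are hypothetical.

```cpp
#include <cassert>
#include <string>

enum class TokenKind { NUMBER, IDENT };

// Simplified symbol type (hypothetical layout).
struct Symbol
{
    TokenKind kind;
    int number;
    std::string text;
};

// Table mapping each token kind to the type of its semantic value.
template <TokenKind K> struct ValueOf;
template <> struct ValueOf<TokenKind::NUMBER> { using type = int; };
template <> struct ValueOf<TokenKind::IDENT>  { using type = std::string; };

// Overloads that place a value in the matching field.
inline void store(Symbol& s, int v) { s.number = v; }
inline void store(Symbol& s, const std::string& v) { s.text = v; }

// One generic constructor replaces a hand-written function per token.
// The parameter type is looked up from the table, so passing a string
// for NUMBER fails to compile.
template <TokenKind K>
Symbol make(const typename ValueOf<K>::type& value)
{
    Symbol s{K, 0, std::string()};
    store(s, value);
    return s;
}

// Accessor guarded by a run-time check on the kind.
inline int numberOf(const Symbol& s)
{
    assert(s.kind == TokenKind::NUMBER);
    return s.number;
}
```

Adding a token then means adding one line to the table rather than another generated function, which is the space saving the paragraph above is after.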

The plan for the next week of #SAOC2020:
- Continue working on the above. Push-parser next steps:
    * I have to do more correctness and speed tests for both versions (with and without lookahead correction).
    * Start integrating it in the D backend.

[1]: https://github.com/adelavais/bison/tree/calc-example-fix
[2]: https://github.com/adelavais/bison/tree/token-constructors
[3]: https://github.com/adelavais/bison/blob/token-constructors/tests/testsuite.dir/582/calc.d#L544

On Saturday, 16 January 2021 at 22:09:04 UTC, Adela Vais wrote:
>>>>>>>>[...]

Hello!

- I worked with Akim Demaille (one of my mentors, Bison co-maintainer) on adding '%define api.value.type union'. He provided me with a WIP based on my work on the token constructors, and I continued it with the D code needed. Unlike C++, D allows structs, classes, etc. to be union members, so there is no need for D to implement '%define api.value.type variant'. [1]

- I worked on token-constructors. In C++, this directive works only with '%define api.value.type variant'. In D, I managed to add this feature for the default parser, which uses %union, too. For this change, I had to rewrite the Symbol's constructors. I wrote them in D, but they became too complex. In the near future, I want to rewrite them in M4, which will also make them easier to maintain. [1]

- I almost finished the push-parser. After this change, the backend will support 3 different options for this directive: pull (by default), push (which I implemented in the past month), and both (which means that the user has access to both parse functions, and the pull-parse method is a wrapper around the push one). I integrated it into the backend and it is going to be reviewed soon. [2]

Future plans:
- In the near future I will finish my work on the LALR1, which means that I will add the token-constructors and push-parser features to the backend. I also need to rewrite the Symbol's constructors in M4 and to complete the documentation.
- During the next few months I will be working on the GLR. It will not be a wrapper around the C GLR parser, as initially planned, but a stand-alone parser. From the user's side, the only difference between an LALR1 and a GLR parser is the presence of the '%glr-parser' declaration; otherwise, the user's code is identical. Given that the user APIs are the same, a lot of my work on the LALR1 will be used by the GLR, too.

[1]: https://github.com/adelavais/bison/tree/tok-constr
[2]: https://github.com/adelavais/bison/tree/push-parser
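
The "both" option, where the pull-parse method is a wrapper around the push one, can be illustrated with a toy example. This is a sketch under stated assumptions, written in C++ rather than D: SumParser, Status, pushParse, and parse are hypothetical names, and the tiny state machine for sums stands in for the generated LALR(1) automaton.

```cpp
#include <cassert>
#include <functional>

enum class TokenKind { END, NUMBER, PLUS };
struct Token { TokenKind kind; int value; };
enum class Status { Accept, Error, NeedMoreInput };

// Toy parser for inputs such as "1 + 2 + 3".
class SumParser
{
public:
    // Push interface: the caller feeds exactly one token per call and
    // keeps control of the input loop.
    Status pushParse(const Token& tok)
    {
        if (expectNumber_) {
            if (tok.kind != TokenKind::NUMBER) return Status::Error;
            sum_ += tok.value;
            expectNumber_ = false;
            return Status::NeedMoreInput;
        }
        if (tok.kind == TokenKind::PLUS) {
            expectNumber_ = true;
            return Status::NeedMoreInput;
        }
        return tok.kind == TokenKind::END ? Status::Accept : Status::Error;
    }

    // Pull interface: a thin loop that asks the lexer for tokens and
    // forwards each one to pushParse until the parse finishes.
    Status parse(const std::function<Token()>& lexer)
    {
        assert(static_cast<bool>(lexer));  // a lexer callback is required
        Status status;
        do {
            status = pushParse(lexer());
        } while (status == Status::NeedMoreInput);
        return status;
    }

    int result() const { return sum_; }

private:
    bool expectNumber_ = true;
    int sum_ = 0;
};
```

Because parse() contains no parsing logic of its own, the two entry points cannot drift apart: any fix to the push path is automatically picked up by the pull path.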
