Thread overview
Feedback needed: Complete symbol approach for Bison's D backend
Nov 14, 2020
Adela Vais
Nov 16, 2020
H. S. Teoh
Nov 18, 2020
Adela Vais
November 14, 2020
Hello!

I need some feedback about the return value of yylex() in Bison's Lexer class, which must be provided by the user.

This method should provide the Bison parser with three values: the TokenKind (which is the current return value), the semantic value, and the location (optional parameter). The last two are set in yylex(), stored in the lexer class, and retrieved by the Bison parser through getters.

The other parsers provide the option of complete symbols, which means that yylex()'s return value is changed to a structure that binds together the TokenKind, the semantic value, and the location. Internally, the structure is immediately divided into its components, which continue to be used separately throughout the parser.
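
To make the difference concrete, here is a minimal sketch of the two styles side by side. Every name in it (SplitLexer, SymbolLexer, Symbol, Location, semanticVal, location) is made up for illustration and is not the code Bison generates; the semantic value is assumed to be a plain int.

```d
import std.stdio : writeln;

// All names below are hypothetical; Bison's generated code differs.
enum TokenKind { EOF, NUM, PLUS }

struct Location { int beginColumn; int endColumn; }

// Style 1: yylex() returns only the kind. The semantic value and the
// location are stored in the lexer and fetched through getters.
class SplitLexer
{
    private int value_;
    private Location loc_;

    TokenKind yylex()
    {
        value_ = 42;            // the user must remember to set this
        loc_ = Location(1, 3);  // ...and this
        return TokenKind.NUM;
    }

    int semanticVal() { return value_; }
    Location location() { return loc_; }
}

// Style 2: a complete symbol binds the three values together.
struct Symbol
{
    TokenKind kind;
    int value;
    Location loc;
}

class SymbolLexer
{
    Symbol yylex()
    {
        // All three values are supplied in one place.
        return Symbol(TokenKind.NUM, 42, Location(1, 3));
    }
}

void main()
{
    auto a = new SplitLexer;
    TokenKind kind = a.yylex();
    writeln(kind, " ", a.semanticVal(), " ", a.location());

    auto b = new SymbolLexer;
    Symbol s = b.yylex();
    writeln(s.kind, " ", s.value, " ", s.loc);
}
```

In the second style, a forgotten value shows up at the single place where the Symbol is built rather than as stale lexer state.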

The big advantage of the complete symbol is that it is beginner-friendly and reduces the risk of errors caused by the user forgetting to set one of the values.
The main disadvantage is the possible overhead the structure adds to the parser. It will be created and destroyed for each discovered token.

Should we keep both versions, or move to a complete symbol approach? Given that Bison's current release still has D as an experimental feature, this would not be a breaking change. If we decide on using both, the complete symbol approach will be selected through a Bison directive, like in the other parsers.

An example of the current method, using TokenKind:
https://github.com/akimd/bison/blob/master/examples/d/calc/calc.y#L117

An example using the Symbol struct:
https://github.com/adelavais/bison/blob/complete-external-symbols/examples/d/calc/calc.y#L117

November 16, 2020
On Sat, Nov 14, 2020 at 03:50:23PM +0000, Adela Vais via Digitalmars-d wrote: [...]
> I need some feedback about the return value of yylex() in Bison's Lexer class, which must be provided by the user.
[...]
> An example of the current method, using TokenKind: https://github.com/akimd/bison/blob/master/examples/d/calc/calc.y#L117
> 
> An example using the Symbol struct: https://github.com/adelavais/bison/blob/complete-external-symbols/examples/d/calc/calc.y#L117

Hi Adela,

I took a quick look at the code.  I agree that returning Symbol is best because it gives the most friendly API.

Generally, returning a struct ought to be quite cheap: for small structs, it could even be returned in CPU registers so the cost will be minimal.  However, I see that you allocate a new instance of YYLocation each time: that's bound to have performance issues.  Is there any reason to allocate YYLocation on the heap?  Is it because it's a class as opposed to a struct?  If it's a class, what was the rationale behind it?

In my mind, it should be a struct unless there's something in it that must persist on the heap. Based on its construction parameters, it looks to me to be just a container to store start/end positions in the input; if so, it does not need to be a class. A struct will do just fine, and will avoid unnecessary GC allocations.


[...]
> The main disadvantage is the possible overhead the structure adds to the parser. It will be created and destroyed for each discovered token.

Make it a struct, and make all of its members structs or PODs. Then there will be minimal construction overhead, and no destruction costs at all.
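
A rough sketch of what that could look like, assuming an int semantic value and a location made of line/column pairs. The type names (Position, Location, Symbol) and the helper makeNum are hypothetical, not taken from the Bison skeleton.

```d
// Hypothetical types for illustration; not Bison's generated code.
enum TokenKind { EOF, NUM }

struct Position { int line; int column; }
struct Location { Position begin; Position end; }

struct Symbol
{
    TokenKind kind;
    int value;
    Location loc;
}

// Compile-time checks that these are plain value types:
// no destructor, no postblit, no hidden context pointer.
static assert(__traits(isPOD, Location));
static assert(__traits(isPOD, Symbol));

// @nogc makes the compiler reject any GC allocation in this function,
// so building and returning a Symbol here cannot touch the heap.
Symbol makeNum(int value, int line, int column) @nogc nothrow
{
    return Symbol(TokenKind.NUM, value,
                  Location(Position(line, column),
                           Position(line, column + 1)));
}

void main()
{
    Symbol s = makeNum(7, 1, 4);
    assert(s.kind == TokenKind.NUM);
    assert(s.value == 7);
    assert(s.loc.end.column == 5);
}
```

If Location were a class instead, the `Location(...)` expression would have to become `new Location(...)`, which allocates on the GC heap and would not compile under @nogc.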


T

-- 
BREAKFAST.COM halted...Cereal Port Not Responding. -- YHL
November 18, 2020
On Monday, 16 November 2020 at 19:42:43 UTC, H. S. Teoh wrote:
> [...]
> Make it a struct, and make all of its members structs or PODs. Then there will be minimal construction overhead, and no destruction costs at all.

Thank you for the response! I will make this modification.