Re: multistart: free choice of the start symbol
From: Akim Demaille
Subject: Re: multistart: free choice of the start symbol
Date: Tue, 29 Sep 2020 19:20:04 +0200
> On 27 Sep 2020, at 20:46, Rici Lake <ricilake@gmail.com> wrote:
>
> Many parser generators do have the option to parse from various roots. One
> interesting case is ANTLR, which provides methods for parsing from *every*
> non-terminal (with names generated from the non-terminal).
Well, that's "cheating" (as you pointed out farther in your message):
ANTLR implements a recursive descent parser, i.e., its very technique
consists in emitting one parsing function per non-terminal. So actually,
I expect that all the LL generators support the free choice of the start
symbol.
Bison generates LR parsers. That does not apply.
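To make the "one parsing function per non-terminal" point concrete, here is a minimal hand-written sketch in C. The toy grammar and the names parse_expr, parse_term and parse_factor are made up for illustration; this is not ANTLR output and has nothing to do with Bison's generated code.

```c
#include <ctype.h>

/* Hypothetical toy grammar (not taken from ANTLR or Bison):
     expr   : term ('+' term)*
     term   : factor ('*' factor)*
     factor : DIGITS | '(' expr ')'
   No whitespace handling and no error reporting, to keep it short. */

int parse_expr(const char **p);

/* factor : DIGITS | '(' expr ')' */
int parse_factor(const char **p)
{
  if (**p == '(')
    {
      ++*p;                       /* consume '(' */
      int v = parse_expr(p);
      if (**p == ')')
        ++*p;                     /* consume ')' */
      return v;
    }
  int v = 0;
  while (isdigit((unsigned char) **p))
    v = v * 10 + (*(*p)++ - '0');
  return v;
}

/* term : factor ('*' factor)* */
int parse_term(const char **p)
{
  int v = parse_factor(p);
  while (**p == '*')
    {
      ++*p;                       /* consume '*' */
      v *= parse_factor(p);
    }
  return v;
}

/* expr : term ('+' term)* */
int parse_expr(const char **p)
{
  int v = parse_term(p);
  while (**p == '+')
    {
      ++*p;                       /* consume '+' */
      v += parse_term(p);
    }
  return v;
}
```

Each of the three functions can be called directly on an input string, so every non-terminal is already an entry point at no extra cost. An LR automaton has no such per-non-terminal function to call.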
> Although the
> vast majority of these interfaces will never be used, it turns out to be
> extremely convenient for debugging grammars (and for didactic purposes,
> such as drawing small parse trees). In ANTLR, these interfaces have little
> or no cost, since it fundamentally produces a recursive descent parser
> anyway, but it might still be reasonable to allow "%start *" for parser
> debugging.
>
> Of course, in a C code generator, you most certainly wouldn't want to
> generate dozens (or hundreds) of unused interfaces, so this kind of feature
> would be better implemented by a general call which took a non-terminal
> enumerator as an argument. But that would require that the returned value
> type be the same regardless of non-terminal, which effectively reduces to
> the YYSTYPE union (or whatever it happens to be).
>
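The shape of such a general call can be sketched as follows. Everything here is hypothetical (the enum nonterminal, the union semantic_value standing in for YYSTYPE, and the function parse_from); it is not Bison's API, and the "parsers" are trivial stand-ins.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical enumerator of the non-terminals usable as start symbols. */
typedef enum { NT_NUMBER, NT_WORD } nonterminal;

/* Stand-in for the YYSTYPE union: one member per semantic value type. */
typedef union
{
  long ival;            /* value when starting from NT_NUMBER */
  const char *sval;     /* value when starting from NT_WORD */
} semantic_value;

/* Single entry point: parse INPUT starting from START.
   Returns 0 on success and stores the result in *OUT, 1 on failure. */
int parse_from(nonterminal start, const char *input, semantic_value *out)
{
  switch (start)
    {
    case NT_NUMBER:
      {
        char *end;
        long v = strtol(input, &end, 10);
        if (end == input || *end != '\0')
          return 1;               /* not a complete number */
        out->ival = v;
        return 0;
      }
    case NT_WORD:
      if (*input == '\0' || strchr(input, ' '))
        return 1;                 /* empty, or more than one word */
      out->sval = input;
      return 0;
    }
  return 1;
}
```

The caller has to remember which start symbol was requested in order to know which union member is valid, which is the loss of type safety mentioned above.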
> OK, it's not necessarily a great idea to design a production interface
> around a feature only used for debugging.
Exactly :) Reading this sentence reminds me of one of my favorite
scenes in Ocean's 1[0-9]: https://www.youtube.com/watch?v=tcRvN2gtPiw
This feature, "%start *", would generate considerably larger automata.
In the case of Bison's own grammar, I get 450 states (that's only x3,
I was expecting more) *and* additional conflicts (because Bison is
still using LALR for its grammar, so you can still have "subautomata"
that share states).
What I did not anticipate, though, is that it crashes when generating
canonical LR on that grammar. However, I have not yet investigated
the impact of my changes on IELR and canonical LR, so that's a TODO.
Using LR, "%start *" should be safe. You do have a point here.