Re: Anybody else with an interest in parser wrangling?
From: Jean Abou Samra
Subject: Re: Anybody else with an interest in parser wrangling?
Date: Mon, 20 Mar 2023 15:22:44 +0100
User-agent: Evolution 3.46.4 (3.46.4-1.fc37)
On Monday, 20 March 2023 at 00:15 +0100, David Kastrup wrote:
> Jean Abou Samra <jean@abou-samra.fr> writes:
>
>
> > On Sunday, 19 March 2023 at 17:51 +0100, David Kastrup wrote:
> >
> > >
> > > So how to better involve others? The parser may be one of those
> > > areas with an awful amount of shoestring and glue, namely fiddling
> > > around until things happen to work. All that fiddling happens in
> > > private before commits end up in master, meaning that it has no
> > > opportunity to end up contagious the way it happens now.
> > >
> > > That's not really fabulous regarding the "bus factor" in that area.
> >
> >
> > I would feel a lot more comfortable with modifying the parser if there
> > was an explanation, in code comments or in the CG, of how the
> > parser/lexer interplay works, when lookahead is OK or bad, and how to
> > avoid it when necessary. Things like the comment above MYBACKUP
> >
> > ```
> > // The following are somewhat precarious constructs as they may change
> > // the value of the lookahead token. That implies that the lookahead
> > // token must not yet have made an impact on the state stack other
> > // than causing the reduction of the current rule, or switching the
> > // lookahead token while Bison is mulling it over will cause trouble.
> > ```
> >
> > are obscure to me.
>
>
> Well, Bison creates LALR(1) parsers. That means that the parser always
> is in a certain state. It looks at the next token, the "lookahead"
> token (only one, that's what the 1 in LALR(1) is about) and then
> transitions into another state, either by shifting the current state
> onto a stack or by using a rule to reduce the top of the stack to the
> nonterminal on that rule's left-hand side.
>
> The above comment is fearful of the possibility that the state
> machine processes the current lookahead token without consuming it,
> then has the lookahead token switched out from under it and ends up
> in a state that cannot process the replacement token.
>
> So far, the fears expressed in that comment have not materialized.
>
> The parser is only able to process a certain subset of languages. Since
> the parser makes deterministic progress by either consuming a lookahead
> token while growing the stack by 1 or by consuming stack material, it
> ends up O(n), that is, linear in the size of its input.
>
> When the parser applies a rule, you can specify code that will be
> executed in the reduction.
>
> The MYBACKUP and MYREPARSE stuff messes with the input in order to trigger
> syntactic decisions based on expression values. That's a bit more than
> usually expected from a Bison-generated parser.
Yes, I understand the basic way Bison parsers work. What I don't understand is
what other “effects” the lookahead can have, and why having caused the
reduction of the current rule is never a problem. AFAIU, the parser works as a
loop:

- Get next token from lexer.
- Decide whether to shift or to reduce some rule. Use a lookahead token if
  necessary.
- Do the shift or the reduction and execute the semantic action.

The lookahead token gets switched during the semantic action. Isn't it a
problem if the previous lookahead token says the current rule should be
reduced, but the new one would have required shifting? Or is that just not a
useful use of MYBACKUP/MYREPARSE?
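
To make the scenario concrete, the loop above can be sketched as a toy shift-reduce driver in Python. This is neither Bison's generated code nor LilyPond's MYBACKUP/MYREPARSE: the function `parse`, the `on_reduce` hook, the `swap_plus_for_times` helper, and the tiny `+`/`*` grammar are all hypothetical, chosen only so that the shift-or-reduce decision depends on the lookahead and a "semantic action" can replace that lookahead.

```python
from typing import Callable, List, Optional

PREC = {"+": 1, "*": 2}   # toy operator precedences
END = "$"                 # end-of-input marker

# Hypothetical hook: called after each reduction with the operator that was
# reduced and the current lookahead; returns the lookahead to continue with.
Hook = Callable[[str, str], str]


def parse(tokens: List[str], on_reduce: Optional[Hook] = None) -> str:
    pending = list(tokens)            # stands in for the lexer
    stack: List[str] = []             # alternates operand, operator, operand, ...
    lookahead: Optional[str] = None   # None means "no lookahead read yet"
    while True:
        # Step 1: fetch a lookahead token if none is pending.
        if lookahead is None:
            lookahead = pending.pop(0) if pending else END
        # Step 2: decide. Reduce "operand op operand" unless the lookahead
        # is an operator that binds tighter than the one on the stack.
        top_is_operand = len(stack) % 2 == 1
        can_reduce = len(stack) >= 3 and top_is_operand
        if can_reduce and (lookahead == END or
                           (lookahead in PREC
                            and PREC[lookahead] <= PREC[stack[-2]])):
            # Step 3a: reduce, then run the "semantic action".
            right, op, left = stack.pop(), stack.pop(), stack.pop()
            stack.append(f"({left}{op}{right})")
            if on_reduce is not None:
                lookahead = on_reduce(op, lookahead)  # may swap the token
            continue
        if lookahead == END:
            if len(stack) == 1:
                return stack[0]
            raise SyntaxError("unexpected end of input")
        # Step 3b: shift, checking that operands and operators alternate.
        if (lookahead in PREC) != top_is_operand:
            raise SyntaxError(f"unexpected token {lookahead!r}")
        stack.append(lookahead)
        lookahead = None              # the token has been consumed


def swap_plus_for_times(op: str, lookahead: str) -> str:
    # Toy action: replace a "+" lookahead by "*" after a reduction.
    return "*" if lookahead == "+" else lookahead


print(parse("1 + 2 * 3".split()))                       # (1+(2*3))
print(parse("1 + 2 + 3".split()))                       # ((1+2)+3)
print(parse("1 + 2 + 3".split(), swap_plus_for_times))  # ((1+2)*3)
```

In this toy, the reduction chosen under the old lookahead stands, and the replacement token is only consulted in the state reached after the reduction. The third call therefore groups differently from the first, even though both end up parsing `*` as the second operator. Whether the same holds for a Bison-generated parser in every situation is the question raised above.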