Re: Anybody else with an interest in parser wrangling?

From: David Kastrup
Subject: Re: Anybody else with an interest in parser wrangling?
Date: Mon, 20 Mar 2023 00:15:07 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux)

Jean Abou Samra writes:

> On Sunday, 19 March 2023 at 17:51 +0100, David Kastrup wrote:
>> So how to better involve others?  The parser may be one of those
>> areas with an awful amount of shoestring and glue, namely fiddling
>> around until things happen to work.  All that fiddling happens in
>> private before commits end up in master, meaning that it has no
>> opportunity to end up contagious the way it happens now.
>> That's not really fabulous regarding the "bus factor" in that area.
> I would feel a lot more comfortable with modifying the parser if there
> was an explanation, in code comments or in the CG, of how the
> parser/lexer interplay works, when lookahead is OK or bad, and how to
> avoid it when necessary. Things like the comment above MYBACKUP
> ```
> // The following are somewhat precarious constructs as they may change
> // the value of the lookahead token.  That implies that the lookahead
> // token must not yet have made an impact on the state stack other
> // than causing the reduction of the current rule, or switching the
> // lookahead token while Bison is mulling it over will cause trouble.
> ```
> are obscure to me.

Well, Bison creates LALR(1) parsers.  That means that the parser is
always in a certain state.  It looks at the next token, the "lookahead"
token (only one, that's what the 1 in LALR(1) is about), and then
transitions into another state, either by shifting the token and pushing
the new state onto a stack, or by using a rule to reduce the topmost
stack entries into the nonterminal that the rule produces.
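
To make the shift/reduce cycle a bit more tangible, here is a minimal
sketch in Python.  It is not what Bison generates: a real LALR(1) parser
is driven by state tables, while this toy decides by operator precedence.
The names `parse` and `PREC` and the tiny arithmetic grammar are
assumptions made for illustration only.  What it does share with the
description above is that every step looks at exactly one lookahead token
and then either shifts it or reduces the top of the stack.

```python
def parse(tokens):
    """Evaluate NUM (('+' | '*') NUM)* by shifting and reducing.

    Returns the value and the list of steps taken.  Toy model only:
    decisions are made by precedence, not by LALR(1) state tables.
    """
    PREC = {"+": 1, "*": 2}          # hypothetical operator precedences
    tokens = list(tokens) + ["$end"]
    stack, trace, pos = [], [], 0
    while True:
        lookahead = tokens[pos]      # the single token we may peek at
        # A reduction is possible when the stack ends in: value op value
        can_reduce = (len(stack) >= 3
                      and isinstance(stack[-1], int)
                      and stack[-2] in PREC)
        if can_reduce and (lookahead == "$end"
                           or PREC.get(lookahead, 0) <= PREC[stack[-2]]):
            # Reduce: replace "value op value" by its result.  The
            # lookahead is inspected here but not consumed.
            right, op, left = stack.pop(), stack.pop(), stack.pop()
            stack.append(left + right if op == "+" else left * right)
            trace.append("reduce")
        elif lookahead == "$end":
            break
        else:
            # Shift: consume the lookahead and grow the stack by one.
            stack.append(lookahead)
            pos += 1
            trace.append("shift")
    if len(stack) != 1 or not isinstance(stack[0], int):
        raise SyntaxError("input does not match the toy grammar")
    return stack[0], trace
```

With the input `[1, "+", 2, "*", 3]` the lookahead `*` makes the parser
shift instead of reducing `1 + 2`, so the result is 7.  With
`[1, "*", 2, "+", 3]` the lookahead `+` triggers a reduction of `1 * 2`
first, and the result is 5.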

The above comment worries about the possibility that the state machine
processes the current lookahead token without eating it, then gets the
lookahead token switched out under its radar, and ends up in a state
that is not able to process the token that was switched in.

So far, the fears expressed in that comment have not materialized.

The parser is only able to process a certain subset of context-free
languages, namely those that have an LALR(1) grammar.  Since the parser
makes deterministic progress by either consuming a lookahead token while
growing the stack by 1 or by consuming stack material, it ends up O(n),
namely linear in the size of its input.
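
The cost argument can be checked on a toy case by counting steps.  The
sketch below is an assumption-laden illustration, not Bison output: it
hand-codes the left-recursive grammar `list: item | list item`, and the
function name `count_steps` is made up.  Each token is shifted once, and
each reduction shrinks the stack, so the total work grows in proportion
to the number of tokens.

```python
def count_steps(n):
    """Parse n 'item' tokens with  list: item | list item  and
    return (number of shifts, number of reductions)."""
    tokens = ["item"] * n + ["$end"]
    stack, pos, shifts, reductions = [], 0, 0, 0
    while True:
        if stack and stack[-1] == "item":
            # Reduce either "item" or "list item" to "list".
            stack.pop()
            if stack and stack[-1] == "list":
                stack.pop()
            stack.append("list")
            reductions += 1
        elif tokens[pos] == "$end":
            break
        else:
            # Shift the lookahead token onto the stack.
            stack.append(tokens[pos])
            pos += 1
            shifts += 1
    return shifts, reductions
```

For this grammar, `count_steps(n)` reports n shifts and n reductions,
so ten times the input means ten times the steps.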

When the parser applies a rule, you can specify code that will be
executed in the reduction.

The MYBACKUP and MYPARSE stuff messes with the input in order to trigger
syntactic decisions based on expression values.  That's a bit more than
usually expected from a Bison-generated parser.
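
As a rough model of the idea (not of LilyPond's actual macros, which
live in its Bison grammar and work on Bison's internals), the following
Python sketch shows a parser loop that backs up its input.  All names
here (`TokenStream`, `classify`, the `EXPR`, `NUMBER_ID` and `STRING_ID`
token kinds) are hypothetical.  When an expression token is seen, its
value is looked up, the already-fetched lookahead is returned to the
input, and a new token whose kind depends on the value is put in front
of it.  This only works as long as the lookahead has not yet influenced
the parser, which is the concern raised in the quoted comment.

```python
class TokenStream:
    """Token source that allows tokens to be pushed back in front."""

    def __init__(self, tokens):
        self._pending = list(tokens)

    def next(self):
        if self._pending:
            return self._pending.pop(0)
        return ("$end", None)

    def push_back(self, token):
        self._pending.insert(0, token)


def classify(value):
    """Choose a token kind from a run-time value (hypothetical kinds)."""
    if isinstance(value, int):
        return "NUMBER_ID"
    if isinstance(value, str):
        return "STRING_ID"
    return "OTHER_ID"


def parse(tokens, env):
    """Return the tokens as the grammar would finally see them."""
    stream = TokenStream(tokens)
    seen = []
    current = stream.next()
    lookahead = stream.next()        # one token fetched ahead of time
    while current[0] != "$end":
        kind, value = current
        if kind == "EXPR":
            result = env[value]
            # Back up: hand the lookahead back to the input first, then
            # put the value-dependent token in front of it.
            stream.push_back(lookahead)
            stream.push_back((classify(result), result))
            current = stream.next()
            lookahead = stream.next()
            continue
        seen.append((kind, value))
        current, lookahead = lookahead, stream.next()
    return seen
```

Called as `parse([("EXPR", "x"), ("WORD", "hi")], {"x": 3})`, the
expression token is re-read as `("NUMBER_ID", 3)` and the following
`WORD` token is seen unchanged afterwards.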

David Kastrup
