help-bison
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Clarification on "Infinite parsing loops in IELR + LAC"


From: Joel Denny
Subject: Re: Clarification on "Infinite parsing loops in IELR + LAC"
Date: Sat, 29 Sep 2018 20:29:51 -0400

Hi Adrian,

On Sat, Sep 29, 2018, 10:51 AM Adrian Vogelsgesang <
address@hidden> wrote:

> Hi Joel,
>
> Thanks for your detailed explanation! (and sorry for my late reply)
>

Sure, and no problem.


> > Do you have a specific use case in mind that requires these features, or
> > have you chosen them because of their superiority in theory?
>
> LAC: our parser currently reports incorrect lists of expected tokens for
> syntax errors. This should be fixed by using LAC.
> IELR: mostly, due to superiority in theory.
>

Cool.  Thanks for explaining.

I would of course prefer if the parsing tables created by LALR were
> sufficient and would be willing to spend the time to adjust the grammar to
> be LALR-compatible.
>

Sometimes for a parser, I have added a case to the test suite to generate
the LALR parser tables and diff with the IELR parser tables so that I find
out about mysterious conflicts.  That's mostly because I think it's
interesting.  If you find cases where IELR produces tables that are
actually significantly larger than LALR and thus refactoring for LALR is
worthwhile, I'd be interested to hear about that.

As I understand, Bison introduces the mysterious conflicts *silently*.
> Given the choice between incorrect parsing tables and less efficient
> parsing tables, I would opt for the less efficient parser tables.
>

Agreed.

Sorry to be pedantic, but I want to be sure one point is clear: Bison warns
about all conflicts until you resolve them.  What Bison doesn't do here is
automatically tell you which conflicts are mysterious or when a previously
resolved conflict becomes mysterious.  IELR avoids that problem.


> On a separate point:
> The grammar I am working on is using the C++ template but LAC is not yet
> supported for C++.
> I am thinking about contributing LAC support for C++.
>

That sounds cool.  You should check with Akim before getting started as he
maintains Bison.  I'll be glad to offer conceptual help on the LAC side of
things as I can. However, I'm afraid I can't make promises on my
availability, so there might sometimes be long delays in my replies, and I
probably can't spend much time in actual code.  Again, I'll do what I can
as I think it would be cool to see this happen.

I read the parts on syntax error handling from your thesis (
> https://tigerprints.clemson.edu/cgi/viewcontent.cgi?article=1519&context=all_dissertations,
> section 5.2.2).
> I was wondering, if there might be additional pitfalls I should be aware
> of when porting the C code to C++?
>

None that I know of, but I'm afraid I've been away from Bison development
for years.  Akim would have to help here.


> Also, in my specific implementation, I don't really care about semantic
> actions being delayed.
> In case of a syntax error, we throw away the semantic values anyway and
> the semantic actions don't have any bad side effects.
> To my understanding, it should be possible to eagerly execute semantic
> actions already during the first, exploratory parse and thereby avoid
> redoing the parse.
> Only if this first parse fails, we go back and extract the list of valid
> tokens based on the initial state.
>

I doubt it would buy you much in terms of performance because parsing (as
distinct from file I/O, lexical analysis, and semantic actions) typically
contributes little, so repeating reductions shouldn't matter much.  If you
find that's not true for you, that would be interesting to hear about.

Would such a mode be feasible?
>

Probably, but I think it would require changes to the way the parser stack
is maintained.  LAC forks a temporary stack and leaves the original stack
alone until the exploratory parse is done, and the base of the temporary
stack is a pointer into the original stack. I think you instead want to
continue parsing on the original stack and leave the temporary stack alone,
so you'll have to decide what to do about the temporary stack's base
pointer as reductions remove parts of the original stack. I'm wondering
whether any solution is significantly more efficient than just running the
exploratory parse as LAC does now. I'll be interested to hear what you
decide.

Should I add it to "define parse.lac", maybe as a third option
> "expected-tokens-only"?


I'm not sure yet about the name. I think I first need to understand the
exact algorithm you settle on and its benefits.

Hope that helps.

Joel


> - Adrian
>
> From: help-bison <address@hidden>
> on behalf of Joel Denny <address@hidden>
> Date: Monday, 17 September 2018 at 05:45
> To: akim <address@hidden>
> Cc: "address@hidden" <address@hidden>, Adrian Vogelsgesang <
> address@hidden>
> Subject: Re: Clarification on "Infinite parsing loops in IELR + LAC"
>
> Hi Adrian, Akim,
>
> On 09/11/2018 12:47 AM, akim wrote:
> > Hi Adrian,
> >
> > That would be a question for Joel, I don't know the details.
>
> Thanks for alerting me to this.
>
> > Le 2018-09-11 00:03, Adrian Vogelsgesang a écrit :
> >> Hi everyone,
> >>
> >> From what I understand reading the bison documentation and some of the
> >> linked publications,
> >> I should probably be using IELR in combination with Lookahead-Correction
>
> Do you have a specific use case in mind that requires these features, or
> have you chosen them because of their superiority in theory?
>
> >>
> >> However, I stumbled across a warning in section 5.8.3 (“LAC”) in the
> >> bison documentation
> >> (https://www.gnu.org/software/bison/manual/bison.html#LAC
> >>> IELR plus LAC does have one shortcoming relative to canonical LR.
> >>> Some parsers generated by Bison
> >>> can loop infinitely. LAC does not fix infinite parsing loops that
> >>> occur between encountering a syntax
> >>> error and detecting it, but enabling canonical LR or disabling
> >>> default reductions sometimes does.
> >>
> >> This sounds quite frightening to me and I am not sure if LAC is
> >> actually something I should be using.
> >> Could someone clarify that part of the documentation for me?
>
> The goal of IELR + LAC is to achieve canonical LR behavior with nearly
> LALR-sized parser tables. The point of the statement you quoted is to
> clarify that IELR + LAC does not achieve canonical LR behavior for some
> grammars in the case of some syntax errors. That statement does not
> describe some major flaw that is introduced by switching to IELR or LAC
> from a traditional Yacc/Bison-generated parser, which uses LALR without
> LAC, and which is less powerful.
>
> >>
> >> In particular, I would be interested in the following points:
> >> * The documentation only mentions IELR + LAC, but what about LALR + LAC?
>
> Switching from IELR to LALR reduces the set of grammars that can be
> recognized. In other words, it takes you even farther from canonical
> LR. However, if you have a grammar that LALR is powerful enough to
> parse, IELR produces exactly LALR parser tables. In that case, until
> your grammar evolves, IELR + LAC = LALR + LAC.
>
> >> * "LAC *does not fix* infinite parsing loops" -> this sounds as if
> >> these infinite parsing loops would also exist without LAC?
>
> Exactly.
>
> I suppose it's possible that one could write a semantic action to
> terminate the parse during one of these infinite parsing loops on a
> syntax error. In that case, because LAC suppresses semantics actions
> while detecting the syntax error, LAC would prevent the semantic action
> from terminating the parse. However, I don't recall ever encountering
> an actual semantic action written for such a purpose.
>
> >> * When exactly do those infinite loops occur? How can I avoid them?
>
> Infinite parsing loops are a known issue in the LR family, including
> canonical LR. In my experience, they rarely occur in practical
> grammars, even when using LALR. I see them mostly as a theoretical
> possibility that occurs in contrived grammars. If you find that they
> occur for a practical grammar, please post it here. Perhaps we should
> say something like this in the manual after the text you quoted?
>
> In summary: Traditional Yacc/Bison (that is, LALR without LAC) < IELR +
> LAC < canonical LR. However, the difference between IELR + LAC and
> canonical LR concerns an issue (infinite parsing loops on syntax errors)
> that appears to be rare for practical grammars.
>
> Hope that helps.
>
> Joel
>
> >>
> >> Best,
> >> Adrian
> >>
> >> _______________________________________________
> >> address@hidden https://lists.gnu.org/mailman/listinfo/help-bison
> >
>
> _______________________________________________
> address@hidden https://lists.gnu.org/mailman/listinfo/help-bison
>
>


reply via email to

[Prev in Thread] Current Thread [Next in Thread]