emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Using the wisent parser-generator, as it creates faster parsers


From: Lynn Winebarger
Subject: Re: Using the wisent parser-generator, as it creates faster parsers
Date: Tue, 27 Dec 2022 22:22:37 -0500

On Mon, Dec 26, 2022, 1:13 PM Alexandr Karbivnichyi <ambulajan@gmail.com> wrote:
On Mon, 2022-12-26 at 08:54 -0500, Lynn Winebarger wrote:
> On Sun, Dec 25, 2022, 11:03 PM <ambulajan@gmail.com> wrote:
>
> I don't agree - conflicts detected by the parser generator indicate
> that distinct ASTs map to the same text, at least according to the
> reduction method being used.  I like using LR derivations (including
> ones produced by LALR) because of the bottom-up recognition of
> grammar symbols. 

True for LALR parsers. Tried-and-tested system, for industrial
compilers built on a formal standard with simple grammar that doesn't
change often.
I suppose it depends on the tools you are using.  I tend to roll my own implementation of the LALR(1) construction of DeRemer and Penello, then use the relations they define to determine how strings that will be ambiguously parsed by the grammar. Then that ambiguity can be resolved. It's not that different from what you describe doing at run-time with GLR, just performing the analysis ahead of time (on LR(0) states, aka item sets) and encoding the disambiguation logic into the grammar itself.  The DeRemer-Penello construction is just a control-flow analysis.  I'm not familiar with any work that uses it to automatically construct test cases that illustrate ambiguities in grammars, but it's surprising if there isn't given the tools for test case generation for programs in general purpose programming languages.
There's then the second question of how to map productions of the disambiguated grammar back to an AST data structure that is much "flatter".  Usually the grammar for that AST data structure is the origin of ambiguities in the concrete syntax.

It's a problem in noncommutative algebra.  The ambiguous grammar presumably specifies the language (strings of terminal sbols or tokens) being recognized correctly.  A CFG is a system of equations, and a parse is a particular solution to those equations for a given element in that language.  It's just a matter of finding an equivalent system of equations such that every solution is unique.  At least, that formulation helps me.

Lynn



reply via email to

[Prev in Thread] Current Thread [Next in Thread]