help-bison

Re: FW: I can't make sure it's a bug. But I think it's important.


From: Ron Burk
Subject: Re: FW: I can't make sure it's a bug. But I think it's important.
Date: Fri, 13 Sep 2013 22:34:57 -0700

On Fri, Sep 13, 2013 at 1:42 AM, 王波 <address@hidden> wrote:
>    I think Bison should support this way.

I don't think so. It's a specific case of this general problem:
you have a lexer and a parser, you've made the parser depend on state
in the lexer, but that has the potential to make your parser depend on
the vagaries of the parsing algorithm's lookahead. So, despite using
a very high-level tool (a parser generator), you've ended up with code
that relies on the implementation details of that tool. Your proposed
solution is to have the lexer able to tamper in turn with parser state,
which may "fix" your particular grammar, but will surely break others
that did not expect newlines to alter the grammar behavior.
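To make the lookahead problem concrete, here is a minimal sketch in C. It is not Bison output; the names (current_line, next_token, line_reported_for_first_token) are made up for illustration, and the "parser" is just a function that reads one token of lookahead before acting on the first token.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical global line counter shared by the lexer and the parser. */
static int current_line = 1;

/* Toy lexer: skips blanks, counts newlines, and returns the next
 * character as a token, or 0 at end of input. */
static int next_token(const char **cursor)
{
    assert(cursor != NULL && *cursor != NULL);
    while (**cursor == ' ' || **cursor == '\n') {
        if (**cursor == '\n')
            current_line++;
        (*cursor)++;
    }
    if (**cursor == '\0')
        return 0;
    return *(*cursor)++;
}

/* Toy "parser": it reads one token of lookahead before it acts on the
 * first token, then returns what the shared counter says at that moment. */
static int line_reported_for_first_token(const char *input)
{
    const char *cursor = input;
    current_line = 1;

    int first = next_token(&cursor);
    int lookahead = next_token(&cursor);  /* the parser peeks ahead */
    (void)first;
    (void)lookahead;

    /* The "semantic action" for the first token runs here, after the
     * lexer has already moved past the lookahead token. */
    return current_line;
}
```

With the input "a\nb", the first token sits on line 1, but the function reports line 2, because the lexer crossed the newline while fetching the lookahead. With "a b" it reports line 1. Whether a real parser fetches that lookahead at a given point is exactly the implementation detail you don't want to depend on.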

If you can make the newline explicit in your grammar, that's fine.
Often you can't, or it's highly inconvenient. Other solutions include:

Redefine your token data structure so that it includes the line number
of that token. This is the simple and obvious solution, since the line
number is, after all, logically an attribute of the token.
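A minimal sketch of that idea in C, assuming a hand-written lexer; the Token and Lexer types and the function names are hypothetical, not anything Bison or Flex defines. The point is only that the line is recorded in the token when it is scanned, so it no longer matters how far ahead the lexer has run.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token layout: the line number travels with the token. */
typedef struct {
    int kind;  /* 0 = end of input, otherwise the character itself */
    int line;  /* 1-based line on which the token starts */
} Token;

/* Hypothetical lexer state; the line counter is private to the lexer. */
typedef struct {
    const char *cursor;
    int line;
} Lexer;

static void lexer_init(Lexer *lx, const char *input)
{
    assert(lx != NULL && input != NULL);
    lx->cursor = input;
    lx->line = 1;
}

/* Skips whitespace (counting newlines) and returns the next token,
 * stamped with the line on which it begins. */
static Token lexer_next(Lexer *lx)
{
    assert(lx != NULL && lx->cursor != NULL);
    while (isspace((unsigned char)*lx->cursor)) {
        if (*lx->cursor == '\n')
            lx->line++;
        lx->cursor++;
    }

    Token tok;
    tok.line = lx->line;
    tok.kind = (unsigned char)*lx->cursor;
    if (tok.kind != 0)
        lx->cursor++;
    return tok;
}
```

If the parser calls lexer_next twice on "a\nb" before acting on the first token, the first token still says line 1 and the lookahead token says line 2. The parser reads the line from the token it is acting on, never from the lexer.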

Often, line number information is really only needed in the event of
error messages, in which case it doesn't matter much if it requires
a bit of extra processing to obtain it. So, for example, one can
read input files completely into memory and have tokens be pointers
to the token offsets within the memory image. One can then easily
write a function that calculates the line number of any token. This
costs more memory (the input file is duplicated in memory) but also
makes it easy to recover the exact line (with white space and comments)
to display in the error message, and easy to calculate the exact column
of the token as well. It also saves calculating, storing, and lugging around
that (largely unused) line number information for each token, so it's
unlikely to be a net CPU loss.
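A rough sketch of the on-demand calculation in C, assuming the whole input is held in one buffer and each token remembers its byte offset into it. The Position type and both function names are invented for this example, and columns are counted in bytes, one per character.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: 1-based line and column of a token. */
typedef struct {
    int line;
    int column;
} Position;

/* Computes the line and column of the byte at `offset` by scanning the
 * buffer from the start. The caller must ensure that `offset` lies
 * within the buffer. */
static Position position_of(const char *buffer, size_t offset)
{
    assert(buffer != NULL);
    Position pos = {1, 1};
    for (size_t i = 0; i < offset; i++) {
        if (buffer[i] == '\n') {
            pos.line++;
            pos.column = 1;
        } else {
            pos.column++;
        }
    }
    return pos;
}

/* Finds the source line containing `offset`: stores the offset of its
 * first byte in *start_out and returns its length, excluding the
 * newline. Handy for echoing the offending line in an error message. */
static size_t line_containing(const char *buffer, size_t length,
                              size_t offset, size_t *start_out)
{
    assert(buffer != NULL && start_out != NULL && offset <= length);
    size_t start = offset;
    while (start > 0 && buffer[start - 1] != '\n')
        start--;

    size_t end = offset;
    while (end < length && buffer[end] != '\n')
        end++;

    *start_out = start;
    return end - start;
}
```

The scan is linear in the size of the input, but it only runs when an error is actually reported, which is the trade-off described above. An error routine could print the line with something like printf("%.*s\n", (int)len, buffer + start) and then put a caret under the reported column.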

Other solutions that avoid shared state between lexer and parser may
be possible as well. But in general, any shared state between them is
a potential problem due to the parser's need for lookahead, whose
details you would rather not make your code dependent on.


