[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
maint: gram: fix handling of nterms in actions when some are unused (was
From: |
Akim Demaille |
Subject: |
maint: gram: fix handling of nterms in actions when some are unused (was: symbol type issue with unused non-terminals) |
Date: |
Sat, 2 Feb 2019 17:27:13 +0100 |
> Le 1 févr. 2019 à 13:27, Scheidler, Balázs <address@hidden> a écrit :
>
> On Fri, Feb 1, 2019 at 6:56 AM Akim Demaille <address@hidden> wrote:
>
>> For the record, something that is very useful to debug such issues is the
>> --trace option. In the present case, --trace=muscle would reveal the
>> generated symbol numbers. Comparing 3.2 and 3.3 is instructive.
>
> I used --trace=muscles while trying to understand what bison does. I was
> also reading its code, which I've found pretty easy to read btw.
Thanks :) We do try to keep it clean.
>> I'll try to address this asap and release the fix immediately.
>> Sorry about this issue.
>>
>> Our test suite is already quite big, but I regularly discover missing
>> cases...
>>
> It's an uphill battle, but still a useful one. I find that tests
> (especially if they are fast) give me a lot of self confidence when cutting
> releases :)
Ours is not really fast :( But it's very useful.
How about the appended commit? I will push it to the maint branch once the CI
okay'ed it. Please, confirm that it addresses your issue. Then I'll release
3.3.2 (today or tomorrow). You seem to be a power user of Bison: you might be
willing to subscribe to its announcement list, so that you could check how new
releases behave on your code base:
https://lists.gnu.org/mailman/listinfo/bison-announce.
There's a tarball with these changes here:
https://www.lrde.epita.fr/~akim/private/bison/bison-3.3.1.9-9929.tar.gz
https://www.lrde.epita.fr/~akim/private/bison/bison-3.3.1.9-9929.tar.xz
Thanks for the report, and the analysis!
commit 680407846d3cffeff94b98a3e1893d2eeee437d0
Author: Akim Demaille <address@hidden>
Date: Sat Feb 2 07:18:00 2019 +0100
gram: fix handling of nterms in actions when some are unused
Since Bison 3.3, semantic values and locations in rule actions (i.e.,
'$...' and '@...') are passed to the m4 backend as the symbol number.
Unfortunately, when there are unused symbols, the symbols are
renumbered _after_ the numbers were used in the rule actions. As a
result, the evaluation of the skeleton failed because it used non
existing symbol numbers. Which is the happy scenario: we could use
numbers of other existing symbols...
Reported by Balázs Scheidler.
http://lists.gnu.org/archive/html/bug-bison/2019-01/msg00044.html
Translating the rule actions after the symbol renumbering moves too
many parts in bison. Relying on the symbol identifiers is more
troublesome than it might first seem: some don't have an
identifier (tokens with only a literal string), some might have a
complex one (tokens with a literal string with characters special for
M4). Well, these are tokens, but nterms also have issues: "dummy"
nterms (for midrule actions) are named address@hidden etc. which is risky
for
M4.
Instead, let's simply give M4 the mapping between the old numbers and
the new ones. To avoid confusion between old and new numbers, always
emit pre-renumbering numbers as "orig NUM".
* data/README: Give details about "orig NUM".
* data/skeletons/bison.m4 (__b4_symbol, _b4_symbol): Resolve the
"orig NUM".
* src/output.c (prepare_symbol_definitions): Pass nterm_map to m4.
* src/reduce.h, src/reduce.c (nterm_map): Extract it from
nonterminals_reduce, to make it public.
(reduce_free): Free it.
* src/scan-code.l (handle_action_dollar): When referring to a nterm,
use "orig NUM".
* tests/reduce.at: Check that the generated parsers are proper C.
(Useless Parts): New, based Balázs Scheidler's report.
diff --git a/NEWS b/NEWS
index 64aa3b61..f5475861 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,10 @@ GNU Bison NEWS
* Noteworthy changes in release ?.? (????-??-??) [?]
+** Bug fixes
+
+ Bison 3.3 failed to generate parsers for grammars with unused non terminal
+ symbols.
* Noteworthy changes in release 3.3.1 (2019-01-27) [stable]
diff --git a/THANKS b/THANKS
index f704388a..597ca3f7 100644
--- a/THANKS
+++ b/THANKS
@@ -18,6 +18,7 @@ Antonio Silva Correia address@hidden
Arnold Robbins address@hidden
Art Haas address@hidden
Askar Safin address@hidden
+Balázs Scheidler address@hidden
Baron Schwartz address@hidden
Ben Pfaff address@hidden
Benoit Perrot address@hidden
diff --git a/data/README b/data/README
index 1a2e71cc..e313ee63 100644
--- a/data/README
+++ b/data/README
@@ -75,48 +75,73 @@ skeletons.
## Symbols
+### `b4_symbol(NUM, FIELD)`
In order to unify the handling of the various aspects of symbols (tag, type
name, whether terminal, etc.), bison.exe defines one macro per (token,
field), where field can `has_id`, `id`, etc.: see
-src/output.c:prepare_symbols_definitions().
+`prepare_symbols_definitions()` in `src/output.c`.
-The various FIELDS are:
+The macro `b4_symbol(NUM, FIELD)` gives access to the following FIELDS:
+
+- `has_id`: 0 or 1.
-- has_id: 0 or 1.
Whether the symbol has an id.
-- id: string
- If has_id, the id. Guaranteed to be usable as a C identifier.
- Prefixed by api.token.prefix if defined.
-- tag: string.
+
+- `id`: string
+ If has_id, the id (prefixed by api.token.prefix if defined), otherwise
+ defined as empty. Guaranteed to be usable as a C identifier.
+
+- `tag`: string.
A representation of the symbol. Can be 'foo', 'foo.id', '"foo"' etc.
-- user_number: integer
+
+- `user_number`: integer
The external number as used by yylex. Can be ASCII code when a character,
some number chosen by bison, or some user number in the case of
%token FOO <NUM>. Corresponds to yychar in yacc.c.
-- is_token: 0 or 1
+
+- `is_token`: 0 or 1
Whether this is a terminal symbol.
-- number: integer
+
+- `number`: integer
The internal number (computed from the external number by yytranslate).
Corresponds to yytoken in yacc.c. This is the same number that serves as
key in b4_symbol(NUM, FIELD).
-- has_type: 0, 1
+
+ In bison, symbols are first assigned increasing numbers in order of
+ appearance (tokens first, them nterms). Unused nterms are then renumbered
+ to appear last (i.e., first tokens, then used nterms and finally unused
+ nterms). This final number is the one contained in this field.
+
+ The code of the rule actions, however, is emitted before we know what
+ symbols are unused, so they use the original numbers. To avoid confusion,
+ they actually use "orig NUM" instead of "NUM". bison also emits
+ definitions for `b4_symbol(orig NUM, number)` that map from original
+ numbers to the new ones. `b4_symbol` actually resolves `orig NUM` in the
+ other case, i.e., `b4_symbol(orig 42, tag)` would return the tag of the
+ symbols whose original number was 42.
+
+- `has_type`: 0, 1
Whether has a semantic value.
-- type_tag: string
+
+- `type_tag`: string
When api.value.type=union, the generated name for the union member.
yytype_INT etc. for symbols that has_id, otherwise yytype_1 etc.
-- type
+
+- `type`
If it has a semantic value, its type tag, or, if variant are used,
its type.
In the case of api.value.type=union, type is the real type (e.g. int).
-- has_printer: 0, 1
-- printer: string
-- printer_file: string
-- printer_line: integer
+
+- `has_printer`: 0, 1
+- `printer`: string
+- `printer_file`: string
+- `printer_line`: integer
If the symbol has a printer, everything about it.
-- has_destructor, destructor, destructor_file, destructor_line
+
+- `has_destructor`, `destructor`, `destructor_file`, `destructor_line`
Likewise.
-### b4_symbol_value(VAL, [SYMBOL-NUM], [TYPE-TAG])
+### `b4_symbol_value(VAL, [SYMBOL-NUM], [TYPE-TAG])`
Expansion of $$, $1, $<TYPE-TAG>3, etc.
The semantic value from a given VAL.
@@ -127,14 +152,14 @@ The semantic value from a given VAL.
The result can be used safely, it is put in parens to avoid nasty precedence
issues.
-### b4_lhs_value(SYMBOL-NUM, [TYPE])
+### `b4_lhs_value(SYMBOL-NUM, [TYPE])`
Expansion of `$$` or `$<TYPE>$`, for symbol `SYMBOL-NUM`.
-### b4_rhs_data(RULE-LENGTH, POS)
+### `b4_rhs_data(RULE-LENGTH, POS)`
The data corresponding to the symbol `#POS`, where the current rule has
`RULE-LENGTH` symbols on RHS.
-### b4_rhs_value(RULE-LENGTH, POS, SYMBOL-NUM, [TYPE])
+### `b4_rhs_value(RULE-LENGTH, POS, SYMBOL-NUM, [TYPE])`
Expansion of `$<TYPE>POS`, where the current rule has `RULE-LENGTH` symbols
on RHS.
diff --git a/data/skeletons/bison.m4 b/data/skeletons/bison.m4
index 8a33a582..e3591875 100644
--- a/data/skeletons/bison.m4
+++ b/data/skeletons/bison.m4
@@ -389,17 +389,28 @@ m4_define([b4_glr_cc_if],
#
# The following macros provide access to symbol related values.
+# __b4_symbol(NUM, FIELD)
+# -----------------------
+# Recover a FIELD about symbol #NUM. Thanks to m4_indir, fails if
+# undefined.
+m4_define([__b4_symbol],
+[m4_indir([b4_symbol($1, $2)])])
+
+
# _b4_symbol(NUM, FIELD)
# ----------------------
-# Recover a FIELD about symbol #NUM. Thanks to m4_indir, fails if
+# Recover a FIELD about symbol #NUM (or "orig NUM"). Fails if
# undefined.
m4_define([_b4_symbol],
-[m4_indir([b4_symbol($1, $2)])])
+[m4_ifdef([b4_symbol($1, number)],
+ [__b4_symbol(m4_indir([b4_symbol($1, number)]), $2)],
+ [__b4_symbol([$1], [$2])])])
+
# b4_symbol(NUM, FIELD)
# ---------------------
-# Recover a FIELD about symbol #NUM. Thanks to m4_indir, fails if
+# Recover a FIELD about symbol #NUM (or "orig NUM"). Fails if
# undefined. If FIELD = id, prepend the token prefix.
m4_define([b4_symbol],
[m4_case([$2],
diff --git a/src/output.c b/src/output.c
index da600ae5..6e4fc595 100644
--- a/src/output.c
+++ b/src/output.c
@@ -38,6 +38,7 @@
#include "muscle-tab.h"
#include "output.h"
#include "reader.h"
+#include "reduce.h"
#include "scan-code.h" /* max_left_semantic_context */
#include "scan-skel.h"
#include "symtab.h"
@@ -414,6 +415,14 @@ merger_output (FILE *out)
static void
prepare_symbol_definitions (void)
{
+ /* Map "orig NUM" to new numbers. See data/README. */
+ for (symbol_number i = ntokens; i < nsyms + nuseless_nonterminals; ++i)
+ {
+ obstack_printf (&format_obstack, "symbol(orig %d, number)", i);
+ const char *key = obstack_finish0 (&format_obstack);
+ MUSCLE_INSERT_INT (key, nterm_map ? nterm_map[i - ntokens] : i);
+ }
+
for (int i = 0; i < nsyms; ++i)
{
symbol *sym = symbols[i];
diff --git a/src/reduce.c b/src/reduce.c
index e3a5e039..b1815531 100644
--- a/src/reduce.c
+++ b/src/reduce.c
@@ -259,13 +259,14 @@ reduce_grammar_tables (void)
| Remove useless nonterminals. |
`------------------------------*/
+symbol_number *nterm_map = NULL;
+
static void
nonterminals_reduce (void)
{
+ nterm_map = xnmalloc (nvars, sizeof *nterm_map);
/* Map the nonterminals to their new index: useful first, useless
afterwards. Kept for later report. */
-
- symbol_number *nterm_map = xnmalloc (nvars, sizeof *nterm_map);
{
symbol_number n = ntokens;
for (symbol_number i = ntokens; i < nsyms; ++i)
@@ -306,8 +307,6 @@ nonterminals_reduce (void)
nsyms -= nuseless_nonterminals;
nvars -= nuseless_nonterminals;
-
- free (nterm_map);
}
@@ -433,4 +432,6 @@ reduce_free (void)
bitset_free (V);
bitset_free (V1);
bitset_free (P);
+ free (nterm_map);
+ nterm_map = NULL;
}
diff --git a/src/reduce.h b/src/reduce.h
index c3866fc3..a6b4946b 100644
--- a/src/reduce.h
+++ b/src/reduce.h
@@ -32,6 +32,11 @@ bool reduce_nonterminal_useless_in_grammar (const
sym_content *sym);
void reduce_free (void);
+/** Map initial nterm numbers to the new ones. Built by
+ * reduce_grammar. Size nvars. */
+extern symbol_number *nterm_map;
+
extern unsigned nuseless_nonterminals;
extern unsigned nuseless_productions;
+
#endif /* !REDUCE_H_ */
diff --git a/src/scan-code.l b/src/scan-code.l
index 9141a9a5..d9e2a4b9 100644
--- a/src/scan-code.l
+++ b/src/scan-code.l
@@ -648,7 +648,7 @@ handle_action_dollar (symbol_list *rule, char *text,
location dollar_loc)
untyped_var_seen = true;
}
- obstack_printf (&obstack_for_string, "]b4_lhs_value(%d, ",
+ obstack_printf (&obstack_for_string, "]b4_lhs_value(orig %d, ",
sym->content.sym->content->number);
obstack_quote (&obstack_for_string, type_name);
obstack_sgrow (&obstack_for_string, ")[");
@@ -677,7 +677,9 @@ handle_action_dollar (symbol_list *rule, char *text,
location dollar_loc)
"]b4_rhs_value(%d, %d, ",
effective_rule_length, n);
if (sym)
- obstack_printf (&obstack_for_string, "%d, ",
sym->content.sym->content->number);
+ obstack_printf (&obstack_for_string, "%s%d, ",
+ sym->content.sym->content->class == nterm_sym ?
"orig " : "",
+ sym->content.sym->content->number);
else
obstack_sgrow (&obstack_for_string, "[], ");
diff --git a/tests/reduce.at b/tests/reduce.at
index 805b89d2..511c6f17 100644
--- a/tests/reduce.at
+++ b/tests/reduce.at
@@ -100,6 +100,9 @@ Rules useless in grammar
4 useless3: %empty
]])
+# Make sure the generated parser is correct.
+AT_COMPILE([input.o])
+
AT_CLEANUP
@@ -195,6 +198,81 @@ Rules useless in grammar
10 useless9: '9'
]])
+# Make sure the generated parser is correct.
+AT_COMPILE([input.o])
+
+AT_CLEANUP
+
+
+
+## --------------- ##
+## Useless Parts. ##
+## --------------- ##
+
+AT_SETUP([Useless Parts])
+
+# We used to emit code that used symbol numbers before the useless
+# symbol elimination, hence before the renumbering of the useful
+# symbols. As a result, the evaluation of the skeleton failed because
+# it used non existing symbol numbers. Which is the happy scenario:
+# we could use numbers of other existing symbols...
+# http://lists.gnu.org/archive/html/bug-bison/2019-01/msg00044.html
+
+AT_DATA([[input.y]],
+[[%type <ptr> used1
+%type <ptr> used2
+
+%%
+start
+ : used1
+ ;
+
+used1
+ : used2 { $$ = $1; }
+ ;
+
+unused
+ : used2
+ ;
+
+used2
+ : { $$ = (void*)0; }
+ ;
+]])
+
+AT_BISON_CHECK([[-fcaret -rall input.y]], 0, [],
+[[input.y: warning: 1 nonterminal useless in grammar [-Wother]
+input.y: warning: 1 rule useless in grammar [-Wother]
+input.y:13.1-6: warning: nonterminal useless in grammar: unused [-Wother]
+ unused
+ ^~~~~~
+]])
+
+
+AT_CHECK([[sed -n '/^State 0/q;/^$/!p' input.output]], 0,
+[[Nonterminals useless in grammar
+ unused
+Rules useless in grammar
+ 4 unused: used2
+Grammar
+ 0 $accept: start $end
+ 1 start: used1
+ 2 used1: used2
+ 3 used2: %empty
+Terminals, with rules where they appear
+$end (0) 0
+error (256)
+Nonterminals, with rules where they appear
+$accept (3)
+ on left: 0
+start (4)
+ on left: 1, on right: 0
+used1 <ptr> (5)
+ on left: 2, on right: 1
+used2 <ptr> (6)
+ on left: 3, on right: 2
+]])
+
AT_CLEANUP