[PATCH 2/5] symbols: clean up their parsing


From:	Akim Demaille
Subject:	[PATCH 2/5] symbols: clean up their parsing
Date:	Sun, 16 Dec 2018 10:45:38 +0100
Prompted by Rici Lake.
http://lists.gnu.org/archive/html/bug-bison/2018-10/msg00000.html

We have four classes of directives that declare symbols: %nterm,
%type, %token, and the family of %left etc.  Currently not all of them
accept several type tags (`<type>`), and not all of them accept
having no type tag at all (%type requires one).  Let's unify this.

- %type
  POSIX Yacc specifies that %type is for nonterminals only.  However,
  some Bison users want to use it for both tokens and nterms
  (actually, Bison's own grammar does this in several places, e.g.,
  CHAR).  So it should accept char/string literals.

  As a consequence it cannot be used to declare tokens with their
  alias: `%type foo "foo"` would be ambiguous (are we defining
  foo = "foo", or are these two different symbols?)

  POSIX specifies that it is OK to use %type without a type tag.  I'm
  not sure what it means, but we support it.

- %token
  Accept token declarations with number and string literal:
  (ID|CHAR) NUM? STRING?.

- %left, etc.
  They cannot be the same as %token, because we allow declaring the
  symbol with %token and then qualifying its precedence with %left.
  Then `%left foo "foo"` would also be ambiguous: foo="foo", or two
  symbols.

  They cannot be simply a list of identifiers either, because POSIX
  Yacc says we can declare token numbers here.  I personally think
  this is a bad idea: precedence management is tricky in itself and
  should not be cluttered with token declaration issues.

  We used to accept declaring a token number on a string literal here
  (e.g., `%left "token" 1`).  This is abnormal.  Either the feature is
  useful, and then it should be supported in %token, or it's useless
  and we should not support it in corner cases.

- %nterm
  Obviously cannot accept tokens, nor char/string literals.  Does not
  exist in POSIX Yacc, but since %type also works for terminals, it is
  a nice option to have.

* src/parse-gram.y: Avoid relying on side effects.  For instance, get
rid of current_type; instead, build the list of symbols and iterate
over it to assign the type.
It's not always possible/convenient.  For instance, we still use
current_class.
Prefer "decl" to "def", since in the rest of the implementation we
actually "declare" symbols, we don't "define" them.
(token_decls, token_decls_for_prec, symbol_decls, nterm_decls): New.
Use them for %token, %left, %type and %nterm.
* src/symlist.h, src/symlist.c (symbol_list_type_set): New.
* tests/regression.at
(Token number in precedence declaration): We no longer accept
giving a number to string literals.
---
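To illustrate the "build the list of symbols and iterate over it to
assign the type" approach described above, here is a minimal,
self-contained sketch.  It is not the Bison implementation: the
structs are simplified stand-ins, and list_new, list_append,
list_type_set and list_free are hypothetical names chosen for this
example.  Only the shape of list_type_set follows
symbol_list_type_set from this patch (without the location argument).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for Bison's symbol and
   symbol_list; the real structures carry much more data.  */
typedef struct symbol
{
  const char *tag;        /* Symbol name.  */
  const char *type_name;  /* Type tag, or NULL when untyped.  */
} symbol;

typedef struct symbol_list
{
  symbol *sym;
  struct symbol_list *next;
} symbol_list;

/* Create a one-element list holding SYM.  */
symbol_list *
list_new (symbol *sym)
{
  symbol_list *res = malloc (sizeof *res);
  assert (res);
  res->sym = sym;
  res->next = NULL;
  return res;
}

/* Append NODE at the end of LIST, and return the head.  */
symbol_list *
list_append (symbol_list *list, symbol_list *node)
{
  if (!list)
    return node;
  symbol_list *l = list;
  while (l->next)
    l = l->next;
  l->next = node;
  return list;
}

/* Assign TYPE_NAME to every member of SYMS, and return SYMS.
   No global "current type" is consulted or modified.  */
symbol_list *
list_type_set (symbol_list *syms, const char *type_name)
{
  for (symbol_list *l = syms; l; l = l->next)
    l->sym->type_name = type_name;
  return syms;
}

/* Free the list nodes (not the symbols they point to).  */
void
list_free (symbol_list *list)
{
  while (list)
    {
      symbol_list *next = list->next;
      free (list);
      list = next;
    }
}
```

With these helpers, a declaration such as `%token <int> A B <str> C`
can be modeled by building the list for A and B, tagging it with
"int", then appending the list for C tagged with "str".  Each tag
applies only to the group it precedes, which is what the three
alternatives of token_decls express in the grammar.
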
 src/parse-gram.y    | 183 +++++++++++++++++++++++++++++---------------
 src/symlist.c       |   9 +++
 src/symlist.h       |   5 ++
 tests/regression.at |   4 +-
 4 files changed, 137 insertions(+), 64 deletions(-)

diff --git a/src/parse-gram.y b/src/parse-gram.y
index 71ef183d..5cc05a86 100644
--- a/src/parse-gram.y
+++ b/src/parse-gram.y
@@ -52,7 +52,6 @@
   static named_ref *current_lhs_named_ref;
   static symbol *current_lhs_symbol;
   static symbol_class current_class = unknown_sym;
-  static uniqstr current_type = NULL;
 
   /** Set the new current left-hand side symbol, possibly common
    * to several right-hand side parts of rule.
@@ -201,13 +200,13 @@
 %token <int> INT "integer"
 %printer { fprintf (yyo, "%d", $$); } <int>
 
-%type <symbol*> id id_colon string_as_id symbol symbol.prec
+%type <symbol*> id id_colon string_as_id symbol token_decl token_decl_for_prec
 %printer { fprintf (yyo, "%s", $$ ? $$->tag : "<NULL>"); } <symbol*>
 %printer { fprintf (yyo, "%s:", $$->tag); } id_colon
 
 %type <assoc> precedence_declarator
 
-%type <symbol_list*>  symbols.1 symbols.prec generic_symlist generic_symlist_item
+%printer { symbol_list_syms_print ($$, yyo); } <symbol_list*>
 
 %type <named_ref*> named_ref.opt
 
@@ -332,8 +331,7 @@ params:
 `----------------------*/
 
 grammar_declaration:
-  precedence_declaration
-| symbol_declaration
+  symbol_declaration
 | "%start" symbol
     {
       grammar_start_symbol_set ($2, @2);
@@ -402,36 +400,29 @@ grammar_declaration:
 ;
 
 
-
-
+%type <symbol_list*> nterm_decls symbol_decls symbol_decl.1
+      token_decls token_decls_for_prec
+      token_decl.1 token_decl_for_prec.1;
 symbol_declaration:
-  "%nterm" { current_class = nterm_sym; } symbol_defs.1
+  "%nterm" { current_class = nterm_sym; } nterm_decls[syms]
     {
       current_class = unknown_sym;
-      current_type = NULL;
+      symbol_list_free ($syms);
     }
-| "%token" { current_class = token_sym; } symbol_defs.1
+| "%token" { current_class = token_sym; } token_decls[syms]
     {
       current_class = unknown_sym;
-      current_type = NULL;
+      symbol_list_free ($syms);
     }
-| "%type" TAG symbols.1
+| "%type" symbol_decls[syms]
     {
-      for (symbol_list *list = $3; list; list = list->next)
-        symbol_type_set (list->content.sym, $2, @2);
-      symbol_list_free ($3);
+      symbol_list_free ($syms);
     }
-;
-
-precedence_declaration:
-  precedence_declarator tag.opt symbols.prec[syms]
+| precedence_declarator token_decls_for_prec[syms]
     {
       ++current_prec;
       for (symbol_list *list = $syms; list; list = list->next)
-        {
-          symbol_type_set (list->content.sym, $[tag.opt], @[tag.opt]);
-          symbol_precedence_set (list->content.sym, current_prec, $1, @1);
-        }
+        symbol_precedence_set (list->content.sym, current_prec, $1, @1);
       symbol_list_free ($syms);
     }
 ;
@@ -448,32 +439,7 @@ tag.opt:
 | TAG    { $$ = $1; }
 ;
 
-/* Just like symbols.1 but accept INT for the sake of POSIX.  */
-symbols.prec:
-  symbol.prec
-    { $$ = symbol_list_sym_new ($1, @1); }
-| symbols.prec symbol.prec
-    { $$ = symbol_list_append ($1, symbol_list_sym_new ($2, @2)); }
-;
-
-symbol.prec:
-  symbol[id] int.opt[num]
-    {
-      $$ = $id;
-      symbol_class_set ($id, token_sym, @id, false);
-      if (0 <= $num)
-        symbol_user_token_number_set ($id, $num, @num);
-    }
-;
-
-/* One or more symbols to be %typed. */
-symbols.1:
-  symbol
-    { $$ = symbol_list_sym_new ($1, @1); }
-| symbols.1 symbol
-    { $$ = symbol_list_append ($1, symbol_list_sym_new ($2, @2)); }
-;
-
+%type <symbol_list*> generic_symlist generic_symlist_item;
 generic_symlist:
   generic_symlist_item
 | generic_symlist generic_symlist_item   { $$ = symbol_list_append ($1, $2); }
@@ -490,16 +456,50 @@ tag:
 | "<>"  { $$ = uniqstr_new (""); }
 ;
 
-/* One symbol (token or nterm depending on current_class) definition.  */
-symbol_def:
-  TAG
+/*-----------------------.
+| nterm_decls (%nterm).  |
+`-----------------------*/
+
+// A non empty list of possibly tagged symbols for %nterm.
+// 
+// Can easily be defined like symbol_decls but restricted to ID, but
+// using token_decls allows to reduce the number of rules, and also to
+// make nicer error messages on "%nterm 'a'" or '%nterm FOO "foo"'.
+nterm_decls:
+  token_decls
+;
+
+/*-----------------------------------.
+| token_decls (%token, and %nterm).  |
+`-----------------------------------*/
+
+// A non empty list of possibly tagged symbols for %token or %nterm.
+token_decls:
+  token_decl.1[syms]
+    {
+      $$ = $syms;
+    }
+| TAG token_decl.1[syms]
+    {
+      $$ = symbol_list_type_set ($syms, $TAG, @TAG);
+    }
+| token_decls TAG token_decl.1[syms]
     {
-      current_type = $1;
+      $$ = symbol_list_append ($1, symbol_list_type_set ($syms, $TAG, @TAG));
     }
-| id int.opt[num] string_as_id.opt[alias]
+;
+
+// One or more symbol declarations for %token or %nterm.
+token_decl.1:
+  token_decl                { $$ = symbol_list_sym_new ($1, @1); }
+| token_decl.1 token_decl   { $$ = symbol_list_append ($1, symbol_list_sym_new ($2, @2)); }
+
+// One symbol declaration for %token or %nterm.
+token_decl:
+  id int.opt[num] string_as_id.opt[alias]
     {
+      $$ = $id;
       symbol_class_set ($id, current_class, @id, true);
-      symbol_type_set ($id, current_type, @id);
       if (0 <= $num)
         symbol_user_token_number_set ($id, $num, @num);
       if ($alias)
@@ -513,15 +513,74 @@ int.opt:
 | INT
 ;
 
-/* One or more symbol definitions. */
-symbol_defs.1:
-  symbol_def
-| symbol_defs.1 symbol_def
-  /* FIXME: cannot do that, results in infinite loop in LAC.
-| error                    { yyerrok; }
-  */
+/*-------------------------------------.
+| token_decls_for_prec (%left, etc.).  |
+`-------------------------------------*/
+
+// A non empty list of possibly tagged tokens for precedence declaration.
+//
+// Similar to %token (token_decls), but in '%left FOO 1 "foo"', it treats
+// FOO and "foo" as two different symbols instead of aliasing them.
+token_decls_for_prec:
+  token_decl_for_prec.1[syms]
+    {
+      $$ = $syms;
+    }
+| TAG token_decl_for_prec.1[syms]
+    {
+      $$ = symbol_list_type_set ($syms, $TAG, @TAG);
+    }
+| token_decls_for_prec TAG token_decl_for_prec.1[syms]
+    {
+      $$ = symbol_list_append ($1, symbol_list_type_set ($syms, $TAG, @TAG));
+    }
+;
+
+// One or more token declarations for precedence declaration.
+token_decl_for_prec.1:
+  token_decl_for_prec
+    { $$ = symbol_list_sym_new ($1, @1); }
+| token_decl_for_prec.1 token_decl_for_prec
+    { $$ = symbol_list_append ($1, symbol_list_sym_new ($2, @2)); }
+
+// One token declaration for precedence declaration.
+token_decl_for_prec:
+  id int.opt[num]
+    {
+      $$ = $id;
+      symbol_class_set ($id, token_sym, @id, false);
+      if (0 <= $num)
+        symbol_user_token_number_set ($id, $num, @num);
+    }
+| string_as_id
+;
+
+
+/*-----------------------.
+| symbol_decls (%type).  |
+`-----------------------*/
+
+// A non empty list of typed symbols.
+symbol_decls:
+  symbol_decl.1[syms]
+    {
+      $$ = $syms;
+    }
+| TAG symbol_decl.1[syms]
+    {
+      $$ = symbol_list_type_set ($syms, $TAG, @TAG);
+    }
+| symbol_decls TAG symbol_decl.1[syms]
+    {
+      $$ = symbol_list_append ($1, symbol_list_type_set ($syms, $TAG, @TAG));
+    }
 ;
 
+// One or more symbol declarations.
+symbol_decl.1:
+  symbol                { $$ = symbol_list_sym_new ($1, @1); }
+| symbol_decl.1 symbol  { $$ = symbol_list_append ($1, symbol_list_sym_new ($2, @2)); }
+;
 
         /*------------------------------------------.
         | The grammar section: between the two %%.  |
diff --git a/src/symlist.c b/src/symlist.c
index d7b61e24..7d9fe83f 100644
--- a/src/symlist.c
+++ b/src/symlist.c
@@ -81,6 +81,15 @@ symbol_list_type_new (uniqstr type_name, location loc)
 }
 
 
+symbol_list *
+symbol_list_type_set (symbol_list *syms, uniqstr type_name, location loc)
+{
+  for (symbol_list *l = syms; l; l = l->next)
+    symbol_type_set (l->content.sym, type_name, loc);
+  return syms;
+}
+
+
 /*-----------------------------------------------------------------------.
 | Print this list, for which every content_type must be SYMLIST_SYMBOL.  |
 `-----------------------------------------------------------------------*/
diff --git a/src/symlist.h b/src/symlist.h
index 0bd6bd75..3fdf1710 100644
--- a/src/symlist.h
+++ b/src/symlist.h
@@ -103,6 +103,11 @@ symbol_list *symbol_list_sym_new (symbol *sym, location loc);
 /** Create a list containing \c type_name at \c loc.  */
 symbol_list *symbol_list_type_new (uniqstr type_name, location loc);
 
+/** Assign the type \c type_name to all the members of \c syms.
+ ** \returns \c syms */
+symbol_list *symbol_list_type_set (symbol_list *syms,
+                                   uniqstr type_name, location loc);
+
 /** Print this list.
 
   \pre For every node \c n in the list, <tt>n->content_type =
diff --git a/tests/regression.at b/tests/regression.at
index 1a7d777c..a49b7c13 100644
--- a/tests/regression.at
+++ b/tests/regression.at
@@ -1096,10 +1096,10 @@ AT_DATA_GRAMMAR([input.y],
   ]AT_YYERROR_DECLARE[
   ]AT_YYLEX_DECLARE[
 }
-
 %define parse.error verbose
+%token TK_ALIAS 3 "tok alias"
 %right END 0
-%left TK1 1 TK2 2 "tok alias" 3
+%left TK1 1 TK2 2 "tok alias"
 
 %%
 
-- 
2.19.2