Re: Extract the action scanner from the grammar scanner

bison-patches
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Extract the action scanner from the grammar scanner

From:	Akim Demaille
Subject:	Re: Extract the action scanner from the grammar scanner
Date:	Tue, 06 Jun 2006 18:39:49 +0200
User-agent:	Gnus/5.110004 (No Gnus v0.4) Emacs/21.4 (gnu/linux)
>>> "Akim" == Akim Demaille <address@hidden> writes:

 > I have already referred to this change several times.  Our current
 > scanner is very complex and has subtle interactions with our parser.
 > This is hard to maintain, and also prevents easy extensions of
 > our grammar (which defeats the whole point of having moved to
 > using bison for Bison).

 > I was about to install this patch when I read Paul's message about
 > 2.3 being out soon?

I'm installing the patch now.  The work is almost done, but there
remains a couple of issues to address.  One significant issue is the
fail of the calc++ sample, due to the fact that we are still blindly
adding a `;' at the end of user actions.  Unfortunately this is
applied in a place where we are not having a user action, but rather a
specification of the extra arguments.

Of course I plan to solve this in the near future.

Also, we have to sort out the initial columns and lines, I discovered
that we changed our values.  This is troublesome.

Index: ChangeLog
from  Akim Demaille  <address@hidden>

        Extract the parsing of user actions from the grammar scanner.
        As a consequence, the relation between the grammar scanner and
        parser is much simpler.  We can also split "composite tokens" back
        into simple tokens.
        * src/gram.h (ITEM_NUMBER_MAX, RULE_NUMBER_MAX): New.
        * src/scan-gram.l (add_column_width, adjust_location): Move to and
        rename as...
        * src/location.h, src/location.c (add_column_width)
        (location_compute): these.
        Fix the column count: the initial column is 0.
        (location_print): Be robust to ending column being 0.
        * src/location.h (boundary_set): New.
        * src/main.c: Adjust to scanner_free being renamed as
        gram_scanner_free.
        * src/output.c: Include scan-code.h.
        * src/parse-gram.y: Include scan-gram.h and scan-code.h.
        Use boundary_set.
        (PERCENT_DESTRUCTOR, PERCENT_PRINTER, PERCENT_INITIAL_ACTION)
        (PERCENT_LEX_PARAM, PERCENT_PARSE_PARAM): Remove the {...} part,
        which is now, again, a separate token.
        Adjust all dependencies.
        Whereever actions with $ and @ are used, use translate_code.
        (action): Remove this nonterminal which is now useless.
        * src/reader.c: Include assert.h, scan-gram.h and scan-code.h.
        (grammar_current_rule_action_append): Use translate_code.
        (packgram): Bound check ruleno, itemno, and rule_length.
        * src/reader.h (gram_in, gram__flex_debug, scanner_cursor)
        (last_string, last_braced_code_loc, max_left_semantic_context)
        (scanner_initialize, scanner_free, scanner_last_string_free)
        (gram_out, gram_lineno, YY_DECL_): Move to...
        * src/scan-gram.h: this new file.
        (YY_DECL): Rename as...
        (GRAM_DECL): this.
        * src/scan-code.h, src/scan-code.l, src/scan-code-c.c: New.
        * src/scan-gram.l (gram_get_lineno, gram_get_in, gram_get_out):
        (gram_get_leng, gram_get_text, gram_set_lineno, gram_set_in):
        (gram_set_out, gram_get_debug, gram_set_debug, gram_lex_destroy):
        Move these declarations, and...
        (obstack_for_string, STRING_GROW, STRING_FINISH, STRING_FREE):
        these to...
        * src/flex-scanner.h: this new file.
        * src/scan-gram.l (rule_length, rule_length_overflow)
        (increment_rule_length): Remove.
        (last_braced_code_loc): Rename as...
        (gram_last_braced_code_loc): this.
        Adjust to the changes of the parser.
        Move all the handling of $ and @ into...
        * src/scan-code.l: here.
        * src/scan-gram.l (handle_dollar, handle_at): Remove.
        (handle_action_dollar, handle_action_at): Move to...
        * src/scan-code.l: here.
        * src/Makefile.am (bison_SOURCES): Add flex-scanner.h,
        scan-code.h, scan-code-c.c, scan-gram.h.
        (EXTRA_bison_SOURCES): Add scan-code.l.
        (BUILT_SOURCES): Add scan-code.c.
        (yacc): Be robust to white spaces.

        * tests/conflicts.at, tests/input.at, tests/reduce.at,
        * tests/regression.at: Adjust the column numbers.
        * tests/regression.at: Adjust the error message.

$Id: ChangeLog,v 1.1494 2006/06/06 06:00:55 jdenny Exp $
Index: src/Makefile.am
===================================================================
RCS file: /cvsroot/bison/bison/src/Makefile.am,v
retrieving revision 1.68
diff -u -u -r1.68 Makefile.am
--- src/Makefile.am 6 Jun 2006 05:23:44 -0000 1.68
+++ src/Makefile.am 6 Jun 2006 16:38:55 -0000
@@ -1,4 +1,4 @@
-## Copyright (C) 2001, 2002, 2003, 2004, 2005 Free Software Foundation, Inc.
+## Copyright (C) 2001, 2002, 2003, 2004, 2005, 2006 Free Software Foundation, 
Inc.
 
 ## This program is free software; you can redistribute it and/or modify
 ## it under the terms of the GNU General Public License as published by
@@ -39,6 +39,7 @@
        conflicts.c conflicts.h                   \
        derives.c derives.h                       \
        files.c files.h                           \
+       flex-scanner.h                            \
        getargs.c getargs.h                       \
        gram.c gram.h                             \
        lalr.h lalr.c                             \
@@ -54,8 +55,9 @@
        reduce.c reduce.h                         \
        revision.c revision.h                     \
        relation.c relation.h                     \
-       scan-gram-c.c                             \
-       scan-skel-c.c scan-skel.h                 \
+       scan-code.h scan-code-c.c                 \
+       scan-gram.h scan-gram-c.c                 \
+       scan-skel.h scan-skel-c.c                 \
        state.c state.h                           \
        symlist.c symlist.h                       \
        symtab.c symtab.h                         \
@@ -65,15 +67,20 @@
        vcg.c vcg.h                               \
        vcg_defaults.h
 
-EXTRA_bison_SOURCES = scan-skel.l scan-gram.l
+EXTRA_bison_SOURCES = scan-code.l scan-skel.l scan-gram.l
 
-BUILT_SOURCES = revision.c scan-skel.c scan-gram.c parse-gram.c parse-gram.h
+BUILT_SOURCES =                                        \
+parse-gram.c parse-gram.h                      \
+revision.c                                     \
+scan-code.c                                    \
+scan-skel.c                                    \
+scan-gram.c                                    \
 
 MOSTLYCLEANFILES = yacc
 
 yacc:
        echo '#! /bin/sh' >$@
-       echo 'exec $(bindir)/bison -y "$$@"' >>$@
+       echo "exec '$(bindir)/bison' -y \"address@hidden"" >>$@
        chmod a+x $@
 
 echo:
Index: src/gram.h
===================================================================
RCS file: /cvsroot/bison/bison/src/gram.h,v
retrieving revision 1.58
diff -u -u -r1.58 gram.h
--- src/gram.h 9 Nov 2005 15:48:05 -0000 1.58
+++ src/gram.h 6 Jun 2006 16:38:55 -0000
@@ -1,6 +1,6 @@
 /* Data definitions for internal representation of Bison's input.
 
-   Copyright (C) 1984, 1986, 1989, 1992, 2001, 2002, 2003, 2004, 2005
+   Copyright (C) 1984, 1986, 1989, 1992, 2001, 2002, 2003, 2004, 2005, 2006
    Free Software Foundation, Inc.
 
    This file is part of Bison, the GNU Compiler Compiler.
@@ -115,6 +115,7 @@
 extern int nvars;
 
 typedef int item_number;
+#define ITEM_NUMBER_MAX INT_MAX
 extern item_number *ritem;
 extern unsigned int nritems;
 
@@ -146,6 +147,7 @@
 
 /* Rule numbers.  */
 typedef int rule_number;
+#define RULE_NUMBER_MAX INT_MAX
 extern rule_number nrules;
 
 static inline item_number
Index: src/location.c
===================================================================
RCS file: /cvsroot/bison/bison/src/location.c,v
retrieving revision 1.6
diff -u -u -r1.6 location.c
--- src/location.c 9 Dec 2005 23:51:26 -0000 1.6
+++ src/location.c 6 Jun 2006 16:38:55 -0000
@@ -1,6 +1,5 @@
 /* Locations for Bison
-
-   Copyright (C) 2002, 2005 Free Software Foundation, Inc.
+   Copyright (C) 2002, 2005, 2006 Free Software Foundation, Inc.
 
    This file is part of Bison, the GNU Compiler Compiler.
 
@@ -28,11 +27,80 @@
 
 location const empty_location;
 
+/* If BUF is null, add BUFSIZE (which in this case must be less than
+   INT_MAX) to COLUMN; otherwise, add mbsnwidth (BUF, BUFSIZE, 0) to
+   COLUMN.  If an overflow occurs, or might occur but is undetectable,
+   return INT_MAX.  Assume COLUMN is nonnegative.  */
+
+static inline int
+add_column_width (int column, char const *buf, size_t bufsize)
+{
+  size_t width;
+  unsigned int remaining_columns = INT_MAX - column;
+
+  if (buf)
+    {
+      if (INT_MAX / 2 <= bufsize)
+       return INT_MAX;
+      width = mbsnwidth (buf, bufsize, 0);
+    }
+  else
+    width = bufsize;
+
+  return width <= remaining_columns ? column + width : INT_MAX;
+}
+
+/* Set *LOC and adjust scanner cursor to account for token TOKEN of
+   size SIZE.  */
+
+void
+location_compute (location *loc, boundary *cur, char const *token, size_t size)
+{
+  int line = cur->line;
+  int column = cur->column;
+  char const *p0 = token;
+  char const *p = token;
+  char const *lim = token + size;
+
+  loc->start = *cur;
+
+  for (p = token; p < lim; p++)
+    switch (*p)
+      {
+      case '\n':
+       line += line < INT_MAX;
+       column = 1;
+       p0 = p + 1;
+       break;
+
+      case '\t':
+       column = add_column_width (column, p0, p - p0);
+       column = add_column_width (column, NULL, 8 - ((column - 1) & 7));
+       p0 = p + 1;
+       break;
+
+      default:
+       break;
+      }
+
+  cur->line = line;
+  cur->column = column = add_column_width (column, p0, p - p0);
+
+  loc->end = *cur;
+
+  if (line == INT_MAX && loc->start.line != INT_MAX)
+    warn_at (*loc, _("line number overflow"));
+  if (column == INT_MAX && loc->start.column != INT_MAX)
+    warn_at (*loc, _("column number overflow"));
+}
+
+
 /* Output to OUT the location LOC.
    Warning: it uses quotearg's slot 3.  */
 void
 location_print (FILE *out, location loc)
 {
+  int end_col = 0 < loc.end.column ? loc.end.column - 1 : 0;
   fprintf (out, "%s:%d.%d",
           quotearg_n_style (3, escape_quoting_style, loc.start.file),
           loc.start.line, loc.start.column);
@@ -40,9 +108,9 @@
   if (loc.start.file != loc.end.file)
     fprintf (out, "-%s:%d.%d",
             quotearg_n_style (3, escape_quoting_style, loc.end.file),
-            loc.end.line, loc.end.column - 1);
+            loc.end.line, end_col);
   else if (loc.start.line < loc.end.line)
-    fprintf (out, "-%d.%d", loc.end.line, loc.end.column - 1);
-  else if (loc.start.column < loc.end.column - 1)
-    fprintf (out, "-%d", loc.end.column - 1);
+    fprintf (out, "-%d.%d", loc.end.line, end_col);
+  else if (loc.start.column < end_col)
+    fprintf (out, "-%d", end_col);
 }
Index: src/location.h
===================================================================
RCS file: /cvsroot/bison/bison/src/location.h,v
retrieving revision 1.13
diff -u -u -r1.13 location.h
--- src/location.h 9 Mar 2006 23:23:11 -0000 1.13
+++ src/location.h 6 Jun 2006 16:38:55 -0000
@@ -40,6 +40,15 @@
 
 } boundary;
 
+/* Set the position of \a a. */
+static inline void
+boundary_set (boundary *b, const char *f, int l, int c)
+{
+  b->file = f; 
+  b->line = l;         
+  b->column = c;               
+}
+
 /* Return nonzero if A and B are equal boundaries.  */
 static inline bool
 equal_boundaries (boundary a, boundary b)
@@ -64,6 +73,11 @@
 
 extern location const empty_location;
 
+/* Set *LOC and adjust scanner cursor to account for token TOKEN of
+   size SIZE.  */
+void location_compute (location *loc,
+                      boundary *cur, char const *token, size_t size);
+
 void location_print (FILE *out, location loc);
 
 #endif /* ! defined LOCATION_H_ */
Index: src/main.c
===================================================================
RCS file: /cvsroot/bison/bison/src/main.c,v
retrieving revision 1.85
diff -u -u -r1.85 main.c
--- src/main.c 9 Dec 2005 23:51:26 -0000 1.85
+++ src/main.c 6 Jun 2006 16:38:55 -0000
@@ -1,6 +1,7 @@
 /* Top level entry point of Bison.
 
-   Copyright (C) 1984, 1986, 1989, 1992, 1995, 2000, 2001, 2002, 2004, 2005
+   Copyright (C) 1984, 1986, 1989, 1992, 1995, 2000, 2001, 2002, 2004, 2005,
+   2006
    Free Software Foundation, Inc.
 
    This file is part of Bison, the GNU Compiler Compiler.
@@ -169,7 +170,7 @@
 
   /* The scanner memory cannot be released right after parsing, as it
      contains things such as user actions, prologue, epilogue etc.  */
-  scanner_free ();
+  gram_scanner_free ();
   muscle_free ();
   uniqstrs_free ();
   timevar_pop (TV_FREE);
Index: src/output.c
===================================================================
RCS file: /cvsroot/bison/bison/src/output.c,v
retrieving revision 1.247
diff -u -u -r1.247 output.c
--- src/output.c 14 May 2006 20:40:35 -0000 1.247
+++ src/output.c 6 Jun 2006 16:38:55 -0000
@@ -36,6 +36,7 @@
 #include "muscle_tab.h"
 #include "output.h"
 #include "reader.h"
+#include "scan-code.h"    /* max_left_semantic_context */
 #include "scan-skel.h"
 #include "symtab.h"
 #include "tables.h"
Index: src/parse-gram.y
===================================================================
RCS file: /cvsroot/bison/bison/src/parse-gram.y,v
retrieving revision 1.74
diff -u -u -r1.74 parse-gram.y
--- src/parse-gram.y 14 May 2006 19:14:10 -0000 1.74
+++ src/parse-gram.y 6 Jun 2006 16:38:55 -0000
@@ -32,6 +32,8 @@
 #include "quotearg.h"
 #include "reader.h"
 #include "symlist.h"
+#include "scan-gram.h"
+#include "scan-code.h"
 #include "strverscmp.h"
 
 #define YYLLOC_DEFAULT(Current, Rhs, N)  (Current) = lloc_default (Rhs, N)
@@ -84,9 +86,8 @@
 {
   /* Bison's grammar can initial empty locations, hence a default
      location is needed. */
-  @$.start.file   = @$.end.file   = current_file;
-  @$.start.line   = @$.end.line   = 1;
-  @$.start.column = @$.end.column = 0;
+  boundary_set (&@$.start, current_file, 1, 0);
+  boundary_set (&@$.end, current_file, 1, 0);
 }
 
 /* Only NUMBERS have a value.  */
@@ -109,8 +110,8 @@
 %token PERCENT_NTERM       "%nterm"
 
 %token PERCENT_TYPE        "%type"
-%token PERCENT_DESTRUCTOR  "%destructor {...}"
-%token PERCENT_PRINTER     "%printer {...}"
+%token PERCENT_DESTRUCTOR  "%destructor"
+%token PERCENT_PRINTER     "%printer"
 
 %token PERCENT_UNION       "%union {...}"
 
@@ -137,8 +138,8 @@
   PERCENT_EXPECT_RR      "%expect-rr"
   PERCENT_FILE_PREFIX     "%file-prefix"
   PERCENT_GLR_PARSER      "%glr-parser"
-  PERCENT_INITIAL_ACTION  "%initial-action {...}"
-  PERCENT_LEX_PARAM       "%lex-param {...}"
+  PERCENT_INITIAL_ACTION  "%initial-action"
+  PERCENT_LEX_PARAM       "%lex-param"
   PERCENT_LOCATIONS       "%locations"
   PERCENT_NAME_PREFIX     "%name-prefix"
   PERCENT_NO_DEFAULT_PREC "%no-default-prec"
@@ -146,7 +147,7 @@
   PERCENT_NONDETERMINISTIC_PARSER
                          "%nondeterministic-parser"
   PERCENT_OUTPUT          "%output"
-  PERCENT_PARSE_PARAM     "%parse-param {...}"
+  PERCENT_PARSE_PARAM     "%parse-param"
   PERCENT_PURE_PARSER     "%pure-parser"
   PERCENT_REQUIRE        "%require"
   PERCENT_SKELETON        "%skeleton"
@@ -167,23 +168,14 @@
 %token EPILOGUE        "epilogue"
 %token BRACED_CODE     "{...}"
 
-
 %type <chars> STRING string_content
-             "%destructor {...}"
-             "%initial-action {...}"
-             "%lex-param {...}"
-             "%parse-param {...}"
-             "%printer {...}"
+             "{...}"
              "%union {...}"
              PROLOGUE EPILOGUE
 %printer { fprintf (stderr, "\"%s\"", $$); }
              STRING string_content
 %printer { fprintf (stderr, "{\n%s\n}", $$); }
-             "%destructor {...}"
-             "%initial-action {...}"
-             "%lex-param {...}"
-             "%parse-param {...}"
-             "%printer {...}"
+             "{...}"
              "%union {...}"
              PROLOGUE EPILOGUE
 %type <uniqstr> TYPE
@@ -214,7 +206,8 @@
 
 declaration:
   grammar_declaration
-| PROLOGUE                                 { prologue_augment ($1, @1); }
+| PROLOGUE                         { prologue_augment (translate_code ($1, @1),
+                                                      @1); }
 | "%debug"                                 { debug_flag = true; }
 | "%define" string_content
     {
@@ -232,17 +225,17 @@
       nondeterministic_parser = true;
       glr_parser = true;
     }
-| "%initial-action {...}"
+| "%initial-action" "{...}"
     {
-      muscle_code_grow ("initial_action", $1, @1);
+      muscle_code_grow ("initial_action", translate_symbol_action ($2, @2), 
@2);
     }
-| "%lex-param {...}"                      { add_param ("lex_param", $1, @1); }
+| "%lex-param" "{...}"                    { add_param ("lex_param", $2, @2); }
 | "%locations"                             { locations_flag = true; }
 | "%name-prefix" "=" string_content        { spec_name_prefix = $3; }
 | "%no-lines"                              { no_lines_flag = true; }
 | "%nondeterministic-parser"              { nondeterministic_parser = true; }
 | "%output" "=" string_content             { spec_outfile = $3; }
-| "%parse-param {...}"                    { add_param ("parse_param", $1, @1); 
}
+| "%parse-param" "{...}"                  { add_param ("parse_param", $2, @2); 
}
 | "%pure-parser"                           { pure_parser = true; }
 | "%require" string_content                { version_check (&@2, $2); }
 | "%skeleton" string_content               { skeleton = $2; }
@@ -275,19 +268,21 @@
       typed = true;
       muscle_code_grow ("stype", body, @1);
     }
-| "%destructor {...}" symbols.1
+| "%destructor" "{...}" symbols.1
     {
       symbol_list *list;
-      for (list = $2; list; list = list->next)
-       symbol_destructor_set (list->sym, $1, @1);
-      symbol_list_free ($2);
+      const char *action = translate_symbol_action ($2, @2);
+      for (list = $3; list; list = list->next)
+       symbol_destructor_set (list->sym, action, @2);
+      symbol_list_free ($3);
     }
-| "%printer {...}" symbols.1
+| "%printer" "{...}" symbols.1
     {
       symbol_list *list;
-      for (list = $2; list; list = list->next)
-       symbol_printer_set (list->sym, $1, @1);
-      symbol_list_free ($2);
+      const char *action = translate_symbol_action ($2, @2);
+      for (list = $3; list; list = list->next)
+       symbol_printer_set (list->sym, action, @2);
+      symbol_list_free ($3);
     }
 | "%default-prec"
     {
@@ -346,7 +341,6 @@
 ;
 
 /* One or more nonterminals to be %typed. */
-
 symbols.1:
   symbol            { $$ = symbol_list_new ($1, @1); }
 | symbols.1 symbol  { $$ = symbol_list_prepend ($1, $2, @2); }
@@ -426,7 +420,9 @@
     { grammar_current_rule_begin (current_lhs, current_lhs_location); }
 | rhs symbol
     { grammar_current_rule_symbol_append ($2, @2); }
-| rhs action
+| rhs "{...}"
+    { grammar_current_rule_action_append (gram_last_string,
+                                         gram_last_braced_code_loc); }
 | rhs "%prec" symbol
     { grammar_current_rule_prec_set ($3, @3); }
 | rhs "%dprec" INT
@@ -440,23 +436,6 @@
 | string_as_id    { $$ = $1; }
 ;
 
-/* Handle the semantics of an action specially, with a mid-rule
-   action, so that grammar_current_rule_action_append is invoked
-   immediately after the braced code is read by the scanner.
-
-   This implementation relies on the LALR(1) parsing algorithm.
-   If grammar_current_rule_action_append were executed in a normal
-   action for this rule, then when the input grammar contains two
-   successive actions, the scanner would have to read both actions
-   before reducing this rule.  That wouldn't work, since the scanner
-   relies on all preceding input actions being processed by
-   grammar_current_rule_action_append before it scans the next
-   action.  */
-action:
-    { grammar_current_rule_action_append (last_string, last_braced_code_loc); }
-  BRACED_CODE
-;
-
 /* A string used as an ID: quote it.  */
 string_as_id:
   STRING
@@ -477,8 +456,8 @@
   /* Nothing.  */
 | "%%" EPILOGUE
     {
-      muscle_code_grow ("epilogue", $2, @2);
-      scanner_last_string_free ();
+      muscle_code_grow ("epilogue", translate_code ($2, @2), @2);
+      gram_scanner_last_string_free ();
     }
 ;
 
@@ -563,7 +542,7 @@
       free (name);
     }
 
-  scanner_last_string_free ();
+  gram_scanner_last_string_free ();
 }
 
 static void
Index: src/reader.c
===================================================================
RCS file: /cvsroot/bison/bison/src/reader.c,v
retrieving revision 1.254
diff -u -u -r1.254 reader.c
--- src/reader.c 14 May 2006 19:14:10 -0000 1.254
+++ src/reader.c 6 Jun 2006 16:38:55 -0000
@@ -22,6 +22,7 @@
 
 #include <config.h>
 #include "system.h"
+#include <assert.h>
 
 #include <quotearg.h>
 
@@ -34,6 +35,8 @@
 #include "reader.h"
 #include "symlist.h"
 #include "symtab.h"
+#include "scan-gram.h"
+#include "scan-code.h"
 
 static void check_and_convert_grammar (void);
 
@@ -77,6 +80,8 @@
     !typed ? &pre_prologue_obstack : &post_prologue_obstack;
 
   obstack_fgrow1 (oout, "]b4_syncline(%d, [[", loc.start.line);
+  /* FIXME: Protection of M4 characters missing here.  See
+     output.c:escaped_output.  */
   MUSCLE_OBSTACK_SGROW (oout,
                        quotearg_style (c_quoting_style, loc.start.file));
   obstack_sgrow (oout, "]])[\n");
@@ -398,9 +403,7 @@
 void
 grammar_current_rule_action_append (const char *action, location loc)
 {
-  /* There's no need to invoke grammar_midrule_action here, since the
-     scanner already did it if necessary.  */
-  current_rule->action = action;
+  current_rule->action = translate_rule_action (current_rule, action, loc);
   current_rule->action_location = loc;
 }
 
@@ -426,6 +429,7 @@
 
   while (p)
     {
+      int rule_length = 0;
       symbol *ruleprec = p->ruleprec;
       rules[ruleno].user_number = ruleno;
       rules[ruleno].number = ruleno;
@@ -440,18 +444,22 @@
       rules[ruleno].action = p->action;
       rules[ruleno].action_location = p->action_location;
 
-      p = p->next;
-      while (p && p->sym)
+      for (p = p->next; p && p->sym; p = p->next)
        {
+         ++rule_length;
+
+         /* Don't allow rule_length == INT_MAX, since that might
+            cause confusion with strtol if INT_MAX == LONG_MAX.  */
+         if (rule_length == INT_MAX)
+             fatal_at (rules[ruleno].location, _("rule is too long"));
+
          /* item_number = symbol_number.
             But the former needs to contain more: negative rule numbers. */
          ritem[itemno++] = symbol_number_as_item_number (p->sym->number);
          /* A rule gets by default the precedence and associativity
-            of the last token in it.  */
+            of its last token.  */
          if (p->sym->class == token_sym && default_prec)
            rules[ruleno].prec = p->sym;
-         if (p)
-           p = p->next;
        }
 
       /* If this rule has a %prec,
@@ -461,8 +469,11 @@
          rules[ruleno].precsym = ruleprec;
          rules[ruleno].prec = ruleprec;
        }
+      /* An item ends by the rule number (negated).  */
       ritem[itemno++] = rule_number_as_item_number (ruleno);
+      assert (itemno < ITEM_NUMBER_MAX);
       ++ruleno;
+      assert (ruleno < RULE_NUMBER_MAX);
 
       if (p)
        p = p->next;
@@ -511,7 +522,7 @@
 
   gram__flex_debug = trace_flag & trace_scan;
   gram_debug = trace_flag & trace_parse;
-  scanner_initialize ();
+  gram_scanner_initialize ();
   gram_parse ();
 
   if (! complaint_issued)
Index: src/reader.h
===================================================================
RCS file: /cvsroot/bison/bison/src/reader.h,v
retrieving revision 1.49
diff -u -u -r1.49 reader.h
--- src/reader.h 9 Mar 2006 23:23:11 -0000 1.49
+++ src/reader.h 6 Jun 2006 16:38:55 -0000
@@ -35,26 +35,6 @@
   uniqstr type;
 } merger_list;
 
-/* From the scanner.  */
-extern FILE *gram_in;
-extern int gram__flex_debug;
-extern boundary scanner_cursor;
-extern char *last_string;
-extern location last_braced_code_loc;
-extern int max_left_semantic_context;
-void scanner_initialize (void);
-void scanner_free (void);
-void scanner_last_string_free (void);
-
-/* These are declared by the scanner, but not used.  We put them here
-   to pacify "make syntax-check".  */
-extern FILE *gram_out;
-extern int gram_lineno;
-
-# define YY_DECL int gram_lex (YYSTYPE *val, location *loc)
-YY_DECL;
-
-
 /* From the parser.  */
 extern int gram_debug;
 int gram_parse (void);
Index: src/scan-action.l
===================================================================
RCS file: src/scan-action.l
diff -N src/scan-action.l
--- /dev/null   1 Jan 1970 00:00:00 -0000
+++ src/scan-action.l 6 Jun 2006 16:38:55 -0000
@@ -0,0 +1,866 @@
+/* Bison Grammar Scanner                             -*- C -*-
+
+   Copyright (C) 2002, 2003, 2004, 2005 Free Software Foundation, Inc.
+
+   This file is part of Bison, the GNU Compiler Compiler.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software
+   Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+   02110-1301  USA
+*/
+
+%option debug nodefault nounput noyywrap never-interactive
+%option prefix="gram_" outfile="lex.yy.c"
+
+%{
+#include "system.h"
+
+#include <mbswidth.h>
+#include <get-errno.h>
+#include <quote.h>
+
+#include "complain.h"
+#include "files.h"
+#include "getargs.h"
+#include "gram.h"
+#include "quotearg.h"
+#include "reader.h"
+#include "uniqstr.h"
+
+#define YY_USER_INIT                                   \
+  do                                                   \
+    {                                                  \
+      scanner_cursor.file = current_file;              \
+      scanner_cursor.line = 1;                         \
+      scanner_cursor.column = 1;                       \
+      code_start = scanner_cursor;                     \
+    }                                                  \
+  while (0)
+
+/* Location of scanner cursor.  */
+boundary scanner_cursor;
+
+static void adjust_location (location *, char const *, size_t);
+#define YY_USER_ACTION  adjust_location (loc, yytext, yyleng);
+
+static size_t no_cr_read (FILE *, char *, size_t);
+#define YY_INPUT(buf, result, size) ((result) = no_cr_read (yyin, buf, size))
+
+/* Within well-formed rules, RULE_LENGTH is the number of values in
+   the current rule so far, which says where to find `$0' with respect
+   to the top of the stack.  It is not the same as the rule->length in
+   the case of mid rule actions.
+
+   Outside of well-formed rules, RULE_LENGTH has an undefined value.  */
+int rule_length;
+
+static void handle_dollar (int token_type, char *cp, location loc);
+static void handle_at (int token_type, char *cp, location loc);
+static void handle_syncline (char *args);
+static unsigned long int scan_integer (char const *p, int base, location loc);
+static int convert_ucn_to_byte (char const *hex_text);
+static void unexpected_eof (boundary, char const *);
+static void unexpected_newline (boundary, char const *);
+
+%}
+%x SC_COMMENT SC_LINE_COMMENT SC_YACC_COMMENT
+%x SC_STRING SC_CHARACTER
+%x SC_ESCAPED_STRING SC_ESCAPED_CHARACTER
+%x SC_PRE_CODE SC_BRACED_CODE SC_PROLOGUE SC_EPILOGUE
+
+letter   [.abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_]
+id       {letter}({letter}|[0-9])*
+directive %{letter}({letter}|[0-9]|-)*
+int      [0-9]+
+
+/* POSIX says that a tag must be both an id and a C union member, but
+   historically almost any character is allowed in a tag.  We disallow
+   NUL and newline, as this simplifies our implementation.  */
+tag     [^\0\n>]+
+
+/* Zero or more instances of backslash-newline.  Following GCC, allow
+   white space between the backslash and the newline.  */
+splice  (\\[ \f\t\v]*\n)*
+
+%%
+%{
+  /* Nesting level of the current code in braces.  */
+  int braces_level IF_LINT (= 0);
+
+  /* Parent context state, when applicable.  */
+  int context_state IF_LINT (= 0);
+
+  /* Token type to return, when applicable.  */
+  int token_type IF_LINT (= 0);
+
+  /* Where containing code started, when applicable.  Its initial
+     value is relevant only when yylex is invoked in the SC_EPILOGUE
+     start condition.  */
+  boundary code_start = scanner_cursor;
+
+  /* Where containing comment or string or character literal started,
+     when applicable.  */
+  boundary token_start IF_LINT (= scanner_cursor);
+%}
+
+
+  /*-----------------------.
+  | Scanning white space.  |
+  `-----------------------*/
+
+<INITIAL>
+{
+  /* Comments and white space.  */
+  ","         warn_at (*loc, _("stray `,' treated as white space"));
+  [ \f\n\t\v]  |
+  "//".*       ;
+  "/*" {
+    token_start = loc->start;
+    context_state = YY_START;
+    BEGIN SC_YACC_COMMENT;
+  }
+
+  /* #line directives are not documented, and may be withdrawn or
+     modified in future versions of Bison.  */
+  ^"#line "{int}" \"".*"\"\n" {
+    handle_syncline (yytext + sizeof "#line " - 1);
+  }
+}
+
+
+  /*----------------------------.
+  | Scanning Bison directives.  |
+  `----------------------------*/
+<INITIAL>
+{
+
+  /* Code in between braces.  */
+  "{" {
+    STRING_GROW;
+    token_type = BRACED_CODE;
+    braces_level = 0;
+    code_start = loc->start;
+    BEGIN SC_BRACED_CODE;
+  }
+
+}
+
+
+  /*------------------------------------------------------------.
+  | Scanning a C comment.  The initial `/ *' is already eaten.  |
+  `------------------------------------------------------------*/
+
+<SC_COMMENT>
+{
+  "*"{splice}"/"  STRING_GROW; BEGIN context_state;
+  <<EOF>>        unexpected_eof (token_start, "*/"); BEGIN context_state;
+}
+
+
+  /*--------------------------------------------------------------.
+  | Scanning a line comment.  The initial `//' is already eaten.  |
+  `--------------------------------------------------------------*/
+
+<SC_LINE_COMMENT>
+{
+  "\n"          STRING_GROW; BEGIN context_state;
+  {splice}      STRING_GROW;
+  <<EOF>>       BEGIN context_state;
+}
+
+
+  /*------------------------------------------------.
+  | Scanning a Bison string, including its escapes. |
+  | The initial quote is already eaten.             |
+  `------------------------------------------------*/
+
+<SC_ESCAPED_STRING>
+{
+  "\"" {
+    STRING_FINISH;
+    loc->start = token_start;
+    val->chars = last_string;
+    rule_length++;
+    BEGIN INITIAL;
+    return STRING;
+  }
+  \n           unexpected_newline (token_start, "\""); BEGIN INITIAL;
+  <<EOF>>      unexpected_eof (token_start, "\"");     BEGIN INITIAL;
+}
+
+  /*----------------------------------------------------------.
+  | Scanning a Bison character literal, decoding its escapes. |
+  | The initial quote is already eaten.                              |
+  `----------------------------------------------------------*/
+
+<SC_ESCAPED_CHARACTER>
+{
+  "'" {
+    unsigned char last_string_1;
+    STRING_GROW;
+    STRING_FINISH;
+    loc->start = token_start;
+    val->symbol = symbol_get (quotearg_style (escape_quoting_style,
+                                             last_string),
+                             *loc);
+    symbol_class_set (val->symbol, token_sym, *loc);
+    last_string_1 = last_string[1];
+    symbol_user_token_number_set (val->symbol, last_string_1, *loc);
+    STRING_FREE;
+    rule_length++;
+    BEGIN INITIAL;
+    return ID;
+  }
+  \n           unexpected_newline (token_start, "'");  BEGIN INITIAL;
+  <<EOF>>      unexpected_eof (token_start, "'");      BEGIN INITIAL;
+}
+
+<SC_ESCAPED_CHARACTER,SC_ESCAPED_STRING>
+{
+  \0       complain_at (*loc, _("invalid null character"));
+}
+
+
+  /*----------------------------.
+  | Decode escaped characters.  |
+  `----------------------------*/
+
+<SC_ESCAPED_STRING,SC_ESCAPED_CHARACTER>
+{
+  \\[0-7]{1,3} {
+    unsigned long int c = strtoul (yytext + 1, 0, 8);
+    if (UCHAR_MAX < c)
+      complain_at (*loc, _("invalid escape sequence: %s"), quote (yytext));
+    else if (! c)
+      complain_at (*loc, _("invalid null character: %s"), quote (yytext));
+    else
+      obstack_1grow (&obstack_for_string, c);
+  }
+
+  \\x[0-9abcdefABCDEF]+ {
+    unsigned long int c;
+    set_errno (0);
+    c = strtoul (yytext + 2, 0, 16);
+    if (UCHAR_MAX < c || get_errno ())
+      complain_at (*loc, _("invalid escape sequence: %s"), quote (yytext));
+    else if (! c)
+      complain_at (*loc, _("invalid null character: %s"), quote (yytext));
+    else
+      obstack_1grow (&obstack_for_string, c);
+  }
+
+  \\a  obstack_1grow (&obstack_for_string, '\a');
+  \\b  obstack_1grow (&obstack_for_string, '\b');
+  \\f  obstack_1grow (&obstack_for_string, '\f');
+  \\n  obstack_1grow (&obstack_for_string, '\n');
+  \\r  obstack_1grow (&obstack_for_string, '\r');
+  \\t  obstack_1grow (&obstack_for_string, '\t');
+  \\v  obstack_1grow (&obstack_for_string, '\v');
+
+  /* \\[\"\'?\\] would be shorter, but it confuses xgettext.  */
+  \\("\""|"'"|"?"|"\\")  obstack_1grow (&obstack_for_string, yytext[1]);
+
+  \\(u|U[0-9abcdefABCDEF]{4})[0-9abcdefABCDEF]{4} {
+    int c = convert_ucn_to_byte (yytext);
+    if (c < 0)
+      complain_at (*loc, _("invalid escape sequence: %s"), quote (yytext));
+    else if (! c)
+      complain_at (*loc, _("invalid null character: %s"), quote (yytext));
+    else
+      obstack_1grow (&obstack_for_string, c);
+  }
+  \\(.|\n)     {
+    complain_at (*loc, _("unrecognized escape sequence: %s"), quote (yytext));
+    STRING_GROW;
+  }
+}
+
+  /*--------------------------------------------.
+  | Scanning user-code characters and strings.  |
+  `--------------------------------------------*/
+
+<SC_CHARACTER,SC_STRING>
+{
+  {splice}|address@hidden      STRING_GROW;
+}
+
+<SC_CHARACTER>
+{
+  "'"          STRING_GROW; BEGIN context_state;
+  \n           unexpected_newline (token_start, "'"); BEGIN context_state;
+  <<EOF>>      unexpected_eof (token_start, "'"); BEGIN context_state;
+}
+
+<SC_STRING>
+{
+  "\""         STRING_GROW; BEGIN context_state;
+  \n           unexpected_newline (token_start, "\""); BEGIN context_state;
+  <<EOF>>      unexpected_eof (token_start, "\""); BEGIN context_state;
+}
+
+
+  /*---------------------------------------------------.
+  | Strings, comments etc. can be found in user code.  |
+  `---------------------------------------------------*/
+
+<INITIAL>
+{
+  "'" {
+    STRING_GROW;
+    context_state = YY_START;
+    token_start = loc->start;
+    BEGIN SC_CHARACTER;
+  }
+  "\"" {
+    STRING_GROW;
+    context_state = YY_START;
+    token_start = loc->start;
+    BEGIN SC_STRING;
+  }
+  "/"{splice}"*" {
+    STRING_GROW;
+    context_state = YY_START;
+    token_start = loc->start;
+    BEGIN SC_COMMENT;
+  }
+  "/"{splice}"/" {
+    STRING_GROW;
+    context_state = YY_START;
+    BEGIN SC_LINE_COMMENT;
+  }
+}
+
+
+  /*---------------------------------------------------------------.
+  | Scanning some code in braces (%union and actions). The initial |
+  | "{" is already eaten.                                          |
+  `---------------------------------------------------------------*/
+
+<INITIAL>
+{
+  "{"|"<"{splice}"%"  STRING_GROW; braces_level++;
+  "%"{splice}">"      STRING_GROW; braces_level--;
+  "}" {
+    bool outer_brace = --braces_level < 0;
+
+    /* As an undocumented Bison extension, append `;' before the last
+       brace in braced code, so that the user code can omit trailing
+       `;'.  But do not append `;' if emulating Yacc, since Yacc does
+       not append one.
+
+       FIXME: Bison should warn if a semicolon seems to be necessary
+       here, and should omit the semicolon if it seems unnecessary
+       (e.g., after ';', '{', or '}', each followed by comments or
+       white space).  Such a warning shouldn't depend on --yacc; it
+       should depend on a new --pedantic option, which would cause
+       Bison to warn if it detects an extension to POSIX.  --pedantic
+       should also diagnose other Bison extensions like %yacc.
+       Perhaps there should also be a GCC-style --pedantic-errors
+       option, so that such warnings are diagnosed as errors.  */
+    if (outer_brace && token_type == BRACED_CODE && ! yacc_flag)
+      obstack_1grow (&obstack_for_string, ';');
+
+    obstack_1grow (&obstack_for_string, '}');
+
+    if (outer_brace)
+      {
+       STRING_FINISH;
+       rule_length++;
+       loc->start = code_start;
+       val->chars = last_string;
+       BEGIN INITIAL;
+       return token_type;
+      }
+  }
+
+  /* Tokenize `<<%' correctly (as `<<' `%') rather than incorrrectly
+     (as `<' `<%').  */
+  "<"{splice}"<"  STRING_GROW;
+
+  "$"("<"{tag}">")?(-?[0-9]+|"$")  handle_dollar (token_type, yytext, *loc);
+  "@"(-?[0-9]+|"$")               handle_at (token_type, yytext, *loc);
+
+  <<EOF>>  unexpected_eof (code_start, "}"); BEGIN INITIAL;
+}
+
+
+  /*--------------------------------------------------------------.
+  | Scanning some prologue: from "%{" (already scanned) to "%}".  |
+  `--------------------------------------------------------------*/
+
+<SC_PROLOGUE>
+{
+  "%}" {
+    STRING_FINISH;
+    loc->start = code_start;
+    val->chars = last_string;
+    BEGIN INITIAL;
+    return PROLOGUE;
+  }
+
+  <<EOF>>  unexpected_eof (code_start, "%}"); BEGIN INITIAL;
+}
+
+
+  /*---------------------------------------------------------------.
+  | Scanning the epilogue (everything after the second "%%", which |
+  | has already been eaten).                                       |
+  `---------------------------------------------------------------*/
+
+<SC_EPILOGUE>
+{
+  <<EOF>> {
+    STRING_FINISH;
+    loc->start = code_start;
+    val->chars = last_string;
+    BEGIN INITIAL;
+    return EPILOGUE;
+  }
+}
+
+
+  /*-----------------------------------------.
+  | Escape M4 quoting characters in C code.  |
+  `-----------------------------------------*/
+
+<SC_COMMENT,SC_LINE_COMMENT,SC_STRING,SC_CHARACTER,SC_BRACED_CODE,SC_PROLOGUE,SC_EPILOGUE>
+{
+  \$   obstack_sgrow (&obstack_for_string, "$][");
+  \@   obstack_sgrow (&obstack_for_string, "@@");
+  \[   obstack_sgrow (&obstack_for_string, "@{");
+  \]   obstack_sgrow (&obstack_for_string, "@}");
+}
+
+
+  /*-----------------------------------------------------.
+  | By default, grow the string obstack with the input.  |
+  `-----------------------------------------------------*/
+
+<SC_COMMENT,SC_LINE_COMMENT,SC_BRACED_CODE,SC_PROLOGUE,SC_EPILOGUE,SC_STRING,SC_CHARACTER,SC_ESCAPED_STRING,SC_ESCAPED_CHARACTER>.
     |
+<SC_COMMENT,SC_LINE_COMMENT,SC_BRACED_CODE,SC_PROLOGUE,SC_EPILOGUE>\n  
STRING_GROW;
+
+%%
+
+/* Keeps track of the maximum number of semantic values to the left of
+   a handle (those referenced by $0, $-1, etc.) are required by the
+   semantic actions of this grammar. */
+int max_left_semantic_context = 0;
+
+/* Set *LOC and adjust scanner cursor to account for token TOKEN of
+   size SIZE.  */
+
+static void
+adjust_location (location *loc, char const *token, size_t size)
+{
+  int line = scanner_cursor.line;
+  int column = scanner_cursor.column;
+  char const *p0 = token;
+  char const *p = token;
+  char const *lim = token + size;
+
+  loc->start = scanner_cursor;
+
+  for (p = token; p < lim; p++)
+    switch (*p)
+      {
+      case '\n':
+       line++;
+       column = 1;
+       p0 = p + 1;
+       break;
+
+      case '\t':
+       column += mbsnwidth (p0, p - p0, 0);
+       column += 8 - ((column - 1) & 7);
+       p0 = p + 1;
+       break;
+      }
+
+  scanner_cursor.line = line;
+  scanner_cursor.column = column + mbsnwidth (p0, p - p0, 0);
+
+  loc->end = scanner_cursor;
+}
+
+
+/* Read bytes from FP into buffer BUF of size SIZE.  Return the
+   number of bytes read.  Remove '\r' from input, treating \r\n
+   and isolated \r as \n.  */
+
+static size_t
+no_cr_read (FILE *fp, char *buf, size_t size)
+{
+  size_t bytes_read = fread (buf, 1, size, fp);
+  if (bytes_read)
+    {
+      char *w = memchr (buf, '\r', bytes_read);
+      if (w)
+       {
+         char const *r = ++w;
+         char const *lim = buf + bytes_read;
+
+         for (;;)
+           {
+             /* Found an '\r'.  Treat it like '\n', but ignore any
+                '\n' that immediately follows.  */
+             w[-1] = '\n';
+             if (r == lim)
+               {
+                 int ch = getc (fp);
+                 if (ch != '\n' && ungetc (ch, fp) != ch)
+                   break;
+               }
+             else if (*r == '\n')
+               r++;
+
+             /* Copy until the next '\r'.  */
+             do
+               {
+                 if (r == lim)
+                   return w - buf;
+               }
+             while ((*w++ = *r++) != '\r');
+           }
+
+         return w - buf;
+       }
+    }
+
+  return bytes_read;
+}
+
+
+/*------------------------------------------------------------------.
+| TEXT is pointing to a wannabee semantic value (i.e., a `$').      |
+|                                                                   |
+| Possible inputs: $[<TYPENAME>]($|integer)                         |
+|                                                                   |
+| Output to OBSTACK_FOR_STRING a reference to this semantic value.  |
+`------------------------------------------------------------------*/
+
+static inline bool
+handle_action_dollar (char *text, location loc)
+{
+  const char *type_name = NULL;
+  char *cp = text + 1;
+
+  if (! current_rule)
+    return false;
+
+  /* Get the type name if explicit. */
+  if (*cp == '<')
+    {
+      type_name = ++cp;
+      while (*cp != '>')
+       ++cp;
+      *cp = '\0';
+      ++cp;
+    }
+
+  if (*cp == '$')
+    {
+      if (!type_name)
+       type_name = symbol_list_n_type_name_get (current_rule, loc, 0);
+      if (!type_name && typed)
+       complain_at (loc, _("$$ of `%s' has no declared type"),
+                    current_rule->sym->tag);
+      if (!type_name)
+       type_name = "";
+      obstack_fgrow1 (&obstack_for_string,
+                     "]b4_lhs_value([%s])[", type_name);
+    }
+  else
+    {
+      long int num;
+      set_errno (0);
+      num = strtol (cp, 0, 10);
+
+      if (INT_MIN <= num && num <= rule_length && ! get_errno ())
+       {
+         int n = num;
+         if (1-n > max_left_semantic_context)
+           max_left_semantic_context = 1-n;
+         if (!type_name && n > 0)
+           type_name = symbol_list_n_type_name_get (current_rule, loc, n);
+         if (!type_name && typed)
+           complain_at (loc, _("$%d of `%s' has no declared type"),
+                        n, current_rule->sym->tag);
+         if (!type_name)
+           type_name = "";
+         obstack_fgrow3 (&obstack_for_string,
+                         "]b4_rhs_value(%d, %d, [%s])[",
+                         rule_length, n, type_name);
+       }
+      else
+       complain_at (loc, _("integer out of range: %s"), quote (text));
+    }
+
+  return true;
+}
+
+
+/*----------------------------------------------------------------.
+| Map `$?' onto the proper M4 symbol, depending on its TOKEN_TYPE |
+| (are we in an action?).                                         |
+`----------------------------------------------------------------*/
+
+static void
+handle_dollar (int token_type, char *text, location loc)
+{
+  switch (token_type)
+    {
+    case BRACED_CODE:
+      if (handle_action_dollar (text, loc))
+       return;
+      break;
+
+    case PERCENT_DESTRUCTOR:
+    case PERCENT_INITIAL_ACTION:
+    case PERCENT_PRINTER:
+      if (text[1] == '$')
+       {
+         obstack_sgrow (&obstack_for_string, "]b4_dollar_dollar[");
+         return;
+       }
+      break;
+
+    default:
+      break;
+    }
+
+  complain_at (loc, _("invalid value: %s"), quote (text));
+}
+
+
+/*------------------------------------------------------.
+| TEXT is a location token (i.e., a address@hidden').  Output to |
+| OBSTACK_FOR_STRING a reference to this location.      |
+`------------------------------------------------------*/
+
+static inline bool
+handle_action_at (char *text, location loc)
+{
+  char *cp = text + 1;
+  locations_flag = true;
+
+  if (! current_rule)
+    return false;
+
+  if (*cp == '$')
+    obstack_sgrow (&obstack_for_string, "]b4_lhs_location[");
+  else
+    {
+      long int num;
+      set_errno (0);
+      num = strtol (cp, 0, 10);
+
+      if (INT_MIN <= num && num <= rule_length && ! get_errno ())
+       {
+         int n = num;
+         obstack_fgrow2 (&obstack_for_string, "]b4_rhs_location(%d, %d)[",
+                         rule_length, n);
+       }
+      else
+       complain_at (loc, _("integer out of range: %s"), quote (text));
+    }
+
+  return true;
+}
+
+
+/*----------------------------------------------------------------.
+| Map address@hidden' onto the proper M4 symbol, depending on its TOKEN_TYPE |
+| (are we in an action?).                                         |
+`----------------------------------------------------------------*/
+
+static void
+handle_at (int token_type, char *text, location loc)
+{
+  switch (token_type)
+    {
+    case BRACED_CODE:
+      handle_action_at (text, loc);
+      return;
+
+    case PERCENT_INITIAL_ACTION:
+    case PERCENT_DESTRUCTOR:
+    case PERCENT_PRINTER:
+      if (text[1] == '$')
+       {
+         obstack_sgrow (&obstack_for_string, "]b4_at_dollar[");
+         return;
+       }
+      break;
+
+    default:
+      break;
+    }
+
+  complain_at (loc, _("invalid value: %s"), quote (text));
+}
+
+
+/*------------------------------------------------------.
+| Scan NUMBER for a base-BASE integer at location LOC.  |
+`------------------------------------------------------*/
+
+static unsigned long int
+scan_integer (char const *number, int base, location loc)
+{
+  unsigned long int num;
+  set_errno (0);
+  num = strtoul (number, 0, base);
+  if (INT_MAX < num || get_errno ())
+    {
+      complain_at (loc, _("integer out of range: %s"), quote (number));
+      num = INT_MAX;
+    }
+  return num;
+}
+
+
+/*------------------------------------------------------------------.
+| Convert universal character name UCN to a single-byte character,  |
+| and return that character.  Return -1 if UCN does not correspond  |
+| to a single-byte character.                                      |
+`------------------------------------------------------------------*/
+
+static int
+convert_ucn_to_byte (char const *ucn)
+{
+  unsigned long int code = strtoul (ucn + 2, 0, 16);
+
+  /* FIXME: Currently we assume Unicode-compatible unibyte characters
+     on ASCII hosts (i.e., Latin-1 on hosts with 8-bit bytes).  On
+     non-ASCII hosts we support only the portable C character set.
+     These limitations should be removed once we add support for
+     multibyte characters.  */
+
+  if (UCHAR_MAX < code)
+    return -1;
+
+#if ! ('$' == 0x24 && '@' == 0x40 && '`' == 0x60 && '~' == 0x7e)
+  {
+    /* A non-ASCII host.  Use CODE to index into a table of the C
+       basic execution character set, which is guaranteed to exist on
+       all Standard C platforms.  This table also includes '$', '@',
+       and '`', which are not in the basic execution character set but
+       which are unibyte characters on all the platforms that we know
+       about.  */
+    static signed char const table[] =
+      {
+       '\0',   -1,   -1,   -1,   -1,   -1,   -1, '\a',
+       '\b', '\t', '\n', '\v', '\f', '\r',   -1,   -1,
+         -1,   -1,   -1,   -1,   -1,   -1,   -1,   -1,
+         -1,   -1,   -1,   -1,   -1,   -1,   -1,   -1,
+        ' ',  '!',  '"',  '#',  '$',  '%',  '&', '\'',
+        '(',  ')',  '*',  '+',  ',',  '-',  '.',  '/',
+        '0',  '1',  '2',  '3',  '4',  '5',  '6',  '7',
+        '8',  '9',  ':',  ';',  '<',  '=',  '>',  '?',
+        '@',  'A',  'B',  'C',  'D',  'E',  'F',  'G',
+        'H',  'I',  'J',  'K',  'L',  'M',  'N',  'O',
+        'P',  'Q',  'R',  'S',  'T',  'U',  'V',  'W',
+        'X',  'Y',  'Z',  '[', '\\',  ']',  '^',  '_',
+        '`',  'a',  'b',  'c',  'd',  'e',  'f',  'g',
+        'h',  'i',  'j',  'k',  'l',  'm',  'n',  'o',
+        'p',  'q',  'r',  's',  't',  'u',  'v',  'w',
+        'x',  'y',  'z',  '{',  '|',  '}',  '~'
+      };
+
+    code = code < sizeof table ? table[code] : -1;
+  }
+#endif
+
+  return code;
+}
+
+
+/*----------------------------------------------------------------.
+| Handle `#line INT "FILE"'.  ARGS has already skipped `#line '.  |
+`----------------------------------------------------------------*/
+
+static void
+handle_syncline (char *args)
+{
+  int lineno = strtol (args, &args, 10);
+  const char *file = NULL;
+  file = strchr (args, '"') + 1;
+  *strchr (file, '"') = 0;
+  scanner_cursor.file = current_file = uniqstr_new (file);
+  scanner_cursor.line = lineno;
+  scanner_cursor.column = 1;
+}
+
+
+/*----------------------------------------------------------------.
+| For a token or comment starting at START, report message MSGID, |
+| which should say that an end marker was found before           |
+| the expected TOKEN_END.                                        |
+`----------------------------------------------------------------*/
+
+static void
+unexpected_end (boundary start, char const *msgid, char const *token_end)
+{
+  location loc;
+  loc.start = start;
+  loc.end = scanner_cursor;
+  complain_at (loc, _(msgid), token_end);
+}
+
+
+/*------------------------------------------------------------------------.
+| Report an unexpected EOF in a token or comment starting at START.       |
+| An end of file was encountered and the expected TOKEN_END was missing.  |
+`------------------------------------------------------------------------*/
+
+static void
+unexpected_eof (boundary start, char const *token_end)
+{
+  unexpected_end (start, N_("missing `%s' at end of file"), token_end);
+}
+
+
+/*----------------------------------------.
+| Likewise, but for unexpected newlines.  |
+`----------------------------------------*/
+
+static void
+unexpected_newline (boundary start, char const *token_end)
+{
+  unexpected_end (start, N_("missing `%s' at end of line"), token_end);
+}
+
+
+/*-------------------------.
+| Initialize the scanner.  |
+`-------------------------*/
+
+void
+scanner_initialize (void)
+{
+  obstack_init (&obstack_for_string);
+}
+
+
+/*-----------------------------------------------.
+| Free all the memory allocated to the scanner.  |
+`-----------------------------------------------*/
+
+void
+scanner_free (void)
+{
+  obstack_free (&obstack_for_string, 0);
+  /* Reclaim Flex's buffers.  */
+  yy_delete_buffer (YY_CURRENT_BUFFER);
+}
Index: src/scan-code-c.c
===================================================================
RCS file: src/scan-code-c.c
diff -N src/scan-code-c.c
--- /dev/null   1 Jan 1970 00:00:00 -0000
+++ src/scan-code-c.c 6 Jun 2006 16:38:55 -0000
@@ -0,0 +1,2 @@
+#include <config.h>
+#include "scan-code.c"
Index: src/scan-code.h
===================================================================
RCS file: src/scan-code.h
diff -N src/scan-code.h
--- /dev/null   1 Jan 1970 00:00:00 -0000
+++ src/scan-code.h 6 Jun 2006 16:38:55 -0000
@@ -0,0 +1,47 @@
+/* Bison Action Scanner
+
+   Copyright (C) 2006 Free Software Foundation, Inc.
+
+   This file is part of Bison, the GNU Compiler Compiler.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software
+   Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+   02110-1301  USA
+*/
+
+#ifndef SCAN_CODE_H_
+# define SCAN_CODE_H_
+
+# include "location.h"
+# include "symlist.h"
+
+/* Keeps track of the maximum number of semantic values to the left of
+   a handle (those referenced by $0, $-1, etc.) are required by the
+   semantic actions of this grammar. */
+extern int max_left_semantic_context;
+
+void code_scanner_free (void);
+
+/* The action A contains $$, $1 etc. referring to the values
+   of the rule R. */
+const char *translate_rule_action (symbol_list *r, const char *a, location l);
+
+/* The action A refers to $$ and @$ only, referring to a symbol. */
+const char *translate_symbol_action (const char *a, location l);
+
+/* The action contains no special escapes, just protect M4 special
+   symbols.  */
+const char *translate_code (const char *a, location l);
+
+#endif /* !SCAN_CODE_H_ */
Index: src/scan-code.l
===================================================================
RCS file: src/scan-code.l
diff -N src/scan-code.l
--- /dev/null   1 Jan 1970 00:00:00 -0000
+++ src/scan-code.l 6 Jun 2006 16:38:55 -0000
@@ -0,0 +1,358 @@
+/* Bison Action Scanner                             -*- C -*-
+
+   Copyright (C) 2006 Free Software Foundation, Inc.
+
+   This file is part of Bison, the GNU Compiler Compiler.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software
+   Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+   02110-1301  USA
+*/
+
+%option debug nodefault nounput noyywrap never-interactive
+%option prefix="code_" outfile="lex.yy.c"
+
+%{
+/* Work around a bug in flex 2.5.31.  See Debian bug 333231
+   <http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=333231>.  */
+#undef code_wrap
+#define code_wrap() 1
+
+#define FLEX_PREFIX(Id) code_ ## Id
+#include "flex-scanner.h"
+#include "reader.h"
+#include "getargs.h"
+#include <assert.h>
+#include <get-errno.h>
+#include <quote.h>
+
+#include "scan-code.h"
+
+/* The current calling start condition: SC_RULE_ACTION or
+   SC_SYMBOL_ACTION. */
+# define YY_DECL const char *code_lex (int sc_context)
+YY_DECL;
+
+#define YY_USER_ACTION  location_compute (loc, &loc->end, yytext, yyleng);
+
+static void handle_action_dollar (char *cp, location loc);
+static void handle_action_at (char *cp, location loc);
+static location the_location;
+static location *loc = &the_location;
+
+/* The rule being processed. */
+symbol_list *current_rule;
+%}
+ /* C and C++ comments in code. */
+%x SC_COMMENT SC_LINE_COMMENT
+ /* Strings and characters in code. */
+%x SC_STRING SC_CHARACTER
+ /* Whether in a rule or symbol action.  Specifies the translation
+    of $ and @.  */
+%x SC_RULE_ACTION SC_SYMBOL_ACTION
+
+
+/* POSIX says that a tag must be both an id and a C union member, but
+   historically almost any character is allowed in a tag.  We disallow
+   NUL and newline, as this simplifies our implementation.  */
+tag     [^\0\n>]+
+
+/* Zero or more instances of backslash-newline.  Following GCC, allow
+   white space between the backslash and the newline.  */
+splice  (\\[ \f\t\v]*\n)*
+
+%%
+
+%{
+  /* This scanner is special: it is invoked only once, henceforth
+     is expected to return only once.  This initialization is
+     therefore done once per action to translate. */
+  assert (sc_context == SC_SYMBOL_ACTION
+         || sc_context == SC_RULE_ACTION
+         || sc_context == INITIAL);
+  BEGIN sc_context;
+%}
+
+  /*------------------------------------------------------------.
+  | Scanning a C comment.  The initial `/ *' is already eaten.  |
+  `------------------------------------------------------------*/
+
+<SC_COMMENT>
+{
+  "*"{splice}"/"  STRING_GROW; BEGIN sc_context;
+}
+
+
+  /*--------------------------------------------------------------.
+  | Scanning a line comment.  The initial `//' is already eaten.  |
+  `--------------------------------------------------------------*/
+
+<SC_LINE_COMMENT>
+{
+  "\n"          STRING_GROW; BEGIN sc_context;
+  {splice}      STRING_GROW;
+}
+
+
+  /*--------------------------------------------.
+  | Scanning user-code characters and strings.  |
+  `--------------------------------------------*/
+
+<SC_CHARACTER,SC_STRING>
+{
+  {splice}|\\{splice}. STRING_GROW;
+}
+
+<SC_CHARACTER>
+{
+  "'"          STRING_GROW; BEGIN sc_context;
+}
+
+<SC_STRING>
+{
+  "\""         STRING_GROW; BEGIN sc_context;
+}
+
+
+<SC_RULE_ACTION,SC_SYMBOL_ACTION>{
+  "'" {
+    STRING_GROW;
+    BEGIN SC_CHARACTER;
+  }
+  "\"" {
+    STRING_GROW;
+    BEGIN SC_STRING;
+  }
+  "/"{splice}"*" {
+    STRING_GROW;
+    BEGIN SC_COMMENT;
+  }
+  "/"{splice}"/" {
+    STRING_GROW;
+    BEGIN SC_LINE_COMMENT;
+  }
+}
+
+<SC_RULE_ACTION>
+{
+  "$"("<"{tag}">")?(-?[0-9]+|"$")   handle_action_dollar (yytext, *loc);
+  "@"(-?[0-9]+|"$")                handle_action_at (yytext, *loc);
+
+  "$"  {
+    warn_at (*loc, _("stray `$'"));
+    obstack_sgrow (&obstack_for_string, "$][");
+  }
+  "@"  {
+    warn_at (*loc, _("stray `@'"));
+    obstack_sgrow (&obstack_for_string, "@@");
+  }
+}
+
+<SC_SYMBOL_ACTION>
+{
+  "$$"   obstack_sgrow (&obstack_for_string, "]b4_dollar_dollar[");
+  "@$"   obstack_sgrow (&obstack_for_string, "]b4_at_dollar[");
+}
+
+
+  /*-----------------------------------------.
+  | Escape M4 quoting characters in C code.  |
+  `-----------------------------------------*/
+
+<*>
+{
+  \$   obstack_sgrow (&obstack_for_string, "$][");
+  \@   obstack_sgrow (&obstack_for_string, "@@");
+  \[   obstack_sgrow (&obstack_for_string, "@{");
+  \]   obstack_sgrow (&obstack_for_string, "@}");
+}
+
+  /*-----------------------------------------------------.
+  | By default, grow the string obstack with the input.  |
+  `-----------------------------------------------------*/
+
+<*>.|\n        STRING_GROW;
+
+ /* End of processing. */
+<*><<EOF>>      {
+                   obstack_1grow (&obstack_for_string, '\0');
+                  return obstack_finish (&obstack_for_string);
+                 }
+
+%%
+
+/* Keeps track of the maximum number of semantic values to the left of
+   a handle (those referenced by $0, $-1, etc.) are required by the
+   semantic actions of this grammar. */
+int max_left_semantic_context = 0;
+
+
+/*------------------------------------------------------------------.
+| TEXT is pointing to a wannabee semantic value (i.e., a `$').      |
+|                                                                   |
+| Possible inputs: $[<TYPENAME>]($|integer)                         |
+|                                                                   |
+| Output to OBSTACK_FOR_STRING a reference to this semantic value.  |
+`------------------------------------------------------------------*/
+
+static void
+handle_action_dollar (char *text, location loc)
+{
+  const char *type_name = NULL;
+  char *cp = text + 1;
+  int rule_length = symbol_list_length (current_rule->next);
+
+  /* Get the type name if explicit. */
+  if (*cp == '<')
+    {
+      type_name = ++cp;
+      while (*cp != '>')
+       ++cp;
+      *cp = '\0';
+      ++cp;
+    }
+
+  if (*cp == '$')
+    {
+      if (!type_name)
+       type_name = symbol_list_n_type_name_get (current_rule, loc, 0);
+      if (!type_name && typed)
+       complain_at (loc, _("$$ of `%s' has no declared type"),
+                    current_rule->sym->tag);
+      if (!type_name)
+       type_name = "";
+      obstack_fgrow1 (&obstack_for_string,
+                     "]b4_lhs_value([%s])[", type_name);
+      current_rule->used = true;
+    }
+  else
+    {
+      long int num;
+      set_errno (0);
+      num = strtol (cp, 0, 10);
+      if (INT_MIN <= num && num <= rule_length && ! get_errno ())
+       {
+         int n = num;
+         if (1-n > max_left_semantic_context)
+           max_left_semantic_context = 1-n;
+         if (!type_name && n > 0)
+           type_name = symbol_list_n_type_name_get (current_rule, loc, n);
+         if (!type_name && typed)
+           complain_at (loc, _("$%d of `%s' has no declared type"),
+                        n, current_rule->sym->tag);
+         if (!type_name)
+           type_name = "";
+         obstack_fgrow3 (&obstack_for_string,
+                         "]b4_rhs_value(%d, %d, [%s])[",
+                         rule_length, n, type_name);
+         symbol_list_n_used_set (current_rule, n, true);
+       }
+      else
+       complain_at (loc, _("integer out of range: %s"), quote (text));
+    }
+}
+
+
+/*------------------------------------------------------.
+| TEXT is a location token (i.e., a address@hidden').  Output to |
+| OBSTACK_FOR_STRING a reference to this location.      |
+`------------------------------------------------------*/
+
+static void
+handle_action_at (char *text, location loc)
+{
+  char *cp = text + 1;
+  int rule_length = symbol_list_length (current_rule->next);
+  locations_flag = true;
+
+  if (*cp == '$')
+    obstack_sgrow (&obstack_for_string, "]b4_lhs_location[");
+  else
+    {
+      long int num;
+      set_errno (0);
+      num = strtol (cp, 0, 10);
+
+      if (INT_MIN <= num && num <= rule_length && ! get_errno ())
+       {
+         int n = num;
+         obstack_fgrow2 (&obstack_for_string, "]b4_rhs_location(%d, %d)[",
+                         rule_length, n);
+       }
+      else
+       complain_at (loc, _("integer out of range: %s"), quote (text));
+    }
+}
+
+
+/*-------------------------.
+| Initialize the scanner.  |
+`-------------------------*/
+
+/* Translate the dollars and ats in \a a, whose location is l.
+   Depending on the \a sc_context (SC_RULE_ACTION, SC_SYMBOL_ACTION,
+   INITIAL), the processing is different.  */
+
+static const char *
+translate_action (int sc_context, const char *a, location l)
+{
+  const char *res;
+  static bool initialized = false;
+  if (!initialized)
+    {
+      obstack_init (&obstack_for_string);
+      /* The initial buffer, never used. */
+      yy_delete_buffer (YY_CURRENT_BUFFER);
+      yy_flex_debug = 0;
+      initialized = true;
+    }
+
+  loc->start = loc->end = l.start;
+  yy_switch_to_buffer (yy_scan_string (a));
+  res = code_lex (sc_context);
+  yy_delete_buffer (YY_CURRENT_BUFFER);
+
+  return res;
+}
+
+const char *
+translate_rule_action (symbol_list *r, const char *a, location l)
+{
+  current_rule = r;
+  return translate_action (SC_RULE_ACTION, a, l);
+}
+
+const char *
+translate_symbol_action (const char *a, location l)
+{
+  return translate_action (SC_SYMBOL_ACTION, a, l);
+}
+
+const char *
+translate_code (const char *a, location l)
+{
+  return translate_action (INITIAL, a, l);
+}
+
+/*-----------------------------------------------.
+| Free all the memory allocated to the scanner.  |
+`-----------------------------------------------*/
+
+void
+code_scanner_free (void)
+{
+  obstack_free (&obstack_for_string, 0);
+  /* Reclaim Flex's buffers.  */
+  yy_delete_buffer (YY_CURRENT_BUFFER);
+}
Index: src/scan-gram.h
===================================================================
RCS file: src/scan-gram.h
diff -N src/scan-gram.h
--- /dev/null   1 Jan 1970 00:00:00 -0000
+++ src/scan-gram.h 6 Jun 2006 16:38:55 -0000
@@ -0,0 +1,44 @@
+/* Bison Grammar Scanner
+
+   Copyright (C) 2006 Free Software Foundation, Inc.
+
+   This file is part of Bison, the GNU Compiler Compiler.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software
+   Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+   02110-1301  USA
+*/
+
+#ifndef SCAN_GRAM_H_
+# define SCAN_GRAM_H_
+
+/* From the scanner.  */
+extern FILE *gram_in;
+extern int gram__flex_debug;
+extern boundary gram_scanner_cursor;
+extern char *gram_last_string;
+extern location gram_last_braced_code_loc;
+void gram_scanner_initialize (void);
+void gram_scanner_free (void);
+void gram_scanner_last_string_free (void);
+
+/* These are declared by the scanner, but not used.  We put them here
+   to pacify "make syntax-check".  */
+extern FILE *gram_out;
+extern int gram_lineno;
+
+# define GRAM_LEX_DECL int gram_lex (YYSTYPE *val, location *loc)
+GRAM_LEX_DECL;
+
+#endif /* !SCAN_GRAM_H_ */
Index: src/scan-gram.l
===================================================================
RCS file: /cvsroot/bison/bison/src/scan-gram.l,v
retrieving revision 1.86
diff -u -u -r1.86 scan-gram.l
--- src/scan-gram.l 3 Apr 2006 13:50:10 -0000 1.86
+++ src/scan-gram.l 6 Jun 2006 16:38:55 -0000
@@ -29,112 +29,48 @@
 #undef gram_wrap
 #define gram_wrap() 1
 
-#include "system.h"
-
-#include <mbswidth.h>
-#include <quote.h>
+#define FLEX_PREFIX(Id) gram_ ## Id
+#include "flex-scanner.h"
 
 #include "complain.h"
 #include "files.h"
-#include "getargs.h"
+#include "getargs.h"    /* yacc_flag */
 #include "gram.h"
 #include "quotearg.h"
 #include "reader.h"
 #include "uniqstr.h"
 
+#include <mbswidth.h>
+#include <quote.h>
+
+#include "scan-gram.h"
+
+#define YY_DECL GRAM_LEX_DECL
+   
 #define YY_USER_INIT                                   \
-  do                                                   \
-    {                                                  \
-      scanner_cursor.file = current_file;              \
-      scanner_cursor.line = 1;                         \
-      scanner_cursor.column = 1;                       \
-      code_start = scanner_cursor;                     \
-    }                                                  \
-  while (0)
-
-/* Pacify "gcc -Wmissing-prototypes" when flex 2.5.31 is used.  */
-int gram_get_lineno (void);
-FILE *gram_get_in (void);
-FILE *gram_get_out (void);
-int gram_get_leng (void);
-char *gram_get_text (void);
-void gram_set_lineno (int);
-void gram_set_in (FILE *);
-void gram_set_out (FILE *);
-int gram_get_debug (void);
-void gram_set_debug (int);
-int gram_lex_destroy (void);
+   code_start = scanner_cursor = loc->start;           \
 
 /* Location of scanner cursor.  */
 boundary scanner_cursor;
 
-static void adjust_location (location *, char const *, size_t);
-#define YY_USER_ACTION  adjust_location (loc, yytext, yyleng);
+#define YY_USER_ACTION  location_compute (loc, &scanner_cursor, yytext, 
yyleng);
 
 static size_t no_cr_read (FILE *, char *, size_t);
 #define YY_INPUT(buf, result, size) ((result) = no_cr_read (yyin, buf, size))
 
-
-/* OBSTACK_FOR_STRING -- Used to store all the characters that we need to
-   keep (to construct ID, STRINGS etc.).  Use the following macros to
-   use it.
-
-   Use STRING_GROW to append what has just been matched, and
-   STRING_FINISH to end the string (it puts the ending 0).
-   STRING_FINISH also stores this string in LAST_STRING, which can be
-   used, and which is used by STRING_FREE to free the last string.  */
-
-static struct obstack obstack_for_string;
-
 /* A string representing the most recently saved token.  */
 char *last_string;
 
-/* The location of the most recently saved token, if it was a
-   BRACED_CODE token; otherwise, this has an unspecified value.  */
-location last_braced_code_loc;
-
-#define STRING_GROW   \
-  obstack_grow (&obstack_for_string, yytext, yyleng)
-
-#define STRING_FINISH                                  \
-  do {                                                 \
-    obstack_1grow (&obstack_for_string, '\0');         \
-    last_string = obstack_finish (&obstack_for_string);        \
-  } while (0)
-
-#define STRING_FREE \
-  obstack_free (&obstack_for_string, last_string)
-
 void
-scanner_last_string_free (void)
+gram_scanner_last_string_free (void)
 {
   STRING_FREE;
 }
 
-/* Within well-formed rules, RULE_LENGTH is the number of values in
-   the current rule so far, which says where to find `$0' with respect
-   to the top of the stack.  It is not the same as the rule->length in
-   the case of mid rule actions.
-
-   Outside of well-formed rules, RULE_LENGTH has an undefined value.  */
-static int rule_length;
-
-static void rule_length_overflow (location) __attribute__ ((__noreturn__));
-
-/* Increment the rule length by one, checking for overflow.  */
-static inline void
-increment_rule_length (location loc)
-{
-  rule_length++;
-
-  /* Don't allow rule_length == INT_MAX, since that might cause
-     confusion with strtol if INT_MAX == LONG_MAX.  */
-  if (rule_length == INT_MAX)
-    rule_length_overflow (loc);
-}
+/* The location of the most recently saved token, if it was a
+   BRACED_CODE token; otherwise, this has an unspecified value.  */
+location gram_last_braced_code_loc;
 
-static void handle_dollar (int token_type, char *cp, location loc);
-static void handle_at (int token_type, char *cp, location loc);
 static void handle_syncline (char *, location);
 static unsigned long int scan_integer (char const *p, int base, location loc);
 static int convert_ucn_to_byte (char const *hex_text);
@@ -142,11 +78,26 @@
 static void unexpected_newline (boundary, char const *);
 
 %}
-%x SC_COMMENT SC_LINE_COMMENT SC_YACC_COMMENT
-%x SC_STRING SC_CHARACTER
-%x SC_AFTER_IDENTIFIER
+ /* A C-like comment in directives/rules. */
+%x SC_YACC_COMMENT
+ /* Strings and characters in directives/rules. */
 %x SC_ESCAPED_STRING SC_ESCAPED_CHARACTER
-%x SC_PRE_CODE SC_BRACED_CODE SC_PROLOGUE SC_EPILOGUE
+ /* A identifier was just read in directives/rules.  Special state
+    to capture the sequence `identifier :'. */
+%x SC_AFTER_IDENTIFIER
+ /* A keyword that should be followed by some code was read (e.g.
+    %printer). */
+%x SC_PRE_CODE
+
+ /* Three types of user code:
+    - prologue (code between `%{' `%}' in the first section, before %%);
+    - actions, printers, union, etc, (between braced in the middle section);
+    - epilogue (everything after the second %%). */
+%x SC_PROLOGUE SC_BRACED_CODE SC_EPILOGUE
+ /* C and C++ comments in code. */
+%x SC_COMMENT SC_LINE_COMMENT
+ /* Strings and characters in code. */
+%x SC_STRING SC_CHARACTER
 
 letter   [.abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_]
 id       {letter}({letter}|[0-9])*
@@ -221,17 +172,17 @@
   "%default"[-_]"prec"    return PERCENT_DEFAULT_PREC;
   "%define"               return PERCENT_DEFINE;
   "%defines"              return PERCENT_DEFINES;
-  "%destructor"                  token_type = PERCENT_DESTRUCTOR; BEGIN 
SC_PRE_CODE;
+  "%destructor"                  /* FIXME: Remove once %union handled 
differently.  */ token_type = BRACED_CODE; return PERCENT_DESTRUCTOR;
   "%dprec"               return PERCENT_DPREC;
   "%error"[-_]"verbose"   return PERCENT_ERROR_VERBOSE;
   "%expect"               return PERCENT_EXPECT;
   "%expect"[-_]"rr"      return PERCENT_EXPECT_RR;
   "%file-prefix"          return PERCENT_FILE_PREFIX;
   "%fixed"[-_]"output"[-_]"files"   return PERCENT_YACC;
-  "%initial-action"       token_type = PERCENT_INITIAL_ACTION; BEGIN 
SC_PRE_CODE;
+  "%initial-action"       /* FIXME: Remove once %union handled differently.  
*/ token_type = BRACED_CODE; return PERCENT_INITIAL_ACTION;
   "%glr-parser"           return PERCENT_GLR_PARSER;
   "%left"                 return PERCENT_LEFT;
-  "%lex-param"           token_type = PERCENT_LEX_PARAM; BEGIN SC_PRE_CODE;
+  "%lex-param"           /* FIXME: Remove once %union handled differently.  */ 
token_type = BRACED_CODE; return PERCENT_LEX_PARAM;
   "%locations"            return PERCENT_LOCATIONS;
   "%merge"               return PERCENT_MERGE;
   "%name"[-_]"prefix"     return PERCENT_NAME_PREFIX;
@@ -241,9 +192,9 @@
   "%nondeterministic-parser"   return PERCENT_NONDETERMINISTIC_PARSER;
   "%nterm"                return PERCENT_NTERM;
   "%output"               return PERCENT_OUTPUT;
-  "%parse-param"         token_type = PERCENT_PARSE_PARAM; BEGIN SC_PRE_CODE;
-  "%prec"                 rule_length--; return PERCENT_PREC;
-  "%printer"              token_type = PERCENT_PRINTER; BEGIN SC_PRE_CODE;
+  "%parse-param"         /* FIXME: Remove once %union handled differently.  */ 
token_type = BRACED_CODE; return PERCENT_PARSE_PARAM;
+  "%prec"                 return PERCENT_PREC;
+  "%printer"              /* FIXME: Remove once %union handled differently.  
*/ token_type = BRACED_CODE; return PERCENT_PRINTER;
   "%pure"[-_]"parser"     return PERCENT_PURE_PARSER;
   "%require"              return PERCENT_REQUIRE;
   "%right"                return PERCENT_RIGHT;
@@ -262,13 +213,12 @@
   }
 
   "="                     return EQUAL;
-  "|"                     rule_length = 0; return PIPE;
+  "|"                     return PIPE;
   ";"                     return SEMICOLON;
 
   {id} {
     val->symbol = symbol_get (yytext, *loc);
     id_loc = *loc;
-    increment_rule_length (*loc);
     BEGIN SC_AFTER_IDENTIFIER;
   }
 
@@ -335,7 +285,6 @@
 <SC_AFTER_IDENTIFIER>
 {
   ":" {
-    rule_length = 0;
     *loc = id_loc;
     BEGIN INITIAL;
     return ID_COLON;
@@ -401,7 +350,6 @@
     STRING_FINISH;
     loc->start = token_start;
     val->chars = last_string;
-    increment_rule_length (*loc);
     BEGIN INITIAL;
     return STRING;
   }
@@ -428,7 +376,6 @@
     last_string_1 = last_string[1];
     symbol_user_token_number_set (val->symbol, last_string_1, *loc);
     STRING_FREE;
-    increment_rule_length (*loc);
     BEGIN INITIAL;
     return ID;
   }
@@ -501,7 +448,7 @@
 
 <SC_CHARACTER,SC_STRING>
 {
-  {splice}|address@hidden      STRING_GROW;
+  {splice}|\\{splice}[^\n\[\]] STRING_GROW;
 }
 
 <SC_CHARACTER>
@@ -622,8 +569,7 @@
        STRING_FINISH;
        loc->start = code_start;
        val->chars = last_string;
-       increment_rule_length (*loc);
-       last_braced_code_loc = *loc;
+       gram_last_braced_code_loc = *loc;
        BEGIN INITIAL;
        return token_type;
       }
@@ -633,18 +579,6 @@
      (as `<' `<%').  */
   "<"{splice}"<"  STRING_GROW;
 
-  "$"("<"{tag}">")?(-?[0-9]+|"$")  handle_dollar (token_type, yytext, *loc);
-  "@"(-?[0-9]+|"$")               handle_at (token_type, yytext, *loc);
-
-  "$"  {
-    warn_at (*loc, _("stray `$'"));
-    obstack_sgrow (&obstack_for_string, "$][");
-  }
-  "@"  {
-    warn_at (*loc, _("stray `@'"));
-    obstack_sgrow (&obstack_for_string, "@@");
-  }
-
   <<EOF>>  unexpected_eof (code_start, "}"); BEGIN INITIAL;
 }
 
@@ -684,19 +618,6 @@
 }
 
 
-  /*-----------------------------------------.
-  | Escape M4 quoting characters in C code.  |
-  `-----------------------------------------*/
-
-<SC_COMMENT,SC_LINE_COMMENT,SC_STRING,SC_CHARACTER,SC_BRACED_CODE,SC_PROLOGUE,SC_EPILOGUE>
-{
-  \$   obstack_sgrow (&obstack_for_string, "$][");
-  \@   obstack_sgrow (&obstack_for_string, "@@");
-  \[   obstack_sgrow (&obstack_for_string, "@{");
-  \]   obstack_sgrow (&obstack_for_string, "@}");
-}
-
-
   /*-----------------------------------------------------.
   | By default, grow the string obstack with the input.  |
   `-----------------------------------------------------*/
@@ -706,79 +627,6 @@
 
 %%
 
-/* Keeps track of the maximum number of semantic values to the left of
-   a handle (those referenced by $0, $-1, etc.) are required by the
-   semantic actions of this grammar. */
-int max_left_semantic_context = 0;
-
-/* If BUF is null, add BUFSIZE (which in this case must be less than
-   INT_MAX) to COLUMN; otherwise, add mbsnwidth (BUF, BUFSIZE, 0) to
-   COLUMN.  If an overflow occurs, or might occur but is undetectable,
-   return INT_MAX.  Assume COLUMN is nonnegative.  */
-
-static inline int
-add_column_width (int column, char const *buf, size_t bufsize)
-{
-  size_t width;
-  unsigned int remaining_columns = INT_MAX - column;
-
-  if (buf)
-    {
-      if (INT_MAX / 2 <= bufsize)
-       return INT_MAX;
-      width = mbsnwidth (buf, bufsize, 0);
-    }
-  else
-    width = bufsize;
-
-  return width <= remaining_columns ? column + width : INT_MAX;
-}
-
-/* Set *LOC and adjust scanner cursor to account for token TOKEN of
-   size SIZE.  */
-
-static void
-adjust_location (location *loc, char const *token, size_t size)
-{
-  int line = scanner_cursor.line;
-  int column = scanner_cursor.column;
-  char const *p0 = token;
-  char const *p = token;
-  char const *lim = token + size;
-
-  loc->start = scanner_cursor;
-
-  for (p = token; p < lim; p++)
-    switch (*p)
-      {
-      case '\n':
-       line += line < INT_MAX;
-       column = 1;
-       p0 = p + 1;
-       break;
-
-      case '\t':
-       column = add_column_width (column, p0, p - p0);
-       column = add_column_width (column, NULL, 8 - ((column - 1) & 7));
-       p0 = p + 1;
-       break;
-
-      default:
-       break;
-      }
-
-  scanner_cursor.line = line;
-  scanner_cursor.column = column = add_column_width (column, p0, p - p0);
-
-  loc->end = scanner_cursor;
-
-  if (line == INT_MAX && loc->start.line != INT_MAX)
-    warn_at (*loc, _("line number overflow"));
-  if (column == INT_MAX && loc->start.column != INT_MAX)
-    warn_at (*loc, _("column number overflow"));
-}
-
-
 /* Read bytes from FP into buffer BUF of size SIZE.  Return the
    number of bytes read.  Remove '\r' from input, treating \r\n
    and isolated \r as \n.  */
@@ -826,173 +674,6 @@
 }
 
 
-/*------------------------------------------------------------------.
-| TEXT is pointing to a wannabee semantic value (i.e., a `$').      |
-|                                                                   |
-| Possible inputs: $[<TYPENAME>]($|integer)                         |
-|                                                                   |
-| Output to OBSTACK_FOR_STRING a reference to this semantic value.  |
-`------------------------------------------------------------------*/
-
-static inline bool
-handle_action_dollar (char *text, location loc)
-{
-  const char *type_name = NULL;
-  char *cp = text + 1;
-
-  if (! current_rule)
-    return false;
-
-  /* Get the type name if explicit. */
-  if (*cp == '<')
-    {
-      type_name = ++cp;
-      while (*cp != '>')
-       ++cp;
-      *cp = '\0';
-      ++cp;
-    }
-
-  if (*cp == '$')
-    {
-      if (!type_name)
-       type_name = symbol_list_n_type_name_get (current_rule, loc, 0);
-      if (!type_name && typed)
-       complain_at (loc, _("$$ of `%s' has no declared type"),
-                    current_rule->sym->tag);
-      if (!type_name)
-       type_name = "";
-      obstack_fgrow1 (&obstack_for_string,
-                     "]b4_lhs_value([%s])[", type_name);
-      current_rule->used = true;
-    }
-  else
-    {
-      long int num = strtol (cp, NULL, 10);
-
-      if (1 - INT_MAX + rule_length <= num && num <= rule_length)
-       {
-         int n = num;
-         if (max_left_semantic_context < 1 - n)
-           max_left_semantic_context = 1 - n;
-         if (!type_name && 0 < n)
-           type_name = symbol_list_n_type_name_get (current_rule, loc, n);
-         if (!type_name && typed)
-           complain_at (loc, _("$%d of `%s' has no declared type"),
-                        n, current_rule->sym->tag);
-         if (!type_name)
-           type_name = "";
-         obstack_fgrow3 (&obstack_for_string,
-                         "]b4_rhs_value(%d, %d, [%s])[",
-                         rule_length, n, type_name);
-         symbol_list_n_used_set (current_rule, n, true);
-       }
-      else
-       complain_at (loc, _("integer out of range: %s"), quote (text));
-    }
-
-  return true;
-}
-
-
-/*----------------------------------------------------------------.
-| Map `$?' onto the proper M4 symbol, depending on its TOKEN_TYPE |
-| (are we in an action?).                                         |
-`----------------------------------------------------------------*/
-
-static void
-handle_dollar (int token_type, char *text, location loc)
-{
-  switch (token_type)
-    {
-    case BRACED_CODE:
-      if (handle_action_dollar (text, loc))
-       return;
-      break;
-
-    case PERCENT_DESTRUCTOR:
-    case PERCENT_INITIAL_ACTION:
-    case PERCENT_PRINTER:
-      if (text[1] == '$')
-       {
-         obstack_sgrow (&obstack_for_string, "]b4_dollar_dollar[");
-         return;
-       }
-      break;
-
-    default:
-      break;
-    }
-
-  complain_at (loc, _("invalid value: %s"), quote (text));
-}
-
-
-/*------------------------------------------------------.
-| TEXT is a location token (i.e., a address@hidden').  Output to |
-| OBSTACK_FOR_STRING a reference to this location.      |
-`------------------------------------------------------*/
-
-static inline bool
-handle_action_at (char *text, location loc)
-{
-  char *cp = text + 1;
-  locations_flag = true;
-
-  if (! current_rule)
-    return false;
-
-  if (*cp == '$')
-    obstack_sgrow (&obstack_for_string, "]b4_lhs_location[");
-  else
-    {
-      long int num = strtol (cp, NULL, 10);
-
-      if (1 - INT_MAX + rule_length <= num && num <= rule_length)
-       {
-         int n = num;
-         obstack_fgrow2 (&obstack_for_string, "]b4_rhs_location(%d, %d)[",
-                         rule_length, n);
-       }
-      else
-       complain_at (loc, _("integer out of range: %s"), quote (text));
-    }
-
-  return true;
-}
-
-
-/*----------------------------------------------------------------.
-| Map address@hidden' onto the proper M4 symbol, depending on its TOKEN_TYPE |
-| (are we in an action?).                                         |
-`----------------------------------------------------------------*/
-
-static void
-handle_at (int token_type, char *text, location loc)
-{
-  switch (token_type)
-    {
-    case BRACED_CODE:
-      handle_action_at (text, loc);
-      return;
-
-    case PERCENT_INITIAL_ACTION:
-    case PERCENT_DESTRUCTOR:
-    case PERCENT_PRINTER:
-      if (text[1] == '$')
-       {
-         obstack_sgrow (&obstack_for_string, "]b4_at_dollar[");
-         return;
-       }
-      break;
-
-    default:
-      break;
-    }
-
-  complain_at (loc, _("invalid value: %s"), quote (text));
-}
-
 
 /*------------------------------------------------------.
 | Scan NUMBER for a base-BASE integer at location LOC.  |
@@ -1087,20 +768,8 @@
       warn_at (loc, _("line number overflow"));
       lineno = INT_MAX;
     }
-  scanner_cursor.file = current_file = uniqstr_new (file);
-  scanner_cursor.line = lineno;
-  scanner_cursor.column = 1;
-}
-
-
-/*---------------------------------.
-| Report a rule that is too long.  |
-`---------------------------------*/
-
-static void
-rule_length_overflow (location loc)
-{
-  fatal_at (loc, _("rule is too long"));
+  current_file = uniqstr_new (file);
+  boundary_set, (&scanner_cursor, current_file, lineno, 1);
 }
 
 
@@ -1148,7 +817,7 @@
 `-------------------------*/
 
 void
-scanner_initialize (void)
+gram_scanner_initialize (void)
 {
   obstack_init (&obstack_for_string);
 }
@@ -1159,7 +828,7 @@
 `-----------------------------------------------*/
 
 void
-scanner_free (void)
+gram_scanner_free (void)
 {
   obstack_free (&obstack_for_string, 0);
   /* Reclaim Flex's buffers.  */
Index: src/system.h
===================================================================
RCS file: /cvsroot/bison/bison/src/system.h,v
retrieving revision 1.76
diff -u -u -r1.76 system.h
--- src/system.h 22 Jan 2006 07:38:49 -0000 1.76
+++ src/system.h 6 Jun 2006 16:38:55 -0000
@@ -113,6 +113,8 @@
 # define ATTRIBUTE_UNUSED __attribute__ ((__unused__))
 #endif
 
+#define FUNCTION_PRINT() fprintf (stderr, "%s: ", __func__)
+
 /*------.
 | NLS.  |
 `------*/
Index: tests/input.at
===================================================================
RCS file: /cvsroot/bison/bison/tests/input.at,v
retrieving revision 1.43
diff -u -u -r1.43 input.at
--- tests/input.at 3 Apr 2006 13:50:10 -0000 1.43
+++ tests/input.at 6 Jun 2006 16:38:55 -0000
@@ -25,33 +25,17 @@
 ## Invalid $n.  ##
 ## ------------ ##
 
-AT_SETUP([Invalid dollar-n])
+AT_SETUP([Invalid \$n and @n])
 
 AT_DATA([input.y],
 [[%%
 exp: { $$ = $1 ; };
-]])
-
-AT_CHECK([bison input.y], [1], [],
-[[input.y:2.13-14: integer out of range: `$1'
-]])
-
-AT_CLEANUP
-
-
-## ------------ ##
-## Invalid @n.  ##
-## ------------ ##
-
-AT_SETUP([Invalid @n])
-
-AT_DATA([input.y],
-[[%%
 exp: { @$ = @1 ; };
 ]])
 
 AT_CHECK([bison input.y], [1], [],
-[[input.y:2.13-14: integer out of range: address@hidden'
+[[input.y:2.13-14: integer out of range: `$1'
+input.y:3.13-14: integer out of range: address@hidden'
 ]])
 
 AT_CLEANUP
@@ -200,11 +184,11 @@
 
 AT_DATA([input.y], [])
 AT_CHECK([bison input.y], [1], [],
-[[input.y:1.1: syntax error, unexpected end of file
+[[input.y:1.0: syntax error, unexpected end of file
 ]])
 
 
-AT_DATA([input.y], 
+AT_DATA([input.y],
 [{}
 ])
 AT_CHECK([bison input.y], [1], [],
Index: tests/regression.at
===================================================================
RCS file: /cvsroot/bison/bison/tests/regression.at,v
retrieving revision 1.98
diff -u -u -r1.98 regression.at
--- tests/regression.at 21 May 2006 04:48:47 -0000 1.98
+++ tests/regression.at 6 Jun 2006 16:38:55 -0000
@@ -346,9 +346,7 @@
 ]])
 
 AT_CHECK([bison input.y], [1], [],
-[[input.y:3.1: missing `{' in "%destructor {...}"
-input.y:4.1: missing `{' in "%initial-action {...}"
-input.y:4.1: syntax error, unexpected %initial-action {...}, expecting string 
or identifier
+[[input.y:3.1-15: syntax error, unexpected %initial-action, expecting {...}
 ]])
 
 AT_CLEANUP
[Prev in Thread]
Current Thread
[Next in Thread]
Re: Extract the action scanner from the grammar scanner, Akim Demaille, 2006/06/01
- Re: Extract the action scanner from the grammar scanner, Akim Demaille <=
  - Re: Extract the action scanner from the grammar scanner, Akim Demaille, 2006/06/07
  - Re: [SPAM] Re: Extract the action scanner from the grammar scanner, Joel E. Denny, 2006/06/07
    - Re: [SPAM] Re: Extract the action scanner from the grammar scanner, Akim Demaille, 2006/06/07
    - Re: [SPAM] Re: Extract the action scanner from the grammar scanner, Akim Demaille, 2006/06/07
    - Re: [SPAM] Re: Extract the action scanner from the grammar scanner, Joel E. Denny, 2006/06/07
    - Re: [SPAM] Re: Extract the action scanner from the grammar scanner, Joel E. Denny, 2006/06/07
    - Re: [SPAM] Re: Extract the action scanner from the grammar scanner, Joel E. Denny, 2006/06/07
    - Re: [SPAM] Re: Extract the action scanner from the grammar scanner, Paul Eggert, 2006/06/07
    - Re: [SPAM] Re: Extract the action scanner from the grammar scanner, Joel E. Denny, 2006/06/07
    - Re: [SPAM] Re: Extract the action scanner from the grammar scanner, Akim Demaille, 2006/06/08
Prev by Date: Re: out of date FSF page for Bison
Next by Date: Re: out of date FSF page for Bison
Previous by thread: Re: Extract the action scanner from the grammar scanner
Next by thread: Re: Extract the action scanner from the grammar scanner
Index(es):
- Date
- Thread