Re: Extract the action scanner from the grammar scanner
From: Akim Demaille
Subject: Re: Extract the action scanner from the grammar scanner
Date: Tue, 06 Jun 2006 18:39:49 +0200
User-agent: Gnus/5.110004 (No Gnus v0.4) Emacs/21.4 (gnu/linux)
>>> "Akim" == Akim Demaille <address@hidden> writes:
> I have already referred to this change several times. Our current
> scanner is very complex and has subtle interactions with our parser.
> This is hard to maintain, and also prevents easy extensions of
> our grammar (which defeats the whole point of having moved to
> using bison for Bison).
> I was about to install this patch when I read Paul's message about
> 2.3 being out soon?
I'm installing the patch now. The work is almost done, but a couple
of issues remain. A significant one is the failure of the calc++
example: we still blindly append a `;' at the end of braced code, and
unfortunately this is also applied in a place where the braces do not
hold a user action, but rather the specification of the extra
arguments (%parse-param and %lex-param).
Of course I plan to solve this in the near future.
Also, we have to sort out the initial line and column numbers: I
discovered that we changed their values, which is troublesome.
Index: ChangeLog
from Akim Demaille <address@hidden>
Extract the parsing of user actions from the grammar scanner.
As a consequence, the relation between the grammar scanner and
parser is much simpler. We can also split "composite tokens" back
into simple tokens.
* src/gram.h (ITEM_NUMBER_MAX, RULE_NUMBER_MAX): New.
* src/scan-gram.l (add_column_width, adjust_location): Move to and
rename as...
* src/location.h, src/location.c (add_column_width)
(location_compute): these.
Fix the column count: the initial column is 0.
(location_print): Be robust to ending column being 0.
* src/location.h (boundary_set): New.
* src/main.c: Adjust to scanner_free being renamed as
gram_scanner_free.
* src/output.c: Include scan-code.h.
* src/parse-gram.y: Include scan-gram.h and scan-code.h.
Use boundary_set.
(PERCENT_DESTRUCTOR, PERCENT_PRINTER, PERCENT_INITIAL_ACTION)
(PERCENT_LEX_PARAM, PERCENT_PARSE_PARAM): Remove the {...} part,
which is now, again, a separate token.
Adjust all dependencies.
Wherever actions with $ and @ are used, use translate_code.
(action): Remove this nonterminal which is now useless.
* src/reader.c: Include assert.h, scan-gram.h and scan-code.h.
(grammar_current_rule_action_append): Use translate_code.
(packgram): Bound check ruleno, itemno, and rule_length.
* src/reader.h (gram_in, gram__flex_debug, scanner_cursor)
(last_string, last_braced_code_loc, max_left_semantic_context)
(scanner_initialize, scanner_free, scanner_last_string_free)
(gram_out, gram_lineno, YY_DECL_): Move to...
* src/scan-gram.h: this new file.
(YY_DECL): Rename as...
(GRAM_DECL): this.
* src/scan-code.h, src/scan-code.l, src/scan-code-c.c: New.
* src/scan-gram.l (gram_get_lineno, gram_get_in, gram_get_out):
(gram_get_leng, gram_get_text, gram_set_lineno, gram_set_in):
(gram_set_out, gram_get_debug, gram_set_debug, gram_lex_destroy):
Move these declarations, and...
(obstack_for_string, STRING_GROW, STRING_FINISH, STRING_FREE):
these to...
* src/flex-scanner.h: this new file.
* src/scan-gram.l (rule_length, rule_length_overflow)
(increment_rule_length): Remove.
(last_braced_code_loc): Rename as...
(gram_last_braced_code_loc): this.
Adjust to the changes of the parser.
Move all the handling of $ and @ into...
* src/scan-code.l: here.
* src/scan-gram.l (handle_dollar, handle_at): Remove.
(handle_action_dollar, handle_action_at): Move to...
* src/scan-code.l: here.
* src/Makefile.am (bison_SOURCES): Add flex-scanner.h,
scan-code.h, scan-code-c.c, scan-gram.h.
(EXTRA_bison_SOURCES): Add scan-code.l.
(BUILT_SOURCES): Add scan-code.c.
(yacc): Be robust to white spaces.
* tests/conflicts.at, tests/input.at, tests/reduce.at,
* tests/regression.at: Adjust the column numbers.
* tests/regression.at: Adjust the error message.
$Id: ChangeLog,v 1.1494 2006/06/06 06:00:55 jdenny Exp $
Index: src/Makefile.am
===================================================================
RCS file: /cvsroot/bison/bison/src/Makefile.am,v
retrieving revision 1.68
diff -u -u -r1.68 Makefile.am
--- src/Makefile.am 6 Jun 2006 05:23:44 -0000 1.68
+++ src/Makefile.am 6 Jun 2006 16:38:55 -0000
@@ -1,4 +1,4 @@
-## Copyright (C) 2001, 2002, 2003, 2004, 2005 Free Software Foundation, Inc.
+## Copyright (C) 2001, 2002, 2003, 2004, 2005, 2006 Free Software Foundation, Inc.
## This program is free software; you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
@@ -39,6 +39,7 @@
conflicts.c conflicts.h \
derives.c derives.h \
files.c files.h \
+ flex-scanner.h \
getargs.c getargs.h \
gram.c gram.h \
lalr.h lalr.c \
@@ -54,8 +55,9 @@
reduce.c reduce.h \
revision.c revision.h \
relation.c relation.h \
- scan-gram-c.c \
- scan-skel-c.c scan-skel.h \
+ scan-code.h scan-code-c.c \
+ scan-gram.h scan-gram-c.c \
+ scan-skel.h scan-skel-c.c \
state.c state.h \
symlist.c symlist.h \
symtab.c symtab.h \
@@ -65,15 +67,20 @@
vcg.c vcg.h \
vcg_defaults.h
-EXTRA_bison_SOURCES = scan-skel.l scan-gram.l
+EXTRA_bison_SOURCES = scan-code.l scan-skel.l scan-gram.l
-BUILT_SOURCES = revision.c scan-skel.c scan-gram.c parse-gram.c parse-gram.h
+BUILT_SOURCES = \
+parse-gram.c parse-gram.h \
+revision.c \
+scan-code.c \
+scan-skel.c \
+scan-gram.c \
MOSTLYCLEANFILES = yacc
yacc:
echo '#! /bin/sh' >$@
- echo 'exec $(bindir)/bison -y "$$@"' >>$@
+ echo "exec '$(bindir)/bison' -y \"address@hidden"" >>$@
chmod a+x $@
echo:
Index: src/gram.h
===================================================================
RCS file: /cvsroot/bison/bison/src/gram.h,v
retrieving revision 1.58
diff -u -u -r1.58 gram.h
--- src/gram.h 9 Nov 2005 15:48:05 -0000 1.58
+++ src/gram.h 6 Jun 2006 16:38:55 -0000
@@ -1,6 +1,6 @@
/* Data definitions for internal representation of Bison's input.
- Copyright (C) 1984, 1986, 1989, 1992, 2001, 2002, 2003, 2004, 2005
+ Copyright (C) 1984, 1986, 1989, 1992, 2001, 2002, 2003, 2004, 2005, 2006
Free Software Foundation, Inc.
This file is part of Bison, the GNU Compiler Compiler.
@@ -115,6 +115,7 @@
extern int nvars;
typedef int item_number;
+#define ITEM_NUMBER_MAX INT_MAX
extern item_number *ritem;
extern unsigned int nritems;
@@ -146,6 +147,7 @@
/* Rule numbers. */
typedef int rule_number;
+#define RULE_NUMBER_MAX INT_MAX
extern rule_number nrules;
static inline item_number
Index: src/location.c
===================================================================
RCS file: /cvsroot/bison/bison/src/location.c,v
retrieving revision 1.6
diff -u -u -r1.6 location.c
--- src/location.c 9 Dec 2005 23:51:26 -0000 1.6
+++ src/location.c 6 Jun 2006 16:38:55 -0000
@@ -1,6 +1,5 @@
/* Locations for Bison
-
- Copyright (C) 2002, 2005 Free Software Foundation, Inc.
+ Copyright (C) 2002, 2005, 2006 Free Software Foundation, Inc.
This file is part of Bison, the GNU Compiler Compiler.
@@ -28,11 +27,80 @@
location const empty_location;
+/* If BUF is null, add BUFSIZE (which in this case must be less than
+ INT_MAX) to COLUMN; otherwise, add mbsnwidth (BUF, BUFSIZE, 0) to
+ COLUMN. If an overflow occurs, or might occur but is undetectable,
+ return INT_MAX. Assume COLUMN is nonnegative. */
+
+static inline int
+add_column_width (int column, char const *buf, size_t bufsize)
+{
+ size_t width;
+ unsigned int remaining_columns = INT_MAX - column;
+
+ if (buf)
+ {
+ if (INT_MAX / 2 <= bufsize)
+ return INT_MAX;
+ width = mbsnwidth (buf, bufsize, 0);
+ }
+ else
+ width = bufsize;
+
+ return width <= remaining_columns ? column + width : INT_MAX;
+}
+
+/* Set *LOC and adjust scanner cursor to account for token TOKEN of
+ size SIZE. */
+
+void
+location_compute (location *loc, boundary *cur, char const *token, size_t size)
+{
+ int line = cur->line;
+ int column = cur->column;
+ char const *p0 = token;
+ char const *p = token;
+ char const *lim = token + size;
+
+ loc->start = *cur;
+
+ for (p = token; p < lim; p++)
+ switch (*p)
+ {
+ case '\n':
+ line += line < INT_MAX;
+ column = 1;
+ p0 = p + 1;
+ break;
+
+ case '\t':
+ column = add_column_width (column, p0, p - p0);
+ column = add_column_width (column, NULL, 8 - ((column - 1) & 7));
+ p0 = p + 1;
+ break;
+
+ default:
+ break;
+ }
+
+ cur->line = line;
+ cur->column = column = add_column_width (column, p0, p - p0);
+
+ loc->end = *cur;
+
+ if (line == INT_MAX && loc->start.line != INT_MAX)
+ warn_at (*loc, _("line number overflow"));
+ if (column == INT_MAX && loc->start.column != INT_MAX)
+ warn_at (*loc, _("column number overflow"));
+}
+
+
/* Output to OUT the location LOC.
Warning: it uses quotearg's slot 3. */
void
location_print (FILE *out, location loc)
{
+ int end_col = 0 < loc.end.column ? loc.end.column - 1 : 0;
fprintf (out, "%s:%d.%d",
quotearg_n_style (3, escape_quoting_style, loc.start.file),
loc.start.line, loc.start.column);
@@ -40,9 +108,9 @@
if (loc.start.file != loc.end.file)
fprintf (out, "-%s:%d.%d",
quotearg_n_style (3, escape_quoting_style, loc.end.file),
- loc.end.line, loc.end.column - 1);
+ loc.end.line, end_col);
else if (loc.start.line < loc.end.line)
- fprintf (out, "-%d.%d", loc.end.line, loc.end.column - 1);
- else if (loc.start.column < loc.end.column - 1)
- fprintf (out, "-%d", loc.end.column - 1);
+ fprintf (out, "-%d.%d", loc.end.line, end_col);
+ else if (loc.start.column < end_col)
+ fprintf (out, "-%d", end_col);
}
Index: src/location.h
===================================================================
RCS file: /cvsroot/bison/bison/src/location.h,v
retrieving revision 1.13
diff -u -u -r1.13 location.h
--- src/location.h 9 Mar 2006 23:23:11 -0000 1.13
+++ src/location.h 6 Jun 2006 16:38:55 -0000
@@ -40,6 +40,15 @@
} boundary;
+/* Set the position of \a a. */
+static inline void
+boundary_set (boundary *b, const char *f, int l, int c)
+{
+ b->file = f;
+ b->line = l;
+ b->column = c;
+}
+
/* Return nonzero if A and B are equal boundaries. */
static inline bool
equal_boundaries (boundary a, boundary b)
@@ -64,6 +73,11 @@
extern location const empty_location;
+/* Set *LOC and adjust scanner cursor to account for token TOKEN of
+ size SIZE. */
+void location_compute (location *loc,
+ boundary *cur, char const *token, size_t size);
+
void location_print (FILE *out, location loc);
#endif /* ! defined LOCATION_H_ */
Index: src/main.c
===================================================================
RCS file: /cvsroot/bison/bison/src/main.c,v
retrieving revision 1.85
diff -u -u -r1.85 main.c
--- src/main.c 9 Dec 2005 23:51:26 -0000 1.85
+++ src/main.c 6 Jun 2006 16:38:55 -0000
@@ -1,6 +1,7 @@
/* Top level entry point of Bison.
- Copyright (C) 1984, 1986, 1989, 1992, 1995, 2000, 2001, 2002, 2004, 2005
+ Copyright (C) 1984, 1986, 1989, 1992, 1995, 2000, 2001, 2002, 2004, 2005,
+ 2006
Free Software Foundation, Inc.
This file is part of Bison, the GNU Compiler Compiler.
@@ -169,7 +170,7 @@
/* The scanner memory cannot be released right after parsing, as it
contains things such as user actions, prologue, epilogue etc. */
- scanner_free ();
+ gram_scanner_free ();
muscle_free ();
uniqstrs_free ();
timevar_pop (TV_FREE);
Index: src/output.c
===================================================================
RCS file: /cvsroot/bison/bison/src/output.c,v
retrieving revision 1.247
diff -u -u -r1.247 output.c
--- src/output.c 14 May 2006 20:40:35 -0000 1.247
+++ src/output.c 6 Jun 2006 16:38:55 -0000
@@ -36,6 +36,7 @@
#include "muscle_tab.h"
#include "output.h"
#include "reader.h"
+#include "scan-code.h" /* max_left_semantic_context */
#include "scan-skel.h"
#include "symtab.h"
#include "tables.h"
Index: src/parse-gram.y
===================================================================
RCS file: /cvsroot/bison/bison/src/parse-gram.y,v
retrieving revision 1.74
diff -u -u -r1.74 parse-gram.y
--- src/parse-gram.y 14 May 2006 19:14:10 -0000 1.74
+++ src/parse-gram.y 6 Jun 2006 16:38:55 -0000
@@ -32,6 +32,8 @@
#include "quotearg.h"
#include "reader.h"
#include "symlist.h"
+#include "scan-gram.h"
+#include "scan-code.h"
#include "strverscmp.h"
#define YYLLOC_DEFAULT(Current, Rhs, N) (Current) = lloc_default (Rhs, N)
@@ -84,9 +86,8 @@
{
/* Bison's grammar can initial empty locations, hence a default
location is needed. */
- @$.start.file = @$.end.file = current_file;
- @$.start.line = @$.end.line = 1;
- @$.start.column = @$.end.column = 0;
+ boundary_set (&@$.start, current_file, 1, 0);
+ boundary_set (&@$.end, current_file, 1, 0);
}
/* Only NUMBERS have a value. */
@@ -109,8 +110,8 @@
%token PERCENT_NTERM "%nterm"
%token PERCENT_TYPE "%type"
-%token PERCENT_DESTRUCTOR "%destructor {...}"
-%token PERCENT_PRINTER "%printer {...}"
+%token PERCENT_DESTRUCTOR "%destructor"
+%token PERCENT_PRINTER "%printer"
%token PERCENT_UNION "%union {...}"
@@ -137,8 +138,8 @@
PERCENT_EXPECT_RR "%expect-rr"
PERCENT_FILE_PREFIX "%file-prefix"
PERCENT_GLR_PARSER "%glr-parser"
- PERCENT_INITIAL_ACTION "%initial-action {...}"
- PERCENT_LEX_PARAM "%lex-param {...}"
+ PERCENT_INITIAL_ACTION "%initial-action"
+ PERCENT_LEX_PARAM "%lex-param"
PERCENT_LOCATIONS "%locations"
PERCENT_NAME_PREFIX "%name-prefix"
PERCENT_NO_DEFAULT_PREC "%no-default-prec"
@@ -146,7 +147,7 @@
PERCENT_NONDETERMINISTIC_PARSER
"%nondeterministic-parser"
PERCENT_OUTPUT "%output"
- PERCENT_PARSE_PARAM "%parse-param {...}"
+ PERCENT_PARSE_PARAM "%parse-param"
PERCENT_PURE_PARSER "%pure-parser"
PERCENT_REQUIRE "%require"
PERCENT_SKELETON "%skeleton"
@@ -167,23 +168,14 @@
%token EPILOGUE "epilogue"
%token BRACED_CODE "{...}"
-
%type <chars> STRING string_content
- "%destructor {...}"
- "%initial-action {...}"
- "%lex-param {...}"
- "%parse-param {...}"
- "%printer {...}"
+ "{...}"
"%union {...}"
PROLOGUE EPILOGUE
%printer { fprintf (stderr, "\"%s\"", $$); }
STRING string_content
%printer { fprintf (stderr, "{\n%s\n}", $$); }
- "%destructor {...}"
- "%initial-action {...}"
- "%lex-param {...}"
- "%parse-param {...}"
- "%printer {...}"
+ "{...}"
"%union {...}"
PROLOGUE EPILOGUE
%type <uniqstr> TYPE
@@ -214,7 +206,8 @@
declaration:
grammar_declaration
-| PROLOGUE { prologue_augment ($1, @1); }
+| PROLOGUE { prologue_augment (translate_code ($1, @1),
+ @1); }
| "%debug" { debug_flag = true; }
| "%define" string_content
{
@@ -232,17 +225,17 @@
nondeterministic_parser = true;
glr_parser = true;
}
-| "%initial-action {...}"
+| "%initial-action" "{...}"
{
- muscle_code_grow ("initial_action", $1, @1);
+ muscle_code_grow ("initial_action", translate_symbol_action ($2, @2), @2);
}
-| "%lex-param {...}" { add_param ("lex_param", $1, @1); }
+| "%lex-param" "{...}" { add_param ("lex_param", $2, @2); }
| "%locations" { locations_flag = true; }
| "%name-prefix" "=" string_content { spec_name_prefix = $3; }
| "%no-lines" { no_lines_flag = true; }
| "%nondeterministic-parser" { nondeterministic_parser = true; }
| "%output" "=" string_content { spec_outfile = $3; }
-| "%parse-param {...}" { add_param ("parse_param", $1, @1); }
+| "%parse-param" "{...}" { add_param ("parse_param", $2, @2); }
| "%pure-parser" { pure_parser = true; }
| "%require" string_content { version_check (&@2, $2); }
| "%skeleton" string_content { skeleton = $2; }
@@ -275,19 +268,21 @@
typed = true;
muscle_code_grow ("stype", body, @1);
}
-| "%destructor {...}" symbols.1
+| "%destructor" "{...}" symbols.1
{
symbol_list *list;
- for (list = $2; list; list = list->next)
- symbol_destructor_set (list->sym, $1, @1);
- symbol_list_free ($2);
+ const char *action = translate_symbol_action ($2, @2);
+ for (list = $3; list; list = list->next)
+ symbol_destructor_set (list->sym, action, @2);
+ symbol_list_free ($3);
}
-| "%printer {...}" symbols.1
+| "%printer" "{...}" symbols.1
{
symbol_list *list;
- for (list = $2; list; list = list->next)
- symbol_printer_set (list->sym, $1, @1);
- symbol_list_free ($2);
+ const char *action = translate_symbol_action ($2, @2);
+ for (list = $3; list; list = list->next)
+ symbol_printer_set (list->sym, action, @2);
+ symbol_list_free ($3);
}
| "%default-prec"
{
@@ -346,7 +341,6 @@
;
/* One or more nonterminals to be %typed. */
-
symbols.1:
symbol { $$ = symbol_list_new ($1, @1); }
| symbols.1 symbol { $$ = symbol_list_prepend ($1, $2, @2); }
@@ -426,7 +420,9 @@
{ grammar_current_rule_begin (current_lhs, current_lhs_location); }
| rhs symbol
{ grammar_current_rule_symbol_append ($2, @2); }
-| rhs action
+| rhs "{...}"
+ { grammar_current_rule_action_append (gram_last_string,
+ gram_last_braced_code_loc); }
| rhs "%prec" symbol
{ grammar_current_rule_prec_set ($3, @3); }
| rhs "%dprec" INT
@@ -440,23 +436,6 @@
| string_as_id { $$ = $1; }
;
-/* Handle the semantics of an action specially, with a mid-rule
- action, so that grammar_current_rule_action_append is invoked
- immediately after the braced code is read by the scanner.
-
- This implementation relies on the LALR(1) parsing algorithm.
- If grammar_current_rule_action_append were executed in a normal
- action for this rule, then when the input grammar contains two
- successive actions, the scanner would have to read both actions
- before reducing this rule. That wouldn't work, since the scanner
- relies on all preceding input actions being processed by
- grammar_current_rule_action_append before it scans the next
- action. */
-action:
- { grammar_current_rule_action_append (last_string, last_braced_code_loc); }
- BRACED_CODE
-;
-
/* A string used as an ID: quote it. */
string_as_id:
STRING
@@ -477,8 +456,8 @@
/* Nothing. */
| "%%" EPILOGUE
{
- muscle_code_grow ("epilogue", $2, @2);
- scanner_last_string_free ();
+ muscle_code_grow ("epilogue", translate_code ($2, @2), @2);
+ gram_scanner_last_string_free ();
}
;
@@ -563,7 +542,7 @@
free (name);
}
- scanner_last_string_free ();
+ gram_scanner_last_string_free ();
}
static void
Index: src/reader.c
===================================================================
RCS file: /cvsroot/bison/bison/src/reader.c,v
retrieving revision 1.254
diff -u -u -r1.254 reader.c
--- src/reader.c 14 May 2006 19:14:10 -0000 1.254
+++ src/reader.c 6 Jun 2006 16:38:55 -0000
@@ -22,6 +22,7 @@
#include <config.h>
#include "system.h"
+#include <assert.h>
#include <quotearg.h>
@@ -34,6 +35,8 @@
#include "reader.h"
#include "symlist.h"
#include "symtab.h"
+#include "scan-gram.h"
+#include "scan-code.h"
static void check_and_convert_grammar (void);
@@ -77,6 +80,8 @@
!typed ? &pre_prologue_obstack : &post_prologue_obstack;
obstack_fgrow1 (oout, "]b4_syncline(%d, [[", loc.start.line);
+ /* FIXME: Protection of M4 characters missing here. See
+ output.c:escaped_output. */
MUSCLE_OBSTACK_SGROW (oout,
quotearg_style (c_quoting_style, loc.start.file));
obstack_sgrow (oout, "]])[\n");
@@ -398,9 +403,7 @@
void
grammar_current_rule_action_append (const char *action, location loc)
{
- /* There's no need to invoke grammar_midrule_action here, since the
- scanner already did it if necessary. */
- current_rule->action = action;
+ current_rule->action = translate_rule_action (current_rule, action, loc);
current_rule->action_location = loc;
}
@@ -426,6 +429,7 @@
while (p)
{
+ int rule_length = 0;
symbol *ruleprec = p->ruleprec;
rules[ruleno].user_number = ruleno;
rules[ruleno].number = ruleno;
@@ -440,18 +444,22 @@
rules[ruleno].action = p->action;
rules[ruleno].action_location = p->action_location;
- p = p->next;
- while (p && p->sym)
+ for (p = p->next; p && p->sym; p = p->next)
{
+ ++rule_length;
+
+ /* Don't allow rule_length == INT_MAX, since that might
+ cause confusion with strtol if INT_MAX == LONG_MAX. */
+ if (rule_length == INT_MAX)
+ fatal_at (rules[ruleno].location, _("rule is too long"));
+
/* item_number = symbol_number.
But the former needs to contain more: negative rule numbers. */
ritem[itemno++] = symbol_number_as_item_number (p->sym->number);
/* A rule gets by default the precedence and associativity
- of the last token in it. */
+ of its last token. */
if (p->sym->class == token_sym && default_prec)
rules[ruleno].prec = p->sym;
- if (p)
- p = p->next;
}
/* If this rule has a %prec,
@@ -461,8 +469,11 @@
rules[ruleno].precsym = ruleprec;
rules[ruleno].prec = ruleprec;
}
+ /* An item ends by the rule number (negated). */
ritem[itemno++] = rule_number_as_item_number (ruleno);
+ assert (itemno < ITEM_NUMBER_MAX);
++ruleno;
+ assert (ruleno < RULE_NUMBER_MAX);
if (p)
p = p->next;
@@ -511,7 +522,7 @@
gram__flex_debug = trace_flag & trace_scan;
gram_debug = trace_flag & trace_parse;
- scanner_initialize ();
+ gram_scanner_initialize ();
gram_parse ();
if (! complaint_issued)
Index: src/reader.h
===================================================================
RCS file: /cvsroot/bison/bison/src/reader.h,v
retrieving revision 1.49
diff -u -u -r1.49 reader.h
--- src/reader.h 9 Mar 2006 23:23:11 -0000 1.49
+++ src/reader.h 6 Jun 2006 16:38:55 -0000
@@ -35,26 +35,6 @@
uniqstr type;
} merger_list;
-/* From the scanner. */
-extern FILE *gram_in;
-extern int gram__flex_debug;
-extern boundary scanner_cursor;
-extern char *last_string;
-extern location last_braced_code_loc;
-extern int max_left_semantic_context;
-void scanner_initialize (void);
-void scanner_free (void);
-void scanner_last_string_free (void);
-
-/* These are declared by the scanner, but not used. We put them here
- to pacify "make syntax-check". */
-extern FILE *gram_out;
-extern int gram_lineno;
-
-# define YY_DECL int gram_lex (YYSTYPE *val, location *loc)
-YY_DECL;
-
-
/* From the parser. */
extern int gram_debug;
int gram_parse (void);
Index: src/scan-action.l
===================================================================
RCS file: src/scan-action.l
diff -N src/scan-action.l
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ src/scan-action.l 6 Jun 2006 16:38:55 -0000
@@ -0,0 +1,866 @@
+/* Bison Grammar Scanner -*- C -*-
+
+ Copyright (C) 2002, 2003, 2004, 2005 Free Software Foundation, Inc.
+
+ This file is part of Bison, the GNU Compiler Compiler.
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, write to the Free Software
+ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ 02110-1301 USA
+*/
+
+%option debug nodefault nounput noyywrap never-interactive
+%option prefix="gram_" outfile="lex.yy.c"
+
+%{
+#include "system.h"
+
+#include <mbswidth.h>
+#include <get-errno.h>
+#include <quote.h>
+
+#include "complain.h"
+#include "files.h"
+#include "getargs.h"
+#include "gram.h"
+#include "quotearg.h"
+#include "reader.h"
+#include "uniqstr.h"
+
+#define YY_USER_INIT \
+ do \
+ { \
+ scanner_cursor.file = current_file; \
+ scanner_cursor.line = 1; \
+ scanner_cursor.column = 1; \
+ code_start = scanner_cursor; \
+ } \
+ while (0)
+
+/* Location of scanner cursor. */
+boundary scanner_cursor;
+
+static void adjust_location (location *, char const *, size_t);
+#define YY_USER_ACTION adjust_location (loc, yytext, yyleng);
+
+static size_t no_cr_read (FILE *, char *, size_t);
+#define YY_INPUT(buf, result, size) ((result) = no_cr_read (yyin, buf, size))
+
+/* Within well-formed rules, RULE_LENGTH is the number of values in
+ the current rule so far, which says where to find `$0' with respect
+ to the top of the stack. It is not the same as the rule->length in
+ the case of mid rule actions.
+
+ Outside of well-formed rules, RULE_LENGTH has an undefined value. */
+int rule_length;
+
+static void handle_dollar (int token_type, char *cp, location loc);
+static void handle_at (int token_type, char *cp, location loc);
+static void handle_syncline (char *args);
+static unsigned long int scan_integer (char const *p, int base, location loc);
+static int convert_ucn_to_byte (char const *hex_text);
+static void unexpected_eof (boundary, char const *);
+static void unexpected_newline (boundary, char const *);
+
+%}
+%x SC_COMMENT SC_LINE_COMMENT SC_YACC_COMMENT
+%x SC_STRING SC_CHARACTER
+%x SC_ESCAPED_STRING SC_ESCAPED_CHARACTER
+%x SC_PRE_CODE SC_BRACED_CODE SC_PROLOGUE SC_EPILOGUE
+
+letter [.abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_]
+id {letter}({letter}|[0-9])*
+directive %{letter}({letter}|[0-9]|-)*
+int [0-9]+
+
+/* POSIX says that a tag must be both an id and a C union member, but
+ historically almost any character is allowed in a tag. We disallow
+ NUL and newline, as this simplifies our implementation. */
+tag [^\0\n>]+
+
+/* Zero or more instances of backslash-newline. Following GCC, allow
+ white space between the backslash and the newline. */
+splice (\\[ \f\t\v]*\n)*
+
+%%
+%{
+ /* Nesting level of the current code in braces. */
+ int braces_level IF_LINT (= 0);
+
+ /* Parent context state, when applicable. */
+ int context_state IF_LINT (= 0);
+
+ /* Token type to return, when applicable. */
+ int token_type IF_LINT (= 0);
+
+ /* Where containing code started, when applicable. Its initial
+ value is relevant only when yylex is invoked in the SC_EPILOGUE
+ start condition. */
+ boundary code_start = scanner_cursor;
+
+ /* Where containing comment or string or character literal started,
+ when applicable. */
+ boundary token_start IF_LINT (= scanner_cursor);
+%}
+
+
+ /*-----------------------.
+ | Scanning white space. |
+ `-----------------------*/
+
+<INITIAL>
+{
+ /* Comments and white space. */
+ "," warn_at (*loc, _("stray `,' treated as white space"));
+ [ \f\n\t\v] |
+ "//".* ;
+ "/*" {
+ token_start = loc->start;
+ context_state = YY_START;
+ BEGIN SC_YACC_COMMENT;
+ }
+
+ /* #line directives are not documented, and may be withdrawn or
+ modified in future versions of Bison. */
+ ^"#line "{int}" \"".*"\"\n" {
+ handle_syncline (yytext + sizeof "#line " - 1);
+ }
+}
+
+
+ /*----------------------------.
+ | Scanning Bison directives. |
+ `----------------------------*/
+<INITIAL>
+{
+
+ /* Code in between braces. */
+ "{" {
+ STRING_GROW;
+ token_type = BRACED_CODE;
+ braces_level = 0;
+ code_start = loc->start;
+ BEGIN SC_BRACED_CODE;
+ }
+
+}
+
+
+ /*------------------------------------------------------------.
+ | Scanning a C comment. The initial `/ *' is already eaten. |
+ `------------------------------------------------------------*/
+
+<SC_COMMENT>
+{
+ "*"{splice}"/" STRING_GROW; BEGIN context_state;
+ <<EOF>> unexpected_eof (token_start, "*/"); BEGIN context_state;
+}
+
+
+ /*--------------------------------------------------------------.
+ | Scanning a line comment. The initial `//' is already eaten. |
+ `--------------------------------------------------------------*/
+
+<SC_LINE_COMMENT>
+{
+ "\n" STRING_GROW; BEGIN context_state;
+ {splice} STRING_GROW;
+ <<EOF>> BEGIN context_state;
+}
+
+
+ /*------------------------------------------------.
+ | Scanning a Bison string, including its escapes. |
+ | The initial quote is already eaten. |
+ `------------------------------------------------*/
+
+<SC_ESCAPED_STRING>
+{
+ "\"" {
+ STRING_FINISH;
+ loc->start = token_start;
+ val->chars = last_string;
+ rule_length++;
+ BEGIN INITIAL;
+ return STRING;
+ }
+ \n unexpected_newline (token_start, "\""); BEGIN INITIAL;
+ <<EOF>> unexpected_eof (token_start, "\""); BEGIN INITIAL;
+}
+
+ /*----------------------------------------------------------.
+ | Scanning a Bison character literal, decoding its escapes. |
+ | The initial quote is already eaten. |
+ `----------------------------------------------------------*/
+
+<SC_ESCAPED_CHARACTER>
+{
+ "'" {
+ unsigned char last_string_1;
+ STRING_GROW;
+ STRING_FINISH;
+ loc->start = token_start;
+ val->symbol = symbol_get (quotearg_style (escape_quoting_style,
+ last_string),
+ *loc);
+ symbol_class_set (val->symbol, token_sym, *loc);
+ last_string_1 = last_string[1];
+ symbol_user_token_number_set (val->symbol, last_string_1, *loc);
+ STRING_FREE;
+ rule_length++;
+ BEGIN INITIAL;
+ return ID;
+ }
+ \n unexpected_newline (token_start, "'"); BEGIN INITIAL;
+ <<EOF>> unexpected_eof (token_start, "'"); BEGIN INITIAL;
+}
+
+<SC_ESCAPED_CHARACTER,SC_ESCAPED_STRING>
+{
+ \0 complain_at (*loc, _("invalid null character"));
+}
+
+
+ /*----------------------------.
+ | Decode escaped characters. |
+ `----------------------------*/
+
+<SC_ESCAPED_STRING,SC_ESCAPED_CHARACTER>
+{
+ \\[0-7]{1,3} {
+ unsigned long int c = strtoul (yytext + 1, 0, 8);
+ if (UCHAR_MAX < c)
+ complain_at (*loc, _("invalid escape sequence: %s"), quote (yytext));
+ else if (! c)
+ complain_at (*loc, _("invalid null character: %s"), quote (yytext));
+ else
+ obstack_1grow (&obstack_for_string, c);
+ }
+
+ \\x[0-9abcdefABCDEF]+ {
+ unsigned long int c;
+ set_errno (0);
+ c = strtoul (yytext + 2, 0, 16);
+ if (UCHAR_MAX < c || get_errno ())
+ complain_at (*loc, _("invalid escape sequence: %s"), quote (yytext));
+ else if (! c)
+ complain_at (*loc, _("invalid null character: %s"), quote (yytext));
+ else
+ obstack_1grow (&obstack_for_string, c);
+ }
+
+ \\a obstack_1grow (&obstack_for_string, '\a');
+ \\b obstack_1grow (&obstack_for_string, '\b');
+ \\f obstack_1grow (&obstack_for_string, '\f');
+ \\n obstack_1grow (&obstack_for_string, '\n');
+ \\r obstack_1grow (&obstack_for_string, '\r');
+ \\t obstack_1grow (&obstack_for_string, '\t');
+ \\v obstack_1grow (&obstack_for_string, '\v');
+
+ /* \\[\"\'?\\] would be shorter, but it confuses xgettext. */
+ \\("\""|"'"|"?"|"\\") obstack_1grow (&obstack_for_string, yytext[1]);
+
+ \\(u|U[0-9abcdefABCDEF]{4})[0-9abcdefABCDEF]{4} {
+ int c = convert_ucn_to_byte (yytext);
+ if (c < 0)
+ complain_at (*loc, _("invalid escape sequence: %s"), quote (yytext));
+ else if (! c)
+ complain_at (*loc, _("invalid null character: %s"), quote (yytext));
+ else
+ obstack_1grow (&obstack_for_string, c);
+ }
+ \\(.|\n) {
+ complain_at (*loc, _("unrecognized escape sequence: %s"), quote (yytext));
+ STRING_GROW;
+ }
+}
+
+ /*--------------------------------------------.
+ | Scanning user-code characters and strings. |
+ `--------------------------------------------*/
+
+<SC_CHARACTER,SC_STRING>
+{
+ {splice}|address@hidden STRING_GROW;
+}
+
+<SC_CHARACTER>
+{
+ "'" STRING_GROW; BEGIN context_state;
+ \n unexpected_newline (token_start, "'"); BEGIN context_state;
+ <<EOF>> unexpected_eof (token_start, "'"); BEGIN context_state;
+}
+
+<SC_STRING>
+{
+ "\"" STRING_GROW; BEGIN context_state;
+ \n unexpected_newline (token_start, "\""); BEGIN context_state;
+ <<EOF>> unexpected_eof (token_start, "\""); BEGIN context_state;
+}
+
+
+ /*---------------------------------------------------.
+ | Strings, comments etc. can be found in user code. |
+ `---------------------------------------------------*/
+
+<INITIAL>
+{
+ "'" {
+ STRING_GROW;
+ context_state = YY_START;
+ token_start = loc->start;
+ BEGIN SC_CHARACTER;
+ }
+ "\"" {
+ STRING_GROW;
+ context_state = YY_START;
+ token_start = loc->start;
+ BEGIN SC_STRING;
+ }
+ "/"{splice}"*" {
+ STRING_GROW;
+ context_state = YY_START;
+ token_start = loc->start;
+ BEGIN SC_COMMENT;
+ }
+ "/"{splice}"/" {
+ STRING_GROW;
+ context_state = YY_START;
+ BEGIN SC_LINE_COMMENT;
+ }
+}
+
+
+ /*---------------------------------------------------------------.
+ | Scanning some code in braces (%union and actions). The initial |
+ | "{" is already eaten. |
+ `---------------------------------------------------------------*/
+
+<INITIAL>
+{
+ "{"|"<"{splice}"%" STRING_GROW; braces_level++;
+ "%"{splice}">" STRING_GROW; braces_level--;
+ "}" {
+ bool outer_brace = --braces_level < 0;
+
+ /* As an undocumented Bison extension, append `;' before the last
+ brace in braced code, so that the user code can omit trailing
+ `;'. But do not append `;' if emulating Yacc, since Yacc does
+ not append one.
+
+ FIXME: Bison should warn if a semicolon seems to be necessary
+ here, and should omit the semicolon if it seems unnecessary
+ (e.g., after ';', '{', or '}', each followed by comments or
+ white space). Such a warning shouldn't depend on --yacc; it
+ should depend on a new --pedantic option, which would cause
+ Bison to warn if it detects an extension to POSIX. --pedantic
+ should also diagnose other Bison extensions like %yacc.
+ Perhaps there should also be a GCC-style --pedantic-errors
+ option, so that such warnings are diagnosed as errors. */
+ if (outer_brace && token_type == BRACED_CODE && ! yacc_flag)
+ obstack_1grow (&obstack_for_string, ';');
+
+ obstack_1grow (&obstack_for_string, '}');
+
+ if (outer_brace)
+ {
+ STRING_FINISH;
+ rule_length++;
+ loc->start = code_start;
+ val->chars = last_string;
+ BEGIN INITIAL;
+ return token_type;
+ }
+ }
+
+ /* Tokenize `<<%' correctly (as `<<' `%') rather than incorrectly
+ (as `<' `<%'). */
+ "<"{splice}"<" STRING_GROW;
+
+ "$"("<"{tag}">")?(-?[0-9]+|"$") handle_dollar (token_type, yytext, *loc);
+ "@"(-?[0-9]+|"$") handle_at (token_type, yytext, *loc);
+
+ <<EOF>> unexpected_eof (code_start, "}"); BEGIN INITIAL;
+}
+
+
+ /*--------------------------------------------------------------.
+ | Scanning some prologue: from "%{" (already scanned) to "%}". |
+ `--------------------------------------------------------------*/
+
+<SC_PROLOGUE>
+{
+ "%}" {
+ STRING_FINISH;
+ loc->start = code_start;
+ val->chars = last_string;
+ BEGIN INITIAL;
+ return PROLOGUE;
+ }
+
+ <<EOF>> unexpected_eof (code_start, "%}"); BEGIN INITIAL;
+}
+
+
+ /*---------------------------------------------------------------.
+ | Scanning the epilogue (everything after the second "%%", which |
+ | has already been eaten). |
+ `---------------------------------------------------------------*/
+
+<SC_EPILOGUE>
+{
+ <<EOF>> {
+ STRING_FINISH;
+ loc->start = code_start;
+ val->chars = last_string;
+ BEGIN INITIAL;
+ return EPILOGUE;
+ }
+}
+
+
+ /*-----------------------------------------.
+ | Escape M4 quoting characters in C code. |
+ `-----------------------------------------*/
+
+<SC_COMMENT,SC_LINE_COMMENT,SC_STRING,SC_CHARACTER,SC_BRACED_CODE,SC_PROLOGUE,SC_EPILOGUE>
+{
+ \$ obstack_sgrow (&obstack_for_string, "$][");
+ \@ obstack_sgrow (&obstack_for_string, "@@");
+ \[ obstack_sgrow (&obstack_for_string, "@{");
+ \] obstack_sgrow (&obstack_for_string, "@}");
+}
+
+
+ /*-----------------------------------------------------.
+ | By default, grow the string obstack with the input. |
+ `-----------------------------------------------------*/
+
+<SC_COMMENT,SC_LINE_COMMENT,SC_BRACED_CODE,SC_PROLOGUE,SC_EPILOGUE,SC_STRING,SC_CHARACTER,SC_ESCAPED_STRING,SC_ESCAPED_CHARACTER>. |
+<SC_COMMENT,SC_LINE_COMMENT,SC_BRACED_CODE,SC_PROLOGUE,SC_EPILOGUE>\n STRING_GROW;
+
+%%
+
+/* Keeps track of the maximum number of semantic values to the left of
+ a handle (those referenced by $0, $-1, etc.) that are required by the
+ semantic actions of this grammar. */
+int max_left_semantic_context = 0;
+
+/* Set *LOC and adjust scanner cursor to account for token TOKEN of
+ size SIZE. */
+
+static void
+adjust_location (location *loc, char const *token, size_t size)
+{
+ int line = scanner_cursor.line;
+ int column = scanner_cursor.column;
+ char const *p0 = token;
+ char const *p = token;
+ char const *lim = token + size;
+
+ loc->start = scanner_cursor;
+
+ for (p = token; p < lim; p++)
+ switch (*p)
+ {
+ case '\n':
+ line++;
+ column = 1;
+ p0 = p + 1;
+ break;
+
+ case '\t':
+ column += mbsnwidth (p0, p - p0, 0);
+ column += 8 - ((column - 1) & 7);
+ p0 = p + 1;
+ break;
+ }
+
+ scanner_cursor.line = line;
+ scanner_cursor.column = column + mbsnwidth (p0, p - p0, 0);
+
+ loc->end = scanner_cursor;
+}
+
+
+/* Read bytes from FP into buffer BUF of size SIZE. Return the
+ number of bytes read. Remove '\r' from input, treating \r\n
+ and isolated \r as \n. */
+
+static size_t
+no_cr_read (FILE *fp, char *buf, size_t size)
+{
+ size_t bytes_read = fread (buf, 1, size, fp);
+ if (bytes_read)
+ {
+ char *w = memchr (buf, '\r', bytes_read);
+ if (w)
+ {
+ char const *r = ++w;
+ char const *lim = buf + bytes_read;
+
+ for (;;)
+ {
+ /* Found an '\r'. Treat it like '\n', but ignore any
+ '\n' that immediately follows. */
+ w[-1] = '\n';
+ if (r == lim)
+ {
+ int ch = getc (fp);
+ if (ch != '\n' && ungetc (ch, fp) != ch)
+ break;
+ }
+ else if (*r == '\n')
+ r++;
+
+ /* Copy until the next '\r'. */
+ do
+ {
+ if (r == lim)
+ return w - buf;
+ }
+ while ((*w++ = *r++) != '\r');
+ }
+
+ return w - buf;
+ }
+ }
+
+ return bytes_read;
+}
+
+
+/*------------------------------------------------------------------.
+| TEXT is pointing to a wannabe semantic value (i.e., a `$').  |
+| |
+| Possible inputs: $[<TYPENAME>]($|integer) |
+| |
+| Output to OBSTACK_FOR_STRING a reference to this semantic value. |
+`------------------------------------------------------------------*/
+
+static inline bool
+handle_action_dollar (char *text, location loc)
+{
+ const char *type_name = NULL;
+ char *cp = text + 1;
+
+ if (! current_rule)
+ return false;
+
+ /* Get the type name if explicit. */
+ if (*cp == '<')
+ {
+ type_name = ++cp;
+ while (*cp != '>')
+ ++cp;
+ *cp = '\0';
+ ++cp;
+ }
+
+ if (*cp == '$')
+ {
+ if (!type_name)
+ type_name = symbol_list_n_type_name_get (current_rule, loc, 0);
+ if (!type_name && typed)
+ complain_at (loc, _("$$ of `%s' has no declared type"),
+ current_rule->sym->tag);
+ if (!type_name)
+ type_name = "";
+ obstack_fgrow1 (&obstack_for_string,
+ "]b4_lhs_value([%s])[", type_name);
+ }
+ else
+ {
+ long int num;
+ set_errno (0);
+ num = strtol (cp, 0, 10);
+
+ if (INT_MIN <= num && num <= rule_length && ! get_errno ())
+ {
+ int n = num;
+ if (1-n > max_left_semantic_context)
+ max_left_semantic_context = 1-n;
+ if (!type_name && n > 0)
+ type_name = symbol_list_n_type_name_get (current_rule, loc, n);
+ if (!type_name && typed)
+ complain_at (loc, _("$%d of `%s' has no declared type"),
+ n, current_rule->sym->tag);
+ if (!type_name)
+ type_name = "";
+ obstack_fgrow3 (&obstack_for_string,
+ "]b4_rhs_value(%d, %d, [%s])[",
+ rule_length, n, type_name);
+ }
+ else
+ complain_at (loc, _("integer out of range: %s"), quote (text));
+ }
+
+ return true;
+}
+
+
+/*----------------------------------------------------------------.
+| Map `$?' onto the proper M4 symbol, depending on its TOKEN_TYPE |
+| (are we in an action?). |
+`----------------------------------------------------------------*/
+
+static void
+handle_dollar (int token_type, char *text, location loc)
+{
+ switch (token_type)
+ {
+ case BRACED_CODE:
+ if (handle_action_dollar (text, loc))
+ return;
+ break;
+
+ case PERCENT_DESTRUCTOR:
+ case PERCENT_INITIAL_ACTION:
+ case PERCENT_PRINTER:
+ if (text[1] == '$')
+ {
+ obstack_sgrow (&obstack_for_string, "]b4_dollar_dollar[");
+ return;
+ }
+ break;
+
+ default:
+ break;
+ }
+
+ complain_at (loc, _("invalid value: %s"), quote (text));
+}
+
+
+/*------------------------------------------------------.
+| TEXT is a location token (i.e., a `@...'). Output to |
+| OBSTACK_FOR_STRING a reference to this location. |
+`------------------------------------------------------*/
+
+static inline bool
+handle_action_at (char *text, location loc)
+{
+ char *cp = text + 1;
+ locations_flag = true;
+
+ if (! current_rule)
+ return false;
+
+ if (*cp == '$')
+ obstack_sgrow (&obstack_for_string, "]b4_lhs_location[");
+ else
+ {
+ long int num;
+ set_errno (0);
+ num = strtol (cp, 0, 10);
+
+ if (INT_MIN <= num && num <= rule_length && ! get_errno ())
+ {
+ int n = num;
+ obstack_fgrow2 (&obstack_for_string, "]b4_rhs_location(%d, %d)[",
+ rule_length, n);
+ }
+ else
+ complain_at (loc, _("integer out of range: %s"), quote (text));
+ }
+
+ return true;
+}
+
+
+/*----------------------------------------------------------------.
+| Map `@?' onto the proper M4 symbol, depending on its TOKEN_TYPE |
+| (are we in an action?). |
+`----------------------------------------------------------------*/
+
+static void
+handle_at (int token_type, char *text, location loc)
+{
+ switch (token_type)
+ {
+ case BRACED_CODE:
+ handle_action_at (text, loc);
+ return;
+
+ case PERCENT_INITIAL_ACTION:
+ case PERCENT_DESTRUCTOR:
+ case PERCENT_PRINTER:
+ if (text[1] == '$')
+ {
+ obstack_sgrow (&obstack_for_string, "]b4_at_dollar[");
+ return;
+ }
+ break;
+
+ default:
+ break;
+ }
+
+ complain_at (loc, _("invalid value: %s"), quote (text));
+}
+
+
+/*------------------------------------------------------.
+| Scan NUMBER for a base-BASE integer at location LOC. |
+`------------------------------------------------------*/
+
+static unsigned long int
+scan_integer (char const *number, int base, location loc)
+{
+ unsigned long int num;
+ set_errno (0);
+ num = strtoul (number, 0, base);
+ if (INT_MAX < num || get_errno ())
+ {
+ complain_at (loc, _("integer out of range: %s"), quote (number));
+ num = INT_MAX;
+ }
+ return num;
+}
+
+
+/*------------------------------------------------------------------.
+| Convert universal character name UCN to a single-byte character, |
+| and return that character. Return -1 if UCN does not correspond |
+| to a single-byte character. |
+`------------------------------------------------------------------*/
+
+static int
+convert_ucn_to_byte (char const *ucn)
+{
+ unsigned long int code = strtoul (ucn + 2, 0, 16);
+
+ /* FIXME: Currently we assume Unicode-compatible unibyte characters
+ on ASCII hosts (i.e., Latin-1 on hosts with 8-bit bytes). On
+ non-ASCII hosts we support only the portable C character set.
+ These limitations should be removed once we add support for
+ multibyte characters. */
+
+ if (UCHAR_MAX < code)
+ return -1;
+
+#if ! ('$' == 0x24 && '@' == 0x40 && '`' == 0x60 && '~' == 0x7e)
+ {
+ /* A non-ASCII host. Use CODE to index into a table of the C
+ basic execution character set, which is guaranteed to exist on
+ all Standard C platforms. This table also includes '$', '@',
+ and '`', which are not in the basic execution character set but
+ which are unibyte characters on all the platforms that we know
+ about. */
+ static signed char const table[] =
+ {
+ '\0', -1, -1, -1, -1, -1, -1, '\a',
+ '\b', '\t', '\n', '\v', '\f', '\r', -1, -1,
+ -1, -1, -1, -1, -1, -1, -1, -1,
+ -1, -1, -1, -1, -1, -1, -1, -1,
+ ' ', '!', '"', '#', '$', '%', '&', '\'',
+ '(', ')', '*', '+', ',', '-', '.', '/',
+ '0', '1', '2', '3', '4', '5', '6', '7',
+ '8', '9', ':', ';', '<', '=', '>', '?',
+ '@', 'A', 'B', 'C', 'D', 'E', 'F', 'G',
+ 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O',
+ 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W',
+ 'X', 'Y', 'Z', '[', '\\', ']', '^', '_',
+ '`', 'a', 'b', 'c', 'd', 'e', 'f', 'g',
+ 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o',
+ 'p', 'q', 'r', 's', 't', 'u', 'v', 'w',
+ 'x', 'y', 'z', '{', '|', '}', '~'
+ };
+
+ code = code < sizeof table ? table[code] : -1;
+ }
+#endif
+
+ return code;
+}
+
+
+/*----------------------------------------------------------------.
+| Handle `#line INT "FILE"'. ARGS has already skipped `#line '. |
+`----------------------------------------------------------------*/
+
+static void
+handle_syncline (char *args)
+{
+ int lineno = strtol (args, &args, 10);
+ const char *file = NULL;
+ file = strchr (args, '"') + 1;
+ *strchr (file, '"') = 0;
+ scanner_cursor.file = current_file = uniqstr_new (file);
+ scanner_cursor.line = lineno;
+ scanner_cursor.column = 1;
+}
+
+
+/*----------------------------------------------------------------.
+| For a token or comment starting at START, report message MSGID, |
+| which should say that an end marker was found before |
+| the expected TOKEN_END. |
+`----------------------------------------------------------------*/
+
+static void
+unexpected_end (boundary start, char const *msgid, char const *token_end)
+{
+ location loc;
+ loc.start = start;
+ loc.end = scanner_cursor;
+ complain_at (loc, _(msgid), token_end);
+}
+
+
+/*------------------------------------------------------------------------.
+| Report an unexpected EOF in a token or comment starting at START. |
+| An end of file was encountered and the expected TOKEN_END was missing. |
+`------------------------------------------------------------------------*/
+
+static void
+unexpected_eof (boundary start, char const *token_end)
+{
+ unexpected_end (start, N_("missing `%s' at end of file"), token_end);
+}
+
+
+/*----------------------------------------.
+| Likewise, but for unexpected newlines. |
+`----------------------------------------*/
+
+static void
+unexpected_newline (boundary start, char const *token_end)
+{
+ unexpected_end (start, N_("missing `%s' at end of line"), token_end);
+}
+
+
+/*-------------------------.
+| Initialize the scanner. |
+`-------------------------*/
+
+void
+scanner_initialize (void)
+{
+ obstack_init (&obstack_for_string);
+}
+
+
+/*-----------------------------------------------.
+| Free all the memory allocated to the scanner. |
+`-----------------------------------------------*/
+
+void
+scanner_free (void)
+{
+ obstack_free (&obstack_for_string, 0);
+ /* Reclaim Flex's buffers. */
+ yy_delete_buffer (YY_CURRENT_BUFFER);
+}
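
A side note on no_cr_read above, since its in-place rewriting is easy to misread. The sketch below is not part of the patch: it is a simplified illustration that works on a complete in-memory buffer, so it never needs the getc/ungetc peek that no_cr_read uses at the end of a partial read. The name normalize_newlines is made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Rewrite BUF[0..LEN) in place so that "\r\n" and a lone '\r' both
   become '\n'.  Return the new length.  Hypothetical helper, only
   meant to illustrate the idea behind no_cr_read.  */
static size_t
normalize_newlines (char *buf, size_t len)
{
  size_t r = 0;   /* read index */
  size_t w = 0;   /* write index, always <= r */

  assert (buf != NULL || len == 0);

  while (r < len)
    {
      char c = buf[r++];
      if (c == '\r')
        {
          /* Treat '\r' like '\n', and swallow a '\n' that
             immediately follows.  */
          c = '\n';
          if (r < len && buf[r] == '\n')
            r++;
        }
      buf[w++] = c;
    }
  return w;
}
```

The real function additionally has to cope with a '\r' that is the last byte of a read, which is why it peeks at the next character of the stream.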
Index: src/scan-code-c.c
===================================================================
RCS file: src/scan-code-c.c
diff -N src/scan-code-c.c
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ src/scan-code-c.c 6 Jun 2006 16:38:55 -0000
@@ -0,0 +1,2 @@
+#include <config.h>
+#include "scan-code.c"
Index: src/scan-code.h
===================================================================
RCS file: src/scan-code.h
diff -N src/scan-code.h
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ src/scan-code.h 6 Jun 2006 16:38:55 -0000
@@ -0,0 +1,47 @@
+/* Bison Action Scanner
+
+ Copyright (C) 2006 Free Software Foundation, Inc.
+
+ This file is part of Bison, the GNU Compiler Compiler.
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, write to the Free Software
+ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ 02110-1301 USA
+*/
+
+#ifndef SCAN_CODE_H_
+# define SCAN_CODE_H_
+
+# include "location.h"
+# include "symlist.h"
+
+/* Keeps track of the maximum number of semantic values to the left of
+ a handle (those referenced by $0, $-1, etc.) that are required by the
+ semantic actions of this grammar. */
+extern int max_left_semantic_context;
+
+void code_scanner_free (void);
+
+/* The action A contains $$, $1 etc. referring to the values
+ of the rule R. */
+const char *translate_rule_action (symbol_list *r, const char *a, location l);
+
+/* The action A refers to $$ and @$ only, referring to a symbol. */
+const char *translate_symbol_action (const char *a, location l);
+
+/* The action contains no special escapes, just protect M4 special
+ symbols. */
+const char *translate_code (const char *a, location l);
+
+#endif /* !SCAN_CODE_H_ */
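
To make "just protect M4 special symbols" concrete, here is a rough standalone sketch of the escaping that the `<*>` rules of scan-code.l (further down in this patch) perform. It is an illustration only: the function name m4_escape and the caller-provided buffer are assumptions for this example, whereas the patch itself grows an obstack from inside the Flex scanner.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy SRC into DST, replacing the characters that are special to M4
   with the same replacements as the scanner rules:
     $ -> $][     @ -> @@     [ -> @{     ] -> @}
   DST must have room for 3 * strlen (SRC) + 1 bytes.  Return the
   length of the escaped string.  Hypothetical helper.  */
static size_t
m4_escape (char *dst, const char *src)
{
  char *w = dst;

  assert (dst != NULL && src != NULL);

  for (; *src; src++)
    {
      const char *rep;
      switch (*src)
        {
        case '$': rep = "$]["; break;
        case '@': rep = "@@"; break;
        case '[': rep = "@{"; break;
        case ']': rep = "@}"; break;
        default:
          /* Ordinary character: copy it and move on.  */
          *w++ = *src;
          continue;
        }
      strcpy (w, rep);
      w += strlen (rep);
    }
  *w = '\0';
  return (size_t) (w - dst);
}
```

With this sketch, the input `a[$1]@` becomes `a@{$][1@}@@`. In a rule action the `$1` would instead be turned into a b4_rhs_value call; the plain escaping shown here corresponds to what translate_code is meant to do.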
Index: src/scan-code.l
===================================================================
RCS file: src/scan-code.l
diff -N src/scan-code.l
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ src/scan-code.l 6 Jun 2006 16:38:55 -0000
@@ -0,0 +1,358 @@
+/* Bison Action Scanner -*- C -*-
+
+ Copyright (C) 2006 Free Software Foundation, Inc.
+
+ This file is part of Bison, the GNU Compiler Compiler.
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, write to the Free Software
+ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ 02110-1301 USA
+*/
+
+%option debug nodefault nounput noyywrap never-interactive
+%option prefix="code_" outfile="lex.yy.c"
+
+%{
+/* Work around a bug in flex 2.5.31. See Debian bug 333231
+ <http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=333231>. */
+#undef code_wrap
+#define code_wrap() 1
+
+#define FLEX_PREFIX(Id) code_ ## Id
+#include "flex-scanner.h"
+#include "reader.h"
+#include "getargs.h"
+#include <assert.h>
+#include <get-errno.h>
+#include <quote.h>
+
+#include "scan-code.h"
+
+/* The current calling start condition: SC_RULE_ACTION or
+ SC_SYMBOL_ACTION. */
+# define YY_DECL const char *code_lex (int sc_context)
+YY_DECL;
+
+#define YY_USER_ACTION location_compute (loc, &loc->end, yytext, yyleng);
+
+static void handle_action_dollar (char *cp, location loc);
+static void handle_action_at (char *cp, location loc);
+static location the_location;
+static location *loc = &the_location;
+
+/* The rule being processed. */
+symbol_list *current_rule;
+%}
+ /* C and C++ comments in code. */
+%x SC_COMMENT SC_LINE_COMMENT
+ /* Strings and characters in code. */
+%x SC_STRING SC_CHARACTER
+ /* Whether in a rule or symbol action. Specifies the translation
+ of $ and @. */
+%x SC_RULE_ACTION SC_SYMBOL_ACTION
+
+
+/* POSIX says that a tag must be both an id and a C union member, but
+ historically almost any character is allowed in a tag. We disallow
+ NUL and newline, as this simplifies our implementation. */
+tag [^\0\n>]+
+
+/* Zero or more instances of backslash-newline. Following GCC, allow
+ white space between the backslash and the newline. */
+splice (\\[ \f\t\v]*\n)*
+
+%%
+
+%{
+ /* This scanner is special: it is invoked only once per action, and
+ is hence expected to return only once. This initialization is
+ therefore done once per action to translate. */
+ assert (sc_context == SC_SYMBOL_ACTION
+ || sc_context == SC_RULE_ACTION
+ || sc_context == INITIAL);
+ BEGIN sc_context;
+%}
+
+ /*------------------------------------------------------------.
+ | Scanning a C comment. The initial `/ *' is already eaten. |
+ `------------------------------------------------------------*/
+
+<SC_COMMENT>
+{
+ "*"{splice}"/" STRING_GROW; BEGIN sc_context;
+}
+
+
+ /*--------------------------------------------------------------.
+ | Scanning a line comment. The initial `//' is already eaten. |
+ `--------------------------------------------------------------*/
+
+<SC_LINE_COMMENT>
+{
+ "\n" STRING_GROW; BEGIN sc_context;
+ {splice} STRING_GROW;
+}
+
+
+ /*--------------------------------------------.
+ | Scanning user-code characters and strings. |
+ `--------------------------------------------*/
+
+<SC_CHARACTER,SC_STRING>
+{
+ {splice}|\\{splice}. STRING_GROW;
+}
+
+<SC_CHARACTER>
+{
+ "'" STRING_GROW; BEGIN sc_context;
+}
+
+<SC_STRING>
+{
+ "\"" STRING_GROW; BEGIN sc_context;
+}
+
+
+<SC_RULE_ACTION,SC_SYMBOL_ACTION>{
+ "'" {
+ STRING_GROW;
+ BEGIN SC_CHARACTER;
+ }
+ "\"" {
+ STRING_GROW;
+ BEGIN SC_STRING;
+ }
+ "/"{splice}"*" {
+ STRING_GROW;
+ BEGIN SC_COMMENT;
+ }
+ "/"{splice}"/" {
+ STRING_GROW;
+ BEGIN SC_LINE_COMMENT;
+ }
+}
+
+<SC_RULE_ACTION>
+{
+ "$"("<"{tag}">")?(-?[0-9]+|"$") handle_action_dollar (yytext, *loc);
+ "@"(-?[0-9]+|"$") handle_action_at (yytext, *loc);
+
+ "$" {
+ warn_at (*loc, _("stray `$'"));
+ obstack_sgrow (&obstack_for_string, "$][");
+ }
+ "@" {
+ warn_at (*loc, _("stray `@'"));
+ obstack_sgrow (&obstack_for_string, "@@");
+ }
+}
+
+<SC_SYMBOL_ACTION>
+{
+ "$$" obstack_sgrow (&obstack_for_string, "]b4_dollar_dollar[");
+ "@$" obstack_sgrow (&obstack_for_string, "]b4_at_dollar[");
+}
+
+
+ /*-----------------------------------------.
+ | Escape M4 quoting characters in C code. |
+ `-----------------------------------------*/
+
+<*>
+{
+ \$ obstack_sgrow (&obstack_for_string, "$][");
+ \@ obstack_sgrow (&obstack_for_string, "@@");
+ \[ obstack_sgrow (&obstack_for_string, "@{");
+ \] obstack_sgrow (&obstack_for_string, "@}");
+}
+
+ /*-----------------------------------------------------.
+ | By default, grow the string obstack with the input. |
+ `-----------------------------------------------------*/
+
+<*>.|\n STRING_GROW;
+
+ /* End of processing. */
+<*><<EOF>> {
+ obstack_1grow (&obstack_for_string, '\0');
+ return obstack_finish (&obstack_for_string);
+ }
+
+%%
+
+/* Keeps track of the maximum number of semantic values to the left of
+ a handle (those referenced by $0, $-1, etc.) that are required by the
+ semantic actions of this grammar. */
+int max_left_semantic_context = 0;
+
+
+/*------------------------------------------------------------------.
+| TEXT is pointing to a wannabe semantic value (i.e., a `$').  |
+| |
+| Possible inputs: $[<TYPENAME>]($|integer) |
+| |
+| Output to OBSTACK_FOR_STRING a reference to this semantic value. |
+`------------------------------------------------------------------*/
+
+static void
+handle_action_dollar (char *text, location loc)
+{
+ const char *type_name = NULL;
+ char *cp = text + 1;
+ int rule_length = symbol_list_length (current_rule->next);
+
+ /* Get the type name if explicit. */
+ if (*cp == '<')
+ {
+ type_name = ++cp;
+ while (*cp != '>')
+ ++cp;
+ *cp = '\0';
+ ++cp;
+ }
+
+ if (*cp == '$')
+ {
+ if (!type_name)
+ type_name = symbol_list_n_type_name_get (current_rule, loc, 0);
+ if (!type_name && typed)
+ complain_at (loc, _("$$ of `%s' has no declared type"),
+ current_rule->sym->tag);
+ if (!type_name)
+ type_name = "";
+ obstack_fgrow1 (&obstack_for_string,
+ "]b4_lhs_value([%s])[", type_name);
+ current_rule->used = true;
+ }
+ else
+ {
+ long int num;
+ set_errno (0);
+ num = strtol (cp, 0, 10);
+ if (INT_MIN <= num && num <= rule_length && ! get_errno ())
+ {
+ int n = num;
+ if (1-n > max_left_semantic_context)
+ max_left_semantic_context = 1-n;
+ if (!type_name && n > 0)
+ type_name = symbol_list_n_type_name_get (current_rule, loc, n);
+ if (!type_name && typed)
+ complain_at (loc, _("$%d of `%s' has no declared type"),
+ n, current_rule->sym->tag);
+ if (!type_name)
+ type_name = "";
+ obstack_fgrow3 (&obstack_for_string,
+ "]b4_rhs_value(%d, %d, [%s])[",
+ rule_length, n, type_name);
+ symbol_list_n_used_set (current_rule, n, true);
+ }
+ else
+ complain_at (loc, _("integer out of range: %s"), quote (text));
+ }
+}
+
+
+/*------------------------------------------------------.
+| TEXT is a location token (i.e., a `@...'). Output to |
+| OBSTACK_FOR_STRING a reference to this location. |
+`------------------------------------------------------*/
+
+static void
+handle_action_at (char *text, location loc)
+{
+ char *cp = text + 1;
+ int rule_length = symbol_list_length (current_rule->next);
+ locations_flag = true;
+
+ if (*cp == '$')
+ obstack_sgrow (&obstack_for_string, "]b4_lhs_location[");
+ else
+ {
+ long int num;
+ set_errno (0);
+ num = strtol (cp, 0, 10);
+
+ if (INT_MIN <= num && num <= rule_length && ! get_errno ())
+ {
+ int n = num;
+ obstack_fgrow2 (&obstack_for_string, "]b4_rhs_location(%d, %d)[",
+ rule_length, n);
+ }
+ else
+ complain_at (loc, _("integer out of range: %s"), quote (text));
+ }
+}
+
+
+/*-------------------------.
+| Initialize the scanner. |
+`-------------------------*/
+
+/* Translate the dollars and ats in \a a, whose location is l.
+ Depending on the \a sc_context (SC_RULE_ACTION, SC_SYMBOL_ACTION,
+ INITIAL), the processing is different. */
+
+static const char *
+translate_action (int sc_context, const char *a, location l)
+{
+ const char *res;
+ static bool initialized = false;
+ if (!initialized)
+ {
+ obstack_init (&obstack_for_string);
+ /* The initial buffer, never used. */
+ yy_delete_buffer (YY_CURRENT_BUFFER);
+ yy_flex_debug = 0;
+ initialized = true;
+ }
+
+ loc->start = loc->end = l.start;
+ yy_switch_to_buffer (yy_scan_string (a));
+ res = code_lex (sc_context);
+ yy_delete_buffer (YY_CURRENT_BUFFER);
+
+ return res;
+}
+
+const char *
+translate_rule_action (symbol_list *r, const char *a, location l)
+{
+ current_rule = r;
+ return translate_action (SC_RULE_ACTION, a, l);
+}
+
+const char *
+translate_symbol_action (const char *a, location l)
+{
+ return translate_action (SC_SYMBOL_ACTION, a, l);
+}
+
+const char *
+translate_code (const char *a, location l)
+{
+ return translate_action (INITIAL, a, l);
+}
+
+/*-----------------------------------------------.
+| Free all the memory allocated to the scanner. |
+`-----------------------------------------------*/
+
+void
+code_scanner_free (void)
+{
+ obstack_free (&obstack_for_string, 0);
+ /* Reclaim Flex's buffers. */
+ yy_delete_buffer (YY_CURRENT_BUFFER);
+}
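
For reference, the range check shared by handle_action_dollar and handle_action_at can be isolated as in the sketch below. This is an illustration under assumptions, not code from the patch: parse_reference_index is a made-up name, and it uses plain errno where the patch goes through gnulib's set_errno/get_errno.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parse the decimal integer at CP (the text following `$' or `@') and
   store it in *N if it lies in [INT_MIN, RULE_LENGTH].  Return false
   when the value is out of range or strtol reports an overflow.
   Hypothetical helper.  */
static bool
parse_reference_index (const char *cp, int rule_length, int *n)
{
  long int num;

  assert (cp != NULL && n != NULL);

  errno = 0;
  num = strtol (cp, NULL, 10);
  if (errno != 0 || num < INT_MIN || rule_length < num)
    return false;

  *n = (int) num;
  return true;
}
```

Zero and negative indices are accepted on purpose: $0, $-1, etc. refer to values to the left of the handle, which is what max_left_semantic_context keeps track of.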
Index: src/scan-gram.h
===================================================================
RCS file: src/scan-gram.h
diff -N src/scan-gram.h
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ src/scan-gram.h 6 Jun 2006 16:38:55 -0000
@@ -0,0 +1,44 @@
+/* Bison Grammar Scanner
+
+ Copyright (C) 2006 Free Software Foundation, Inc.
+
+ This file is part of Bison, the GNU Compiler Compiler.
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, write to the Free Software
+ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ 02110-1301 USA
+*/
+
+#ifndef SCAN_GRAM_H_
+# define SCAN_GRAM_H_
+
+/* From the scanner. */
+extern FILE *gram_in;
+extern int gram__flex_debug;
+extern boundary gram_scanner_cursor;
+extern char *gram_last_string;
+extern location gram_last_braced_code_loc;
+void gram_scanner_initialize (void);
+void gram_scanner_free (void);
+void gram_scanner_last_string_free (void);
+
+/* These are declared by the scanner, but not used. We put them here
+ to pacify "make syntax-check". */
+extern FILE *gram_out;
+extern int gram_lineno;
+
+# define GRAM_LEX_DECL int gram_lex (YYSTYPE *val, location *loc)
+GRAM_LEX_DECL;
+
+#endif /* !SCAN_GRAM_H_ */
Index: src/scan-gram.l
===================================================================
RCS file: /cvsroot/bison/bison/src/scan-gram.l,v
retrieving revision 1.86
diff -u -u -r1.86 scan-gram.l
--- src/scan-gram.l 3 Apr 2006 13:50:10 -0000 1.86
+++ src/scan-gram.l 6 Jun 2006 16:38:55 -0000
@@ -29,112 +29,48 @@
#undef gram_wrap
#define gram_wrap() 1
-#include "system.h"
-
-#include <mbswidth.h>
-#include <quote.h>
+#define FLEX_PREFIX(Id) gram_ ## Id
+#include "flex-scanner.h"
#include "complain.h"
#include "files.h"
-#include "getargs.h"
+#include "getargs.h" /* yacc_flag */
#include "gram.h"
#include "quotearg.h"
#include "reader.h"
#include "uniqstr.h"
+#include <mbswidth.h>
+#include <quote.h>
+
+#include "scan-gram.h"
+
+#define YY_DECL GRAM_LEX_DECL
+
#define YY_USER_INIT \
- do \
- { \
- scanner_cursor.file = current_file; \
- scanner_cursor.line = 1; \
- scanner_cursor.column = 1; \
- code_start = scanner_cursor; \
- } \
- while (0)
-
-/* Pacify "gcc -Wmissing-prototypes" when flex 2.5.31 is used. */
-int gram_get_lineno (void);
-FILE *gram_get_in (void);
-FILE *gram_get_out (void);
-int gram_get_leng (void);
-char *gram_get_text (void);
-void gram_set_lineno (int);
-void gram_set_in (FILE *);
-void gram_set_out (FILE *);
-int gram_get_debug (void);
-void gram_set_debug (int);
-int gram_lex_destroy (void);
+ code_start = scanner_cursor = loc->start; \
/* Location of scanner cursor. */
boundary scanner_cursor;
-static void adjust_location (location *, char const *, size_t);
-#define YY_USER_ACTION adjust_location (loc, yytext, yyleng);
+#define YY_USER_ACTION location_compute (loc, &scanner_cursor, yytext, yyleng);
static size_t no_cr_read (FILE *, char *, size_t);
#define YY_INPUT(buf, result, size) ((result) = no_cr_read (yyin, buf, size))
-
-/* OBSTACK_FOR_STRING -- Used to store all the characters that we need to
- keep (to construct ID, STRINGS etc.). Use the following macros to
- use it.
-
- Use STRING_GROW to append what has just been matched, and
- STRING_FINISH to end the string (it puts the ending 0).
- STRING_FINISH also stores this string in LAST_STRING, which can be
- used, and which is used by STRING_FREE to free the last string. */
-
-static struct obstack obstack_for_string;
-
/* A string representing the most recently saved token. */
char *last_string;
-/* The location of the most recently saved token, if it was a
- BRACED_CODE token; otherwise, this has an unspecified value. */
-location last_braced_code_loc;
-
-#define STRING_GROW \
- obstack_grow (&obstack_for_string, yytext, yyleng)
-
-#define STRING_FINISH \
- do { \
- obstack_1grow (&obstack_for_string, '\0'); \
- last_string = obstack_finish (&obstack_for_string); \
- } while (0)
-
-#define STRING_FREE \
- obstack_free (&obstack_for_string, last_string)
-
void
-scanner_last_string_free (void)
+gram_scanner_last_string_free (void)
{
STRING_FREE;
}
-/* Within well-formed rules, RULE_LENGTH is the number of values in
- the current rule so far, which says where to find `$0' with respect
- to the top of the stack. It is not the same as the rule->length in
- the case of mid rule actions.
-
- Outside of well-formed rules, RULE_LENGTH has an undefined value. */
-static int rule_length;
-
-static void rule_length_overflow (location) __attribute__ ((__noreturn__));
-
-/* Increment the rule length by one, checking for overflow. */
-static inline void
-increment_rule_length (location loc)
-{
- rule_length++;
-
- /* Don't allow rule_length == INT_MAX, since that might cause
- confusion with strtol if INT_MAX == LONG_MAX. */
- if (rule_length == INT_MAX)
- rule_length_overflow (loc);
-}
+/* The location of the most recently saved token, if it was a
+ BRACED_CODE token; otherwise, this has an unspecified value. */
+location gram_last_braced_code_loc;
-static void handle_dollar (int token_type, char *cp, location loc);
-static void handle_at (int token_type, char *cp, location loc);
static void handle_syncline (char *, location);
static unsigned long int scan_integer (char const *p, int base, location loc);
static int convert_ucn_to_byte (char const *hex_text);
@@ -142,11 +78,26 @@
static void unexpected_newline (boundary, char const *);
%}
-%x SC_COMMENT SC_LINE_COMMENT SC_YACC_COMMENT
-%x SC_STRING SC_CHARACTER
-%x SC_AFTER_IDENTIFIER
+ /* A C-like comment in directives/rules. */
+%x SC_YACC_COMMENT
+ /* Strings and characters in directives/rules. */
%x SC_ESCAPED_STRING SC_ESCAPED_CHARACTER
-%x SC_PRE_CODE SC_BRACED_CODE SC_PROLOGUE SC_EPILOGUE
+ /* An identifier was just read in directives/rules. Special state
+ to capture the sequence `identifier :'. */
+%x SC_AFTER_IDENTIFIER
+ /* A keyword that should be followed by some code was read (e.g.
+ %printer). */
+%x SC_PRE_CODE
+
+ /* Three types of user code:
+ - prologue (code between `%{' `%}' in the first section, before %%);
+ - actions, printers, union, etc, (between braces in the middle section);
+ - epilogue (everything after the second %%). */
+%x SC_PROLOGUE SC_BRACED_CODE SC_EPILOGUE
+ /* C and C++ comments in code. */
+%x SC_COMMENT SC_LINE_COMMENT
+ /* Strings and characters in code. */
+%x SC_STRING SC_CHARACTER
letter [.abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_]
id {letter}({letter}|[0-9])*
@@ -221,17 +172,17 @@
"%default"[-_]"prec" return PERCENT_DEFAULT_PREC;
"%define" return PERCENT_DEFINE;
"%defines" return PERCENT_DEFINES;
- "%destructor" token_type = PERCENT_DESTRUCTOR; BEGIN SC_PRE_CODE;
+ "%destructor" /* FIXME: Remove once %union handled differently. */ token_type = BRACED_CODE; return PERCENT_DESTRUCTOR;
"%dprec" return PERCENT_DPREC;
"%error"[-_]"verbose" return PERCENT_ERROR_VERBOSE;
"%expect" return PERCENT_EXPECT;
"%expect"[-_]"rr" return PERCENT_EXPECT_RR;
"%file-prefix" return PERCENT_FILE_PREFIX;
"%fixed"[-_]"output"[-_]"files" return PERCENT_YACC;
- "%initial-action" token_type = PERCENT_INITIAL_ACTION; BEGIN SC_PRE_CODE;
+ "%initial-action" /* FIXME: Remove once %union handled differently. */ token_type = BRACED_CODE; return PERCENT_INITIAL_ACTION;
"%glr-parser" return PERCENT_GLR_PARSER;
"%left" return PERCENT_LEFT;
- "%lex-param" token_type = PERCENT_LEX_PARAM; BEGIN SC_PRE_CODE;
+ "%lex-param" /* FIXME: Remove once %union handled differently. */ token_type = BRACED_CODE; return PERCENT_LEX_PARAM;
"%locations" return PERCENT_LOCATIONS;
"%merge" return PERCENT_MERGE;
"%name"[-_]"prefix" return PERCENT_NAME_PREFIX;
@@ -241,9 +192,9 @@
"%nondeterministic-parser" return PERCENT_NONDETERMINISTIC_PARSER;
"%nterm" return PERCENT_NTERM;
"%output" return PERCENT_OUTPUT;
- "%parse-param" token_type = PERCENT_PARSE_PARAM; BEGIN SC_PRE_CODE;
- "%prec" rule_length--; return PERCENT_PREC;
- "%printer" token_type = PERCENT_PRINTER; BEGIN SC_PRE_CODE;
+ "%parse-param" /* FIXME: Remove once %union handled differently. */ token_type = BRACED_CODE; return PERCENT_PARSE_PARAM;
+ "%prec" return PERCENT_PREC;
+ "%printer" /* FIXME: Remove once %union handled differently. */ token_type = BRACED_CODE; return PERCENT_PRINTER;
"%pure"[-_]"parser" return PERCENT_PURE_PARSER;
"%require" return PERCENT_REQUIRE;
"%right" return PERCENT_RIGHT;
@@ -262,13 +213,12 @@
}
"=" return EQUAL;
- "|" rule_length = 0; return PIPE;
+ "|" return PIPE;
";" return SEMICOLON;
{id} {
val->symbol = symbol_get (yytext, *loc);
id_loc = *loc;
- increment_rule_length (*loc);
BEGIN SC_AFTER_IDENTIFIER;
}
@@ -335,7 +285,6 @@
<SC_AFTER_IDENTIFIER>
{
":" {
- rule_length = 0;
*loc = id_loc;
BEGIN INITIAL;
return ID_COLON;
@@ -401,7 +350,6 @@
STRING_FINISH;
loc->start = token_start;
val->chars = last_string;
- increment_rule_length (*loc);
BEGIN INITIAL;
return STRING;
}
@@ -428,7 +376,6 @@
last_string_1 = last_string[1];
symbol_user_token_number_set (val->symbol, last_string_1, *loc);
STRING_FREE;
- increment_rule_length (*loc);
BEGIN INITIAL;
return ID;
}
@@ -501,7 +448,7 @@
<SC_CHARACTER,SC_STRING>
{
- {splice}|\\{splice}[^\n$@\[\]] STRING_GROW;
+ {splice}|\\{splice}[^\n\[\]] STRING_GROW;
}
<SC_CHARACTER>
@@ -622,8 +569,7 @@
STRING_FINISH;
loc->start = code_start;
val->chars = last_string;
- increment_rule_length (*loc);
- last_braced_code_loc = *loc;
+ gram_last_braced_code_loc = *loc;
BEGIN INITIAL;
return token_type;
}
@@ -633,18 +579,6 @@
(as `<' `<%'). */
"<"{splice}"<" STRING_GROW;
- "$"("<"{tag}">")?(-?[0-9]+|"$") handle_dollar (token_type, yytext, *loc);
- "@"(-?[0-9]+|"$") handle_at (token_type, yytext, *loc);
-
- "$" {
- warn_at (*loc, _("stray `$'"));
- obstack_sgrow (&obstack_for_string, "$][");
- }
- "@" {
- warn_at (*loc, _("stray `@'"));
- obstack_sgrow (&obstack_for_string, "@@");
- }
-
<<EOF>> unexpected_eof (code_start, "}"); BEGIN INITIAL;
}
@@ -684,19 +618,6 @@
}
- /*-----------------------------------------.
- | Escape M4 quoting characters in C code. |
- `-----------------------------------------*/
-
-<SC_COMMENT,SC_LINE_COMMENT,SC_STRING,SC_CHARACTER,SC_BRACED_CODE,SC_PROLOGUE,SC_EPILOGUE>
-{
- \$ obstack_sgrow (&obstack_for_string, "$][");
- \@ obstack_sgrow (&obstack_for_string, "@@");
- \[ obstack_sgrow (&obstack_for_string, "@{");
- \] obstack_sgrow (&obstack_for_string, "@}");
-}
-
-
/*-----------------------------------------------------.
| By default, grow the string obstack with the input. |
`-----------------------------------------------------*/
@@ -706,79 +627,6 @@
%%
-/* Keeps track of the maximum number of semantic values to the left of
- a handle (those referenced by $0, $-1, etc.) are required by the
- semantic actions of this grammar. */
-int max_left_semantic_context = 0;
-
-/* If BUF is null, add BUFSIZE (which in this case must be less than
- INT_MAX) to COLUMN; otherwise, add mbsnwidth (BUF, BUFSIZE, 0) to
- COLUMN. If an overflow occurs, or might occur but is undetectable,
- return INT_MAX. Assume COLUMN is nonnegative. */
-
-static inline int
-add_column_width (int column, char const *buf, size_t bufsize)
-{
- size_t width;
- unsigned int remaining_columns = INT_MAX - column;
-
- if (buf)
- {
- if (INT_MAX / 2 <= bufsize)
- return INT_MAX;
- width = mbsnwidth (buf, bufsize, 0);
- }
- else
- width = bufsize;
-
- return width <= remaining_columns ? column + width : INT_MAX;
-}
-
-/* Set *LOC and adjust scanner cursor to account for token TOKEN of
- size SIZE. */
-
-static void
-adjust_location (location *loc, char const *token, size_t size)
-{
- int line = scanner_cursor.line;
- int column = scanner_cursor.column;
- char const *p0 = token;
- char const *p = token;
- char const *lim = token + size;
-
- loc->start = scanner_cursor;
-
- for (p = token; p < lim; p++)
- switch (*p)
- {
- case '\n':
- line += line < INT_MAX;
- column = 1;
- p0 = p + 1;
- break;
-
- case '\t':
- column = add_column_width (column, p0, p - p0);
- column = add_column_width (column, NULL, 8 - ((column - 1) & 7));
- p0 = p + 1;
- break;
-
- default:
- break;
- }
-
- scanner_cursor.line = line;
- scanner_cursor.column = column = add_column_width (column, p0, p - p0);
-
- loc->end = scanner_cursor;
-
- if (line == INT_MAX && loc->start.line != INT_MAX)
- warn_at (*loc, _("line number overflow"));
- if (column == INT_MAX && loc->start.column != INT_MAX)
- warn_at (*loc, _("column number overflow"));
-}
-
-
/* Read bytes from FP into buffer BUF of size SIZE. Return the
number of bytes read. Remove '\r' from input, treating \r\n
and isolated \r as \n. */
@@ -826,173 +674,6 @@
}
-/*------------------------------------------------------------------.
-| TEXT is pointing to a wannabee semantic value (i.e., a `$'). |
-| |
-| Possible inputs: $[<TYPENAME>]($|integer) |
-| |
-| Output to OBSTACK_FOR_STRING a reference to this semantic value. |
-`------------------------------------------------------------------*/
-
-static inline bool
-handle_action_dollar (char *text, location loc)
-{
- const char *type_name = NULL;
- char *cp = text + 1;
-
- if (! current_rule)
- return false;
-
- /* Get the type name if explicit. */
- if (*cp == '<')
- {
- type_name = ++cp;
- while (*cp != '>')
- ++cp;
- *cp = '\0';
- ++cp;
- }
-
- if (*cp == '$')
- {
- if (!type_name)
- type_name = symbol_list_n_type_name_get (current_rule, loc, 0);
- if (!type_name && typed)
- complain_at (loc, _("$$ of `%s' has no declared type"),
- current_rule->sym->tag);
- if (!type_name)
- type_name = "";
- obstack_fgrow1 (&obstack_for_string,
- "]b4_lhs_value([%s])[", type_name);
- current_rule->used = true;
- }
- else
- {
- long int num = strtol (cp, NULL, 10);
-
- if (1 - INT_MAX + rule_length <= num && num <= rule_length)
- {
- int n = num;
- if (max_left_semantic_context < 1 - n)
- max_left_semantic_context = 1 - n;
- if (!type_name && 0 < n)
- type_name = symbol_list_n_type_name_get (current_rule, loc, n);
- if (!type_name && typed)
- complain_at (loc, _("$%d of `%s' has no declared type"),
- n, current_rule->sym->tag);
- if (!type_name)
- type_name = "";
- obstack_fgrow3 (&obstack_for_string,
- "]b4_rhs_value(%d, %d, [%s])[",
- rule_length, n, type_name);
- symbol_list_n_used_set (current_rule, n, true);
- }
- else
- complain_at (loc, _("integer out of range: %s"), quote (text));
- }
-
- return true;
-}
-
-
-/*----------------------------------------------------------------.
-| Map `$?' onto the proper M4 symbol, depending on its TOKEN_TYPE |
-| (are we in an action?). |
-`----------------------------------------------------------------*/
-
-static void
-handle_dollar (int token_type, char *text, location loc)
-{
- switch (token_type)
- {
- case BRACED_CODE:
- if (handle_action_dollar (text, loc))
- return;
- break;
-
- case PERCENT_DESTRUCTOR:
- case PERCENT_INITIAL_ACTION:
- case PERCENT_PRINTER:
- if (text[1] == '$')
- {
- obstack_sgrow (&obstack_for_string, "]b4_dollar_dollar[");
- return;
- }
- break;
-
- default:
- break;
- }
-
- complain_at (loc, _("invalid value: %s"), quote (text));
-}
-
-
-/*------------------------------------------------------.
-| TEXT is a location token (i.e., a `@...'). Output to |
-| OBSTACK_FOR_STRING a reference to this location. |
-`------------------------------------------------------*/
-
-static inline bool
-handle_action_at (char *text, location loc)
-{
- char *cp = text + 1;
- locations_flag = true;
-
- if (! current_rule)
- return false;
-
- if (*cp == '$')
- obstack_sgrow (&obstack_for_string, "]b4_lhs_location[");
- else
- {
- long int num = strtol (cp, NULL, 10);
-
- if (1 - INT_MAX + rule_length <= num && num <= rule_length)
- {
- int n = num;
- obstack_fgrow2 (&obstack_for_string, "]b4_rhs_location(%d, %d)[",
- rule_length, n);
- }
- else
- complain_at (loc, _("integer out of range: %s"), quote (text));
- }
-
- return true;
-}
-
-
-/*----------------------------------------------------------------.
-| Map `@?' onto the proper M4 symbol, depending on its TOKEN_TYPE |
-| (are we in an action?). |
-`----------------------------------------------------------------*/
-
-static void
-handle_at (int token_type, char *text, location loc)
-{
- switch (token_type)
- {
- case BRACED_CODE:
- handle_action_at (text, loc);
- return;
-
- case PERCENT_INITIAL_ACTION:
- case PERCENT_DESTRUCTOR:
- case PERCENT_PRINTER:
- if (text[1] == '$')
- {
- obstack_sgrow (&obstack_for_string, "]b4_at_dollar[");
- return;
- }
- break;
-
- default:
- break;
- }
-
- complain_at (loc, _("invalid value: %s"), quote (text));
-}
-
/*------------------------------------------------------.
| Scan NUMBER for a base-BASE integer at location LOC. |
@@ -1087,20 +768,8 @@
warn_at (loc, _("line number overflow"));
lineno = INT_MAX;
}
- scanner_cursor.file = current_file = uniqstr_new (file);
- scanner_cursor.line = lineno;
- scanner_cursor.column = 1;
-}
-
-
-/*---------------------------------.
-| Report a rule that is too long. |
-`---------------------------------*/
-
-static void
-rule_length_overflow (location loc)
-{
- fatal_at (loc, _("rule is too long"));
+ current_file = uniqstr_new (file);
+ boundary_set (&scanner_cursor, current_file, lineno, 1);
}
@@ -1148,7 +817,7 @@
`-------------------------*/
void
-scanner_initialize (void)
+gram_scanner_initialize (void)
{
obstack_init (&obstack_for_string);
}
@@ -1159,7 +828,7 @@
`-----------------------------------------------*/
void
-scanner_free (void)
+gram_scanner_free (void)
{
obstack_free (&obstack_for_string, 0);
/* Reclaim Flex's buffers. */
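A reading aid for the add_column_width/adjust_location code removed above (the ChangeLog says it moves to src/location.c): the following is a stand-alone sketch of that logic, not the code that was installed. The names sketch_add_columns and sketch_advance are hypothetical, every byte is assumed to be one column wide (the real code calls mbsnwidth), and columns are 1-based as in the removed version.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for add_column_width: add WIDTH columns to
   COLUMN, returning INT_MAX rather than overflowing.  */
static int
sketch_add_columns (int column, size_t width)
{
  unsigned int remaining;

  assert (0 <= column);
  remaining = INT_MAX - column;
  return width <= remaining ? column + (int) width : INT_MAX;
}

/* Hypothetical stand-in for the loop in adjust_location: advance
   *LINE and *COLUMN over the SIZE bytes at TOKEN.  Assumes one
   column per byte, 1-based columns, and tab stops every 8 columns.  */
static void
sketch_advance (int *line, int *column, char const *token, size_t size)
{
  char const *p0 = token;
  char const *lim = token + size;
  char const *p;

  for (p = token; p < lim; p++)
    switch (*p)
      {
      case '\n':
        /* Saturate the line count instead of overflowing.  */
        *line += *line < INT_MAX;
        *column = 1;
        p0 = p + 1;
        break;

      case '\t':
        /* Account for the text before the tab, then jump to the
           next tab stop.  */
        *column = sketch_add_columns (*column, (size_t) (p - p0));
        *column = sketch_add_columns (*column, 8 - ((*column - 1) & 7));
        p0 = p + 1;
        break;

      default:
        break;
      }

  /* Account for whatever follows the last newline or tab.  */
  *column = sketch_add_columns (*column, (size_t) (p - p0));
}
```

Starting at line 1, column 1, scanning the four bytes "ab<TAB>c" moves the column to 3 after "ab", to 9 at the tab stop, and to 10 after "c". Saturating at INT_MAX is what lets the caller report "column number overflow" instead of wrapping around.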
Index: src/system.h
===================================================================
RCS file: /cvsroot/bison/bison/src/system.h,v
retrieving revision 1.76
diff -u -u -r1.76 system.h
--- src/system.h 22 Jan 2006 07:38:49 -0000 1.76
+++ src/system.h 6 Jun 2006 16:38:55 -0000
@@ -113,6 +113,8 @@
# define ATTRIBUTE_UNUSED __attribute__ ((__unused__))
#endif
+#define FUNCTION_PRINT() fprintf (stderr, "%s: ", __func__)
+
/*------.
| NLS. |
`------*/
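A possible use of the FUNCTION_PRINT macro added above, as a minimal sketch: the caller sketch_trace is hypothetical and does not appear in the patch. The macro relies on __func__, so it assumes a C99 (or GNU C) compiler.

```c
#include <stdio.h>

/* Same definition as in the hunk above.  */
#define FUNCTION_PRINT() fprintf (stderr, "%s: ", __func__)

/* Hypothetical caller: prefix a debug message on stderr with the
   name of the enclosing function.  Returns N unchanged.  */
static int
sketch_trace (int n)
{
  FUNCTION_PRINT ();
  fprintf (stderr, "n = %d\n", n);
  return n;
}
```

Calling sketch_trace (3) writes the function name, a colon, and then "n = 3" to stderr, which is handy when tracing which scanner routine emitted a message.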
Index: tests/input.at
===================================================================
RCS file: /cvsroot/bison/bison/tests/input.at,v
retrieving revision 1.43
diff -u -u -r1.43 input.at
--- tests/input.at 3 Apr 2006 13:50:10 -0000 1.43
+++ tests/input.at 6 Jun 2006 16:38:55 -0000
@@ -25,33 +25,17 @@
## Invalid $n. ##
## ------------ ##
-AT_SETUP([Invalid dollar-n])
+AT_SETUP([Invalid \$n and @n])
AT_DATA([input.y],
[[%%
exp: { $$ = $1 ; };
-]])
-
-AT_CHECK([bison input.y], [1], [],
-[[input.y:2.13-14: integer out of range: `$1'
-]])
-
-AT_CLEANUP
-
-
-## ------------ ##
-## Invalid @n. ##
-## ------------ ##
-
-AT_SETUP([Invalid @n])
-
-AT_DATA([input.y],
-[[%%
exp: { @$ = @1 ; };
]])
AT_CHECK([bison input.y], [1], [],
-[[input.y:2.13-14: integer out of range: `@1'
+[[input.y:2.13-14: integer out of range: `$1'
+input.y:3.13-14: integer out of range: `@1'
]])
AT_CLEANUP
@@ -200,11 +184,11 @@
AT_DATA([input.y], [])
AT_CHECK([bison input.y], [1], [],
-[[input.y:1.1: syntax error, unexpected end of file
+[[input.y:1.0: syntax error, unexpected end of file
]])
-AT_DATA([input.y],
+AT_DATA([input.y],
[{}
])
AT_CHECK([bison input.y], [1], [],
Index: tests/regression.at
===================================================================
RCS file: /cvsroot/bison/bison/tests/regression.at,v
retrieving revision 1.98
diff -u -u -r1.98 regression.at
--- tests/regression.at 21 May 2006 04:48:47 -0000 1.98
+++ tests/regression.at 6 Jun 2006 16:38:55 -0000
@@ -346,9 +346,7 @@
]])
AT_CHECK([bison input.y], [1], [],
-[[input.y:3.1: missing `{' in "%destructor {...}"
-input.y:4.1: missing `{' in "%initial-action {...}"
-input.y:4.1: syntax error, unexpected %initial-action {...}, expecting string or identifier
+[[input.y:3.1-15: syntax error, unexpected %initial-action, expecting {...}
]])
AT_CLEANUP
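
A note on the handle_syncline hunk in src/scan-gram.l: the three assignments to scanner_cursor.file, .line and .column are replaced by a single boundary_set call. A minimal sketch of what such a setter could look like follows. The type sketch_boundary and the function sketch_boundary_set are hypothetical; in particular the file member is a plain string here, whereas Bison uses a uniqstr.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for Bison's boundary type.  The real FILE
   member is a uniqstr; a plain string is an assumption made to keep
   the sketch self-contained.  */
typedef struct
{
  char const *file;
  int line;
  int column;
} sketch_boundary;

/* Hypothetical equivalent of boundary_set: one call takes the place
   of the three assignments removed from handle_syncline.  */
static void
sketch_boundary_set (sketch_boundary *b, char const *file,
                     int line, int column)
{
  assert (b != NULL);
  b->file = file;
  b->line = line;
  b->column = column;
}
```

With this shape, the call in the patch corresponds to sketch_boundary_set (&cursor, file, lineno, 1): the cursor is placed at column 1 of the line named by the #line directive.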