From 1b53838ddf91013af970fee3ca19c12535a4ca91 Mon Sep 17 00:00:00 2001
From: James Youngman
Date: Tue, 14 Nov 2017 22:21:49 +0000
Subject: [PATCH 2/2] regexprops: don't mention regex dialects we're not going to document.
To: address@hidden

* lib/regextype.c (get_regex_type_synonym): don't return regex dialect Y
as a synonym of dialect X, if we're not in fact going to include X.
Accept a CONTEXT parameter in order to identify this situation. This
ensures that the bug fixed in commit
e2c673cbcdc325a3a2e9dd02169bb4a42c61bc48 stays fixed for any
permutation of regex_map.
* lib/regextype.h: update prototype of get_regex_type_synonym.
* lib/regexprops.c (describe_all): Pass the new context parameter.
* doc/regexprops.texi: regenerate this file.
---
 doc/regexprops.texi | 296 ++++++++++++++++++++++++++--------------------------
 lib/regexprops.c    |   2 +-
 lib/regextype.c     |  28 +++--
 lib/regextype.h     |   7 +-
 4 files changed, 170 insertions(+), 163 deletions(-)

diff --git a/doc/regexprops.texi b/doc/regexprops.texi
index 0229460e..94c1e2e7 100644
--- a/doc/regexprops.texi
+++ b/doc/regexprops.texi
@@ -11,15 +11,15 @@
 
 @menu
 * findutils-default regular expression syntax::
+* emacs regular expression syntax::
+* gnu-awk regular expression syntax::
+* grep regular expression syntax::
 * posix-awk regular expression syntax::
+* awk regular expression syntax::
 * posix-basic regular expression syntax::
 * posix-egrep regular expression syntax::
-* posix-extended regular expression syntax::
-* awk regular expression syntax::
 * egrep regular expression syntax::
-* emacs regular expression syntax::
-* gnu-awk regular expression syntax::
-* grep regular expression syntax::
+* posix-extended regular expression syntax::
 @end menu
 
 @node findutils-default regular expression syntax
@@ -113,11 +113,11 @@ The character @samp{$} only represents the end of a string when it appears:
 The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.
 
 
-@node posix-awk regular expression syntax
-@subsection @samp{posix-awk} regular expression syntax
+@node emacs regular expression syntax
+@subsection @samp{emacs} regular expression syntax
 
 
-The character @samp{.} matches any single character except the null character.
+The character @samp{.} matches any single character except newline.
 
 
 @table @samp
@@ -133,57 +133,7 @@ matches a @samp{?}.
 @end table
 
 
-Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} can be used to quote the following character. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit.
-
-
-GNU extensions are not supported and so @samp{\w}, @samp{\W}, @samp{\<}, @samp{\>}, @samp{\b}, @samp{\B}, @samp{\`}, and @samp{\'} match @samp{w}, @samp{W}, @samp{<}, @samp{>}, @samp{b}, @samp{B}, @samp{`}, and @samp{'} respectively.
-
-
-Grouping is performed with parentheses @samp{()}. An unmatched @samp{)} matches just itself. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{(}.
-
-The alternation operator is @samp{|}.
-
-The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets. Within brackets, @samp{^} can be used to invert the membership of the character class being specified.
-
-
-@samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except the following places, where they are not allowed:
-@enumerate
-
-@item At the beginning of a regular expression
-
-@item After an open-group, signified by @samp{(}
-
-@item After the alternation operator @samp{|}
-
-@end enumerate
-
-
-Intervals are specified by @samp{@{} and @samp{@}}.
-Invalid intervals are treated as literals, for example @samp{a@{1} is treated as @samp{a\@{1}
-
-The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.
-
-
-@node posix-basic regular expression syntax
-@subsection @samp{posix-basic} regular expression syntax
-
-
-The character @samp{.} matches any single character except the null character.
-
-
-@table @samp
-
-@item \+
-indicates that the regular expression should match one or more occurrences of the previous atom or regexp.
-@item \?
-indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.
-@item + and ?
-match themselves.
-
-@end table
-
-
-Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} is taken literally. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit.
+Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are ignored. Within square brackets, @samp{\} is taken literally. Character classes are not supported, so for example you would need to use @samp{[0-9]} instead of @samp{[[:digit:]]}.
 
 
 GNU extensions are supported:
@@ -237,7 +187,7 @@ The character @samp{$} only represents the end of a string when it appears:
 @end enumerate
 
 
-@samp{*}, @samp{\+} and @samp{\?} are special at any point in a regular expression except:
+@samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except:
 @enumerate
 
 @item At the beginning of a regular expression
@@ -249,15 +199,13 @@ The character @samp{$} only represents the end of a string when it appears:
 @end enumerate
 
 
-Intervals are specified by @samp{\@{} and @samp{\@}}.
-Invalid intervals such as @samp{a\@{1z} are not accepted.
 
 
 The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.
 
 
-@node posix-egrep regular expression syntax
-@subsection @samp{posix-egrep} regular expression syntax
+@node gnu-awk regular expression syntax
+@subsection @samp{gnu-awk} regular expression syntax
 
 
 The character @samp{.} matches any single character.
@@ -276,7 +224,7 @@ matches a @samp{?}.
 @end table
 
 
-Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} is taken literally. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit.
+Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} can be used to quote the following character. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit.
 
 
 GNU extensions are supported:
@@ -308,7 +256,16 @@ The alternation operator is @samp{|}.
 The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets. Within brackets, @samp{^} can be used to invert the membership of the character class being specified.
 
 
-The characters @samp{*}, @samp{+} and @samp{?} are special anywhere in a regular expression.
+@samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except:
+@enumerate
+
+@item At the beginning of a regular expression
+
+@item After an open-group, signified by @samp{(}
+
+@item After the alternation operator @samp{|}
+
+@end enumerate
 
 
 Intervals are specified by @samp{@{} and @samp{@}}.
@@ -317,23 +274,22 @@ Invalid intervals are treated as literals, for example @samp{a@{1} is treated as
 The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.
 
 
-@node posix-extended regular expression syntax
-@subsection @samp{posix-extended} regular expression syntax
+@node grep regular expression syntax
+@subsection @samp{grep} regular expression syntax
 
 
-The character @samp{.} matches any single character except the null character.
+The character @samp{.} matches any single character.
 
 
 @table @samp
 
-@item +
-indicates that the regular expression should match one or more occurrences of the previous atom or regexp.
-@item ?
-indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.
 @item \+
-matches a @samp{+}
+indicates that the regular expression should match one or more occurrences of the previous atom or regexp.
 @item \?
-matches a @samp{?}.
+indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.
+@item + and ?
+match themselves.
+
 @end table
 
 
@@ -362,6 +318,86 @@ GNU extensions are supported:
 @end enumerate
 
 
+Grouping is performed with backslashes followed by parentheses @samp{\(}, @samp{\)}. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{\(}.
+
+The alternation operator is @samp{\|}.
+
+The character @samp{^} only represents the beginning of a string when it appears:
+@enumerate
+
+@item At the beginning of a regular expression
+
+@item After an open-group, signified by @samp{\(}
+
+
+@item After a newline
+
+@item After the alternation operator @samp{\|}
+
+@end enumerate
+
+
+The character @samp{$} only represents the end of a string when it appears:
+@enumerate
+
+@item At the end of a regular expression
+
+@item Before a close-group, signified by @samp{\)}
+
+@item Before a newline
+
+@item Before the alternation operator @samp{\|}
+
+@end enumerate
+
+
+@samp{*}, @samp{\+} and @samp{\?} are special at any point in a regular expression except:
+@enumerate
+
+@item At the beginning of a regular expression
+
+@item After an open-group, signified by @samp{\(}
+
+@item After a newline
+
+@item After the alternation operator @samp{\|}
+
+@end enumerate
+
+
+Intervals are specified by @samp{\@{} and @samp{\@}}.
+Invalid intervals such as @samp{a\@{1z} are not accepted.
+
+
+The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.
+
+
+@node posix-awk regular expression syntax
+@subsection @samp{posix-awk} regular expression syntax
+
+
+The character @samp{.} matches any single character except the null character.
+
+
+@table @samp
+
+@item +
+indicates that the regular expression should match one or more occurrences of the previous atom or regexp.
+@item ?
+indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.
+@item \+
+matches a @samp{+}
+@item \?
+matches a @samp{?}.
+@end table
+
+
+Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} can be used to quote the following character. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit.
+
+
+GNU extensions are not supported and so @samp{\w}, @samp{\W}, @samp{\<}, @samp{\>}, @samp{\b}, @samp{\B}, @samp{\`}, and @samp{\'} match @samp{w}, @samp{W}, @samp{<}, @samp{>}, @samp{b}, @samp{B}, @samp{`}, and @samp{'} respectively.
+
+
 Grouping is performed with parentheses @samp{()}. An unmatched @samp{)} matches just itself. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{(}.
 
 The alternation operator is @samp{|}.
@@ -382,8 +418,7 @@ The characters @samp{^} and @samp{$} always represent the beginning and end of a
 
 
 Intervals are specified by @samp{@{} and @samp{@}}.
-Invalid intervals such as @samp{a@{1z} are not accepted.
-
+Invalid intervals are treated as literals, for example @samp{a@{1} is treated as @samp{a\@{1}
 
 The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.
 
@@ -438,30 +473,26 @@ The characters @samp{^} and @samp{$} always represent the beginning and end of a
 The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.
 
 
-@node egrep regular expression syntax
-@subsection @samp{egrep} regular expression syntax
-This is a synonym for posix-egrep.
-@node emacs regular expression syntax
-@subsection @samp{emacs} regular expression syntax
+@node posix-basic regular expression syntax
+@subsection @samp{posix-basic} regular expression syntax
 
 
-The character @samp{.} matches any single character except newline.
+The character @samp{.} matches any single character except the null character.
 
 
 @table @samp
 
-@item +
-indicates that the regular expression should match one or more occurrences of the previous atom or regexp.
-@item ?
-indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.
 @item \+
-matches a @samp{+}
+indicates that the regular expression should match one or more occurrences of the previous atom or regexp.
 @item \?
-matches a @samp{?}.
+indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.
+@item + and ?
+match themselves.
+
 @end table
 
 
-Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are ignored. Within square brackets, @samp{\} is taken literally. Character classes are not supported, so for example you would need to use @samp{[0-9]} instead of @samp{[[:digit:]]}.
+Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} is taken literally. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit.
 
 
 GNU extensions are supported:
@@ -515,7 +546,7 @@ The character @samp{$} only represents the end of a string when it appears:
 @end enumerate
 
 
-@samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except:
+@samp{*}, @samp{\+} and @samp{\?} are special at any point in a regular expression except:
 @enumerate
 
 @item At the beginning of a regular expression
@@ -527,13 +558,15 @@ The character @samp{$} only represents the end of a string when it appears:
 @end enumerate
 
 
+Intervals are specified by @samp{\@{} and @samp{\@}}.
+Invalid intervals such as @samp{a\@{1z} are not accepted.
 
 
 The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.
 
 
-@node gnu-awk regular expression syntax
-@subsection @samp{gnu-awk} regular expression syntax
+@node posix-egrep regular expression syntax
+@subsection @samp{posix-egrep} regular expression syntax
 
 
 The character @samp{.} matches any single character.
@@ -552,7 +585,7 @@ matches a @samp{?}.
 @end table
 
 
-Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} can be used to quote the following character. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit.
+Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} is taken literally. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit.
 
 
 GNU extensions are supported:
@@ -584,16 +617,7 @@ The alternation operator is @samp{|}.
 The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets. Within brackets, @samp{^} can be used to invert the membership of the character class being specified.
 
 
-@samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except:
-@enumerate
-
-@item At the beginning of a regular expression
-
-@item After an open-group, signified by @samp{(}
-
-@item After the alternation operator @samp{|}
-
-@end enumerate
+The characters @samp{*}, @samp{+} and @samp{?} are special anywhere in a regular expression.
 
 
 Intervals are specified by @samp{@{} and @samp{@}}.
@@ -602,22 +626,26 @@ Invalid intervals are treated as literals, for example @samp{a@{1} is treated as
 The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.
 
 
-@node grep regular expression syntax
-@subsection @samp{grep} regular expression syntax
+@node egrep regular expression syntax
+@subsection @samp{egrep} regular expression syntax
+This is a synonym for posix-egrep.
+@node posix-extended regular expression syntax
+@subsection @samp{posix-extended} regular expression syntax
 
 
-The character @samp{.} matches any single character.
+The character @samp{.} matches any single character except the null character.
 
 
 @table @samp
 
-@item \+
+@item +
 indicates that the regular expression should match one or more occurrences of the previous atom or regexp.
-@item \?
+@item ?
 indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.
-@item + and ?
-match themselves.
-
+@item \+
+matches a @samp{+}
+@item \?
+matches a @samp{?}.
 @end table
 
 
@@ -646,55 +674,27 @@ GNU extensions are supported:
 @end enumerate
 
 
-Grouping is performed with backslashes followed by parentheses @samp{\(}, @samp{\)}. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{\(}.
-
-The alternation operator is @samp{\|}.
-
-The character @samp{^} only represents the beginning of a string when it appears:
-@enumerate
-
-@item At the beginning of a regular expression
-
-@item After an open-group, signified by @samp{\(}
-
-
-@item After a newline
-
-@item After the alternation operator @samp{\|}
-
-@end enumerate
-
-
-The character @samp{$} only represents the end of a string when it appears:
-@enumerate
-
-@item At the end of a regular expression
-
-@item Before a close-group, signified by @samp{\)}
-
-@item Before a newline
+Grouping is performed with parentheses @samp{()}. An unmatched @samp{)} matches just itself. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{(}.
 
-@item Before the alternation operator @samp{\|}
+The alternation operator is @samp{|}.
 
-@end enumerate
+The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets. Within brackets, @samp{^} can be used to invert the membership of the character class being specified.
 
 
-@samp{*}, @samp{\+} and @samp{\?} are special at any point in a regular expression except:
+@samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except the following places, where they are not allowed:
 @enumerate
 
 @item At the beginning of a regular expression
 
-@item After an open-group, signified by @samp{\(}
-
-@item After a newline
+@item After an open-group, signified by @samp{(}
 
-@item After the alternation operator @samp{\|}
+@item After the alternation operator @samp{|}
 
 @end enumerate
 
 
-Intervals are specified by @samp{\@{} and @samp{\@}}.
-Invalid intervals such as @samp{a\@{1z} are not accepted.
+Intervals are specified by @samp{@{} and @samp{@}}.
+Invalid intervals such as @samp{a@{1z} are not accepted.
 
 
 The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.
diff --git a/lib/regexprops.c b/lib/regexprops.c
index b20b4a38..6794e09f 100644
--- a/lib/regexprops.c
+++ b/lib/regexprops.c
@@ -558,7 +558,7 @@ describe_all (const char *contextname,
 	  if (NULL == next)
 	    next = "";
 	  begin_subsection (name, next, previous, up);
-	  parent = get_regex_type_synonym (i);
+	  parent = get_regex_type_synonym (i, context);
 	  if (parent >= 0)
 	    {
 	      content ("This is a synonym for ");
diff --git a/lib/regextype.c b/lib/regextype.c
index 89416ebd..5b9a1d51 100644
--- a/lib/regextype.c
+++ b/lib/regextype.c
@@ -56,19 +56,17 @@ struct tagRegexTypeMap
 struct tagRegexTypeMap regex_map[] =
   {
    { "findutils-default", CONTEXT_FINDUTILS, RE_SYNTAX_EMACS|RE_DOT_NEWLINE },
-
+   { "ed", CONTEXT_GENERIC, RE_SYNTAX_ED },
+   { "emacs", CONTEXT_ALL, RE_SYNTAX_EMACS },
+   { "gnu-awk", CONTEXT_ALL, RE_SYNTAX_GNU_AWK },
+   { "grep", CONTEXT_ALL, RE_SYNTAX_GREP },
    { "posix-awk", CONTEXT_ALL, RE_SYNTAX_POSIX_AWK },
+   { "awk", CONTEXT_ALL, RE_SYNTAX_AWK },
    { "posix-basic", CONTEXT_ALL, RE_SYNTAX_POSIX_BASIC },
    { "posix-egrep", CONTEXT_ALL, RE_SYNTAX_POSIX_EGREP },
+   { "egrep", CONTEXT_ALL, RE_SYNTAX_EGREP },
    { "posix-extended", CONTEXT_ALL, RE_SYNTAX_POSIX_EXTENDED },
    { "posix-minimal-basic", CONTEXT_GENERIC, RE_SYNTAX_POSIX_MINIMAL_BASIC },
-
-   { "awk", CONTEXT_ALL, RE_SYNTAX_AWK },
-   { "ed", CONTEXT_GENERIC, RE_SYNTAX_ED },
-   { "egrep", CONTEXT_ALL, RE_SYNTAX_EGREP },
-   { "emacs", CONTEXT_ALL, RE_SYNTAX_EMACS },
-   { "gnu-awk", CONTEXT_ALL, RE_SYNTAX_GNU_AWK },
-   { "grep", CONTEXT_ALL, RE_SYNTAX_GREP },
    { "sed", CONTEXT_GENERIC, RE_SYNTAX_SED },
    /* ,{ "posix-common", CONTEXT_GENERIC, _RE_SYNTAX_POSIX_COMMON } */
   };
@@ -140,18 +138,26 @@ unsigned int get_regex_type_context (unsigned int ix)
 }
 
 int
-get_regex_type_synonym (unsigned int ix)
+get_regex_type_synonym (unsigned int ix, unsigned int context)
 {
   unsigned i;
   int flags;
 
   if (ix >= N_REGEX_MAP_ENTRIES)
     return -1;
-
   flags = regex_map[ix].option_val;
+  /* Terminate the loop before we get to IX, so that we always
+     consistently choose the same entry as a synonym (rather than
+     stating that x and y are synonyms of each other). */
   for (i=0u; i