Re: RFC: new changeresyntax builtin

From: Eric Blake
Subject: Re: RFC: new changeresyntax builtin
Date: Wed, 5 Jul 2006 21:53:05 +0000 (UTC)
User-agent: Loom/3.14 (http://gmane.org/)

Gary V. Vaughan <gary <at> gnu.org> writes:

> I'm thinking of removing epatsubst, eregexp and erenamesyms in HEAD, in
> favour of a more flexible and scalable changeresyntax builtin as an
> analogue to re_set_syntax in the GNU regex C API.
> If a bogus operand is given:
>   changeresyntax(`meh')
>   => stdin:1: m4: ERROR: unknown argument to built-in `changeresyntax';
>      use one of: AWK, ED, EGREP, EMACS, GNU_AWK, GREP, POSIX_AWK, 

I like it!  It is similar to the -regextype primary recently added to findutils 
4.2.24.  And it goes well with the regexprops-generic.texi in gnulib, which 
documents all the high-level regular expression families in GNU programs.  And 

> This replaces 3 builtins with one more powerful builtin, an obvious
> win to my mind.  Can anyone see a downside to this change?

A few issues to be resolved, first.

One - autoconf documents m4_bpatsubst as mapping to m4's patsubst, with the 
note that m4_patsubst is reserved for the day that m4 introduces epatsubst.  We 
need to make sure that repeated use of changeresyntax is efficient.  With your 
proposal, autoconf will have to do something like:

define(`m4_patsubst', `changeresyntax(`POSIX_EXTENDED')'defn(`patsubst'))
define(`m4_bpatsubst', `changeresyntax(`EMACS')'defn(`patsubst'))

(which implies that we will need to fix the mixing of text and builtins in a 
single definition; or else expand the above example into using helper macros).

Two - what about case-insensitive regular expressions?  Again, using findutils 
as an example, it provides -regex and -iregex as the two primaries affected by 
-regextype.  So we should really have 7 regex builtins in m4:
patsubst, regex, renamesyms, ipatsubst, iregex, irenamesyms, changeresyntax.

Three - is changeresyntax(`emacs') the same as changeresyntax(`EMACS')?  Should 
we accept unambiguous prefixes, like changeresyntax(`em')?
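
To make the prefix question concrete, here is a minimal Python sketch of one 
possible lookup rule: fold case, prefer an exact name, otherwise accept a 
prefix only if it selects exactly one syntax.  The function name 
resolve_syntax is hypothetical, and the name list holds only the syntaxes 
mentioned in this thread, not the full set from the patch.

```python
# Syntax names mentioned in this thread; the real list in the patch is longer.
SYNTAXES = ("AWK", "ED", "EGREP", "EMACS", "GNU_AWK", "GREP",
            "POSIX_AWK", "POSIX_EXTENDED")


def resolve_syntax(name, known=SYNTAXES):
    """Map a user-supplied name to a canonical syntax name (hypothetical rule)."""
    key = name.upper()              # treat `emacs' and `EMACS' alike
    if key in known:                # an exact name always wins
        return key
    matches = [s for s in known if s.startswith(key)]
    if len(matches) == 1:           # unambiguous prefix
        return matches[0]
    if matches:
        raise ValueError("ambiguous syntax %r: %s" % (name, ", ".join(matches)))
    raise ValueError("unknown syntax %r" % name)
```

Under this rule `em' and `emacs' both resolve to EMACS, while `e' is rejected 
as ambiguous (ED, EGREP, EMACS) and `meh' is rejected as unknown.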

Four - what should the default be?  Do we stick with EMACS syntax, for 1.4.x 
compatibility, or do we go for broke and make the default POSIX_EXTENDED?  
Whatever we choose, we should probably also have a command-line option to set 
the default.

Five - it looks like you already have a patch started.  Don't forget to add the 
current resyntax to frozen files, since it should be saved across loads.  And 
how would this interact if the state in the frozen file and the state requested 
by the command line differ on reload?

Six - is it also worth adding an optional parameter to the existing regex 
builtins?  I'm thinking along the lines of:
patsubst(string, regexp, replacement, opt syntax)

That optional syntax parameter could also serve as the place to request flags 
like case-insensitive or global vs. first match only (kind of like perl's 
s///ig).  Then you would only need four regex primitives (changeresyntax, 
patsubst, regex, and renamesyms), with no separate i* variants.  For example, 
autoconf could then do something like:

define(`m4_bpatsubst', `m4_builtin(`patsubst', `$1', `$2', `$3', `EMACS')')
define(`m4_patsubst', `m4_builtin(`patsubst', `$1', `$2', `$3', 
define(`m4_ipatsubst', `m4_builtin(`patsubst', `$1', `$2', `$3', 

It still makes sense to provide changeresyntax, even if you add the optional 
parameter to the other three builtins, so that you don't always have to request 
which syntax.
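
As a rough model of the flags idea, the following Python sketch gives a 
patsubst-like function a fourth argument holding perl-style option letters.  
The letters `i' and `g' and the first-match-only default are assumptions 
borrowed from perl's s///ig, not anything the patch defines, and Python's re 
module stands in for the GNU regex syntaxes, so pattern details will differ.

```python
import re


def patsubst(string, regexp, replacement, opts=""):
    """Substitute with optional perl-style flags (hypothetical interface)."""
    unknown = set(opts) - set("ig")
    if unknown:
        raise ValueError("unknown option(s): %s" % "".join(sorted(unknown)))
    flags = re.IGNORECASE if "i" in opts else 0   # `i': ignore case
    count = 0 if "g" in opts else 1               # `g': all matches, else first
    return re.sub(regexp, replacement, string, count=count, flags=flags)
```

With this shape a single primitive covers what would otherwise need separate 
patsubst and ipatsubst builtins: the caller passes `i' instead of choosing a 
different macro.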

Eric Blake
