m4-discuss

Re: RFC: new changeresyntax builtin


From: Gary V. Vaughan
Subject: Re: RFC: new changeresyntax builtin
Date: Thu, 06 Jul 2006 09:23:52 +0100
User-agent: Thunderbird 1.5.0.4 (Macintosh/20060530)

Hi Eric,

Eric Blake wrote:
> Gary V. Vaughan <gary <at> gnu.org> writes:
> 
>> I'm thinking of removing epatsubst, eregexp and erenamesyms in HEAD, in
>> favour of a more flexible and scalable changeresyntax builtin as an
>> analogue to re_set_syntax in the GNU regex C API.
>>
>> If a bogus operand is given:
>>
>>   changeresyntax(`meh')
>>   => stdin:1: m4: ERROR: unknown argument to built-in `changeresyntax';
>>      use one of: AWK, ED, EGREP, EMACS, GNU_AWK, GREP, POSIX_AWK,
>>      POSIX_BASIC, POSIX_EGREP, POSIX_EXTENDED, SED.
> 
> I like it!  It is similar to the -regextype primary recently added to
> findutils 4.2.24.  And it goes well with the regexprops-generic.texi in
> gnulib, which documents all the high-level regular expression families
> in GNU programs.

Nice!  I hadn't spotted that.  We could @include that into our m4.texi docs.

> And what about POSIX_MINIMAL_BASIC?

Oversight on my part.

>> This replaces 3 builtins with one more powerful builtin, an obvious
>> win to my mind.  Can anyone see a downside to this change?
> 
> A few issues to be resolved, first.
> 
> One - autoconf documents m4_bpatsubst as mapping to m4's patsubst, with the
> note that m4_patsubst is reserved for the day that m4 introduces epatsubst.
> We need to make sure that repeated use of changeresyntax is efficient.  With
> your proposal, autoconf will have to do something like:
> 
> define(`m4_patsubst', `changeresyntax(`POSIX_EXTENDED')'defn(`patsubst'))
> define(`m4_bpatsubst', `changeresyntax(`EMACS')'defn(`patsubst'))
> 
> (which implies that we will need to fix the mixing of text and builtins in a 
> single definition; or else expand the above example into using helper macros).

See later...
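One way to keep repeated changeresyntax calls cheap is to cache the compiled pattern together with the syntax it was compiled under, and recompile only when either one changes. The sketch below assumes that design. It uses POSIX `regcomp`/`regexec` as a stand-in for GNU `re_compile_pattern`, with the `cflags` value playing the role of the syntax bits; `struct regex_cache`, `regex_cache_get` and `regex_cache_free` are hypothetical names, not m4's actual code.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* One-entry cache of a compiled regular expression.  POSIX regcomp flags
   stand in for GNU regex syntax bits: changing either the pattern or the
   flags must invalidate the cached compilation. */
struct regex_cache
{
  char *pattern;      /* copy of the last pattern, or NULL if empty */
  int cflags;         /* flags the pattern was compiled with */
  regex_t compiled;   /* valid only while pattern != NULL */
  unsigned compiles;  /* number of regcomp calls, for illustration */
};

/* Return the compiled form of PATTERN under CFLAGS, reusing the previous
   compilation when both match.  Returns NULL if PATTERN does not compile
   or memory is exhausted. */
static regex_t *
regex_cache_get (struct regex_cache *cache, const char *pattern, int cflags)
{
  char *copy;

  if (cache->pattern && cache->cflags == cflags
      && strcmp (cache->pattern, pattern) == 0)
    return &cache->compiled;            /* hit: no recompilation needed */

  if (cache->pattern)                   /* discard the stale entry */
    {
      regfree (&cache->compiled);
      free (cache->pattern);
      cache->pattern = NULL;
    }

  copy = malloc (strlen (pattern) + 1);
  if (!copy)
    return NULL;
  strcpy (copy, pattern);

  cache->compiles++;
  if (regcomp (&cache->compiled, pattern, cflags) != 0)
    {
      free (copy);                      /* nothing to regfree on failure */
      return NULL;
    }
  cache->pattern = copy;
  cache->cflags = cflags;
  return &cache->compiled;
}

/* Release whatever the cache still holds. */
static void
regex_cache_free (struct regex_cache *cache)
{
  if (cache->pattern)
    {
      regfree (&cache->compiled);
      free (cache->pattern);
      cache->pattern = NULL;
    }
}
```

A single entry would thrash if calls alternate between two syntaxes, as wrappers like m4_patsubst and m4_bpatsubst might; a small table keyed on (pattern, flags) avoids that at the cost of more memory.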

> Two - what about case-insensitive regular expressions?  Again, using
> findutils as an example, it provides -regex and -iregex as the two
> primaries affected by -regextype.  So we should really have 7 regex
> builtins in m4: patsubst, regexp, renamesyms, ipatsubst, iregexp,
> irenamesyms, changeresyntax.

Good call.  Let's add the 'opt syntax' parameter instead.

> Three - is changeresyntax(`emacs') the same as changeresyntax(`EMACS')?
> Should we accept unambiguous prefixes, like changeresyntax(`em')?

I think lower-case is okay, though I'm a stick in the mud with making
manifest constants all upper-case, so I think the docs should use the
upper-case form.  No need for prefix support IMHO, if we want to save
the user a few keystrokes we can always accept an emacs m4 minor mode
that will tab-complete.
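To make these naming rules concrete, here is a minimal C sketch of how a changeresyntax implementation might resolve its argument: names are compared case-insensitively, prefixes such as `em' are rejected, POSIX_MINIMAL_BASIC is included, and an unknown name produces the error text from the original proposal. The enum, `resyntax_names` and `resyntax_lookup` are hypothetical; a real implementation would more likely store the corresponding RE_SYNTAX_* bits from GNU regex than a bare enum.

```c
#include <stdio.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical identifiers for the syntax families named in this thread. */
enum resyntax
{
  RESYNTAX_UNKNOWN = -1,
  RESYNTAX_AWK, RESYNTAX_ED, RESYNTAX_EGREP, RESYNTAX_EMACS,
  RESYNTAX_GNU_AWK, RESYNTAX_GREP, RESYNTAX_POSIX_AWK, RESYNTAX_POSIX_BASIC,
  RESYNTAX_POSIX_EGREP, RESYNTAX_POSIX_EXTENDED,
  RESYNTAX_POSIX_MINIMAL_BASIC, RESYNTAX_SED
};

/* Canonical upper-case spellings, indexed by enum resyntax. */
static const char *const resyntax_names[] = {
  "AWK", "ED", "EGREP", "EMACS", "GNU_AWK", "GREP", "POSIX_AWK",
  "POSIX_BASIC", "POSIX_EGREP", "POSIX_EXTENDED", "POSIX_MINIMAL_BASIC",
  "SED"
};

#define RESYNTAX_COUNT (sizeof resyntax_names / sizeof resyntax_names[0])

/* Map NAME to a syntax family.  Case is ignored, but only complete names
   are accepted (no prefix matching).  On failure, list the valid choices
   on stderr and return RESYNTAX_UNKNOWN. */
static enum resyntax
resyntax_lookup (const char *name)
{
  size_t i;

  for (i = 0; i < RESYNTAX_COUNT; i++)
    if (strcasecmp (name, resyntax_names[i]) == 0)
      return (enum resyntax) i;

  fprintf (stderr, "m4: ERROR: unknown argument to built-in `changeresyntax';\n"
           "     use one of: ");
  for (i = 0; i < RESYNTAX_COUNT; i++)
    fprintf (stderr, "%s%s", resyntax_names[i],
             i + 1 < RESYNTAX_COUNT ? ", " : ".\n");
  return RESYNTAX_UNKNOWN;
}
```

Keeping the canonical spellings in one table means the documentation, the error message and the accepted names cannot drift apart.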

> Four - what should the default be?  Do we stick with EMACS syntax, for 1.4.x 
> compatibility, or do we go for broke and make the default POSIX_EXTENDED?  
> Whatever we choose, we should probably also have a command-line option to set 
> the default.

Definitely EMACS for backwards compatibility, but a command line option
to start the interpreter with a different syntax is an excellent idea.

> Five - it looks like you already have a patch started.

Not yet, though I'll work one up over the weekend :-)

> Don't forget to add the 
> current resyntax to frozen files, since it should be saved across loads.  And 
> how would this interact if the state in the frozen file and the state
> requested by the command line differ on reload?

Most command line arguments are processed before the frozen state file
is loaded, but we do make an exception for module loading and symbol
tracing.  I'm inclined to defer actioning the command line setting until
after the frozen state has been reloaded... otherwise there would be no
point in passing the option in the first place.
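The startup order described here (the built-in EMACS default, then any syntax saved in the frozen file, then a command-line option applied only after the reload) reduces to a simple precedence rule. A minimal sketch, assuming a hypothetical `startup_resyntax` helper that is passed NULL for anything the user did not supply:

```c
#include <stddef.h>

/* Hypothetical: choose the regex syntax in effect once startup finishes.
   Later sources override earlier ones, mirroring the order in which the
   interpreter would process them. */
static const char *
startup_resyntax (const char *frozen_syntax, const char *cmdline_syntax)
{
  const char *syntax = "EMACS";       /* 1. default, for 1.4.x compatibility */

  if (frozen_syntax != NULL)          /* 2. state restored from frozen file */
    syntax = frozen_syntax;
  if (cmdline_syntax != NULL)         /* 3. command line, applied last */
    syntax = cmdline_syntax;
  return syntax;
}
```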

> Six - is it also worth adding an optional parameter to the existing regex 
> builtins?  I'm thinking along the lines of:
> patsubst(string, regexp, replacement, opt syntax)
> 
> That optional syntax parameter could also serve as the place to request flags
> like case-insensitive or global vs. first match only (kind of like perl's
> s///ig).  Then you would only need four regex primitives (changeresyntax,
> patsubst, regexp, and renamesyms), because the optional syntax parameter
> could double as the place to request case-insensitivity.  For example,
> autoconf could then do something like:
> 
> define(`m4_bpatsubst', `m4_builtin(`patsubst', `$1', `$2', `$3', `EMACS')')
> define(`m4_patsubst', `m4_builtin(`patsubst', `$1', `$2', `$3', `POSIX_EXTENDED')')
> define(`m4_ipatsubst', `m4_builtin(`patsubst', `$1', `$2', `$3', `POSIX_EXTENDED,insensitive')')

Yes, that's nice.  In fact, I think processing each argument after the
3rd as an option would allow us to support several of the perl/sed like
modifiers -- but I'll save that for a later patch.
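A rough sketch of the optional syntax argument, using the comma-separated form from Eric's autoconf example (`POSIX_EXTENDED,insensitive`). Portable POSIX `regcomp` can only express basic vs. extended syntax plus `REG_ICASE`, so this version recognises just POSIX_BASIC, POSIX_EXTENDED and insensitive; a GNU regex version would instead OR `RE_ICASE` into the selected `RE_SYNTAX_*` bits. `parse_regex_options`, `word_is` and `regex_match_opts` are hypothetical names.

```c
#include <regex.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* True if the LEN bytes at P spell WORD, ignoring case. */
static int
word_is (const char *p, size_t len, const char *word)
{
  return strlen (word) == len && strncasecmp (p, word, len) == 0;
}

/* Parse a comma-separated option string such as
   "POSIX_EXTENDED,insensitive" into regcomp flags.
   Returns 0 and sets *CFLAGS on success, -1 on an unknown word. */
static int
parse_regex_options (const char *opts, int *cflags)
{
  int flags = 0;                        /* default: POSIX basic syntax */
  const char *p = opts;

  while (*p != '\0')
    {
      size_t len = strcspn (p, ",");    /* length of the next word */

      if (word_is (p, len, "POSIX_EXTENDED"))
        flags |= REG_EXTENDED;
      else if (word_is (p, len, "POSIX_BASIC"))
        flags &= ~REG_EXTENDED;
      else if (word_is (p, len, "insensitive"))
        flags |= REG_ICASE;
      else
        return -1;
      p += len;
      if (*p == ',')
        p++;                            /* skip the separator */
    }
  *cflags = flags;
  return 0;
}

/* Return 1 if PATTERN matches somewhere in SUBJECT under OPTS, 0 if it
   does not, and -1 if OPTS or PATTERN is invalid. */
static int
regex_match_opts (const char *pattern, const char *subject, const char *opts)
{
  regex_t re;
  int cflags, rc;

  if (parse_regex_options (opts, &cflags) != 0
      || regcomp (&re, pattern, cflags | REG_NOSUB) != 0)
    return -1;
  rc = regexec (&re, subject, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}
```

With this, `a|b` matches "xb" under POSIX_EXTENDED but not under POSIX_BASIC, where `|` is an ordinary character. Gary's variant, one option per argument after the third, would loop over the extra arguments instead of splitting a single string on commas.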

> It still makes sense to provide changeresyntax, even if you add the optional
> parameter to the other three builtins, so that you don't always have to
> request which syntax.

ACK.

Cheers,
        Gary.
-- 
Gary V. Vaughan      ())_.  address@hidden,gnu.org}
Research Scientist   ( '/   http://blog.azazil.net
GNU Hacker           / )=   http://trac.azazil.net/projects/libtool
Technical Author   `(_~)_   http://sources.redhat.com/autobook
