Re: An m4 puzzlement.

From: Eric Blake
Subject: Re: An m4 puzzlement.
Date: Wed, 14 Dec 2011 08:51:13 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:8.0) Gecko/20111115 Thunderbird/8.0

On 12/14/2011 03:43 AM, R. Clayton wrote:
>   $ cat t
>   m4_define(_itlvar,
>     `m4_ifelse(m4_regexp($1, ^[[:alpha:]]*$), -1, $1, itl($1))')

Underquoted, although it doesn't affect the particular issue you raised.
It's _always_ a good idea to quote the first argument to m4_define, so
that if the name is already a macro, you are still defining the intended
macro name rather than a new macro named with the expansion of the
argument.  It's also good to quote intermediate arguments (one level of
quoting for every level of macro calls).  That is, I would have used:

  m4_define(`_itlvar',
    `m4_ifelse(m4_regexp(`$1', `^[[:alpha:]]*$'),
      `-1', `$1', `itl(`$1')')')
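
To illustrate why quoting the macro name matters, here is a minimal sketch. The names color and red are made up for the example, and it assumes m4 is run with -P:

```m4
m4_dnl Run with: m4 -P
m4_define(`color', `red')m4_dnl
m4_dnl Unquoted name: color expands to red while the arguments are being
m4_dnl collected, so this call defines a macro named red, not color.
m4_define(color, `blue')m4_dnl
m4_ifdef(`red', `accidentally defined', `not defined')
m4_dnl Quoted name: this redefines color itself, as intended.
m4_define(`color', `green')m4_dnl
m4_defn(`color')
```

With the unquoted call in place, I would expect the m4_ifdef line to report that red was accidentally defined, while the quoted call leaves color holding green.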

>   m4_define(subsen, `_itlvar($1) \(\sqsubseteq\) _itlvar($2)')

Again, underquoted.  I would have used:

  m4_define(`subsen', `_itlvar(`$1') \(\sqsubseteq\) _itlvar(`$2')')

>   subsen(a, b)
>   itl(a) \(\sqsubseteq\) b

Ah, the real problem here can be seen by using: m4 -P -deaq -tm4_regexp

m4trace: -2- m4_regexp(`a', `^[[:alpha:]]*$') -> `0'
m4trace: -2- m4_regexp(`b', `^[[:alpha:]]*$') -> `-1'

In other words, the real problem is that m4 (unfortunately) does not
understand [[:alpha:]] yet.  It instead treats your regex as exactly
one byte from the set [:[ahlp], followed by any number (including 0)
of literal ] bytes; 'a' matches that regex, but 'b' does not.
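
As a rough way to check that reading of the regex, you could probe it with a few more inputs. This is only a sketch, assuming m4 1.4.x run with -P; the last two inputs are ones I made up, and only the first two results are confirmed by the trace above:

```m4
m4_dnl Run with: m4 -P
m4_dnl Each line prints the match offset, or -1 when there is no match.
m4_regexp(`a', `^[[:alpha:]]*$')
m4_regexp(`b', `^[[:alpha:]]*$')
m4_regexp(`a]]', `^[[:alpha:]]*$')
m4_regexp(`ab', `^[[:alpha:]]*$')
```

If the parse described above holds, the third call should match (one byte from the set, then two literal ] bytes), and the fourth should fail, because 'b' is neither a ] nor the end of the string.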

Fixing this would be a matter of teaching m4 to use the RE_CHAR_CLASSES
flag when compiling regular expressions; unfortunately, doing this risks
breaking backward compatibility, unless it is done carefully.  There is
code in place for the eventual m4 2.0 that allows selecting different
regex flavors (m4 1.4.x defaults to the emacs flavor, which lacks
RE_CHAR_CLASSES, but the goal is to allow the posix flavor, which
includes RE_CHAR_CLASSES).  Until then, unfortunately, the best I can
suggest is to instead write your regexp as `^[a-zA-Z]*$'; this has no
loss in generality for m4 1.4.x since it is hard-coded to the C locale
(teaching m4 to honor the user's locale is also on the to-do list).
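
Putting the workaround together with the quoting advice, a sketch of the whole input might look like this. It assumes m4 -P as in your example, and that itl is not itself defined as a macro:

```m4
m4_dnl Run with: m4 -P
m4_dnl Wrap the argument in itl() only when it is purely alphabetic.
m4_define(`_itlvar',
  `m4_ifelse(m4_regexp(`$1', `^[a-zA-Z]*$'),
    `-1', `$1', `itl(`$1')')')m4_dnl
m4_define(`subsen', `_itlvar(`$1') \(\sqsubseteq\) _itlvar(`$2')')m4_dnl
subsen(a, b)
```

With the explicit range, both arguments should now match, so I would expect both a and b to come out wrapped in itl().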

Eric Blake   address@hidden    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
