m4-discuss

m4 regex usage [was: Multi-Line Definitions]


From: Eric Blake
Subject: m4 regex usage [was: Multi-Line Definitions]
Date: Sat, 29 Sep 2007 18:01:20 -0600
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070728 Thunderbird/2.0.0.6 Mnenhy/0.7.5.666

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

[adding m4-patches; this branch of the thread can drop other lists]

According to Eric Blake on 9/29/2007 1:31 PM:
> Here's something a bit more telling.  With the attached patch, and in the
> coreutils directory,
> 
> $ M4_TRACE_FILE=~/m4.trace M4=~/m4/src/m4 autoconf

I tweaked my tracer patch a bit to distinguish between patsubst and regexp.

$ sort <m4.trace | uniq -c |sort -n -k1,1 |tail -n 15
...
   1214 p:
   1596
...

So half of the empty lines in my trace actually did a multi-line regex.
But 1214 of them did a patsubst(string, [], []), and m4 wasted time
compiling the empty regex every one of those times!  I am applying the
patch below to m4 so that autoconf < 2.62 sees some benefit when run with
m4 > 1.4.10 (watch for a followup to autoconf that avoids the empty regex
to begin with).
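
To illustrate why the patsubst(string, [], []) case can skip the regex engine entirely, here is a minimal standalone sketch. It is not m4's code: `delete_matches` and `compile_count` are hypothetical names, it uses POSIX `regcomp` with `REG_EXTENDED` rather than the GNU `re_compile_pattern` interface and syntax that m4 uses, and it only handles the "replace with nothing" case. The point is just that an empty regex can only match empty strings, so deleting those matches returns the input unchanged and no compilation is needed.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counter, present only so the effect of the shortcut is
   observable.  */
static int compile_count = 0;

/* Hypothetical helper (not m4's m4_patsubst): return a malloc'd copy of
   VICTIM with every match of REGEXP deleted, or NULL on error.  */
static char *
delete_matches (const char *victim, const char *regexp)
{
  regex_t re;
  regmatch_t m;
  size_t len, off = 0, out_len = 0;
  char *out;

  assert (victim && regexp);
  len = strlen (victim);
  out = malloc (len + 1);
  if (!out)
    return NULL;

  if (!*regexp)
    {
      /* Fast path: the empty regex matches only empty strings, so
         deleting its matches leaves VICTIM untouched.  */
      memcpy (out, victim, len + 1);
      return out;
    }

  compile_count++;
  if (regcomp (&re, regexp, REG_EXTENDED) != 0)
    {
      free (out);
      return NULL;
    }

  while (off < len
         && regexec (&re, victim + off, 1, &m, off ? REG_NOTBOL : 0) == 0)
    {
      size_t so = (size_t) m.rm_so;
      size_t eo = (size_t) m.rm_eo;

      /* Copy the text that precedes the match.  */
      memcpy (out + out_len, victim + off, so);
      out_len += so;
      if (eo == so)
        {
          /* Empty match: keep one character (if any) and step past it,
             so the loop always makes progress.  */
          if (off + so < len)
            out[out_len++] = victim[off + so];
          off += so + 1;
        }
      else
        off += eo;
    }

  /* Copy whatever follows the last match.  */
  if (off < len)
    {
      memcpy (out + out_len, victim + off, len - off);
      out_len += len - off;
    }
  out[out_len] = '\0';
  regfree (&re);
  return out;
}
```

Calling `delete_matches ("a1b22c", "")` takes the fast path and leaves `compile_count` alone, while a non-empty pattern such as `"[0-9]+"` goes through `regcomp` as usual.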

2007-09-29  Eric Blake  <address@hidden>

        Optimize for Autoconf usage pattern.
        * src/builtin.c (m4_regexp, m4_patsubst): Handle empty regex
        faster.

- --
Don't work too hard, make some time for fun as well!

Eric Blake             address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFG/udP84KuGfSFAYARAmHzAJwJO8+zwXssS/qlIEfotONpp/epRgCfQgQ3
Rjq/NWvO4ha9S+o3gpv9gdg=
=BSOA
-----END PGP SIGNATURE-----
From aa46ced67010190918295b965f5e2879dcd9a30c Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Sat, 29 Sep 2007 17:48:29 -0600
Subject: [PATCH] Optimize for Autoconf usage pattern.

* src/builtin.c (m4_regexp, m4_patsubst): Handle empty regex
faster.

Signed-off-by: Eric Blake <address@hidden>
---
 ChangeLog     |    6 ++++++
 src/builtin.c |   36 +++++++++++++++++++++++++++---------
 2 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 0cea5b5..f29b557 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,9 @@
+2007-09-29  Eric Blake  <address@hidden>
+
+       Optimize for Autoconf usage pattern.
+       * src/builtin.c (m4_regexp, m4_patsubst): Handle empty regex
+       faster.
+
 2007-09-24  Eric Blake  <address@hidden>
 
        Create .gitignore alongside .cvsignore.
diff --git a/src/builtin.c b/src/builtin.c
index dee2276..65f4585 100644
--- a/src/builtin.c
+++ b/src/builtin.c
@@ -1968,8 +1968,19 @@ m4_regexp (struct obstack *obs, int argc, token_data **argv)
       return;
     }
 
-  victim = TOKEN_DATA_TEXT (argv[1]);
-  regexp = TOKEN_DATA_TEXT (argv[2]);
+  victim = ARG (1);
+  regexp = ARG (2);
+  repl = ARG (3);
+
+  if (!*regexp)
+    {
+      /* The empty regex matches everything!  */
+      if (argc == 3)
+       shipout_int (obs, 0);
+      else
+       obstack_grow (obs, repl, strlen (repl));
+      return;
+    }
 
   init_pattern_buffer (&buf, &regs);
   msg = re_compile_pattern (regexp, strlen (regexp), &buf);
@@ -1993,10 +2004,7 @@ m4_regexp (struct obstack *obs, int argc, token_data **argv)
   else if (argc == 3)
     shipout_int (obs, startpos);
   else if (startpos >= 0)
-    {
-      repl = TOKEN_DATA_TEXT (argv[3]);
-      substitute (obs, victim, repl, &regs);
-    }
+    substitute (obs, victim, repl, &regs);
 
   free_pattern_buffer (&buf, &regs);
 }
@@ -2013,6 +2021,7 @@ m4_patsubst (struct obstack *obs, int argc, token_data **argv)
 {
   const char *victim;          /* first argument */
   const char *regexp;          /* regular expression */
+  const char *repl;
 
   struct re_pattern_buffer buf;        /* compiled regular expression */
   struct re_registers regs;    /* for subexpression matches */
@@ -2029,7 +2038,17 @@ m4_patsubst (struct obstack *obs, int argc, token_data **argv)
       return;
     }
 
-  regexp = TOKEN_DATA_TEXT (argv[2]);
+  victim = ARG (1);
+  regexp = ARG (2);
+  repl = ARG (3);
+
+  /* The empty regex matches everywhere, but if there is no
+     replacement, we need not waste time with it.  */
+  if (!*regexp && !*repl)
+    {
+      obstack_grow (obs, victim, strlen (victim));
+      return;
+    }
 
   init_pattern_buffer (&buf, &regs);
   msg = re_compile_pattern (regexp, strlen (regexp), &buf);
@@ -2042,7 +2061,6 @@ m4_patsubst (struct obstack *obs, int argc, token_data **argv)
       return;
     }
 
-  victim = TOKEN_DATA_TEXT (argv[1]);
   length = strlen (victim);
 
   offset = 0;
@@ -2073,7 +2091,7 @@ m4_patsubst (struct obstack *obs, int argc, token_data **argv)
 
       /* Handle the part of the string that was covered by the match.  */
 
-      substitute (obs, victim, ARG (3), &regs);
+      substitute (obs, victim, repl, &regs);
 
       /* Update the offset to the end of the match.  If the regexp
         matched a null string, advance offset one more, to avoid
-- 
1.5.3.2
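
The m4_regexp half of the patch rests on a similar observation: with only two arguments the builtin reports where the first match starts, and an empty regex always matches at offset 0. A rough sketch of that idea follows. The name `first_match_offset` and its return conventions are assumptions made for illustration, and it again uses POSIX `regcomp` with `REG_EXTENDED` instead of the GNU regex calls in src/builtin.c.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not m4's m4_regexp): return the byte offset of
   the first match of REGEXP in VICTIM, -1 if nothing matches, or -2 if
   the pattern does not compile.  */
static long
first_match_offset (const char *victim, const char *regexp)
{
  regex_t re;
  regmatch_t m;
  long result;

  assert (victim && regexp);

  /* Fast path: the empty regex matches at the very start of any
     string, so there is nothing to compile or search.  */
  if (!*regexp)
    return 0;

  if (regcomp (&re, regexp, REG_EXTENDED) != 0)
    return -2;
  result = regexec (&re, victim, 1, &m, 0) == 0 ? (long) m.rm_so : -1;
  regfree (&re);
  return result;
}
```

The early return sits before any regex setup, which is the same ordering the patch uses by placing its check ahead of init_pattern_buffer.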

