bug-grep
[PATCH 17/17] grep: match multibyte charsets line-by-line when using -i


From: Paolo Bonzini
Subject: [PATCH 17/17] grep: match multibyte charsets line-by-line when using -i
Date: Fri, 12 Mar 2010 18:49:18 +0100

The turtle combination -i + MB_CUR_MAX>1 requires case conversion ahead
of time.  Avoid doing this repeatedly when many matches succeed.  Together
with the previous changes, this fixes https://savannah.gnu.org/bugs/?29117
and https://savannah.gnu.org/bugs/?14472.

* src/grep.c (do_execute): New.
(grepbuf): Use it.
---
 src/grep.c |   40 ++++++++++++++++++++++++++++++++++++++--
 1 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/src/grep.c b/src/grep.c
index f1d341a..1f73c70 100644
--- a/src/grep.c
+++ b/src/grep.c
@@ -1025,6 +1025,42 @@ prtext (char const *beg, char const *lim, int *nlinesp)
   used = 1;
 }
 
+EXECUTE_RET do_execute EXECUTE_ARGS
+{
+  const char *line_buf, *line_end, *line_next;
+  size_t result = (size_t) -1;
+
+  /* -i is a real turtle with multibyte character sets, so match
+     line-by-line.
+
+     FIXME: this is just an ugly workaround, and it doesn't really
+     belong here.  Also, PCRE is always using this same per-line
+     matching algorithm.  Either we fix -i, or we should refactor
+     this code---for example, we could add another function pointer
+     to struct matcher to split the buffer passed to execute.  It would
+     perform the memchr if line-by-line matching is necessary, or just
+     return buf + size otherwise.  */
+  if (MB_CUR_MAX == 1 || !match_icase)
+    return execute(buf, size, match_size, start_ptr);
+
+  for (line_next = buf; result == (size_t)-1 && line_next < buf + size; )
+    {
+      line_buf = line_next;
+      line_end = memchr (line_buf, eolbyte, (buf + size) - line_buf);
+      if (line_end == NULL)
+        line_next = line_end = buf + size;
+      else
+        line_next = line_end + 1;
+
+      if (start_ptr && start_ptr >= line_end)
+        continue;
+
+      result = execute (line_buf, line_next - line_buf, match_size, start_ptr);
+    }
+
+  return result == (size_t)-1 ? result : (line_buf - buf) + result;
+}
+
 /* Scan the specified portion of the buffer, matching lines (or
    between matching lines if OUT_INVERT is true).  Return a count of
    lines printed. */
@@ -1038,8 +1074,8 @@ grepbuf (char const *beg, char const *lim)
 
   nlines = 0;
   p = beg;
-  while ((match_offset = execute(p, lim - p, &match_size,
-                                NULL)) != (size_t) -1)
+  while ((match_offset = do_execute(p, lim - p, &match_size,
+                                   NULL)) != (size_t) -1)
     {
       char const *b = p + match_offset;
       char const *endp = b + match_size;
-- 
1.6.6
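
For readers who want to try the line-splitting loop outside of grep, here
is a minimal standalone sketch.  It is not part of the patch.  toy_execute
and per_line_execute are hypothetical names: toy_execute is a plain
substring search standing in for grep's execute, and the sketch assumes a
newline as eolbyte and leaves out the start_ptr handling.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for grep's execute(): return the offset of the
   first occurrence of NEEDLE in BUF[0..SIZE), or (size_t) -1 if there is
   none.  On success, store the match length in *MATCH_SIZE.  */
static size_t
toy_execute (const char *buf, size_t size, const char *needle,
             size_t *match_size)
{
  size_t n = strlen (needle);
  size_t i;

  for (i = 0; n != 0 && i + n <= size; i++)
    if (memcmp (buf + i, needle, n) == 0)
      {
        *match_size = n;
        return i;
      }
  return (size_t) -1;
}

/* Hand the buffer to toy_execute one line at a time, stopping at the
   first line that matches, then translate the per-line offset back into
   an offset from the start of BUF.  */
static size_t
per_line_execute (const char *buf, size_t size, const char *needle,
                  size_t *match_size)
{
  const char *line_buf = buf, *line_end, *line_next;
  size_t result = (size_t) -1;

  for (line_next = buf; result == (size_t) -1 && line_next < buf + size; )
    {
      line_buf = line_next;
      line_end = memchr (line_buf, '\n', (buf + size) - line_buf);
      if (line_end == NULL)
        line_next = line_end = buf + size;   /* last line, no newline */
      else
        line_next = line_end + 1;            /* include the newline */

      result = toy_execute (line_buf, line_next - line_buf, needle,
                            match_size);
    }

  return result == (size_t) -1 ? result : (size_t) (line_buf - buf) + result;
}
```

With the 17-byte buffer "alpha\nbeta\ngamma\n" and the needle "gam",
per_line_execute returns 11 and sets the match size to 3.  Because each
call sees a single line, a needle that spans a newline can never match in
this sketch.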




