grep branch, master, updated. v2.22-20-gd1160ec
From: Paul Eggert
Subject: grep branch, master, updated. v2.22-20-gd1160ec
Date: Fri, 08 Jan 2016 05:30:34 +0000
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "grep".
The branch, master has been updated
via d1160ec6d239b2e0f20c2fb3395e3b70963bf916 (commit)
from 5cb49d2f375f0606ac9d916af6024d4b92ba0786 (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
http://git.savannah.gnu.org/cgit/grep.git/commit/?id=d1160ec6d239b2e0f20c2fb3395e3b70963bf916
commit d1160ec6d239b2e0f20c2fb3395e3b70963bf916
Author: Paul Eggert <address@hidden>
Date: Thu Jan 7 21:28:23 2016 -0800
grep: improve unibyte -P performance
This is a follow-on to the recent changes prompted by Bug#20526.
In <http://bugs.gnu.org/bug=20526#86> Norihiro Tanaka pointed out
that grep mistakenly assumed that unibyte locales cannot have
encoding errors. Here, the mistake hurt performance significantly.
On Fedora 23 x86-64 in the C locale, this patch improved grep's
performance by a factor of 7 when run as "grep -P 'z.*a'" on the
output of "yes $(printf '\200\n') | head -n 1000000000".
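To make the cost concrete: in a UTF-8 locale, deciding whether a buffer has encoding errors means walking every byte of it. The C sketch below is a minimal UTF-8-only version of such a scan, assuming strict UTF-8 rules (no overlong forms, surrogates, or code points above U+10FFFF). The name utf8_has_encoding_errors is hypothetical; grep's own buf_has_encoding_errors is not reproduced here. A lone \200 byte, as in the benchmark input, is reported as an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not grep's code): return true if BUF[0..SIZE)
   is not valid UTF-8.  Rejects stray continuation bytes, truncated
   sequences, overlong forms, surrogates and code points > U+10FFFF.  */
static bool
utf8_has_encoding_errors (char const *buf, size_t size)
{
  assert (buf != NULL || size == 0);
  unsigned char const *p = (unsigned char const *) buf;
  unsigned char const *end = p + size;

  while (p < end)
    {
      unsigned char c = *p;
      size_t len;
      unsigned long cp;

      if (c < 0x80)
        {
          p++;   /* ASCII byte: always a valid character.  */
          continue;
        }
      if (0xC2 <= c && c <= 0xDF)
        {
          len = 2;
          cp = c & 0x1F;
        }
      else if (0xE0 <= c && c <= 0xEF)
        {
          len = 3;
          cp = c & 0x0F;
        }
      else if (0xF0 <= c && c <= 0xF4)
        {
          len = 4;
          cp = c & 0x07;
        }
      else
        return true;   /* 0x80..0xC1 and 0xF5..0xFF never start a character.  */

      if ((size_t) (end - p) < len)
        return true;   /* Sequence truncated by the end of the buffer.  */

      for (size_t i = 1; i < len; i++)
        {
          if ((p[i] & 0xC0) != 0x80)
            return true;   /* Expected a continuation byte.  */
          cp = (cp << 6) | (p[i] & 0x3F);
        }

      if ((len == 3 && cp < 0x800)
          || (len == 4 && (cp < 0x10000 || 0x10FFFF < cp))
          || (0xD800 <= cp && cp <= 0xDFFF))
        return true;   /* Overlong form, out of range, or surrogate.  */

      p += len;
    }
  return false;
}
```

For example, utf8_has_encoding_errors ("caf\xc3\xa9\n", 6) returns false, while utf8_has_encoding_errors ("\200\n", 2) returns true. In a unibyte locale the patch skips calling buf_has_encoding_errors altogether, since PCRE without PCRE_UTF8 treats each byte as one character.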
* src/pcresearch.c (multibyte_locale) [HAVE_LIBPCRE]: New static var.
(Pcompile): Set it.
(Pexecute): Use it to avoid the need to call
buf_has_encoding_errors in unibyte locales.
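The Pcompile half of the change can be read as a small classification step. The sketch below is a hypothetical restatement, not code from grep: classify_locale and enum pcre_mode are invented names, and the MB_CUR_MAX value and the UTF-8 test are passed in as parameters (standing in for MB_CUR_MAX and using_utf8 ()) so the logic can be checked without switching locales.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical modes corresponding to the three outcomes in Pcompile.  */
enum pcre_mode
{
  MODE_UNIBYTE,     /* multibyte_locale stays false; no PCRE_UTF8.  */
  MODE_UTF8,        /* multibyte_locale = true; flags |= PCRE_UTF8.  */
  MODE_UNSUPPORTED  /* "-P supports only unibyte and UTF-8 locales".  */
};

/* Classify a locale the way the patched Pcompile does: only a locale
   whose MB_CUR_MAX exceeds 1 is multibyte, and then it must be UTF-8.  */
static enum pcre_mode
classify_locale (size_t mb_cur_max, bool is_utf8)
{
  assert (1 <= mb_cur_max);
  if (1 < mb_cur_max)
    return is_utf8 ? MODE_UTF8 : MODE_UNSUPPORTED;
  return MODE_UNIBYTE;
}
```

A real caller would call setlocale (LC_ALL, "") first and pass MB_CUR_MAX from <stdlib.h>. Only MODE_UTF8 corresponds to setting multibyte_locale, which is what later tells Pexecute to check for encoding errors.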
diff --git a/src/pcresearch.c b/src/pcresearch.c
index c0b8678..1fae94d 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -84,6 +84,8 @@ jit_exec (char const *subject, int search_bytes, int search_offset,
/* Table, indexed by ! (flag & PCRE_NOTBOL), of whether the empty
string matches when that flag is used. */
static int empty_match[2];
+
+static bool multibyte_locale;
#endif
void
@@ -104,10 +106,14 @@ Pcompile (char const *pattern, size_t size)
char const *p;
char const *pnul;
- if (using_utf8 ())
- flags |= PCRE_UTF8;
- else if (MB_CUR_MAX != 1)
- error (EXIT_TROUBLE, 0, _("-P supports only unibyte and UTF-8 locales"));
+ if (1 < MB_CUR_MAX)
+ {
+ if (! using_utf8 ())
+ error (EXIT_TROUBLE, 0,
+ _("-P supports only unibyte and UTF-8 locales"));
+ multibyte_locale = true;
+ flags |= PCRE_UTF8;
+ }
/* FIXME: Remove these restrictions. */
if (memchr (pattern, '\n', size))
@@ -194,12 +200,16 @@ Pexecute (char *buf, size_t size, size_t *match_size,
error. */
char const *subject = buf;
- /* If the input is free of encoding errors a multiline search is
+ /* If the input is unibyte or is free of encoding errors a multiline search is
typically more efficient. Otherwise, a single-line search is
typically faster, so that pcre_exec doesn't waste time validating
the entire input buffer. */
- bool multiline = ! buf_has_encoding_errors (buf, size - 1);
- buf[size - 1] = eolbyte;
+ bool multiline = true;
+ if (multibyte_locale)
+ {
+ multiline = ! buf_has_encoding_errors (buf, size - 1);
+ buf[size - 1] = eolbyte;
+ }
for (; p < buf + size; p = line_start = line_end + 1)
{
-----------------------------------------------------------------------
Summary of changes:
src/pcresearch.c | 24 +++++++++++++++++-------
1 files changed, 17 insertions(+), 7 deletions(-)
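The Pexecute half of the change amounts to skipping the encoding scan entirely when the locale is unibyte. The following sketch restates that decision with hypothetical names (use_multiline_search, stub_has_encoding_errors, checks_run); it is not grep's code. The stub treats every byte >= 0x80 as an error, which is only a crude stand-in for a real check, and the counter shows whether the scan ran at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Number of times the (hypothetical) encoding check has run.  */
static int checks_run;

/* Crude stand-in for an encoding-error check: any byte >= 0x80 counts
   as an error.  Real UTF-8 validation is more involved.  */
static bool
stub_has_encoding_errors (char const *buf, size_t size)
{
  checks_run++;
  for (size_t i = 0; i < size; i++)
    if ((unsigned char) buf[i] >= 0x80)
      return true;
  return false;
}

/* Mirror of the patched decision: unibyte input always gets a
   multiline search, and the buffer is scanned for encoding errors
   only in a multibyte locale.  */
static bool
use_multiline_search (bool multibyte_locale, char const *buf, size_t size,
                      bool (*has_errors) (char const *, size_t))
{
  assert (has_errors != NULL);
  if (! multibyte_locale)
    return true;   /* No scan at all in a unibyte locale.  */
  return ! has_errors (buf, size);
}
```

With multibyte_locale false, use_multiline_search returns true for the benchmark's \200 lines without calling the checker, so unibyte input always gets the multiline search that the code comment describes as typically more efficient. The sketch does not model the buf[size - 1] = eolbyte store, which the patch also moves into the multibyte branch.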
hooks/post-receive
--
grep