[PATCH 1/2] pcresearch: set UTF-8 flag correctly for UTF-8 locales
From: Paolo Bonzini
Subject: [PATCH 1/2] pcresearch: set UTF-8 flag correctly for UTF-8 locales
Date: Wed, 3 Oct 2012 11:20:29 +0200
From: Petr Pisar <address@hidden>
Otherwise, Unicode properties (\p{XXX}) do not work with characters
outside the 7-bit ASCII character set.
* src/pcresearch.c (Pcompile): Look for UTF-8 locales and set PCRE_UTF8
if one is found.
---
NEWS | 6 ++++++
src/pcresearch.c | 8 ++++++++
 2 files changed, 14 insertions(+)
diff --git a/NEWS b/NEWS
index 9309f62..bc669b9 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,12 @@ GNU grep NEWS -*- outline -*-
* Noteworthy changes in release ?.? (????-??-??) [?]
+** Bug fixes
+
+ PCRE supports multibyte matching only in UTF-8 mode, but grep -P did
+ not enable that mode even in UTF-8 locales.  This could cause multibyte
+ characters to fail to match some regular expressions, especially those
+ containing the '.' or '\p' metacharacters.
* Noteworthy changes in release 2.14 (2012-08-20) [stable]
diff --git a/src/pcresearch.c b/src/pcresearch.c
index 2994e65..3539b58 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -25,6 +25,9 @@
#elif HAVE_PCRE_PCRE_H
# include <pcre/pcre.h>
#endif
+#if HAVE_LANGINFO_CODESET
+# include <langinfo.h>
+#endif
#if HAVE_LIBPCRE
/* Compiled internal form of a Perl regular expression. */
@@ -51,6 +54,11 @@ Pcompile (char const *pattern, size_t size)
char const *p;
char const *pnul;
+#if HAVE_LANGINFO_CODESET
+ if (!strcmp (nl_langinfo (CODESET), "UTF-8"))
+ flags |= PCRE_UTF8;
+#endif
+
/* FIXME: Remove these restrictions. */
if (memchr(pattern, '\n', size))
error (EXIT_TROUBLE, 0, _("the -P option only supports a single pattern"));
--
1.7.12.1