[PATCH] regex: Fix fastmap for multibyte character ranges.
From: Paolo Bonzini
Subject: [PATCH] regex: Fix fastmap for multibyte character ranges.
Date: Wed, 25 Nov 2009 11:46:32 +0100
This is another bug in the fastmap computation. I overlooked it when
fixing the fastmap mess because it usually does not show up with _LIBC.
However, the bug is present in that case too.
The bug is that whenever a regex begins with a range, matching must be
attempted at every possible multibyte character, so every multibyte lead
byte has to be in the fastmap. _LIBC masks the problem because there is
almost always a collation symbol for each possible multibyte lead byte,
so all the lead bytes are generally already in the fastmap.
A simple reproducer is the following sed script:
$ echo 'абвгдеёжзийклмнопрстуфхцчшщъыьэюя' | ./bad-sed -e 's/[а-я]/!/g'
абвгдеёжзийклмнопрстуфхцчшщъыьэюя
$ echo 'абвгдеёжзийклмнопрстуфхцчшщъыьэюя' | ./good-sed -e 's/[а-я]/!/g'
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
2009-11-25 Paolo Bonzini <address@hidden>
* lib/regcomp.c (re_compile_fastmap_iter): Add all multibyte lead
characters when a multibyte character range is included.
---
ChangeLog | 6 ++++++
lib/regcomp.c | 2 +-
2 files changed, 7 insertions(+), 1 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index fcdf307..54c5514 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,9 @@
+2009-11-25 Paolo Bonzini <address@hidden>
+
+ regex: Fix fastmap for multibyte character ranges.
+ * lib/regcomp.c (re_compile_fastmap_iter): Add all multibyte lead
+ characters when a multibyte character range is included.
+
2009-11-22 Andy Wingo <address@hidden>
version-etc: work also with AM_INIT_AUTOMAKE's no-define option
diff --git a/lib/regcomp.c b/lib/regcomp.c
index 6472ff6..6aef405 100644
--- a/lib/regcomp.c
+++ b/lib/regcomp.c
@@ -383,7 +383,7 @@ re_compile_fastmap_iter (regex_t *bufp, const re_dfastate_t *init_state,
              applies to multibyte character sets; for single byte character
              sets, the SIMPLE_BRACKET again suffices. */
           if (dfa->mb_cur_max > 1
-              && (cset->nchar_classes || cset->non_match
+              && (cset->nchar_classes || cset->non_match || cset->nranges
 # ifdef _LIBC
                   || cset->nequiv_classes
 # endif /* _LIBC */
--
1.6.5.2
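
For readers without the surrounding source at hand, the patched condition can
be restated as a small standalone predicate. This is only a sketch: struct
charset_info, needs_all_lead_bytes and the libc_build flag are stand-ins
invented here (the flag models the # ifdef _LIBC), and only the fields that
appear in the hunk are represented.

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down stand-in for the bracket-expression data;
   it carries only the fields named in the hunk above.  */
struct charset_info {
    int nchar_classes;   /* number of character classes, e.g. [:alpha:] */
    int nranges;         /* number of ranges, e.g. a-z */
    int nequiv_classes;  /* number of equivalence classes, e.g. [=a=] */
    bool non_match;      /* true for a negated set, e.g. [^...] */
};

/* Decide whether every multibyte lead byte must be added to the fastmap.
   LIBC_BUILD models the "# ifdef _LIBC" part of the condition.  */
bool needs_all_lead_bytes(int mb_cur_max, const struct charset_info *cs,
                          bool libc_build)
{
    /* Single-byte character sets are covered by the simple bracket.  */
    if (mb_cur_max <= 1)
        return false;

    return cs->nchar_classes > 0
           || cs->non_match
           || cs->nranges > 0   /* the test added by this patch */
           || (libc_build && cs->nequiv_classes > 0);
}
```

A bracket expression that contains only a range, as in the reproducer, now
satisfies the predicate in a multibyte locale; before the patch it did not.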