[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[PATCH 0/4] dfa: [à] optimization and character range fix
From: |
Paolo Bonzini |
Subject: |
[PATCH 0/4] dfa: [à] optimization and character range fix |
Date: |
Tue, 7 Jun 2011 13:03:36 +0200 |
Jim requested to find a testcase for my better case folding patch.
While I could find one, it is currently latent because grep disables
processing of MBCSET elements (2c817e2, dfa: fall back to glibc matcher
if a MBCSET is found, 2010-04-29) and the fallback matcher obviously
does not have the bug.
So, the way to show the bug is to optimize MBCSETs a bit and avoid
the fallback. :) This is the often requested change of turning
"[àèì0-9]" into, say "à|è|ì|[0-9]", and is implemented by the last
patch in the series.
The improvement is only in the constant factor, and on a normal machine
the run-time is decent (but very suboptimal!) even without the
optimization. Since timeout doesn't accept fractional seconds, a robust
testcase for the optimization cannot be added, anyway here is an example:
$ yes a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0aà | \
sed 100000q > input
$ time ./grep 'a[à]' input > /dev/null
real 0m0.079s
$ time grep 'a[à]' input > /dev/null
real 0m0.901s
The optimization itself is tested by char-class-multibyte.
Paolo Bonzini (4):
tests: exercise latent bug in character ranges
dfa: correct handling of single-byte character ranges
dfa: refactor to prepare for upcoming optimizations
dfa: optimize wide characters in a bracket expression
src/dfa.c | 163 ++++++++++++++++++++++++++++++++++-------------------
tests/Makefile.am | 1 +
tests/bogus-wctob | 17 ++++++
3 files changed, 123 insertions(+), 58 deletions(-)
create mode 100644 tests/bogus-wctob
--
1.7.4.4
- [PATCH 0/4] dfa: [à] optimization and character range fix,
Paolo Bonzini <=