bug-grep
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[PATCH 0/4] dfa: [à] optimization and character range fix


From: Paolo Bonzini
Subject: [PATCH 0/4] dfa: [à] optimization and character range fix
Date: Tue, 7 Jun 2011 13:03:36 +0200

Jim requested to find a testcase for my better case folding patch.
While I could find one, it is currently latent because grep disables
processing of MBCSET elements (2c817e2, dfa: fall back to glibc matcher
if a MBCSET is found, 2010-04-29) and the fallback matcher obviously
does not have the bug.

So, the way to show the bug is to optimize MBCSETs a bit and avoid
the fallback. :)  This is the often requested change of turning
"[àèì0-9]" into, say "à|è|ì|[0-9]", and is implemented by the last
patch in the series.

The improvement is only in the constant factor, and on a normal machine
the run-time is decent (but very suboptimal!) even without the
optimization.  Since timeout doesn't accept fractional seconds, a robust
testcase for the optimization cannot be added, anyway here is an example:

   $ yes a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0aà | \
     sed 100000q > input
   $ time ./grep 'a[à]'  input > /dev/null 
   real    0m0.079s
   $ time grep 'a[à]'  input > /dev/null 
   real    0m0.901s

The optimization itself is tested by char-class-multibyte.

Paolo Bonzini (4):
  tests: exercise latent bug in character ranges
  dfa: correct handling of single-byte character ranges
  dfa: refactor to prepare for upcoming optimizations
  dfa: optimize wide characters in a bracket expression

 src/dfa.c         |  163 ++++++++++++++++++++++++++++++++++-------------------
 tests/Makefile.am |    1 +
 tests/bogus-wctob |   17 ++++++
 3 files changed, 123 insertions(+), 58 deletions(-)
 create mode 100644 tests/bogus-wctob

-- 
1.7.4.4




reply via email to

[Prev in Thread] Current Thread [Next in Thread]