bug-grep

bug#22103: [PATCH] grep: improve performance for grep -P in UTF-8


From: Norihiro Tanaka
Subject: bug#22103: [PATCH] grep: improve performance for grep -P in UTF-8
Date: Mon, 07 Dec 2015 08:01:23 +0900

Once grep -P finds its first match, the TEXTBIN_UNKNOWN optimization is
no longer used.  As a result, when a match occurs early in the input,
grep -P is very slow in a UTF-8 locale.

  $ time -p grep -P ^1$ <(seq 999999)
  1
  real 14.55
  user 13.77
  sys 1.12

Likewise, grep -Pa does not use the TEXTBIN_UNKNOWN optimization, so it
is also very slow in a UTF-8 locale.

  $ time -p grep -Pa a <(seq 999999)
  real 14.53
  user 13.65
  sys 1.35

This change defers leaving the TEXTBIN_UNKNOWN optimization until
grep -P actually finds a binary character.

It gives a speedup of more than 10x.

  $ time -p src/grep -P ^1$ <(seq 999999)
  1
  real 0.97
  user 0.79
  sys 0.24

  $ time -p src/grep -Pa a <(seq 999999)
  real 0.98
  user 0.23
  sys 0.99
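
As a quick sanity check on the "more than 10x" figure, the ratio of the
real times quoted above can be computed directly.  The sketch below is
only arithmetic on those numbers; the speedup helper is a hypothetical
name used for illustration and is not part of grep or the patch.

```shell
# Speedup = real time before the patch / real time after the patch.
# LC_ALL=C keeps awk's decimal separator a period regardless of locale.
speedup() {
  LC_ALL=C awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.1f\n", before / after }'
}

speedup 14.55 0.97   # grep -P ^1$  -> prints 15.0
speedup 14.53 0.98   # grep -Pa a   -> prints 14.8
```

Both cases come out at roughly 15x on the real times reported here.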

BTW, this change conflicts with the proposal in bug#22028.

Attachment: 0001-grep-improve-performance-for-grep-P-in-UTF-8.patch
Description: Text document

