[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
bug#16966: [PATCH] grep: optimization with the superset of DFA
From: |
Norihiro Tanaka |
Subject: |
bug#16966: [PATCH] grep: optimization with the superset of DFA |
Date: |
Wed, 02 Apr 2014 22:02:01 +0900 |
Paolo Bonzini wrote:
> Does anything change if there are a few million c's?
The superset of `a ANYCHAR b' is 'a CSET STAR b'.
It's DFA states are following.
s0: The position set is none.
s1: The position set is 1:a
s2: The position set is 1:a 2:CSET
s3: The position set is 1:a 2:CSET 3:b (accepted)
1 accccccccccccccccccccccccccccccccccccccc
^ s1
1 accccccccccccccccccccccccccccccccccccccc
^ s2
1 accccccccccccccccccccccccccccccccccccccc
^ s2
......
1,000,000 cccccccccccccccccccccccccccccccccccccccb
^ s2
1,000,000 cccccccccccccccccccccccccccccccccccccccb
^ s3 accepted
Then re-searched with the superset because matched on multi-lines.
1,000,000 cccccccccccccccccccccacccccccccccccccccb
^ s0
1,000,000 cccccccccccccccccccccacccccccccccccccccb
^ s1
1,000,000 cccccccccccccccccccccacccccccccccccccccb
^ s2
1,000,000 cccccccccccccccccccccacccccccccccccccccb
^ s2
1,000,000 cccccccccccccccccccccacccccccccccccccccb
^ s3 accepted
Then re-searched with the original DFA becuase matched on one-line.
The DFA states are following.
s0: The position set is none.
s1: The position set is 1:a
s2: The position set is 1:a 2:ANYCHAR
s3: The position set is 1:a 2:ANYCHAR 3:b (accepted)
1,000,000 cccccccccccccccccccccacccccccccccccccccb
^ s1
1,000,000 cccccccccccccccccccccacccccccccccccccccb
^ s2
1,000,000 cccccccccccccccccccccacccccccccccccccccb
^ s0 rejected
Even if matched multi-line, It's still as fast and memory-required as
matched single-line, because in search-mode STAR doesn't count number
of NFA and DFA states up. So for example, a set of DFA states for
`a CSET STAR b' is correspond with for `a CSET b'. (If my recognition is
right.)
OTOH, If CSET{1,mb_cur_max} is assigned for ANYCHAR, number of DFA
states will be greater than above, and I may be slightly slower
becuase of overheads to build DFA states. Notice that `{m,n}'
expression requires a lot of memory and is slow (See also
http://www.gnu.org/software/grep/manual/grep.html#Reporting-Bugs).
Example, use the superset for following pattern in EUC-JP locale.
It's mb_cur_max == 3.
a.....b
If ANYCHAR is replaced to `CSET STAR', as following.
a CSET STAR CSET STAR CSET STAR CSET STAR CSET STAR CSET STAR b
I draw it below.
/\
/ \ 2:CSET
\ /
\/
1:a ---> 3:b
(The figure drawn previously was wrong.)
It's equal to following.
a CSET STAR b
OTOH, the period is replaced to `CSET CSET CSET CAT OR CSET CSET CAT
CSET CAT OR', below. Itt's complicated, and I don't want to draw it. :)
a CSET CSET CSET CAT OR CSET CSET CAT CSET CAT OR CSET CSET CSET
CAT OR CSET CSET CAT CSET CAT OR CAT CSET CSET CSET CAT OR CSET
CSET CAT CSET CAT OR CAT CSET CSET CSET CAT OR CSET CSET CAT CSET
CAT OR CAT b
Thanks,
Norihiro
- bug#16966: [PATCH] grep: optimization with the superset of DFA, Paolo Bonzini, 2014/04/01
- bug#16966: [PATCH] grep: optimization with the superset of DFA, Norihiro Tanaka, 2014/04/01
- bug#16966: [PATCH] grep: optimization with the superset of DFA, Paolo Bonzini, 2014/04/01
- bug#16966: [PATCH] grep: optimization with the superset of DFA, Norihiro Tanaka, 2014/04/01
- bug#16966: [PATCH] grep: optimization with the superset of DFA, Paolo Bonzini, 2014/04/01
- bug#16966: [PATCH] grep: optimization with the superset of DFA, Norihiro Tanaka, 2014/04/01
- bug#16966: [PATCH] grep: optimization with the superset of DFA, Norihiro Tanaka, 2014/04/01
- bug#16966: [PATCH] grep: optimization with the superset of DFA, Paolo Bonzini, 2014/04/02
- bug#16966: [PATCH] grep: optimization with the superset of DFA,
Norihiro Tanaka <=
- bug#16966: [PATCH] grep: optimization with the superset of DFA, Norihiro Tanaka, 2014/04/02
- bug#16966: [PATCH] grep: optimization with the superset of DFA, Norihiro Tanaka, 2014/04/03
- bug#16966: [PATCH] grep: optimization with the superset of DFA, Norihiro Tanaka, 2014/04/01