bug#16966: [PATCH] grep: optimization with the superset of DFA

bug-grep

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#16966: [PATCH] grep: optimization with the superset of DFA

From:	Norihiro Tanaka
Subject:	bug#16966: [PATCH] grep: optimization with the superset of DFA
Date:	Wed, 02 Apr 2014 22:02:01 +0900

Paolo Bonzini wrote:
> Does anything change if there are a few million c's?

The superset of `a ANYCHAR b' is 'a CSET STAR b'.
It's DFA states are following.

s0: The position set is none. 
s1: The position set is 1:a 
s2: The position set is 1:a 2:CSET
s3: The position set is 1:a 2:CSET 3:b (accepted)

        1  accccccccccccccccccccccccccccccccccccccc
           ^ s1

        1  accccccccccccccccccccccccccccccccccccccc
            ^ s2

        1  accccccccccccccccccccccccccccccccccccccc
             ^ s2

              ......

1,000,000  cccccccccccccccccccccccccccccccccccccccb
                                                 ^ s2

1,000,000  cccccccccccccccccccccccccccccccccccccccb
                                                  ^ s3 accepted

Then re-searched with the superset because matched on multi-lines.

1,000,000  cccccccccccccccccccccacccccccccccccccccb
           ^ s0


1,000,000  cccccccccccccccccccccacccccccccccccccccb
                                ^ s1

1,000,000  cccccccccccccccccccccacccccccccccccccccb
                                 ^ s2

1,000,000  cccccccccccccccccccccacccccccccccccccccb
                                                 ^ s2

1,000,000  cccccccccccccccccccccacccccccccccccccccb
                                                  ^ s3 accepted

Then re-searched with the original DFA becuase matched on one-line.
The DFA states are following.

s0: The position set is none. 
s1: The position set is 1:a 
s2: The position set is 1:a 2:ANYCHAR
s3: The position set is 1:a 2:ANYCHAR 3:b (accepted)

1,000,000  cccccccccccccccccccccacccccccccccccccccb
                                ^ s1

1,000,000  cccccccccccccccccccccacccccccccccccccccb
                                 ^ s2

1,000,000  cccccccccccccccccccccacccccccccccccccccb
                                  ^ s0 rejected



Even if matched multi-line, It's still as fast and memory-required as
matched single-line, because in search-mode STAR doesn't count number
of NFA and DFA states up.  So for example, a set of DFA states for
`a CSET STAR b' is correspond with for `a CSET b'. (If my recognition is
right.)

OTOH, If CSET{1,mb_cur_max} is assigned for ANYCHAR, number of DFA
states will be greater than above, and I may be slightly slower
becuase of overheads to build DFA states.  Notice that `{m,n}'
expression requires a lot of memory and is slow (See also
http://www.gnu.org/software/grep/manual/grep.html#Reporting-Bugs).

Example, use the superset for following pattern in EUC-JP locale.
It's mb_cur_max == 3.

    a.....b

If ANYCHAR is replaced to `CSET STAR', as following.

    a CSET STAR CSET STAR CSET STAR CSET STAR CSET STAR CSET STAR b

I draw it below.

        /\
       /  \  2:CSET
       \  /
        \/
       1:a ---> 3:b
(The figure drawn previously was wrong.)

It's equal to following.

    a CSET STAR b

OTOH, the period is replaced to `CSET CSET CSET CAT OR CSET CSET CAT
CSET CAT OR', below.  Itt's complicated, and I don't want to draw it. :)

    a CSET CSET CSET CAT OR CSET CSET CAT CSET CAT OR CSET CSET CSET
    CAT OR CSET CSET CAT CSET CAT OR CAT CSET CSET CSET CAT OR CSET
    CSET CAT CSET CAT OR CAT CSET CSET CSET CAT OR CSET CSET CAT CSET
    CAT OR CAT b

Thanks,
Norihiro

[Prev in Thread]

Current Thread

[Next in Thread]

bug#16966: [PATCH] grep: optimization with the superset of DFA, Paolo Bonzini, 2014/04/01
- bug#16966: [PATCH] grep: optimization with the superset of DFA, Norihiro Tanaka, 2014/04/01
  - bug#16966: [PATCH] grep: optimization with the superset of DFA, Paolo Bonzini, 2014/04/01
    - bug#16966: [PATCH] grep: optimization with the superset of DFA, Norihiro Tanaka, 2014/04/01
    - bug#16966: [PATCH] grep: optimization with the superset of DFA, Paolo Bonzini, 2014/04/01
    - bug#16966: [PATCH] grep: optimization with the superset of DFA, Norihiro Tanaka, 2014/04/01
    - bug#16966: [PATCH] grep: optimization with the superset of DFA, Norihiro Tanaka, 2014/04/01
    - bug#16966: [PATCH] grep: optimization with the superset of DFA, Paolo Bonzini, 2014/04/02
    - bug#16966: [PATCH] grep: optimization with the superset of DFA, Norihiro Tanaka <=
    - bug#16966: [PATCH] grep: optimization with the superset of DFA, Norihiro Tanaka, 2014/04/02
    - bug#16966: [PATCH] grep: optimization with the superset of DFA, Norihiro Tanaka, 2014/04/03
    - bug#16966: [PATCH] grep: optimization with the superset of DFA, Norihiro Tanaka, 2014/04/01
- bug#16966: [PATCH] grep: optimization with the superset of DFA, Paul Eggert, 2014/04/07

Prev by Date: bug#17157: [PATCH 1/5] Partially revert "dfa: improve port to freestanding DJGPP"
Next by Date: bug#16966: [PATCH] grep: optimization with the superset of DFA
Previous by thread: bug#16966: [PATCH] grep: optimization with the superset of DFA
Next by thread: bug#16966: [PATCH] grep: optimization with the superset of DFA
Index(es):
- Date
- Thread