[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: dfa is slower than regex in some cases
From: |
Norihiro Tanaka |
Subject: |
Re: dfa is slower than regex in some cases |
Date: |
Sun, 03 Dec 2017 14:23:50 +0900 |
I attached a test case on this mail.
On Sun, 03 Dec 2017 14:22:23 +0900
Norihiro Tanaka <address@hidden> wrote:
> Hi,
>
> A case that dfa is slower than regex on bug-gawk mailing list is
> discussed 4 months ago.
>
> http://lists.gnu.org/archive/html/bug-gawk/2017-07/msg00002.html
>
> Although gawk is fixed to not use dfa in some cases, it looks like a
> temporary treatment. (11d4ea518166ffbc0c2fe85d090723e8f299486c)
>
> --------
> $ env LC_ALL=C ./gawk --version
> GNU Awk 4.2.0, API: 2.0 (GNU MPFR 3.1.5, GNU MP 4.3.2)
> Copyright (C) 1989, 1991-2017 Free Software Foundation.
>
> This program is free software; you can redistribute it and/or modify
> it under the terms of the GNU General Public License as published by
> the Free Software Foundation; either version 3 of the License, or
> (at your option) any later version.
>
> This program is distributed in the hope that it will be useful,
> but WITHOUT ANY WARRANTY; without even the implied warranty of
> MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> GNU General Public License for more details.
>
> You should have received a copy of the GNU General Public License
> along with this program. If not, see http://www.gnu.org/licenses/.
> $ time -p env LC_ALL=C ./gawk -f test.awk test.inp >/dev/null
> real 1.38
> user 1.17
> sys 0.11
> $ time -p env LC_ALL=C GAWK_NO_DFA=1 ./gawk -f test.awk test.inp >/dev/null
> real 0.51
> user 0.43
> sys 0.07
> --------
>
> This problem seems to be due to dfa.c. In fact, grep is also slower
> than not to use dfa in the same cases.
>
> --------
> $ env LC_ALL=C src/grep --version
> grep (GNU grep) 3.1
> Copyright (C) 2017 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
> Written by Mike Haertel and others, see
> <http://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.
> $ time -p env LC_ALL=C src/grep -X gawk -f test.pat test.inp >/dev/null
> real 1.06
> user 0.97
> sys 0.08
> --------
>
> # grep does not have control by GREP_NO_DFA environment, so I changed
> # as following and rebuilded, and run the same test as above.
>
> ========
> diff --git a/lib/dfa.c b/lib/dfa.c
> index 2b9c80e..f234b88 100644
> --- a/lib/dfa.c
> +++ b/lib/dfa.c
> @@ -3474,21 +3474,7 @@ dfacomp (char const *s, size_t len, struct dfa *d,
> bool searchflag)
> dfaparse (s, len, d);
> dfassbuild (d);
>
> - if (dfa_supported (d))
> - {
> - dfaoptimize (d);
> - dfaanalyze (d, searchflag);
> - }
> - else
> - {
> - d->dfaexec = dfaexec_noop;
> - }
> -
> - if (d->superset)
> - {
> - d->fast = true;
> - dfaanalyze (d->superset, searchflag);
> - }
> + d->dfaexec = dfaexec_noop;
> }
>
> /* Free the storage held by the components of a dfa. */
> ========
>
> --------
> $ time -p env LC_ALL=C src/grep -X gawk -f test.pat test.inp >/dev/null
> real 0.43
> user 0.37
> sys 0.05
> --------
>
> However, I do not still know why dfa is slower than regex in this case.
>
> Thanks,
> Norihiro
>
--
田中 紀洋 (Norihiro TANAKA)
E-mail : address@hidden
test.inp
Description: Binary data
test.awk
Description: Binary data
test.pat
Description: Binary data