bug-grep

Re: [PATCH 7/9] dfa: run simple UTF-8 regexps as a single-byte character set


From: Jim Meyering
Subject: Re: [PATCH 7/9] dfa: run simple UTF-8 regexps as a single-byte character set
Date: Mon, 15 Mar 2010 11:16:21 +0100

Paolo Bonzini wrote:
> This partially works around https://savannah.gnu.org/bugs/?29117,
> but in general provides a speedup whenever fgrep is "almost" sufficient
> but not quite (e.g. grep ^abc).  Speedup is too good to be true :-)
> (can get to 1000x on some not-too-contrived testcases).
>
> * src/dfa.c (dfaoptimize): New.
> (dfacomp): Call it.
> ---
>  src/dfa.c |   25 +++++++++++++++++++++++++
>  1 files changed, 25 insertions(+), 0 deletions(-)
>
> diff --git a/src/dfa.c b/src/dfa.c
> index 6a658c1..f9f7cd3 100644
> --- a/src/dfa.c
> +++ b/src/dfa.c
> @@ -3000,6 +3000,30 @@ dfainit (struct dfa *d)
>  #endif
>  }
>
> +static void
> +dfaoptimize (struct dfa *d)
> +{
> +  int i;
> +  if (!using_utf8)
> +    return;
> +
> +  for (i = 0; i < d->tindex; ++i)
> +    {
> +      switch(d->tokens[i])
> +     {
> +     case ANYCHAR:
> +       return;
> +     case MBCSET:
> +       return;
> +     default:
> +       break; /* can not happen.  */

That comment is false.
Otherwise, you could replace the entire loop with

  if (d->tindex)
    return;
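
To spell out the reasoning: if the default branch really could not be
reached, every token would be ANYCHAR or MBCSET, so the loop would return
on its first iteration whenever tindex is nonzero.  A minimal standalone
sketch of that equivalence follows.  The enum and both function names are
hypothetical stand-ins, not the real dfa.c definitions, and the functions
return a flag instead of returning void so the behavior can be observed:

```c
#include <stddef.h>

/* Hypothetical stand-ins for the dfa.c token values.  */
enum token { ANYCHAR, MBCSET, OTHER };

/* Same control flow as the loop in the patch.
   Returns 1 if it bails out early, 0 if it runs to the end.  */
static int
loop_bails_out (const enum token *tokens, size_t tindex)
{
  size_t i;
  for (i = 0; i < tindex; ++i)
    switch (tokens[i])
      {
      case ANYCHAR:
      case MBCSET:
        return 1;
      default:
        break;  /* Reached for every other token.  */
      }
  return 0;
}

/* What the loop would reduce to if "default" were unreachable.  */
static int
shortcut_bails_out (size_t tindex)
{
  return tindex != 0;
}
```

The two functions agree only when every token is ANYCHAR or MBCSET.  As
soon as any other token appears first, they can differ, which is why the
"can not happen" comment does not hold.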

Stylistic: please put the two cases together:

        case ANYCHAR:
        case MBCSET:
          return;

Also stylistic, please declare "i" as an unsigned int.

Hmm... that makes me realize that dfa.tindex should probably be
declared as an unsigned type too, along with most of the other
members.  But let's not go there just yet ;-)
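
For illustration, the stylistic points above (merged case labels, an
unsigned loop index, an unsigned tindex member) might combine roughly as
follows.  This is only a sketch under assumptions: the struct, the enum,
and the function name are hypothetical, cut-down stand-ins, and the
function returns a flag rather than mirroring what dfaoptimize goes on
to do:

```c
/* Hypothetical, cut-down stand-ins for the dfa.c types.  */
enum sketch_token { SK_ANYCHAR, SK_MBCSET, SK_OTHER };

struct sketch_dfa
{
  const enum sketch_token *tokens;
  unsigned int tindex;  /* Assumed unsigned here, per the suggestion.  */
};

/* Return 1 if any token is SK_ANYCHAR or SK_MBCSET, else 0.  */
static int
sketch_has_multibyte_token (const struct sketch_dfa *d)
{
  unsigned int i;  /* Same signedness as tindex, so no mixed comparison.  */
  for (i = 0; i < d->tindex; ++i)
    switch (d->tokens[i])
      {
      case SK_ANYCHAR:
      case SK_MBCSET:
        return 1;
      default:
        break;
      }
  return 0;
}
```

Keeping the index and the bound the same signedness avoids a
signed/unsigned comparison in the loop condition.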



