bug-grep

bug#18777: [PATCH] dfa: improvement for checking of multibyte character boundary


From: arnold
Subject: bug#18777: [PATCH] dfa: improvement for checking of multibyte character boundary
Date: Tue, 21 Oct 2014 00:23:07 -0600
User-agent: Heirloom mailx 12.4 7/29/08

Norihiro Tanaka <address@hidden> wrote:

> Eric Blake <address@hidden> wrote:
> > Is it worth extending your optimization to all five of the
> > POSIX-guaranteed single byte characters?
>
> Thanks, but I don't want to do it immediately.  DFA already treats
> newline as a single-byte character, but not the others yet.  So we
> may need to make many changes to handle invalid locales and sequences
> that do not conform to the rule.  If we omitted that, limits might have
> to be placed on the locales that DFA can be applied to.  Therefore, it
> should be done carefully.

I would think adding a check for '\r' would be safe and would help
too; given that on Windows systems '\r' generally occurs just as
frequently as '\n', it should give a nice speedup for gawk on those
systems.

The other characters that Eric cited seem like less of a big issue to me.
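
To make the idea concrete, here is a rough sketch of what treating '\r'
like '\n' could look like when searching backward for a known character
boundary.  It is not taken from dfa.c: the names is_sync_byte and
sync_point are hypothetical, and the code assumes that '\n' and '\r' are
always single-byte characters that never occur inside a multibyte
sequence in the current locale, which is exactly the assumption being
discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not from dfa.c.  Assumption: '\n' and '\r' are
   always single-byte characters and never appear as part of a
   multibyte sequence in the current locale.  */
static int
is_sync_byte (unsigned char c)
{
  return c == '\n' || c == '\r';
}

/* Return a pointer in [buf, p] that is known to be a character
   boundary: the position just after the nearest sync byte that
   precedes p, or buf itself if no sync byte is found.  A caller could
   start its multibyte boundary check from there instead of from buf.  */
static const char *
sync_point (const char *buf, const char *p)
{
  assert (buf <= p);
  while (p > buf)
    {
      if (is_sync_byte ((unsigned char) p[-1]))
        return p;
      p--;
    }
  return buf;
}
```

With CRLF line endings, every line then offers two such sync bytes
instead of one, so the backward scan would tend to stop no later than it
does today.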

Thanks,

Arnold




