bug-grep

Re: [PATCH 3/3] dfa: optimize UTF-8 period


From: Paolo Bonzini
Subject: Re: [PATCH 3/3] dfa: optimize UTF-8 period
Date: Mon, 19 Apr 2010 14:06:00 +0200
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.8) Gecko/20100301 Fedora/3.0.3-1.fc12 Lightning/1.0b2pre Thunderbird/3.0.3

On 04/19/2010 10:07 AM, Jim Meyering wrote:
> Paolo Bonzini wrote:
>> On 04/17/2010 09:27 AM, Jim Meyering wrote:
>>> Paolo Bonzini wrote:
>>>> * NEWS: Document improvement.
>>>> * src/dfa.c (struct dfa): Add utf8_anychar_classes.
>>>> (add_utf8_anychar): New.
>>>> (atom): Simplify if/else nesting.  Call add_utf8_anychar for ANYCHAR
>>>> in UTF-8 locales.
>>>> (dfaoptimize): Abort on ANYCHAR.
>>>> ---
>>>>    NEWS      |    6 ++++++
>>>>    src/dfa.c |   46 +++++++++++++++++++++++++++++++++++++++++++---
>>>>    2 files changed, 49 insertions(+), 3 deletions(-)
>>>
>>> Only quick superficial feedback for now:
>>
>> I pushed all patches but this.
>
> Thanks!
>
> Would you please add comments describing this one in more detail?
> I ran out of time trying to understand how it works.

Something like this?

 /* For UTF-8 expand the period to a series of CSETs that define a valid
    UTF-8 character.  This avoids using the slow multibyte path.  I'm
    pretty sure it would be both profitable and correct to do it for
    any encoding; however, the optimization must be done manually as
    it is done above in add_utf8_anychar.  So, let's start with
    UTF-8: it is the most used, and the structure of the encoding
    makes the correctness more obvious.  */
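
To make the idea in the comment concrete, here is a minimal sketch of what "a series of CSETs that define a valid UTF-8 character" can look like when written out as plain byte-range checks. It is not the code from the patch: the names in_range and utf8_anychar_len are hypothetical, and the byte ranges are a deliberately loose definition (they do not reject overlong three-byte forms or surrogates). The lead byte selects one alternative, and each alternative is followed by a fixed number of continuation bytes in 0x80..0xbf, so the whole character can be matched one byte at a time.

```c
#include <stddef.h>

/* Hypothetical helper, not taken from dfa.c: nonzero if C lies in the
   byte class [LO, HI].  Each call plays the role of one CSET.  */
static int
in_range (unsigned char c, unsigned char lo, unsigned char hi)
{
  return lo <= c && c <= hi;
}

/* Hypothetical sketch: return the length in bytes of the UTF-8
   character at the start of S (N bytes available), or 0 if the bytes
   match none of the alternatives.  Only single-byte classes are used,
   so no multibyte decoding is needed.  */
size_t
utf8_anychar_len (const unsigned char *s, size_t n)
{
  size_t need;   /* number of continuation bytes expected */
  size_t i;

  if (n == 0)
    return 0;

  if (s[0] <= 0x7f)                     /* ASCII */
    need = 0;
  else if (in_range (s[0], 0xc2, 0xdf)) /* 2-byte lead */
    need = 1;
  else if (in_range (s[0], 0xe0, 0xef)) /* 3-byte lead */
    need = 2;
  else if (in_range (s[0], 0xf0, 0xf4)) /* 4-byte lead */
    need = 3;
  else
    return 0;                           /* stray continuation or invalid lead */

  if (n < need + 1)
    return 0;                           /* truncated sequence */

  for (i = 1; i <= need; i++)
    if (!in_range (s[i], 0x80, 0xbf))   /* continuation byte class */
      return 0;

  return need + 1;
}
```

Under these assumptions, calling utf8_anychar_len on the two bytes 0xc3 0xa9 yields 2, while a lone 0x80 yields 0. Whether the period should also exclude bytes such as newline depends on the syntax options and is left out of this sketch.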



