bug-grep

Re: [PATCH 0/2] fix fgrep -F in SJIS character sets


From: Paolo Bonzini
Subject: Re: [PATCH 0/2] fix fgrep -F in SJIS character sets
Date: Mon, 29 Mar 2010 10:15:05 +0200
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.8) Gecko/20100301 Fedora/3.0.3-1.fc12 Lightning/1.0b2pre Thunderbird/3.0.3

On 03/29/2010 09:43 AM, Jim Meyering wrote:
> Paolo Bonzini wrote:
>> Jim's fix for the fgrep infinite loop would erroneously miss matches
>> in SJIS character sets.  In this character set low bytes (i.e. ASCII
>> bytes) are also valid second bytes in a double-byte character, so you
>> have to continue looking for a match, even if you match in the middle
>> of a double-byte character.
>
> Good catch!
> Thank you.
>
> The attached test will be skipped unless (on a glibc system) you run
> something like
>
>    mkdir /usr/lib/locale/ja_JP.SHIFT_JIS
>    zcat /usr/share/i18n/charmaps/SHIFT_JIS.gz | \
>      localedef \
>        -f - \
>        -i /usr/share/i18n/locales/ja_JP \
>        /usr/lib/locale/ja_JP.SHIFT_JIS
>
> It is telling that when you run those commands,
> you see this diagnostic:
>
>    character map `SHIFT_JIS' is not ASCII compatible, locale not ISO C compliant

Not really, that's just because \ is the yen symbol in that locale. It is actually a widely used encoding.
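To make the byte-level point concrete, here is a small sketch. It assumes only a POSIX shell with printf, od, tr and grep, and needs no SJIS locale installed. The katakana "ソ" is my own choice of sample character and does not come from the patch; in Shift_JIS it is encoded as the two bytes 0x83 0x5C, and 0x5C on its own is the byte ASCII uses for the backslash. A byte-wise search for that byte therefore hits the second half of the double-byte character.

```shell
# Shift_JIS bytes of the katakana "ソ": lead byte 0x83, trail byte 0x5C.
# Octal escapes are used because printf '\x..' is not portable.
so=$(printf '\203\134' | od -An -tx1 | tr -d ' \n')
echo "$so"          # prints 835c

# Search byte-wise (C locale) for the single byte 0x5C.
# The match lands in the middle of the double-byte character.
if printf '\203\134\n' | LC_ALL=C grep -F '\' >/dev/null; then
  byte_match=yes
else
  byte_match=no
fi
echo "$byte_match"  # prints yes
```

A multibyte-aware matcher has to reject a hit like this one, and then keep scanning the rest of the line instead of giving up on it.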

> the mixed-sp+TAB indentation above is ugly (yes, we will
> fix things asap, once the pace of bug fixes decreases)

Ok.

>> +# % becomes an half-width katakana in SJIS, and an invalid sequence
>
> s/an/a/
>
>> +seq=0
>
> I find s/seq/k/ to be slightly more readable.
>
>> +      timeout 10s grep $1 `encode "$3"`>  out$seq 2>&1
>
> Please use $(...), rather than `...` in tests.
> init.sh ensures that the shell we are using is capable enough.

Adjusted and pushed.
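
For readers unfamiliar with the two command-substitution forms, a minimal sketch of the difference follows. The encode function here is a hypothetical stand-in that merely upper-cases its argument; it is not the helper from the patch, whose body is not shown in this message.

```shell
# Hypothetical stand-in for the test's encode helper: upper-cases its argument.
encode() { printf '%s' "$1" | tr '[:lower:]' '[:upper:]'; }

# Both forms capture the command's output.
old=`encode "abc"`
new=$(encode "abc")

# Nesting is where they differ: $(...) nests as written, quotes included,
# while inner backticks must be backslash-escaped.
nested_new=$(encode "$(printf '%s' 'x y')")
nested_old=`encode \`printf '%s' xy\``

echo "$old $new $nested_new $nested_old"
```

The $(...) form also reads more clearly next to a redirection, as in the grep line quoted above.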

Paolo