grep dfa bug
From: KIMURA Koichi
Subject: grep dfa bug
Date: Mon, 01 Aug 2005 09:12:03 +0900
Hi,
I think I have found a bug in the dfa code of gawk.
Situation:
In the Japanese Shift_JIS locale, half-width katakana in a character class
do not match correctly.
Reproduce:
LANG=ja_JP.SJIS
export LANG
echo ABCDE | grep '/[A-E]\+/p'
Here A, B, C, D and E actually stand for half-width katakana characters.
(The data to reproduce this is appended at the end of this mail as uuencoded SJIS data.)
Result:
Nothing is printed.
I guess the patch below solves this problem, but I'm not confident
that it has no side effects in other environments.
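The situation described above can be probed outside of grep with a small C sketch. It asks mbrtowc() to decode the first character of a byte string under a given locale and reports how many bytes were consumed. The helper name decode_first and its return codes are hypothetical and are not part of dfa.c; whether a locale named ja_JP.SJIS is installed depends on the system.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not from dfa.c): decode the first character of
   `bytes` under the locale `locale_name`.
   Returns the number of bytes consumed (0 for a NUL character),
   -1 for an invalid sequence, -2 for an incomplete sequence,
   and -3 if the locale is not available on this system. */
static long decode_first(const char *locale_name, const char *bytes,
                         size_t len, wchar_t *out)
{
    mbstate_t mbs;
    size_t n;

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -3;                      /* locale not installed */

    memset(&mbs, 0, sizeof mbs);        /* start in the initial shift state */
    n = mbrtowc(out, bytes, len, &mbs);

    if (n == (size_t) -1)
        return -1;                      /* invalid multibyte sequence */
    if (n == (size_t) -2)
        return -2;                      /* incomplete multibyte sequence */
    return (long) n;
}
```

If a Shift_JIS locale is available, calling decode_first("ja_JP.SJIS", "\xB1", 1, &wc) should, as far as I can tell, consume one byte while leaving wc different from 0xB1, because the half-width katakana are single-byte in Shift_JIS but are not mapped to their own byte value as wide characters. That is the case the bug report is about.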
regards,
--- dfa.c.2~ 2005-03-22 14:43:10.000000000 +0900
+++ dfa.c 2005-07-31 22:21:27.000000000 +0900
@@ -2825,7 +2825,8 @@ dfaexec (struct dfa *d, char const *begi
remain_bytes
= mbrtowc(inputwcs + i, begin + i,
end - (unsigned char const *)begin - i + 1, &mbs);
- if (remain_bytes <= 1)
+ if (remain_bytes < 1
+ || (remain_bytes == 1 && inputwcs[i] == (wchar_t)begin[i]))
{
remain_bytes = 0;
inputwcs[i] = (wchar_t)begin[i];
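The changed condition in the hunk above can be isolated as a small predicate to see which inputs it treats differently. This is only a sketch: the function names are hypothetical, dfa.c contains no such helpers, and it assumes that remain_bytes is a signed int (so that the error returns of mbrtowc() appear as negative values) and that the raw byte is compared as an unsigned char.

```c
#include <wchar.h>

/* Pre-patch condition: any result of 1 or less replaces the decoded
   wide character with the raw byte. */
static int old_falls_back_to_raw_byte(int remain_bytes)
{
    return remain_bytes <= 1;
}

/* Patched condition (hypothetical helper mirroring the diff):
   fall back to the raw byte on error or NUL, or when a single-byte
   character decodes to the same value as the byte itself.
   A single-byte character whose wide value differs from the byte
   keeps the value produced by mbrtowc(). */
static int falls_back_to_raw_byte(int remain_bytes, wchar_t decoded,
                                  unsigned char raw)
{
    return remain_bytes < 1
        || (remain_bytes == 1 && decoded == (wchar_t) raw);
}
```

For an ASCII byte such as 'A' both predicates agree. They differ only for a one-byte character whose decoded value is not the byte value, for example a byte 0xB1 decoded to 0xFF71: the old condition overwrites the wide character, while the patched one keeps it.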
begin 644 testkana.sh
M<V5T($Q!3D<]:F%?2E`N4TI)4PIE>'!O<address@hidden;F]T('!R:6YT"F5C!
<:&address@hidden;address@hidden"!G<F5P("<O6[$MM5U<*R\G"@``(
``
end
size 73
--
KIMURA Koichi