bug-grep
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 1/2] dfa: convert to wide character line-by-line


From: Jim Meyering
Subject: Re: [PATCH 1/2] dfa: convert to wide character line-by-line
Date: Wed, 05 May 2010 09:59:59 +0200

Jim Meyering wrote:
> Paolo Bonzini wrote:
>> This provides a nice speedup for -m in general, but especially
>> it avoids quadratic complexity in case we have to go to glibc.
>>
>> * NEWS: Document change.
>> * src/dfa.c (prepare_wc_buf): Extract out of dfaexec.  Convert
>> only up to the next newline.
>> (dfaexec): Exit multibyte processing loop if past buf_end.
>> Call prepare_wc_buf again after processing a newline.
>
> Nice indeed, but it induces an abort from glibc with some inputs:
> This is on F13:
>
>     $ export LC_ALL=fr_FR.UTF-8
>     $ printf '\xc3\n' > in; src/grep '[é]' in
>     *** buffer overflow detected ***: src/grep terminated
>     ======= Backtrace: =======
>     /lib64/libc.so.6(__fortify_fail+0x37)[0x33920fedb7]
>     /lib64/libc.so.6[0x33920fcd30]
>     /lib64/libc.so.6(__strncpy_chk+0x17b)[0x33920fbfeb]
>     ...
>     zsh: abort (core dumped)  src/grep '[é]' in
>
>> +/* Initialize mblen_buf and inputwcs with data from the next line.  */
>> +
>> +static void
>> +prepare_wc_buf (const char *begin, const char *end)
>> +{
>> +  unsigned char eol = eolbyte;
>> +  size_t remain_bytes, i;
>
> Here's the quick summary:
>
>   "remain_bytes" must not be of type "size_t".
>   Please leave it as "int", like it was in the code you moved.

Perhaps better:
Leave remain_bytes declared as size_t, since that's what mbrtowc
returns, after all.

But then you'll have to adjust the test to explicitly
check for (size_t) -1 and (size_t) -2.

           if (remain_bytes < 1
+              || remain_bytes == (size_t) -1
+              || remain_bytes == (size_t) -2
               || (remain_bytes == 1 && inputwcs[i] == (wchar_t)begin[i]))




reply via email to

[Prev in Thread] Current Thread [Next in Thread]