bug-grep
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: UTF-16 surrogate pair handling in grep -i option


From: Jim Meyering
Subject: Re: UTF-16 surrogate pair handling in grep -i option
Date: Sun, 18 Aug 2013 14:26:50 -0700

Thank you for adjusting that.  I've added a test script and tweaked a
few minor things.

I added curly braces around another 1-line else statement, adjusted the
commit log to use the explanation from NEWS and added our usual
ChangeLog-style entries.

In NEWS, I've added the statement that the bug was introduced in 2.6,
but that is only a guess.  Can you tell me if that is true?  If not,
for which releases does grep segfault on cygwin?  Since you are still
listed as the "Author" of this entire patch, please review my changes
carefully.  This test passes for me, but I haven't tried it on cygwin,
and from what you say, part of it will fail there.

Note that I've removed your Patch by: line. The Author: field of the git
commit identifies the person who wrote the patch (you), so including
"Patch by" would be redundant.  Signed-off-by is omitted for the same
reason.

Does using the en_US.UTF-8 locale on cygwin provoke the failure when
run against an unpatched grep?

On Fri, Aug 16, 2013 at 12:00 PM, Corinna Vinschen <address@hidden> wrote:
> Hi Jim,
>
> first of all, thanks for your review.
>
> On Aug 16 07:42, Jim Meyering wrote:
>> Hi Corina,
>>
>> Thanks a lot for the patcb.  It is almost perfect.
>>
>> [ - the git one-line summary should be readable.
>>   - comment nit: s/  as/  as a/
>>   - a style issue: we want curly braces around the 1-line
>>   else body in the first #ifdef block
>>   - please attribute the reporter (or a list URL) in the commit log
>> ]
>
> I attached a new patch which hopefully fixes these points.  It also
> contains a suggestion for a NEWS entry.
>
>> Do any of the existing tests trigger this malfunction?
>
> Unfortunately not.  The testsuite runs fine except for a few failures
> in fmbtest, but they are not related to this problem (...but I didn't
> took another look into them, either)
>
>> If not, can you create a small example that triggers the
>> problem on cygwin?  Even better would be the addition of a new
>> script in tests/, which is required for any bug-fix patch.
>
> I'm not quite sure how the test script as a whole should look like, but
> the test content is very simple;  just create input which contains a
> valid 4 byte UTF-8 character.  Without the patch, grep SEGVs on Cygwin,
> with the patch it runs fine:
>
>   Before:
>
>     $ printf "\xf0\x90\x90\x85\n" | ./grep -i blurb
>     Segmentation fault (core dumped)
>     $
>
>   After:
>
>     $ printf "\xf0\x90\x90\x85\n" | ./grep -i blurb
>     $
>
> However, I just noticed that it's still not possible to grep for such
> a character.  Apparently the regex compiler does not recognize UTF-16
> surrogate pairs either.  Grep fails, grep -F works:
>
>     $ printf "\xf0\x90\x90\x85\n" | grep '𐐅'
>     $ printf "\xf0\x90\x90\x85\n" | grep -F '𐐅'
>     𐐅
>
> Not sure if that's such a terrible restriction, though...
>
>> Also, it'd be great if you would add a NEWS entry that
>> describes your fix.
>
> My suggestion for a NEWS entry is part of the attached patch.
>
>> That said, there's no pressure.
>> If you can tell me how to reproduce the failure, I'll
>> make time to write both the test and NEWS addition, and
>> amend them onto your patch.
>>
>> PS. Your timing is great. I'm planning to make a release pretty soon.
>
> Sounds good :)
>
>
> Thanks,
> Corinna
>
> --
> Corinna Vinschen
> Cygwin Maintainer
> Red Hat

Attachment: cygwin-segfault.diff.txt
Description: Text document


reply via email to

[Prev in Thread] Current Thread [Next in Thread]