[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: UTF-16 surrogate pair handling in grep -i option

From: Corinna Vinschen
Subject: Re: UTF-16 surrogate pair handling in grep -i option
Date: Fri, 16 Aug 2013 21:00:50 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

Hi Jim,

first of all, thanks for your review.

On Aug 16 07:42, Jim Meyering wrote:
> Hi Corina,
> Thanks a lot for the patcb.  It is almost perfect.
> [ - the git one-line summary should be readable.
>   - comment nit: s/  as/  as a/
>   - a style issue: we want curly braces around the 1-line
>   else body in the first #ifdef block
>   - please attribute the reporter (or a list URL) in the commit log
> ]

I attached a new patch which hopefully fixes these points.  It also
contains a suggestion for a NEWS entry.

> Do any of the existing tests trigger this malfunction?

Unfortunately not.  The testsuite runs fine except for a few failures
in fmbtest, but they are not related to this problem (...but I didn't
took another look into them, either)

> If not, can you create a small example that triggers the
> problem on cygwin?  Even better would be the addition of a new
> script in tests/, which is required for any bug-fix patch.

I'm not quite sure how the test script as a whole should look like, but
the test content is very simple;  just create input which contains a
valid 4 byte UTF-8 character.  Without the patch, grep SEGVs on Cygwin,
with the patch it runs fine:


    $ printf "\xf0\x90\x90\x85\n" | ./grep -i blurb
    Segmentation fault (core dumped)


    $ printf "\xf0\x90\x90\x85\n" | ./grep -i blurb

However, I just noticed that it's still not possible to grep for such
a character.  Apparently the regex compiler does not recognize UTF-16
surrogate pairs either.  Grep fails, grep -F works:

    $ printf "\xf0\x90\x90\x85\n" | grep '𐐅'
    $ printf "\xf0\x90\x90\x85\n" | grep -F '𐐅'

Not sure if that's such a terrible restriction, though...

> Also, it'd be great if you would add a NEWS entry that
> describes your fix.

My suggestion for a NEWS entry is part of the attached patch.

> That said, there's no pressure.
> If you can tell me how to reproduce the failure, I'll
> make time to write both the test and NEWS addition, and
> amend them onto your patch.
> PS. Your timing is great. I'm planning to make a release pretty soon.

Sounds good :)


Corinna Vinschen
Cygwin Maintainer
Red Hat

Attachment: 0001-fix-Cygwin-UTF-16-surrogate-pair-handling-with-i.patch
Description: Text document

Attachment: pgpx26YKJtBjP.pgp
Description: PGP signature

reply via email to

[Prev in Thread] Current Thread [Next in Thread]