Re: UTF-16 surrogate pair handling in grep -i option

From: Corinna Vinschen
Subject: Re: UTF-16 surrogate pair handling in grep -i option
Date: Tue, 20 Aug 2013 17:11:22 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

On Aug 20 16:18, Paolo Bonzini wrote:
> Il 16/08/2013 16:42, Jim Meyering ha scritto:
> > Hi Corinna,
> > 
> > Thanks a lot for the patch.  It is almost perfect.
> > 
> > [ - the git one-line summary should be readable.
> >   - comment nit: s/  as/  as a/
> >   - a style issue: we want curly braces around the 1-line
> >   else body in the first #ifdef block
> >   - please attribute the reporter (or a list URL) in the commit log
> > ]
> > 
> > Do any of the existing tests trigger this malfunction?
> > If not, can you create a small example that triggers the
> > problem on cygwin?  Even better would be the addition of a new
> > script in tests/, which is required for any bug-fix patch.
> > 
> > Also, it'd be great if you would add a NEWS entry that
> > describes your fix.  That said, there's no pressure.
> > If you can tell me how to reproduce the failure, I'll
> > make time to write both the test and NEWS addition, and
> > amend them onto your patch.
> > 
> > PS. Your timing is great. I'm planning to make a release pretty soon.
> Just one thing.  Would the patch be better conditionalized on "if
> (sizeof (wchar_t) == 2)"?

That's what I did when I started to write this patch, but then I decided
against it for the following reason:

Only the Cygwin/Newlib-provided mbrtowc, wcrtomb and towlower handle
UTF-16 wchar_t in exactly the way this patch relies on.  I'm not aware
of any other platform that provides an equivalent implementation, even
where wchar_t is 2 bytes.  Thus, the assumption that the code works
whenever sizeof (wchar_t) == 2 is wrong.  It would, for instance, not
work with the Windows implementation of wcrtomb, AFAIK.

I'm not strongly opposed to changing this, but IMHO, to be on the safe
side, this code should only be activated on a case-by-case basis, so only
for Cygwin for now.  The same goes for a potential fix to the regex
compiler, which I have no idea how to write yet :(


Corinna Vinschen
Cygwin Maintainer
Red Hat
