bug#15199: UTF-16 surrogate pair handling in grep -i option


From: Corinna Vinschen
Subject: bug#15199: UTF-16 surrogate pair handling in grep -i option
Date: Tue, 27 Aug 2013 18:14:40 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

On Aug 27 17:53, Paolo Bonzini wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On 20/08/2013 17:11, Corinna Vinschen wrote:
> > That's what I did when I started to write this patch, but then I 
> > decided against it for the following reason:
> > 
> > The implementation of mbrtowc, wcrtomb and towlower using UTF-16 
> > wchar_t works *only* in the Cygwin/Newlib-provided functions in 
> > exactly the way used in this patch.  I'm not aware that any other 
> > platform provides an equivalent implementation, even if wchar_t is 
> > 2 bytes.  Thus the assumption that the code works in all cases in 
> > which sizeof (wchar_t) == 2 is wrong.  It would not, for instance,
> > work with the Windows implementation of wcrtomb, AFAIK.
> 
> Right, MSVCRT is exactly what I was thinking about.
> 
> > I'm not strongly opposed to changing this, but IMHO, to be on the 
> > safe side, this code should only be activated on a case by case 
> > basis, so only for Cygwin for now.  Same with a potential fix to 
> > the regex compiler, for which I have no idea how to do it, yet :(
> 
> Feel free to bug me on IRC if I can be of any help.

Thanks for the offer!  I'll probably get back to it in November, and
I would be glad if you could help me through the gnulib regex code
then.


Corinna

-- 
Corinna Vinschen
Cygwin Maintainer
Red Hat
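
The UTF-16 issue discussed in the quoted text can be illustrated with a
small sketch.  It is not the patch under discussion and it does not call
mbrtowc, wcrtomb or towlower; it only shows the surrogate pair arithmetic
that a 2-byte wchar_t implementation has to deal with for characters above
U+FFFF.  The helper names are hypothetical and chosen for this
illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, for illustration only; not taken from grep,
   gnulib, Cygwin or Newlib.  */

/* True if u is a high (leading) surrogate, U+D800..U+DBFF.  */
static int
is_high_surrogate (uint16_t u)
{
  return u >= 0xD800 && u <= 0xDBFF;
}

/* True if u is a low (trailing) surrogate, U+DC00..U+DFFF.  */
static int
is_low_surrogate (uint16_t u)
{
  return u >= 0xDC00 && u <= 0xDFFF;
}

/* Combine a valid surrogate pair into a single code point.  */
static uint32_t
combine_surrogates (uint16_t hi, uint16_t lo)
{
  assert (is_high_surrogate (hi) && is_low_surrogate (lo));
  return 0x10000 + (((uint32_t) (hi - 0xD800) << 10)
                    | (uint32_t) (lo - 0xDC00));
}

/* Split a code point in U+10000..U+10FFFF into a surrogate pair.  */
static void
split_code_point (uint32_t cp, uint16_t *hi, uint16_t *lo)
{
  assert (cp >= 0x10000 && cp <= 0x10FFFF);
  cp -= 0x10000;
  *hi = (uint16_t) (0xD800 + (cp >> 10));
  *lo = (uint16_t) (0xDC00 + (cp & 0x3FF));
}
```

As an example, U+10400 (DESERET CAPITAL LETTER LONG I) is the pair
D801 DC00 and its lowercase form U+10428 is D801 DC28.  Neither half is
a letter on its own, so a case-insensitive match has to treat the pair
as one unit, and whether the C library does that with a 2-byte wchar_t
depends on the platform.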
