
UTF-16 surrogate pair handling in grep -i option


From: Corinna Vinschen
Subject: UTF-16 surrogate pair handling in grep -i option
Date: Wed, 14 Aug 2013 18:32:42 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

Hi,

two days ago we got a report on the Cygwin mailing list that under some
circumstances grep on Cygwin SEGVed.  I tracked this down to grep's -i
option, which calls the function mbtolower.  This function works fine on
systems with UCS-2 or UCS-4 wchar_t's, but it doesn't handle UTF-16
surrogates on UTF-16 wchar_t systems.  In particular, it does not handle
the case where wcrtomb returns 0, which is what causes the SEGV.
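
To illustrate the failure mode, here is a minimal sketch of a lowercasing
loop that tolerates wcrtomb returning 0.  It is not the attached patch and
not grep's mbtolower; the name lower_mb_sketch and the buffer sizing are
assumptions made for this example.  The idea is that on a UTF-16 wchar_t
system a high surrogate alone produces no output bytes, so the loop must
keep the conversion state and accept a zero-length result instead of
treating it as impossible.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper, NOT grep's mbtolower: lowercase the multibyte
   string in[0..len) into a freshly malloc'ed, NUL-terminated buffer.
   The caller frees the result.  */
char *
lower_mb_sketch (const char *in, size_t len)
{
  assert (in != NULL);

  /* Assumed worst case: every input byte expands to MB_LEN_MAX bytes.  */
  char *out = malloc (len * MB_LEN_MAX + 1);
  if (!out)
    return NULL;

  mbstate_t is, os;
  memset (&is, 0, sizeof is);
  memset (&os, 0, sizeof os);

  const char *p = in;
  const char *end = in + len;
  char *q = out;

  while (p < end)
    {
      wchar_t wc;
      size_t n = mbrtowc (&wc, p, (size_t) (end - p), &is);

      if (n == (size_t) -1 || n == (size_t) -2 || n == 0)
        {
          /* Invalid or incomplete sequence, or a NUL byte: copy one
             byte unchanged and restart from the initial state.  */
          *q++ = *p++;
          memset (&is, 0, sizeof is);
          continue;
        }
      p += n;

      size_t m = wcrtomb (q, (wchar_t) towlower ((wint_t) wc), &os);
      if (m == (size_t) -1)
        {
          /* Not convertible: fall back to the bytes just consumed.  */
          memcpy (q, p - n, n);
          q += n;
          memset (&os, 0, sizeof os);
        }
      else
        {
          /* m may legitimately be 0: with UTF-16 wchar_t, a high
             surrogate is only stored in OS, and the bytes appear once
             the low surrogate arrives.  Advancing by 0 is harmless.  */
          q += m;
        }
    }

  *q = '\0';
  return out;
}
```

With this shape, a character outside the BMP passes through unchanged
rather than being lowercased, because towlower only ever sees one
surrogate at a time.  Lowercasing such characters properly would need
extra handling that this sketch does not attempt.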

The patch below fixes this, at least for Cygwin.  I don't know of any
other OS that uses UTF-16 and provides this set of functions, so I
assume this solution is very system-specific, unless another
Newlib-based OS uses UTF-16 wchar_t as well.

I hope the patch is OK to go into mainline.  I added a lot of comments
to explain what happens.  Feel free to ask any questions.

Please keep me CCed; I'm not subscribed to bug-grep.


Thanks,
Corinna

-- 
Corinna Vinschen
Cygwin Maintainer
Red Hat

Attachment: 0001-src-searchutils.c-mbtolower-Handle-UTF-16-surrogate-.patch
Description: Text document


