bug-grep

bug#21604: grep doesn't match diacritical chars in ISO-8859 files


From: Paul Eggert
Subject: bug#21604: grep doesn't match diacritical chars in ISO-8859 files
Date: Fri, 2 Oct 2015 13:01:04 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0

On 10/02/2015 02:43 AM, Santiago Ruano Rincón wrote:
> grep doesn't match characters with diacritical
> marks in ISO-8859 files, inside a Unicode environment

That is normal and expected behavior. In a UTF-8 locale, "á" is represented by the two bytes 0xC3 and 0xA1. In an ISO-8859 file, the same character is represented by the single byte 0xE1. The UTF-8 pattern won't match the ISO-8859 representation.
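The byte-level mismatch can be checked directly. This is a minimal Python sketch, assuming ISO-8859-1 (Latin-1) as the file encoding. It encodes "á" both ways and then searches a Latin-1 byte string for each form, which is roughly what grep does when the pattern and the file disagree on encoding. The sample text "Está bien" is illustrative only.

```python
# Encode the same character in UTF-8 and in ISO-8859-1 (Latin-1).
ch = "á"
utf8_bytes = ch.encode("utf-8")         # two bytes: 0xC3 0xA1
latin1_bytes = ch.encode("iso-8859-1")  # one byte: 0xE1

print(utf8_bytes.hex(" "), latin1_bytes.hex(" "))

# Illustrative ISO-8859-1 "file contents" containing the character.
latin1_text = "Está bien".encode("iso-8859-1")

# A byte search for the UTF-8 form fails; the Latin-1 form succeeds.
print(utf8_bytes in latin1_text)    # False
print(latin1_bytes in latin1_text)  # True
```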

To avoid this problem, switch to an ISO-8859 locale before using grep to read ISO-8859 text files. This is true for pretty much any standard utility, not just grep. Alternatively, you can translate the text files from ISO-8859 to UTF-8, before giving the resulting text to grep or to other utilities.
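Both workarounds come down to making the pattern and the text agree on one encoding. The Python sketch below illustrates each. It assumes the files are specifically ISO-8859-1; other ISO-8859 parts (e.g. ISO-8859-15) would need their own codec name. The helper names `grep_latin1` and `latin1_to_utf8` are hypothetical. Decoding with the file's own encoding stands in for running grep in an ISO-8859 locale. Re-encoding to UTF-8 corresponds to the translation step, which at the shell is typically done with `iconv -f ISO-8859-1 -t UTF-8`.

```python
import re


def grep_latin1(pattern: str, raw: bytes) -> list[str]:
    """Return the lines of ISO-8859-1 bytes that match a regex pattern.

    Decoding with the file's own encoding is analogous to running grep in
    an ISO-8859 locale: pattern and text then agree on what "á" is.
    """
    text = raw.decode("iso-8859-1")
    return [line for line in text.splitlines() if re.search(pattern, line)]


def latin1_to_utf8(raw: bytes) -> bytes:
    """Translate ISO-8859-1 bytes to UTF-8, as iconv would."""
    return raw.decode("iso-8859-1").encode("utf-8")


# Illustrative ISO-8859-1 input; only the first line contains "á".
raw = "Está bien\nno match here\n".encode("iso-8859-1")

print(grep_latin1("á", raw))

converted = latin1_to_utf8(raw)
# After translation, the UTF-8 bytes for "á" are present.
print("á".encode("utf-8") in converted)
```

Translating once is usually the simpler choice when several tools will read the same files, since every UTF-8-aware utility can then handle the result without a locale change.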
