[debbugs-tracker] bug#21604: closed (grep doesn't match diacritical char


From: GNU bug Tracking System
Subject: [debbugs-tracker] bug#21604: closed (grep doesn't match diacritical chars in ISO-8859 files)
Date: Fri, 02 Oct 2015 20:02:02 +0000

Your message dated Fri, 2 Oct 2015 13:01:04 -0700
with message-id <address@hidden>
and subject line Re: bug#21604: grep doesn't match diacritical chars in 
ISO-8859 files
has caused the debbugs.gnu.org bug report #21604,
regarding grep doesn't match diacritical chars in ISO-8859 files
to be marked as done.

(If you believe you have received this mail in error, please contact
address@hidden)


-- 
21604: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=21604
GNU Bug Tracking System
Contact address@hidden with problems
--- Begin Message ---
Subject: grep doesn't match diacritical chars in ISO-8859 files
Date: Fri, 2 Oct 2015 11:43:58 +0200
User-agent: Mutt/1.5.23 (2014-03-12)

Hi,

In addition to http://debbugs.gnu.org/cgi/bugreport.cgi?bug=19230 , several
Debian users report that grep doesn't match characters with diacritical
marks in ISO-8859 files inside a Unicode environment:

% file /tmp/q.h 
/tmp/q.h: ISO-8859 text

% grep c /tmp/q.h
Coincidencia en el fichero binario /tmp/q.h

% grep -a c /tmp/q.h
    struct cara* lcaras; //array de caras, habr� que usar reserva dinamica de memoria.

% grep á /tmp/q.h 

% grep -a á /tmp/q.h

grep does match the "á" pattern if the pattern itself is read from an ISO-8859 file:

% grep -f a q.h 
Coincidencia en el fichero binario q.h

Test files attached

Full report:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=800670

Regards,

Santiago

-- System Information:
Debian Release: stretch/sid
  APT prefers squeeze-lts
    APT policy: (500, 'squeeze-lts'), (500, 'oldoldstable'), (500, 'unstable'), (500, 'testing'), (500, 'oldstable'), (1, 'experimental')
    Architecture: amd64 (x86_64)
    Foreign Architectures: i386

    Kernel: Linux 3.16.0-4-amd64 (SMP w/4 CPU cores)
    Locale: LANG=es_CO.utf8, LC_CTYPE=es_CO.utf8 (charmap=UTF-8)
    Shell: /bin/sh linked to /bin/dash
    Init: sysvinit (via /sbin/init)

    Versions of packages grep depends on:
    ii  dpkg          1.18.1
    ii  install-info  6.0.0.dfsg.1-3
    ii  libc6         2.19-19
    ii  libpcre3      2:8.35-7

Attachment: q.h
Description: Text Data


--- End Message ---
--- Begin Message ---
Subject: Re: bug#21604: grep doesn't match diacritical chars in ISO-8859 files
Date: Fri, 2 Oct 2015 13:01:04 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0

On 10/02/2015 02:43 AM, Santiago Ruano Rincón wrote:
> grep doesn't match characters with diacritical
> marks in ISO-8859 files, inside a Unicode environment

That is normal and expected behavior. In a UTF-8 locale, "á" is represented by the two bytes 0xC3 and 0xA1. In an ISO-8859 file, the same character is represented by the single byte 0xE1. The UTF-8 pattern won't match the ISO-8859 representation.
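
The byte-level difference can be checked directly from a shell. This is only a small sketch: it writes the bytes with octal escapes so the result does not depend on the terminal's encoding, and it assumes the file uses ISO-8859-1 (the report only says "ISO-8859").

```shell
# "á" as UTF-8: two bytes, 0xC3 0xA1 (octal \303 \241)
utf8_hex=$(printf '\303\241' | od -An -tx1 | tr -d ' \n')

# "á" as ISO-8859-1 (assumed variant): one byte, 0xE1 (octal \341)
latin1_hex=$(printf '\341' | od -An -tx1 | tr -d ' \n')

echo "UTF-8: $utf8_hex  ISO-8859-1: $latin1_hex"
```

Because the two byte sequences have nothing in common, a pattern typed in a UTF-8 terminal cannot match the single byte stored in the file.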

To avoid this problem, switch to an ISO-8859 locale before using grep to read ISO-8859 text files. This is true for pretty much any standard utility, not just grep. Alternatively, you can translate the text files from ISO-8859 to UTF-8, before giving the resulting text to grep or to other utilities.
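
A rough sketch of both workarounds follows. The file names sample.h and sample.utf8.h are hypothetical, the encoding is assumed to be ISO-8859-1, and the locale name in the commented-out command is only an example; `locale -a` lists what is actually installed on a given system.

```shell
# Build a small ISO-8859-1 sample in which "á" is stored as the byte 0xE1.
tmpdir=$(mktemp -d)
printf 'habr\341 que usar\n' > "$tmpdir/sample.h"

# Workaround A: translate the file to UTF-8, then search with a UTF-8 pattern.
iconv -f ISO-8859-1 -t UTF-8 "$tmpdir/sample.h" > "$tmpdir/sample.utf8.h"
pattern=$(printf '\303\241')            # "á" encoded as UTF-8
match=$(grep -c "$pattern" "$tmpdir/sample.utf8.h")
echo "matching lines after conversion: $match"

# Workaround B (not run here): search under an ISO-8859 locale instead.
# The locale name below is an assumption; the pattern must then also be
# the single byte 0xE1.
# LC_ALL=es_CO.ISO-8859-1 grep "$(printf '\341')" "$tmpdir/sample.h"

rm -rf "$tmpdir"
```

Converting the file is usually the easier route when the rest of the environment is UTF-8, since it does not depend on which locales happen to be installed.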


--- End Message ---
