bug#26362: tr -cd -- Problem with UTF-8?
From: Ronald Schaten
Subject: bug#26362: tr -cd -- Problem with UTF-8?
Date: Tue, 4 Apr 2017 16:01:52 +0200
User-agent: Mutt/1.5.21 (2010-09-15)
Hey...
I'm not sure if this is a bug or if I'm using it wrong. I tested this on
several systems, and on BSD-based systems (Mac) the tr tool gives a
different result -- the one I expected.
The simplest way to reproduce this looks like this (sorry, umlaut
ahead):
$ echo -ne "\xc3\x82" | tr -cd "ä" | xxd
% 00000000: c3 .
The echo prints a capital A with a circumflex (Â), and I expect the tr
command to delete everything except the small umlaut ä. It looks as if
tr just deletes the second byte.
When I try without the umlaut it gives me the empty result, as expected:
$ echo -ne "\xc3\x82" | tr -cd "a" | xxd
[empty result]
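A quick byte-level check that may help narrow this down. It assumes (I have not confirmed this) that tr compares single bytes rather than UTF-8 characters. LC_ALL=C is used here only to force byte-wise handling, and the octal escapes spell out the UTF-8 bytes of ä and Â:

```shell
# Bytes that make up the set "ä" in UTF-8.
set_bytes=$(printf '\303\244' | od -An -tx1 | tr -d ' \n')

# Feed c3 82 (Â) through tr in the C locale, where every byte is its own
# character. Anything outside the byte set {c3, a4} is deleted.
kept=$(printf '\303\202' | LC_ALL=C tr -cd '\303\244' | od -An -tx1 | tr -d ' \n')

echo "set bytes:  $set_bytes"
echo "kept bytes: $kept"
```

If the kept bytes here match what the default-locale run printed, that would suggest the first byte of Â survives only because it is also the first byte of ä.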
I tested several systems, the oldest is a Debian with coreutils 8.5, the
newest an Ubuntu with coreutils 8.25.
For the moment, I'll try to solve my problem differently, but... is this
a bug? Thanks in advance!
Regards,
Ronald.
--
There is no reason for any individual to have a computer in his home.
(Ken Olsen, DEC)