bug#10880: instead of characters, tr works on bytes
From: Chris Jones
Subject: bug#10880: instead of characters, tr works on bytes
Date: Mon, 27 Feb 2012 00:44:56 -0500
User-agent: Mutt/1.5.18 (2008-05-17)
On Fri, Feb 24, 2012 at 09:29:12AM EST, Marton Kadar wrote:
[..]
> > $ set | grep ^L
> > LANG=hu_HU.UTF-8
> > LC_ALL=hu_HU.UTF-8
> > LINES=73
> > LOGNAME=kadar1marto518
> >
> > Now let's see the bytestream for the following string
> > (which means flood in Hungarian):
> >
> > $ echo árvíz | od -c
> > 0000000 303 241 r v 303 255 z \n
> > 0000010
> >
> > Let us try to delete a character and see if it worked:
> >
> > $ echo árvíz | tr -d á | od -c
> > 0000000 r v 255 z \n
> > 0000005
[..]
Try this for size...
$ echo árvíz | od -t x1z -w16
$ echo árvíz | tr -d é | od -t x1z -w16
$ echo árvíz | tr -d é > /tmp/u.txt
$ isutf8 /tmp/u.txt
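And there is not even an ‘é’ in ‘árvíz’. In UTF-8, é is the two bytes
303 251, and tr deletes every 303 byte it finds, including the ones
that start á (303 241) and í (303 255).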
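
For anyone without a UTF-8 terminal or isutf8 at hand, the same effect
can be reproduced byte for byte. This is only a sketch: hexbytes is a
made-up helper name, the word is spelled with octal escapes so the
input does not depend on the terminal encoding, and LC_ALL=C is set on
the assumption that tr then treats every byte as its own character.

```shell
# Hypothetical helper: print stdin as one contiguous hex string.
hexbytes() { od -An -v -t x1 | tr -d ' \n'; echo; }

# 'árvíz\n' written as octal escapes: á = 303 241, í = 303 255.
word='\303\241rv\303\255z\n'

# The untouched input.
printf "$word" | hexbytes
# prints c3a17276c3ad7a0a

# é = 303 251. tr sees a set of two separate bytes, not one character,
# so both 303 lead bytes are deleted and 241 / 255 are left orphaned.
printf "$word" | LC_ALL=C tr -d '\303\251' | hexbytes
# prints a17276ad7a0a
```

The second result starts with a lone continuation byte (a1), which is
why the output is no longer valid UTF-8.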
CJ
P.S. Though you do have to look for it a bit, the coreutils manual
clearly states that only single-byte encodings are supported:
http://www.gnu.org/software/coreutils/manual/html_node/tr-invocation.html
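
One possible workaround, which is not something the manual suggests
but merely a sketch: delete the character as a byte sequence with sed
instead of as a set of bytes with tr. The hexbytes helper and the
a_acute variable are names made up for this example.

```shell
# Hypothetical helper: print stdin as one contiguous hex string.
hexbytes() { od -An -v -t x1 | tr -d ' \n'; echo; }

# á as its two UTF-8 bytes, built with printf so the script stays ASCII.
a_acute=$(printf '\303\241')

# sed removes the two-byte sequence as a unit, so the 303 byte that
# starts í (303 255) survives.
printf '\303\241rv\303\255z\n' | LC_ALL=C sed "s/$a_acute//g" | hexbytes
# prints 7276c3ad7a0a
```

This only covers deleting a fixed string; it does not give sed the
set and range handling that tr has.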
--
Mooo Canada!!!!