bug#12192: tr - bytes vs characters

bug-coreutils

From:	Michael Stummvoll
Subject:	bug#12192: tr - bytes vs characters
Date:	Mon, 13 Aug 2012 14:52:22 +0200

Hi GNU folks,

As is already known, tr cannot handle multibyte encodings like UTF-8:

> address@hidden:~$ echo "foo" | tr o ö
> fÃÃ
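A minimal Python model of what happens, assuming tr works on single bytes: "ö" is two bytes in UTF-8 (0xC3 0xB6), so SET1 "o" is one byte long and SET2 is two. GNU tr ignores the excess part of SET2, so every "o" becomes the lone byte 0xC3. `byte_tr` is a hypothetical helper for illustration, not coreutils code, and it ignores ranges, classes and escapes.

```python
def byte_tr(data: bytes, set1: bytes, set2: bytes) -> bytes:
    """Hypothetical, simplified model of a byte-oriented `tr SET1 SET2`.

    Assumes len(set2) >= len(set1); excess bytes of set2 are dropped,
    mirroring tr's rule that excess characters of SET2 are ignored.
    """
    table = bytes.maketrans(set1, set2[:len(set1)])
    return data.translate(table)


# "ö" is one character but two bytes in UTF-8.
print("ö".encode("utf-8"))  # b'\xc3\xb6'

out = byte_tr(b"foo\n", b"o", "ö".encode("utf-8"))
print(out)                              # b'f\xc3\xc3\n'
print(out.decode("latin-1"), end="")    # fÃÃ
# As UTF-8 the two 0xC3 bytes are invalid, so they decode to U+FFFD.
print(out.decode("utf-8", errors="replace"), end="")
```

The "fÃÃ" in the quoted output is consistent with those two 0xC3 bytes being shown as Latin-1; a UTF-8 terminal would show invalid sequences instead.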

I know that multibyte encoding support is not needed for
POSIX compliance, BUT:

The man page of tr says the following:

> Translate, squeeze, and/or delete characters from standard input,
> writing to standard output.

And that's the inconsistency, IMHO.

The typical interpretation of "character" in such a context is one
character on display, regardless of which encoding is used or how many
bytes it takes to represent it. So, if tr really translates "characters",
it should preserve the encoding. If it doesn't, it translates not
"characters" but "bytes". So I see two ways:

- add multibyte-encoding support to tr
or
- change the man page and help text to say "bytes" instead of "characters"
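For contrast, a rough sketch of what the first option means: decode the input, translate per character, and re-encode, so the output stays valid. This is a hypothetical Python illustration, not how coreutils would implement it. It assumes UTF-8 is hard-coded (a real tr would take the encoding from the locale) and treats a "character" as one Unicode code point, which is not always the same as one character on display (e.g. combining sequences).

```python
def char_tr(text: str, set1: str, set2: str) -> str:
    """Hypothetical character-oriented translation (maps code points).

    No ranges, classes or escapes; assumes len(set2) >= len(set1).
    zip() stops at the end of set1, so excess characters of set2
    are ignored, as in tr.
    """
    table = {ord(a): b for a, b in zip(set1, set2)}
    return text.translate(table)


def char_tr_bytes(data: bytes, set1: str, set2: str,
                  encoding: str = "utf-8") -> bytes:
    # Decode, translate whole characters, then re-encode, so multibyte
    # characters are never split into separate bytes.
    return char_tr(data.decode(encoding), set1, set2).encode(encoding)


print(char_tr_bytes(b"foo\n", "o", "ö"))  # b'f\xc3\xb6\xc3\xb6\n'
```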

Since it doesn't seem that anybody wants to add that support to tr,
updating the man page would be the easier way to make them consistent.

Kind regards,
Michael

Current Thread

bug#12192: tr - bytes vs characters, Michael Stummvoll <=
- bug#12192: tr - bytes vs characters, Eric Blake, 2012/08/13
  - bug#12192: tr - bytes vs characters, Paul Eggert, 2012/08/13
    - bug#12192: tr - bytes vs characters, Eric Blake, 2012/08/14
    - bug#12192: tr - bytes vs characters, Paul Eggert, 2012/08/14
    - bug#12192: tr - bytes vs characters, Michael Stummvoll, 2012/08/17
