bug-coreutils

bug#10880: instead of characters, tr works on bytes


From: Marton Kadar
Subject: bug#10880: instead of characters, tr works on bytes
Date: Fri, 24 Feb 2012 09:29:12 -0500

I don't know the official way to report a bug in 'tr', so I am
copying this to the list too. Please CC me on replies, as I am not
subscribed.

> ----- Original Message -----
> From: Marton Kadar
> Sent: 02/24/12 03:18 PM
> To: address@hidden
> Subject: Example
> 
> Environment for Hungary, where á and í are proper lowercase letters
> (Spanish, for example, has these letters too):
> 
> $ set | grep ^L
> LANG=hu_HU.UTF-8
> LC_ALL=hu_HU.UTF-8
> LINES=73
> LOGNAME=kadar1marto518
> 
> Now let's look at the byte stream for the following string
> (which means "flood" in Hungarian):
> 
> $ echo árvíz | od -c
> 0000000 303 241   r   v 303 255   z  \n
> 0000010
> 
> Let us try to delete a character and see whether it works:
> 
> $ echo árvíz | tr -d á | od -c
> 0000000   r   v 255   z  \n
> 0000005
> 
> The correct, expected behavior would be:
> 
> $ echo árvíz | tr -d á | od -c
> 0000000   r   v 303 255   z  \n
> 0000006
> 
> I'll check the source of tr myself, although I have never coded in C.
> This should be a trivial fix. The problem is especially annoying
> because we currently have no simple, good, general-purpose case
> conversion tool (correct me if I'm wrong, but tr should be that
> tool).
> 
> Marton Kadar





