bug#10880: instead of characters, tr works on bytes
From: Marton Kadar
Subject: bug#10880: instead of characters, tr works on bytes
Date: Fri, 24 Feb 2012 09:29:12 -0500
I don't know the official way to report a bug in 'tr', so I am
copying this list as well. Please CC me on replies, as I am not
subscribed.
> ----- Original Message -----
> From: Marton Kadar
> Sent: 02/24/12 03:18 PM
> To: address@hidden
> Subject: Example
>
> The environment is set up for Hungarian, where á and í are proper
> lowercase letters (Spanish, for example, has these letters too):
>
> $ set | grep ^L
> LANG=hu_HU.UTF-8
> LC_ALL=hu_HU.UTF-8
> LINES=73
> LOGNAME=kadar1marto518
>
> Now let's see the bytestream for the following string
> (which means flood in Hungarian):
>
> $ echo árvíz | od -c
> 0000000 303 241 r v 303 255 z \n
> 0000010
>
> Let us try to delete a character and see if it worked:
>
> $ echo árvíz | tr -d á | od -c
> 0000000 r v 255 z \n
> 0000005
>
> The correct, expected behavior would instead be:
>
> $ echo árvíz | tr -d á | od -c
> 0000000 r v 303 255 z \n
> 0000006
>
> I'll check the source for tr myself, although I have never coded in C.
> This should be a trivial fix. The problem is especially annoying
> because we currently have no simple, good, general-purpose case
> conversion tool (correct me if I'm wrong, but tr should be that
> tool).
>
> Marton Kadar
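
To make the difference between the two transcripts concrete, here is a minimal Python sketch. It is not taken from the tr source; the function names (delete_bytewise, delete_charwise, octal_dump) are hypothetical and only model the observed and the expected behavior on UTF-8 input.

```python
# Illustrative model only; this is NOT how tr is implemented.
# All function names here are made up for the example.


def delete_bytewise(data: bytes, charset: str) -> bytes:
    """Drop every byte found in the UTF-8 encoding of charset.

    Models the observed behavior: each byte of a multibyte
    character is treated as a separate item to delete.
    """
    unwanted = set(charset.encode("utf-8"))
    return bytes(b for b in data if b not in unwanted)


def delete_charwise(data: bytes, charset: str) -> bytes:
    """Decode, drop whole characters, then re-encode.

    Models the expected behavior: only complete characters
    listed in charset are removed.
    """
    text = data.decode("utf-8")
    kept = "".join(ch for ch in text if ch not in charset)
    return kept.encode("utf-8")


def octal_dump(data: bytes) -> str:
    """Render bytes roughly like 'od -c', without offsets or padding."""
    cells = []
    for b in data:
        if b == 0x0A:
            cells.append("\\n")
        elif 0x20 <= b < 0x7F:
            cells.append(chr(b))
        else:
            cells.append(format(b, "03o"))
    return " ".join(cells)


line = "árvíz\n".encode("utf-8")
bytewise = delete_bytewise(line, "á")
charwise = delete_charwise(line, "á")

print(octal_dump(line))      # input bytes
print(octal_dump(bytewise))  # the stray 255 byte is left behind
print(octal_dump(charwise))  # the í (303 255) survives intact
```

Both bytes of á (octal 303 and 241) end up in the byte-wise delete set, so the leading 303 byte of í is removed as well and an invalid lone 255 byte remains. The character-wise variant decodes first and therefore leaves í untouched.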