bug-coreutils

Re: Uppercase string: broken tr?


From: Alex J. Dam
Subject: Re: Uppercase string: broken tr?
Date: Sun, 24 Aug 2003 18:42:14 -0300
User-agent: Opera7.11/Linux M2 build 406

On Sun, 24 Aug 2003 14:16:28 -0600, Bob Proulx <address@hidden> wrote:

Bruno Haible wrote:
Alex J. Dam wrote:
>   $ echo 'ABÇ' | tr [:upper:] [:lower:]
>   abÇ
>   (the last character is an uppercase cedilla)
>   I expected its output to be:
>   abç

What does 'locale' say in this case?


$ locale
LANG=pt_BR.UTF-8
LC_CTYPE="pt_BR.UTF-8"
LC_NUMERIC="pt_BR.UTF-8"
LC_TIME="pt_BR.UTF-8"
LC_COLLATE="pt_BR.UTF-8"
LC_MONETARY="pt_BR.UTF-8"
LC_MESSAGES="pt_BR.UTF-8"
LC_PAPER="pt_BR.UTF-8"
LC_NAME="pt_BR.UTF-8"
LC_ADDRESS="pt_BR.UTF-8"
LC_TELEPHONE="pt_BR.UTF-8"
LC_MEASUREMENT="pt_BR.UTF-8"
LC_IDENTIFICATION="pt_BR.UTF-8"
LC_ALL=pt_BR.UTF-8
$ echo 'ABÇ' | tr [:upper:] [:lower:]
abÇ

But sed and tr and other utilities just use the locale data provided
on the system by glibc among other places.  These programs are table
driven by tables that are not part of these programs.  This is why
locale problems are global problems across the entire system of
programs such as grep, sed, awk, tr, etc. or anything else that uses
the locale data.

I tried it with different locales, and all of them show the same result.
Looking at the sed 4.0.7 source code, in execute.c:

 /* Now do the required modifications.  First \[lu]... */
 if (type & repl_uppercase_first)
   {
     *start = toupper(*start);
     start++;
     type &= ~repl_uppercase_first;
   }

 I'm not a Linux C programmer, but I noticed that start is declared as
"char" and that sed uses toupper, not towupper. Does this have something
to do with its behaviour?

 I typed a simple program:

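#include <locale.h>
#include <stdio.h>
#include <wctype.h>  /* declares towupper() */

int main(void){
 /* Select the locale whose case mappings are being tested. */
 setlocale(LC_ALL, "pt_BR.UTF-8");
 int x;
 for(x = 0; x <= 255; x++){
   /* Treat x as a wide-character code and ask for its uppercase form. */
   int y = (int) towupper((wint_t) x);
   if(x != y)
     printf("%d -> %d *\n", x, y);  /* '*' marks a changed value */
   else
     printf("%d -> %d\n", x, y);
 }
 return 0;
}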

In its output, the line
 199 -> 231 *
 appears.
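
For reference, 199 is the code point of Ç (U+00C7) and 231 is that of ç (U+00E7). The loop above works on code points, but in a UTF-8 locale a byte-oriented tool sees each of these characters as two bytes. The sketch below shows that encoding step; utf8_encode_small is a hypothetical helper written for this message and only covers code points below U+0800.

```c
#include <stddef.h>

/* Encode a code point below U+0800 as UTF-8.
   Returns the number of bytes written (1 or 2), or 0 if cp is
   outside the range this sketch handles. */
static size_t utf8_encode_small(unsigned int cp, unsigned char out[2])
{
    if (cp < 0x80) {
        out[0] = (unsigned char) cp;              /* plain ASCII */
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));    /* lead byte */
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));  /* continuation */
        return 2;
    }
    return 0;
}
```

Code point 199 encodes as the bytes 0xC3 0x87 and 231 as 0xC3 0xA7, so a routine that inspects one char at a time never sees the value 199 or 231 in UTF-8 input.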

Ok, as I said above, I am NOT a Linux programmer and this could be nonsense.

Alex



