bug-coreutils

Re: tr '[:upper:]' '[:lower:]' -- misaligned construct


From: Jim Meyering
Subject: Re: tr '[:upper:]' '[:lower:]' -- misaligned construct
Date: Mon, 07 Jan 2008 20:15:44 +0100

Micah Cowan <address@hidden> wrote:
> Jim Meyering wrote:
>> Here's a tentative patch that also avoids repeated
>> (and wasteful) initialization of the xlate array.
>
> I note that POSIX requires that, in the case that the arguments are
> exactly '[:lower:]' and '[:upper:]' (or the reverse of the same), tr is
> actually supposed to ignore the 'lower' and 'upper' character classes,
> and instead initialize the mapping from the locale's "tolower"/"toupper"
> definition. This would have avoided the length mismatch in the first
> place, and while that issue appears to be addressed, tr still does not
> conform to POSIX, as, if tr were to encounter a locale definition file
> with an LC_CTYPE category definition such as the following:

Thanks for the feedback.
However, you seem to be misinterpreting something.
GNU tr has always initialized its internal translation
array using the tolower and toupper functions.
The problem I mentioned above is that it was performing
the correct initialization repeatedly.
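
For illustration only, here is a minimal sketch of the kind of
initialization described above: a per-byte translation table filled once
from tolower, then reused for every buffer.  This is not the actual tr.c
code; the names N_CHARS, init_lower_table and translate_buffer are
hypothetical, and the sketch assumes a single-byte locale.

```c
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

/* One table slot per possible byte value (hypothetical name).  */
enum { N_CHARS = UCHAR_MAX + 1 };

/* Fill XLATE once: each byte the locale classifies as uppercase maps
   to tolower() of that byte; every other byte maps to itself.  */
void
init_lower_table (unsigned char xlate[N_CHARS])
{
  for (int c = 0; c < N_CHARS; c++)
    xlate[c] = isupper (c) ? (unsigned char) tolower (c) : (unsigned char) c;
}

/* Translate LEN bytes of BUF in place using an already-built table.
   The table is not rebuilt here, so the setup cost is paid only once.  */
void
translate_buffer (unsigned char *buf, size_t len,
                  const unsigned char xlate[N_CHARS])
{
  for (size_t i = 0; i < len; i++)
    buf[i] = xlate[buf[i]];
}
```

A caller would invoke init_lower_table a single time before its read
loop and then call translate_buffer on each chunk of input.  Because the
table is driven by tolower rather than by pairing up the members of the
'upper' and 'lower' classes, the two classes need not have equal length.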

>   upper A;...;Z
>   lower a;...;z
>   tolower (A,Z)
>   ...
> This would require
>  $ echo AAAA | tr '[:upper:]' '[:lower:]'
> to output "ZZZZ" (though it isn't even lowercased), rather than 'aaaa'.

GNU tr should work properly even with such an odd locale, as long
as it is a single-byte one.  See below.

> While the example above is, of course, contrived, there may well be
> locales where the tolower/toupper mappings differ from the longest
> possible mapping between the 'upper' and 'lower' classes.
>
> In fact, as it currently stands, I expect tr mishandles a case such as:
>   $ echo σιγμας | tr '[:lower:]' '[:upper:]'
> (Note the two variants of "sigma" in there, which both have a single
> corresponding capital letter; I'm afraid I can't actually verify this is
> broken, as my work desktop is not set up to compile coreutils, and I
> lack the time to correct this for now; the stock (old) tr on the system,
> running Fedora Core 6, silently passes it through without conversion.)

Your example uses multi-byte characters, and that is a separate issue.
Upstream GNU tr does not yet work with multi-byte characters.
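
To illustrate why a byte-at-a-time table cannot handle such input, here
is a small sketch (again, not the actual tr.c code; upcase_bytes and
count_non_ascii_bytes are hypothetical names).  It assumes UTF-8 input,
where the small sigma is encoded as the two bytes 0xCF 0x83, and the
default "C" locale, in which only ASCII letters are classified as
lowercase.

```c
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

enum { TABLE_SIZE = UCHAR_MAX + 1 };

/* Uppercase BUF in place, one byte at a time, via a toupper-based table.
   Each byte is looked up independently, so a character encoded as
   several bytes is never seen as a single unit.  */
void
upcase_bytes (unsigned char *buf, size_t len)
{
  unsigned char table[TABLE_SIZE];

  for (int c = 0; c < TABLE_SIZE; c++)
    table[c] = islower (c) ? (unsigned char) toupper (c) : (unsigned char) c;

  for (size_t i = 0; i < len; i++)
    buf[i] = table[buf[i]];
}

/* Count the bytes with the high bit set.  In UTF-8 these are exactly
   the bytes that belong to multi-byte characters.  */
size_t
count_non_ascii_bytes (const unsigned char *buf, size_t len)
{
  size_t n = 0;

  for (size_t i = 0; i < len; i++)
    if (buf[i] >= 0x80)
      n++;
  return n;
}
```

Under those assumptions, the bytes 0xCF 0x83 followed by 'a' come out of
upcase_bytes as 0xCF 0x83 'A': the ASCII letter is converted and the two
bytes of the sigma pass through unchanged.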

If you can make tr misbehave, it'd be great to hear about it soon,
since I'm pretty close to being able to release a stable coreutils-6.10.



