bug-coreutils

bug#10880: instead of characters, tr works on bytes


From: Paul Eggert
Subject: bug#10880: instead of characters, tr works on bytes
Date: Sat, 25 Feb 2012 15:20:44 -0800
User-agent: Mozilla/5.0 (X11; Linux i686; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2

On 02/25/2012 02:07 PM, Marton Kadar wrote:

> the execution path (single byte specific or generalized
> multibyte capable) can be determined at program startup, so in the
> worst case there can be a tr and a tr-slow-but-multibyte version,
> former calling the latter when so directed by the locale settings.

Something like that should work, yes.  Unfortunately so far nobody has
volunteered to do it.  The task would not be trivial.  We don't want
to maintain two copies of the code, one for single-byte and one for
multibyte, as that'd be a maintenance problem.  Instead, we'd like to
have just one copy of the code, which is easy to read and which
compiles into either unibyte or multibyte versions.
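
To make the idea concrete, here is a minimal sketch of deciding the
path once at startup. It is not code from tr or from any coreutils
source. The names locale_is_multibyte and count_chars are hypothetical,
and counting characters stands in for the much larger job tr would
have to do. It assumes only the standard C facilities MB_CUR_MAX and
mbrtowc.

```c
#include <locale.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: true if the current locale uses a multibyte
   encoding.  Meant to be called once, after setlocale (LC_ALL, "").  */
static bool
locale_is_multibyte (void)
{
  return MB_CUR_MAX > 1;
}

/* Hypothetical example routine: count the characters in
   BUF[0..LEN-1].  When MULTIBYTE is false every byte is one
   character, so no decoding is done at all.  */
static size_t
count_chars (const char *buf, size_t len, bool multibyte)
{
  if (!multibyte)
    return len;                 /* unibyte fast path */

  mbstate_t state;
  memset (&state, 0, sizeof state);
  size_t count = 0;
  size_t i = 0;
  while (i < len)
    {
      size_t n = mbrtowc (NULL, buf + i, len - i, &state);
      if (n == (size_t) -1 || n == (size_t) -2)
        {
          /* Invalid or incomplete sequence: treat the byte as one
             character and restart decoding from a clean state.  */
          memset (&state, 0, sizeof state);
          n = 1;
        }
      else if (n == 0)
        n = 1;                  /* an embedded NUL is one byte long */
      i += n;
      count++;
    }
  return count;
}
```

A caller would run setlocale (LC_ALL, "") at startup, store the result
of locale_is_multibyte () in a flag, and pass that flag on every call.
This sketch picks the path at run time inside one function; getting a
single readable source that compiles into separate unibyte and
multibyte versions is the harder problem described above.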

> avoiding a solely performance related penalty in text handling
> command line utilities can never be a justifiable reason for
> incorrect functionality.

As far as I know there is no requirement in POSIX that applications
must support multibyte locales, and there's no documentation claiming
that the utilities in question support multibyte locales, so this is
not a bug; it's a feature request.

My opinion about this may be colored by an experience I had yesterday
with the latest version of GNU sed.  In a single-byte locale it worked
fine; in a multibyte locale it was so slow that I gave up.  We don't
want this to happen with the core utilities.





