bug-coreutils

bug#10880: instead of characters, tr works on bytes


From: Marton Kadar
Subject: bug#10880: instead of characters, tr works on bytes
Date: Sat, 25 Feb 2012 17:07:27 -0500

> ----- Original Message -----
> From: Eric Blake
> Sent: 02/25/12 04:28 AM
> To: Marton Kadar
> Subject: Re: bug#10880: instead of characters, tr works on bytes
> 
> On 02/24/2012 07:29 AM, Marton Kadar wrote:
> > Don't know which is the official way to report a bug in 'tr'
> > so I will copy to this list too. CC me on replies as I am not
> > subscribing.
> 
> Sending mail to address@hidden _is_ what creates a bug on
> debbugs.gnu.org, so you have managed to create a duplicate. Paul Eggert
> has already merged 9365, 10880, and 9569, so now, replying to any one of
> those three is merely adding information to the same report.
> 
> >>
> >> Let us try to delete a character and see if it worked:
> >>
> >> $ echo árvíz | tr -d á | od -c
> >> 0000000 r v 255 z \n
> >> 0000005
> 
> Please keep in mind that upstream coreutils is not yet converted over to
> multibyte support. This is evidence of one of the places that multibyte
> support is required, and therefore, where you cannot expect things to
> work yet. No one has yet contributed a maintainable patch that does not
> penalize single-byte locales, at least not upstream. Several distros
> have their own UTF-8 patches that they apply, but then, this would be a
> bug you report to your distro and not upstream.
> 
> >> I'll check the source for tr myself although never coded in C.
> >> This should be a trivial fix.
> 
> Alas, dealing with multibyte characters without penalizing single-byte
> locales is NOT trivial, or it would have been done long ago.

"Penalizing" single-byte locales - did you mean in performance or in 
functionality?
I understand that a generalized algorithm would probably be slower than one
tuned for the single-byte case.

But I suspect that you are also referring to some functional implication,
because avoiding a purely performance-related penalty in text-handling
command-line utilities can never justify incorrect functionality.

Besides, the execution path (single-byte specific or generalized, multibyte
capable) can be determined at program startup, so in the worst case there
could be a tr and a tr-slow-but-multibyte version, the former calling the
latter when so directed by the locale settings.
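
To illustrate the idea only (this is not coreutils code, and Python is used
just for brevity), here is a minimal sketch of choosing the path once at
startup. All function names are hypothetical, and the list of "single-byte"
encodings is an assumption made for the example, not a complete table.

```python
import codecs

def delete_bytes(data: bytes, targets: bytes) -> bytes:
    """Fast path: drop every byte that occurs in targets (byte-oriented)."""
    return data.translate(None, targets)

def delete_chars(data: bytes, targets: bytes, encoding: str) -> bytes:
    """General path: decode, drop whole characters, then re-encode."""
    text = data.decode(encoding)
    drop = set(targets.decode(encoding))
    return "".join(ch for ch in text if ch not in drop).encode(encoding)

def is_single_byte(encoding: str) -> bool:
    """Assumption for this sketch: only a few known 8-bit codecs count."""
    return codecs.lookup(encoding).name in {"ascii", "iso8859-1", "iso8859-2"}

def make_deleter(encoding: str):
    """Pick the execution path once, based on the locale's encoding."""
    if is_single_byte(encoding):
        return lambda data, targets: delete_bytes(data, targets)
    return lambda data, targets: delete_chars(data, targets, encoding)
```

In a real program the encoding would come from the locale settings (for
example via locale.getpreferredencoding(False)); it is passed in explicitly
here so that the single-byte path is left untouched and easy to compare
against the multibyte one.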

A minimal "solution" could also be to put a warning on each affected program's
man page: "Multibyte locales currently unsupported!". It is not always
immediately apparent what the problem is: in many special cases it happens
to work as expected, and in very similar cases it doesn't.
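
A small sketch of why the behaviour looks inconsistent, assuming UTF-8 input
and a deletion that works byte by byte (the helper names are made up for this
example). In UTF-8, "á" and "í" share the same lead byte, so deleting the
bytes of "á" also damages "í", while deleting an ASCII character is harmless.

```python
def delete_bytewise(text: str, chars: str) -> bytes:
    """Mimic byte-oriented deletion on UTF-8 encoded input."""
    return text.encode("utf-8").translate(None, chars.encode("utf-8"))

def happens_to_work(text: str, chars: str) -> bool:
    """True if byte-wise deletion matches character-wise deletion."""
    expected = "".join(c for c in text if c not in chars).encode("utf-8")
    return delete_bytewise(text, chars) == expected

# ASCII target in multibyte text: no byte of "r" occurs inside "á" or "í".
ascii_case = happens_to_work("árvíz", "r")

# Multibyte target, but no other character shares its bytes.
lucky_case = happens_to_work("ár", "á")

# Multibyte target whose lead byte also starts "í": the output is corrupted.
broken_case = happens_to_work("árvíz", "á")
```

Here ascii_case and lucky_case come out true and broken_case comes out false,
which matches the od output quoted earlier in this thread.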

> 
> -- 
> Eric Blake address@hidden +1-919-301-3266
> Libvirt virtualization library http://libvirt.org





